Date of Original Version

2009

Type

Book Chapter

Abstract or Table of Contents

The current paper examines a particular approach to morphosyntactic analysis that has been elaborated in the context of the CHILDES (Child Language Data Exchange System) database. Readers unfamiliar with this database and its role in child language acquisition research may find it useful to download and study the materials (manuals, programs, and database) that are available for free over the web at http://childes.psy.cmu.edu. However, before doing this, users should read the "Ground Rules" for proper usage of the system. This database now contains over 44 million spoken words from 28 different languages. In fact, CHILDES is the largest corpus of conversational spoken language data currently in existence. In terms of size, the next largest collection of conversational data is the British National Corpus with 5 million words. What makes CHILDES a single corpus is the fact that all of the data in the system are consistently coded using a single transcript format called CHAT. Moreover, for several languages, all of the corpora have been tagged for part of speech using an automatic tagging program called MOR.

Comments

To appear in Behrens, H., Trends in corpus research: Finding structure in data. Amsterdam: Benjamins.
