Cognitive Science
Jamie Klein

Role of Child-Directed Speech Constructions in Early Language

An Analysis of the CHILDES Corpus

  • Faculty Advisor

Published On

July 2018

Originally Published

NURJ 2017-18
Honors Thesis


The theory of Construction Grammar in linguistics, which holds that form and meaning pairings are the structural basis of language, is increasingly discussed in the academic community. One of its main tenets is that syntactic constructions themselves provide semantic information: meaning that is generalized from the particular verbs with which the structures are commonly associated. In order to investigate the explanatory power of the theory, this research project focused on methods for annotating, organizing, and analyzing large samples of child-directed speech for such constructions. Using two resources, the CHILDES corpus and the FrameNet project, I analyzed over 10,000 lines of transcribed child-directed speech and coded them for target verbs appearing in ditransitive, caused-motion, and intransitive motion constructions. Results of the corpus analysis show that the caused-motion and intransitive motion constructions do, in fact, exhibit higher frequency of certain verbs, as expected under a constructionist account.


The study of language acquisition and development is central to cognitive science. Over the past several decades, many theories have attempted to explain how children grasp something as complex as language. Chomsky (1968) and others have hypothesized that, given the complex and highly abstract principles of grammar, it must be the case that humans are born with an innate universal grammar. In contrast, usage-based theorists assert that humans gain language and communication skills through powerful general-purpose learning and developmental mechanisms (Tomasello, 2009). Proponents of this view argue that these general learning mechanisms are sufficient to support the process of acquiring language. On this account, humans are not equipped with any innate language-specific tools; instead, they draw on their experience of communicating to learn and refine their language skills.

With this theory in mind, I conducted research regarding the patterns of speech that occur when adults speak directly to very young children. This research project attempts to uncover information about constructions (form and meaning pairings that are theorized to be the basis of all language) and their potential ability to aid child language learning in the absence of a universal grammar (Goldberg, 1995).

Review of Literature

Considerable attention has been devoted to studying child language acquisition in the fields of cognitive science, linguistics, and developmental psychology. Beginning in the 1960s and through the mid-1980s, traditional approaches to studying language-learning were generativist, and proposed a universal grammar and a strict distinction between lexical and syntactic representations (Chomsky, 1981). Theories of this nature stressed the seemingly impossible task of acquiring not only the necessary vocabulary to communicate, but also the extremely complex lattice of grammatical structures necessary to develop language skills (Chomsky, 1968). Traditional views like these are still held in the field, but other theorists have hypothesized alternative usage-based theories of language acquisition. Tomasello, Goldberg, and many others have posited that children’s language-learning abilities are strong enough that acquisition can occur simply through learning patterns in the grammar.

Contemporary grammarians, studying sets of simple sentences, have found that certain constructions exist systematically, independent of the specific verbs and words included in given sentences. Thus, the constructionist theory of linguistics states that constructions themselves carry meaning, beyond the meaning of the component words of the sentence, and that children can utilize these forms to learn meaning. A “construction” has since been defined as a syntactic structure whose composition (form and/or meaning) cannot be derived from other existing constructions in the language. The formal definition is as follows:

“Sentence C is a construction iff_def C is a form-meaning pair <F_i, S_i> such that some aspect of F_i or some aspect of S_i is not strictly predictable from C’s component parts or from other previously established constructions” (Goldberg, 1995, p. 4).

Traditionally, constructions in grammar were often referenced and utilized, though not as a meaningful principle of language itself, but rather as a byproduct of the language process. Generalizations across linguistic patterns were said to have come from other, more general principles (Chomsky, 1957). The Construction Grammar approach argues that constructions are themselves theoretical (and meaningful) entities, and thus deserve to be studied as such (Fillmore, Kay, and O’Connor, 1988).

This paper focuses on argument structure constructions, one specific type of construction (Goldberg, 1995). The English argument structure constructions include the ditransitive, caused-motion, intransitive motion, conative, resultative, and “way” constructions, to name a few. The properties of several of these argument structure constructions are as follows:

Ditransitive: “X causes Y to receive Z” - Subject Verb Object 1 Object 2

Caused Motion: “X causes Y to move to Z” - Subject Verb Object Oblique

Resultative: “X causes Y to become Z” - Subject Verb Object Clausal Comp

Intrans. Motion: “X moves to Y” - Subject Verb Oblique

Conative: “X directs action at Y” - Subject Verb Oblique[at]

These varying syntactic structures, some of which will be explicated further below, can each be paired with the same verb, yet convey different meanings. This demonstrates that particular constructions express systematic semantic differences (Goldberg, 1995). Therefore, the collection of constructions in the language can be seen as a highly structured framework of interrelated linguistic information, from which learners of the language can draw.
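The form and meaning pairings listed above can be held in a small lookup table. The following is only an illustrative sketch in Python: the `Construction` class and the `by_name` helper are hypothetical names introduced here, not part of the original study, and the entries simply restate the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Construction:
    """One argument structure construction: a form paired with a meaning."""
    name: str
    meaning: str
    form: tuple


# Entries restate the list of constructions given in the text above.
CONSTRUCTIONS = [
    Construction("ditransitive", "X causes Y to receive Z",
                 ("Subject", "Verb", "Object 1", "Object 2")),
    Construction("caused-motion", "X causes Y to move to Z",
                 ("Subject", "Verb", "Object", "Oblique")),
    Construction("resultative", "X causes Y to become Z",
                 ("Subject", "Verb", "Object", "Clausal Comp")),
    Construction("intransitive motion", "X moves to Y",
                 ("Subject", "Verb", "Oblique")),
    Construction("conative", "X directs action at Y",
                 ("Subject", "Verb", "Oblique[at]")),
]


def by_name(name):
    """Return the construction with the given name (hypothetical helper)."""
    for construction in CONSTRUCTIONS:
        if construction.name == name:
            return construction
    raise KeyError(name)


if __name__ == "__main__":
    for c in CONSTRUCTIONS:
        print(f"{c.name}: {' '.join(c.form)} = {c.meaning}")
```

Keeping the meaning on the construction rather than on any verb mirrors the constructionist claim that the syntactic frame itself contributes semantics.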

Each argument structure construction and its features will be explained briefly, in order to aid in understanding the importance of the theory. The ditransitive construction evokes a central sense of “transfer between a volitional agent and a willing recipient” (Goldberg, 1995, p. 141). Regardless of the particular lexical items in a given sentence, the semantic structure of the ditransitive construction is consistently related to “transfer.” A second construction that carries its own semantic sense is the caused-motion structure. This structure generally evokes a subject directly causing an object to move in a direction designated by an oblique phrase. As in the case of the ditransitive, various extensions beyond this basic format exist for the caused-motion construction. However, all forms still maintain their semantic sense when the component lexical items shift, or seemingly break from the structural form into idiosyncrasy. The final common construction used in this analysis is the conative construction. This construction expresses a directed action along a path. In the conative case, the action must only be intended, not necessarily completed (Goldberg, 1995). This distinction is represented lexically by the oblique in the (Subject Verb Oblique) structure, as conative constructions always contain the preposition “at” in that position (Goldberg, 1995).

The current hypothesis in the field is that the high frequency of particular verbs in certain constructions helps children attend to the correlation between the meaning of that verb in the structural pattern and the structure itself (Goldberg, Casenhiser, and Sethuraman, 2004). Additionally, past research suggests that pronoun frequency in child-directed speech significantly aids children’s acquisition. Pronoun frequency is therefore also measured in the current study, as an indicator of symbolic language use that relies on joint attention (Tomasello, 2009).

Overall, the construction grammar account of child language acquisition can be summarized as the combination of children’s powerful general cognitive mechanisms, such as the ability to process massive amounts of input and to engage in joint attention, with the semantics carried by constructions.


In order to investigate the Construction Grammar theory regarding constructions’ frequency and meaning, I conducted a corpus analysis of the CHILDES database from Carnegie Mellon University. The database is composed of over 130 different corpora of child-adult speech data (MacWhinney, 1996). Approximately 28 separate research teams have contributed transcript data to the English CHILDES corpus. Each team’s contribution can be browsed and viewed online. For the current study, I opted to use the work of Roger Brown and his students, who contributed English conversational data of three North American children, Adam, Eve, and Sarah, and of their respective mothers. The advantage of the Brown corpus is that, in addition to being analyzed by the CHILDES CHAT system, it has been tagged for parts of speech. I annotated and analyzed a total of 10,000 lines of child-parent conversations, drawn from one transcript each of the “Adam” and “Eve” samples.

The design of the annotation methodology began with the selection of particular verbs to record. Previous analysis by Alishahi and Stevenson (2008) found that the most frequently occurring relevant verbs in Brown’s corpus were: go, put, get, make, look, take, play, come, eat, fall, sit, see, and give. In order to test the “learning from input” argument, I used these frequently occurring words as the target verbs. The analysis of the Adam and Eve transcripts in the CHILDES corpus was conducted by searching for these target words in child-directed speech. When found, the features of the sentences from which the verbs originated were then annotated.

The format of each annotation included the:

  • exact text of the target sentence –– verbatim copy of sentence text
  • target verb observed –– recording which of the thirteen target verbs was identified in the sentence
  • FrameNet frame for that verb –– the appropriate frame for the target verb, identified by searching the Berkeley FrameNet project Lexical Unit Index
  • part of speech tag for the sentence –– the Brown CHILDES transcript provided a part of speech tag for each line
  • syntactic structure of the sentence –– the sentence syntax was recorded, sometimes with the aid of the Stanford Natural Language Parser
  • FrameNet frame elements of each phrase/word in the sentence –– the component words of each sentence were analyzed using FrameNet to identify relevant elements to the target verb
  • sentence type –– identified the sentence type as declarative, interrogative or imperative

Each annotation was recorded using the following template:

  (<source> Line # "Sentence"
    (target-verb <verb-root>)
    (frame <frame>)
    (pos-tag (<plist>))
    (syntax (<syntax>))
    (chunks (<chunks (pos "phrase" FN-Role)>))
    (stype <stype>))
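To make the annotation format concrete, here is a minimal sketch of how one annotation in this parenthesized form could be read into a record. It is written in Python rather than the LISP used in the study. The example sentence, its line number, the tag names, and the helper names `parse_sexp` and `to_record` are all hypothetical illustrations, not data or code from the original project.

```python
import re

# Tokens: an open paren, a close paren, a quoted string, or a bare atom.
TOKEN = re.compile(r'\(|\)|"[^"]*"|[^\s()"]+')

# Hypothetical annotation following the template; not taken from the corpus.
EXAMPLE = """
(adam 120 "put it on the table"
  (target-verb put)
  (frame Placing)
  (pos-tag (v pro prep det n))
  (syntax (V NP PP))
  (chunks ((V "put" Target) (NP "it" Theme) (PP "on the table" Goal)))
  (stype imperative))
"""


def parse_sexp(text):
    """Parse one parenthesized expression into nested Python lists."""
    tokens = TOKEN.findall(text)
    pos = 0

    def read():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            items = []
            while tokens[pos] != ")":
                items.append(read())
            pos += 1  # consume the closing paren
            return items
        if tok.startswith('"'):
            return tok[1:-1]  # strip the surrounding quotes
        return tok

    return read()


def to_record(sexp):
    """Turn a parsed annotation into a dict keyed by field name."""
    source, line_no, sentence, *fields = sexp
    record = {"source": source, "line": int(line_no), "text": sentence}
    for key, *values in fields:
        record[key] = values[0]
    return record


if __name__ == "__main__":
    record = to_record(parse_sexp(EXAMPLE))
    for key, value in record.items():
        print(key, "=", value)
```

Each chunk comes out as a three-item list of part of speech, phrase text, and frame element role, matching the order in the template.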

The analysis of the 10,000 lines of child-directed speech data yielded 160 annotations of child-directed speech involving these verbs.

The next step of the methodology involved running the coded annotations through a simple LISP program to parse and compile the data. The annotations, originally in list form, were then organized into hash tables, from which a CSV file could easily be created. The function that organized the information began by defining a structure as a container with certain slots. The structure included the same fields as the annotations themselves, such as a verb field, frame field, syntax field (valence structure), text field, chunks field, and sentence-type field.

The chunks field was further structured as a list holding the three elements of each entry in the chunks line: part of speech, frame element role, and the text itself. The code could then read through the annotation list and associate each annotation element with the appropriate slot in the structure (including creating a data structure for each chunk). The code then produced a hash table of keys (syntactic patterns/frames) and values (counts), outputting various sets of data in order to analyze frames by syntax, or syntax by frame. A short script output and formatted this data into an Excel spreadsheet for convenient analysis. Automating these steps made chunking and checking the data simple and efficient. The next step was to statistically analyze the distribution of these various structures after accumulation. Using corpus analysis techniques, I investigated verb frequency, frame appearance, syntactic patterns, and construction occurrences.
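The pipeline described above (a structure with named slots, a hash table of counts, and CSV output) might look roughly like the following. This is a sketch in Python, not the original LISP program; the class names, function names, and the three sample records are assumptions made purely for illustration.

```python
import csv
import io
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Chunk:
    pos: str   # part of speech of the phrase
    text: str  # the phrase itself
    role: str  # FrameNet frame element role


@dataclass
class Annotation:
    verb: str
    frame: str
    syntax: tuple  # valence structure, e.g. ("V", "NP", "PP")
    text: str
    stype: str     # declarative, interrogative, or imperative
    chunks: list = field(default_factory=list)


def count_frames_by_syntax(annotations):
    """Count how often each (syntactic pattern, frame) pair occurs."""
    counts = Counter()
    for annotation in annotations:
        counts[(annotation.syntax, annotation.frame)] += 1
    return counts


def to_csv(counts):
    """Format the counts as CSV text with one row per (syntax, frame) pair."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["syntax", "frame", "count"])
    for (syntax, frame), n in sorted(counts.items()):
        writer.writerow([" ".join(syntax), frame, n])
    return buffer.getvalue()


# Hypothetical records for illustration only; not data from the study.
SAMPLE = [
    Annotation("put", "Placing", ("V", "NP", "PP"),
               "put it on the table", "imperative"),
    Annotation("put", "Placing", ("V", "NP", "PP"),
               "put the ball in the box", "imperative"),
    Annotation("go", "Motion", ("NP", "V", "PP"),
               "you go to the store", "declarative"),
]

if __name__ == "__main__":
    print(to_csv(count_frames_by_syntax(SAMPLE)), end="")
```

Swapping the key to `(annotation.frame, annotation.syntax)` gives the complementary view, syntax by frame, from the same records.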


Verb frequency was first investigated in the analysis stage. There were 160 total sentences in the study, and 13 verbs were investigated. Frequency of each verb within the data set ranged from two instances for “make” to 33 instances for “go.” Overall, 26 frames were represented by the 13 verbs over 160 sentences. This means that, on average, each verb was associated with two frames. The number of frames per verb ranged from one to four, with “play” being represented by four different frames and several verbs, including “give,” “eat,” and “fall,” having only one. Two was the median number of frames per verb. The distribution of all 26 frames is shown in Figure 1.

Figure 1.

From the corpus data, the four argument structure constructions analyzed were the intransitive motion, caused-motion, ditransitive, and conative constructions. Of the 160 child-directed utterances collected, 82 contained one of these constructions. Each construction appeared in the corpus with varying frequency, ranging from 36 instances (ditransitive) to only two instances (conative). A complete distribution is shown in Figure 2.

Figure 2.

Because the conative construction appeared so rarely in the data, which was expected, no specific analyses of this construction were conducted. The other three constructions were further analyzed for the particular verb patterns with which they appear, as discussed later in the analysis. Beyond descriptive statistics for the data set, an analysis of the child-directed speech instances demonstrated a high frequency of pronoun use. Out of 160 sentences, 109 contained a pronoun, a rate of 68.1 percent. Moreover, a pronoun was the subject of the sentence in 104 of these 109 sentences (95.4 percent), which represents 65.0 percent of all sentences (N = 160). There were also 14 sentences that featured pronouns as both the subject and the object of the sentence.

Sentences were also analyzed for the common English constructions of ditransitive, caused-motion, intransitive motion, and conative. The distribution of each of these constructions in the data included the ditransitive appearing 36 times (22.5 percent), the caused-motion construction appearing 11 times (6.9 percent), the intransitive motion construction appearing 33 times (20.6 percent), and the conative construction appearing two times (1.3 percent). This leaves 78 sentences in the data set that were either incomplete, or did not conform to an analyzed construction (48.7 percent), often due to the informal and incomplete nature of mothers’ utterances to their children.
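The percentages reported above follow directly from the raw counts. The short Python sketch below recomputes them from the figures given in the text; the variable and function names are hypothetical.

```python
TOTAL_SENTENCES = 160

# Construction counts as reported in the text.
CONSTRUCTION_COUNTS = {
    "ditransitive": 36,
    "caused-motion": 11,
    "intransitive motion": 33,
    "conative": 2,
}

# Pronoun counts as reported in the text.
PRONOUN_SENTENCES = 109
PRONOUN_SUBJECTS = 104


def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


CLASSIFIED = sum(CONSTRUCTION_COUNTS.values())
UNCLASSIFIED = TOTAL_SENTENCES - CLASSIFIED

if __name__ == "__main__":
    for name, n in CONSTRUCTION_COUNTS.items():
        print(f"{name}: {n} ({percent(n, TOTAL_SENTENCES):.1f}%)")
    print(f"no analyzed construction: {UNCLASSIFIED} "
          f"({percent(UNCLASSIFIED, TOTAL_SENTENCES):.1f}%)")
    print(f"sentences with a pronoun: "
          f"{percent(PRONOUN_SENTENCES, TOTAL_SENTENCES):.1f}%")
    print(f"pronoun subjects among pronoun sentences: "
          f"{percent(PRONOUN_SUBJECTS, PRONOUN_SENTENCES):.1f}%")
    print(f"pronoun subjects among all sentences: "
          f"{percent(PRONOUN_SUBJECTS, TOTAL_SENTENCES):.1f}%")
```

The four construction counts sum to the 82 classified sentences, leaving the 78 unclassified ones mentioned above.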

The core analysis of the research was to see which verbs, if any, generalize to these various argument structure constructions. Beginning with the intransitive motion construction, which has been found in the past to generalize from the verb “go,” results showed a significant co-occurrence with this verb in the sample data. While 10 instances of “go” were found in the 33 intransitive motion construction instances, only three were found among the 36 ditransitive construction instances, and only one “go” appeared among the 11 caused-motion constructions. Additionally, within the sample of intransitive motion constructions from the data, the most frequently occurring verb was “go,” with 10 instances. Other verbs that appeared often, but less so, were “sit,” with eight instances, and “come” and “look,” which each appeared four times. Using a chi-squared test, I found that the tendency of “go” to appear more frequently in the intransitive motion construction than other verbs was statistically significant at the P < 0.05 level (P (1) = 0.0401).

Similar results were found for the caused-motion construction, where usage of the verb “put” was investigated. Out of the 11 instances of the caused-motion construction, six sentences featured “put.” Other verbs that appeared in the caused-motion construction were “see,” which appeared twice, and “take,” “go,” and “give,” each of which appeared only once. By comparison, only two of the 36 ditransitive sentences had “put,” and “put” did not appear in any of the 33 intransitive motion cases. This pattern was also found to be statistically significant. A chi-squared test found the co-occurrence of “put” with the caused-motion construction to be significant at the P < 0.0001 level (P (1) < 0.0001).
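As an illustration of the kind of test described above, the following Python sketch computes a Pearson chi-squared statistic for a 2x2 table with one degree of freedom. The text does not specify how the contingency tables were constructed, so the layout used here (target verb versus other verbs, target construction versus the other two analyzed constructions) is an assumption, as is the absence of a continuity correction. The output therefore need not reproduce the P values reported above.

```python
import math


def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared test for the 2x2 table [[a, b], [c, d]].

    Returns (statistic, p_value) with one degree of freedom and no
    continuity correction.
    """
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    statistic = numerator / denominator
    # For one degree of freedom, the upper tail of the chi-squared
    # distribution equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


if __name__ == "__main__":
    # Assumed layout, using counts from the text:
    # rows = caused-motion vs. ditransitive + intransitive motion,
    # columns = sentences with "put" vs. sentences with another verb.
    put_stat, put_p = chi_squared_2x2(6, 5, 2, 67)
    print(f"put: chi2 = {put_stat:.3f}, p = {put_p:.2e}")

    # Same assumed layout for "go" and the intransitive motion construction.
    go_stat, go_p = chi_squared_2x2(10, 23, 4, 43)
    print(f"go: chi2 = {go_stat:.3f}, p = {go_p:.4f}")
```

With expected cell counts this small, an exact test or a continuity correction might be preferable; the sketch only shows the basic calculation.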

For the ditransitive construction, which historically has never been explicitly linked to a specific verb from which it generalizes, the data demonstrated no particular verb affinity (Goldberg, Casenhiser, and Sethuraman, 2004). Ten different verbs (and 12 different frames) appeared in the 36 instances of the construction, indicating more variety than in the other constructions studied. No verb co-occurred with this construction at a statistically significant rate. The most frequent of these verbs was “get,” which had 10 instances (P (1) = 0.2602). Figure 3 shows the range in instances of each verb in this construction. Due to the low frequency of the conative construction, there were not enough instances to conduct a full analysis of this argument structure construction. Overall, the significant results of the research came from the pronoun analysis and from the construction analysis, which showed that two argument structure constructions correlated with their expected verb generalizations: the intransitive motion construction with “go,” and the caused-motion construction with “put.”

Figure 3.


The results of this study highlight two main findings: frequent pronoun use in child-directed speech, and verb-specific skew in two argument structure constructions. High rates of pronoun use are consistent with work by Tomasello and Dodson (1998), who have theorized about the importance of pronouns in child language acquisition due to their symbolic properties (Tomasello, 2009). Pronouns’ inherent reliance on joint attention demonstrates their usefulness to children, and is a main reason they appear so frequently (75 percent of the time according to Tomasello, and 68.1 percent of the time in the current study) in child-directed speech. The high rate of pronouns as subjects, as hypothesized by Wells, is also notable. In the current study, of the 109 instances of pronoun use in child-directed speech sentences, 104 featured a pronominal subject (95.4 percent).

In terms of the construction analysis, the data also supported the hypothesis, demonstrating significant co-occurrences between “go” and the intransitive motion construction, and between “put” and the caused-motion construction. These co-occurrences provide further evidence for the claims that: a) argument structure constructions do, in fact, carry semantic information; b) the semantic information associated with these particular constructions becomes associated through the heightened frequency of particular verbs in child-directed speech; and c) children can then utilize this information to acquire not only individual structures, but also syntactic forms in general. These results are consistent with the learning strategies hypothesized in usage-based theories of language.

The current research also aims to facilitate future research through the accumulation of a novel annotated corpus of child-directed speech. This additional analysis of the CHILDES database (including FrameNet annotation) has created opportunities for different types of research. For example, one current study at Northwestern University utilizes the annotations from my research to investigate the role of analogical generalization in construction learning using the SAGE computational model (McFate, Klein, and Forbus, in press).

The current study was limited in a number of ways. First, research was based on particular resources, like the CHILDES corpus and FrameNet database. While these resources are generally beneficial and thorough, the analysis was subject to their errors, such as incorrect frames for certain words or mislabeled tags. Additionally, given the manual nature of the annotation and analysis methods, and restricted time, the amount of data collected was limited to two children’s corpora. Though the data were sufficient to yield statistically significant results for two constructions, analysis of the conative construction (a lower frequency structure) was not possible. Had more data been collected and analyzed for the study, the conative construction may have been found frequently enough to investigate whether it generalized from a certain verb.

With a broader set of data, potentially aided by automated methods of collection, a richer analysis could investigate other constructions, such as the conative or way-constructions. Additionally, research examining other types of speech could observe constructions other than those featured in the current study, such as the resultative, which does not appear as often in child-directed speech.


Jamie Klein graduated from Northwestern University in June of 2017 with honors. After studying cognitive science, film, and marketing communications in his four years at Northwestern, he chose to conduct research and pursue an honors thesis in the areas of linguistics and child development. Since graduating, Jamie has moved to New York City, where he plans on applying his academic interests in consumer behavior and psychology by working in the technology industry.


Alishahi, Afra, and Suzanne Stevenson. "A Computational Model of Early Argument Structure Acquisition." Cognitive Science: A Multidisciplinary Journal 32, no. 5 (2008): 789-834. doi:10.1080/03640210801929287.

Barak, Libby, Afsaneh Fazly, and Suzanne Stevenson. "Modeling the Emergence of an Exemplar Verb in Construction Learning." Proceedings of the Annual Meeting of the Cognitive Science Society 35, no. 35 (January 1, 2013): 1815-820.

Bolinger, Dwight. "Entailment and the Meaning of Structures." Glossa 2, no. 2 (1968): 119-127.

Childers, Jane B., and Michael Tomasello. "The Role of Pronouns in Young Children’s Acquisition of the English Transitive Construction." Developmental Psychology 37, no. 6 (2001): 739-48. doi:10.1037//0012-1649.37.6.739.

Chomsky, Noam. Syntactic Structures. The Hague: Mouton, 1957.

Chomsky, Noam. Language and Mind. New York: Harcourt Brace Jovanovich, 1968.

Chomsky, Noam. Lectures on Government and Binding. Dordrecht: Foris, 1981.

Connor, Michael, Yael Gertner, Cynthia Fisher, and Dan Roth. "Baby SRL: Modeling Early Language Acquisition." Proceedings of the Twelfth Conference on Computational Natural Language Learning, 2008, 81-88. doi:10.3115/1596324.1596339.

Fillmore, Charles J., Paul Kay, and Mary Catherine O’Connor. "Regularity and Idiomaticity in Grammatical Constructions: The Case of Let Alone." Language 64, no. 3 (1988): 501-38. doi:10.2307/414531.

Goldberg, Adele E. Constructions: A Construction Grammar Approach to Argument Structure. Chicago: University of Chicago Press, 1995.

Goldberg, Adele E. "Constructions: A New Theoretical Approach to Language." Trends in Cognitive Sciences 7, no. 5 (2003): 219-24. doi:10.1016/s1364-6613(03)00080-9.

Goldberg, Adele E., Devin M. Casenhiser, and Nitya Sethuraman. "Learning Argument Structure Generalizations." Cognitive Linguistics 15, no. 3 (2004): 289-316. doi:10.1515/cogl.2004.011.

McFate, Clifton. "Analogical Generalization of Linguistic Constructions." Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence (AAAI-16): 4309-310.

MacWhinney, Brian. The CHILDES Project: Tools for Analyzing Talk. Mahwah, NJ: Lawrence Erlbaum Associates, 2000.

MacWhinney, Brian. "The CHILDES System." American Journal of Speech-Language Pathology 5, no. 1 (February 1996): 5-14.

Ninio, Anat. "Pathbreaking Verbs in Syntactic Development and the Question of Prototypical Transitivity." Journal of Child Language 26, no. 3 (1999): 619-53. doi:10.1017/s0305000999003931.

Ruppenhofer, Josef, Michael Ellsworth, Miriam R. L. Petruck, Christopher R. Johnson, and Jan Scheffczyk. FrameNet II: Extended Theory and Practice. Berkeley, CA: University of California, 2010.

Searle, John R. Intentionality: An Essay in the Philosophy of Mind. Cambridge: Cambridge University Press, 1983.

Tomasello, Michael. Constructing a Language: A Usage-Based Theory of Language Acquisition. Cambridge, MA: Harvard University Press, 2003.

Williams, Alexander. Arguments in Syntax and Semantics. Cambridge: Cambridge University Press, 2015.