MODOMA: Providing a Computational Laboratory for Language Acquisition Experiments

David Shakouri

Leiden University Centre for Linguistics (LUCL), Leiden University; Leiden Institute for Brain and Cognition (LIBC), Leiden University

This presentation discusses experiments performed using MODOMA, a multi-agent computational laboratory environment for language acquisition experiments. The name is an acronym for Moeder-Dochter-Machine (Dutch for ‘Mother-Daughter-Machine’). The system is based on the interaction between two agents: (1) a mother language model based on Delilah (cf. Cremers, Hijzelendoorn, & Reckman, 2014), the Leiden parser and generator of Dutch, and (2) a daughter language model, which sets out to learn the mother language. Both agents model many aspects of grammar, including syntactic and semantic representations. Crucially, the daughter agent performs unsupervised learning: The daughter does not have access to the internal linguistic knowledge of the mother agent but only to the language exemplars the mother produces during the conversation. To this end, the daughter language model employs a hybrid approach combining statistical and rule-based techniques to acquire the target language, which is (a fragment of) Dutch. As soon as the daughter agent has acquired new grammatical knowledge, she uses it to take part in the conversation with the mother. The presented experiments illustrate how the MODOMA can be used to acquire explicit abstract grammatical knowledge and substantiate that the MODOMA project has resulted in a viable and functional tool for language acquisition simulations.

The properties of the MODOMA provide novel possibilities for modelling language acquisition from a natural language processing perspective. Moreover, both the mother and daughter agents are knowledge-based language models: All aspects of the system are parametrized and can straightforwardly be consulted by researchers employing the system. Most computational models of language acquisition are based on corpus data such as the CHILDES database (MacWhinney, 2014). By contrast, the MODOMA is a multi-agent conversational framework in which two language models interact: One agent provides samples of the target language while the other acquires it. This makes it possible to address new research questions through computational language acquisition experiments. Crucially, this design allows the adult language model to give feedback to the daughter.

In particular, the presented experiments illustrate that the MODOMA provides an additional tool for performing language acquisition experiments from a cognitive perspective, complementing experiments with human children on the one hand and adult subjects on the other. For example, several experiments demonstrated that, by analyzing training and test data generated during interactions, the daughter agent successfully acquires discrete grammatical categories such as content and function words, as well as nouns, adjectives and verbs, which are subsequently added to her grammatical knowledge. As part of the hybrid approach, the exemplars produced by the mother are classified. It is argued that this procedure can be used to acquire structures that are similar to grammatical categories proposed by linguists for natural languages. Thus, it is established that non-trivial grammatical knowledge has been acquired.

Cremers, C. L. J. M., Hijzelendoorn, P. M., & Reckman, H. G. B. (2014). Meaning versus grammar. Leiden: Leiden University Press.
MacWhinney, B. (2014). The CHILDES project: Tools for analyzing talk (3rd ed.). New York, NY: Routledge.