Compositionality in Emergent Languages: Evaluating the Metrics

Urtė Jakubauskaitė

Institute for Logic, Language and Computation, University of Amsterdam

Raquel G. Alhama

Institute for Logic, Language and Computation, University of Amsterdam

Phong Le

Amazon Alexa

Compositionality is a key characteristic of natural languages: the meaning of a complex linguistic unit is derived from the meanings of its components. An open question is whether we can simulate the emergence of a language that exhibits some degree of compositionality. To address this question, studies in language emergence have used referential games, in which agents cooperate on a task and share a common reward based on their joint performance. Although this approach offers flexibility, it also introduces a challenge: humans cannot readily interpret the messages that the agents exchange. How, then, can we determine whether these emergent languages are genuinely compositional?

Prior work has employed a range of metrics to quantify compositionality, most commonly topographic similarity (Brighton & Kirby, 2006), positional disentanglement, and bag-of-symbols disentanglement (Chaabouni et al., 2020). However, because these metrics are applied to uninterpretable languages, we do not know to what extent they are actually sensitive to the compositional aspects of the messages, or whether they also capture other features. In other words, the metrics themselves have not been empirically evaluated.
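As a rough illustration of the first of these metrics, topographic similarity is typically computed as the Spearman correlation between pairwise distances in meaning space and pairwise distances in message space. The sketch below assumes that meanings are tuples of attribute values compared with Hamming distance and that messages are strings compared with Levenshtein (edit) distance; these distance choices, the toy game, and all function names are illustrative assumptions, not necessarily the exact setup of the cited work.

```python
from itertools import combinations, product
from math import sqrt


def hamming(u, v):
    """Number of attribute positions on which two meanings differ."""
    return sum(a != b for a, b in zip(u, v))


def levenshtein(s, t):
    """Edit distance (insertions, deletions, substitutions) between two messages."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            curr.append(min(prev[j] + 1,                # delete cs
                            curr[j - 1] + 1,            # insert ct
                            prev[j - 1] + (cs != ct)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def average_ranks(xs):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation = Pearson correlation of the average ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined when all distances are equal")
    return cov / sqrt(vx * vy)


def topographic_similarity(meanings, messages):
    """Correlate pairwise meaning distances with pairwise message distances."""
    pairs = list(combinations(range(len(meanings)), 2))
    meaning_d = [hamming(meanings[i], meanings[j]) for i, j in pairs]
    message_d = [levenshtein(messages[i], messages[j]) for i, j in pairs]
    return spearman(meaning_d, message_d)


# Hypothetical toy game: 2 attributes with 3 values each.
meanings = list(product(range(3), repeat=2))
# Hypothetical compositional language: one position-specific symbol per attribute.
compositional = ["abc"[a] + "xyz"[b] for a, b in meanings]
print(topographic_similarity(meanings, compositional))  # close to 1.0
```

In this toy language every attribute value has its own symbol at a fixed position, so the edit distance between two messages equals the Hamming distance between their meanings, and the correlation is 1 up to floating-point rounding.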

To address this issue, we present a novel dataset consisting of a set of referential games paired with a set of grammars that generate languages describing the items in those games. Crucially, the grammars range from those that generate messages with minimal compositionality to those that are highly compositional. By applying compositionality metrics to our dataset, we can determine whether the metrics differentiate between grammars with varying levels of compositionality. Additionally, our dataset includes grammars that make use of other linguistic factors, such as case marking and polysemy, which have not received attention before. With this novel approach, our ongoing work will reveal the relative sensitivity and robustness of existing metrics, and will pave the way for the design and evaluation of new metrics that capture specific aspects of emergent languages.
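To make the evaluation idea concrete, here is a minimal sketch of positional disentanglement (Chaabouni et al., 2020) applied to two grammars for the same toy game. For each message position, it computes the mutual information between the symbol at that position and each attribute, takes the gap between the best and second-best attribute, normalizes it by the symbol's entropy, and averages over positions. The two grammars are hypothetical illustrations, not grammars from the dataset described here; the code assumes fixed-length messages and at least two attributes, and skipping constant (zero-entropy) positions is an assumption about the normalization.

```python
from collections import Counter
from itertools import product
from math import log


def entropy(xs):
    """Empirical entropy (nats) of a sequence of discrete values."""
    n = len(xs)
    return sum(c / n * log(n / c) for c in Counter(xs).values())


def mutual_information(xs, ys):
    """Empirical mutual information (nats) between two aligned sequences."""
    n = len(xs)
    cx, cy, cxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(c / n * log(c * n / (cx[x] * cy[y])) for (x, y), c in cxy.items())


def positional_disentanglement(meanings, messages):
    """Mean over positions of (best MI - second-best MI) / symbol entropy."""
    n_attrs = len(meanings[0])
    gaps = []
    for j in range(len(messages[0])):
        symbols = [m[j] for m in messages]
        h = entropy(symbols)
        if h == 0:  # constant position carries no information: skipped (assumption)
            continue
        mis = sorted((mutual_information([mn[i] for mn in meanings], symbols)
                      for i in range(n_attrs)), reverse=True)
        gaps.append((mis[0] - mis[1]) / h)
    return sum(gaps) / len(gaps) if gaps else 0.0


# Hypothetical toy game: 2 attributes with 3 values each.
meanings = list(product(range(3), repeat=2))
# Hypothetical grammar 1: each position encodes exactly one attribute.
positional = ["abc"[a] + "xyz"[b] for a, b in meanings]
# Hypothetical grammar 2: each symbol mixes both attributes (still one message per item).
entangled = ["abc"[(a + b) % 3] + "xyz"[(a - b) % 3] for a, b in meanings]
print(positional_disentanglement(meanings, positional))  # 1.0
print(positional_disentanglement(meanings, entangled))   # 0.0
```

Both hypothetical grammars assign a unique message to every item, yet the metric separates them: in the entangled grammar each symbol is statistically independent of each individual attribute, so the mutual-information gap at every position is zero.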

Brighton, Henry & Simon Kirby. 2006. Understanding Linguistic Evolution by Visualizing the Emergence of Topographic Mappings. Artificial Life 12(2). 229–242.

Chaabouni, Rahma, Eugene Kharitonov, Diane Bouchacourt, Emmanuel Dupoux & Marco Baroni. 2020. Compositionality and Generalization in Emergent Languages. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 4427–4442. Online: Association for Computational Linguistics.