Cross-genre and cross-topic authorship recognition on Dutch human- and LLM-generated essays

Hans van Halteren

Radboud University

Twenty years ago, we published results for authorship recognition on a small corpus of Dutch texts (van Halteren 2004; van Halteren, Baayen, Tweedie, Haverkort, Neijt 2005). This ABC-NL1 corpus contains 72 texts with lengths on the order of 1,000 words, with eight students writing nine prompted texts of varying genres and topics: argumentative non-fiction on Big Brother, the unification of Europe, and smoking; descriptive non-fiction on soccer, the millennium change, and the most recently read book; and fiction on Little Red Riding Hood, a murder at the university, and a chivalry romance. At that time, several previously accepted attribution methods proved wanting, while our own recognition system appeared to be a good step forward, although still not perfect. Now, twenty years later, that system has evolved (van Halteren 2022) and it is time to check how much we have really progressed.

But we want to take this venture one step further. The field is – obviously – currently very interested in LLM-generated texts. For this reason, we extended the corpus with four more “authors”. All four are instantiations of GPT-4o as present in the public version of ChatGPT in May 2024. To have the new texts fit into ABC-NL1, the prompts specify that the authors are 20-year-old students at Radboud University in 1999. Beyond that, the prompts differ as to gender (male, female) and curriculum (Dutch, Physics). Initially, we also intended to include text from Bing Copilot and Google Gemini. However, we did not manage to make these systems produce texts of the desired length.

We will subject all 108 (= 12 × 9) texts to authorship recognition and compare the recognition quality to that in 2004. We will also attempt to identify distinguishing features. In addition, we give special attention to the differences between human-authored and LLM-generated texts. We will use a broad spectrum of features, consisting on the one hand of more or less standard authorship features, such as character and word n-grams, syntactic structures, and vocabulary richness (van Halteren 2022), and on the other hand of information content and dispersal features used in last year’s CLIN Shared Task (Fivez et al. 2024).
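As a rough illustration of one of the standard feature families mentioned above, the relative frequencies of a text’s most common character n-grams can be sketched as follows. This is a minimal toy example for exposition only, not the actual linguistic-profiling implementation; the function name and parameters are illustrative assumptions.

```python
from collections import Counter

def char_ngram_profile(text, n=3, top_k=5):
    """Relative frequencies of the top_k most frequent character n-grams.

    A simplified sketch of one feature family used in authorship
    recognition; a real system combines many more feature types.
    """
    text = text.lower()
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    counts = Counter(grams)
    total = sum(counts.values())
    return {g: c / total for g, c in counts.most_common(top_k)}

# Toy Dutch sentence: trigram "de " occurs twice among 19 trigrams.
profile = char_ngram_profile("De kat zat op de mat.", n=3, top_k=3)
```

In an attribution setting, such profiles would be computed per text and compared across candidate authors, e.g. by a distance measure over the frequency vectors.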

References
Fivez, P., Daelemans, W., Van de Cruys, T., Kashnitsky, Y., Chamezopoulos, S., Mohammadi, H., ... & van Halteren, H. (2024). The CLIN33 Shared Task on the Detection of Text Generated by Large Language Models. Computational Linguistics in the Netherlands Journal, 13, 233-259.
Van Halteren, H. (2004). Linguistic Profiling for Authorship Recognition and Verification. In Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL-04), pages 199–206, Barcelona, Spain.
Van Halteren, H. (2022). Automatic Authorship Investigation. In Guillén-Nieto, V., & Stein, D. (Eds.), Language as evidence: Doing forensic linguistics. Springer Nature.
Van Halteren, H., Baayen, H., Tweedie, F., Haverkort, M., & Neijt, A. (2005). New machine learning methods demonstrate the existence of a human stylome. Journal of Quantitative Linguistics, 12(1), 65-77.
© 2024 CLIN 34 Organisers. All rights reserved.