Evaluating LLM-Generated Topic Names via Text Reconstruction

Andriy Kosar

Textgain; CLiPS, University of Antwerp

Guy De Pauw

Textgain

Walter Daelemans

CLiPS, University of Antwerp

Automatically generating topic names for texts using large language models (LLMs) has emerged as a promising new approach to topic detection. However, evaluating the quality of these LLM-generated topic names by comparing them to human-named references poses a challenge: human-assigned topic names vary with annotators' backgrounds and preferences, and obtaining human-annotated text is costly. To address these challenges, we propose mimicking, through an automated system, the inherent human ability to interpret a topic and reconstruct information from it based on background knowledge.
We introduce a novel, model-agnostic evaluation method that uses LLMs to reconstruct the original text from the generated topic name and then compares the reconstructed text to the original. To measure the differences between the original and reconstructed texts, we evaluate the applicability of metrics such as BLEU, BERTScore, and cosine similarity. This approach favors topic names that preserve the most important information while counteracting hallucination. In addition to this reconstruction-based metric, we evaluate other commonly used methods for assessing topic quality, such as perplexity and coherence, on generated topic names. Results show that the reconstruction metric provides complementary insights beyond traditional topic quality metrics by directly measuring information preservation and relevance to the text.
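As an illustration, the following Python sketch implements one possible variant of this reconstruction pipeline. The prompt wording, the llm() stub, and the choice of sentence-transformers embedding model are illustrative assumptions, not the exact setup used in this work.

from sentence_transformers import SentenceTransformer, util

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed encoder; any sentence embedder works

def llm(prompt: str) -> str:
    """Stub: plug in the LLM of your choice (API-based or local) here."""
    raise NotImplementedError

def reconstruction_score(original_text: str, topic_name: str) -> float:
    # Step 1: ask the LLM to reconstruct a plausible text from the topic name alone.
    reconstructed = llm(
        f"Write a short text that a document about the topic "
        f"'{topic_name}' would plausibly contain."
    )
    # Step 2: compare reconstruction and original in embedding space.
    # Higher cosine similarity indicates the topic name preserved more of the
    # original's information; hallucinated or overly generic names score low.
    emb = embedder.encode([original_text, reconstructed], convert_to_tensor=True)
    return util.cos_sim(emb[0], emb[1]).item()

Cosine similarity over embeddings is shown here for brevity; the same scoring function could instead compute BLEU or BERTScore between the two texts.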
This work highlights the importance of evaluating generated topic names from multiple angles, going beyond readability to prioritize information completeness and mitigate hallucination. The proposed model-agnostic approach offers a robust automatic evaluation technique for LLM-generated topic names. Additionally, we demonstrate that this reconstruction-based evaluation method extends to other forms of LLM-generated semantic compression, including summaries, headlines, and keywords.
© 2024 CLIN 34 Organisers. All rights reserved.