Addressing LLM-related Measurement Error in Social Science Modeling Research
Qixiang Fang, Javier Garcia Bernardo, Erik-Jan van Kesteren
Utrecht University
The advent of powerful large language models (LLMs) has revolutionised the extraction of information about social science constructs, such as personality traits, political attitudes, and values, from textual and tabular data. While NLP researchers typically focus on predicting scores for these constructs, social scientists aim to use these predictions to understand underlying mechanisms and draw inferences about populations. Traditionally, social science constructs are measured using survey questions and scales, which allow measurement quality to be assessed and corrected for in terms of reliability and validity. However, it remains unclear whether these measurement quality assessments apply to LLM predictions, and how measurement error in such predictions affects subsequent social science modelling.
This study has two primary objectives. First, we review the existing literature to identify practices for addressing LLM-related measurement error, particularly within the social science context. Second, we synthesise these findings with the existing measurement modelling literature to propose a comprehensive framework for drawing robust inferences from LLM-based measurements in the social sciences. By bridging the gap between LLM prediction capabilities and social science inference requirements, our framework aims to enhance the reliability and validity of social science research outcomes in the era of LLMs.