Stance detection in conversational L1 and L2 English
Chenglinzi Yang, Jet Hoek, Wilbert Spooren
Centre for Language Studies, Radboud University
Exploring stance in spoken interaction reveals the interplay between interlocutors’ viewpoints, judgments, and sociocultural dynamics. Although stance has been widely studied in native-English-dominated forms of public discourse, such as news and social media, little is known about how it varies among non-native English speakers with diverse sociolinguistic backgrounds. Existing studies have highlighted linguistically and culturally mediated differences in how opinions are expressed, which makes it important to extend stance research to the speech of different second language (L2) speakers of English. Recent NLP tools have proven effective at extracting stance features on a large scale but have rarely been applied in L2 contexts. This ongoing study therefore introduces a stance detection (SD) framework for automatically classifying stances in conversations among different L2 English speakers in comparison with their L1 English counterparts. By systematically comparing how speakers from various sociocultural backgrounds express their opinions, our investigation helps untangle stance-taking in dynamic interactions as a sociocultural construct. Given the absence of professionally annotated L2 spoken English corpora for clause-level SD, we develop novel classifiers for automatic SD on transcribed L2 speech, trained on our own manual annotations, which follow systematic labelling guidelines.
Our research draws upon five linguistic backgrounds: L1 English, represented in the Louvain Corpus of Native English Conversation (LOCNEC), and four L2 groups (Chinese, Dutch, French, and Greek) from the Louvain International Database of Spoken English Interlanguage (LINDSEI). The two corpora are comparable because every group completes the same three speaking tasks. While most research focuses on sentence-level or document-level stance identification, we approach stance at the more fine-grained clausal level to capture minimal stance units in dynamic conversational interactions. To automate the analysis, we fine-tune the pre-trained model RoBERTa on expert-annotated stance labels specific to our corpus. Although RoBERTa performs well in subjectivity detection by capturing contextualised word representations of L1 English, its application to SD, particularly in L2 English research, remains underexplored. The data to be annotated for fine-tuning constitute 10% of the total dataset, that is, 5 interviews per speaker group and 25 interviews in total. The annotation proceeds in two steps, capturing stance and polarity within elementary discourse units (EDUs). The first step distinguishes subjective fragments (e.g., judgments, beliefs, conclusions, or intentional acts) from objective ones. The second step identifies, for each subjective unit, the speaker’s stance polarity towards a given proposition as positive, negative, or neutral.
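As an illustration only, the two-step scheme described above could be represented in Python roughly as follows. This is a minimal sketch, not the project's actual annotation tooling: the class names, the `flat_label` helper, and the example utterances are all hypothetical, and collapsing the two steps into a single four-way label is just one possible way to prepare targets for a classifier.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Subjectivity(Enum):
    """Step 1: is the EDU subjective or objective?"""
    SUBJECTIVE = "subjective"
    OBJECTIVE = "objective"


class Polarity(Enum):
    """Step 2: polarity, assigned to subjective EDUs only."""
    POSITIVE = "positive"
    NEGATIVE = "negative"
    NEUTRAL = "neutral"


@dataclass(frozen=True)
class AnnotatedEDU:
    """One elementary discourse unit with its two-step annotation (hypothetical schema)."""
    text: str
    subjectivity: Subjectivity
    polarity: Optional[Polarity] = None

    def __post_init__(self) -> None:
        # Enforce the dependency between the two steps:
        # polarity is only defined for subjective units.
        if self.subjectivity is Subjectivity.OBJECTIVE and self.polarity is not None:
            raise ValueError("objective EDUs carry no polarity label")
        if self.subjectivity is Subjectivity.SUBJECTIVE and self.polarity is None:
            raise ValueError("subjective EDUs need a polarity label")

    def flat_label(self) -> str:
        """Collapse both steps into one label (an assumed design choice)."""
        if self.subjectivity is Subjectivity.OBJECTIVE:
            return "objective"
        return f"subjective-{self.polarity.value}"


# Invented example utterances, not taken from LOCNEC or LINDSEI.
edus = [
    AnnotatedEDU("I went to London last year", Subjectivity.OBJECTIVE),
    AnnotatedEDU("and I think it was wonderful", Subjectivity.SUBJECTIVE, Polarity.POSITIVE),
]
labels = [edu.flat_label() for edu in edus]
```

Keeping the two steps as separate fields would also allow a cascaded setup, in which one classifier handles subjectivity and a second handles polarity on the subjective units alone.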
Overall, we aim to introduce a novel SD pipeline tailored to L1 and L2 conversational speech by integrating expert annotation guidelines and advanced NLP techniques. We anticipate varied levels of subjectivity and polarisation in the expression of opinions across speaker groups, which would provide evidence for the interplay between sociocultural factors and linguistic expression. The study not only contributes to the theoretical understanding of stance annotation but also advances methods for automatically analysing stance in conversational speech. This opens a new avenue for detecting layers of stance and for further investigating variation in stance-taking, particularly in the under-researched domain of L2 speech.