What's Mine becomes Yours: Defining, Annotating and Detecting Context-Dependent Paraphrases in News Interview Dialogs

Anna Wegmann

Utrecht University

Tijs van den Broek

Vrije Universiteit Amsterdam

Dong Nguyen

Utrecht University

Repeating or paraphrasing what the previous speaker said has time and again been found to be important in human-to-human and human-to-computer dialogs: it encourages elaboration and introspection in counseling, can help de-escalate conflicts in crisis negotiations, can have a positive impact on relationships, can increase the perceived response quality of dialog systems, and generally provides tangible understanding checks that ground what both speakers agree on.
In NLP, paraphrases have received widespread attention: researchers have created numerous paraphrase datasets, developed methods to automatically identify paraphrases, used paraphrase datasets to train semantic sentence representations, and used them to benchmark modern LLMs. However, most previous work (1) has focused on context-independent paraphrases, i.e., those that are semantically equivalent independently of the given context, and has not investigated paraphrases in dialog specifically, (2) has classified paraphrases at the level of full texts, although paraphrases often occur only in portions of larger paragraphs, (3) uses only one to three annotations per paraphrase pair, and (4) annotates only pairs of texts that are “likely” to include paraphrases, using heuristics like lexical similarity, even though we cannot expect lexical similarity to be high for all or even most paraphrases.
We address all four limitations in this work. First, we are the first to focus on operationalizing, annotating, and automatically detecting context-dependent paraphrases in dialog. Second, instead of only classifying whether a text paraphrases (part of) a previous text, we identify the text spans that constitute the paraphrase pair. Third, we collect a larger number of annotations, up to 20 per item, in line with typical efforts to address plausible human label variation. Even though context-dependent paraphrase identification in dialog might at first seem straightforward, with a clear ground truth, human annotators (plausibly) disagree on labels, similar to other “objective” tasks in NLP. For example, consider the following interaction: Speaker 1: “The money will help.” Speaker 2: “It can't hurt.” “[The money] can't hurt” can be interpreted in at least two different ways: as a statement with approximately the same meaning as “the money will help”, or as an opposing statement meaning the money will not actually help, but at least it “can't hurt” either. Fourth, instead of using heuristics to select text pairs for annotation, we choose a setting with a higher-than-average expected occurrence of paraphrases: transcripts of NPR and CNN news interviews. In (news) interviews, paraphrasing, or more generally active listening, is encouraged.
In short, we operationalize context-dependent paraphrases in dialog with a definition and an iteratively developed hands-on training for annotators. Then, annotators classify paraphrases and identify the spans of text that constitute the paraphrase. We release a dataset with 5581 annotations on 600 utterance pairs from NPR and CNN news interviews. We use in-context learning on generative models like Llama 2 or GPT-4 and fine-tune a DeBERTa token classifier to detect paraphrases in dialog. In-context approaches perform better at classification, while the token classifier tends to provide better text spans. We hope to contribute to the reliable detection of paraphrases in dialog.
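The token-classification setup described above can be sketched as follows. This is a minimal illustration, not the authors' actual implementation: a DeBERTa-style token classifier predicts, per token of the guest's reply, whether that token belongs to a paraphrase of the host's utterance, and contiguous positive predictions are then merged into text spans. The function name and the 0/1 label scheme are assumptions for this sketch.

```python
def labels_to_spans(tokens, labels):
    """Merge contiguous positive token labels into (start, end) token spans.

    tokens: list of token strings from the reply utterance
    labels: list of 0/1 predictions, one per token (1 = part of a paraphrase)
    Returns a list of (start, end) pairs with end exclusive.
    """
    spans, start = [], None
    for i, lab in enumerate(labels):
        if lab == 1 and start is None:
            start = i                  # a new span begins here
        elif lab == 0 and start is not None:
            spans.append((start, i))   # span ended at the previous token
            start = None
    if start is not None:              # span runs to the end of the utterance
        spans.append((start, len(labels)))
    return spans

# Example from the abstract: host says "The money will help.", and the
# classifier (hypothetically) marks "It ca n't hurt" as the paraphrase span.
tokens = ["It", "ca", "n't", "hurt", "."]
labels = [1, 1, 1, 1, 0]
print(labels_to_spans(tokens, labels))  # [(0, 4)]
```

An in-context-learning classifier, by contrast, would output a single yes/no label (and optionally quoted spans) from a prompt containing both utterances, which is why the two approaches trade off classification accuracy against span quality.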
© 2024 CLIN 34 Organisers. All rights reserved.