Enhancing Human-Computer Interaction: Leveraging Miscommunication Detection in Chatbot Dialogues for Conversation Breakdown Prevention

Gabriella Bollici

TNO

Artificial intelligence-based chatbots are now widely deployed in both the public and private sectors to provide personalized assistance and answer customer inquiries, creating a growing demand for continuous refinement of these systems so that they better align with organizational and user requirements. Generating conversation remains challenging, however, and frequently produces situations in which the agent's response is one the user cannot adequately respond to, or one that creates friction between the user and the agent. These system failures, known as breakdowns, arise when the system fails to correctly understand the intended meaning of the user's communication, and they may cause users to abandon the conversation. Identifying and resolving breakdowns is therefore considered a key element of chatbot development. Dialogue breakdown analysis, a subfield of Natural Language Processing, focuses on identifying the points in a conversation where users struggle to continue. While previous projects have achieved notable results in detecting and classifying breakdowns, this study focuses on improving explainability in this domain by determining and classifying their causes, known as miscommunications.

This investigation builds on the Dialogue Breakdown Detection Challenge, a workshop aimed at bringing together methods for dialogue breakdown detection and discussing potential evaluation metrics. It specifically targets the detection of miscommunications, defined as situations in which a dialogue system gives the user an inappropriate reply. For this purpose, we incorporate the ABC-Eval dataset, which provides valuable labels and information regarding miscommunications, enriching the training process and enhancing the model's ability to accurately identify breakdown-inducing utterances. In this way, beyond determining whether a breakdown has occurred, we go deeper by identifying the type of dialogue behavior, or miscommunication, that caused it, contributing to a better understanding of breakdown causes. A further objective of the project is to support the EU-FarmBook Horizon Europe project, which aims to create a digital platform for sharing agricultural and forestry knowledge. Given the limited presence of chatbots in the agricultural sector, this research focuses on enhancing chatbot systems to improve human-computer interaction and to ensure easy access to information for all potential users, including a diverse range of farmers, foresters, and advisors with varying social, cultural, and technical backgrounds.

The project will comprise three NLP approaches, each corresponding to a different classification model used to identify dialogue breakdowns: LSTM, BERT, and Large Language Models (LLMs). These models will be compared on their classification performance in detecting instances of dialogue breakdowns. Additionally, the LLMs will be used to classify various dialogue behaviors, with the goal of improving the interpretability of the system and assessing how useful categorizing miscommunications is for understanding the causes of dialogue breakdowns. Nine dialogue behaviors contained in the ABC-Eval dataset will be classified: empathetic responses, lack of empathy, common-sense understanding, contradiction, incorrect factual information, self-contradiction, contradiction of the partner, redundancy, and instances of ignoring the user or providing irrelevant information. For evaluation, we selected classification metrics such as accuracy, precision, recall, and F1 score, as well as distribution-based metrics such as Jensen-Shannon Divergence and Mean Squared Error, as sketched below.
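As a minimal illustration of the distribution-based metrics, the Python sketch below computes Jensen-Shannon Divergence and Mean Squared Error between a gold breakdown-label distribution and a model's predicted distribution, in the style of the per-utterance evaluation used in the Dialogue Breakdown Detection Challenge. The three-label scheme (not a breakdown, possible breakdown, breakdown) and the example values are illustrative assumptions, not results from this study.

```python
import numpy as np

def js_divergence(p, q):
    """Jensen-Shannon Divergence (base 2) between two discrete distributions."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    m = 0.5 * (p + q)
    def kl(a, b):
        mask = a > 0  # 0 * log(0 / x) is taken to be 0
        return np.sum(a[mask] * np.log2(a[mask] / b[mask]))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def mean_squared_error(p, q):
    """MSE between predicted and gold label probabilities."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.mean((p - q) ** 2))

# Hypothetical example: gold distribution (e.g. derived from annotator votes)
# versus a model's output, over (not a breakdown, possible breakdown, breakdown).
gold = [0.2, 0.3, 0.5]
pred = [0.1, 0.2, 0.7]
print(f"JSD: {js_divergence(gold, pred):.4f}  MSE: {mean_squared_error(gold, pred):.4f}")
```

In base 2 the Jensen-Shannon Divergence is bounded between 0 and 1, which makes scores comparable across utterances and models.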