Measure only what is measurable: Towards conversation requirements for evaluating task-oriented dialogue systems

Detailed bibliography
Title:	Measure only what is measurable: Towards conversation requirements for evaluating task-oriented dialogue systems
Authors:	Van Miltenburg, Emiel; Braggaar, Anouck; Croes, Emmelyn; Kunneman, Florian; Liebrecht, Christine; Martijn, Gabriella
Publisher information:	Association for Computational Linguistics (ACL), 2025.
Publication year:	2025
Subjects:	evaluation metrics, prompt engineering, model assessment, benchmarking, natural language generation
Description:	Chatbots for customer service have been widely studied in many different fields, ranging from Natural Language Processing (NLP) to Communication Science. These fields have developed different evaluation practices to assess chatbot performance (e.g., fluency, task success) and to measure the impact of chatbot usage on the user's perception of the organisation controlling the chatbot (e.g., brand attitude) as well as their willingness to enter a business transaction or to continue to use the chatbot in the future (i.e., purchase intention, reuse intention). While NLP researchers have developed many automatic measures of success, other fields mainly use questionnaires to compare different chatbots. This paper explores the extent to which we can bridge the gap between the two, and proposes a research agenda to further explore this question.
Document type:	Conference object
Language:	English
Access URL:	https://research.tilburguniversity.edu/en/publications/cfa9a956-f20e-4859-9de4-5c9ddcc60fd4
Rights:	CC BY
Accession number:	edsair.dris...01181..71ff5a285cf19c0854c7ae2292a409f1
Database:	OpenAIRE
