Measures for explainable AI: Explanation goodness, user satisfaction, mental models, curiosity, trust, and human-AI performance

If a user is presented an AI system that portends to explain how it works, how do we know whether the explanation works and the user has achieved a pragmatic understanding of the AI? This question entails some key concepts of measurement such as explanation goodness and trust. We present methods for...

Celý popis

Uloženo v:
Podrobná bibliografie
Vydáno v:Frontiers in computer science (Lausanne) Ročník 5
Hlavní autoři: Hoffman, Robert R., Mueller, Shane T., Klein, Gary, Litman, Jordan
Médium: Journal Article
Jazyk:angličtina
Vydáno: Frontiers Media S.A 06.02.2023
Témata:
ISSN:2624-9898, 2624-9898
On-line přístup:Získat plný text
Tagy: Přidat tag
Žádné tagy, Buďte první, kdo vytvoří štítek k tomuto záznamu!
Popis
Shrnutí:If a user is presented an AI system that portends to explain how it works, how do we know whether the explanation works and the user has achieved a pragmatic understanding of the AI? This question entails some key concepts of measurement such as explanation goodness and trust. We present methods for enabling developers and researchers to: (1) Assess the a priori goodness of explanations, (2) Assess users' satisfaction with explanations, (3) Reveal user's mental model of an AI system, (4) Assess user's curiosity or need for explanations, (5) Assess whether the user's trust and reliance on the AI are appropriate, and finally, (6) Assess how the human-XAI work system performs. The methods we present derive from our integration of extensive research literatures and our own psychometric evaluations. We point to the previous research that led to the measurement scales which we aggregated and tailored specifically for the XAI context. Scales are presented in sufficient detail to enable their use by XAI researchers. For Mental Model assessment and Work System Performance, XAI researchers have choices. We point to a number of methods, expressed in terms of methods' strengths and weaknesses, and pertinent measurement issues.
ISSN:2624-9898
2624-9898
DOI:10.3389/fcomp.2023.1096257