Is the information provided by large language models valid in educating patients about adolescent idiopathic scoliosis? An evaluation of content, clarity, and empathy

Bibliographic Details
Title: Is the information provided by large language models valid in educating patients about adolescent idiopathic scoliosis? An evaluation of content, clarity, and empathy
Authors: Lang, Siegmund; Vitale, Jacopo Antonino; Galbusera, Fabio; Fekete, Tamás F.; Charles, Yann Philippe; Haddad, Sleiman; Boissiere, Louis; Núñez-Pereira, Susana
Contributors: Institut Català de la Salut, [Lang S] Department of Trauma Surgery, University Hospital Regensburg, Regensburg, Germany. Department of Spine Surgery, Schulthess Klinik, Zurich, Switzerland. [Vitale J, Galbusera F] Spine Center, Schulthess Klinik, Zurich, Switzerland. [Fekete T] Department of Spine Surgery, Schulthess Klinik, Zurich, Switzerland. [Boissiere L] Spine Unit Orthopaedic Department, Hôpital Pellegrin Bordeaux, Bordeaux, France. [Charles YP] Dept. of Spine Surgery, Hôpitaux Universitaires de Strasbourg, Université de Strasbourg, Strasbourg, France. [Núñez Pereira S, Haddad S] Unitat de Cirurgia de la Columna Vertebral, Vall d’Hebron Hospital Universitari, Barcelona, Spain, Vall d'Hebron Barcelona Hospital Campus
Source: Scientia
Publisher Information: Springer, 2025.
Publication Year: 2025
Subject Terms: Patient education, Physician and patient, Artificial intelligence, Scoliosis, Adolescents, PSYCHIATRY AND PSYCHOLOGY::Behavior and Behavior Mechanisms::Psychology, Social::Interpersonal Relations::Professional-Patient Relations::Physician-Patient Relations, NAMED GROUPS::Persons::Age Groups::Adolescent, PHENOMENA AND PROCESSES::Mathematical Concepts::Algorithms::Artificial Intelligence, DISEASES::Musculoskeletal Diseases::Bone Diseases::Spinal Diseases::Spinal Curvatures::Scoliosis, HEALTH CARE::Health Care Facilities, Manpower, and Services::Health Services::Preventive Health Services::Health Education::Patient Education as Topic
Description: Purpose Large language models (LLMs) have the potential to bridge knowledge gaps in patient education and enrich patient-surgeon interactions. This study evaluated three chatbots for delivering empathetic and precise adolescent idiopathic scoliosis (AIS)-related information and management advice. Specifically, we assessed the accuracy, clarity, and relevance of the information provided, aiming to determine the effectiveness of LLMs in addressing common patient queries and enhancing their understanding of AIS. Methods We sourced 20 webpages for the top frequently asked questions (FAQs) about AIS and formulated 10 critical questions based on them. Three advanced LLMs (ChatGPT-3.5, ChatGPT-4.0, and Google Bard) were selected to answer these questions, with responses limited to 200 words. The LLMs' responses were evaluated by a blinded group of experienced deformity surgeons (members of the European Spine Study Group) from seven European spine centers. A pre-established four-level rating system, from excellent to unsatisfactory, was used, with further ratings for clarity, comprehensiveness, and empathy on a 5-point Likert scale. If a response was not rated 'excellent', raters were asked to report the reasons for their decision. Lastly, raters answered six questions on their general opinion of AI in healthcare. Results Across all LLMs, 26% of responses were rated 'excellent', with ChatGPT-4.0 leading (39%), followed by Bard (17%). ChatGPT-4.0 was rated superior to Bard and ChatGPT-3.5 (p = 0.003). Discrepancies among raters were significant, whereas ratings for clarity, comprehensiveness, and empathy were above average (> 3.0 on a 5.0 scale) and did not demonstrate any differences among LLMs. However, ChatGPT-3.5 struggled with language suitability and empathy, while Bard's responses were overly detailed and less empathetic. Overall, raters found that 9% of answers were off-topic and 22% contained clear mistakes. Conclusion Our study offers crucial insights into the strengths and weaknesses of current LLMs in AIS patient and parent education, highlighting the promise of advancements like ChatGPT-4o and Gemini alongside the need for continuous improvement in empathy, contextual understanding, and language appropriateness.
Adolescent idiopathic scoliosis; Large language models; Patient education
Open Access funding enabled and organized by Projekt DEAL.
Document Type: Article
File Description: application/pdf
Language: English
DOI: 10.1007/s43390-024-00955-3
Access URL: http://hdl.handle.net/11351/12820
Rights: CC BY
Accession Number: edsair.od......3991..8594d33840a8d11d10f4f2571d221729
Database: OpenAIRE