Comparing performances of French orthopaedic surgery residents with the artificial intelligence ChatGPT-4/4o in the French diploma exams of orthopaedic and trauma surgery.

Title: Comparing performances of French orthopaedic surgery residents with the artificial intelligence ChatGPT-4/4o in the French diploma exams of orthopaedic and trauma surgery.
Authors: Maraqa N; Service de Chirurgie Orthopédique et Traumatologique, Hôpital Trousseau, CHRU de Tours, Faculté de Médecine, Université de Tours Centre-Val de Loire, France., Samargandi R; Department of Orthopedic Surgery, Faculty of Medicine, University of Jeddah, Jeddah, Saudi Arabia., Poichotte A; Service de Chirurgie Orthopédique et Traumatologique, Centre Hospitalier Loire-Vendée-Océan, Challans, France., Berhouet J; Service de Chirurgie Orthopédique et Traumatologique, Hôpital Trousseau, CHRU de Tours, Faculté de Médecine, Université de Tours Centre-Val de Loire, France., Benhenneda R; Service de Chirurgie Orthopédique et Traumatologique, Hôpital Trousseau, CHRU de Tours, Faculté de Médecine, Université de Tours Centre-Val de Loire, France. Electronic address: rayane.benhenneda@gmail.com.
Source: Orthopaedics & traumatology, surgery & research : OTSR [Orthop Traumatol Surg Res] 2025 Dec; Vol. 111 (8), pp. 104080. Date of Electronic Publication: 2024 Dec 04.
Publication Type: Journal Article; Comparative Study
Language: English
Journal Information: Publisher: Elsevier Masson SAS Country of Publication: France NLM ID: 101494830 Publication Model: Print-Electronic Cited Medium: Internet ISSN: 1877-0568 (Electronic) Linking ISSN: 18770568 NLM ISO Abbreviation: Orthop Traumatol Surg Res Subsets: MEDLINE
Imprint Name(s): Original Publication: Issy-les-Moulineaux, France: Elsevier Masson SAS
MeSH Terms: Internship and Residency*; Orthopedics*/education; Educational Measurement*/methods; Artificial Intelligence*; Traumatology*/education; Clinical Competence*; Humans; France; Acute Care Surgery; Generative Artificial Intelligence
Abstract: Competing Interests: Declaration of competing interest: None in relation to this work. Outside of this work: Julien Berhouet is a consultant for Stryker.
Introduction: This study evaluates the performance of ChatGPT, particularly its versions 4 and 4o, in answering questions from the French orthopedic and trauma surgery exam (Diplôme d'Études Spécialisées, DES), compared to the results of French orthopedic surgery residents. Previous research has examined ChatGPT's capabilities across various medical specialties and exams, with mixed results, especially in the interpretation of complex radiological images.
Hypothesis: ChatGPT version 4o is capable of achieving a score equal to or higher than that of residents on the DES exam.
Methods: The response capabilities of the ChatGPT model, versions 4 and 4o, were evaluated and compared to the results of residents for 250 questions taken from the DES exams from 2020 to 2024. A secondary analysis focused on the differences in the AI's performance based on the type of data being analyzed (text or images) and the topic of the questions.
Results: The score achieved by ChatGPT-4o was equivalent to that of residents over the past five years: 74.8% for ChatGPT-4o vs. 70.8% for residents (p = 0.32). The accuracy rate of ChatGPT was significantly higher in its latest version 4o than in version 4 (74.8% vs. 58.8%; p = 0.0001). Secondary subgroup analysis revealed a performance deficiency of the AI in analyzing graphical images (success rates of 48% and 65% for ChatGPT-4 and 4o, respectively). ChatGPT-4o showed superior performance to version 4 when the topics involved the spine, pediatrics, and the lower limb.
Conclusion: The performance of ChatGPT-4o is equivalent to that of French students in answering questions from the DES in orthopedic and trauma surgery. Significant progress has been observed between versions 4 and 4o. The analysis of questions involving iconography remains a notable challenge for the current versions of ChatGPT, with a tendency for the AI to perform less effectively compared to questions requiring only text analysis.
Level of Evidence: IV; Retrospective Observational Study.
(Copyright © 2024 The Authors. Published by Elsevier Masson SAS. All rights reserved.)
Contributed Indexing: Keywords: Artificial intelligence; ChatGPT-4; ChatGPT-4o; Diploma of specialized studies; Orthopedic and trauma surgery
Entry Date(s): Date Created: 20241206 Date Completed: 20251128 Latest Revision: 20251128
Update Code: 20251129
DOI: 10.1016/j.otsr.2024.104080
PMID: 39643080
Database: MEDLINE