ChatGPT-3.5 and ChatGPT-4 dermatological knowledge level based on the Specialty Certificate Examination in Dermatology

Bibliographic Details
Published in: Clinical and Experimental Dermatology, Vol. 49, No. 7, pp. 686–691
Authors: Lewandowski, Miłosz; Łukowicz, Paweł; Świetlik, Dariusz; Barańska-Rybak, Wioletta
Format: Journal Article
Language: English
Published: Oxford University Press, UK, 25 June 2024
ISSN: 0307-6938 (print); 1365-2230 (electronic)
Online Access: Full text
Description
Abstract:
Background: The global use of artificial intelligence (AI) has the potential to revolutionize the healthcare industry. Although AI is becoming increasingly popular, evidence on its use in dermatology is still lacking.
Objectives: To determine the capacity of ChatGPT-3.5 and ChatGPT-4 to support dermatology knowledge and clinical decision-making in medical practice.
Methods: Three Specialty Certificate Examination in Dermatology tests, in English and Polish, each consisting of 120 single-best-answer, multiple-choice questions, were used to assess the performance of ChatGPT-3.5 and ChatGPT-4.
Results: ChatGPT-4 exceeded the 60% pass mark on every test performed, with a minimum of 80% and 70% correct answers for the English and Polish versions, respectively. ChatGPT-4 performed significantly better than ChatGPT-3.5 on each exam (P < 0.01), regardless of language. Furthermore, ChatGPT-4 answered clinical picture-type questions with an average accuracy of 93.0% (English) and 84.2% (Polish). The difference between the Polish and English tests was not significant; however, both ChatGPT-3.5 and ChatGPT-4 performed better overall in English than in Polish, by an average of 8 percentage points per test. Incorrect ChatGPT answers were highly correlated with a lower difficulty index, denoting questions of higher difficulty, in most of the tests (P < 0.05).
Conclusions: The dermatology knowledge level of ChatGPT was high, and ChatGPT-4 performed significantly better than ChatGPT-3.5. Although the use of ChatGPT will not replace a doctor's final decision, physicians should support the development of AI in dermatology to raise the standards of medical care.

Plain-language summary: The global use of artificial intelligence, including deep learning-based language models, in healthcare has the potential to revolutionize the industry. This study aimed to determine the capacity of ChatGPT-3.5 and ChatGPT-4 to support dermatology knowledge and clinical decision-making in medical practice. The accuracy of ChatGPT in answering questions from the Specialty Certificate Examination in Dermatology tests was high, with ChatGPT-4 performing significantly better than ChatGPT-3.5. Although ChatGPT demonstrated a high level of dermatology knowledge, final decisions still require a human.
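As an illustration of the scoring described above: a single-best-answer exam is marked against a 60% pass mark, and each question's classical difficulty index is the proportion of examinees answering it correctly, so lower values denote harder items. The short Python sketch below shows one way such figures might be computed; all exam data, names, and numbers in it are hypothetical and are not taken from the study.

PASS_MARK = 0.60  # the 60% pass mark cited in the abstract

def score_exam(answers, key):
    # Fraction of single-best-answer questions answered correctly.
    correct = sum(1 for a, k in zip(answers, key) if a == k)
    return correct / len(key)

def difficulty_index(item_responses):
    # Classical item difficulty index: proportion of examinees who
    # answered the item correctly; lower values mean harder questions.
    return sum(item_responses) / len(item_responses)

if __name__ == "__main__":
    # Toy 5-item exam (the real exams had 120 items each).
    key = ["A", "C", "B", "D", "A"]
    chatgpt4 = ["A", "C", "B", "D", "B"]   # hypothetical: 4/5 correct
    chatgpt35 = ["A", "B", "B", "C", "B"]  # hypothetical: 2/5 correct

    for name, answers in [("ChatGPT-4", chatgpt4), ("ChatGPT-3.5", chatgpt35)]:
        acc = score_exam(answers, key)
        verdict = "pass" if acc >= PASS_MARK else "fail"
        print(f"{name}: {acc:.0%} ({verdict})")

    # Per-item correctness across four hypothetical examinees:
    # a low index flags a difficult question.
    print("difficulty index:", difficulty_index([1, 0, 1, 0]))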
DOI: 10.1093/ced/llad255