Image Recognition Performance of GPT-4V(ision) and GPT-4o in Ophthalmology: Use of Images in Clinical Questions.

Bibliographic Details
Title: Image Recognition Performance of GPT-4V(ision) and GPT-4o in Ophthalmology: Use of Images in Clinical Questions.
Authors: Tomita, Kosei, Nishida, Takashi, Kitaguchi, Yoshiyuki, Kitazawa, Koji, Miyake, Masahiro
Source: Clinical Ophthalmology; May 2025, Vol. 19, p1557-1564, 8p
Subjects: GENERATIVE pre-trained transformers, LANGUAGE models, CHATGPT, IMAGE recognition (Computer vision), DIAGNOSTIC imaging
Abstract: Purpose: To compare the diagnostic accuracy of Generative Pre-trained Transformer (GPT)-4, GPT-4 with Vision (GPT-4V), and GPT-4o for clinical questions in ophthalmology. Patients and Methods: The questions were collected from the "Diagnose This" section on the American Academy of Ophthalmology website. We tested 580 questions and presented ChatGPT with the same questions under two conditions: 1) multimodal model, incorporating both the question text and associated images, and 2) text-only model. We then compared the difference in accuracy using McNemar tests among multimodal (GPT-4o and GPT-4V) and text-only (GPT-4V) models. The percentage of general correct answers was also collected from the website. Results: Multimodal GPT-4o achieved the highest accuracy (77.1%), followed by multimodal GPT-4V (71.0%) and then text-only GPT-4V (68.7%) (P values < 0.001, 0.012, and 0.001, respectively). All GPT-4 models showed higher accuracy than the general correct answers on the website (64.6%). Conclusion: The addition of information from images enhances the performance of GPT-4V in answering clinical questions in ophthalmology. This suggests that integrating multimodal data could be crucial in developing more effective and reliable diagnostic tools in medical fields. [ABSTRACT FROM AUTHOR]
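Note on the method: the abstract reports McNemar tests, which compare two models answering the same set of questions by looking only at the discordant pairs (questions one model answered correctly and the other did not). The following is a minimal sketch of an exact two-sided McNemar test, not the authors' analysis code. The function names and the per-question outcome lists are hypothetical, and the sample values are illustrative only; the abstract does not report discordant-pair counts.

```python
from math import comb


def discordant_counts(correct_a, correct_b):
    """Count discordant pairs between two paired lists of booleans.

    Returns (b, c): b = A correct and B wrong, c = A wrong and B correct.
    """
    b = sum(1 for a, other in zip(correct_a, correct_b) if a and not other)
    c = sum(1 for a, other in zip(correct_a, correct_b) if not a and other)
    return b, c


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value from the discordant counts.

    Under the null hypothesis, each discordant pair is equally likely to
    favour either model, so min(b, c) follows Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs: no evidence of a difference
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical per-question outcomes (True = correct answer), NOT study data.
multimodal = [True, True, True, False, True, True, False, True]
text_only = [True, False, True, False, False, True, False, True]

b, c = discordant_counts(multimodal, text_only)
p_value = mcnemar_exact(b, c)
```

Questions that both models answer correctly, or both answer incorrectly, do not enter the statistic, which is why a paired test of this kind suits the same 580 questions being posed under each condition.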
Copyright of Clinical Ophthalmology is the property of Dove Medical Press Ltd and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
Database: Biomedical Index
Description
ISSN: 1177-5467
DOI: 10.2147/OPTH.S494480