Performance of ChatGPT and Bard on the official part 1 FRCOphth practice questions

Bibliographic Details
Published in:British journal of ophthalmology Vol. 108; no. 10; pp. 1379 - 1383
Main Authors: Fowler, Thomas, Pullen, Simon, Birkett, Liam
Format: Journal Article
Language:English
Published: BMA House, Tavistock Square, London, WC1H 9JR BMJ Publishing Group Ltd 01.10.2024
Subjects:
ISSN:0007-1161, 1468-2079
Description
Summary:Background: Chat Generative Pre-trained Transformer (ChatGPT), a large language model by OpenAI, and Bard, Google’s artificial intelligence (AI) chatbot, have been evaluated in various contexts. This study assesses these models’ proficiency in the part 1 Fellowship of the Royal College of Ophthalmologists (FRCOphth) Multiple Choice Question (MCQ) examination, highlighting their potential in medical education. Methods: Both models were tested on a sample question bank for the part 1 FRCOphth MCQ exam. Their performances were compared with historical human performance on the exam, focusing on the ability to comprehend, retain and apply information related to ophthalmology. The models were also tested on the book ‘MCQs for FRCOphth Part 1’, and their performance was assessed across subjects. Results: ChatGPT demonstrated a strong performance, surpassing historical human pass marks and examination performance, while Bard underperformed. The comparison indicates the potential of certain AI models to match, and even exceed, human standards in such tasks. Conclusion: The results demonstrate the potential of AI models, such as ChatGPT, to process and apply medical knowledge at a postgraduate level. However, performance varied among models, highlighting the importance of appropriate AI selection. The study underlines the potential for AI applications in medical education and the necessity for further investigation into their strengths and limitations.
Bibliography:Clinical science
ISSN:0007-1161
1468-2079
DOI:10.1136/bjo-2023-324091