Evaluating Large Language Models for the National Premedical Exam in India: Comparative Analysis of GPT-3.5, GPT-4, and Bard

Published in: JMIR Medical Education, Volume 10, p. e51523
Main authors: Farhat, Faiza; Chaudhry, Beenish Moalla; Nadeem, Mohammad; Sohail, Shahab Saquib; Madsen, Dag Øivind
Format: Journal Article
Language: English
Published: JMIR Publications, Canada, 21.02.2024
ISSN: 2369-3762
Online access: Get full text
Abstract

Background: Large language models (LLMs) have revolutionized natural language processing with their ability to generate human-like text through extensive training on large data sets. These models, including Generative Pre-trained Transformers (GPT)-3.5 (OpenAI), GPT-4 (OpenAI), and Bard (Google LLC), find applications beyond natural language processing, attracting interest from academia and industry. Students are actively leveraging LLMs to enhance learning experiences and prepare for high-stakes exams, such as the National Eligibility cum Entrance Test (NEET) in India.

Objective: This comparative analysis aims to evaluate the performance of GPT-3.5, GPT-4, and Bard in answering NEET-2023 questions.

Methods: In this paper, we evaluated the performance of the 3 mainstream LLMs, namely GPT-3.5, GPT-4, and Google Bard, in answering questions related to the NEET-2023 exam. The questions of the NEET were provided to these artificial intelligence models, and the responses were recorded and compared against the correct answers from the official answer key. Consensus was used to evaluate the performance of all 3 models.

Results: It was evident that GPT-4 passed the entrance test with flying colors (300/700, 42.9%), showcasing exceptional performance. On the other hand, GPT-3.5 managed to meet the qualifying criteria, but with a substantially lower score (145/700, 20.7%). However, Bard (115/700, 16.4%) failed to meet the qualifying criteria and did not pass the test. GPT-4 demonstrated consistent superiority over Bard and GPT-3.5 in all 3 subjects. Specifically, GPT-4 achieved accuracy rates of 73% (29/40) in physics, 44% (16/36) in chemistry, and 51% (50/99) in biology. Conversely, GPT-3.5 attained accuracy rates of 45% (18/40) in physics, 33% (13/26) in chemistry, and 34% (34/99) in biology. The accuracy consensus metric showed that the matching responses between GPT-4 and Bard, as well as GPT-4 and GPT-3.5, had higher incidences of being correct, at 0.56 and 0.57, respectively, compared to the matching responses between Bard and GPT-3.5, which stood at 0.42. When all 3 models were considered together, their matching responses reached the highest accuracy consensus of 0.59.

Conclusions: The study's findings provide valuable insights into the performance of GPT-3.5, GPT-4, and Bard in answering NEET-2023 questions. GPT-4 emerged as the most accurate model, highlighting its potential for educational applications. Cross-checking responses across models may result in confusion as the compared models (as duos or a trio) tend to agree on only a little over half of the correct responses. Using GPT-4 as one of the compared models will result in higher accuracy consensus. The results underscore the suitability of LLMs for high-stakes exams and their positive impact on education. Additionally, the study establishes a benchmark for evaluating and enhancing LLMs' performance in educational tasks, promoting responsible and informed use of these models in diverse learning environments.
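The abstract's "accuracy consensus" metric, as described above, reads as the fraction of questions on which the compared models give the same answer and that shared answer matches the official key. A minimal sketch of that scoring, with hypothetical toy data (the functions and answer strings below are illustrative, not the authors' code or the NEET data):

```python
# Illustrative reconstruction of exam scoring and pairwise/trio
# "accuracy consensus": among questions where the given models agree,
# the fraction whose shared answer matches the official answer key.
from itertools import combinations

def accuracy(responses, key):
    """Fraction of responses matching the official answer key."""
    return sum(r == k for r, k in zip(responses, key)) / len(key)

def accuracy_consensus(key, *model_responses):
    """Among questions where all given models give the same answer,
    the fraction whose shared answer is correct."""
    agree = [i for i in range(len(key))
             if len({m[i] for m in model_responses}) == 1]
    if not agree:
        return 0.0
    return sum(model_responses[0][i] == key[i] for i in agree) / len(agree)

# Hypothetical toy data: options A-D for 6 multiple-choice questions.
key   = list("ABCDAB")
gpt4  = list("ABCDAC")
gpt35 = list("ABDDAC")
bard  = list("ABCCBC")

models = {"GPT-4": gpt4, "GPT-3.5": gpt35, "Bard": bard}
for name, resp in models.items():
    print(name, "accuracy:", round(accuracy(resp, key), 2))
for (n1, r1), (n2, r2) in combinations(models.items(), 2):
    print(n1, "&", n2, "consensus:", round(accuracy_consensus(key, r1, r2), 2))
print("All three:", round(accuracy_consensus(key, gpt4, gpt35, bard), 2))
```

On the real exam the study reports pairwise consensus of 0.56 (GPT-4 & Bard), 0.57 (GPT-4 & GPT-3.5), 0.42 (Bard & GPT-3.5), and 0.59 for all three together.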
Authors:
Farhat, Faiza
Chaudhry, Beenish Moalla
Nadeem, Mohammad
Sohail, Shahab Saquib
Madsen, Dag Øivind
Author affiliations:
1. Department of Zoology, Aligarh Muslim University, Aligarh, India
2. School of Computing and Informatics, The University of Louisiana, Lafayette, LA, United States
3. Department of Computer Science, Aligarh Muslim University, Aligarh, India
4. School of Computing Science and Engineering, VIT Bhopal University, Sehore, India
5. School of Business, University of South-Eastern Norway, Hønefoss, Norway
ORCID iDs:
Farhat, Faiza: 0000-0002-1310-1586
Chaudhry, Beenish Moalla: 0000-0002-0437-6924
Nadeem, Mohammad: 0000-0003-3664-5014
Sohail, Shahab Saquib: 0000-0002-5944-7371
Madsen, Dag Øivind: 0000-0001-8735-3332
Copyright: Faiza Farhat, Beenish Moalla Chaudhry, Mohammad Nadeem, Shahab Saquib Sohail, Dag Øivind Madsen. Originally published in JMIR Medical Education (https://mededu.jmir.org), 21.02.2024.
DOI 10.2196/51523
PMCID: PMC10918540
PMID: 38381486
Geographic location: India
Keywords: natural language processing; suitability; Generative Pre-trained Transformers; accuracy; premedical exams; Bard; artificial intelligence; medical education; medical exam; AI model; performance; ChatGPT; large language models; educational task; GPT-4
License: This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Education, is properly cited. The complete bibliographic information, a link to the original publication on https://mededu.jmir.org/, as well as this copyright and license information must be included.
OpenAccessLink https://doaj.org/article/f2ee1b2baba949e5836a16aaf07a0f56
Subject terms: Artificial Intelligence; Benchmarking; Confusion; Educational Status; Humans; India; Original Paper
URI https://www.ncbi.nlm.nih.gov/pubmed/38381486
https://www.proquest.com/docview/2929538836
https://pubmed.ncbi.nlm.nih.gov/PMC10918540
https://doaj.org/article/f2ee1b2baba949e5836a16aaf07a0f56
Volume 10