Evaluating Large Language Models for the National Premedical Exam in India: Comparative Analysis of GPT-3.5, GPT-4, and Bard
| Published in: | JMIR medical education, Volume 10, p. e51523 |
|---|---|
| Main authors: | Farhat, Faiza; Chaudhry, Beenish Moalla; Nadeem, Mohammad; Sohail, Shahab Saquib; Madsen, Dag Øivind |
| Medium: | Journal Article |
| Language: | English |
| Published: | Canada: JMIR Publications, 21.02.2024 |
| ISSN: | 2369-3762 |
| Online access: | Get full text |
| Abstract | Background: Large language models (LLMs) have revolutionized natural language processing with their ability to generate human-like text through extensive training on large data sets. These models, including Generative Pre-trained Transformers (GPT)-3.5 (OpenAI), GPT-4 (OpenAI), and Bard (Google LLC), find applications beyond natural language processing, attracting interest from academia and industry. Students are actively leveraging LLMs to enhance learning experiences and prepare for high-stakes exams, such as the National Eligibility cum Entrance Test (NEET) in India.
Objective: This comparative analysis aims to evaluate the performance of GPT-3.5, GPT-4, and Bard in answering NEET-2023 questions.
Methods: We evaluated the performance of 3 mainstream LLMs, namely GPT-3.5, GPT-4, and Google Bard, in answering questions related to the NEET-2023 exam. The NEET questions were provided to these artificial intelligence models, and the responses were recorded and compared against the correct answers from the official answer key. Consensus was used to evaluate the performance of all 3 models.
Results: GPT-4 passed the entrance test with flying colors (300/700, 42.9%), showcasing exceptional performance. GPT-3.5 managed to meet the qualifying criteria, but with a substantially lower score (145/700, 20.7%). Bard (115/700, 16.4%) failed to meet the qualifying criteria and did not pass the test. GPT-4 demonstrated consistent superiority over Bard and GPT-3.5 in all 3 subjects. Specifically, GPT-4 achieved accuracy rates of 73% (29/40) in physics, 44% (16/36) in chemistry, and 51% (50/99) in biology. Conversely, GPT-3.5 attained accuracy rates of 45% (18/40) in physics, 33% (13/26) in chemistry, and 34% (34/99) in biology. The accuracy consensus metric showed that the matching responses between GPT-4 and Bard, as well as GPT-4 and GPT-3.5, had higher incidences of being correct, at 0.56 and 0.57, respectively, compared to the matching responses between Bard and GPT-3.5, which stood at 0.42. When all 3 models were considered together, their matching responses reached the highest accuracy consensus of 0.59.
Conclusions: The study's findings provide valuable insights into the performance of GPT-3.5, GPT-4, and Bard in answering NEET-2023 questions. GPT-4 emerged as the most accurate model, highlighting its potential for educational applications. Cross-checking responses across models may result in confusion, as the compared models (as duos or a trio) tend to agree on only a little over half of the correct responses. Using GPT-4 as one of the compared models results in higher accuracy consensus. The results underscore the suitability of LLMs for high-stakes exams and their positive impact on education. Additionally, the study establishes a benchmark for evaluating and enhancing LLMs' performance in educational tasks, promoting responsible and informed use of these models in diverse learning environments. |
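The "accuracy consensus" reported in the abstract can be read as a conditional accuracy: among questions where a group of models gives the same answer, the fraction whose shared answer also matches the official key. The sketch below illustrates that reading; the function name and the toy answer data are hypothetical, not the study's actual NEET-2023 responses.

```python
# Illustrative sketch of the "accuracy consensus" metric described in the
# abstract (an interpretation, not the authors' code).

def accuracy_consensus(key, *model_answers):
    """key: list of correct options; model_answers: one answer list per model.

    Returns the fraction of questions, among those where ALL given models
    agree, whose shared answer matches the official key.
    """
    n = len(key)
    # Questions on which every model gave the same answer.
    matching = [i for i in range(n)
                if len({answers[i] for answers in model_answers}) == 1]
    if not matching:
        return 0.0
    # Of those, count the ones where the shared answer is correct.
    correct = sum(1 for i in matching if model_answers[0][i] == key[i])
    return correct / len(matching)

# Toy data (hypothetical options A-D, five questions):
key  = ["A", "B", "C", "D", "A"]
gpt4 = ["A", "B", "C", "A", "A"]
bard = ["A", "B", "D", "A", "C"]

# The two models agree on questions 0, 1, and 3; they are correct on 0 and 1.
print(accuracy_consensus(key, gpt4, bard))  # -> 0.666...
```

On this reading, a consensus around 0.56-0.59 means that even when models agree, their shared answer is correct only slightly more than half the time, which is why the Conclusions caution against cross-checking models against each other.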
|---|---|
| Author | Farhat, Faiza; Chaudhry, Beenish Moalla; Nadeem, Mohammad; Sohail, Shahab Saquib; Madsen, Dag Øivind |
| AuthorAffiliation | 1 Department of Zoology, Aligarh Muslim University, Aligarh, India; 2 School of Computing and Informatics, The University of Louisiana, Lafayette, LA, United States; 3 Department of Computer Science, Aligarh Muslim University, Aligarh, India; 4 School of Computing Science and Engineering, VIT Bhopal University, Sehore, India; 5 School of Business, University of South-Eastern Norway, Hønefoss, Norway |
| Copyright | Faiza Farhat, Beenish Moalla Chaudhry, Mohammad Nadeem, Shahab Saquib Sohail, Dag Øivind Madsen. Originally published in JMIR Medical Education (https://mededu.jmir.org), 21.02.2024. |
| DOI | 10.2196/51523 |
| Keywords | natural language processing; suitability; Generative Pre-trained Transformers; accuracy; premedical exams; Bard; artificial intelligence; medical education; medical exam; AI; model performance; ChatGPT; large language models; educational task; GPT-4 |
| License | Faiza Farhat, Beenish Moalla Chaudhry, Mohammad Nadeem, Shahab Saquib Sohail, Dag Øivind Madsen. Originally published in JMIR Medical Education (https://mededu.jmir.org), 21.02.2024. This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Education, is properly cited. The complete bibliographic information, a link to the original publication on https://mededu.jmir.org/, as well as this copyright and license information must be included. |
| ORCID | Farhat, Faiza: 0000-0002-1310-1586; Chaudhry, Beenish Moalla: 0000-0002-0437-6924; Nadeem, Mohammad: 0000-0003-3664-5014; Sohail, Shahab Saquib: 0000-0002-5944-7371; Madsen, Dag Øivind: 0000-0001-8735-3332 |
| OpenAccessLink | https://doaj.org/article/f2ee1b2baba949e5836a16aaf07a0f56 |
| PMID | 38381486 |
| PublicationTitle | JMIR medical education |
| PublicationTitleAlternate | JMIR Med Educ |
| SubjectTerms | Artificial Intelligence; Benchmarking; Confusion; Educational Status; Humans; India; Original Paper |
| URI | https://www.ncbi.nlm.nih.gov/pubmed/38381486 https://www.proquest.com/docview/2929538836 https://pubmed.ncbi.nlm.nih.gov/PMC10918540 https://doaj.org/article/f2ee1b2baba949e5836a16aaf07a0f56 |
| Volume | 10 |