Performance of ChatGPT on Nephrology Test Questions
| Published in: | Clinical journal of the American Society of Nephrology, Volume 19, Issue 1, p. 35 |
|---|---|
| Main authors: | Miao, Jing; Thongprayoon, Charat; Garcia Valencia, Oscar A; Krisanapan, Pajaree; Sheikh, Mohammad S; Davis, Paul W; Mekraksakit, Poemlarp; Suarez, Maria Gonzalez; Craici, Iasmina M; Cheungpasitporn, Wisit |
| Format: | Journal Article |
| Language: | English |
| Published: | United States, January 1, 2024 |
| Subjects: | Educational Measurement - methods; Humans; Machine Learning; Nephrology - standards; Surveys and Questionnaires |
| ISSN: | 1555-905X |
| Abstract | Background: ChatGPT is a novel tool that allows people to engage in conversations with an advanced machine learning model. ChatGPT's performance in the US Medical Licensing Examination is comparable with a successful candidate's performance. However, its performance in the nephrology field remains undetermined. This study assessed ChatGPT's capabilities in answering nephrology test questions.
Methods: Questions were sourced from the Nephrology Self-Assessment Program and the Kidney Self-Assessment Program, each consisting of multiple-choice single-answer questions. Questions containing visual elements were excluded. Each question bank was run twice using GPT-3.5 and GPT-4. Performance was assessed with two measures: the total accuracy rate, defined as the percentage of correct answers obtained by ChatGPT in either the first or second run, and the total concordance, defined as the percentage of identical answers provided by ChatGPT during both runs, regardless of their correctness.
Results: A total of 975 questions were assessed, comprising 508 questions from the Nephrology Self-Assessment Program and 467 from the Kidney Self-Assessment Program. GPT-3.5 achieved a total accuracy rate of 51%. Accuracy was higher on Nephrology Self-Assessment Program questions than on Kidney Self-Assessment Program questions (58% versus 44%; P < 0.001). The total concordance rate across all questions was 78%, with correct answers exhibiting a higher concordance rate (84%) than incorrect answers (73%) (P < 0.001). Across nephrology subfields, total accuracy rates were relatively lower in electrolyte and acid-base disorders, glomerular disease, and kidney-related bone and stone disorders. The total accuracy rate of GPT-4 was 74%, higher than that of GPT-3.5 (P < 0.001) but still below the passing threshold and average scores of nephrology examinees (77%).
Conclusions: ChatGPT exhibited limitations regarding accuracy and repeatability when addressing nephrology-related questions. Variations in performance were evident across various subfields. |
|---|---|
| Author | Miao, Jing; Thongprayoon, Charat; Garcia Valencia, Oscar A; Krisanapan, Pajaree; Sheikh, Mohammad S; Davis, Paul W; Mekraksakit, Poemlarp; Suarez, Maria Gonzalez; Craici, Iasmina M; Cheungpasitporn, Wisit |
| Author_xml | 1. Miao, Jing (ORCID 0000-0003-0642-9740), Division of Nephrology and Hypertension, Department of Medicine, Mayo Clinic, Rochester, Minnesota; 2. Thongprayoon, Charat (ORCID 0000-0002-8313-3604); 3. Garcia Valencia, Oscar A (ORCID 0000-0003-0186-9448); 4. Krisanapan, Pajaree (ORCID 0000-0002-2888-881); 5. Sheikh, Mohammad S (ORCID 0009-0006-9388-8505); 6. Davis, Paul W (ORCID 0000-0003-4637-3198); 7. Mekraksakit, Poemlarp (ORCID 0000-0002-2127-2529); 8. Suarez, Maria Gonzalez (ORCID 0000-0002-8930-4611); 9. Craici, Iasmina M; 10. Cheungpasitporn, Wisit (ORCID 0000-0001-9954-9711) |
| BackLink | https://www.ncbi.nlm.nih.gov/pubmed/37851468 (MEDLINE/PubMed record) |
| Copyright | Copyright © 2023 by the American Society of Nephrology. |
| DOI | 10.2215/CJN.0000000000000330 |
| Discipline | Medicine |
| EISSN | 1555-905X |
| ISICitedReferencesCount | 53 |
| IsDoiOpenAccess | false |
| IsOpenAccess | true |
| IsPeerReviewed | true |
| IsScholarly | true |
| OpenAccessLink | https://www.ncbi.nlm.nih.gov/pmc/articles/10843340 |
| PMID | 37851468 |
| PublicationDate | 2024-January |
| PublicationTitleAlternate | Clin J Am Soc Nephrol |
| SubjectTerms | Educational Measurement - methods; Humans; Machine Learning; Nephrology - standards; Surveys and Questionnaires |
| URI | https://www.ncbi.nlm.nih.gov/pubmed/37851468 https://www.proquest.com/docview/2878710482 |
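
The abstract defines two measures: total accuracy (a question counts as correct if either run answered it correctly) and total concordance (both runs gave the same answer, whether or not it was right). The Python sketch below shows one way to compute them. It is not the authors' code: the function names and the five-question toy data are hypothetical and chosen only for illustration. The final helper checks that the subgroup rates reported in the abstract (58% of 508 and 44% of 467) are consistent with the overall 51%. Because the published percentages are rounded, that check is only approximate.

```python
from typing import Sequence, Tuple


def total_accuracy(run1: Sequence[str], run2: Sequence[str], key: Sequence[str]) -> float:
    """Percentage of questions answered correctly in either the first or the second run."""
    if not key or not (len(run1) == len(run2) == len(key)):
        raise ValueError("runs and answer key must be non-empty and of equal length")
    correct = sum(1 for a, b, k in zip(run1, run2, key) if a == k or b == k)
    return 100.0 * correct / len(key)


def total_concordance(run1: Sequence[str], run2: Sequence[str]) -> float:
    """Percentage of questions with identical answers in both runs, regardless of correctness."""
    if not run1 or len(run1) != len(run2):
        raise ValueError("runs must be non-empty and of equal length")
    same = sum(1 for a, b in zip(run1, run2) if a == b)
    return 100.0 * same / len(run1)


def pooled_percentage(groups: Sequence[Tuple[float, int]]) -> float:
    """Size-weighted average of (percentage, question count) pairs."""
    total = sum(n for _, n in groups)
    return sum(pct * n for pct, n in groups) / total


# Hypothetical toy data (not from the study): five questions, two runs.
answer_key = ["A", "C", "B", "D", "A"]
first_run = ["A", "C", "D", "D", "B"]
second_run = ["A", "B", "D", "D", "A"]

print(total_accuracy(first_run, second_run, answer_key))  # 4 of 5 correct in either run
print(total_concordance(first_run, second_run))           # 3 of 5 identical across runs

# Consistency check on the rounded figures reported in the abstract.
print(round(pooled_percentage([(58, 508), (44, 467)])))   # close to the reported 51%
```

Under the either-run definition, total accuracy can never be lower than the accuracy of a single run, so it is best read as an upper bound on single-run performance.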