Performance of ChatGPT on Nephrology Test Questions

Detailed bibliography
Published in:	Clinical journal of the American Society of Nephrology, Volume 19; Issue 1; p. 35
Main authors:	Miao, Jing; Thongprayoon, Charat; Garcia Valencia, Oscar A; Krisanapan, Pajaree; Sheikh, Mohammad S; Davis, Paul W; Mekraksakit, Poemlarp; Suarez, Maria Gonzalez; Craici, Iasmina M; Cheungpasitporn, Wisit
Format:	Journal Article
Language:	English
Published:	United States 01.01.2024
Subjects:	Educational Measurement - methods; Humans; Machine Learning; Nephrology - standards; Surveys and Questionnaires
ISSN:	1555-905X

Abstract	ChatGPT is a novel tool that allows people to engage in conversations with an advanced machine learning model. ChatGPT's performance in the US Medical Licensing Examination is comparable with a successful candidate's performance. However, its performance in the nephrology field remains undetermined. This study assessed ChatGPT's capabilities in answering nephrology test questions. Questions sourced from the Nephrology Self-Assessment Program and the Kidney Self-Assessment Program were used, each with multiple-choice single-answer questions. Questions containing visual elements were excluded. Each question bank was run twice using GPT-3.5 and GPT-4. Total accuracy rate, defined as the percentage of correct answers obtained by ChatGPT in either the first or second run, and total concordance, defined as the percentage of identical answers provided by ChatGPT during both runs, regardless of their correctness, were used to assess its performance. A comprehensive assessment was conducted on a set of 975 questions, comprising 508 questions from the Nephrology Self-Assessment Program and 467 from the Kidney Self-Assessment Program. GPT-3.5 achieved a total accuracy rate of 51%. Notably, questions from the Nephrology Self-Assessment Program yielded a higher accuracy rate than those from the Kidney Self-Assessment Program (58% versus 44%; P < 0.001). The total concordance rate across all questions was 78%, with correct answers exhibiting a higher concordance rate (84%) than incorrect answers (73%) (P < 0.001). Across nephrology subfields, total accuracy rates were relatively lower in electrolyte and acid-base disorders, glomerular disease, and kidney-related bone and stone disorders. The total accuracy rate of GPT-4 was 74%, higher than that of GPT-3.5 (P < 0.001) but still below the passing threshold and average scores of nephrology examinees (77%).
ChatGPT exhibited limitations regarding accuracy and repeatability when addressing nephrology-related questions. Variations in performance were evident across various subfields.
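The two metrics defined in the abstract can be illustrated with a short sketch. It assumes that "correct in either the first or second run" means a question counts as correct if at least one of the two runs matches the answer key; the abstract does not spell this out, so that reading is an assumption. The function names and the four-question toy data are hypothetical and are not taken from the study. The final lines only check that the subgroup figures quoted in the abstract (58% of 508 and 44% of 467) are consistent with the reported overall 51%; because those percentages are rounded, the check is approximate.

```python
from typing import Sequence


def total_accuracy_rate(run1: Sequence[str], run2: Sequence[str],
                        key: Sequence[str]) -> float:
    """Percentage of questions answered correctly in at least one of two runs.

    Assumption: "either run" is read as a union (run 1 correct OR run 2 correct).
    """
    if not key or not (len(run1) == len(run2) == len(key)):
        raise ValueError("runs and key must be non-empty and equally long")
    hits = sum(1 for a, b, k in zip(run1, run2, key) if a == k or b == k)
    return 100.0 * hits / len(key)


def total_concordance(run1: Sequence[str], run2: Sequence[str]) -> float:
    """Percentage of questions with identical answers in both runs,
    regardless of whether those answers are correct."""
    if not run1 or len(run1) != len(run2):
        raise ValueError("runs must be non-empty and equally long")
    same = sum(1 for a, b in zip(run1, run2) if a == b)
    return 100.0 * same / len(run1)


# Hypothetical toy data (not from the study): four single-answer questions.
answer_key = ["A", "B", "C", "D"]
first_run = ["A", "B", "D", "C"]
second_run = ["A", "C", "C", "C"]

accuracy = total_accuracy_rate(first_run, second_run, answer_key)
concordance = total_concordance(first_run, second_run)
print(f"total accuracy rate: {accuracy:.0f}%")
print(f"total concordance:   {concordance:.0f}%")

# Approximate consistency check using only figures quoted in the abstract:
# 58% of 508 questions and 44% of 467 questions, pooled over 975 questions.
pooled_gpt35 = (58 * 508 + 44 * 467) / 975
print(f"pooled GPT-3.5 accuracy: {pooled_gpt35:.1f}%")
```

In the toy data, the fourth question is answered identically in both runs but incorrectly, which is why concordance and accuracy are tracked separately: a model can be repeatable without being right.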
AbstractList	BACKGROUND: ChatGPT is a novel tool that allows people to engage in conversations with an advanced machine learning model. ChatGPT's performance in the US Medical Licensing Examination is comparable with a successful candidate's performance. However, its performance in the nephrology field remains undetermined. This study assessed ChatGPT's capabilities in answering nephrology test questions.
METHODS: Questions sourced from the Nephrology Self-Assessment Program and the Kidney Self-Assessment Program were used, each with multiple-choice single-answer questions. Questions containing visual elements were excluded. Each question bank was run twice using GPT-3.5 and GPT-4. Total accuracy rate, defined as the percentage of correct answers obtained by ChatGPT in either the first or second run, and total concordance, defined as the percentage of identical answers provided by ChatGPT during both runs, regardless of their correctness, were used to assess its performance.
RESULTS: A comprehensive assessment was conducted on a set of 975 questions, comprising 508 questions from the Nephrology Self-Assessment Program and 467 from the Kidney Self-Assessment Program. GPT-3.5 achieved a total accuracy rate of 51%. Notably, questions from the Nephrology Self-Assessment Program yielded a higher accuracy rate than those from the Kidney Self-Assessment Program (58% versus 44%; P < 0.001). The total concordance rate across all questions was 78%, with correct answers exhibiting a higher concordance rate (84%) than incorrect answers (73%) (P < 0.001). Across nephrology subfields, total accuracy rates were relatively lower in electrolyte and acid-base disorders, glomerular disease, and kidney-related bone and stone disorders. The total accuracy rate of GPT-4 was 74%, higher than that of GPT-3.5 (P < 0.001) but still below the passing threshold and average scores of nephrology examinees (77%).
CONCLUSIONS: ChatGPT exhibited limitations regarding accuracy and repeatability when addressing nephrology-related questions. Variations in performance were evident across various subfields.
Author	Krisanapan, Pajaree; Craici, Iasmina M; Cheungpasitporn, Wisit; Sheikh, Mohammad S; Davis, Paul W; Suarez, Maria Gonzalez; Mekraksakit, Poemlarp; Garcia Valencia, Oscar A; Thongprayoon, Charat; Miao, Jing
Author_xml	– sequence: 1 givenname: Jing orcidid: 0000-0003-0642-9740 surname: Miao fullname: Miao, Jing organization: Division of Nephrology and Hypertension, Department of Medicine, Mayo Clinic, Rochester, Minnesota
– sequence: 2 givenname: Charat orcidid: 0000-0002-8313-3604 surname: Thongprayoon fullname: Thongprayoon, Charat
– sequence: 3 givenname: Oscar A orcidid: 0000-0003-0186-9448 surname: Garcia Valencia fullname: Garcia Valencia, Oscar A
– sequence: 4 givenname: Pajaree orcidid: 0000-0002-2888-881 surname: Krisanapan fullname: Krisanapan, Pajaree
– sequence: 5 givenname: Mohammad S orcidid: 0009-0006-9388-8505 surname: Sheikh fullname: Sheikh, Mohammad S
– sequence: 6 givenname: Paul W orcidid: 0000-0003-4637-3198 surname: Davis fullname: Davis, Paul W
– sequence: 7 givenname: Poemlarp orcidid: 0000-0002-2127-2529 surname: Mekraksakit fullname: Mekraksakit, Poemlarp
– sequence: 8 givenname: Maria Gonzalez orcidid: 0000-0002-8930-4611 surname: Suarez fullname: Suarez, Maria Gonzalez
– sequence: 9 givenname: Iasmina M surname: Craici fullname: Craici, Iasmina M
– sequence: 10 givenname: Wisit orcidid: 0000-0001-9954-9711 surname: Cheungpasitporn fullname: Cheungpasitporn, Wisit
BackLink	https://www.ncbi.nlm.nih.gov/pubmed/37851468$$D View this record in MEDLINE/PubMed
CitedBy_id	crossref_primary_10_2215_CJN_0000000000000378 crossref_primary_10_3390_jpm13121681 crossref_primary_10_1002_ca_70012 crossref_primary_10_3748_wjg_v31_i3_101092 crossref_primary_10_1177_20552076241277458 crossref_primary_10_1177_20552076251342067 crossref_primary_10_3390_clinpract14040109 crossref_primary_10_1007_s11695_025_08115_w crossref_primary_10_1007_s12195_024_00820_3 crossref_primary_10_1016_j_jcrc_2024_155010 crossref_primary_10_1177_20552076251326014 crossref_primary_10_3390_medicina60030445 crossref_primary_10_1055_a_2405_0138 crossref_primary_10_1111_ctr_15466 crossref_primary_10_1007_s40620_024_01974_z crossref_primary_10_3389_fonc_2025_1516264 crossref_primary_10_3390_medicines10100058 crossref_primary_10_3390_jpm13121679 crossref_primary_10_1093_ckj_sfae193 crossref_primary_10_1111_ctr_70303 crossref_primary_10_1111_jch_14822 crossref_primary_10_3390_jpm14010107 crossref_primary_10_3390_clinpract14010008 crossref_primary_10_3390_healthcare12222305 crossref_primary_10_1080_0886022X_2024_2337291 crossref_primary_10_2196_57157 crossref_primary_10_3390_jpm14030233 crossref_primary_10_1097_MCC_0000000000001202 crossref_primary_10_1186_s12909_025_07427_w crossref_primary_10_3390_healthcare13010057 crossref_primary_10_1111_petr_70068 crossref_primary_10_3390_medicina60010148 crossref_primary_10_3389_fdgth_2024_1366967 crossref_primary_10_1177_20552076241248082 crossref_primary_10_1007_s12015_024_10814_3 crossref_primary_10_1080_0142159X_2025_2458808 crossref_primary_10_1038_s41598_025_99774_3 crossref_primary_10_1177_20552076241283256 crossref_primary_10_1016_j_heliyon_2024_e34851 crossref_primary_10_1177_20552076251328807 crossref_primary_10_1159_000541168 crossref_primary_10_1080_0886022X_2024_2402075
ContentType	Journal Article
Copyright	Copyright © 2023 by the American Society of Nephrology.
Copyright_xml	– notice: Copyright © 2023 by the American Society of Nephrology.
DBID	CGR CUY CVF ECM EIF NPM 7X8
DOI	10.2215/CJN.0000000000000330
DatabaseName	Medline MEDLINE MEDLINE (Ovid) MEDLINE MEDLINE PubMed MEDLINE - Academic
DatabaseTitle	MEDLINE Medline Complete MEDLINE with Full Text PubMed MEDLINE (Ovid) MEDLINE - Academic
DatabaseTitleList	MEDLINE - Academic MEDLINE
Database_xml	– sequence: 1 dbid: NPM name: PubMed url: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed sourceTypes: Index Database – sequence: 2 dbid: 7X8 name: MEDLINE - Academic url: https://search.proquest.com/medline sourceTypes: Aggregation Database
DeliveryMethod	no_fulltext_linktorsrc
Discipline	Medicine
EISSN	1555-905X
ExternalDocumentID	37851468
Genre	Journal Article
GroupedDBID	--- 0R~ 29B 2WC 53G 5GY 5VS 6PF AAOCO AAUIN AAWTL ABBLC ABJNI ABXYN ACLDA ACZKN ADBBV ADSXY AENEX AFEXH AFNMH AHOMT AHQVU ALMA_UNASSIGNED_HOLDINGS BAWUL BTFSW BYPQX CGR CS3 CUY CVF DIK DU5 EBS ECM EIF EJD ERAAH F5P GX1 HYE HZ~ KQ8 NPM O9- OK1 OVD P2P RHI RPM TEORI TNP TR2 W8F WOQ 7X8
ID	FETCH-LOGICAL-c4580-d02601fdd8daa8c0808b7bfd7b2b01563f03c044a86384e97dcd4828d00268802
IEDL.DBID	7X8
ISICitedReferencesCount	53
ISICitedReferencesURI	http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=001096675200001&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
ISSN	1555-905X
IngestDate	Sat Nov 01 14:57:29 EDT 2025 Thu Aug 28 04:24:46 EDT 2025
IsDoiOpenAccess	false
IsOpenAccess	true
IsPeerReviewed	true
IsScholarly	true
Issue	1
Language	English
License	Copyright © 2023 by the American Society of Nephrology.
LinkModel	DirectLink
MergedId	FETCHMERGED-LOGICAL-c4580-d02601fdd8daa8c0808b7bfd7b2b01563f03c044a86384e97dcd4828d00268802
Notes	ObjectType-Article-1 SourceType-Scholarly Journals-1 ObjectType-Feature-2 content type line 23
ORCID	0000-0002-2127-2529 0009-0006-9388-8505 0000-0002-8313-3604 0000-0001-9954-9711 0000-0002-8930-4611 0000-0003-4637-3198 0000-0002-2888-881 0000-0003-0642-9740 0000-0003-0186-9448
OpenAccessLink	https://www.ncbi.nlm.nih.gov/pmc/articles/10843340
PMID	37851468
PQID	2878710482
PQPubID	23479
ParticipantIDs	proquest_miscellaneous_2878710482 pubmed_primary_37851468
PublicationCentury	2000
PublicationDate	2024-January
PublicationDateYYYYMMDD	2024-01-01
PublicationDate_xml	– month: 01 year: 2024 text: 2024-January
PublicationDecade	2020
PublicationPlace	United States
PublicationPlace_xml	– name: United States
PublicationTitle	Clinical journal of the American Society of Nephrology
PublicationTitleAlternate	Clin J Am Soc Nephrol
PublicationYear	2024
SSID	ssj0044325
Score	2.5937726
Snippet	ChatGPT is a novel tool that allows people to engage in conversations with an advanced machine learning model. ChatGPT's performance in the US Medical...
SourceID	proquest pubmed
SourceType	Aggregation Database Index Database
StartPage	35
SubjectTerms	Educational Measurement - methods; Humans; Machine Learning; Nephrology - standards; Surveys and Questionnaires
Title	Performance of ChatGPT on Nephrology Test Questions
URI	https://www.ncbi.nlm.nih.gov/pubmed/37851468 https://www.proquest.com/docview/2878710482
Volume	19
WOSCitedRecordID	wos001096675200001&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
hasFullText
inHoldings	1
isFullTextHit
isPrint
link	http://cvtisr.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwpV1LT8MwDLaAIcSF92O8FCSu0boma9MTQhMDIVb1MKTeqrRuBJd20MHvx0k7dkJCoofe2kaOHX-2688AN9oIjFSuLU-l4BJLzXU0RB4QXEYtKWIxjmf2OYxjlaZR0iXcmu63yuWZ6A5qrAubIx8QsidsT_rm387fuZ0aZaur3QiNdegJgjJWq8P0p4ogpXBDV8lljnjkjdK2dc4nLzcYP8UtdeHyEsL7HWQ6ZzPZ_e8y92Cng5nsrtWLfVgrqwPYmnaF9EMQyaphgNWGjV_14iGZsbpicUn765LtbEbfZS4lapXzCF4m97PxI-_mJ_BCjpTH0fGFGUSFWitLKa7yMDcY5n5uO6iF8URBUtKKjFCWUYgF0sIV2sCM7No_ho2qrspTYAZJQMNAIeZaFl5BsILcGkXlkl4YhLIP10txZKSftuigq7L-bLKVQPpw0so0m7dEGpkICe_JQJ394elz2PYJT7TZjwvoGbLO8hI2i6_FW_Nx5Tae7nEy_QZPArTT
linkProvider	ProQuest
openUrl	ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Performance+of+ChatGPT+on+Nephrology+Test+Questions&rft.jtitle=Clinical+journal+of+the+American+Society+of+Nephrology&rft.au=Miao%2C+Jing&rft.au=Thongprayoon%2C+Charat&rft.au=Garcia+Valencia%2C+Oscar+A&rft.au=Krisanapan%2C+Pajaree&rft.date=2024-01-01&rft.issn=1555-905X&rft.eissn=1555-905X&rft.volume=19&rft.issue=1&rft.spage=35&rft_id=info:doi/10.2215%2FCJN.0000000000000330&rft.externalDBID=NO_FULL_TEXT
thumbnail_l	http://covers-cdn.summon.serialssolutions.com/index.aspx?isbn=/lc.gif&issn=1555-905X&client=summon
thumbnail_m	http://covers-cdn.summon.serialssolutions.com/index.aspx?isbn=/mc.gif&issn=1555-905X&client=summon
thumbnail_s	http://covers-cdn.summon.serialssolutions.com/index.aspx?isbn=/sc.gif&issn=1555-905X&client=summon