Performance of ChatGPT on Nephrology Test Questions

Bibliographic details
Published in: Clinical Journal of the American Society of Nephrology, Volume 19, Issue 1, p. 35
Main authors: Miao, Jing; Thongprayoon, Charat; Garcia Valencia, Oscar A; Krisanapan, Pajaree; Sheikh, Mohammad S; Davis, Paul W; Mekraksakit, Poemlarp; Suarez, Maria Gonzalez; Craici, Iasmina M; Cheungpasitporn, Wisit
Format: Journal Article
Language: English
Published: United States, January 2024
ISSN: 1555-905X
Abstract
Background: ChatGPT is a novel tool that allows people to engage in conversations with an advanced machine learning model. ChatGPT's performance on the US Medical Licensing Examination is comparable with that of a successful candidate. However, its performance in nephrology remains undetermined. This study assessed ChatGPT's capabilities in answering nephrology test questions.
Methods: Questions were sourced from the Nephrology Self-Assessment Program and the Kidney Self-Assessment Program, both of which consist of multiple-choice, single-answer questions. Questions containing visual elements were excluded. Each question bank was run twice using GPT-3.5 and GPT-4. Performance was assessed with two measures: the total accuracy rate, defined as the percentage of correct answers obtained by ChatGPT in either the first or second run, and the total concordance, defined as the percentage of identical answers provided by ChatGPT in both runs, regardless of their correctness.
Results: A total of 975 questions were assessed: 508 from the Nephrology Self-Assessment Program and 467 from the Kidney Self-Assessment Program. GPT-3.5 achieved a total accuracy rate of 51%. Accuracy was higher on Nephrology Self-Assessment Program questions than on Kidney Self-Assessment Program questions (58% versus 44%; P < 0.001). The total concordance rate across all questions was 78%, and correct answers showed a higher concordance rate than incorrect answers (84% versus 73%; P < 0.001). Across nephrology subfields, total accuracy rates were relatively lower in electrolyte and acid-base disorders, glomerular disease, and kidney-related bone and stone disorders. GPT-4 achieved a total accuracy rate of 74%, higher than GPT-3.5 (P < 0.001) but still below the passing threshold and the average score of nephrology examinees (77%).
Conclusions: ChatGPT exhibited limitations in accuracy and repeatability when answering nephrology-related questions, and its performance varied across subfields.
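The two performance measures defined in the abstract above can be made concrete with a short calculation. The Python sketch below is an illustration under stated assumptions, not the authors' analysis code: the `QuestionResult` record and the function names are hypothetical. Because "correct answers obtained in either the first or second run" can be read either as pooling the answers from both runs or as counting a question correct if at least one run was right, both readings are shown. The toy answers are invented and are not study data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuestionResult:
    """Hypothetical record for one multiple-choice, single-answer question."""
    key: str   # correct option, e.g. "C"
    run1: str  # option chosen by the model in the first run
    run2: str  # option chosen by the model in the second run


def _require_results(results):
    # Guard against division by zero on an empty question set.
    if not results:
        raise ValueError("need at least one question")


def pooled_accuracy(results):
    """Assumed reading 1: percent of all answers, pooled over both runs, matching the key."""
    _require_results(results)
    correct = sum((r.run1 == r.key) + (r.run2 == r.key) for r in results)
    return 100.0 * correct / (2 * len(results))


def any_run_accuracy(results):
    """Assumed reading 2: percent of questions answered correctly in at least one run."""
    _require_results(results)
    hits = sum(r.run1 == r.key or r.run2 == r.key for r in results)
    return 100.0 * hits / len(results)


def concordance(results):
    """Percent of questions given the same answer in both runs, right or wrong."""
    _require_results(results)
    same = sum(r.run1 == r.run2 for r in results)
    return 100.0 * same / len(results)


# Invented toy data, not taken from the study.
toy = [
    QuestionResult(key="A", run1="A", run2="A"),  # correct twice, concordant
    QuestionResult(key="B", run1="B", run2="C"),  # correct once, discordant
    QuestionResult(key="C", run1="D", run2="D"),  # wrong twice, concordant
    QuestionResult(key="D", run1="A", run2="B"),  # wrong twice, discordant
]

print(pooled_accuracy(toy))   # 37.5 (3 of 8 answers correct)
print(any_run_accuracy(toy))  # 50.0 (2 of 4 questions correct at least once)
print(concordance(toy))       # 50.0 (2 of 4 questions answered identically)
```

Concordance never consults the answer key, which matches the "regardless of their correctness" wording. The two accuracy readings can diverge noticeably, as the toy data show, so the choice matters when relating such a calculation to the reported 51% and 74% figures.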
Authors (in order, with ORCID iDs where recorded):
1. Miao, Jing (ORCID 0000-0003-0642-9740); Division of Nephrology and Hypertension, Department of Medicine, Mayo Clinic, Rochester, Minnesota
2. Thongprayoon, Charat (ORCID 0000-0002-8313-3604)
3. Garcia Valencia, Oscar A (ORCID 0000-0003-0186-9448)
4. Krisanapan, Pajaree (ORCID 0000-0002-2888-881)
5. Sheikh, Mohammad S (ORCID 0009-0006-9388-8505)
6. Davis, Paul W (ORCID 0000-0003-4637-3198)
7. Mekraksakit, Poemlarp (ORCID 0000-0002-2127-2529)
8. Suarez, Maria Gonzalez (ORCID 0000-0002-8930-4611)
9. Craici, Iasmina M
10. Cheungpasitporn, Wisit (ORCID 0000-0001-9954-9711)
ContentType Journal Article
Copyright Copyright © 2023 by the American Society of Nephrology.
DOI 10.2215/CJN.0000000000000330
DatabaseName MEDLINE; MEDLINE (Ovid); MEDLINE - Academic; PubMed
Discipline Medicine
EISSN 1555-905X
ISICitedReferencesCount 53
ISSN 1555-905X
IsDoiOpenAccess false
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 1
Language English
OpenAccessLink https://www.ncbi.nlm.nih.gov/pmc/articles/10843340
PMID 37851468
PublicationDate 2024-January
PublicationPlace United States
PublicationTitle Clinical journal of the American Society of Nephrology
PublicationTitleAlternate Clin J Am Soc Nephrol
PublicationYear 2024
StartPage 35
SubjectTerms Educational Measurement - methods
Humans
Machine Learning
Nephrology - standards
Surveys and Questionnaires
URI https://www.ncbi.nlm.nih.gov/pubmed/37851468
https://www.proquest.com/docview/2878710482
Volume 19