Evaluating Resident Feedback Using a Large Language Model: Are We Missing Core Competencies?
| Title: | Evaluating Resident Feedback Using a Large Language Model: Are We Missing Core Competencies? |
|---|---|
| Authors: | Ahmad SA; Johns Hopkins University School of Medicine, Baltimore, Maryland, USA., Armache M; Department of Otolaryngology-Head and Neck Surgery, Johns Hopkins University School of Medicine, Baltimore, Maryland, USA., Trakimas DR; Department of Otolaryngology-Head and Neck Surgery, Johns Hopkins University School of Medicine, Baltimore, Maryland, USA., Chen JX; Department of Otolaryngology-Head and Neck Surgery, Johns Hopkins University School of Medicine, Baltimore, Maryland, USA., Galaiya D; Department of Otolaryngology-Head and Neck Surgery, Johns Hopkins University School of Medicine, Baltimore, Maryland, USA. |
| Source: | The Laryngoscope [Laryngoscope] 2025 Nov; Vol. 135 (11), pp. 4119-4124. Date of Electronic Publication: 2025 Jun 27. |
| Publication Type: | Journal Article |
| Language: | English |
| Journal Info: | Publisher: Wiley-Blackwell Country of Publication: United States NLM ID: 8607378 Publication Model: Print-Electronic Cited Medium: Internet ISSN: 1531-4995 (Electronic) Linking ISSN: 0023852X NLM ISO Abbreviation: Laryngoscope Subsets: MEDLINE |
| Imprint Name(s): | Publication: <2009- >: Philadelphia, PA : Wiley-Blackwell Original Publication: St. Louis, Mo. : [s.n.], 1896- |
| MeSH Terms: | Internship and Residency* , Clinical Competence*/standards , Education, Medical, Graduate*/methods , Formative Feedback* , Educational Measurement*/methods, Humans ; Language ; Feedback ; Large Language Models |
| Abstract: | Objectives: Use a large language model (LLM) to examine the content and quality of narrative feedback provided to residents through: (1) an app collecting workplace-based assessments of surgical performance (SIMPL-OR), (2) Objective Structured Assessment of Technical Skills (OSATS), and (3) end-of-rotation (EOR) evaluations. Methods: Narrative feedback provided to residents at a single institution from 2017 to 2021 was examined. Sixty entries (20 of each format) were evaluated by two faculty members on whether they were encouraging, corrective, or specific, and whether they addressed the Core Competencies outlined by the Accreditation Council for Graduate Medical Education. ChatGPT4o was tested on these 60 entries before evaluating the remaining 776 entries. Results: ChatGPT evaluated entries with 90% concordance with faculty (κ = 0.94). Within the 776 feedback entries evaluated by ChatGPT, competencies addressed included: patient care (n = 491, 97% vs. 77% vs. 36% for SIMPL-OR, OSATS, EOR respectively, p < 0.001), practice-based learning (n = 175, 32% vs. 23% vs. 16%, p < 0.001), professionalism (n = 168, 1% vs. 6% vs. 40%, p < 0.001), medical knowledge (n = 95, 7% vs. 8% vs. 17%, p < 0.001), interpersonal and communication skills (n = 59, 3% vs. 3% vs. 12%, p < 0.001), and systems-based practice (n = 31, 4% vs. 2% vs. 5%, p = 0.387). Feedback was "encouraging" in 93% of both SIMPL-OR and OSATS, as compared to 84% of EOR (p < 0.001). Feedback was "corrective" in 71% of SIMPL-OR versus 44% of OSATS versus 24% of EOR (p < 0.001), and "specific" in 97% versus 53% versus 15%, respectively (p < 0.001). Conclusion: Different instruments provided feedback of differing content and quality and a multimodal feedback approach is important. Level of Evidence: N/A. (© 2025 The American Laryngological, Rhinological and Otological Society, Inc.) |
| References: | Otolaryngol Head Neck Surg. 2019 Dec;161(6):939-945. (PMID: 31405355) Otolaryngol Head Neck Surg. 2022 Aug;167(2):268-273. (PMID: 34609936) Laryngoscope Investig Otolaryngol. 2019 Nov 11;4(6):578-586. (PMID: 31890874) Acad Med. 2019 Dec;94(12):1961-1969. (PMID: 31169541) Med Educ. 2017 Apr;51(4):401-410. (PMID: 28093833) Laryngoscope Investig Otolaryngol. 2022 Feb 01;7(2):404-408. (PMID: 35434323) Br J Surg. 1997 Feb;84(2):273-8. (PMID: 9052454) Fam Med. 2023 Feb;55(2):103-106. (PMID: 36689448) J Surg Res. 2015 Sep;198(1):61-5. (PMID: 26070495) Ann Surg. 2022 Mar 1;275(3):617-620. (PMID: 32511125) Laryngoscope Investig Otolaryngol. 2023 Jun 15;8(4):827-831. (PMID: 37621294) JAMA Netw Open. 2023 Mar 1;6(3):e231204. (PMID: 36862411) West J Emerg Med. 2023 May 05;24(3):479-494. (PMID: 37278777) |
| Grant Information: | R25 DC021243 United States DC NIDCD NIH HHS |
| Contributed Indexing: | Keywords: medical education; natural language processing; resident education |
| Entry Date(s): | Date Created: 20250627 Date Completed: 20251125 Latest Revision: 20251128 |
| Update Code: | 20251128 |
| PubMed Central ID: | PMC12221221 |
| DOI: | 10.1002/lary.32368 |
| PMID: | 40574724 |
| Database: | MEDLINE |
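The abstract reports 90% raw concordance between ChatGPT and the two faculty raters, alongside Cohen's kappa (κ = 0.94), which corrects raw agreement for chance. A minimal sketch of how such an agreement statistic is computed (an illustration only, not the authors' analysis code; the label values are hypothetical):

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters' categorical labels.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement
    and p_e is the agreement expected by chance from each rater's
    marginal label frequencies.
    """
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    # Observed agreement: fraction of items labeled identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of marginal proportions, summed over labels.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(counts_a[k] * counts_b[k] for k in set(counts_a) | set(counts_b)) / (n * n)
    return (p_o - p_e) / (1 - p_e)


# Hypothetical binary labels (e.g., "corrective" yes/no) from two raters:
llm_labels = [1, 1, 0, 1, 0, 0, 1, 0]
faculty_labels = [1, 1, 0, 1, 0, 1, 1, 0]
kappa = cohen_kappa(llm_labels, faculty_labels)
```

Kappa near 1 indicates near-perfect agreement beyond chance; values above 0.8 are conventionally read as "almost perfect", consistent with the κ = 0.94 the study reports.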