Evaluation of AI models for radiology exam preparation: DeepSeek vs. ChatGPT-3.5.
| Title: | Evaluation of AI models for radiology exam preparation: DeepSeek vs. ChatGPT-3.5. |
|---|---|
| Authors: | Hu N; Department of Radiology, The Affiliated Hospital of Guizhou Medical University, Guiyang, Guizhou Province, People's Republic of China., Luo Y; Department of Anesthesiology, Guizhou Provincial People's Hospital, Guiyang, Guizhou Province, People's Republic of China., Lei P; Department of Radiology, The Affiliated Hospital of Guizhou Medical University, Guiyang, Guizhou Province, People's Republic of China. |
| Source: | Medical education online [Med Educ Online] 2025 Dec 31; Vol. 30 (1), pp. 2589679. Date of Electronic Publication: 2025 Nov 28. |
| Publication Type: | Journal Article; Comparative Study |
| Language: | English |
| Journal Information: | Publisher: Taylor & Francis Country of Publication: United States NLM ID: 9806550 Publication Model: Print-Electronic Cited Medium: Internet ISSN: 1087-2981 (Electronic) Linking ISSN: 10872981 NLM ISO Abbreviation: Med Educ Online Subsets: MEDLINE |
| Imprint Name(s): | Publication: 2016- : Philadelphia, PA : Taylor & Francis Original Publication: [E. Lansing, MI] : Medical Education Online, [1996- |
| MeSH Terms: | Educational Measurement*/methods, Radiology*/education, Artificial Intelligence*, Humans; Generative Artificial Intelligence |
| Abstract: | The rapid advancement of artificial intelligence (AI) chatbots has generated significant interest regarding their potential applications within medical education. This study sought to assess the performance of the open-source large language model DeepSeek-V3 in answering radiology board-style questions and to compare its accuracy with that of ChatGPT-3.5. A total of 161 questions (comprising 207 items) were randomly selected from the Exercise Book for the National Senior Health Professional Qualification Examination: Radiology. The question set included single-choice, multiple-choice, shared-stem, and case analysis questions. Both DeepSeek-V3 and ChatGPT-3.5 were evaluated using the same question set over a seven-day testing period. Response accuracy was systematically assessed, and statistical analyses were performed using Pearson's chi-square test and Fisher's exact test. DeepSeek-V3 achieved an overall accuracy of 72%, which was significantly higher than the 55.6% accuracy achieved by ChatGPT-3.5 (P < 0.001). Performance analysis by question type revealed DeepSeek's superior accuracy in single-choice questions (87.1%), though with comparatively lower performance in multiple-choice (55.7%) and case analysis questions (68.0%). Across clinical subspecialties, DeepSeek consistently outperformed ChatGPT, particularly in the peripheral nervous system (P = 0.003), respiratory system (P = 0.008), circulatory system (P = 0.012), and musculoskeletal system (P = 0.021) domains. In conclusion, DeepSeek demonstrates considerable potential as an educational tool in radiology, particularly for knowledge recall and foundational learning applications. However, its relatively weaker performance on higher-order cognitive tasks and complex question formats suggests the need for further model refinement. Future research should investigate DeepSeek's capability in processing image-based questions and perform comparative analyses with more advanced models (e.g., GPT-5) to better evaluate its potential for medical education. (A hedged worked example of the reported chi-square comparison appears after this record.) |
| Contributed Indexing: | Keywords: Artificial intelligence; ChatGPT; DeepSeek; medical imaging education; radiology examination |
| Entry Date(s): | Date Created: 20251128 Date Completed: 20251128 Latest Revision: 20251203 |
| Update Code: | 20251203 |
| PubMed Central ID: | PMC12667340 |
| DOI: | 10.1080/10872981.2025.2589679 |
| PMID: | 41311245 |
| Database: | MEDLINE |
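The abstract reports that overall accuracy (72% for DeepSeek-V3 vs. 55.6% for ChatGPT-3.5 on 207 items) was compared with Pearson's chi-square and Fisher's exact tests. The sketch below is only an illustration of that kind of comparison: the correct/incorrect counts are not taken from the paper but reconstructed from the reported percentages, so the exact statistics are assumptions, not the study's published values.

```python
# Minimal sketch of a 2x2 accuracy comparison (Pearson's chi-square and
# Fisher's exact test), as described in the abstract.
# NOTE: counts are reconstructed from the reported percentages (72% and
# 55.6% of 207 items) for illustration only; they are not the paper's data.
from scipy.stats import chi2_contingency, fisher_exact

total_items = 207
deepseek_correct = round(0.720 * total_items)   # ~149 (assumed)
chatgpt_correct = round(0.556 * total_items)    # ~115 (assumed)

# 2x2 contingency table: rows = model, columns = [correct, incorrect]
table = [
    [deepseek_correct, total_items - deepseek_correct],
    [chatgpt_correct, total_items - chatgpt_correct],
]

chi2, p_chi2, dof, _expected = chi2_contingency(table)
odds_ratio, p_fisher = fisher_exact(table)

print(f"Chi-square = {chi2:.2f}, dof = {dof}, p = {p_chi2:.4f}")
print(f"Fisher's exact test: OR = {odds_ratio:.2f}, p = {p_fisher:.4f}")
```

With counts of this size the two tests give similar p-values well below 0.05, consistent with the significant difference (P < 0.001) reported in the abstract, though the exact figures depend on the true item-level counts.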