Evaluating the accuracy and reliability of AI chatbots in disseminating the content of current resuscitation guidelines: a comparative analysis between the ERC 2021 guidelines and both ChatGPTs 3.5 and 4
| Title: | Evaluating the accuracy and reliability of AI chatbots in disseminating the content of current resuscitation guidelines: a comparative analysis between the ERC 2021 guidelines and both ChatGPTs 3.5 and 4 |
|---|---|
| Authors: | Stefanie Beck, Manuel Kuhner, Markus Haar, Anne Daubmann, Martin Semmann, Stefan Kluge |
| Source: | Scandinavian Journal of Trauma, Resuscitation and Emergency Medicine, Vol. 32, Iss. 1, Article 95, pp. 1-9 (2024) |
| Publisher information: | Springer Science and Business Media LLC, 2024. |
| Year of publication: | 2024 |
| Subjects: | RC86-88.9 (Medical emergencies. Critical care. Intensive care. First aid), Artificial Intelligence [MeSH], Information Dissemination/methods [MeSH], Resuscitation/standards [MeSH], Practice Guidelines as Topic [MeSH], Humans [MeSH], Reproducibility of Results [MeSH], Prospective Studies [MeSH], Original Research |
| Description: | Aim of the study: Artificial intelligence (AI) chatbots are established as tools for answering medical questions worldwide. Healthcare trainees are increasingly using this cutting-edge technology, although its reliability and accuracy in the context of healthcare remain uncertain. This study evaluated the suitability of ChatGPT versions 3.5 and 4 for healthcare professionals seeking up-to-date evidence and recommendations for resuscitation by comparing the key messages of the resuscitation guidelines, which methodically set the gold standard of current evidence and recommendations, with the statements of the AI chatbots on this topic. Methods: This prospective comparative content analysis was conducted between the 2021 European Resuscitation Council (ERC) guidelines and the responses of two freely available ChatGPT versions (ChatGPT-3.5 and the Bing version of ChatGPT-4) to questions about the key messages of clinically relevant ERC guideline chapters for adults. (1) The content analysis was performed bidirectionally by independent raters. The completeness and currency of the AI output were assessed by comparing the key messages with the AI-generated statements. (2) The conformity of the AI output was evaluated by comparing the statements of the two ChatGPT versions with the content of the ERC guidelines. Results: In response to inquiries about the five chapters, ChatGPT-3.5 generated a total of 60 statements, whereas ChatGPT-4 produced 32 statements. Of the 172 key messages in the ERC guideline chapters, ChatGPT-3.5 did not address 123 and ChatGPT-4 did not address 132. A total of 77% of the ChatGPT-3.5 statements and 84% of the ChatGPT-4 statements were fully in line with the ERC guidelines. The main reason for nonconformity was superficial and incorrect AI statements. Interrater reliability between the two raters, measured by Cohen’s kappa, was higher for ChatGPT-4 (0.56 for the completeness analysis and 0.76 for the conformity analysis) than for ChatGPT-3.5 (0.48 and 0.36, respectively). Conclusion: We advise healthcare professionals not to rely solely on the tested AI-based chatbots to keep up to date with the latest evidence, as the relevant texts for the task were not part of the training data of the underlying LLMs, and the AI's lack of conceptual understanding carries a high risk of spreading misconceptions. Original publications should always be consulted for a comprehensive understanding. |
| Document type: | Article; Other literature type |
| Language: | English |
| ISSN: | 1757-7241 |
| DOI: | 10.1186/s13049-024-01266-2 |
| Access URL: | https://pubmed.ncbi.nlm.nih.gov/39327587 https://doaj.org/article/21b28b724a11411e834bf1429d97adc2 https://repository.publisso.de/resource/frl:6497237 |
| Rights: | CC BY 4.0 (http://creativecommons.org/licenses/by/4.0/). Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
| Accession number: | edsair.doi.dedup.....d69a704ad44a1ecace7f8dc60fa06002 |
| Database: | OpenAIRE |
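The abstract reports interrater reliability as Cohen's kappa (for example, 0.76 for the ChatGPT-4 conformity analysis). As a minimal sketch of how that statistic is computed, and not code from the study itself, the following Python snippet derives kappa from observed and chance-expected agreement between two raters; the rating labels and data are hypothetical.

```python
# Minimal sketch (not from the article): Cohen's kappa for two raters who
# independently label the same items, e.g. "conform" vs. "nonconform".
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Return Cohen's kappa for two equally long lists of categorical labels."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    # Observed agreement: fraction of items both raters labelled identically.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance-expected agreement from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_expected = sum(freq_a[c] * freq_b[c] for c in freq_a.keys() | freq_b.keys()) / (n * n)
    return (p_observed - p_expected) / (1 - p_expected)

# Hypothetical conformity ratings of ten AI statements by two independent raters.
rater_1 = ["conform", "conform", "nonconform", "conform", "conform",
           "nonconform", "conform", "conform", "conform", "nonconform"]
rater_2 = ["conform", "conform", "nonconform", "conform", "nonconform",
           "nonconform", "conform", "conform", "conform", "conform"]
print(f"kappa = {cohens_kappa(rater_1, rater_2):.2f}")  # ~0.52 for this toy data
```

For the toy data above, observed agreement is 0.80 and chance-expected agreement is 0.58, so kappa is about 0.52, commonly interpreted as moderate agreement.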