Criteria2Query 3.0: Leveraging generative large language models for clinical trial eligibility query generation.

Saved in:
Detailed bibliography
Title: Criteria2Query 3.0: Leveraging generative large language models for clinical trial eligibility query generation.
Authors: Park J; Department of Biomedical Informatics, Columbia University, New York, United States., Fang Y; Department of Biomedical Informatics, Columbia University, New York, United States., Ta C; Department of Biomedical Informatics, Columbia University, New York, United States., Zhang G; Department of Biomedical Informatics, Columbia University, New York, United States., Idnay B; Department of Biomedical Informatics, Columbia University, New York, United States., Chen F; Department of Biomedical Informatics, Columbia University, New York, United States., Feng D; Department of Biomedical Informatics, Columbia University, New York, United States., Shyu R; Department of Biomedical Informatics, Columbia University, New York, United States., Gordon ER; Columbia University Vagelos College of Physicians and Surgeons, New York, United States., Spotnitz M; Department of Biomedical Informatics, Columbia University, New York, United States., Weng C; Department of Biomedical Informatics, Columbia University, New York, United States. Electronic address: cw2384@cumc.columbia.edu.
Source: Journal of biomedical informatics [J Biomed Inform] 2024 Jun; Vol. 154, pp. 104649. Date of Electronic Publication: 2024 Apr 30.
Publication Type: Journal Article; Research Support, N.I.H., Extramural
Language: English
Journal Information: Publisher: Elsevier Country of Publication: United States NLM ID: 100970413 Publication Model: Print-Electronic Cited Medium: Internet ISSN: 1532-0480 (Electronic) Linking ISSN: 1532-0464 NLM ISO Abbreviation: J Biomed Inform Subsets: MEDLINE
Imprint Name(s): Publication: Orlando : Elsevier
Original Publication: San Diego, CA : Academic Press, c2001-
MeSH Terms: Clinical Trials as Topic*; Humans; Natural Language Processing; Software; Patient Selection
Abstract: Objective: Automated identification of eligible patients is a bottleneck of clinical research. We propose Criteria2Query (C2Q) 3.0, a system that leverages GPT-4 for the semi-automatic transformation of clinical trial eligibility criteria text into executable clinical database queries.
Materials and Methods: C2Q 3.0 integrated three GPT-4 prompts for concept extraction, SQL query generation, and reasoning. Each prompt was designed and evaluated separately. The concept extraction prompt was benchmarked against manual annotations from 20 clinical trials by two evaluators, who later also measured SQL generation accuracy and identified errors in GPT-generated SQL queries from 5 clinical trials. The reasoning prompt was assessed by three evaluators on four metrics: readability, correctness, coherence, and usefulness, using corrected SQL queries and an open-ended feedback questionnaire.
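The three-prompt design described above lends itself to a simple chained pipeline. The following is a minimal Python sketch using the OpenAI SDK; the prompt wording, the OMOP CDM target schema, the sample criteria text, and the helper name ask_gpt4 are illustrative assumptions, not the authors' actual prompts or code.

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask_gpt4(system_prompt, user_text):
    # Send one system/user prompt pair to GPT-4 and return the text reply.
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_text},
        ],
    )
    return response.choices[0].message.content

criteria = "Inclusion: adults aged 18-65 with type 2 diabetes. Exclusion: pregnancy."

# Prompt 1: extract clinical concepts from the free-text criteria.
concepts = ask_gpt4(
    "Extract the clinical concepts (condition, drug, measurement, demographic) "
    "from the eligibility criteria below and return them as a JSON list.",
    criteria,
)

# Prompt 2: generate an executable SQL query from the extracted concepts.
sql = ask_gpt4(
    "Write an executable SQL query over an OMOP CDM database that selects "
    "patients matching these extracted concepts.",
    concepts,
)

# Prompt 3: reasoning -- explain the generated SQL in plain language.
explanation = ask_gpt4(
    "Explain, step by step and in plain language, what this SQL query does.",
    sql,
)

print(sql)
print(explanation)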
Results: Out of 518 concepts from 20 clinical trials, GPT-4 achieved an F1-score of 0.891 in concept extraction. For SQL generation, 29 errors spanning seven categories were detected, with logic errors being the most common (n = 10; 34.48%). Reasoning evaluations yielded high coherence (mean score 4.70) but relatively lower readability (mean 3.95). Mean scores for correctness and usefulness were 3.97 and 4.37, respectively.
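For context on the F1-score reported here: F1 is the harmonic mean of precision and recall. The short Python sketch below shows one way an F1 of 0.891 over 518 gold-standard concepts could arise; the true-positive, false-positive, and false-negative counts are invented for illustration and are not reported in the abstract.

def f1_score(tp, fp, fn):
    # F1 = 2PR / (P + R), the harmonic mean of precision P and recall R.
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical counts: 462 + 56 = 518 gold-standard concepts in total.
print(round(f1_score(tp=462, fp=57, fn=56), 3))  # -> 0.891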
Conclusion: GPT-4 significantly improves the accuracy of extracting clinical trial eligibility criteria concepts in C2Q 3.0. Continued research is warranted to ensure the reliability of large language models.
(Copyright © 2024 Elsevier Inc. All rights reserved.)
Competing Interests: Declaration of competing interest The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Grant Information: R01 LM009886 United States LM NLM NIH HHS; R01 LM014344 United States LM NLM NIH HHS; UL1 TR001873 United States TR NCATS NIH HHS
Contributed Indexing: Keywords: Artificial intelligence; ChatGPT; Eligibility prescreening; Human–computer collaboration; Large language models
Entry Date(s): Date Created: 20240502 Date Completed: 20240527 Latest Revision: 20250602
Update Code: 20250602
PubMed Central ID: PMC11129920
DOI: 10.1016/j.jbi.2024.104649
PMID: 38697494
Database: MEDLINE