Enabling Just-in-Time Clinical Oncology Analysis With Large Language Models: Feasibility and Validation Study Using Unstructured Synthetic Data.

Bibliographic Details
Title: Enabling Just-in-Time Clinical Oncology Analysis With Large Language Models: Feasibility and Validation Study Using Unstructured Synthetic Data.
Authors: May P; Department of Internal Medicine III, School of Medicine and Health, TUM University Hospital, Technical University of Munich, Ismaninger Str. 22, Munich, Germany, 49 89-4140-8753., Greß J; Department of Internal Medicine III, School of Medicine and Health, TUM University Hospital, Technical University of Munich, Ismaninger Str. 22, Munich, Germany, 49 89-4140-8753.; MPiriQ Science Technologies GmbH, Munich, Germany., Seidel C; Department of Oncology, Hematology and Bone Marrow Transplantation with Division of Pneumology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany., Sommer S; MVZ Elisenhof, Munich, Germany., Schuler MK; MPiriQ Science Technologies GmbH, Munich, Germany.; Onkologischer Schwerpunkt am Oskar-Helene Heim, Berlin, Germany., Nokodian S; Department of Internal Medicine III, School of Medicine and Health, TUM University Hospital, Technical University of Munich, Ismaninger Str. 22, Munich, Germany, 49 89-4140-8753., Schröder F; MPiriQ Science Technologies GmbH, Munich, Germany., Jung J; Department of Internal Medicine III, School of Medicine and Health, TUM University Hospital, Technical University of Munich, Ismaninger Str. 22, Munich, Germany, 49 89-4140-8753.; Department of Hematology and Medical Oncology, University Medical Center Göttingen, Göttingen, Germany.
Source: JMIR medical informatics [JMIR Med Inform] 2025 Dec 01; Vol. 13, pp. e78332. Date of Electronic Publication: 2025 Dec 01.
Publication Type: Journal Article; Validation Study
Language: English
Journal Info: Publisher: JMIR Publications Country of Publication: Canada NLM ID: 101645109 Publication Model: Electronic Cited Medium: Internet ISSN: 2291-9694 (Electronic) Linking ISSN: 22919694 NLM ISO Abbreviation: JMIR Med Inform Subsets: MEDLINE
Imprint Name(s): Original Publication: Toronto : JMIR Publications, [2013]-
MeSH Terms: Natural Language Processing* , Medical Oncology*/methods, Humans ; Feasibility Studies ; Lung Neoplasms/mortality ; Carcinoma, Non-Small-Cell Lung ; Large Language Models
Abstract: Background: Traditional cancer registries, limited by labor-intensive manual data abstraction and rigid, predefined schemas, often hinder timely and comprehensive oncology research. While large language models (LLMs) have shown promise in automating data extraction, their potential to perform direct, just-in-time (JIT) analysis on unstructured clinical narratives, potentially bypassing intermediate structured databases for many analytical tasks, remains largely unexplored.
Objective: This study aimed to evaluate whether a state-of-the-art LLM (Gemini 2.5 Pro) can enable a JIT clinical oncology analysis paradigm by assessing its ability to (1) perform high-fidelity multiparameter data extraction, (2) answer complex clinical queries directly from raw text, (3) automate multistep survival analyses including executable code generation, and (4) generate novel, clinically plausible hypotheses from free-text documentation.
Methods: A synthetic dataset of 240 unstructured clinical letters from patients with stage IV non-small cell lung cancer (NSCLC), embedding 14 predefined variables, was used. Gemini 2.5 Pro was evaluated on four core JIT capabilities. Performance was measured by using the following metrics: extraction accuracy (compared to human extraction of n=40 letters and across the full n=240 dataset); numerical deviation for direct question answering (n=40 to 240 letters, 5 questions); log-rank P value and Harrell concordance index for LLM-generated versus ground-truth Kaplan-Meier survival analyses (n=160 letters, overall survival and progression-free survival); and correct justification, novelty, and a qualitative evaluation of LLM-generated hypotheses (n=80 and n=160 letters).
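The two simplest metrics named above, extraction accuracy and relative deviation, can be illustrated with a minimal Python sketch. The abstract does not give the study's exact formulas, so the definitions here (cell-wise exact match, and absolute deviation as a percentage of the reference value) are assumptions, and the letter IDs, variable names, and values are hypothetical.

```python
def extraction_accuracy(extracted, truth):
    """Share of (letter, variable) cells where the extracted value equals the truth.

    Assumed definition: exact match per cell, averaged over all cells.
    """
    total = 0
    correct = 0
    for letter_id, true_vars in truth.items():
        got = extracted.get(letter_id, {})
        for name, value in true_vars.items():
            total += 1
            if got.get(name) == value:
                correct += 1
    return correct / total if total else 0.0


def relative_deviation(answer, reference):
    """Absolute deviation of an answer from the reference, in percent (assumed definition)."""
    if reference == 0:
        raise ValueError("reference must be non-zero")
    return abs(answer - reference) / abs(reference) * 100.0


# Hypothetical ground truth and LLM output for two letters, two variables each.
truth = {
    "letter_01": {"ecog": 1, "pdl1_percent": 60},
    "letter_02": {"ecog": 0, "pdl1_percent": 5},
}
extracted = {
    "letter_01": {"ecog": 1, "pdl1_percent": 60},
    "letter_02": {"ecog": 0, "pdl1_percent": 50},  # one wrong cell
}

accuracy = extraction_accuracy(extracted, truth)
deviation = relative_deviation(13.7, 13.5)  # e.g. a hypothetical median in months
```

In this toy case three of four cells match, and an answer of 13.7 against a reference of 13.5 deviates by roughly 1.5%.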
Results: For multiparameter extraction from 40 letters, the LLM achieved >99% average accuracy, comparable to human extraction, but in significantly less time (LLM: 3.7 min vs human: 133.8 min). Across the full 240-letter dataset, LLM multiparameter extraction maintained >98% accuracy for most variables. The LLM answered multiconditional clinical queries directly from raw text with a relative deviation rarely exceeding 1.5%, even with up to 240 letters. Crucially, it autonomously performed end-to-end survival analysis, generating R code directly from text; this code produced Kaplan-Meier curves statistically indistinguishable from ground truth. Consistent performance was demonstrated on a small validation cohort of 80 synthetic acute myeloid leukemia reports. Stress testing on data with simulated imperfections revealed a key role for a human in the loop in resolving AI-flagged ambiguities. Furthermore, the LLM generated several correctly justified, biologically plausible, and potentially novel hypotheses from datasets of up to 80 letters.
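For readers unfamiliar with how two Kaplan-Meier curves are compared, the following is a minimal Python sketch of the product-limit estimator and a two-group log-rank test. It is not the R code generated in the study, which the abstract does not reproduce; the function names and the toy follow-up times are hypothetical, and the P value uses the standard chi-square approximation with one degree of freedom.

```python
import math


def kaplan_meier(times, events):
    """Product-limit estimate: list of (time, survival) at each distinct event time.

    events[i] is 1 if the event was observed at times[i], 0 if censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


def logrank_p(times_a, events_a, times_b, events_b):
    """Two-group log-rank test P value (chi-square approximation, 1 df)."""
    all_times = list(times_a) + list(times_b)
    all_events = list(events_a) + list(events_b)
    event_times = sorted({t for t, e in zip(all_times, all_events) if e})
    obs_minus_exp = 0.0
    variance = 0.0
    for t in event_times:
        n_a = sum(1 for x in times_a if x >= t)  # at risk in group A
        n_b = sum(1 for x in times_b if x >= t)  # at risk in group B
        d_a = sum(1 for x, e in zip(times_a, events_a) if x == t and e)
        d_b = sum(1 for x, e in zip(times_b, events_b) if x == t and e)
        n, d = n_a + n_b, d_a + d_b
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        return 1.0
    chi2 = obs_minus_exp ** 2 / variance
    # Survival function of a chi-square variable with 1 df.
    return math.erfc(math.sqrt(chi2 / 2))


# Hypothetical toy data: months of follow-up, 1 = event, 0 = censored.
curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
p_same = logrank_p([1, 2, 3], [1, 1, 1], [1, 2, 3], [1, 1, 1])
p_diff = logrank_p([1, 2, 3, 4, 5], [1] * 5, [10, 11, 12, 13, 14], [1] * 5)
```

In this framing, a large log-rank P value between the curve built from LLM-extracted data and the curve built from ground truth means the test found no detectable difference between them; it does not prove the curves are identical.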
Conclusions: This feasibility study demonstrated that a frontier LLM (Gemini 2.5 Pro) can successfully perform high-fidelity data extraction, multiconditional querying, and automated survival analysis directly from unstructured text. These results provide a foundational proof of concept for the JIT clinical analysis approach. However, these findings are confined to synthetic patients, and rigorous validation on real-world clinical data is an essential next step before clinical implementation can be considered.
(© Peter May, Julian Greß, Christoph Seidel, Sebastian Sommer, Markus K Schuler, Sina Nokodian, Florian Schröder, Johannes Jung. Originally published in JMIR Medical Informatics (https://medinform.jmir.org).)
Contributed Indexing: Keywords: artificial intelligence; cancer registries; clinical oncology; just-in-time analysis; large language models; natural language processing; real-world evidence; survival analysis; unstructured data
Entry Date(s): Date Created: 20251202 Date Completed: 20251202 Latest Revision: 20251205
Update Code: 20251205
PubMed Central ID: PMC12670046
DOI: 10.2196/78332
PMID: 41328496
Database: MEDLINE