Glaucoma Detection and Structured OCT Report Generation via a Fine-tuned Multimodal Large Language Model


Bibliographic details
Published in: ArXiv.org
Main authors: Jalili, Jalil; Gavhane, Yashraj; Walker, Evan; Heinke, Anna; Bowd, Christopher; Belghith, Akram; Fazio, Massimo A; Girkin, Christopher A; De Moraes, C Gustavo; Liebmann, Jeffrey M; Baxter, Sally L; Weinreb, Robert N; Zangwill, Linda M; Christopher, Mark
Format: Journal Article
Language: English
Published: United States, 01.10.2025
ISSN: 2331-8422
Description
Summary: To develop an explainable multimodal large language model (MM-LLM) that (1) screens optic nerve head (ONH) OCT circle scans for quality and (2) generates structured clinical reports that include glaucoma diagnosis and sector-wise retinal nerve fiber layer (RNFL) thinning assessments.
Retrospective cohort study using longitudinal data from the Diagnostic Innovations in Glaucoma Study (DIGS) and the African Descent and Glaucoma Evaluation Study (ADAGES).
43,849 Spectralis ONH OCT circle scans from 1,310 subjects, including 1,331 glaucomatous and 867 healthy eyes.
An MM-LLM (Llama 3.2 Vision-Instruct model) was fine-tuned to generate clinical descriptions of OCT imaging data. Training data included paired OCT images and automatically generated, structured clinical reports that described global and sectoral RNFL thinning. Poor-quality scans were labeled as unusable and paired with a fixed refusal statement.
The model was evaluated on a held-out test set for three tasks: quality assessment, glaucoma detection, and RNFL thinning classification across seven anatomical sectors. Evaluation metrics included accuracy, sensitivity, specificity, precision, and F1-score. Model description quality was also evaluated using standard text evaluation metrics (BLEU, ROUGE, METEOR, BERTScore).
The model achieved 0.90 accuracy and 0.98 specificity for quality triage. For glaucoma detection, accuracy was 0.86 (sensitivity 0.91, specificity 0.73, F1-score 0.91). RNFL thinning prediction accuracy ranged from 0.83 to 0.94, with the highest performance in the global and temporal sectors. Text generation scores (mean ± SD) showed strong alignment with reference reports (BLEU: 0.82 ± 0.19; ROUGE-1: 0.94 ± 0.08; ROUGE-2: 0.87 ± 0.17; ROUGE-L: 0.92 ± 0.11; BERTScore-F1: 0.99 ± 0.02). Stratified analysis revealed better RNFL thinning detection in moderate-to-advanced glaucoma cases, especially in temporal sectors, while performance in nasal regions was better for mild cases.
The fine-tuned MM-LLM generated accurate clinical descriptions based on OCT imaging. The model achieved high accuracy in identifying image quality issues and detecting glaucoma. The model also provided sectoral descriptions of RNFL thinning to help support clinical OCT evaluation. This approach shows potential as a scalable tool for clinical decision support, but further validation across additional datasets is needed.
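The classification metrics named in the summary (accuracy, sensitivity, specificity, precision, F1-score) all follow from the four cells of a binary confusion matrix. The sketch below shows one way they could be computed for a task such as glaucoma detection. It is an illustration only: the function name `binary_metrics`, the `BinaryMetrics` container, and the toy labels are assumptions introduced here, not the authors' evaluation code or study data.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    accuracy: float
    sensitivity: float
    specificity: float
    precision: float
    f1: float


def binary_metrics(y_true, y_pred):
    """Compute metrics from paired 0/1 labels (1 = positive, e.g. glaucoma)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    def ratio(num, den):
        # Return NaN instead of raising when a class is absent.
        return num / den if den else float("nan")

    return BinaryMetrics(
        accuracy=ratio(tp + tn, tp + tn + fp + fn),
        sensitivity=ratio(tp, tp + fn),   # recall on positive cases
        specificity=ratio(tn, tn + fp),   # recall on negative cases
        precision=ratio(tp, tp + fp),
        f1=ratio(2 * tp, 2 * tp + fp + fn),
    )


# Hypothetical toy labels for illustration; not data from the study.
y_true = [1, 1, 1, 1, 0, 0, 0, 1]
y_pred = [1, 1, 1, 0, 0, 1, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

With a class imbalance like the one in the summary (more glaucomatous than healthy eyes), sensitivity and F1-score can be high while specificity is noticeably lower, which is why the metrics are reported separately rather than as accuracy alone.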