ZeroTuneBio NER: A three-stage framework for zero-shot and zero-tuning biomedical entity extraction using large language models and prompt engineering.

Saved in:
Detailed Bibliography
Title: ZeroTuneBio NER: A three-stage framework for zero-shot and zero-tuning biomedical entity extraction using large language models and prompt engineering.
Authors: Qin M; Department of Dermatology, Huashan Hospital, Shanghai Institute of Dermatology, Fudan University, Shanghai, China; Greater Bay Area Institute of Precision Medicine, School of Life Sciences, Fudan University, Shanghai, China., Feng L; Greater Bay Area Institute of Precision Medicine, School of Life Sciences, Fudan University, Shanghai, China., Lu J; School of Life Sciences, Fudan University, Shanghai, China., Sun Z; School of Life Sciences, Fudan University, Shanghai, China., Yu Z; Greater Bay Area Institute of Precision Medicine, School of Life Sciences, Fudan University, Shanghai, China., Han L; Department of Dermatology, Huashan Hospital, Shanghai Institute of Dermatology, Fudan University, Shanghai, China; Greater Bay Area Institute of Precision Medicine, School of Life Sciences, Fudan University, Shanghai, China. Electronic address: hanlianyi@fudan.edu.cn.
Source: Computer methods and programs in biomedicine [Comput Methods Programs Biomed] 2025 Dec; Vol. 272, pp. 109070. Date of Electronic Publication: 2025 Sep 05.
Publication Type: Journal Article
Language: English
Journal Information: Publisher: Elsevier Scientific Publishers; Country of Publication: Ireland; NLM ID: 8506513; Publication Model: Print-Electronic; Cited Medium: Internet; ISSN: 1872-7565 (Electronic); Linking ISSN: 0169-2607; NLM ISO Abbreviation: Comput Methods Programs Biomed; Subsets: MEDLINE
Imprint Name(s): Publication: Limerick : Elsevier Scientific Publishers
Original Publication: Amsterdam : Elsevier Science Publishers, c1984-
MeSH Terms: Natural Language Processing*; Data Mining*/methods; Humans; Algorithms; Language; Large Language Models
Abstract: Competing Interests: Declaration of competing interest All authors of this study declare that there are no financial, personal, or other interests that could potentially affect the objectivity and fairness of the research results in relation to the content of this study. None of the authors has any financial relationships with organizations that may have an interest in the research findings. There are no relevant patents, products under development, or marketed products, and no funding that could compromise the fairness of the research has been accepted.
Objective: This study aims to (1) enhance the performance of large language models (LLMs) in biomedical entity extraction, (2) investigate zero-shot named entity recognition (NER) capabilities without fine-tuning, and (3) compare the proposed framework with existing models and human annotation methods. Additionally, we analyze discrepancies between human and LLM-generated annotations to refine manual labeling processes for specialized datasets.
Materials and Methods: We propose ZeroTuneBio NER, a three-stage NER framework that integrates chain-of-thought reasoning and prompt engineering. Evaluated on three public datasets (disease, chemical, and gene entities), the method requires no task-specific examples and no LLM fine-tuning, addressing the challenge of interpreting complex biomedical concepts.
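The record does not reproduce the framework's prompts, so the following is only a minimal sketch of what a three-stage, zero-tuning, chain-of-thought NER pipeline of this kind can look like. Here `call_llm` is a hypothetical stand-in for any chat-style LLM client, and the stage wording is our assumption, not the authors' actual prompts.

    # Illustrative sketch only; not the authors' implementation.
    # `call_llm` is a hypothetical wrapper around a chat-completion API.

    def call_llm(prompt: str) -> str:
        """Hypothetical stand-in: plug in your own LLM client here."""
        raise NotImplementedError

    def zero_tune_ner(text: str, entity_type: str) -> str:
        # Stage 1: elicit step-by-step (chain-of-thought) reasoning
        # about candidate entity spans -- no task-specific examples.
        reasoning = call_llm(
            "Read the sentence below and think step by step about which "
            f"spans could be {entity_type} entities.\n\nSentence: {text}"
        )
        # Stage 2: verify each candidate against the entity-type
        # definition via prompt engineering.
        verified = call_llm(
            f"Given this reasoning:\n{reasoning}\n\nKeep only spans that "
            f"strictly denote a {entity_type}. List one per line."
        )
        # Stage 3: normalize the output into a machine-readable form.
        return call_llm(
            "Convert the following list to a JSON array of strings and "
            f"output nothing else:\n{verified}"
        )

    # Example (disease NER, cf. the disease/chemical/gene datasets used):
    # zero_tune_ner("Mutations in BRCA1 increase breast cancer risk.", "disease")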
Results: ZeroTuneBio NER excels when strict matching is not required, achieving an average F1-score improvement of 0.28 over direct LLM queries and a partial-matching F1-score of ∼88%. It rivals a fine-tuned LLaMA model trained on 11,240 examples and surpasses BioBERT trained on 22,480 examples when strict-matching errors are excluded. Notably, LLMs substantially streamline manual annotation, increasing its speed and reducing its cost.
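The strict- versus partial-matching distinction behind these F1 figures can be made concrete with a small scoring sketch (our own illustration, assuming character-offset span annotations; this is not the paper's evaluation code): strict matching credits a prediction only when its boundaries equal the gold span exactly, while partial matching credits any overlap.

    # Strict vs. partial span matching for NER F1 (illustrative sketch).
    # Spans are (start, end) character offsets, end exclusive.

    def f1(pred, gold, match):
        tp = sum(1 for p in pred if any(match(p, g) for g in gold))
        precision = tp / len(pred) if pred else 0.0
        recall = (sum(1 for g in gold if any(match(p, g) for p in pred))
                  / len(gold)) if gold else 0.0
        return (2 * precision * recall / (precision + recall)
                if precision + recall else 0.0)

    strict = lambda p, g: p == g                        # exact boundaries
    partial = lambda p, g: p[0] < g[1] and g[0] < p[1]  # any overlap

    gold = [(13, 18), (34, 47)]   # e.g. "BRCA1", "breast cancer"
    pred = [(13, 18), (34, 40)]   # second span truncated to "breast"

    print(f1(pred, gold, strict))   # 0.5 -> boundary error counts as a miss
    print(f1(pred, gold, partial))  # 1.0 -> overlap is enough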
Conclusion: ZeroTuneBio NER demonstrates that LLMs can perform high-quality NER without fine-tuning, reducing reliance on manual annotation. The framework broadens LLM applications in biomedical NER, while our analysis highlights its scalability and future research directions.
(Copyright © 2025 The Authors. Published by Elsevier B.V. All rights reserved.)
Contributed Indexing: Keywords: Artificial intelligence; Chain of thought; Entity extraction; Large language model
Entry Date(s): Date Created: 20250913 Date Completed: 20251013 Latest Revision: 20251013
Update Code: 20251013
DOI: 10.1016/j.cmpb.2025.109070
PMID: 40945000
Database: MEDLINE