ZeroTuneBio NER: A three-stage framework for zero-shot and zero-tuning biomedical entity extraction using large language models and prompt engineering.
| Title: | ZeroTuneBio NER: A three-stage framework for zero-shot and zero-tuning biomedical entity extraction using large language models and prompt engineering. |
|---|---|
| Authors: | Qin M; Department of Dermatology, Huashan Hospital, Shanghai Institute of Dermatology, Fudan University, Shanghai, China; Greater Bay Area Institute of Precision Medicine, School of Life Sciences, Fudan University, Shanghai, China., Feng L; Greater Bay Area Institute of Precision Medicine, School of Life Sciences, Fudan University, Shanghai, China., Lu J; School of Life Sciences, Fudan University, Shanghai, China., Sun Z; School of Life Sciences, Fudan University, Shanghai, China., Yu Z; Greater Bay Area Institute of Precision Medicine, School of Life Sciences, Fudan University, Shanghai, China., Han L; Department of Dermatology, Huashan Hospital, Shanghai Institute of Dermatology, Fudan University, Shanghai, China; Greater Bay Area Institute of Precision Medicine, School of Life Sciences, Fudan University, Shanghai, China. Electronic address: hanlianyi@fudan.edu.cn. |
| Source: | Computer methods and programs in biomedicine [Comput Methods Programs Biomed] 2025 Dec; Vol. 272, pp. 109070. Date of Electronic Publication: 2025 Sep 05. |
| Publication Type: | Journal Article |
| Language: | English |
| Journal Information: | Publisher: Elsevier Scientific Publishers; Country of Publication: Ireland; NLM ID: 8506513; Publication Model: Print-Electronic; Cited Medium: Internet; ISSN: 1872-7565 (Electronic); Linking ISSN: 0169-2607; NLM ISO Abbreviation: Comput Methods Programs Biomed; Subsets: MEDLINE |
| Imprint Name(s): | Publication: Limerick : Elsevier Scientific Publishers; Original Publication: Amsterdam : Elsevier Science Publishers, c1984- |
| MeSH Terms: | Natural Language Processing*; Data Mining*/methods; Humans; Algorithms; Language; Large Language Models |
| Abstract: | Competing Interests: Declaration of competing interest. All authors of this study declare that there are no financial, personal, or other interests that could potentially affect the objectivity and fairness of the research results in relation to the content of this study. None of the authors has any financial relationships with organizations that may have an interest in the research findings. There are no relevant patents, products under development, or marketed products, and no funding that could compromise the fairness of the research has been accepted.<br />Objective: This study aims to (1) enhance the performance of large language models (LLMs) in biomedical entity extraction, (2) investigate zero-shot named entity recognition (NER) capabilities without fine-tuning, and (3) compare the proposed framework with existing models and human annotation methods. Additionally, we analyze discrepancies between human and LLM-generated annotations to refine manual labeling processes for specialized datasets.<br />Materials and Methods: We propose ZeroTuneBio NER, a three-stage NER framework integrating chain-of-thought reasoning and prompt engineering. Evaluated on three public datasets (disease, chemistry, and gene), the method requires no task-specific examples or LLM fine-tuning, addressing challenges in complex concept interpretation.<br />Results: ZeroTuneBio NER excels in tasks without strict matching, achieving an average F1-score improvement of 0.28 over direct LLM queries and a partial-matching F1-score of ∼88%. It rivals the performance of a fine-tuned LLaMA model trained on 11,240 examples and surpasses BioBERT trained on 22,480 examples when strict-matching errors are excluded. Notably, LLMs significantly optimize manual annotation, accelerating speed and reducing costs.<br />Conclusion: ZeroTuneBio NER demonstrates that LLMs can perform high-quality NER without fine-tuning, reducing reliance on manual annotation. The framework broadens LLM applications in biomedical NER, while our analysis highlights its scalability and future research directions.<br />(Copyright © 2025 The Authors. Published by Elsevier B.V. All rights reserved.) |
| Contributed Indexing: | Keywords: Artificial intelligence; Chain of thought; Entity extraction; Large language model |
| Entry Date(s): | Date Created: 20250913; Date Completed: 20251013; Latest Revision: 20251013 |
| Update Code: | 20251013 |
| DOI: | 10.1016/j.cmpb.2025.109070 |
| PMID: | 40945000 |
| Database: | MEDLINE |
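This record does not reproduce the authors' prompts, but the approach the abstract describes — zero-shot, prompt-engineered NER with chain-of-thought reasoning and no fine-tuning or in-context examples — can be sketched roughly as follows. The prompt wording, the step structure inside the prompt, the JSON output contract, and the `llm` callable are all assumptions for illustration; the paper's actual three-stage pipeline uses its own prompts, which are not given here.

```python
# Minimal sketch of zero-shot, zero-tuning biomedical NER via a
# chain-of-thought prompt. Everything below is illustrative, not the
# paper's implementation: prompt text and output format are assumed.
import json
from typing import Callable, List

def zero_shot_bio_ner(text: str, entity_type: str,
                      llm: Callable[[str], str]) -> List[str]:
    """Extract entity mentions with no task-specific examples and no
    fine-tuning, using only a reasoning-then-answer prompt."""
    prompt = (
        f"You are a biomedical curator. Extract every {entity_type} "
        f"mention from the text below.\n"
        f"Step 1: state what qualifies as a {entity_type}.\n"
        f"Step 2: reason through the text sentence by sentence, listing "
        f"candidate mentions and why they qualify.\n"
        f"Step 3: finish with only a JSON list of the exact surface "
        f"strings.\n\n"
        f"Text: {text}\n"
        f"Answer:"
    )
    raw = llm(prompt)
    # Discard the visible reasoning and keep only the trailing JSON list.
    start, end = raw.rfind("["), raw.rfind("]")
    if start == -1 or end <= start:
        return []
    try:
        return json.loads(raw[start:end + 1])
    except json.JSONDecodeError:
        return []
```

Wrapping any LLM client as a `str -> str` callable and passing it as `llm` makes the sketch backend-agnostic; per the abstract, the real framework runs three stages rather than the single call shown here.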
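The ∼88% figure is reported under partial matching, where a predicted span is credited if it overlaps a gold span rather than matching its boundaries exactly; the "strict-matching errors" excluded in the BioBERT comparison are precisely such boundary mismatches. A minimal scorer under an assumed character-overlap criterion (the paper's exact rule is not stated in this record):

```python
from typing import List, Tuple

def partial_match_f1(pred: List[Tuple[int, int]],
                     gold: List[Tuple[int, int]]) -> float:
    """F1 under partial matching: a (start, end) span counts as correct
    if it overlaps any span on the other side. The overlap criterion is
    an assumption; strict matching would require exact boundary equality."""
    def overlaps(a: Tuple[int, int], b: Tuple[int, int]) -> bool:
        # Half-open intervals [start, end): any shared character is a hit.
        return a[0] < b[1] and b[0] < a[1]

    matched_pred = sum(any(overlaps(p, g) for g in gold) for p in pred)
    matched_gold = sum(any(overlaps(g, p) for p in pred) for g in gold)
    precision = matched_pred / len(pred) if pred else 0.0
    recall = matched_gold / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For example, `partial_match_f1([(0, 7)], [(0, 12)])` returns 1.0 because the prediction clips but overlaps the gold span, whereas a strict-matching scorer would count it as an error.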