Bibliographic Details
| Title: |
REaMA: Building Biomedical Relation Extraction Specialized Large Language Models Through Instruction Tuning. |
| Authors: |
Zhang Y, Yu J, Li G, He Z, Yen GG |
| Source: |
IEEE transactions on neural networks and learning systems [IEEE Trans Neural Netw Learn Syst] 2025 Dec; Vol. 36 (12), pp. 20258-20272. |
| Publication Type: |
Journal Article |
| Language: |
English |
| Journal Info: |
Publisher: Institute of Electrical and Electronics Engineers Country of Publication: United States NLM ID: 101616214 Publication Model: Print Cited Medium: Internet ISSN: 2162-2388 (Electronic) Linking ISSN: 2162-237X NLM ISO Abbreviation: IEEE Trans Neural Netw Learn Syst Subsets: MEDLINE
| Imprint Name(s): |
Original Publication: Piscataway, NJ : Institute of Electrical and Electronics Engineers
| MeSH Terms: |
Data Mining*/methods ; Natural Language Processing* ; Neural Networks, Computer* ; Language* ; Humans ; Semantics ; Databases, Factual ; Algorithms ; Large Language Models
| Abstract: |
Aiming to identify entity pairs with biomedical semantic relations and assign specific relation types, biomedical relation extraction (BioRE) plays a critical role in biomedical text mining and information extraction (IE). Recent studies indicate that general large language models (LLMs) have made breakthroughs in general relation extraction (RE) tasks. However, even advanced open-source LLMs struggle with BioRE tasks. For example, WizardLM-70B and LLaMA-2-70B achieve F-scores of 14.05 and 12.21 on the BioRED dataset, respectively, significantly lagging behind the state-of-the-art (SOTA) method, which scores 65.17. To address this gap, we propose a multitask instruction-tuning framework that transforms general LLMs into BioRE-specialized models using our meticulously curated instruction dataset, REInstruct, comprising 150,000 diverse, high-quality instruction-response pairs. Consequently, we introduce REaMA, a series of open-source LLMs with sizes of 7B and 13B specifically tailored for BioRE tasks. Experimental results on seven representative BioRE datasets show that both REaMA-2-7B and REaMA-2-13B achieve promising performance on all datasets. Remarkably, the larger REaMA-2-13B outperforms the current SOTA method on five out of seven datasets. These results demonstrate the effectiveness of instruction tuning on REInstruct in eliciting strong RE capabilities in LLMs. Furthermore, we show that incorporating chain-of-thought (CoT) reasoning into REInstruct can further enhance the generalization ability of REaMA. The project is available at https://github.com/stzpp/REaMA.
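| Illustration: |
The abstract describes instruction tuning on instruction-response pairs for BioRE. Below is a minimal sketch of what one such pair might look like; the field names (instruction/input/output), the prompt wording, and the triple-per-line response format are illustrative assumptions, not the authors' actual REInstruct schema.

```python
# Hypothetical sketch of a single REInstruct-style instruction-response
# pair for BioRE. Field names and formatting are assumptions for
# illustration; see https://github.com/stzpp/REaMA for the real dataset.
import json

def make_biore_example(passage: str, relations: list[dict]) -> dict:
    """Build one instruction-tuning record for relation extraction."""
    instruction = (
        "Identify all pairs of biomedical entities in the passage that "
        "are semantically related, and assign each pair a relation type."
    )
    # Serialize the gold relations as (head, relation, tail) triples,
    # one per line.
    response = "\n".join(
        f"({r['head']}, {r['type']}, {r['tail']})" for r in relations
    )
    return {"instruction": instruction, "input": passage, "output": response}

example = make_biore_example(
    passage="Variants in SLCO1B1 increase the risk of "
            "simvastatin-induced myopathy.",
    relations=[{"head": "SLCO1B1",
                "type": "Association",
                "tail": "simvastatin-induced myopathy"}],
)
print(json.dumps(example, indent=2))
```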
| Entry Date(s): |
Date Created: 20250820 Date Completed: 20251202 Latest Revision: 20251203 |
| Update Code: |
20251203 |
| DOI: |
10.1109/TNNLS.2025.3596257 |
| PMID: |
40833895 |
| Database: |
MEDLINE |