ERNIE-UIE: Advancing information extraction in Chinese medical knowledge graph

Bibliographic details
Published in: PLoS ONE, Vol. 20, No. 5, p. e0325082
Main authors: Li, Bei; Li, Changbiao; Sun, Jianwei; Zeng, Xu; Chen, Xiaofan; Zheng, Jing
Format: Journal Article
Language: English
Published: United States: Public Library of Science (PLoS), 29 May 2025
ISSN: 1932-6203
Description
Summary: The field of information extraction (IE) is currently exploring more versatile and efficient methods for minimization of reliance on extensive annotated datasets and integration of knowledge across tasks and domains. We aim to evaluate and refine the application of the universal IE (UIE) technology in the field of Chinese medical expertise in terms of processing accuracy and efficiency. Our model integrates ontology modeling, web scraping, UIE, fine-tuning strategies, and graph databases, thereby covering knowledge modeling, extraction, and storage techniques. The Enhanced Representation through Knowledge Integration-UIE (ERNIE-UIE) model is fine-tuned and optimized using a small amount of annotated data. A medical knowledge graph is then constructed, followed by validating the graph and conducting knowledge mining on the data stored within it. Incorporating the characteristics of whole-course management, we implemented a comprehensive medical knowledge graph-construction model and methodology. Entities and relationships were jointly extracted using the pretrained language model, resulting in 8,525 entity data points and 9,522 triple data points. The accuracy of the knowledge graph was verified using graph algorithms. We optimized the construction process of a Chinese medical knowledge graph with minimal annotated data by utilizing a generative extraction paradigm, validating the graph's efficacy and achieving commendable results. This approach addresses the challenge of insufficient annotated training corpora in low-resource knowledge graph construction, thereby contributing to cost savings in the development of knowledge graphs.
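Illustrative note: the summary mentions checking extracted triples with graph algorithms but does not say which ones. The sketch below is not the authors' method. It assumes a hypothetical two-relation ontology (`SCHEMA`), hypothetical entity names, and two simple checks chosen for illustration: rejecting triples whose entity types do not match the ontology, and counting connected components among the triples that remain.

```python
from collections import defaultdict

# Hypothetical ontology (an assumption, not from the article):
# relation -> (allowed head entity type, allowed tail entity type).
SCHEMA = {
    "has_symptom": ("Disease", "Symptom"),
    "treated_by": ("Disease", "Drug"),
}


def validate_triples(entities, triples, schema=SCHEMA):
    """Split triples into (valid, rejected) by checking types against the schema."""
    valid, rejected = [], []
    for head, relation, tail in triples:
        expected = schema.get(relation)  # None for an unknown relation
        actual = (entities.get(head), entities.get(tail))
        (valid if expected == actual else rejected).append((head, relation, tail))
    return valid, rejected


def connected_components(triples):
    """Group entities into components, treating every triple as an undirected edge."""
    neighbors = defaultdict(set)
    for head, _, tail in triples:
        neighbors[head].add(tail)
        neighbors[tail].add(head)
    seen, components = set(), []
    for start in neighbors:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:  # iterative depth-first search
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbors[node] - component)
        seen |= component
        components.append(component)
    return components


# Hypothetical extraction output: entity -> type, plus (head, relation, tail) triples.
entities = {
    "diabetes": "Disease", "thirst": "Symptom", "metformin": "Drug",
    "asthma": "Disease", "wheezing": "Symptom",
}
triples = [
    ("diabetes", "has_symptom", "thirst"),
    ("diabetes", "treated_by", "metformin"),
    ("asthma", "has_symptom", "wheezing"),
    ("thirst", "treated_by", "metformin"),  # head is a Symptom, so it is rejected
]

valid, rejected = validate_triples(entities, triples)
components = connected_components(valid)
```

In a graph-database setting the same checks would typically be run as queries over the stored graph; plain Python is used here only to keep the sketch self-contained.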
Bibliography:
Competing Interests: The authors have declared that no competing interests exist.
a Current Address: Department of Biomedical Informatics, School of Life Science, Central South University, Changsha, Hunan, China
b Current Address: Shenzhen Health Development Research and Data Management Center, Shenzhen, Guangdong, China
DOI: 10.1371/journal.pone.0325082