Biomedical Vocabulary Alignment at Scale in the UMLS Metathesaurus

Bibliographic Details
Published in: Proceedings of the ... International World-Wide Web Conference. International WWW Conference Vol. 2021; p. 2672
Main Authors: Nguyen, Vinh; Yip, Hong Yung; Bodenreider, Olivier
Format: Journal Article
Language: English
Published: Netherlands 01.04.2021
Abstract: With 214 source vocabularies, the construction and maintenance of the UMLS (Unified Medical Language System) Metathesaurus terminology integration system is costly, time-consuming, and error-prone, as it relies primarily on (1) lexical and semantic processing for suggesting groupings of synonymous terms, and (2) the expertise of UMLS editors for curating these synonymy predictions. This paper aims to improve the UMLS Metathesaurus construction process by developing a novel supervised learning approach to suggesting synonymous pairs that can scale to the size and diversity of the UMLS source vocabularies. We evaluate this deep learning (DL) approach against a rule-based approach (RBA) that approximates the current UMLS Metathesaurus construction process. The key to the generalizability of our approach is the use of various degrees of lexical similarity in negative pairs during the training process. Our initial experiments demonstrate the strong performance of our DL approach across multiple datasets in terms of recall (91-92%), precision (88-99%), and F1 score (89-95%). Our DL approach largely outperforms the RBA method in recall (+23%), precision (+2.4%), and F1 score (+14.1%). This novel approach has great potential for improving the UMLS Metathesaurus construction process by providing better synonymy suggestions to the UMLS editors.
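Illustrative note (not part of the published record): the abstract attributes generalizability to negative training pairs that span various degrees of lexical similarity, but it does not say how similarity is measured or how pairs are selected. The sketch below is one possible reading, not the authors' method. It assumes token-level Jaccard overlap as the similarity measure, and the function names, thresholds, and concept labels are all hypothetical.

```python
from itertools import combinations


def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard similarity (an assumed stand-in for lexical similarity)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def bucket_negative_pairs(terms_by_concept, low=0.2, high=0.6):
    """Group non-synonymous term pairs by lexical similarity.

    terms_by_concept maps a concept label to its synonymous terms.
    Terms under different labels are treated as negative pairs.
    The low/high thresholds are hypothetical, chosen only for illustration.
    """
    buckets = {"low": [], "medium": [], "high": []}
    items = [(cid, term) for cid, terms in terms_by_concept.items() for term in terms]
    for (c1, t1), (c2, t2) in combinations(items, 2):
        if c1 == c2:
            continue  # same concept: a synonymous (positive) pair, so skip it
        score = jaccard(t1, t2)
        if score < low:
            key = "low"
        elif score < high:
            key = "medium"
        else:
            key = "high"
        buckets[key].append((t1, t2, score))
    return buckets


# Toy data with made-up concept labels (not real UMLS identifiers).
toy = {
    "A": ["heart attack", "myocardial infarction"],
    "B": ["heart failure"],
    "C": ["lung cancer"],
    "D": ["chronic heart failure"],
}
negatives = bucket_negative_pairs(toy)
```

Under these assumptions, sampling training negatives from all three buckets would expose a model both to easy negatives that share no tokens and to hard negatives that look alike but name different concepts.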
Author Affiliations:
Nguyen, Vinh: National Library of Medicine, Bethesda, Maryland, USA
Yip, Hong Yung: University of South Carolina, Columbia, South Carolina, USA
Bodenreider, Olivier: National Library of Medicine, Bethesda, Maryland, USA
DOI: 10.1145/3442381.3450128
Funding: Intramural NIH HHS, grant ZIA LM010010
Keywords: logical rules; supervised learning; scalability; vocabulary alignment; UMLS Metathesaurus; neural networks
PMID: 34514472