Biomedical Vocabulary Alignment at Scale in the UMLS Metathesaurus
| Published in: | Proceedings of the ... International World-Wide Web Conference. International WWW Conference Vol. 2021; p. 2672 |
|---|---|
| Main Authors: | Nguyen, Vinh; Yip, Hong Yung; Bodenreider, Olivier |
| Format: | Journal Article |
| Language: | English |
| Published: | Netherlands, 01.04.2021 |
| Abstract | With 214 source vocabularies, the construction and maintenance process of the UMLS (Unified Medical Language System) Metathesaurus terminology integration system is costly, time-consuming, and error-prone as it primarily relies on (1) lexical and semantic processing for suggesting groupings of synonymous terms, and (2) the expertise of UMLS editors for curating these synonymy predictions. This paper aims to improve the UMLS Metathesaurus construction process by developing a novel supervised learning approach for improving the task of suggesting synonymous pairs that can scale to the size and diversity of the UMLS source vocabularies. We evaluate this deep learning (DL) approach against a rule-based approach (RBA) that approximates the current UMLS Metathesaurus construction process. The key to the generalizability of our approach is the use of various degrees of lexical similarity in negative pairs during the training process. Our initial experiments demonstrate the strong performance across multiple datasets of our DL approach in terms of recall (91-92%), precision (88-99%), and F1 score (89-95%). Our DL approach largely outperforms the RBA method in recall (+23%), precision (+2.4%), and F1 score (+14.1%). This novel approach has great potential for improving the UMLS Metathesaurus construction process by providing better synonymy suggestions to the UMLS editors. |
|---|---|
| Author | Yip, Hong Yung; Nguyen, Vinh; Bodenreider, Olivier |
| Author_xml | 1. Nguyen, Vinh (National Library of Medicine, Bethesda, Maryland, USA); 2. Yip, Hong Yung (University of South Carolina, Columbia, South Carolina, USA); 3. Bodenreider, Olivier (National Library of Medicine, Bethesda, Maryland, USA) |
| BackLink | https://www.ncbi.nlm.nih.gov/pubmed/34514472 (MEDLINE/PubMed record) |
| ContentType | Journal Article |
| DBID | NPM 7X8 |
| DOI | 10.1145/3442381.3450128 |
| DatabaseName | PubMed MEDLINE - Academic |
| ExternalDocumentID | 34514472 |
| Genre | Journal Article |
| GrantInformation_xml | Funder: Intramural NIH HHS; grant ID: ZIA LM010010 |
| ISICitedReferencesCount | 14 |
| IsDoiOpenAccess | false |
| IsOpenAccess | true |
| IsPeerReviewed | false |
| IsScholarly | false |
| Keywords | logical rules; supervised learning; scalability; vocabulary alignment; UMLS Metathesaurus; neural networks |
| Language | English |
| OpenAccessLink | https://doi.org/10.1145/3442381.3450128 |
| PMID | 34514472 |
| PQID | 2572212502 |
| PQPubID | 23479 |
| ParticipantIDs | proquest_miscellaneous_2572212502 pubmed_primary_34514472 |
| PublicationDate | 2021-04-01 |
| PublicationPlace | Netherlands |
| PublicationTitle | Proceedings of the ... International World-Wide Web Conference. International WWW Conference |
| PublicationTitleAlternate | Proc Int World Wide Web Conf |
| PublicationYear | 2021 |
| SourceID | proquest pubmed |
| SourceType | Aggregation Database Index Database |
| StartPage | 2672 |
| Title | Biomedical Vocabulary Alignment at Scale in the UMLS Metathesaurus |
| URI | https://www.ncbi.nlm.nih.gov/pubmed/34514472 https://www.proquest.com/docview/2572212502 |
| Volume | 2021 |
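
The abstract reports recall, precision, and F1 score for predicting whether a pair of terms is synonymous. As a minimal sketch of how those three metrics relate to one another (this is not the authors' evaluation code; the function name `pair_metrics` and the toy labels are assumptions made purely for illustration), one might compute them over binary pair labels as follows:

```python
from typing import Iterable, Tuple


def pair_metrics(gold: Iterable[bool], predicted: Iterable[bool]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for binary synonymy predictions.

    Each position is one candidate term pair: True means "synonymous".
    """
    tp = fp = fn = 0
    for g, p in zip(gold, predicted):
        if p and g:
            tp += 1  # predicted synonymous, and truly synonymous
        elif p and not g:
            fp += 1  # predicted synonymous, but not synonymous
        elif g and not p:
            fn += 1  # missed a truly synonymous pair
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical labels for six candidate pairs (illustrative only, not data from the paper).
gold = [True, True, True, False, False, True]
predicted = [True, True, False, True, False, True]
precision, recall, f1 = pair_metrics(gold, predicted)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Because F1 is the harmonic mean of precision and recall, a model whose recall is much lower than its precision is pulled toward the lower of the two values.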