A novel approach for standardizing clinical laboratory categorical test results using machine learning and string distance similarity
| Published in: | Heliyon, Volume 9, Issue 11, p. e21523 |
|---|---|
| Main authors: | Ahmmed, Syed; Mondal, M. Rubaiyat Hossain; Mia, Md Raihan; Adibuzzaman, Mohammad; Hoque, Abu Sayed Md. Latiful; Ahamed, Sheikh Iqbal |
| Format: | Journal Article |
| Language: | English |
| Publication details: | England: Elsevier Ltd, 1 November 2023 |
| ISSN: | 2405-8440 |
| Abstract | Standardizing clinical laboratory test results is critical for clinical data science research and analysis, yet standardized data processing tools and guidelines remain inadequate. This paper proposes a novel approach for standardizing categorical test results based on supervised machine learning and the Jaro-Winkler similarity algorithm. A supervised machine learning model categorizes the test results into predefined groups (clusters) in a scalable way, and Jaro-Winkler similarity then maps the text terms to standard clinical terms within the corresponding group. The method is applied to 75,062 test results from two private hospitals in Bangladesh. For categorizing test results, the Support Vector Classification algorithm with a linear kernel achieves a classification accuracy of 98%, outperforming the Random Forest algorithm. With manual validation, Jaro-Winkler similarity achieves a 99.93% success rate in test result standardization for the majority of groups. The proposed method outperforms previous studies, which standardized test results using rule-based classifiers on a smaller number of groups and distance measures such as cosine similarity or Levenshtein distance. The approach also performs well when applied to the publicly available MIMIC-III dataset. These findings suggest that the proposed standardization technique can benefit clinical big data research, particularly national clinical research data hubs in low- and middle-income countries. |
|---|---|
| ArticleNumber | e21523 |
| Author | Adibuzzaman, Mohammad; Ahamed, Sheikh Iqbal; Hoque, Abu Sayed Md. Latiful; Mia, Md Raihan; Mondal, M. Rubaiyat Hossain; Ahmmed, Syed |
| Author_xml | 1. Ahmmed, Syed (ORCID: 0009-0001-8613-6827), Institute of Information and Communication Technology, Bangladesh University of Engineering and Technology, Dhaka, Bangladesh; 2. Mondal, M. Rubaiyat Hossain (ORCID: 0000-0002-8582-9197; email: rubaiyat97@iict.buet.ac.bd), Institute of Information and Communication Technology, Bangladesh University of Engineering and Technology, Dhaka, Bangladesh; 3. Mia, Md Raihan (ORCID: 0000-0002-6835-832X), Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka, Bangladesh; 4. Adibuzzaman, Mohammad, Oregon Clinical and Translational Research Institute, Oregon Health and Science University, Portland, OR, USA; 5. Hoque, Abu Sayed Md. Latiful, Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka, Bangladesh; 6. Ahamed, Sheikh Iqbal, Department of Computer Science, Marquette University, Milwaukee, WI, USA |
| BackLink | https://www.ncbi.nlm.nih.gov/pubmed/38034661 (view this record in MEDLINE/PubMed) |
| CitedBy_id | crossref_primary_10_1016_j_oceaneng_2024_118953 crossref_primary_10_1016_j_bej_2025_109800 crossref_primary_10_3390_diagnostics14192135 |
| Cites_doi | 10.1016/j.smhl.2021.100238 10.1097/MLR.0b013e318259becd 10.1093/jamia/ocx046 10.1080/01621459.1989.10478785 10.1023/A:1022627411411 10.1197/jamia.M2273 10.2196/14083 10.4103/2224-3151.264849 10.7326/0003-4819-110-6-482 10.1023/A:1010933404324 10.1097/MLR.0b013e318257dd67 |
| ContentType | Journal Article |
| Copyright | 2023 The Author(s) |
| DOI | 10.1016/j.heliyon.2023.e21523 |
| Discipline | Medicine |
| EISSN | 2405-8440 |
| Genre | Journal Article |
| GeographicLocations | Bangladesh |
| GrantInformation_xml | Funder: NCATS NIH HHS; grant ID: UL1 TR002369 |
| ISICitedReferencesCount | 4 |
| ISSN | 2405-8440 |
| IsDoiOpenAccess | true |
| IsOpenAccess | true |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 11 |
| Keywords | SNOMED CT; Standardization; Machine learning; LOINC; Data science; String distance similarity; Data quality; Electronic health records |
| Language | English |
| License | 2023 The Author(s). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). |
| ORCID | 0009-0001-8613-6827 0000-0002-8582-9197 0000-0002-6835-832X |
| OpenAccessLink | https://doaj.org/article/254b41c75206445d846ae3b5ee600ab4 |
| PMID | 38034661 |
| PublicationDate | 2023-11-01 |
| PublicationPlace | England |
| PublicationTitle | Heliyon |
| PublicationYear | 2023 |
| Publisher | Elsevier Ltd |
| StartPage | e21523 |
| SubjectTerms | algorithms; Bangladesh; biomedical research; data collection; Data quality; Data science; Electronic health records; laboratory experimentation; LOINC; Machine learning; SNOMED CT; Standardization; String distance similarity |
| Title | A novel approach for standardizing clinical laboratory categorical test results using machine learning and string distance similarity |
| URI | https://dx.doi.org/10.1016/j.heliyon.2023.e21523 https://www.ncbi.nlm.nih.gov/pubmed/38034661 https://www.proquest.com/docview/2896808267 https://www.proquest.com/docview/3153704870 https://pubmed.ncbi.nlm.nih.gov/PMC10685145 https://doaj.org/article/254b41c75206445d846ae3b5ee600ab4 |
| Volume | 9 |
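
The abstract describes a second stage in which Jaro-Winkler similarity maps free-text result terms to standard clinical terms within a group. The sketch below illustrates that idea only; it is not the authors' implementation. The function names (`jaro`, `jaro_winkler`, `standardize`), the 0.85 acceptance threshold, the lowercase normalization, and the example term list are all assumptions made for illustration, and the first-stage classifier is omitted.

```python
from typing import Optional, Sequence


def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]; 1.0 means identical strings."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters count as matching only within this sliding window.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order // 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity boosted for a shared prefix of up to max_prefix chars."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return base + prefix * p * (1 - base)


def standardize(term: str, standard_terms: Sequence[str],
                threshold: float = 0.85) -> Optional[str]:
    """Return the closest standard term, or None if nothing clears the
    (assumed) threshold. Comparison is case-insensitive."""
    cleaned = term.strip().lower()
    best_term, best_score = None, 0.0
    for candidate in standard_terms:
        score = jaro_winkler(cleaned, candidate.lower())
        if score > best_score:
            best_term, best_score = candidate, score
    return best_term if best_score >= threshold else None


if __name__ == "__main__":
    # Hypothetical standard terms for one result group.
    group_terms = ["Positive", "Negative"]
    for raw in ["Positve", "NEGATIV", "xyz"]:
        print(raw, "->", standardize(raw, group_terms))
```

Returning `None` below the threshold leaves unmatched terms for manual review rather than forcing a mapping, which fits the manual validation step the abstract mentions; the actual cutoff would need tuning per group.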