A novel approach for standardizing clinical laboratory categorical test results using machine learning and string distance similarity

Standardizing clinical laboratory test results is critical for conducting clinical data science research and analysis. However, standardized data processing tools and guidelines are inadequate. In this paper, a novel approach for standardizing categorical test results based on supervised machine lea...

Bibliographic details
Published in: Heliyon, Volume 9, Issue 11, p. e21523
Main authors: Ahmmed, Syed; Mondal, M. Rubaiyat Hossain; Mia, Md Raihan; Adibuzzaman, Mohammad; Hoque, Abu Sayed Md. Latiful; Ahamed, Sheikh Iqbal
Format: Journal Article
Language: English
Publication details: England, Elsevier Ltd, 1 November 2023
Elsevier
ISSN: 2405-8440
Abstract: Standardizing clinical laboratory test results is critical for clinical data science research and analysis, yet standardized data processing tools and guidelines remain inadequate. This paper proposes a novel approach for standardizing categorical test results based on supervised machine learning and the Jaro-Winkler similarity algorithm. A supervised machine learning model categorizes the test results at scale into predefined groups or clusters, and Jaro-Winkler similarity then maps text terms to standard clinical terms within the corresponding groups. The proposed method is applied to 75,062 test results from two private hospitals in Bangladesh. The Support Vector Classification algorithm with a linear kernel achieves a classification accuracy of 98%, outperforming the Random Forest algorithm at categorizing test results. The experimental results show that Jaro-Winkler similarity achieves a 99.93% success rate in test result standardization for the majority of groups with manual validation. The proposed method outperforms previous studies that concentrated on standardizing test results using rule-based classifiers on a smaller number of groups and distance measures such as cosine similarity or Levenshtein distance. The approach also performs very well when applied to the publicly available MIMIC-III dataset. These findings show that the proposed standardization technique can be highly beneficial for clinical big data research, particularly for national clinical research data hubs in low- and middle-income countries.
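The second stage described in the abstract, mapping a raw text term to a standard term within its group, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the group names, the standard term lists in `STANDARD_TERMS`, the `standardize` helper, and the 0.85 acceptance threshold are all hypothetical, and the group is assumed to have been predicted already by the classifier. Only the Jaro-Winkler formula itself follows the standard definition (prefix scale 0.1, prefix length capped at 4).

```python
from typing import Optional


def jaro(s1: str, s2: str) -> float:
    """Standard Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters match only if they are equal and within this distance.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: matched characters that appear in a different order.
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) // 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str, prefix_scale: float = 0.1,
                 max_prefix: int = 4) -> float:
    """Jaro similarity boosted for a shared prefix (Winkler modification)."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return base + prefix * prefix_scale * (1.0 - base)


# Hypothetical groups and standard terms, for illustration only.
STANDARD_TERMS = {
    "polarity": ["Positive", "Negative"],
    "presence": ["Present", "Absent", "Trace"],
}


def standardize(raw: str, group: str, threshold: float = 0.85) -> Optional[str]:
    """Return the closest standard term in the group, or None if no term
    reaches the (assumed) similarity threshold."""
    cleaned = raw.strip().lower()
    best_term, best_score = None, 0.0
    for term in STANDARD_TERMS[group]:
        score = jaro_winkler(cleaned, term.lower())
        if score > best_score:
            best_term, best_score = term, score
    return best_term if best_score >= threshold else None
```

For example, `standardize("Postive ", "polarity")` resolves the misspelled entry to `"Positive"`, while an unrelated string such as `"xyz"` falls below the threshold and is left for manual review. Restricting the comparison to the terms of one group is what keeps the string matching step small; the threshold value would need to be tuned against validated data.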
ArticleNumber e21523
Author Adibuzzaman, Mohammad
Ahamed, Sheikh Iqbal
Hoque, Abu Sayed Md. Latiful
Mia, Md Raihan
Mondal, M. Rubaiyat Hossain
Ahmmed, Syed
Author_xml – sequence: 1
  givenname: Syed
  orcidid: 0009-0001-8613-6827
  surname: Ahmmed
  fullname: Ahmmed, Syed
  organization: Institute of Information and Communication Technology, Bangladesh University of Engineering And Technology, Dhaka, Bangladesh
– sequence: 2
  givenname: M. Rubaiyat Hossain
  orcidid: 0000-0002-8582-9197
  surname: Mondal
  fullname: Mondal, M. Rubaiyat Hossain
  email: rubaiyat97@iict.buet.ac.bd
  organization: Institute of Information and Communication Technology, Bangladesh University of Engineering And Technology, Dhaka, Bangladesh
– sequence: 3
  givenname: Md Raihan
  orcidid: 0000-0002-6835-832X
  surname: Mia
  fullname: Mia, Md Raihan
  organization: Department of Computer Science And Engineering, Bangladesh University of Engineering And Technology, Dhaka, Bangladesh
– sequence: 4
  givenname: Mohammad
  surname: Adibuzzaman
  fullname: Adibuzzaman, Mohammad
  organization: Oregon Clinical and Translational Research Institute, Oregon Health and Science University, Portland, OR, USA
– sequence: 5
  givenname: Abu Sayed Md. Latiful
  surname: Hoque
  fullname: Hoque, Abu Sayed Md. Latiful
  organization: Department of Computer Science And Engineering, Bangladesh University of Engineering And Technology, Dhaka, Bangladesh
– sequence: 6
  givenname: Sheikh Iqbal
  surname: Ahamed
  fullname: Ahamed, Sheikh Iqbal
  organization: Department of Computer Science, Marquette University, Milwaukee, WI, USA
ContentType Journal Article
Copyright 2023 The Author(s)
DOI 10.1016/j.heliyon.2023.e21523
Discipline Medicine
EISSN 2405-8440
ExternalDocumentID oai_doaj_org_article_254b41c75206445d846ae3b5ee600ab4
PMC10685145
38034661
10_1016_j_heliyon_2023_e21523
S2405844023087315
Genre Journal Article
GeographicLocations Bangladesh
GeographicLocations_xml – name: Bangladesh
GrantInformation_xml – fundername: NCATS NIH HHS
  grantid: UL1 TR002369
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 11
Keywords SNOMED CT
Standardization
Machine learning
LOINC
Data science
String distance similarity
Data quality
Electronic health records
Language English
License This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
ORCID 0009-0001-8613-6827
0000-0002-8582-9197
0000-0002-6835-832X
OpenAccessLink https://doaj.org/article/254b41c75206445d846ae3b5ee600ab4
PMID 38034661
PublicationCentury 2000
PublicationDate 2023-11-01
PublicationDateYYYYMMDD 2023-11-01
PublicationDate_xml – month: 11
  year: 2023
  text: 2023-11-01
  day: 01
PublicationDecade 2020
PublicationPlace England
PublicationPlace_xml – name: England
PublicationTitle Heliyon
PublicationTitleAlternate Heliyon
PublicationYear 2023
Publisher Elsevier Ltd
Elsevier
Publisher_xml – name: Elsevier Ltd
– name: Elsevier
StartPage e21523
SubjectTerms algorithms
Bangladesh
biomedical research
data collection
Data quality
Data science
Electronic health records
laboratory experimentation
LOINC
Machine learning
SNOMED CT
Standardization
String distance similarity
Title A novel approach for standardizing clinical laboratory categorical test results using machine learning and string distance similarity
URI https://dx.doi.org/10.1016/j.heliyon.2023.e21523
https://www.ncbi.nlm.nih.gov/pubmed/38034661
https://www.proquest.com/docview/2896808267
https://www.proquest.com/docview/3153704870
https://pubmed.ncbi.nlm.nih.gov/PMC10685145
https://doaj.org/article/254b41c75206445d846ae3b5ee600ab4
Volume 9