Joining Datasets Without Identifiers: Probabilistic Linkage of Virtual Pediatric Systems and PEDSnet
To 1) probabilistically link two important pediatric data sources, Virtual Pediatric Systems and PEDSnet, 2) evaluate linkage accuracy overall and in patients with severe sepsis or septic shock, and 3) identify variables important to linkage accuracy. Retrospective linkage of prospectively collected...
| Published in: | Pediatric critical care medicine Vol. 21; no. 9; p. e628 |
|---|---|
| Main Authors: | Dziorny, Adam C; Lindell, Robert B; Bennett, Tellen D; Bailey, L Charles |
| Format: | Journal Article |
| Language: | English |
| Published: | United States, September 2020 |
| ISSN: | 1529-7535 |
| Abstract | Objectives: To 1) probabilistically link two important pediatric data sources, Virtual Pediatric Systems and PEDSnet, 2) evaluate linkage accuracy overall and in patients with severe sepsis or septic shock, and 3) identify variables important to linkage accuracy.
Design: Retrospective linkage of prospectively collected datasets from Virtual Pediatric Systems, Inc (Los Angeles, CA) and the PEDSnet consortium.
Setting: Single-center academic PICU.
Patients: All PICU encounters between January 1, 2012, and December 31, 2017, that were deterministically matched between the two datasets.
Interventions: None.
Measurements and Main Results: We abstracted records from Virtual Pediatric Systems and PEDSnet corresponding to PICU encounters and linked them probabilistically using 44 features shared by the two datasets. We generated a gold standard deterministic linkage using protected health information elements, which were then removed from the datasets. We then calculated candidate pair log-likelihood ratios for all pairs of subjects and selected optimal pairs using a two-stage algorithm. A total of 22,051 gold standard PICU encounter pairs were identified over the study period. The optimal linkage model demonstrated excellent discrimination (area under the receiver operating characteristic curve > 0.99); 19,801 cases (89.9%) were matched with 13 false positives. Adding two protected health information dates (admission month, birth day-of-year) increased the number of cases matched to 20,189 (91.6%), with three false positives. Restricting the analysis to patients with a Virtual Pediatric Systems diagnosis of severe sepsis or septic shock (n = 1,340 [6.1%]) yielded 1,250 matched cases (93.2%) with zero false positives. A greater number of laboratory values present in the first 12 hours of admission significantly increased log-likelihood ratios, suggesting stronger candidate pair matching.
Conclusions: We demonstrated the use of probabilistic linkage to accurately join two complementary pediatric critical care datasets at a single academic PICU in the absence of protected health information. Combining datasets with curated diagnoses and granular measurements can validate patient acuity metrics and facilitate multicenter machine learning algorithms. We anticipate these methods will generalize to other common PICU diagnoses. |
| Authors | Dziorny, Adam C (Division of Critical Care, Department of Anesthesiology and Critical Care, Children's Hospital of Philadelphia, Philadelphia, PA); Lindell, Robert B (Division of Critical Care, Department of Anesthesiology and Critical Care, Children's Hospital of Philadelphia, Philadelphia, PA); Bennett, Tellen D (Sections of Informatics and Data Science and Critical Care Medicine, Department of Pediatrics, University of Colorado School of Medicine, Aurora, CO); Bailey, L Charles (Division of Oncology, Department of Pediatrics, Children's Hospital of Philadelphia, Philadelphia, PA) |
| BackLink | https://www.ncbi.nlm.nih.gov/pubmed/32511201 (View this record in MEDLINE/PubMed) |
| DOI | 10.1097/PCC.0000000000002380 |
| Discipline | Medicine |
| Genre | Multicenter Study Journal Article |
| GeographicLocations | Los Angeles |
| ISICitedReferencesCount | 5 |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 9 |
| PMID | 32511201 |
| PublicationDateYYYYMMDD | 2020-09-01 |
| PublicationTitle | Pediatric critical care medicine |
| PublicationTitleAlternate | Pediatr Crit Care Med |
| PublicationYear | 2020 |
| References | 32890090 - Pediatr Crit Care Med. 2020 Sep;21(9):848-849 |
| StartPage | e628 |
| SubjectTerms | Child; Humans; Infant; Intensive Care Units, Pediatric; Los Angeles; Pediatrics; Retrospective Studies; Sepsis; Shock, Septic |
| URI | https://www.ncbi.nlm.nih.gov/pubmed/32511201 https://www.proquest.com/docview/2411101815 |
| Volume | 21 |
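The abstract describes scoring every candidate pair with a log-likelihood ratio and then selecting optimal pairs, but it does not specify the scoring model or the two-stage selection algorithm. The following is a minimal sketch, assuming a classic Fellegi-Sunter agreement model and a simple greedy one-to-one assignment. The feature names, the m/u probabilities, and the helpers `pair_llr`, `link_one_to_one`, and `evaluate` are all hypothetical; they are not the study's 44 features, its fitted parameters, or its published method.

```python
import math

# Hypothetical per-feature probabilities (NOT the study's values):
#   m = P(field agrees | both records describe the same PICU encounter)
#   u = P(field agrees | records describe different encounters)
FEATURE_PROBS = {
    "sex": (0.99, 0.50),
    "age_months": (0.97, 0.02),
    "admit_year": (0.99, 0.17),
    "first_lab_hour": (0.90, 0.05),
}


def pair_llr(a, b, probs=FEATURE_PROBS):
    """Fellegi-Sunter style log-likelihood ratio for one candidate pair.

    Agreement adds log(m / u); disagreement adds log((1 - m) / (1 - u)).
    A field missing (None or absent) in either record adds nothing.
    """
    total = 0.0
    for name, (m, u) in probs.items():
        x, y = a.get(name), b.get(name)
        if x is None or y is None:
            continue  # assumption: missing data is treated as uninformative
        total += math.log(m / u) if x == y else math.log((1 - m) / (1 - u))
    return total


def link_one_to_one(left, right, threshold=0.0):
    """Greedily accept the highest-scoring pairs, using each record at most once.

    Returns {left_index: right_index} for accepted pairs scoring >= threshold.
    """
    scored = sorted(
        ((pair_llr(a, b), i, j) for i, a in enumerate(left) for j, b in enumerate(right)),
        reverse=True,
    )
    used_left, used_right, links = set(), set(), {}
    for score, i, j in scored:
        if score < threshold:
            break  # all remaining pairs score even lower
        if i in used_left or j in used_right:
            continue  # one side of this pair is already linked
        links[i] = j
        used_left.add(i)
        used_right.add(j)
    return links


def evaluate(links, gold):
    """Return (correct links, false positives) against a gold-standard mapping."""
    correct = sum(1 for i, j in links.items() if gold.get(i) == j)
    return correct, len(links) - correct


if __name__ == "__main__":
    left = [
        {"sex": "F", "age_months": 14, "admit_year": 2015, "first_lab_hour": 2},
        {"sex": "M", "age_months": 60, "admit_year": 2016, "first_lab_hour": None},
    ]
    right = [
        {"sex": "M", "age_months": 60, "admit_year": 2016, "first_lab_hour": 5},
        {"sex": "F", "age_months": 14, "admit_year": 2015, "first_lab_hour": 2},
    ]
    links = link_one_to_one(left, right)
    print(links)                               # {0: 1, 1: 0}
    print(evaluate(links, gold={0: 1, 1: 0}))  # (2, 0)
```

In this toy run the second left-hand record lacks a laboratory value, so its correct pair scores lower than the fully observed pair. That loosely mirrors the reported finding that more laboratory values in the first 12 hours raised log-likelihood ratios, although treating a missing field as zero evidence is an assumption of the sketch, and a greedy pass is only one possible way to enforce one-to-one matches.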