Joining Datasets Without Identifiers: Probabilistic Linkage of Virtual Pediatric Systems and PEDSnet

Bibliographic Details
Published in: Pediatric critical care medicine Vol. 21; no. 9; p. e628
Main Authors: Dziorny, Adam C; Lindell, Robert B; Bennett, Tellen D; Bailey, L Charles
Format: Journal Article
Language: English
Published: United States 01.09.2020
ISSN: 1529-7535
Abstract
OBJECTIVES: To 1) probabilistically link two important pediatric data sources, Virtual Pediatric Systems and PEDSnet, 2) evaluate linkage accuracy overall and in patients with severe sepsis or septic shock, and 3) identify variables important to linkage accuracy.
DESIGN: Retrospective linkage of prospectively collected datasets from Virtual Pediatric Systems, Inc (Los Angeles, CA) and the PEDSnet consortium.
SETTING: Single-center academic PICU.
PATIENTS: All PICU encounters between January 1, 2012, and December 31, 2017, that were deterministically matched between the two datasets.
INTERVENTIONS: None.
MEASUREMENTS AND MAIN RESULTS: We abstracted records from Virtual Pediatric Systems and PEDSnet corresponding to PICU encounters and probabilistically linked them using 44 features shared by the two datasets. We generated a gold standard deterministic linkage using protected health information elements, which were then removed from the datasets. We then calculated candidate pair log-likelihood ratios for all pairs of subjects and selected optimal pairs in a two-stage algorithm. A total of 22,051 gold standard PICU encounter pairs were identified over the study period. The optimal linkage model demonstrated excellent discrimination (area under the receiver operating characteristic curve > 0.99); 19,801 cases (89.9%) were matched with 13 false positives. The addition of two protected health information dates (admission month, birth day-of-year) increased the number of cases matched to 20,189 (91.6%), with three false positives. Restricting to patients with a Virtual Pediatric Systems diagnosis of severe sepsis or septic shock (n = 1,340 [6.1%]) matched 1,250 cases (93.2%) with zero false positives. A greater number of laboratory values present in the first 12 hours of admission significantly increased log-likelihood ratios, suggesting stronger candidate pair matching.
CONCLUSIONS: We demonstrated the use of probabilistic linkage to accurately join two complementary pediatric critical care datasets at a single academic PICU in the absence of protected health information. Combining datasets with curated diagnoses and granular measurements can validate patient acuity metrics and facilitate multicenter machine learning algorithms. We anticipate these methods will generalize to other common PICU diagnoses.
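The scoring and pair-selection steps described in the abstract can be illustrated with a minimal Python sketch. It assumes a Fellegi-Sunter style weight per feature, where m is the probability that a feature agrees for a true match and u is the probability that it agrees for a non-match. The abstract does not list the study's 44 features, its probability estimates, or the details of its two-stage algorithm, so the feature names, the m and u values, the toy records, and the greedy one-to-one selection below are all hypothetical stand-ins rather than the authors' method.

```python
import math
from itertools import product

# Hypothetical per-feature agreement probabilities (NOT taken from the study):
# m = P(values agree | same encounter), u = P(values agree | different encounters).
M_U = {
    "sex": (0.99, 0.50),
    "age_months": (0.95, 0.02),
    "first_lactate": (0.90, 0.05),
}


def log_likelihood_ratio(rec_a, rec_b, m_u=M_U):
    """Sum per-feature log weights; a feature missing on either side adds 0."""
    total = 0.0
    for feature, (m, u) in m_u.items():
        a, b = rec_a.get(feature), rec_b.get(feature)
        if a is None or b is None:
            continue  # no evidence either way
        if a == b:
            total += math.log(m / u)              # agreement weight (positive)
        else:
            total += math.log((1 - m) / (1 - u))  # disagreement weight (negative)
    return total


def greedy_one_to_one(records_a, records_b, threshold=0.0):
    """Score every candidate pair, then accept the highest-scoring pairs first,
    using each record at most once. A simple stand-in for optimal pair selection."""
    scored = sorted(
        (
            (log_likelihood_ratio(a, b), id_a, id_b)
            for (id_a, a), (id_b, b) in product(records_a.items(), records_b.items())
        ),
        reverse=True,
    )
    used_a, used_b, links = set(), set(), {}
    for score, id_a, id_b in scored:
        if score <= threshold:
            break  # remaining pairs score even lower
        if id_a in used_a or id_b in used_b:
            continue
        links[id_a] = id_b
        used_a.add(id_a)
        used_b.add(id_b)
    return links


# Toy records (hypothetical); None marks a laboratory value that was not recorded.
vps = {
    "v1": {"sex": "F", "age_months": 14, "first_lactate": 2.1},
    "v2": {"sex": "M", "age_months": 60, "first_lactate": None},
}
pedsnet = {
    "p1": {"sex": "M", "age_months": 60, "first_lactate": 1.0},
    "p2": {"sex": "F", "age_months": 14, "first_lactate": 2.1},
}
links = greedy_one_to_one(vps, pedsnet)
```

In this toy data the pair with a missing laboratory value accumulates a lower log-likelihood ratio than the fully populated pair, which mirrors the abstract's observation that more available laboratory values strengthen candidate pair matching.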
Author Affiliations:
Dziorny, Adam C: Division of Critical Care, Department of Anesthesiology and Critical Care, Children's Hospital of Philadelphia, Philadelphia, PA
Lindell, Robert B: Division of Critical Care, Department of Anesthesiology and Critical Care, Children's Hospital of Philadelphia, Philadelphia, PA
Bennett, Tellen D: Sections of Informatics and Data Science and Critical Care Medicine, Department of Pediatrics, University of Colorado School of Medicine, Aurora, CO
Bailey, L Charles: Division of Oncology, Department of Pediatrics, Children's Hospital of Philadelphia, Philadelphia, PA
DOI 10.1097/PCC.0000000000002380
Discipline Medicine
ExternalDocumentID 32511201
Genre Multicenter Study
Journal Article
GeographicLocations Los Angeles
IsPeerReviewed true
IsScholarly true
Issue 9
Language English
PMID 32511201
PublicationTitle Pediatric critical care medicine
PublicationTitleAlternate Pediatr Crit Care Med
PublicationYear 2020
References 32890090 - Pediatr Crit Care Med. 2020 Sep;21(9):848-849
StartPage e628
SubjectTerms Child
Humans
Infant
Intensive Care Units, Pediatric
Los Angeles
Pediatrics
Retrospective Studies
Sepsis
Shock, Septic
Title Joining Datasets Without Identifiers: Probabilistic Linkage of Virtual Pediatric Systems and PEDSnet
URI https://www.ncbi.nlm.nih.gov/pubmed/32511201
https://www.proquest.com/docview/2411101815
Volume 21