Learning from electronic health records across multiple sites: A communication-efficient and privacy-preserving distributed algorithm

Bibliographic Details
Published in: Journal of the American Medical Informatics Association (JAMIA), Vol. 27, no. 3, p. 376
Main Authors: Duan, Rui; Boland, Mary Regina; Liu, Zixuan; Liu, Yue; Chang, Howard H; Xu, Hua; Chu, Haitao; Schmid, Christopher H; Forrest, Christopher B; Holmes, John H; Schuemie, Martijn J; Berlin, Jesse A; Moore, Jason H; Chen, Yong
Format: Journal Article
Language: English
Published: England 01.03.2020
ISSN: 1527-974X
Abstract OBJECTIVES: We propose a one-shot, privacy-preserving distributed algorithm to perform logistic regression (ODAL) across multiple clinical sites.
MATERIALS AND METHODS: ODAL effectively utilizes the information from the local site (where the patient-level data are accessible) and incorporates the first-order (ODAL1) and second-order (ODAL2) gradients of the likelihood function from other sites to construct an estimator without requiring iterative communication across sites or transferring patient-level data. We evaluated ODAL via extensive simulation studies and an application to a dataset from the University of Pennsylvania Health System. The estimation accuracy was evaluated by comparing it with the estimator based on the combined individual participant data or pooled data (ie, gold standard).
RESULTS: Our simulation studies revealed that the relative estimation bias of ODAL1 compared with the pooled estimates was <3%, and the ratio of standard errors was <1.25 for all scenarios. ODAL2 achieved higher accuracy (with relative bias <0.1% and ratio of standard errors <1.05). In real data analysis, we investigated the associations of 100 medications with fetal loss during pregnancy. We found that ODAL1 provided estimates with relative bias <10% for 85% of medications, and ODAL2 had relative bias <10% for 99% of medications. For communication cost, ODAL1 requires transferring p numbers from each site to the local site and ODAL2 requires transferring (p×p+p) numbers from each site to the local site, where p is the number of parameters in the regression model.
CONCLUSIONS: This study demonstrates that ODAL is privacy-preserving and communication-efficient with small bias and high statistical efficiency.
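Illustrative sketch (not part of the catalog record or the paper): the abstract states that each remote site shares only first-order (ODAL1) or first- and second-order (ODAL2) gradients of the likelihood, amounting to p or p×p+p numbers per site. The Python below shows, under stated assumptions, what such a per-site summary could look like for the average logistic log-likelihood. The helper names `site_summary` and `numbers_transferred`, the evaluation point `beta_bar`, and the toy data are hypothetical; the abstract does not say how the local site combines these summaries into the final estimator, so that step is left out.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def site_summary(X, y, beta_bar, second_order=False):
    """Aggregate statistics a remote site might share (hypothetical helper).

    X is a list of covariate rows of length p, y a list of 0/1 outcomes,
    and beta_bar an assumed common evaluation point for the coefficients.
    Returns the gradient (p numbers) and, if requested, the Hessian
    (p x p numbers) of the site's average logistic log-likelihood.
    No patient-level rows are returned.
    """
    n, p = len(X), len(beta_bar)
    grad = [0.0] * p
    hess = [[0.0] * p for _ in range(p)] if second_order else None
    for row, yi in zip(X, y):
        mu = sigmoid(sum(b * x for b, x in zip(beta_bar, row)))
        for j in range(p):
            grad[j] += (yi - mu) * row[j] / n
            if second_order:
                for k in range(p):
                    hess[j][k] -= mu * (1.0 - mu) * row[j] * row[k] / n
    return grad, hess


def numbers_transferred(p, second_order=False):
    # Per-site communication cost as stated in the abstract:
    # p numbers for ODAL1, p*p + p numbers for ODAL2.
    return p * p + p if second_order else p


if __name__ == "__main__":
    # Toy data (made up): intercept plus one covariate, four patients.
    X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
    y = [0, 0, 1, 1]
    grad, hess = site_summary(X, y, [0.0, 0.0], second_order=True)
    print("gradient:", grad)
    print("hessian:", hess)
    print("ODAL1 numbers per site:", numbers_transferred(len(grad)))
    print("ODAL2 numbers per site:", numbers_transferred(len(grad), True))
```

With p = 10 parameters, for instance, the stated costs work out to 10 numbers per site for ODAL1 and 10×10+10 = 110 numbers per site for ODAL2, in a single round of communication.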
Author Forrest, Christopher B
Chu, Haitao
Chang, Howard H
Xu, Hua
Berlin, Jesse A
Moore, Jason H
Duan, Rui
Schuemie, Martijn J
Holmes, John H
Liu, Yue
Boland, Mary Regina
Schmid, Christopher H
Chen, Yong
Liu, Zixuan
Author_xml – sequence: 1
  givenname: Rui
  surname: Duan
  fullname: Duan, Rui
  organization: Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, Pennsylvania, USA
– sequence: 2
  givenname: Mary Regina
  surname: Boland
  fullname: Boland, Mary Regina
  organization: Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, Pennsylvania, USA
– sequence: 3
  givenname: Zixuan
  surname: Liu
  fullname: Liu, Zixuan
  organization: Department of Electrical Engineering, Stanford University, Stanford, California, USA
– sequence: 4
  givenname: Yue
  surname: Liu
  fullname: Liu, Yue
  organization: Department of Statistics, Harvard University, Cambridge, Massachusetts, USA
– sequence: 5
  givenname: Howard H
  surname: Chang
  fullname: Chang, Howard H
  organization: Department of Biostatistics and Bioinformatics, Emory University, Atlanta, Georgia, USA
– sequence: 6
  givenname: Hua
  surname: Xu
  fullname: Xu, Hua
  organization: School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, Texas, USA
– sequence: 7
  givenname: Haitao
  surname: Chu
  fullname: Chu, Haitao
  organization: Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, USA
– sequence: 8
  givenname: Christopher H
  surname: Schmid
  fullname: Schmid, Christopher H
  organization: Department of Biostatistics, Brown University, Providence, Rhode Island, USA
– sequence: 9
  givenname: Christopher B
  surname: Forrest
  fullname: Forrest, Christopher B
  organization: Division of General Pediatrics, Children's Hospital of Philadelphia, Philadelphia, Pennsylvania, USA
– sequence: 10
  givenname: John H
  surname: Holmes
  fullname: Holmes, John H
  organization: Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, Pennsylvania, USA
– sequence: 11
  givenname: Martijn J
  surname: Schuemie
  fullname: Schuemie, Martijn J
  organization: Janssen Research and Development LLC, Titusville, New Jersey, USA
– sequence: 12
  givenname: Jesse A
  surname: Berlin
  fullname: Berlin, Jesse A
  organization: Janssen Research and Development LLC, Titusville, New Jersey, USA
– sequence: 13
  givenname: Jason H
  surname: Moore
  fullname: Moore, Jason H
  organization: Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, Pennsylvania, USA
– sequence: 14
  givenname: Yong
  surname: Chen
  fullname: Chen, Yong
  organization: Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, Pennsylvania, USA
BackLink https://www.ncbi.nlm.nih.gov/pubmed/31816040$$D View this record in MEDLINE/PubMed
ContentType Journal Article
Copyright The Author(s) 2019. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For permissions, please email: journals.permissions@oup.com.
DOI 10.1093/jamia/ocz199
Discipline Medicine
EISSN 1527-974X
ExternalDocumentID 31816040
Genre Journal Article
Research Support, N.I.H., Extramural
GrantInformation_xml – fundername: NIMH NIH HHS
  grantid: P50 MH113840
– fundername: NIAID NIH HHS
  grantid: R01 AI130460
– fundername: NIAID NIH HHS
  grantid: R01 AI116794
– fundername: NCI NIH HHS
  grantid: P30 CA077598
– fundername: NCATS NIH HHS
  grantid: UL1 TR002494
– fundername: NLM NIH HHS
  grantid: R01 LM012607
– fundername: NLM NIH HHS
  grantid: R01 LM009012
ISICitedReferencesCount 70
ISSN 1527-974X
IsDoiOpenAccess false
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 3
Keywords distributed algorithm
learning health system
logistic regression
electronic health record
Language English
License The Author(s) 2019. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For permissions, please email: journals.permissions@oup.com.
OpenAccessLink https://academic.oup.com/jamia/article-pdf/27/3/376/34152548/ocz199.pdf
PMID 31816040
PublicationCentury 2000
PublicationDate 2020-03-01
PublicationDateYYYYMMDD 2020-03-01
PublicationDate_xml – month: 03
  year: 2020
  text: 2020-03-01
  day: 01
PublicationDecade 2020
PublicationPlace England
PublicationPlace_xml – name: England
PublicationTitle Journal of the American Medical Informatics Association : JAMIA
PublicationTitleAlternate J Am Med Inform Assoc
PublicationYear 2020
StartPage 376
SubjectTerms Algorithms
Computer Simulation
Confidentiality
Data Analysis
Datasets as Topic
Drug-Related Side Effects and Adverse Reactions
Electronic Health Records
Female
Fetal Death - etiology
Humans
Logistic Models
Odds Ratio
Pregnancy
Title Learning from electronic health records across multiple sites: A communication-efficient and privacy-preserving distributed algorithm
URI https://www.ncbi.nlm.nih.gov/pubmed/31816040
https://www.proquest.com/docview/2323476665
Volume 27
WOSCitedRecordID wos000548302800005&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D