Learning from electronic health records across multiple sites: A communication-efficient and privacy-preserving distributed algorithm
We propose a one-shot, privacy-preserving distributed algorithm to perform logistic regression (ODAL) across multiple clinical sites. ODAL effectively utilizes the information from the local site (where the patient-level data are accessible) and incorporates the first-order (ODAL1) and second-order...
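The summary above describes ODAL1 only at a high level, so the following is a minimal Python sketch of one way such a one-shot estimator could be built, not the authors' implementation. It assumes a surrogate-likelihood form in which the local log-likelihood is tilted by the difference between the all-site gradient and the local gradient, both evaluated at a local initial estimate. The function names, the plain gradient-ascent optimizer, the simulated two-parameter data, and the assumption that site sample sizes are known to the local site are all illustrative choices that do not come from this record.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def avg_gradient(beta, X, y):
    """Gradient of one site's average logistic log-likelihood at beta."""
    g = [0.0] * len(beta)
    for xi, yi in zip(X, y):
        resid = yi - sigmoid(sum(b * v for b, v in zip(beta, xi)))
        for j, v in enumerate(xi):
            g[j] += resid * v
    return [v / len(y) for v in g]


def maximize(grad_fn, p, steps=500, lr=1.0):
    """Plain gradient ascent (an assumption; adequate for this small toy problem)."""
    beta = [0.0] * p
    for _ in range(steps):
        beta = [b + lr * g for b, g in zip(beta, grad_fn(beta))]
    return beta


def odal1(local, remote):
    """Hypothetical one-shot estimate: local data plus one gradient per remote site."""
    X1, y1 = local
    p = len(X1[0])
    # Step 1: initial estimate from the local site only.
    beta_bar = maximize(lambda b: avg_gradient(b, X1, y1), p)
    # Step 2: every site evaluates its gradient at beta_bar; a remote site
    # would send only these p numbers (sample sizes assumed known).
    g_local = avg_gradient(beta_bar, X1, y1)
    n_total = len(y1)
    g_all = [len(y1) * v for v in g_local]
    for Xk, yk in remote:
        gk = avg_gradient(beta_bar, Xk, yk)  # computed at the remote site
        g_all = [a + len(yk) * v for a, v in zip(g_all, gk)]
        n_total += len(yk)
    g_all = [v / n_total for v in g_all]
    # Step 3: maximize the assumed surrogate L1(beta) + <g_all - g_local, beta>.
    shift = [a - b for a, b in zip(g_all, g_local)]
    return maximize(
        lambda b: [g + s for g, s in zip(avg_gradient(b, X1, y1), shift)], p
    )


def simulate_site(n, beta, rng):
    """Toy site: intercept plus one standard-normal covariate."""
    X = [[1.0, rng.gauss(0.0, 1.0)] for _ in range(n)]
    y = []
    for x in X:
        prob = sigmoid(sum(b * v for b, v in zip(beta, x)))
        y.append(1.0 if rng.random() < prob else 0.0)
    return X, y


rng = random.Random(0)
sites = [simulate_site(200, [-0.5, 1.0], rng) for _ in range(3)]
estimate = odal1(sites[0], sites[1:])

# Pooled fit on all patient-level data, used here only as the comparison target.
pooled_X = [x for X, _ in sites for x in X]
pooled_y = [v for _, y in sites for v in y]
pooled = maximize(lambda b: avg_gradient(b, pooled_X, pooled_y), 2)
print("ODAL1 sketch:", estimate)
print("pooled fit:  ", pooled)
```

In this sketch each remote site contributes a single gradient vector of length p, which matches the communication cost the abstract states for ODAL1. A second-order variant in the spirit of ODAL2 would additionally need a p×p matrix from each site, giving the p×p+p numbers the abstract mentions.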
| Published in: | Journal of the American Medical Informatics Association : JAMIA Vol. 27; no. 3; p. 376 |
|---|---|
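| Main Authors: | Duan, Rui; Boland, Mary Regina; Liu, Zixuan; Liu, Yue; Chang, Howard H; Xu, Hua; Chu, Haitao; Schmid, Christopher H; Forrest, Christopher B; Holmes, John H; Schuemie, Martijn J; Berlin, Jesse A; Moore, Jason H; Chen, Yong |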
| Format: | Journal Article |
| Language: | English |
| Published: | England, 01.03.2020 |
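| Subjects: | Algorithms; Computer Simulation; Confidentiality; Data Analysis; Datasets as Topic; Drug-Related Side Effects and Adverse Reactions; Electronic Health Records; Female; Fetal Death - etiology; Humans; Logistic Models; Odds Ratio; Pregnancy |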
| ISSN: | 1527-974X |
| Abstract | Objectives: We propose a one-shot, privacy-preserving distributed algorithm to perform logistic regression (ODAL) across multiple clinical sites.
Materials and Methods: ODAL uses the information from the local site (where the patient-level data are accessible) and incorporates the first-order (ODAL1) and second-order (ODAL2) gradients of the likelihood function from other sites to construct an estimator without requiring iterative communication across sites or transferring patient-level data. We evaluated ODAL via extensive simulation studies and an application to a dataset from the University of Pennsylvania Health System. Estimation accuracy was evaluated by comparing ODAL with the estimator based on the combined individual participant data, or pooled data (ie, the gold standard).
Results: Our simulation studies revealed that the relative estimation bias of ODAL1 compared with the pooled estimates was <3%, and the ratio of standard errors was <1.25 for all scenarios. ODAL2 achieved higher accuracy (with relative bias <0.1% and ratio of standard errors <1.05). In the real data analysis, we investigated the associations of 100 medications with fetal loss during pregnancy. ODAL1 provided estimates with relative bias <10% for 85% of medications, and ODAL2 had relative bias <10% for 99% of medications. For communication cost, ODAL1 requires transferring p numbers from each site to the local site and ODAL2 requires transferring (p×p+p) numbers from each site to the local site, where p is the number of parameters in the regression model.
Conclusions: This study demonstrates that ODAL is privacy-preserving and communication-efficient with small bias and high statistical efficiency. |
|---|---|
| Author | Forrest, Christopher B; Chu, Haitao; Chang, Howard H; Xu, Hua; Berlin, Jesse A; Moore, Jason H; Duan, Rui; Schuemie, Martijn J; Holmes, John H; Liu, Yue; Boland, Mary Regina; Schmid, Christopher H; Chen, Yong; Liu, Zixuan |
| Author_xml | 1. Duan, Rui (Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, Pennsylvania, USA); 2. Boland, Mary Regina (Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, Pennsylvania, USA); 3. Liu, Zixuan (Department of Electrical Engineering, Stanford University, Stanford, California, USA); 4. Liu, Yue (Department of Statistics, Harvard University, Cambridge, Massachusetts, USA); 5. Chang, Howard H (Department of Biostatistics and Bioinformatics, Emory University, Atlanta, Georgia, USA); 6. Xu, Hua (School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, Texas, USA); 7. Chu, Haitao (Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, USA); 8. Schmid, Christopher H (Department of Biostatistics, Brown University, Providence, Rhode Island, USA); 9. Forrest, Christopher B (Division of General Pediatrics, Children's Hospital of Philadelphia, Philadelphia, Pennsylvania, USA); 10. Holmes, John H (Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, Pennsylvania, USA); 11. Schuemie, Martijn J (Janssen Research and Development LLC, Titusville, New Jersey, USA); 12. Berlin, Jesse A (Janssen Research and Development LLC, Titusville, New Jersey, USA); 13. Moore, Jason H (Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, Pennsylvania, USA); 14. Chen, Yong (Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, Pennsylvania, USA) |
| BackLink | https://www.ncbi.nlm.nih.gov/pubmed/31816040 (View this record in MEDLINE/PubMed) |
| ContentType | Journal Article |
| Copyright | The Author(s) 2019. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For permissions, please email: journals.permissions@oup.com. |
| DOI | 10.1093/jamia/ocz199 |
| Discipline | Medicine |
| EISSN | 1527-974X |
| ExternalDocumentID | 31816040 |
| Genre | Journal Article; Research Support, N.I.H., Extramural |
| GrantInformation_xml | NIMH NIH HHS: P50 MH113840; NIAID NIH HHS: R01 AI130460; NIAID NIH HHS: R01 AI116794; NCI NIH HHS: P30 CA077598; NCATS NIH HHS: UL1 TR002494; NLM NIH HHS: R01 LM012607; NLM NIH HHS: R01 LM009012 |
| ISICitedReferencesCount | 70 |
| ISSN | 1527-974X |
| IsDoiOpenAccess | false |
| IsOpenAccess | true |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 3 |
| Keywords | distributed algorithm; learning health system; logistic regression; electronic health record |
| Language | English |
| OpenAccessLink | https://academic.oup.com/jamia/article-pdf/27/3/376/34152548/ocz199.pdf |
| PMID | 31816040 |
| PublicationDate | 2020-03-01 |
| PublicationPlace | England |
| PublicationTitle | Journal of the American Medical Informatics Association : JAMIA |
| PublicationTitleAlternate | J Am Med Inform Assoc |
| PublicationYear | 2020 |
| StartPage | 376 |
| SubjectTerms | Algorithms; Computer Simulation; Confidentiality; Data Analysis; Datasets as Topic; Drug-Related Side Effects and Adverse Reactions; Electronic Health Records; Female; Fetal Death - etiology; Humans; Logistic Models; Odds Ratio; Pregnancy |
| Title | Learning from electronic health records across multiple sites: A communication-efficient and privacy-preserving distributed algorithm |
| URI | https://www.ncbi.nlm.nih.gov/pubmed/31816040 https://www.proquest.com/docview/2323476665 |
| Volume | 27 |