Generating sequential electronic health records using dual adversarial autoencoder
| Published in: | Journal of the American Medical Informatics Association : JAMIA, Volume 27, Issue 9, p. 1411 |
|---|---|
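| Main authors: | Lee, Dongha; Yu, Hwanjo; Jiang, Xiaoqian; Rogith, Deevakar; Gudala, Meghana; Tejani, Mubeen; Zhang, Qiuchen; Xiong, Li |
| Format: | Journal Article |
| Language: | English |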
| Published: | England, 2020-09-01 |
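| Subjects: | Computer Simulation; Confidentiality; Electronic Health Records; Humans; Machine Learning; Neural Networks, Computer; Software |
| ISSN: | 1527-974X |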
| Abstract | Objective: Recent studies on electronic health records (EHRs) have begun to learn deep generative models and synthesize large volumes of realistic records in order to address the significant privacy issues surrounding EHRs. However, most of them focus only on structured records of patients' independent visits rather than on chronological clinical records. In this article, we aim to learn and synthesize realistic sequences of EHRs based on the generative autoencoder.
Materials and Methods: We propose a dual adversarial autoencoder (DAAE), which learns set-valued sequences of medical entities by combining a recurrent autoencoder with 2 generative adversarial networks (GANs). DAAE improves the mode coverage and quality of generated sequences by adversarially learning both the continuous latent distribution and the discrete data distribution. Using the MIMIC-III (Medical Information Mart for Intensive Care-III) and UT Physicians clinical databases, we evaluated the performance of DAAE in terms of predictive modeling, plausibility, and privacy preservation.
Results: Our generated sequences of EHRs showed performance comparable to real data on a predictive modeling task and achieved the best score among all baseline models in a plausibility evaluation conducted by medical experts. In addition, differentially private optimization of our model makes it possible to generate synthetic sequences without increasing the privacy leakage of patients' data.
Conclusions: DAAE can effectively synthesize sequential EHRs by addressing the main challenges of the task: the synthetic records should be realistic enough to be indistinguishable from real records, and they should cover all the training patients so as to reproduce the performance of specific downstream tasks. |
|---|---|
| Author | Yu, Hwanjo; Rogith, Deevakar; Xiong, Li; Gudala, Meghana; Tejani, Mubeen; Zhang, Qiuchen; Jiang, Xiaoqian; Lee, Dongha |
| Author_xml | 1. Lee, Dongha (Department of Computer Science and Engineering, Pohang University of Science and Technology, Pohang, South Korea); 2. Yu, Hwanjo (Department of Computer Science and Engineering, Pohang University of Science and Technology, Pohang, South Korea); 3. Jiang, Xiaoqian (School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, Texas, USA); 4. Rogith, Deevakar (School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, Texas, USA); 5. Gudala, Meghana (School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, Texas, USA); 6. Tejani, Mubeen (School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, Texas, USA); 7. Zhang, Qiuchen (Department of Computer Science, Emory University, Atlanta, Georgia, USA); 8. Xiong, Li (Department of Computer Science, Emory University, Atlanta, Georgia, USA) |
| BackLink | https://www.ncbi.nlm.nih.gov/pubmed/32989459$$D View this record in MEDLINE/PubMed |
| CitedBy_id | crossref_primary_10_1055_s_0042_1760247 crossref_primary_10_1093_jamia_ocad166 crossref_primary_10_1093_jamia_ocac131 crossref_primary_10_3389_fgene_2025_1451290 crossref_primary_10_1038_s41746_024_01409_w crossref_primary_10_2196_53008 crossref_primary_10_1016_j_cosrev_2023_100546 crossref_primary_10_1038_s41746_023_00834_7 crossref_primary_10_1145_3636424 crossref_primary_10_1186_s12874_024_02304_4 crossref_primary_10_3390_biomedicines11061749 crossref_primary_10_1016_j_ijmedinf_2024_105413 crossref_primary_10_1016_j_neucom_2022_04_053 crossref_primary_10_2196_68830 crossref_primary_10_1016_j_jbi_2021_103977 crossref_primary_10_1109_TKDE_2023_3310909 crossref_primary_10_1007_s41060_024_00653_3 crossref_primary_10_1109_ACCESS_2024_3523330 crossref_primary_10_1038_s41598_022_07545_1 crossref_primary_10_1007_s10726_024_09902_z crossref_primary_10_1093_jamia_ocab135 crossref_primary_10_1093_jamia_ocaa187 crossref_primary_10_1145_3614425 crossref_primary_10_1016_j_cmpb_2024_108571 crossref_primary_10_1007_s10462_023_10624_y crossref_primary_10_1093_jamia_ocaf082 crossref_primary_10_1016_j_compbiomed_2023_107655 crossref_primary_10_1016_j_neucom_2024_128253 crossref_primary_10_1145_3459992 crossref_primary_10_3390_s23146571 |
| ContentType | Journal Article |
| Copyright | The Author(s) 2020. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For permissions, please email: journals.permissions@oup.com. |
| DBID | CGR CUY CVF ECM EIF NPM 7X8 |
| DOI | 10.1093/jamia/ocaa119 |
| DatabaseName | Medline MEDLINE MEDLINE (Ovid) MEDLINE MEDLINE PubMed MEDLINE - Academic |
| DatabaseTitle | MEDLINE Medline Complete MEDLINE with Full Text PubMed MEDLINE (Ovid) MEDLINE - Academic |
| DatabaseTitleList | MEDLINE - Academic MEDLINE |
| Database_xml | – sequence: 1 dbid: NPM name: PubMed url: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed sourceTypes: Index Database – sequence: 2 dbid: 7X8 name: MEDLINE - Academic url: https://search.proquest.com/medline sourceTypes: Aggregation Database |
| DeliveryMethod | no_fulltext_linktorsrc |
| Discipline | Medicine |
| EISSN | 1527-974X |
| ExternalDocumentID | 32989459 |
| Genre | Research Support, Non-U.S. Gov't; Journal Article; Research Support, N.I.H., Extramural |
| GrantInformation_xml | NIGMS NIH HHS, grant R01 GM114612; NIGMS NIH HHS, grant R01 GM118574; NCATS NIH HHS, grant U01 TR002062 |
| ID | FETCH-LOGICAL-c420t-239ae025cfdc5437129b18e352bf6dc3271f8a06d6a9eef8bee6cad78a26957a2 |
| IEDL.DBID | 7X8 |
| ISICitedReferencesCount | 40 |
| ISICitedReferencesURI | http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000593113300010&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| ISSN | 1527-974X |
| IngestDate | Sat Sep 27 21:25:56 EDT 2025 Mon Jul 21 05:51:37 EDT 2025 |
| IsDoiOpenAccess | false |
| IsOpenAccess | true |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 9 |
| Keywords | sequential data generation; generative autoencoder; generative adversarial networks (GANs); electronic health records (EHRs); differential privacy |
| Language | English |
| License | The Author(s) 2020. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For permissions, please email: journals.permissions@oup.com. |
| LinkModel | DirectLink |
| MergedId | FETCHMERGED-LOGICAL-c420t-239ae025cfdc5437129b18e352bf6dc3271f8a06d6a9eef8bee6cad78a26957a2 |
| Notes | ObjectType-Article-1 SourceType-Scholarly Journals-1 ObjectType-Feature-2 content type line 23 |
| OpenAccessLink | https://www.ncbi.nlm.nih.gov/pmc/articles/7647348 |
| PMID | 32989459 |
| PQID | 2447311738 |
| PQPubID | 23479 |
| ParticipantIDs | proquest_miscellaneous_2447311738 pubmed_primary_32989459 |
| PublicationCentury | 2000 |
| PublicationDate | 2020-09-01 |
| PublicationDateYYYYMMDD | 2020-09-01 |
| PublicationDate_xml | – month: 09 year: 2020 text: 2020-09-01 day: 01 |
| PublicationDecade | 2020 |
| PublicationPlace | England |
| PublicationPlace_xml | – name: England |
| PublicationTitle | Journal of the American Medical Informatics Association : JAMIA |
| PublicationTitleAlternate | J Am Med Inform Assoc |
| PublicationYear | 2020 |
| SSID | ssj0016235 |
| Score | 2.5553613 |
| SourceID | proquest pubmed |
| SourceType | Aggregation Database Index Database |
| StartPage | 1411 |
| SubjectTerms | Computer Simulation; Confidentiality; Electronic Health Records; Humans; Machine Learning; Neural Networks, Computer; Software |
| Title | Generating sequential electronic health records using dual adversarial autoencoder |
| URI | https://www.ncbi.nlm.nih.gov/pubmed/32989459 https://www.proquest.com/docview/2447311738 |
| Volume | 27 |
| WOSCitedRecordID | wos000593113300010&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| linkProvider | ProQuest |
| openUrl | ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Generating+sequential+electronic+health+records+using+dual+adversarial+autoencoder&rft.jtitle=Journal+of+the+American+Medical+Informatics+Association+%3A+JAMIA&rft.au=Lee%2C+Dongha&rft.au=Yu%2C+Hwanjo&rft.au=Jiang%2C+Xiaoqian&rft.au=Rogith%2C+Deevakar&rft.date=2020-09-01&rft.eissn=1527-974X&rft.volume=27&rft.issue=9&rft.spage=1411&rft_id=info:doi/10.1093%2Fjamia%2Focaa119&rft_id=info%3Apmid%2F32989459&rft_id=info%3Apmid%2F32989459&rft.externalDocID=32989459 |