Generating sequential electronic health records using dual adversarial autoencoder

Bibliographic details
Published in: Journal of the American Medical Informatics Association : JAMIA, Volume 27, Issue 9, p. 1411
Main authors: Lee, Dongha; Yu, Hwanjo; Jiang, Xiaoqian; Rogith, Deevakar; Gudala, Meghana; Tejani, Mubeen; Zhang, Qiuchen; Xiong, Li
Format: Journal Article
Language: English
Published: England, 2020-09-01
ISSN: 1527-974X
Abstract Recent studies on electronic health records (EHRs) have started to learn deep generative models and synthesize large amounts of realistic records, in order to address the significant privacy issues surrounding EHRs. However, most of them focus only on structured records of patients' independent visits, rather than on chronological clinical records. In this article, we aim to learn and synthesize realistic sequences of EHRs based on a generative autoencoder. We propose a dual adversarial autoencoder (DAAE), which learns set-valued sequences of medical entities by combining a recurrent autoencoder with 2 generative adversarial networks (GANs). DAAE improves the mode coverage and quality of generated sequences by adversarially learning both the continuous latent distribution and the discrete data distribution. Using the MIMIC-III (Medical Information Mart for Intensive Care-III) and UT Physicians clinical databases, we evaluated the performance of DAAE in terms of predictive modeling, plausibility, and privacy preservation. Our generated sequences of EHRs showed performance comparable to real data on a predictive modeling task, and outscored all baseline models in a plausibility evaluation conducted by medical experts. In addition, differentially private optimization of our model enables the generation of synthetic sequences without increasing the privacy leakage of patients' data. DAAE can effectively synthesize sequential EHRs by addressing the main challenges of this task: the synthetic records should be realistic enough not to be distinguished from the real records, and they should cover all the training patients to reproduce the performance of specific downstream tasks.
AbstractList OBJECTIVE: Recent studies on electronic health records (EHRs) have started to learn deep generative models and synthesize large amounts of realistic records, in order to address the significant privacy issues surrounding EHRs. However, most of them focus only on structured records of patients' independent visits, rather than on chronological clinical records. In this article, we aim to learn and synthesize realistic sequences of EHRs based on a generative autoencoder.
MATERIALS AND METHODS: We propose a dual adversarial autoencoder (DAAE), which learns set-valued sequences of medical entities by combining a recurrent autoencoder with 2 generative adversarial networks (GANs). DAAE improves the mode coverage and quality of generated sequences by adversarially learning both the continuous latent distribution and the discrete data distribution. Using the MIMIC-III (Medical Information Mart for Intensive Care-III) and UT Physicians clinical databases, we evaluated the performance of DAAE in terms of predictive modeling, plausibility, and privacy preservation.
RESULTS: Our generated sequences of EHRs showed performance comparable to real data on a predictive modeling task, and outscored all baseline models in a plausibility evaluation conducted by medical experts. In addition, differentially private optimization of our model enables the generation of synthetic sequences without increasing the privacy leakage of patients' data.
CONCLUSIONS: DAAE can effectively synthesize sequential EHRs by addressing the main challenges of this task: the synthetic records should be realistic enough not to be distinguished from the real records, and they should cover all the training patients to reproduce the performance of specific downstream tasks.
Author Yu, Hwanjo
Rogith, Deevakar
Xiong, Li
Gudala, Meghana
Tejani, Mubeen
Zhang, Qiuchen
Jiang, Xiaoqian
Lee, Dongha
Author_xml – sequence: 1
  givenname: Dongha
  surname: Lee
  fullname: Lee, Dongha
  organization: Department of Computer Science and Engineering, Pohang University of Science and Technology, Pohang, South Korea
– sequence: 2
  givenname: Hwanjo
  surname: Yu
  fullname: Yu, Hwanjo
  organization: Department of Computer Science and Engineering, Pohang University of Science and Technology, Pohang, South Korea
– sequence: 3
  givenname: Xiaoqian
  surname: Jiang
  fullname: Jiang, Xiaoqian
  organization: School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, Texas, USA
– sequence: 4
  givenname: Deevakar
  surname: Rogith
  fullname: Rogith, Deevakar
  organization: School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, Texas, USA
– sequence: 5
  givenname: Meghana
  surname: Gudala
  fullname: Gudala, Meghana
  organization: School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, Texas, USA
– sequence: 6
  givenname: Mubeen
  surname: Tejani
  fullname: Tejani, Mubeen
  organization: School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, Texas, USA
– sequence: 7
  givenname: Qiuchen
  surname: Zhang
  fullname: Zhang, Qiuchen
  organization: Department of Computer Science, Emory University, Atlanta, Georgia, USA
– sequence: 8
  givenname: Li
  surname: Xiong
  fullname: Xiong, Li
  organization: Department of Computer Science, Emory University, Atlanta, Georgia, USA
BackLink https://www.ncbi.nlm.nih.gov/pubmed/32989459$$D View this record in MEDLINE/PubMed
CitedBy_id crossref_primary_10_1055_s_0042_1760247
crossref_primary_10_1093_jamia_ocad166
crossref_primary_10_1093_jamia_ocac131
crossref_primary_10_3389_fgene_2025_1451290
crossref_primary_10_1038_s41746_024_01409_w
crossref_primary_10_2196_53008
crossref_primary_10_1016_j_cosrev_2023_100546
crossref_primary_10_1038_s41746_023_00834_7
crossref_primary_10_1145_3636424
crossref_primary_10_1186_s12874_024_02304_4
crossref_primary_10_3390_biomedicines11061749
crossref_primary_10_1016_j_ijmedinf_2024_105413
crossref_primary_10_1016_j_neucom_2022_04_053
crossref_primary_10_2196_68830
crossref_primary_10_1016_j_jbi_2021_103977
crossref_primary_10_1109_TKDE_2023_3310909
crossref_primary_10_1007_s41060_024_00653_3
crossref_primary_10_1109_ACCESS_2024_3523330
crossref_primary_10_1038_s41598_022_07545_1
crossref_primary_10_1007_s10726_024_09902_z
crossref_primary_10_1093_jamia_ocab135
crossref_primary_10_1093_jamia_ocaa187
crossref_primary_10_1145_3614425
crossref_primary_10_1016_j_cmpb_2024_108571
crossref_primary_10_1007_s10462_023_10624_y
crossref_primary_10_1093_jamia_ocaf082
crossref_primary_10_1016_j_compbiomed_2023_107655
crossref_primary_10_1016_j_neucom_2024_128253
crossref_primary_10_1145_3459992
crossref_primary_10_3390_s23146571
ContentType Journal Article
Copyright The Author(s) 2020. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For permissions, please email: journals.permissions@oup.com.
DOI 10.1093/jamia/ocaa119
DatabaseName Medline
MEDLINE
MEDLINE (Ovid)
MEDLINE
MEDLINE
PubMed
MEDLINE - Academic
DatabaseTitle MEDLINE
Medline Complete
MEDLINE with Full Text
PubMed
MEDLINE (Ovid)
MEDLINE - Academic
DatabaseTitleList MEDLINE - Academic
MEDLINE
Database_xml – sequence: 1
  dbid: NPM
  name: PubMed
  url: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed
  sourceTypes: Index Database
– sequence: 2
  dbid: 7X8
  name: MEDLINE - Academic
  url: https://search.proquest.com/medline
  sourceTypes: Aggregation Database
DeliveryMethod no_fulltext_linktorsrc
Discipline Medicine
EISSN 1527-974X
ExternalDocumentID 32989459
Genre Research Support, Non-U.S. Gov't
Journal Article
Research Support, N.I.H., Extramural
GrantInformation_xml – fundername: NIGMS NIH HHS
  grantid: R01 GM114612
– fundername: NIGMS NIH HHS
  grantid: R01 GM118574
– fundername: NCATS NIH HHS
  grantid: U01 TR002062
ISICitedReferencesCount 40
ISSN 1527-974X
IsDoiOpenAccess false
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 9
Keywords sequential data generation
generative autoencoder
generative adversarial networks (GANs)
electronic health records (EHRs)
differential privacy
Language English
LinkModel DirectLink
OpenAccessLink https://www.ncbi.nlm.nih.gov/pmc/articles/7647348
PMID 32989459
PQID 2447311738
PQPubID 23479
ParticipantIDs proquest_miscellaneous_2447311738
pubmed_primary_32989459
PublicationCentury 2000
PublicationDate 2020-09-01
PublicationDateYYYYMMDD 2020-09-01
PublicationDate_xml – month: 09
  year: 2020
  text: 2020-09-01
  day: 01
PublicationDecade 2020
PublicationPlace England
PublicationPlace_xml – name: England
PublicationTitle Journal of the American Medical Informatics Association : JAMIA
PublicationTitleAlternate J Am Med Inform Assoc
PublicationYear 2020
SourceID proquest
pubmed
SourceType Aggregation Database
Index Database
StartPage 1411
SubjectTerms Computer Simulation
Confidentiality
Electronic Health Records
Humans
Machine Learning
Neural Networks, Computer
Software
Title Generating sequential electronic health records using dual adversarial autoencoder
URI https://www.ncbi.nlm.nih.gov/pubmed/32989459
https://www.proquest.com/docview/2447311738
Volume 27
WOSCitedRecordID wos000593113300010&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D