Privacy-, linguistic-, and information-preserving synthesis of clinical documentation through generative agents

Bibliographic details
Title: Privacy-, linguistic-, and information-preserving synthesis of clinical documentation through generative agents
Authors: van Velzen, M.; van der Willigen, R.F.; et al.
Contributors: Kenniscentrum Zorginnovatie, Hogeschool Rotterdam
Source: Frontiers in Artificial Intelligence, vol. 8 (2025). https://doi.org/10.3389/frai.2025.1644084
Publisher information: Hogeschool Rotterdam, 2025; Frontiers.
Publication year: 2025
Subjects: journal article, healthcare, data synthesis, privacy, generative agents, linguistics, information theory, synthetic health data generation (SHDG), clinical natural language processing (NLP)
Description: The widespread adoption of generative agents (GAs) is reshaping the healthcare landscape. Nonetheless, broad utilization is impeded by restricted access to high-quality, interoperable clinical documentation from electronic health records (EHRs) due to persistent legal, ethical, and technical barriers. Synthetic health data generation (SHDG), leveraging pre-trained large language models (LLMs) instantiated as GAs, could offer a practical solution by creating synthetic patient information that mimics genuine EHRs. The use of LLMs, however, is not without issues; significant concerns remain regarding privacy, potential bias propagation, the risk of generating inaccurate or misleading content, and the lack of transparency in how these models make decisions. We therefore propose a privacy-, linguistic-, and information-preserving SHDG protocol that employs multiple context-aware, role-specific GAs. Guided by targeted prompting and authentic EHRs, which serve as structural and linguistic templates, role-specific GAs can, in principle, operate collaboratively through multi-turn interactions. We theorized that utilizing GAs in this fashion permits LLMs not only to produce synthetic EHRs that are accurate, consistent, and contextually appropriate, but also to expose the underlying decision-making process. To test this hypothesis, we developed a no-code GA-driven SHDG workflow as a proof of concept, which was implemented within a predefined, multi-layered data science infrastructure (DSI) stack, an integrated ensemble of software and hardware designed to support rapid prototyping and deployment. The DSI stack streamlines implementation for healthcare professionals, improving accessibility, usability, and cybersecurity.
To deploy and validate GA-assisted workflows, we implemented a fully automated SHDG evaluation framework, co-developed with GenAI technology, which holistically compares the informational and linguistic features of synthetic, anonymized, and real EHRs at both the document and corpus levels. Our findings highlight that SHDG implemented through GAs offers a scalable, transparent, and reproducible methodology for unlocking the potential of clinical documentation to drive innovation, accelerate research, and advance the development of learning health systems. The source code, synthetic datasets, toolchains, and prompts created for this study can be accessed at the GitHub repository: github.com/HR-DataLab-Healthcare/RESEARCH_SUPPORT/tree/main/PROJECTS/Generative_Agent_based_Data-Synthesis.
Document type: article
Language: English
Access URL: https://surfsharekit.nl/public/41ff936d-138f-4945-a9cf-df6f0054faad
https://surfsharekit.nl/link/095a8a17-5c7f-4b6b-9e40-a45799fafa34
Availability: http://www.hbo-kennisbank.nl/en/page/hborecord.view/?uploadId=sharekit_hr:oai:surfsharekit.nl:41ff936d-138f-4945-a9cf-df6f0054faad
Accession number: edshbo.sharekit.hr.oai.surfsharekit.nl.41ff936d.138f.4945.a9cf.df6f0054faad
Database: HBO Kennisbank
Publication type: Academic Journal
Publication date: 16 September 2025
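Note on the evaluation described in the abstract: the record says the framework compares informational and linguistic features of synthetic, anonymized, and real EHRs at the document and corpus levels, but it does not name the metrics. The following is only a minimal sketch of what such a comparison could look like, assuming token-level Shannon entropy as a document-level informational feature and Jensen-Shannon divergence between token distributions as a corpus-level one. The tokenizer, the function names, and the two tiny example corpora are hypothetical placeholders, not taken from the study or its repository.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase alphanumeric tokenizer (an illustrative choice, not the study's)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def token_distribution(docs):
    """Relative token frequencies pooled over a list of documents."""
    counts = Counter(tok for doc in docs for tok in tokenize(doc))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {tok: n / total for tok, n in counts.items()}


def shannon_entropy(dist):
    """Shannon entropy, in bits, of a discrete distribution."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: 0 for identical, 1 for disjoint distributions."""
    vocab = set(p) | set(q)
    mix = {t: 0.5 * (p.get(t, 0.0) + q.get(t, 0.0)) for t in vocab}

    def kl_to_mix(a):
        # Kullback-Leibler divergence from a to the mixture distribution.
        return sum(a[t] * math.log2(a[t] / mix[t]) for t in a if a[t] > 0)

    return 0.5 * kl_to_mix(p) + 0.5 * kl_to_mix(q)


if __name__ == "__main__":
    # Hypothetical placeholder notes; not real or study data.
    real_docs = [
        "patient reports knee pain after running",
        "knee pain improved after exercise therapy",
    ]
    synthetic_docs = [
        "patient reports shoulder pain after swimming",
        "shoulder pain improved after exercise therapy",
    ]

    # Document level: entropy of each note's own token distribution.
    for label, docs in (("real", real_docs), ("synthetic", synthetic_docs)):
        for doc in docs:
            h = shannon_entropy(token_distribution([doc]))
            print(f"{label:9s} entropy={h:.3f} bits  {doc!r}")

    # Corpus level: divergence between the pooled token distributions.
    jsd = js_divergence(
        token_distribution(real_docs), token_distribution(synthetic_docs)
    )
    print(f"corpus-level Jensen-Shannon divergence: {jsd:.3f} bits")
```

A low divergence would suggest the synthetic corpus uses vocabulary much like the real one, while similar per-document entropies would suggest comparable lexical variety. Neither measure says anything about privacy, which would need a separate check.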