Privacy-, linguistic-, and information-preserving synthesis of clinical documentation through generative agents
| Title: | Privacy-, linguistic-, and information-preserving synthesis of clinical documentation through generative agents |
|---|---|
| Authors: | van Velzen, M.; van der Willigen, R.F.; et al. |
| Contributors: | Kenniscentrum Zorginnovatie, Hogeschool Rotterdam |
| Source: | Frontiers in Artificial Intelligence, vol. 8 (2025). https://doi.org/10.3389/frai.2025.1644084 |
| Publisher information: | Hogeschool Rotterdam, 2025. Frontiers. |
| Publication year: | 2025 |
| Subjects: | journal article, healthcare, data synthesis, privacy, generative agents, linguistics, information theory, synthetic health data generation (SHDG), clinical natural language processing (NLP) |
| Description: | The widespread adoption of generative agents (GAs) is reshaping the healthcare landscape. Nonetheless, broad utilization is impeded by restricted access to high-quality, interoperable clinical documentation from electronic health records (EHRs) due to persistent legal, ethical, and technical barriers. Synthetic health data generation (SHDG), leveraging pre-trained large language models (LLMs) instantiated as GAs, could offer a practical solution by creating synthetic patient information that mimics genuine EHRs. The use of LLMs, however, is not without issues; significant concerns remain regarding privacy, potential bias propagation, the risk of generating inaccurate or misleading content, and the lack of transparency in how these models make decisions. We therefore propose a privacy-, linguistic-, and information-preserving SHDG protocol that employs multiple context-aware, role-specific GAs. Guided by targeted prompting and authentic EHRs—serving as structural and linguistic templates—role-specific GAs can, in principle, operate collaboratively through multi-turn interactions. We theorized that utilizing GAs in this fashion permits LLMs not only to produce synthetic EHRs that are accurate, consistent, and contextually appropriate, but also to expose the underlying decision-making process. To test this hypothesis, we developed a no-code GA-driven SHDG workflow as a proof of concept, which was implemented within a predefined, multi-layered data science infrastructure (DSI) stack—an integrated ensemble of software and hardware designed to support rapid prototyping and deployment. The DSI stack streamlines implementation for healthcare professionals, improving accessibility, usability, and cybersecurity. To deploy and validate GA-assisted workflows, we implemented a fully automated SHDG evaluation framework—co-developed with GenAI technology—which holistically compares the informational and linguistic features of synthetic, anonymized, and real EHRs at both the document and corpus levels. Our findings highlight that SHDG implemented through GAs offers a scalable, transparent, and reproducible methodology for unlocking the potential of clinical documentation to drive innovation, accelerate research, and advance the development of learning health systems. The source code, synthetic datasets, toolchains and prompts created for this study can be accessed at the GitHub repository: github.com/HR-DataLab-Healthcare/RESEARCH_SUPPORT/tree/main/PROJECTS/Generative_Agent_based_Data-Synthesis. |
| Document type: | article |
| Language: | English |
| Access URL: | https://surfsharekit.nl/public/41ff936d-138f-4945-a9cf-df6f0054faad https://surfsharekit.nl/link/095a8a17-5c7f-4b6b-9e40-a45799fafa34 |
| Availability: | http://www.hbo-kennisbank.nl/en/page/hborecord.view/?uploadId=sharekit_hr:oai:surfsharekit.nl:41ff936d-138f-4945-a9cf-df6f0054faad |
| Accession number: | edshbo.sharekit.hr.oai.surfsharekit.nl.41ff936d.138f.4945.a9cf.df6f0054faad |
| Database: | HBO Kennisbank |
| Publication date: | 16 September 2025 |
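
The description mentions an evaluation framework that compares informational and linguistic features of synthetic, anonymized, and real EHRs at both the document and corpus levels. The record does not say which measures are used, so the following is only an illustrative sketch under assumed choices: Shannon entropy of the token distribution as an informational feature and type-token ratio as a linguistic one. The function names, the tokenizer, and the toy notes are hypothetical and are not taken from the study's repository.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase alphanumeric tokenizer (a simplifying assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def shannon_entropy(tokens):
    """Shannon entropy, in bits per token, of the empirical token distribution."""
    total = len(tokens)
    if total == 0:
        return 0.0
    counts = Counter(tokens)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def type_token_ratio(tokens):
    """Distinct tokens divided by total tokens; 0.0 for empty input."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def features(tokens):
    """Bundle the assumed informational and linguistic measures."""
    return {
        "n_tokens": len(tokens),
        "entropy_bits": shannon_entropy(tokens),
        "type_token_ratio": type_token_ratio(tokens),
    }


def document_features(text):
    """Document-level features for a single note."""
    return features(tokenize(text))


def corpus_features(documents):
    """Corpus-level features computed over the pooled tokens of all notes."""
    pooled = [tok for doc in documents for tok in tokenize(doc)]
    return features(pooled)


if __name__ == "__main__":
    # Invented toy notes, for illustration only (not real or study data).
    corpora = {
        "real": ["Patient reports knee pain after running.", "Knee pain improved."],
        "synthetic": ["Client reports shoulder pain after lifting.", "Shoulder pain improved."],
    }
    for name, docs in corpora.items():
        print(name, "corpus:", corpus_features(docs))
        for doc in docs:
            print("  document:", document_features(doc))
```

Comparing the same measures across the real, anonymized, and synthetic corpora would show whether the synthetic text is similarly varied and information-dense. A fuller evaluation would likely need clinically meaningful features as well.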