PreCurious: How Innocent Pre-Trained Language Models Turn into Privacy Traps

Bibliographic Details
Published in: Proceedings of the ... ACM Conference on Computer and Communications Security, Vol. 2024, p. 3511
Main Authors: Liu, Ruixuan; Wang, Tianhao; Cao, Yang; Xiong, Li
Format: Journal Article
Language: English
Published: United States, 01.10.2024
Subjects:
ISSN: 1543-7221
Abstract The pre-training and fine-tuning paradigm has demonstrated its effectiveness and has become the standard approach for tailoring language models to various tasks. Currently, community-based platforms offer easy access to various pre-trained models, as anyone can publish without strict validation processes. However, a released pre-trained model can be a privacy trap for fine-tuning datasets if it is carefully designed. In this work, we propose the PreCurious framework to reveal a new attack surface in which the attacker releases the pre-trained model and obtains black-box access to the final fine-tuned model. PreCurious aims to escalate the general privacy risk of both membership inference and data extraction on the fine-tuning dataset. The key intuition behind PreCurious is to manipulate the memorization stage of the pre-trained model and guide fine-tuning with a seemingly legitimate configuration. While empirical and theoretical evidence suggests that parameter-efficient and differentially private fine-tuning techniques can defend against privacy attacks on a fine-tuned model, PreCurious demonstrates the possibility of breaking this invulnerability in a stealthy manner compared to fine-tuning on a benign pre-trained model. While DP provides some mitigation for membership inference attacks, by further leveraging a sanitized dataset, PreCurious demonstrates potential vulnerabilities for targeted data extraction even under differentially private tuning with a strict privacy budget, e.g., ϵ = 0.05. Thus, PreCurious raises warnings for users about the potential risks of downloading pre-trained models from unknown sources, relying solely on tutorials or common-sense defenses, and releasing sanitized datasets even after perfect scrubbing.
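The core signal being escalated in this threat model is a standard black-box membership inference test. As a rough sketch of that test (a generic loss-calibration attack, not the paper's own pipeline), an attacker who released the pre-trained model and can query the fine-tuned model's loss may compare a candidate text's loss under the two models; the model names and checkpoint path below are illustrative placeholders.

```python
# Minimal, generic sketch of a reference-calibrated membership inference score
# under the threat model above: the attacker knows the released pre-trained model
# and has black-box (loss) access to the fine-tuned model. This is NOT the
# PreCurious method; model names and the fine-tuned checkpoint path are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder tokenizer
reference = AutoModelForCausalLM.from_pretrained("gpt2").to(device).eval()  # attacker-released pre-trained model
victim = AutoModelForCausalLM.from_pretrained("path/to/finetuned").to(device).eval()  # fine-tuned model (placeholder path)

@torch.no_grad()
def avg_loss(model, text: str) -> float:
    """Average per-token cross-entropy of `text` under `model`."""
    enc = tokenizer(text, return_tensors="pt").to(device)
    return model(**enc, labels=enc["input_ids"]).loss.item()

def membership_score(text: str) -> float:
    """Higher score -> `text` is more likely a member of the fine-tuning set.
    Subtracting the reference loss calibrates away how 'easy' the text is in general."""
    return avg_loss(reference, text) - avg_loss(victim, text)

# Usage: the attacker thresholds the score, choosing the threshold from scores
# computed on texts known to be non-members (e.g., freshly written sentences).
print(membership_score("Example candidate record possibly seen during fine-tuning."))
```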
Author_xml – sequence: 1
  givenname: Ruixuan
  surname: Liu
  fullname: Liu, Ruixuan
  organization: Emory University, Atlanta, USA
– sequence: 2
  givenname: Tianhao
  surname: Wang
  fullname: Wang, Tianhao
  organization: University of Virginia, Charlottesville, USA
– sequence: 3
  givenname: Yang
  surname: Cao
  fullname: Cao, Yang
  organization: Tokyo Institute of Technology, Tokyo, Japan
– sequence: 4
  givenname: Li
  surname: Xiong
  fullname: Xiong, Li
  organization: Emory University, Atlanta, USA
ContentType Journal Article
DOI 10.1145/3658644.3690279
DatabaseName PubMed
MEDLINE - Academic
Discipline Computer Science
EISSN 1543-7221
ExternalDocumentID 40401199
Genre Journal Article
GrantInformation_xml – fundername: NIEHS NIH HHS
  grantid: R01 ES033241
– fundername: NLM NIH HHS
  grantid: R01 LM013712
ISICitedReferencesCount 3
ISSN 1543-7221
IsDoiOpenAccess false
IsOpenAccess true
IsPeerReviewed false
IsScholarly true
Keywords Pre-Training
Language Model
Privacy Attack
Language English
OpenAccessLink https://pubmed.ncbi.nlm.nih.gov/PMC12094715
PMID 40401199
PQID 3206596599
PQPubID 23479
PublicationCentury 2000
PublicationDate 20241001
PublicationDateYYYYMMDD 2024-10-01
PublicationDate_xml – month: 10
  year: 2024
  text: 20241001
  day: 1
PublicationDecade 2020
PublicationPlace United States
PublicationPlace_xml – name: United States
PublicationTitle Proceedings of the ... ACM Conference on Computer and Communications Security
PublicationTitleAlternate Conf Comput Commun Secur
PublicationYear 2024
StartPage 3511
Title PreCurious: How Innocent Pre-Trained Language Models Turn into Privacy Traps
URI https://www.ncbi.nlm.nih.gov/pubmed/40401199
https://www.proquest.com/docview/3206596599
Volume 2024