Balancing data privacy and usability in the federal statistical system

Bibliographic Details
Published in: Proceedings of the National Academy of Sciences - PNAS Vol. 119; no. 31; p. e2104906119
Main Authors: Hotz, V Joseph, Bollinger, Christopher R, Komarova, Tatiana, Manski, Charles F, Moffitt, Robert A, Nekipelov, Denis, Sojourner, Aaron, Spencer, Bruce D
Format: Journal Article
Language: English
Published: United States, 2 August 2022
ISSN: 1091-6490
Abstract: The federal statistical system is experiencing competing pressures for change. On the one hand, for confidentiality reasons, much socially valuable data currently held by federal agencies is either not made available to researchers at all or only made available under onerous conditions. On the other hand, agencies which release public databases face new challenges in protecting the privacy of the subjects in those databases, which leads them to consider releasing fewer data or masking the data in ways that will reduce their accuracy. In this essay, we argue that the discussion has not given proper consideration to the reduced social benefits of data availability and their usability relative to the value of increased levels of privacy protection. A more balanced benefit-cost framework should be used to assess these trade-offs. We express concerns both with synthetic data methods for disclosure limitation, which will reduce the types of research that can be reliably conducted in unknown ways, and with differential privacy criteria that use what we argue is an inappropriate measure of disclosure risk. We recommend that the measure of disclosure risk used to assess all disclosure protection methods focus on what we believe is the risk that individuals should care about, that more study of the impact of differential privacy criteria and synthetic data methods on data usability for research be conducted before either is put into widespread use, and that more research be conducted on alternative methods of disclosure risk reduction that better balance benefits and costs.
Author affiliations:
Hotz, V Joseph (ORCID 0000-0002-6958-3318): Department of Economics, Duke University, Durham, NC 27708
Bollinger, Christopher R (ORCID 0000-0002-3477-1028): Department of Economics, University of Kentucky, Lexington, KY 40503
Komarova, Tatiana (ORCID 0000-0002-6581-5097): The London School of Economics and Political Science, London WC2A 3PH, United Kingdom
Manski, Charles F (ORCID 0000-0001-7260-7686): Department of Economics, Northwestern University, Evanston, IL 60208
Moffitt, Robert A (ORCID 0000-0002-3627-3057): Department of Economics, Johns Hopkins University, Baltimore, MD 21211
Nekipelov, Denis (ORCID 0000-0003-4734-265X): Department of Economics, University of Virginia, Charlottesville, VA 22904
Sojourner, Aaron (ORCID 0000-0001-6839-2512): W. E. Upjohn Institute for Employment Policy, Kalamazoo, MI 49007
Spencer, Bruce D (ORCID 0000-0001-6155-7249): Department of Statistics and Data Science, Northwestern University, Evanston, IL 60208
DOI: 10.1073/pnas.2104906119
Keywords: data disclosure risk; federal statistical system; data access
Open access: https://pubmed.ncbi.nlm.nih.gov/PMC9351352
PMID: 35878030
Subjects: Computer Security; Confidentiality; Data Collection; Disclosure; Federal Government; Government Agencies; Privacy