Balancing data privacy and usability in the federal statistical system
| Published in: | Proceedings of the National Academy of Sciences - PNAS, vol. 119, no. 31, p. e2104906119 |
|---|---|
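| Main authors: | Hotz, V Joseph; Bollinger, Christopher R; Komarova, Tatiana; Manski, Charles F; Moffitt, Robert A; Nekipelov, Denis; Sojourner, Aaron; Spencer, Bruce D |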
| Format: | Journal Article |
| Language: | English |
| Publication details: | United States, 2 August 2022 |
| ISSN: | 1091-6490 |
| Abstract | The federal statistical system is experiencing competing pressures for change. On the one hand, for confidentiality reasons, much socially valuable data currently held by federal agencies is either not made available to researchers at all or only made available under onerous conditions. On the other hand, agencies which release public databases face new challenges in protecting the privacy of the subjects in those databases, which leads them to consider releasing fewer data or masking the data in ways that will reduce their accuracy. In this essay, we argue that the discussion has not given proper consideration to the reduced social benefits of data availability and their usability relative to the value of increased levels of privacy protection. A more balanced benefit-cost framework should be used to assess these trade-offs. We express concerns both with synthetic data methods for disclosure limitation, which will reduce the types of research that can be reliably conducted in unknown ways, and with differential privacy criteria that use what we argue is an inappropriate measure of disclosure risk. We recommend that the measure of disclosure risk used to assess all disclosure protection methods focus on what we believe is the risk that individuals should care about, that more study of the impact of differential privacy criteria and synthetic data methods on data usability for research be conducted before either is put into widespread use, and that more research be conducted on alternative methods of disclosure risk reduction that better balance benefits and costs. |
|---|---|
| Author | Manski, Charles F; Hotz, V Joseph; Bollinger, Christopher R; Nekipelov, Denis; Spencer, Bruce D; Komarova, Tatiana; Moffitt, Robert A; Sojourner, Aaron |
| Author_xml | – sequence: 1 givenname: V Joseph orcidid: 0000-0002-6958-3318 surname: Hotz fullname: Hotz, V Joseph organization: Department of Economics, Duke University, Durham, NC 27708 – sequence: 2 givenname: Christopher R orcidid: 0000-0002-3477-1028 surname: Bollinger fullname: Bollinger, Christopher R organization: Department of Economics, University of Kentucky, Lexington, KY 40503 – sequence: 3 givenname: Tatiana orcidid: 0000-0002-6581-5097 surname: Komarova fullname: Komarova, Tatiana organization: The London School of Economics and Political Science, London WC2A 3PH, United Kingdom – sequence: 4 givenname: Charles F orcidid: 0000-0001-7260-7686 surname: Manski fullname: Manski, Charles F organization: Department of Economics, Northwestern University, Evanston, IL 60208 – sequence: 5 givenname: Robert A orcidid: 0000-0002-3627-3057 surname: Moffitt fullname: Moffitt, Robert A organization: Department of Economics, Johns Hopkins University, Baltimore, MD 21211 – sequence: 6 givenname: Denis orcidid: 0000-0003-4734-265X surname: Nekipelov fullname: Nekipelov, Denis organization: Department of Economics, University of Virginia, Charlottesville, VA 22904 – sequence: 7 givenname: Aaron orcidid: 0000-0001-6839-2512 surname: Sojourner fullname: Sojourner, Aaron organization: W. E. Upjohn Institute for Employment Policy, Kalamazoo, MI 49007 – sequence: 8 givenname: Bruce D orcidid: 0000-0001-6155-7249 surname: Spencer fullname: Spencer, Bruce D organization: Department of Statistics and Data Science, Northwestern University, Evanston, IL 60208 |
| BackLink | https://www.ncbi.nlm.nih.gov/pubmed/35878030 (View this record in MEDLINE/PubMed) |
| CitedBy_id | crossref_primary_10_1073_pnas_2321882121 crossref_primary_10_1177_10775595251337073 crossref_primary_10_1109_TIFS_2025_3552033 crossref_primary_10_2478_jos_2023_0017 crossref_primary_10_1038_s41598_024_56409_3 crossref_primary_10_1073_pnas_2424655122 crossref_primary_10_1186_s40537_025_01118_5 crossref_primary_10_1016_j_ijar_2024_109242 crossref_primary_10_1007_s11113_024_09931_1 crossref_primary_10_1016_j_respol_2024_105080 crossref_primary_10_1073_pnas_2303890120 crossref_primary_10_1086_732683 crossref_primary_10_1111_padr_12580 crossref_primary_10_1126_sciadv_adt1512 crossref_primary_10_1287_mksc_2024_0901 crossref_primary_10_1073_pnas_2220558120 crossref_primary_10_1093_bioadv_vbaf046 crossref_primary_10_1086_732521 crossref_primary_10_1002_wics_1615 crossref_primary_10_1109_TETCI_2024_3500009 crossref_primary_10_1145_3735562 |
| ContentType | Journal Article |
| DBID | CGR CUY CVF ECM EIF NPM 7X8 |
| DOI | 10.1073/pnas.2104906119 |
| DatabaseName | Medline MEDLINE MEDLINE (Ovid) MEDLINE MEDLINE PubMed MEDLINE - Academic |
| DatabaseTitle | MEDLINE Medline Complete MEDLINE with Full Text PubMed MEDLINE (Ovid) MEDLINE - Academic |
| DatabaseTitleList | MEDLINE - Academic MEDLINE |
| Database_xml | – sequence: 1 dbid: NPM name: PubMed url: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed sourceTypes: Index Database – sequence: 2 dbid: 7X8 name: MEDLINE - Academic url: https://search.proquest.com/medline sourceTypes: Aggregation Database |
| DeliveryMethod | no_fulltext_linktorsrc |
| Discipline | Sciences (General) |
| EISSN | 1091-6490 |
| ExternalDocumentID | 35878030 |
| Genre | Journal Article |
| ID | FETCH-LOGICAL-c512t-a548a8c28291f3c8f85fc08ce87a3e10a6243d6863412cd0ba4e1fa397cf5dff2 |
| IEDL.DBID | 7X8 |
| ISICitedReferencesCount | 31 |
| ISICitedReferencesURI | http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000903753500001&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| ISSN | 1091-6490 |
| IngestDate | Fri Sep 05 14:11:15 EDT 2025 Sat Nov 01 14:15:53 EDT 2025 |
| IsDoiOpenAccess | false |
| IsOpenAccess | true |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 31 |
| Keywords | data disclosure risk federal statistical system data access |
| Language | English |
| LinkModel | DirectLink |
| MergedId | FETCHMERGED-LOGICAL-c512t-a548a8c28291f3c8f85fc08ce87a3e10a6243d6863412cd0ba4e1fa397cf5dff2 |
| Notes | ObjectType-Article-1 SourceType-Scholarly Journals-1 ObjectType-Feature-2 content type line 23 |
| ORCID | 0000-0003-4734-265X 0000-0002-6958-3318 0000-0001-6839-2512 0000-0002-6581-5097 0000-0002-3627-3057 0000-0001-7260-7686 0000-0002-3477-1028 0000-0001-6155-7249 |
| OpenAccessLink | https://pubmed.ncbi.nlm.nih.gov/PMC9351352 |
| PMID | 35878030 |
| PQID | 2694967079 |
| PQPubID | 23479 |
| ParticipantIDs | proquest_miscellaneous_2694967079 pubmed_primary_35878030 |
| PublicationCentury | 2000 |
| PublicationDate | 2022-08-02 |
| PublicationDateYYYYMMDD | 2022-08-02 |
| PublicationDate_xml | – month: 08 year: 2022 text: 2022-08-02 day: 02 |
| PublicationDecade | 2020 |
| PublicationPlace | United States |
| PublicationPlace_xml | – name: United States |
| PublicationTitle | Proceedings of the National Academy of Sciences - PNAS |
| PublicationTitleAlternate | Proc Natl Acad Sci U S A |
| PublicationYear | 2022 |
| SSID | ssj0009580 |
| Score | 2.5994482 |
| Snippet | The federal statistical system is experiencing competing pressures for change. On the one hand, for confidentiality reasons, much socially valuable data... |
| SourceID | proquest pubmed |
| SourceType | Aggregation Database Index Database |
| StartPage | e2104906119 |
| SubjectTerms | Computer Security; Confidentiality; Data Collection; Disclosure; Federal Government; Government Agencies; Privacy |
| Title | Balancing data privacy and usability in the federal statistical system |
| URI | https://www.ncbi.nlm.nih.gov/pubmed/35878030 https://www.proquest.com/docview/2694967079 |
| Volume | 119 |
| WOSCitedRecordID | wos000903753500001&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| linkProvider | ProQuest |
| openUrl | ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Balancing+data+privacy+and+usability+in+the+federal+statistical+system&rft.jtitle=Proceedings+of+the+National+Academy+of+Sciences+-+PNAS&rft.au=Hotz%2C+V+Joseph&rft.au=Bollinger%2C+Christopher+R&rft.au=Komarova%2C+Tatiana&rft.au=Manski%2C+Charles+F&rft.date=2022-08-02&rft.eissn=1091-6490&rft.volume=119&rft.issue=31&rft.spage=e2104906119&rft_id=info:doi/10.1073%2Fpnas.2104906119&rft_id=info%3Apmid%2F35878030&rft_id=info%3Apmid%2F35878030&rft.externalDocID=35878030 |