Silencing the Risk, Not the Whistle: A Semi-automated Text Sanitization Tool for Mitigating the Risk of Whistleblower Re-Identification

Bibliographic details
Title: Silencing the Risk, Not the Whistle: A Semi-automated Text Sanitization Tool for Mitigating the Risk of Whistleblower Re-Identification
Authors: Dimitri Staufer, Frank Pallas, Bettina Berendt
Source: The 2024 ACM Conference on Fairness Accountability and Transparency. :733-745
Publication Status: Preprint
Publisher information: ACM, 2024.
Publication year: 2024
Subjects: H.3, FOS: Computer and information sciences, Computer Science - Computation and Language, H.5, J.4, 300 Social sciences::380 Commerce, communications, transportation::384 Communications, telecommunication, Computer Science - Human-Computer Interaction, LLM-based rephrasing, K.4, 16. Peace & justice, K.5, Computer Science - Information Retrieval, Human-Computer Interaction (cs.HC), whistleblower anonymity, authorship obfuscation, Software Engineering (cs.SE), Computer Science - Computers and Society, Computer Science - Software Engineering, fine-tuning language models, D.2, text sanitization, Computers and Society (cs.CY), Computation and Language (cs.CL), Information Retrieval (cs.IR)
Description: Whistleblowing is essential for ensuring transparency and accountability in both public and private sectors. However, (potential) whistleblowers often fear or face retaliation, even when reporting anonymously. The specific content of their disclosures and their distinct writing style may re-identify them as the source. Legal measures, such as the EU Whistleblowing Directive (WBD), are limited in their scope and effectiveness. Therefore, computational methods to prevent re-identification are important complementary tools for encouraging whistleblowers to come forward. However, current text sanitization tools follow a one-size-fits-all approach and take an overly limited view of anonymity. They aim to mitigate identification risk by replacing typical high-risk words (such as person names and other named entity (NE) labels) and combinations thereof with placeholders. Such an approach, however, is inadequate for the whistleblowing scenario since it neglects further re-identification potential in textual features, including writing style. Therefore, we propose, implement, and evaluate a novel classification and mitigation strategy for rewriting texts that involves the whistleblower in the assessment of the risk and utility. Our prototypical tool semi-automatically evaluates risk at the word/term level and applies risk-adapted anonymization techniques to produce a grammatically disjointed yet appropriately sanitized text. We then use an LLM that we fine-tuned for paraphrasing to render this text coherent and style-neutral. We evaluate our tool's effectiveness using court cases from the European Court of Human Rights (ECHR) and excerpts from a real-world whistleblower testimony, and we measure the protection against authorship attribution (AA) attacks and the utility loss statistically using the popular IMDb62 movie reviews dataset. Our method can significantly reduce AA accuracy from 98.81% to 31.22%, while preserving up to 73.1% of the original content's semantics.
Accepted for publication at the ACM Conference on Fairness, Accountability, and Transparency 2024 (ACM FAccT'24). This is a preprint manuscript (authors' own version before final copy-editing).
Document type: Article
Conference object
DOI: 10.1145/3630106.3658936
DOI: 10.48550/arxiv.2405.01097
DOI: 10.14279/depositonce-21315
Access URL: http://arxiv.org/abs/2405.01097
Rights: CC BY ND
CC BY NC ND
Accession number: edsair.doi.dedup.....64b4eb1cf8239898b155bdea38c7e306
Database: OpenAIRE
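Illustrative sketch (not part of the record, and not the authors' implementation): the description mentions word/term-level risk evaluation followed by risk-adapted anonymization that yields a grammatically disjointed text. The Python sketch below shows one way such a step could look. The risk levels, the `RISK` and `GENERAL` dictionaries, the `sanitize` function, and the choice of suppression and generalization as techniques are all assumptions made for illustration. The subsequent LLM paraphrasing step is not shown.

```python
import re

# Hypothetical risk lexicon. In a semi-automated tool these labels would come
# from automated analysis reviewed by the whistleblower, not from a fixed dict.
RISK = {
    "alice": "high",      # person name: suppress entirely (assumed technique)
    "berlin": "medium",   # location: generalize (assumed technique)
    "tuesday": "medium",  # date detail: generalize (assumed technique)
}

# Hypothetical generalizations for medium-risk terms.
GENERAL = {"berlin": "a city", "tuesday": "a weekday"}


def sanitize(text, risk=RISK, general=GENERAL):
    """Apply a risk-dependent action to each token; unknown tokens are low risk."""
    out = []
    # Split into word tokens and single punctuation marks.
    for token in re.findall(r"\w+|[^\w\s]", text):
        key = token.lower()
        level = risk.get(key, "low")
        if level == "high":
            continue  # drop the token
        if level == "medium":
            out.append(general.get(key, "[REDACTED]"))
        else:
            out.append(token)
    # Joining with spaces leaves the output deliberately rough; a paraphrasing
    # model would be expected to smooth it afterwards.
    return " ".join(out)


print(sanitize("Alice met the auditor in Berlin on Tuesday."))
```

The output is intentionally disjointed (the subject is missing and punctuation is detached), which mirrors the intermediate text the description says is later rendered coherent and style-neutral by paraphrasing.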