Algorithmically Bypassing Censorship on Sina Weibo with Nondeterministic Homophone Substitutions

Bibliographic Details
Published in: Proceedings of the International AAAI Conference on Web and Social Media Vol. 9; no. 1; pp. 150-158
Main Authors: Hiruncharoenvate, Chaya; Lin, Zhiyuan; Gilbert, Eric
Format: Journal Article
Language: English
Published: 03.08.2021
ISSN: 2162-3449, 2334-0770
Description
Summary: Like traditional media, social media in China is subject to censorship. However, in limited cases, activists have employed homophones of censored keywords to avoid detection by keyword matching algorithms. In this paper, we show that it is possible to scale this idea up in ways that make it difficult to defend against. Specifically, we present a non-deterministic algorithm for generating homophones that create large numbers of false positives for censors, making it difficult to locate banned conversations. In two experiments, we show that 1) homophone-transformed weibos posted to Sina Weibo remain on-site three times longer than their previously censored counterparts, and 2) native Chinese speakers can recover the original intent behind the homophone-transformed messages, with 99% of our posts understood by the majority of our participants. Finally, we find that coping with homophone transformations is likely to cost the Sina Weibo censorship apparatus an additional 15 hours of human labor per day, per censored keyword. To conclude, we reflect briefly on the opportunities presented by this algorithm to build interactive, client-side tools that promote free speech.
DOI: 10.1609/icwsm.v9i1.14637
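
Illustrative sketch: the Summary describes replacing censored keywords with randomly chosen homophones. The snippet below is a minimal illustration of that general idea only, not the authors' algorithm. The homophone table, the function names `substitute_keyword` and `transform`, and the sample keyword are all hypothetical and chosen for illustration; the record gives no detail on how the paper selects or ranks candidate homophones.

```python
import random

# Hypothetical, hand-written homophone table (illustrative only;
# not data from the paper). Each list holds characters that share
# a syllable with the key, and never contains the key itself.
HOMOPHONES = {
    "和": ["河", "合", "何"],  # all read "he"
    "谐": ["鞋", "协", "斜"],  # all read "xie"
}


def substitute_keyword(keyword, rng):
    """Swap each character that has listed homophones for a random one."""
    return "".join(
        rng.choice(HOMOPHONES[ch]) if ch in HOMOPHONES else ch
        for ch in keyword
    )


def transform(text, keywords, rng=None):
    """Replace every occurrence of each keyword with a random homophone variant.

    Each occurrence is substituted independently, so the same keyword
    may map to different surface forms within one message.
    """
    rng = rng or random.Random()
    for kw in keywords:
        parts = text.split(kw)
        out = parts[0]
        for tail in parts[1:]:
            out += substitute_keyword(kw, rng) + tail
        text = out
    return text


if __name__ == "__main__":
    # Output varies from run to run because the choice is random.
    print(transform("建设和谐社会", ["和谐"]))
```

Because the substitution is drawn at random for every occurrence, a keyword filter would need to anticipate many variants rather than a single fixed replacement, which is the property the Summary attributes to the non-deterministic approach.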