SecureNLP: A System for Multi-Party Privacy-Preserving Natural Language Processing

Bibliographic Details
Published in: IEEE Transactions on Information Forensics and Security, Vol. 15, pp. 3709-3721
Main Authors: Feng, Qi; He, Debiao; Liu, Zhe; Wang, Huaqun; Choo, Kim-Kwang Raymond
Format: Journal Article
Language: English
Published: New York: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 2020
ISSN: 1556-6013, 1556-6021
Description
Summary: Natural language processing (NLP) allows a computer program to understand human language as it is spoken, and is increasingly deployed in applications such as machine translation, sentiment analysis, and electronic voice assistants. While information obtained from different sources can improve the accuracy of NLP models, collecting such massive data also has privacy implications. Thus, in this paper, we design SecureNLP, a privacy-preserving system focusing on the recurrent neural network (RNN)-based sequence-to-sequence with attention model for neural machine translation. Specifically, for non-linear functions such as sigmoid and tanh, we design two efficient distributed protocols using secure multi-party computation (MPC), which carry out the respective tasks in SecureNLP. We also prove the security of these two protocols (i.e., the privacy-preserving long short-term memory network PrivLSTM and the privacy-preserving sequence-to-sequence transformation PrivSEQ2SEQ) in the semi-honest adversary model, in the sense that an honest-but-curious adversary cannot learn anything else from the messages it receives from other parties. The proposed system is implemented in C++ and Python, and the evaluation findings demonstrate the utility of the protocols in cross-domain NLP.
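The record does not reproduce the protocol constructions themselves, but the following minimal Python sketch illustrates the general setting the abstract describes: values are additively secret-shared between two parties, the linear part of an LSTM gate is computed directly on shares, and the non-linear sigmoid/tanh step is exactly where dedicated MPC protocols such as PrivLSTM are needed. The ring size, fixed-point scale, Beaver-triple multiplication, and the in-process "dealer" below are illustrative assumptions for this sketch, not details taken from the paper.

    # Minimal two-party additive secret-sharing sketch (illustrative only; not
    # the paper's PrivLSTM/PrivSEQ2SEQ construction). Ring size, fixed-point
    # scale, Beaver triples, and the in-process dealer are assumptions.
    import math
    import random

    RING = 2 ** 32        # ring Z_{2^32} (illustrative choice)
    SCALE = 2 ** 12       # fixed-point scaling factor (illustrative choice)

    def encode(x: float) -> int:
        return int(round(x * SCALE)) % RING

    def decode(v: int, scale: int = SCALE) -> float:
        if v >= RING // 2:            # interpret as a signed value
            v -= RING
        return v / scale

    def share(v: int):
        """Split v into two additive shares that sum to v mod RING."""
        s0 = random.randrange(RING)
        return s0, (v - s0) % RING

    def reconstruct(s0: int, s1: int) -> int:
        return (s0 + s1) % RING

    def add_shares(a, b):
        """Addition is purely local: each party adds its own shares."""
        return (a[0] + b[0]) % RING, (a[1] + b[1]) % RING

    def beaver_triple():
        """Correlated randomness (u, v, w = u*v), here from a trusted dealer."""
        u, v = random.randrange(RING), random.randrange(RING)
        return share(u), share(v), share((u * v) % RING)

    def mul_shares(a, b):
        """Beaver-triple multiplication of two shared values."""
        (u0, u1), (v0, v1), (w0, w1) = beaver_triple()
        # The parties open the masked differences d and e; these are uniformly
        # random and reveal nothing about a or b.
        d = (a[0] - u0 + a[1] - u1) % RING
        e = (b[0] - v0 + b[1] - v1) % RING
        z0 = (w0 + d * v0 + e * u0 + d * e) % RING
        z1 = (w1 + d * v1 + e * u1) % RING
        return z0, z1

    def sigmoid(x: float) -> float:
        return 1.0 / (1.0 + math.exp(-x))

    # Toy LSTM-gate preactivation z = w*x + b, computed entirely on shares.
    w_sh = share(encode(0.5))
    x_sh = share(encode(-1.2))
    b_sh = share(int(round(0.1 * SCALE * SCALE)) % RING)   # bias pre-scaled to SCALE**2
    z_sh = add_shares(mul_shares(w_sh, x_sh), b_sh)        # product is at SCALE**2

    # Applying sigmoid/tanh to the shared z is the step SecureNLP's dedicated
    # protocols handle; this sketch just reconstructs and evaluates in the clear.
    print(sigmoid(decode(reconstruct(*z_sh), SCALE ** 2)))  # ~ sigmoid(-0.5) ≈ 0.378

In a real deployment the trusted dealer would be replaced by an offline preprocessing phase, and the activation would be evaluated obliviously on the shares (the role of the paper's protocols) rather than reconstructed for plaintext evaluation as done here.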
DOI: 10.1109/TIFS.2020.2997134