Key-value data collection and statistical analysis with local differential privacy

Bibliographic details
Published in: Information Sciences, Vol. 640, p. 119058
Main authors: Zhu, Hui; Tang, Xiaohu; Yang, Laurence Tianruo; Fu, Chao; Peng, Shuangrong
Format: Journal Article
Language: English
Published: Elsevier Inc, 01.09.2023
ISSN:0020-0255, 1872-6291
Description
Summary: The collection and statistical analysis of simple data types (e.g., categorical, numerical, and multi-dimensional data) under local differential privacy has been widely studied. Recently, researchers have turned to the collection of key-value data, one of the main NoSQL data models. When key-value data is collected and analyzed under local differential privacy, the frequency and mean of each key must be estimated simultaneously. However, achieving a good utility-privacy tradeoff is difficult, because key-value data is inherently correlated and users may hold different numbers of key-value pairs. In this paper, we propose an efficient sampling-based scheme for collecting and analyzing key-value data. Under the same perturbation level and perturbation algorithm, the more valid data is collected, the higher the accuracy of the resulting statistics. We therefore make full use of probability sampling and the inherent correlation of key-value data to raise the probability that users submit valid key-value data. Moreover, we optimize the privacy budget allocation on key-value data, so that the overall variance of frequency and mean estimation is close to optimal. Detailed theoretical analysis and experimental results show that the proposed scheme is more accurate than existing schemes.
• We propose an efficient SKV-GRR scheme with separate key and value selection for collecting and analyzing key-value data.
• In the key selection, we use unequal probability sampling to improve the probability of users submitting valid data.
• The value selection based on weak correlated perturbation can improve the probability of users submitting valid value data.
• We optimize the budget allocation on the selected key and the selected value to improve the accuracy of estimated data.
DOI:10.1016/j.ins.2023.119058
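
For orientation, the "GRR" in the scheme name presumably refers to generalized randomized response, a standard local differential privacy mechanism for categorical data. The sketch below shows only that textbook building block applied to key frequencies. It is not the SKV-GRR scheme from the article: the sampling, value perturbation, and budget allocation steps are not reproduced here, and the function names `grr_perturb` and `grr_estimate` are illustrative assumptions.

```python
import math
import random
from collections import Counter


def grr_perturb(value, domain, epsilon, rng):
    """Report the true value with probability p, otherwise a uniformly
    chosen different value from the domain (plain GRR, epsilon > 0)."""
    d = len(domain)
    p = math.exp(epsilon) / (math.exp(epsilon) + d - 1)
    if rng.random() < p:
        return value
    others = [v for v in domain if v != value]
    return rng.choice(others)


def grr_estimate(reports, domain, epsilon):
    """Unbiased frequency estimate for each domain value from GRR reports."""
    d = len(domain)
    n = len(reports)
    e = math.exp(epsilon)
    p = e / (e + d - 1)  # probability of keeping the true value
    q = 1 / (e + d - 1)  # probability of reporting one specific other value
    counts = Counter(reports)
    return {v: (counts[v] / n - q) / (p - q) for v in domain}


if __name__ == "__main__":
    rng = random.Random(0)
    keys = ["a", "b", "c", "d"]  # hypothetical key domain
    truth = ["a"] * 10000 + ["b"] * 6000 + ["c"] * 3000 + ["d"] * 1000
    reports = [grr_perturb(k, keys, 2.0, rng) for k in truth]
    print(grr_estimate(reports, keys, 2.0))
```

Each user runs `grr_perturb` locally, so the collector only ever sees perturbed reports. The estimator inverts the known perturbation probabilities, which is why a larger share of valid reports at the same perturbation level yields lower estimation variance.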