QIDLearningLib: A Python library for quasi-identifier recognition and evaluation
| Published in: | Neurocomputing (Amsterdam) Volume 654; p. 131239 |
|---|---|
| Main authors: | , , , |
| Format: | Journal Article |
| Language: | English |
| Published: | Elsevier B.V., 14 November 2025 |
| Subjects: | |
| ISSN: | 0925-2312 |
| Summary: | Quasi-identifiers (QIDs) are attributes in a dataset that do not directly and uniquely identify users or entities but can be used, often in conjunction with other datasets or information, to identify individuals, and thus present a privacy risk in data sharing and analysis. Identifying QIDs is important for developing proper anonymization and data sanitization strategies. This paper proposes QIDLearningLib, a Python library that offers a set of metrics and tools to measure the qualities of QIDs and identify them in datasets. It incorporates metrics from different domains (causality, privacy, data utility, and performance) to offer a holistic assessment of the properties of attributes in a given tabular dataset. Furthermore, QIDLearningLib offers visual analysis tools that show how these metrics shift over a dataset, and it implements an extensible framework that uses these metrics with multiple optimization algorithms, such as an evolutionary algorithm, simulated annealing, and greedy search, to identify a meaningful set of QIDs.
• QIDLearningLib is the first library for automated QID recognition in tabular datasets. • It integrates metrics from causality, data privacy, and data utility for flexible QID selection. • It provides metrics to evaluate the performance of the QID selection system against a ground truth. • It includes multiple optimization algorithms for QID selection based on user-defined metrics. • It supports redundancy analysis to identify the most relevant, non-overlapping metrics. • It provides graphical and testing tools for enhanced interpretability. |
| DOI: | 10.1016/j.neucom.2025.131239 |
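
The summary describes scoring attribute sets with privacy metrics and then searching for a meaningful QID set, for example with greedy search. The following is a minimal illustrative sketch of that general idea using only the Python standard library. It does not use or reproduce the QIDLearningLib API: the function names (`distinction`, `k_anonymity`, `greedy_qid_search`), the choice of metrics, and the toy records are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical toy records; "diagnosis" plays the role of a sensitive attribute.
rows = [
    {"zip": "10001", "age": 34, "sex": "F", "diagnosis": "flu"},
    {"zip": "10001", "age": 34, "sex": "M", "diagnosis": "flu"},
    {"zip": "10002", "age": 34, "sex": "F", "diagnosis": "cold"},
    {"zip": "10002", "age": 51, "sex": "F", "diagnosis": "flu"},
]


def distinction(rows, attrs):
    """Fraction of distinct value combinations over attrs (1.0 means every row is unique)."""
    if not rows:
        return 0.0
    keys = {tuple(row[a] for a in attrs) for row in rows}
    return len(keys) / len(rows)


def k_anonymity(rows, attrs):
    """Size of the smallest group of rows sharing the same values on attrs."""
    counts = Counter(tuple(row[a] for a in attrs) for row in rows)
    return min(counts.values())


def greedy_qid_search(rows, candidates, max_size=3):
    """Greedily add the attribute that most increases distinction (illustrative only)."""
    selected, best = [], 0.0
    while len(selected) < max_size:
        scores = {
            a: distinction(rows, selected + [a])
            for a in candidates
            if a not in selected
        }
        if not scores:
            break
        attr, score = max(scores.items(), key=lambda kv: kv[1])
        if score <= best:  # stop when no attribute improves the score
            break
        selected.append(attr)
        best = score
    return selected, best


selected, score = greedy_qid_search(rows, ["zip", "age", "sex"])
print(selected, score, k_anonymity(rows, selected))
```

A higher distinction score and a lower k for a set of attributes suggest that the set singles out individuals more easily. A real selection framework would weigh several metrics together rather than a single score, as the summary indicates.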