QIDLearningLib: A Python library for quasi-identifier recognition and evaluation

Bibliographic details
Published in: Neurocomputing (Amsterdam), Vol. 654, p. 131239
Main authors: Amaral Simões, Sancho; Vilela, João P.; Seoane Santos, Miriam; Henriques Abreu, Pedro
Format: Journal Article
Language: English
Published: Elsevier B.V., 14 November 2025
ISSN: 0925-2312
Description
Summary: Quasi-identifiers (QIDs) are attributes in a dataset that do not directly and uniquely identify the users/entities themselves but can be used, often in conjunction with other datasets or information, to identify individuals, and thus present a privacy risk in data sharing and analysis. Identifying QIDs is important in developing proper strategies for anonymization and data sanitization. This paper proposes QIDLearningLib, a Python library that offers a set of metrics and tools to measure the qualities of QIDs and identify them in datasets. It incorporates metrics from different domains – causality, privacy, data utility, and performance – to offer a holistic assessment of the properties of attributes in a given tabular dataset. Furthermore, QIDLearningLib offers visual analysis tools to present how these metrics shift over a dataset and implements an extensible framework that employs multiple optimization algorithms, such as an evolutionary algorithm, simulated annealing, and greedy search, using these metrics to identify a meaningful set of QIDs.

• QIDLearningLib is the first library for automated QID recognition in tabular datasets.
• It integrates metrics from causality, data privacy, and data utility for flexible QID selection.
• It provides metrics to evaluate the performance of the QID selection system against a ground truth.
• Includes multiple optimization algorithms for QID selection based on user-defined metrics.
• Supports redundancy analysis to identify the most relevant, non-overlapping metrics.
• Provides graphical and testing tools for enhanced interpretability.
DOI: 10.1016/j.neucom.2025.131239
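
The summary mentions privacy metrics and a greedy search over attributes. The following is a minimal sketch of that general idea, not the QIDLearningLib API. The function names `distinction`, `separation`, and `greedy_qid_search`, the `target` threshold, and the toy records are all assumptions made for illustration. The two metrics follow the distinct-ratio and separation-ratio definitions commonly used in the QID literature; whether the library computes them this way is not stated in this record.

```python
from collections import Counter


def distinction(rows, attrs):
    """Share of distinct value combinations over `attrs` relative to the row count."""
    if not rows:
        return 0.0
    combos = {tuple(row[a] for a in attrs) for row in rows}
    return len(combos) / len(rows)


def separation(rows, attrs):
    """Share of row pairs that differ on at least one attribute in `attrs`."""
    n = len(rows)
    total_pairs = n * (n - 1) // 2
    if total_pairs == 0:
        return 0.0
    counts = Counter(tuple(row[a] for a in attrs) for row in rows)
    # Pairs that share the same value combination cannot be told apart.
    same_pairs = sum(c * (c - 1) // 2 for c in counts.values())
    return (total_pairs - same_pairs) / total_pairs


def greedy_qid_search(rows, candidates, metric=separation, target=0.99):
    """Hypothetical greedy search: repeatedly add the attribute that raises
    `metric` the most, stopping once `target` is reached or nothing improves."""
    selected = []
    remaining = list(candidates)
    score = 0.0
    while remaining and score < target:
        best_attr, best_score = max(
            ((a, metric(rows, selected + [a])) for a in remaining),
            key=lambda pair: pair[1],
        )
        if best_score <= score:
            break  # no remaining attribute improves the metric
        selected.append(best_attr)
        remaining.remove(best_attr)
        score = best_score
    return selected, score


# Toy records, invented for this example only.
rows = [
    {"zip": "1000", "age": 34, "sex": "F", "diagnosis": "flu"},
    {"zip": "1000", "age": 34, "sex": "M", "diagnosis": "asthma"},
    {"zip": "2000", "age": 34, "sex": "F", "diagnosis": "flu"},
    {"zip": "2000", "age": 51, "sex": "F", "diagnosis": "flu"},
]

selected, score = greedy_qid_search(rows, ["zip", "age", "sex"])
print(selected, score)
```

Greedy search is the simplest of the three strategies named in the summary; an evolutionary algorithm or simulated annealing would explore attribute subsets that a purely greedy choice can miss, at a higher computational cost.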