Unsupervised Entity Resolution With Blocking and Graph Algorithms

Published in: IEEE Transactions on Knowledge and Data Engineering, Vol. 34, No. 3, pp. 1501-1515
Main authors: Zhang, Dongxiang; Li, Dongsheng; Guo, Long; Tan, Kian-Lee
Format: Journal Article
Language: English
Published: New York: IEEE, 1 March 2022
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
ISSN: 1041-4347, 1558-2191
Description
Summary: Entity resolution identifies all records in a database that refer to the same entity. In this paper, we propose an unsupervised framework for entity resolution using blocking and graph algorithms. To improve efficiency, the records are partitioned into blocks with no redundancy. For intra-block data processing, we propose a graph-theoretic fusion framework with two components, namely ITER and CliqueRank. Specifically, ITER constructs a weighted bipartite graph between terms and record-record pairs and iteratively propagates the node salience until convergence. Subsequently, CliqueRank constructs a record graph to estimate the likelihood that two records reside in the same clique. The likelihood derived from CliqueRank is fed back to ITER to rectify the edge weights until a joint optimum is reached. Experimental evaluation was conducted on four real datasets. The results show that our unsupervised framework is comparable to, or even superior to, state-of-the-art deep learning approaches.
DOI: 10.1109/TKDE.2020.2991063