Explainable graph clustering via expanders in the massively parallel computation model

Detailed bibliography
Published in: Information Sciences, Volume 677, article 120897
Main authors: Aghamolaei, Sepideh; Ghodsi, Mohammad
Format: Journal Article
Language: English
Published: Elsevier Inc., 1 August 2024
ISSN: 0020-0255
Description
Abstract: Explainable clustering provides human-understandable reasons for the decisions of black-box learning models. In previous work, a decision tree built on the set of dimensions was used to define ranges of values for k-means clusters. For explainable graph clustering, we use expander graphs instead of dense subgraphs, since powering an expander graph is guaranteed to yield a clique after at most a logarithmic number of steps. Consider a set of multi-dimensional points labeled with k labels. We introduce the heat map sorting problem: reorder the rows and columns of an input matrix (each point is a column and each row is a dimension) so that the entries with the same label form connected components (clusters). A cluster is preserved if it remains connected, i.e., if it is not split into several clusters and is not merged with another cluster. In the massively parallel computation (MPC) model, each machine has sublinear memory and the total memory of all machines is linear in the input size. We prove that the problem is NP-hard. We give a fixed-parameter algorithm in MPC and an approximation algorithm based on expander decomposition. We empirically compare our algorithm with explainable k-means on several graphs of email and computer networks.

Highlights:
• A general method for explainable clustering of high-dimensional data.
• A fixed-parameter algorithm for explainable graph clustering.
• A Massively Parallel Computation (MPC) algorithm for explainable clustering.
• An approximation algorithm for graph clustering on expander graphs.
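The abstract defines the heat map sorting problem only in words. The Python sketch below illustrates the feasibility check that definition implies. Given a matrix whose entries carry labels (rows are dimensions, columns are points) and a proposed row order and column order, it tests whether every label's entries form a single connected region. It is not the authors' algorithm. It assumes that "connected" means 4-adjacency (cells sharing a side), and the function names `reorder`, `label_components` and `is_sorted_heat_map` are hypothetical.

```python
from collections import deque


def reorder(matrix, row_order, col_order):
    """Return a copy of the label matrix with rows and columns permuted."""
    return [[matrix[r][c] for c in col_order] for r in row_order]


def label_components(matrix):
    """Count the 4-connected components of each label (assumed adjacency)."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    seen = [[False] * n_cols for _ in range(n_rows)]
    counts = {}
    for i in range(n_rows):
        for j in range(n_cols):
            if seen[i][j]:
                continue
            label = matrix[i][j]
            counts[label] = counts.get(label, 0) + 1  # new component found
            # Breadth-first search over side-adjacent cells with the same label.
            queue = deque([(i, j)])
            seen[i][j] = True
            while queue:
                r, c = queue.popleft()
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = r + dr, c + dc
                    if (0 <= nr < n_rows and 0 <= nc < n_cols
                            and not seen[nr][nc] and matrix[nr][nc] == label):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
    return counts


def is_sorted_heat_map(matrix, row_order, col_order):
    """True if, after reordering, every label forms exactly one component."""
    counts = label_components(reorder(matrix, row_order, col_order))
    return all(k == 1 for k in counts.values())


M = [[0, 1, 0],
     [0, 1, 0]]
print(is_sorted_heat_map(M, [0, 1], [0, 1, 2]))  # label 0 is split
print(is_sorted_heat_map(M, [0, 1], [0, 2, 1]))  # both labels contiguous
```

For this `M`, the identity order splits label 0 into two regions, while the column order `[0, 2, 1]` makes both labels contiguous. Finding such orders is the hard part of the problem. Checking a given pair of orders, as done here, takes time linear in the number of entries.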
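The abstract's case for expanders rests on a standard fact. The t-th power G^t of a connected graph G, which joins every pair of vertices at distance at most t, is a clique exactly when t is at least the diameter of G. Expanders have diameter O(log n). The minimal Python sketch below illustrates that fact; it does not reproduce the paper's MPC procedure. `clique_power` returns the smallest such t using breadth-first search, and `power_graph` builds G^t to confirm the result. All helper names are hypothetical. The hypercube serves only as a graph whose diameter is exactly log2 n and is not offered as a bounded-degree expander.

```python
from collections import deque


def eccentricity(adj, source):
    """Largest BFS distance from source; infinity if the graph is disconnected."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    if len(dist) < len(adj):
        return float("inf")  # no power of a disconnected graph is a clique
    return max(dist.values())


def clique_power(adj):
    """Smallest t such that G^t is complete, i.e. the diameter of G."""
    return max(eccentricity(adj, s) for s in adj)


def power_graph(adj, t):
    """Adjacency of G^t: neighbours are all vertices within distance t."""
    result = {}
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            if dist[u] == t:
                continue  # do not expand beyond depth t
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        result[s] = set(dist) - {s}
    return result


def is_clique(adj):
    """True if every vertex is adjacent to all other vertices."""
    n = len(adj)
    return all(len(nbrs) == n - 1 for nbrs in adj.values())


def hypercube(d):
    """d-dimensional hypercube on 2**d vertices (diameter exactly d)."""
    return {v: {v ^ (1 << i) for i in range(d)} for v in range(1 << d)}


Q = hypercube(4)
print(clique_power(Q))                # 4 == log2(16)
print(is_clique(power_graph(Q, 4)))   # G^4 is complete
print(is_clique(power_graph(Q, 3)))   # G^3 is not
```

On the 16-vertex hypercube, `clique_power` returns 4 = log2 16, so G^4 is complete and G^3 is not. For any graph with O(log n) diameter, the same check terminates after O(log n) powering steps.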
DOI:10.1016/j.ins.2024.120897