Explainable graph clustering via expanders in the massively parallel computation model
| Published in: | Information sciences Vol. 677; p. 120897 |
|---|---|
| Main Authors: | , |
| Format: | Journal Article |
| Language: | English |
| Published: | Elsevier Inc, 01.08.2024 |
| Subjects: | |
| ISSN: | 0020-0255 |
| Summary: | Explainable clustering provides human-understandable reasons for decisions in black-box learning models. In previous work, a decision tree built on the set of dimensions was used to define ranges of values for k-means clusters. For explainable graph clustering, we use expander graphs instead of dense subgraphs, since powering an expander graph is guaranteed to result in a clique after at most a logarithmic number of steps.
Consider a set of multi-dimensional points labeled with k labels. We introduce the heat map sorting problem as reordering the rows and columns of an input matrix (each point is a column and each row is a dimension) such that the labels of the entries of the matrix form connected components (clusters). A cluster is preserved if it remains connected, i.e., if it is not split into several clusters and no two clusters are merged. In the massively parallel computation (MPC) model, each machine has memory sublinear in the input size, and the total memory of the machines is linear.
We prove the problem is NP-hard. We give a fixed-parameter algorithm in MPC and an approximation algorithm based on expander decomposition. We empirically compare our algorithm with explainable k-means on several graphs of email and computer networks.
• A general method for explainable clustering of high-dimensional data.
• A fixed-parameter algorithm for explainable graph clustering.
• A Massively Parallel Computation (MPC) algorithm for explainable clustering.
• An approximation algorithm for graph clustering on expander graphs. |
|---|---|
| DOI: | 10.1016/j.ins.2024.120897 |
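
The summary's remark that powering an expander yields a clique after few steps can be illustrated with a minimal sketch. It is not the article's algorithm. It assumes "powering" means taking the k-th graph power (joining every pair of vertices at distance at most k), uses the Petersen graph as a small well-connected stand-in for an expander, and uses a 10-vertex cycle as a poorly connected contrast. The helper names `make_graph`, `power_step`, and `steps_to_clique` are hypothetical.

```python
def make_graph(n, edges):
    """Build an undirected graph on vertices 0..n-1 as a dict of neighbor sets."""
    adj = {u: set() for u in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def power_step(adj, base):
    """Go from the k-th power to the (k+1)-th: extend each neighborhood by one hop in `base`."""
    return {u: (adj[u] | {w for v in adj[u] for w in base[v]}) - {u} for u in adj}


def is_clique(adj):
    """A graph is a clique when every vertex is adjacent to all others."""
    return all(len(nbrs) == len(adj) - 1 for nbrs in adj.values())


def steps_to_clique(base):
    """Smallest k such that the k-th power of `base` is a clique (the diameter)."""
    adj = {u: set(vs) for u, vs in base.items()}
    k = 1
    while not is_clique(adj):
        nxt = power_step(adj, base)
        if nxt == adj:  # no new edges: the graph is disconnected
            raise ValueError("graph is disconnected; no power is a clique")
        adj, k = nxt, k + 1
    return k


# Petersen graph: outer 5-cycle, spokes, and inner pentagram (3-regular, 10 vertices).
petersen = make_graph(
    10,
    [(i, (i + 1) % 5) for i in range(5)]
    + [(i, i + 5) for i in range(5)]
    + [(5 + i, 5 + (i + 2) % 5) for i in range(5)],
)
# Cycle on 10 vertices: same size, but far less well connected.
cycle = make_graph(10, [(i, (i + 1) % 10) for i in range(10)])

print(steps_to_clique(petersen))  # 2
print(steps_to_clique(cycle))     # 5
```

Under these assumptions, the number of powering steps equals the graph's diameter, which is why well-connected graphs reach a clique quickly while the cycle needs a number of steps proportional to its length.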