MIDAS: Multilinear detection at scale

Bibliographic details
Published in: Journal of parallel and distributed computing, Vol. 132, Issue C, pp. 363-382
Main authors: Ekanayake, Saliya; Cadena, Jose; Wickramasinghe, Udayanga; Vullikanti, Anil
Format: Journal Article
Language: English
Published: United States, Elsevier Inc, 1 October 2019
ISSN:0743-7315, 1096-0848
Description
Summary: We focus on two classes of problems in graph mining: (1) finding trees and (2) anomaly detection in complex networks using scan statistics. These are fundamental problems in a broad class of applications. Most of the parallel algorithms for such problems are either based on heuristics, which do not scale very well, or use techniques like color coding, which have a high memory overhead. In this paper, we develop a novel approach for parallelizing both these classes of problems, using an algebraic representation of subgraphs as monomials; this methodology involves detecting multilinear terms in multivariate polynomials. Our algorithms show good scaling over a large regime, and they run on networks with close to half a billion edges. The resulting parallel algorithm for trees is able to scale to subgraphs of size 18, which has not been done before, and it significantly outperforms the best prior color-coding-based method (FASCIA) by more than two orders of magnitude. Our algorithm for network scan statistics is the first such parallelization, and it is able to handle a broad class of scan statistics functions with the same approach.
• Finding subgraphs is an important primitive in network analysis.
• It is possible to find “small” subgraphs optimally, but it takes exponential time.
• Existing parallel algorithms find subgraphs of size up to 12.
• We propose a distributed algorithm that scales to subgraphs of size 18.
• Our algorithm can be applied to find subtrees and for anomaly detection tasks.
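The idea in the summary of representing subgraphs as monomials can be illustrated on the simplest tree, a path. The sketch below is not the MIDAS algorithm and does not come from the paper; it is a toy, assuming a graph given as a plain adjacency dictionary, and the names `walk_polynomial`, `has_multilinear_term` and `has_simple_path` are hypothetical. Each walk on k vertices contributes the product of its vertex variables, so a walk is a simple path exactly when its monomial is multilinear (no variable repeated). The sketch expands the polynomial explicitly, which takes exponential time and is only meant to show the correspondence; practical multilinear-detection methods avoid this expansion.

```python
from collections import Counter


def walk_polynomial(adj, k):
    """Return the polynomial of all k-vertex walks as {monomial: coefficient}.

    A monomial is a sorted tuple of vertex labels, one entry per walk step,
    so a repeated label means the corresponding variable has degree > 1.
    """
    # Walks on a single vertex v contribute the monomial x_v.
    ending_at = {v: Counter({(v,): 1}) for v in adj}
    for _ in range(k - 1):
        extended = {}
        for v in adj:
            poly = Counter()
            # Extend every walk ending at a neighbour u by the step u -> v.
            for u in adj[v]:
                for mono, coeff in ending_at[u].items():
                    poly[tuple(sorted(mono + (v,)))] += coeff
            extended[v] = poly
        ending_at = extended
    total = Counter()
    for poly in ending_at.values():
        total.update(poly)
    return total


def has_multilinear_term(poly):
    """True if some monomial uses every variable at most once."""
    return any(len(set(mono)) == len(mono) for mono in poly)


def has_simple_path(adj, k):
    """True if the graph contains a simple path on k vertices."""
    return has_multilinear_term(walk_polynomial(adj, k))


# Hypothetical toy graphs (undirected, as symmetric adjacency lists).
triangle_with_tail = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}

print(has_simple_path(triangle_with_tail, 4))
print(has_simple_path(star, 4))
```

In the star graph every 4-vertex walk must revisit the centre, so all of its monomials contain a squared variable and no multilinear term exists.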
Bibliography: USDOE
DOI:10.1016/j.jpdc.2019.04.006