MIDAS: Multilinear detection at scale

Bibliographic details
Published in:	Journal of Parallel and Distributed Computing, Volume 132; Issue C; pp. 363-382
Main authors:	Ekanayake, Saliya; Cadena, Jose; Wickramasinghe, Udayanga; Vullikanti, Anil
Format:	Journal Article
Language:	English
Publication details:	United States, Elsevier Inc, 01.10.2019
Subjects:	Distributed graph algorithms; Graph scan statistics; Multilinear detection; Parameterized complexity; Subgraph isomorphism
ISSN:	0743-7315, 1096-0848

Description
Summary:	We focus on two classes of problems in graph mining: (1) finding trees and (2) anomaly detection in complex networks using scan statistics. These are fundamental problems in a broad class of applications. Most of the parallel algorithms for such problems are either based on heuristics, which do not scale very well, or use techniques like color coding, which have a high memory overhead. In this paper, we develop a novel approach for parallelizing both these classes of problems, using an algebraic representation of subgraphs as monomials; this methodology involves detecting multilinear terms in multivariate polynomials. Our algorithms show good scaling over a large regime, and they run on networks with close to half a billion edges. The resulting parallel algorithm for trees is able to scale to subgraphs of size 18, which has not been done before, and it outperforms the best prior color-coding-based method (FASCIA) by more than two orders of magnitude. Our algorithm for network scan statistics is the first such parallelization, and it is able to handle a broad class of scan statistics functions with the same approach.
• Finding subgraphs is an important primitive in network analysis.
• It is possible to find “small” subgraphs optimally, but it takes exponential time.
• Existing parallel algorithms find subgraphs of size up to 12.
• We propose a distributed algorithm that scales to subgraphs of size 18.
• Our algorithm can be applied to find subtrees and for anomaly detection tasks.
Bibliography:	USDOE
DOI:	10.1016/j.jpdc.2019.04.006