MIDAS: Multilinear detection at scale

Bibliographic Details
Published in:	Journal of parallel and distributed computing Vol. 132; no. C; pp. 363 - 382
Main Authors:	Ekanayake, Saliya, Cadena, Jose, Wickramasinghe, Udayanga, Vullikanti, Anil
Format:	Journal Article
Language:	English
Published:	Elsevier Inc, United States, 01.10.2019
Subjects:	Distributed graph algorithms; Graph scan statistics; Multilinear detection; Parameterized complexity; Subgraph isomorphism
ISSN:	0743-7315, 1096-0848

Description
Summary:	We focus on two classes of problems in graph mining: (1) finding trees and (2) anomaly detection in complex networks using scan statistics. These are fundamental problems in a broad class of applications. Most parallel algorithms for such problems are either based on heuristics, which do not scale well, or use techniques like color coding, which have a high memory overhead. In this paper, we develop a novel approach for parallelizing both classes of problems, using an algebraic representation of subgraphs as monomials; this methodology involves detecting multilinear terms in multivariate polynomials. Our algorithms show good scaling over a large regime, and they run on networks with close to half a billion edges. The resulting parallel algorithm for trees is able to scale to subgraphs of size 18, which has not been done before, and it outperforms the best prior color-coding-based method (FASCIA) by more than two orders of magnitude. Our algorithm for network scan statistics is the first such parallelization, and it is able to handle a broad class of scan statistic functions with the same approach.
• Finding subgraphs is an important primitive in network analysis.
• It is possible to find “small” subgraphs optimally, but it takes exponential time.
• Existing parallel algorithms find subgraphs of size up to 12.
• We propose a distributed algorithm that scales to subgraphs of size 18.
• Our algorithm can be applied to find subtrees and to detect anomalies.
Bibliography:	USDOE
DOI:	10.1016/j.jpdc.2019.04.006
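The summary describes representing subgraphs as monomials and detecting multilinear terms in a multivariate polynomial. As a rough illustration only, the sequential Python sketch below applies that general idea to the simplest tree, a path on k vertices. It is not the MIDAS implementation: the GF(2^8) field, the random per-step edge weights, the parity sieve over random vectors in Z_2^k, and the helper names `gf_mul`, `has_k_path_trial` and `has_k_path` are all assumptions made for this example. Walks that revisit a vertex are counted an even number of times and cancel in characteristic 2, so a nonzero result proves that a simple k-vertex path exists. A zero result may be a false negative, which repeated trials make unlikely.

```python
import random


def gf_mul(a: int, b: int) -> int:
    """Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B).

    Assumed field choice for this sketch; addition in this field is XOR.
    """
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:  # reduce once the degree reaches 8
            a ^= 0x11B
    return result


def has_k_path_trial(adj, k, rng):
    """One randomized trial; True means a simple path on k vertices certainly exists."""
    n = len(adj)
    # Hypothetical sieve: a random vector in Z_2^k for every vertex.
    vec = [rng.randrange(1 << k) for _ in range(n)]
    # Random nonzero weight per (directed edge, step) so that distinct walks,
    # including the two orientations of one path, get distinct coefficients.
    weight = {
        (u, v, step): rng.randrange(1, 256)
        for u in range(n) for v in adj[u] for step in range(k - 1)
    }
    total = 0
    for j in range(1 << k):
        # Keep vertex v in this round iff <j, vec[v]> = 0 over GF(2).
        active = [bin(j & vec[v]).count("1") % 2 == 0 for v in range(n)]
        # dp[v]: weighted sum over walks of step + 1 active vertices ending at v.
        dp = [1 if active[v] else 0 for v in range(n)]
        for step in range(k - 1):
            nxt = [0] * n
            for u in range(n):
                if dp[u]:
                    for v in adj[u]:
                        if active[v]:
                            nxt[v] ^= gf_mul(dp[u], weight[(u, v, step)])
            dp = nxt
        for v in range(n):
            total ^= dp[v]
    # A k-vertex walk survives all 2^k rounds an odd number of times only if
    # its vertex vectors are linearly independent, which forces distinct vertices.
    return total != 0


def has_k_path(adj, k, trials=60, seed=0):
    """Repeat independent trials; errors are one-sided (false negatives only)."""
    rng = random.Random(seed)
    return any(has_k_path_trial(adj, k, rng) for _ in range(trials))


if __name__ == "__main__":
    path = [[1], [0, 2], [1, 3], [2]]      # 0-1-2-3
    star = [[1, 2, 3], [0], [0], [0]]      # center 0 with three leaves
    print(has_k_path(path, 4), has_k_path(star, 4))
```

On the star, every 4-vertex walk revisits the center, so the sketch always reports False there; on the path graph 0-1-2-3 it reports True with high probability. Each trial costs 2^k rounds of a (k - 1)-step dynamic program over the edges, so the exponential factor depends on the subgraph size k rather than on the graph size.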