A DC programming approach for finding communities in networks

Bibliographic Details
Published in:	Neural computation Vol. 26; no. 12; p. 2827
Main Authors:	Le Thi, Hoai An; Nguyen, Manh Cuong; Dinh, Tao Pham
Format:	Journal Article
Language:	English
Published:	United States 01.12.2014
Subjects:	Algorithms; Community Networks; Humans; Models, Theoretical
ISSN:	1530-888X

Description
Summary:	Automatic discovery of community structures in complex networks is a fundamental task in many disciplines, including physics, biology, and the social sciences. The most widely used criterion for characterizing the existence of a community structure in a network is modularity, a quantitative measure proposed by Newman and Girvan (2004). Community discovery can be formulated as the so-called modularity maximization problem, which consists of finding a partition of the nodes of a network with the highest modularity. In this letter, we propose a fast and scalable algorithm called DCAM, based on DC (difference of convex functions) programming and DCA (DC algorithms), an innovative approach in the nonconvex programming framework, for solving the modularity maximization problem. The special structure of the problem is exploited to obtain an inexpensive DCA scheme that requires only a matrix-vector product at each iteration. Starting with a very large number of communities, DCAM outputs an optimal partition together with the optimal number of communities [Formula: see text]; that is, the number of communities is discovered automatically during DCAM's iterations. Numerical experiments are performed on a variety of real-world network data sets with up to 4,194,304 nodes and 30,359,198 edges. Comparisons with eight reference algorithms show that the proposed approach outperforms them not only in solution quality and speed but also in scalability. Moreover, it achieves a very good trade-off between the quality of solutions and the run time.
DOI:	10.1162/NECO_a_00673
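
The modularity criterion named in the summary can be illustrated with a short computation. The sketch below is not DCAM and is not taken from the article; it only evaluates the standard Newman-Girvan modularity of a given partition, assuming an undirected, unweighted graph with no self-loops. The function name `modularity` and the toy graph are hypothetical, chosen for illustration.

```python
def modularity(edges, community):
    """Newman-Girvan modularity of a partition.

    Assumptions (illustrative, not from the article): `edges` is a list of
    (u, v) pairs of an undirected, unweighted graph without self-loops, and
    `community` maps every node to a community label.
    """
    m = len(edges)  # total number of edges

    # Degree of every node.
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1

    # Number of edges with both endpoints in the same community.
    intra = {}
    for u, v in edges:
        if community[u] == community[v]:
            c = community[u]
            intra[c] = intra.get(c, 0) + 1

    # Sum of node degrees per community.
    deg_sum = {}
    for node, k in degree.items():
        c = community[node]
        deg_sum[c] = deg_sum.get(c, 0) + k

    # Q = sum over communities of (intra-edge fraction - expected fraction).
    return sum(intra.get(c, 0) / m - (deg_sum[c] / (2 * m)) ** 2
               for c in deg_sum)


# Toy graph: two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
merged = {node: "all" for node in range(6)}

print(round(modularity(edges, split), 3))   # 0.357
print(round(modularity(edges, merged), 3))  # 0.0
```

Splitting the toy graph into its two triangles scores higher than placing all nodes in one community, which is the kind of comparison a modularity maximization method performs over all candidate partitions.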