RankMap: A Framework for Distributed Learning From Dense Data Sets

Bibliographic Details
Published in:	IEEE Transactions on Neural Networks and Learning Systems, Vol. 29, no. 7, pp. 2717-2730
Main Authors:	Mirhoseini, Azalia; Dyer, Eva L.; Songhori, Ebrahim M.; Baraniuk, Richard; Koushanfar, Farinaz
Format:	Journal Article
Language:	English
Published:	United States: IEEE, 01.07.2018; The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects:	Algorithms; Computational modeling; Computer networks; Data recovery; Data structures; Datasets; Dense and big data; Distributed databases; Distributed processing; Iterative algorithms; Iterative machine learning (ML); Iterative methods; Large-scale distributed computing; Learning algorithms; Low rank approximation; Machine learning; Matrix decomposition; Memory; Partitioning algorithms; Signal processing algorithms; Sparse matrices; Sparse matrix factorization; Subspaces; Union of subspaces
ISSN:	2162-237X, 2162-2388

Description
Summary:	This paper introduces RankMap, a platform-aware end-to-end framework for efficient execution of a broad class of iterative learning algorithms for massive and dense data sets. Our framework exploits data structure to scalably factorize it into an ensemble of lower rank subspaces. The factorization creates sparse low-dimensional representations of the data, a property which is leveraged to devise effective mapping and scheduling of iterative learning algorithms on the distributed computing machines. We provide two APIs, one matrix-based and one graph-based, which facilitate automated adoption of the framework for performing several contemporary learning applications. To demonstrate the utility of RankMap, we solve sparse recovery and power iteration problems on various real-world data sets with up to 1.8 billion nonzeros. Our evaluations are performed on Amazon EC2 and IBM iDataPlex servers using up to 244 cores. The results demonstrate up to two orders of magnitude improvements in memory usage, execution speed, and bandwidth compared with the best reported prior work, while achieving the same level of learning accuracy.
DOI:	10.1109/TNNLS.2016.2631581
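
The idea in the summary, replacing a dense data matrix with a factorization whose second factor is sparse and then running an iterative algorithm on the factors, can be illustrated with a minimal sketch. This is not the authors' implementation or either of the RankMap APIs. The factor names `D` and `V`, the function `power_iteration_factored`, and the synthetic data are assumptions made for illustration, and the factorization here is constructed directly rather than computed from real data. The sketch assumes a dense matrix A equals D V and runs power iteration on A^T A using only the factors.

```python
import numpy as np

def power_iteration_factored(D, V, num_iters=100, seed=0):
    """Estimate the dominant eigenpair of (D V)^T (D V) without forming D V."""
    rng = np.random.default_rng(seed)
    G = D.T @ D                          # small l x l Gram matrix of the dense factor
    x = rng.standard_normal(V.shape[1])
    x /= np.linalg.norm(x)
    for _ in range(num_iters):
        y = V.T @ (G @ (V @ x))          # equals A^T (A x) when A = D V
        x = y / np.linalg.norm(y)
    eigval = x @ (V.T @ (G @ (V @ x)))   # Rayleigh quotient for the unit vector x
    return eigval, x

# Synthetic stand-in for a factorized data set (hypothetical, not RankMap output).
rng = np.random.default_rng(42)
m, l, n = 50, 8, 200
D = rng.random((m, l))                               # dense m x l factor
# l x n factor with roughly 10% nonzeros; kept as a plain array for simplicity,
# whereas a real system would store it in a sparse format.
V = rng.random((l, n)) * (rng.random((l, n)) < 0.1)
A = D @ V                                            # dense matrix, built only for checking

eigval, _ = power_iteration_factored(D, V)
ref = np.linalg.eigvalsh(A.T @ A)[-1]                # largest eigenvalue, computed directly
print(np.isclose(eigval, ref, rtol=1e-6))
print(D.size + np.count_nonzero(V), "stored values vs", A.size)
```

Each iteration touches only the small Gram matrix and the nonzeros of `V`, which is the kind of saving in memory and computation that the summary attributes to sparse low-dimensional representations. How RankMap actually computes the factorization and distributes the work across machines is described in the paper itself.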