Scalable Convex Multiple Sequence Alignment via Entropy-Regularized Dual Decomposition

Bibliographic Details
Published in: JMLR Workshop and Conference Proceedings, Vol. 54, p. 1514
Main Authors: Zhang, Jiong; Yen, Ian E H; Ravikumar, Pradeep; Dhillon, Inderjit S
Format: Journal Article
Language: English
Published: United States 01.04.2017
ISSN: 1938-7288
Description
Summary: Multiple sequence alignment (MSA) is one of the fundamental tasks in biological sequence analysis that underlies applications such as phylogenetic trees, profiles, and structure prediction. The task, however, is NP-hard, and current practice resorts to heuristic and local-search methods. Recently, a convex optimization approach for MSA based on the concept of atomic norm was proposed [23], demonstrating significant improvement over existing methods in alignment quality. However, the convex program is challenging to solve because its constraint is the intersection of two atomic-norm balls; the existing algorithm can only handle sequences of length up to 50, and its iteration complexity depends on constants whose relation to the natural parameters of MSA is unknown. In this work, we propose a dual decomposition algorithm that exploits entropy regularization to induce closed-form solutions for each atomic-norm-constrained subproblem, giving a single-loop algorithm with iteration complexity linear in the problem size (the total length of all sequences). The proposed algorithm gives significantly better alignments than existing methods on sequences of length up to hundreds, where the existing convex programming method fails to converge in one day.
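The summary's remark that regularization can induce closed-form subproblem solutions can be illustrated with a generic textbook case. The sketch below is not the paper's algorithm: it assumes a plain probability simplex in place of the atomic-norm constraint, and the function name and the parameter `mu` are hypothetical. Minimizing a linear cost plus `mu` times the negative entropy over the simplex has the closed-form solution x_i proportional to exp(-c_i / mu).

```python
import math


def entropy_regularized_argmin(costs, mu):
    """Solve min_x sum_i costs[i]*x[i] + mu * sum_i x[i]*log(x[i])
    over the probability simplex (x >= 0, sum(x) == 1).

    Hypothetical helper for illustration only. The minimizer has the
    closed form x[i] proportional to exp(-costs[i] / mu), i.e. a softmax
    of the negated, scaled costs.
    """
    if mu <= 0:
        raise ValueError("mu must be positive")
    # Shift by the minimum cost so every exponent is <= 0 (avoids overflow).
    shift = min(costs)
    weights = [math.exp(-(c - shift) / mu) for c in costs]
    total = sum(weights)
    return [w / total for w in weights]


# Lower-cost entries receive more mass; a smaller mu sharpens the solution.
x = entropy_regularized_argmin([1.0, 2.0, 4.0], mu=0.5)
```

Because the solution is available in closed form, no inner iterative solver is needed for this toy subproblem, which is the general reason such regularization can lead to single-loop methods.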