Ultrafast clustering algorithms for metagenomic sequence analysis

Bibliographic Details
Published in: Briefings in Bioinformatics, Vol. 13, no. 6, pp. 656-668
Main Authors: Li, W., Fu, L., Niu, B., Wu, S., Wooley, J.
Format: Journal Article
Language: English
Published: England Oxford Publishing Limited (England) 01.11.2012
Oxford University Press
ISSN: 1467-5463, 1477-4054
Description
Summary: Rapid advances in high-throughput sequencing technologies have dramatically spurred metagenomic studies of microbial communities in various environments. Fundamental questions in metagenomics include the identities, composition and dynamics of microbial populations and their functions and interactions. However, the massive quantity and complexity of these sequence data pose tremendous challenges for data analysis. These challenges include, but are not limited to, ever-increasing computational demand, biased sequence sampling, sequence errors, sequence artifacts and novel sequences. Sequence clustering methods can directly answer many of the fundamental questions by grouping similar sequences into families. Clustering analysis also addresses the challenges in metagenomics: a large redundant data set can be represented by a small non-redundant set, in which each cluster is represented by a single entry or a consensus; artifacts can be rapidly detected through clustering; and errors can be identified, filtered or corrected using the consensus of sequences within clusters.
DOI: 10.1093/bib/bbs035