Scaling up genome annotation using MAKER and work queue

Bibliographic details
Published in:	International journal of bioinformatics research and applications Vol. 10; no. 4-5; p. 447
Main authors:	Thrasher, Andrew; Musgrave, Zachary; Kachmarck, Brian; Thain, Douglas; Emrich, Scott
Format:	Journal Article
Language:	English
Published:	Switzerland 2014
Subjects:	Algorithms; Animals; Anopheles - genetics; Caenorhabditis - genetics; Cluster Analysis; Computational Biology - methods; Computer Systems; Genome; High-Throughput Nucleotide Sequencing - methods; Software; Tsetse Flies - genetics; Caenorhabditis japonica; genome annotation; grid computing; next generation sequencing; bioinformatics; work queue; explicit data transfer; cloud computing; clusters; distributed computing
ISSN:	1744-5485

Description
Summary:	Next generation sequencing technologies have made it possible to sequence many genomes. Because demand keeps growing and many of the required analyses are inherently parallel, these bioinformatics applications should ideally run on clusters, clouds and/or grids. We present a modified annotation framework that achieves a speed-up of 45x with 50 workers on a Caenorhabditis japonica test case. We also evaluate these modifications within the Amazon EC2 cloud framework. The underlying genome annotation tool (MAKER) is parallelised as an MPI application. Our framework enables it to run without MPI while utilising a wide variety of distributed computing resources. This parallel framework also allows easy explicit data transfer, which helps overcome a major limitation of bioinformatics tools, which often rely on shared file systems. Taken together, these features mean our proposed framework can be used, even during early stages of development, to easily run sequence analysis tools on clusters, grids and clouds.
DOI:	10.1504/IJBRA.2014.062994