Scaling up genome annotation using MAKER and work queue

Next generation sequencing technologies have enabled sequencing many genomes. Because of the overall increasing demand and the inherent parallelism available in many required analyses, these bioinformatics applications should ideally run on clusters, clouds and/or grids. We present a modified annota...

Bibliographic details
Published in: International journal of bioinformatics research and applications, Vol. 10, No. 4-5, p. 447
Main authors: Thrasher, Andrew; Musgrave, Zachary; Kachmarck, Brian; Thain, Douglas; Emrich, Scott
Format: Journal Article
Language: English
Published: Switzerland 2014
ISSN:1744-5485
Abstract Next generation sequencing technologies have enabled sequencing many genomes. Because of the overall increasing demand and the inherent parallelism available in many required analyses, these bioinformatics applications should ideally run on clusters, clouds and/or grids. We present a modified annotation framework that achieves a speed-up of 45x using 50 workers using a Caenorhabditis japonica test case. We also evaluate these modifications within the Amazon EC2 cloud framework. The underlying genome annotation (MAKER) is parallelised as an MPI application. Our framework enables it to now run without MPI while utilising a wide variety of distributed computing resources. This parallel framework also allows easy explicit data transfer, which helps overcome a major limitation of bioinformatics tools that often rely on shared file systems. Combined, our proposed framework can be used, even during early stages of development, to easily run sequence analysis tools on clusters, grids and clouds.
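The reported figures (a 45x speed-up on 50 workers) can be put in context with a short calculation. The sketch below is illustrative and not taken from the paper: it computes the parallel efficiency and, assuming the Karp-Flatt metric is an acceptable rough model for this workload, the serial fraction those two numbers would imply. The function names are hypothetical.

```python
def parallel_efficiency(speedup: float, workers: int) -> float:
    """Fraction of ideal linear speed-up actually achieved."""
    return speedup / workers


def karp_flatt_serial_fraction(speedup: float, workers: int) -> float:
    """Experimentally determined serial fraction (Karp-Flatt metric).

    Assumption: the workload is modelled as a fixed serial part plus a
    perfectly divisible parallel part; real overheads such as data
    transfer are folded into the serial term.
    """
    return (1.0 / speedup - 1.0 / workers) / (1.0 - 1.0 / workers)


# Figures quoted in the abstract: 45x speed-up using 50 workers.
eff = parallel_efficiency(45.0, 50)
serial = karp_flatt_serial_fraction(45.0, 50)

print(f"parallel efficiency: {eff:.2f}")
print(f"implied serial fraction: {serial:.4f}")
```

Under these assumptions the efficiency is 45/50, and the implied serial fraction is well below one percent, which is consistent with the abstract's description of the analysis as inherently parallel.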
Author Kachmarck, Brian
Musgrave, Zachary
Thrasher, Andrew
Thain, Douglas
Emrich, Scott
Author_xml – sequence: 1
  givenname: Andrew
  surname: Thrasher
  fullname: Thrasher, Andrew
  organization: Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, USA
– sequence: 2
  givenname: Zachary
  surname: Musgrave
  fullname: Musgrave, Zachary
  organization: Yelp, Inc., San Francisco, CA, USA
– sequence: 3
  givenname: Brian
  surname: Kachmarck
  fullname: Kachmarck, Brian
  organization: Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, USA
– sequence: 4
  givenname: Douglas
  surname: Thain
  fullname: Thain, Douglas
  organization: Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, USA
– sequence: 5
  givenname: Scott
  surname: Emrich
  fullname: Emrich, Scott
  organization: Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, USA
BackLink https://www.ncbi.nlm.nih.gov/pubmed/24989862$$D View this record in MEDLINE/PubMed
CitedBy_id crossref_primary_10_1002_cpe_4683
crossref_primary_10_3389_fgene_2020_00876
crossref_primary_10_1016_j_dib_2020_106531
crossref_primary_10_1016_j_procs_2015_10_116
ContentType Journal Article
DOI 10.1504/IJBRA.2014.062994
DatabaseName Medline
MEDLINE
MEDLINE (Ovid)
MEDLINE
MEDLINE
PubMed
MEDLINE - Academic
DatabaseTitle MEDLINE
Medline Complete
MEDLINE with Full Text
PubMed
MEDLINE (Ovid)
MEDLINE - Academic
DatabaseTitleList MEDLINE - Academic
MEDLINE
Database_xml – sequence: 1
  dbid: NPM
  name: PubMed
  url: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed
  sourceTypes: Index Database
– sequence: 2
  dbid: 7X8
  name: MEDLINE - Academic
  url: https://search.proquest.com/medline
  sourceTypes: Aggregation Database
DeliveryMethod no_fulltext_linktorsrc
Discipline Biology
ExternalDocumentID 24989862
Genre Research Support, U.S. Gov't, Non-P.H.S
Journal Article
Research Support, N.I.H., Extramural
ISSN 1744-5485
IngestDate Wed Oct 01 14:07:46 EDT 2025
Thu Jan 02 23:01:15 EST 2025
IsPeerReviewed true
IsScholarly true
Issue 4-5
Keywords Caenorhabditis japonica
genome annotation
grid computing
next generation sequencing
bioinformatics
work queue
explicit data transfer
cloud computing
clusters
distributed computing
Language English
PMID 24989862
PQID 1543278691
PQPubID 23479
ParticipantIDs proquest_miscellaneous_1543278691
pubmed_primary_24989862
PublicationCentury 2000
PublicationDate 2014-00-00
20140101
PublicationDateYYYYMMDD 2014-01-01
PublicationDate_xml – year: 2014
  text: 2014-00-00
PublicationDecade 2010
PublicationPlace Switzerland
PublicationPlace_xml – name: Switzerland
PublicationTitle International journal of bioinformatics research and applications
PublicationTitleAlternate Int J Bioinform Res Appl
PublicationYear 2014
SourceID proquest
pubmed
SourceType Aggregation Database
Index Database
StartPage 447
SubjectTerms Algorithms
Animals
Anopheles - genetics
Caenorhabditis - genetics
Cluster Analysis
Computational Biology - methods
Computer Systems
Genome
High-Throughput Nucleotide Sequencing - methods
Software
Tsetse Flies - genetics
Title Scaling up genome annotation using MAKER and work queue
URI https://www.ncbi.nlm.nih.gov/pubmed/24989862
https://www.proquest.com/docview/1543278691
Volume 10