Scaling up genome annotation using MAKER and work queue
Next generation sequencing technologies have enabled sequencing many genomes. Because of the overall increasing demand and the inherent parallelism available in many required analyses, these bioinformatics applications should ideally run on clusters, clouds and/or grids. We present a modified annota...
| Published in: | International journal of bioinformatics research and applications, Volume 10; Issue 4-5; p. 447 |
|---|---|
| Main authors: | Thrasher, Andrew; Musgrave, Zachary; Kachmarck, Brian; Thain, Douglas; Emrich, Scott |
| Format: | Journal Article |
| Language: | English |
| Published: | Switzerland, 2014 |
| Subjects: | |
| ISSN: | 1744-5485 |
| Abstract | Next generation sequencing technologies have made it possible to sequence many genomes. Because demand keeps increasing and many of the required analyses are inherently parallel, these bioinformatics applications should ideally run on clusters, clouds and/or grids. We present a modified annotation framework that achieves a 45x speed-up with 50 workers on a Caenorhabditis japonica test case. We also evaluate these modifications within the Amazon EC2 cloud framework. The underlying genome annotation tool (MAKER) is parallelised as an MPI application. Our framework enables it to run without MPI while utilising a wide variety of distributed computing resources. This parallel framework also allows easy explicit data transfer, which helps overcome a major limitation of bioinformatics tools that often rely on shared file systems. Combined, our proposed framework can be used, even during early stages of development, to easily run sequence analysis tools on clusters, grids and clouds. |
|---|---|
| Author | Kachmarck, Brian; Musgrave, Zachary; Thrasher, Andrew; Thain, Douglas; Emrich, Scott |
| Author_xml | – sequence: 1 givenname: Andrew surname: Thrasher fullname: Thrasher, Andrew organization: Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, USA – sequence: 2 givenname: Zachary surname: Musgrave fullname: Musgrave, Zachary organization: Yelp, Inc., San Francisco, CA, USA – sequence: 3 givenname: Brian surname: Kachmarck fullname: Kachmarck, Brian organization: Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, USA – sequence: 4 givenname: Douglas surname: Thain fullname: Thain, Douglas organization: Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, USA – sequence: 5 givenname: Scott surname: Emrich fullname: Emrich, Scott organization: Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, USA |
| BackLink | https://www.ncbi.nlm.nih.gov/pubmed/24989862$$D View this record in MEDLINE/PubMed |
| CitedBy_id | crossref_primary_10_1002_cpe_4683 crossref_primary_10_3389_fgene_2020_00876 crossref_primary_10_1016_j_dib_2020_106531 crossref_primary_10_1016_j_procs_2015_10_116 |
| ContentType | Journal Article |
| DBID | CGR CUY CVF ECM EIF NPM 7X8 |
| DOI | 10.1504/IJBRA.2014.062994 |
| DatabaseName | Medline MEDLINE MEDLINE (Ovid) MEDLINE MEDLINE PubMed MEDLINE - Academic |
| DatabaseTitle | MEDLINE Medline Complete MEDLINE with Full Text PubMed MEDLINE (Ovid) MEDLINE - Academic |
| DatabaseTitleList | MEDLINE - Academic MEDLINE |
| Database_xml | – sequence: 1 dbid: NPM name: PubMed url: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed sourceTypes: Index Database – sequence: 2 dbid: 7X8 name: MEDLINE - Academic url: https://search.proquest.com/medline sourceTypes: Aggregation Database |
| DeliveryMethod | no_fulltext_linktorsrc |
| Discipline | Biology |
| ExternalDocumentID | 24989862 |
| Genre | Research Support, U.S. Gov't, Non-P.H.S; Journal Article; Research Support, N.I.H., Extramural |
| ISSN | 1744-5485 |
| IngestDate | Wed Oct 01 14:07:46 EDT 2025 Thu Jan 02 23:01:15 EST 2025 |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 4-5 |
| Keywords | Caenorhabditis japonica; genome annotation; grid computing; next generation sequencing; bioinformatics; work queue; explicit data transfer; cloud computing; clusters; distributed computing |
| Language | English |
| PMID | 24989862 |
| PQID | 1543278691 |
| PQPubID | 23479 |
| ParticipantIDs | proquest_miscellaneous_1543278691 pubmed_primary_24989862 |
| PublicationCentury | 2000 |
| PublicationDate | 2014-00-00 20140101 |
| PublicationDateYYYYMMDD | 2014-01-01 |
| PublicationDate_xml | – year: 2014 text: 2014-00-00 |
| PublicationDecade | 2010 |
| PublicationPlace | Switzerland |
| PublicationPlace_xml | – name: Switzerland |
| PublicationTitle | International journal of bioinformatics research and applications |
| PublicationTitleAlternate | Int J Bioinform Res Appl |
| PublicationYear | 2014 |
| SourceID | proquest pubmed |
| SourceType | Aggregation Database Index Database |
| StartPage | 447 |
| SubjectTerms | Algorithms; Animals; Anopheles - genetics; Caenorhabditis - genetics; Cluster Analysis; Computational Biology - methods; Computer Systems; Genome; High-Throughput Nucleotide Sequencing - methods; Software; Tsetse Flies - genetics |
| Title | Scaling up genome annotation using MAKER and work queue |
| URI | https://www.ncbi.nlm.nih.gov/pubmed/24989862 https://www.proquest.com/docview/1543278691 |
| Volume | 10 |
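
The abstract reports a 45x speed-up with 50 workers. As a rough reading of that figure, the sketch below computes the implied parallel efficiency and an estimate of the serial fraction using the Karp-Flatt metric. The choice of metric and the function names are assumptions made for illustration; the record does not say how the authors analysed their scaling results.

```python
def parallel_efficiency(speedup: float, workers: int) -> float:
    """Fraction of ideal linear speed-up actually achieved."""
    return speedup / workers


def karp_flatt_serial_fraction(speedup: float, workers: int) -> float:
    """Experimentally determined serial fraction (Karp-Flatt metric).

    Assumption: an Amdahl-style model is applied here for illustration only;
    it is not taken from the catalogued article.
    """
    return (1.0 / speedup - 1.0 / workers) / (1.0 - 1.0 / workers)


# Figures quoted in the abstract: 45x speed-up with 50 workers.
speedup, workers = 45.0, 50
eff = parallel_efficiency(speedup, workers)
serial = karp_flatt_serial_fraction(speedup, workers)

print(f"parallel efficiency: {eff:.0%}")
print(f"estimated serial fraction: {serial:.4f}")
```

Under these assumptions the reported figures correspond to 90% efficiency and a serial fraction of about 0.2%, which is consistent with the abstract's description of the workload as inherently parallel.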