From Scripted HPC-Based NGS Pipelines to Workflows on the Cloud
In this paper we describe our initial experiences in the Cloud-e-Genome project with moving the whole exome sequencing pipeline from the scripted HPC-based solution to a workflow enactment system running in the cloud. We discuss shortcomings of the existing approach based on scripts and list benefit...
| Published in: | 2014 14th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, pp. 694-700 |
|---|---|
| Main authors: | Cala, Jacek; Yaobo Xu; Wijaya, Eldarina Azfar; Missier, Paolo |
| Format: | Conference paper |
| Language: | English |
| Publication details: | IEEE, 1 May 2014 |
| Subjects: | |
| Abstract | In this paper we describe our initial experiences in the Cloud-e-Genome project with moving the whole-exome sequencing pipeline from a scripted HPC-based solution to a workflow enactment system running in the cloud. We discuss the shortcomings of the existing script-based approach and list the benefits that a workflow-based solution can provide. Despite the effort involved in wrapping all required tools as workflow blocks, and the restrictions of the dataflow model used to represent workflows, we expect the migration to significantly improve the current state of the pipeline. Our goal is to enable flexibility, traceability and reproducibility of the solution, so that it can better accommodate the evolution of the tools, the data and the pipeline itself, and allow us to run it at national scale. This work will become the foundation for a more complete system that includes variant filtering and interpretation for diagnostic purposes. |
|---|---|
| Author | Cala, Jacek; Wijaya, Eldarina Azfar; Missier, Paolo; Yaobo Xu |
| Author_xml | 1. Cala, Jacek (Sch. of Comput. Sci., Newcastle Univ., Newcastle upon Tyne, UK); 2. Yaobo Xu (yaobo.xu@newcastle.ac.uk; Inst. of Genetic Med., Newcastle Univ., Newcastle upon Tyne, UK); 3. Wijaya, Eldarina Azfar (Sch. of Comput. Sci., Newcastle Univ., Newcastle upon Tyne, UK); 4. Missier, Paolo (Sch. of Comput. Sci., Newcastle Univ., Newcastle upon Tyne, UK) |
| CODEN | IEEPAD |
| ContentType | Conference Proceeding |
| DBID | 6IE 6IL CBEJK RIE RIL |
| DOI | 10.1109/CCGrid.2014.128 |
| DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume IEEE Xplore All Conference Proceedings IEEE Electronic Library (IEL) IEEE Proceedings Order Plans (POP All) 1998-Present |
| Database_xml | – sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://ieeexplore.ieee.org/ sourceTypes: Publisher |
| DeliveryMethod | fulltext_linktorsrc |
| EISBN | 1479927848 9781479927845 |
| EndPage | 700 |
| ExternalDocumentID | 6846521 |
| Genre | orig-research |
| GroupedDBID | 6IE 6IL ALMA_UNASSIGNED_HOLDINGS CBEJK RIB RIC RIE RIL |
| IEDL.DBID | RIE |
| ISICitedReferencesCount | 3 |
| IngestDate | Wed Dec 20 05:20:10 EST 2023 |
| IsPeerReviewed | false |
| IsScholarly | false |
| Language | English |
| LinkModel | DirectLink |
| PageCount | 7 |
| ParticipantIDs | ieee_primary_6846521 |
| PublicationCentury | 2000 |
| PublicationDate | 2014-May |
| PublicationDateYYYYMMDD | 2014-05-01 |
| PublicationDate_xml | – month: 05 year: 2014 text: 2014-May |
| PublicationDecade | 2010 |
| PublicationTitle | 2014 14th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing |
| PublicationTitleAbbrev | ccgrid |
| PublicationYear | 2014 |
| Publisher | IEEE |
| Publisher_xml | – name: IEEE |
| SSID | ssib026764318 |
| Score | 1.5887394 |
| SourceID | ieee |
| SourceType | Publisher |
| StartPage | 694 |
| SubjectTerms | Bioinformatics Cloud computing Computational modeling Genomics Libraries Pipelines |
| Title | From Scripted HPC-Based NGS Pipelines to Workflows on the Cloud |
| URI | https://ieeexplore.ieee.org/document/6846521 |
| hasFullText | 1 |
| inHoldings | 1 |