mSNP: A Massively Parallel Algorithm for Large-Scale SNP Detection
Single Nucleotide Polymorphism (SNP) detection is a fundamental procedure of whole genome analysis. SOAPsnp, a classic tool for detection, would take more than one week to analyze one typical human genome, which limits the efficiency of downstream analyses. In this paper, we present mSNP, an optimiz...
| Published in: | IEEE transactions on parallel and distributed systems, Vol. 29, No. 11, pp. 2557-2567 |
|---|---|
| Main authors: | Cui, Yingbo; Peng, Shaoliang; Lu, Yutong; Zhu, Xiaoqian; Wang, Bingqiang; Wu, Chengkun; Liao, Xiangke |
| Format: | Journal Article |
| Language: | English |
| Published: | New York: IEEE, 2018-11-01; The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
| Subjects: | |
| ISSN: | 1045-9219, 1558-2183 |
| Abstract | Single Nucleotide Polymorphism (SNP) detection is a fundamental procedure of whole genome analysis. SOAPsnp, a classic tool for detection, would take more than one week to analyze one typical human genome, which limits the efficiency of downstream analyses. In this paper, we present mSNP, an optimized version of SOAPsnp, which leverages Intel Xeon Phi coprocessors for large-scale SNP detection. First, we redesigned the essential data structures of SOAPsnp, significantly reducing the memory footprint and improving computing efficiency. Then we developed a coordinated parallel framework for higher hardware utilization of both the CPU and Xeon Phi. We also tailored the data structures and operations to utilize the wide VPU of Xeon Phi to improve data throughput. Finally, we proposed a read-based window division strategy to improve throughput and obtain better load balance. mSNP is the first SNP detection tool empowered by Xeon Phi. We achieved a 38x single-thread speedup on CPU, without any loss in precision. Moreover, mSNP successfully scaled to 4,096 nodes on Tianhe-2. Our experiments demonstrate that mSNP is efficient and scalable for large-scale human genome SNP detection. |
|---|---|
| Author | Peng, Shaoliang; Liao, Xiangke; Cui, Yingbo; Zhu, Xiaoqian; Lu, Yutong; Wang, Bingqiang; Wu, Chengkun |
| Author_xml | – sequence: 1 givenname: Yingbo orcidid: 0000-0003-4000-4957 surname: Cui fullname: Cui, Yingbo email: yingbocui@nudt.edu.cn organization: College of Computer, National University of Defense Technology, Changsha, China – sequence: 2 givenname: Shaoliang orcidid: 0000-0002-4647-2615 surname: Peng fullname: Peng, Shaoliang email: pengshaoliang@nudt.edu.cn organization: College of Computer Science and Electronic Engineering & National Supercomputing Centre in Changsha, Hunan University, Changsha, China – sequence: 3 givenname: Yutong surname: Lu fullname: Lu, Yutong email: yutong.lu@nscc-gz.cn organization: National Supercomputer Center in Guangzhou, Guangzhou, Guangdong, China – sequence: 4 givenname: Xiaoqian surname: Zhu fullname: Zhu, Xiaoqian email: xiaoqianzhu@nudt.edu.cn organization: College of Computer, National University of Defense Technology, Changsha, China – sequence: 5 givenname: Bingqiang surname: Wang fullname: Wang, Bingqiang email: bingqiang.wang@nscc-gz.cn organization: National Supercomputer Center in Guangzhou, Guangzhou, Guangdong, China – sequence: 6 givenname: Chengkun surname: Wu fullname: Wu, Chengkun email: chengkun_wu@nudt.edu.cn organization: College of Computer, National University of Defense Technology, Changsha, China – sequence: 7 givenname: Xiangke orcidid: 0000-0002-6125-3330 surname: Liao fullname: Liao, Xiangke email: xkliao@nudt.edu.cn organization: College of Computer, National University of Defense Technology, Changsha, China |
| CODEN | ITDSEO |
| CitedBy_id | crossref_primary_10_1038_s41598_020_63842_7 crossref_primary_10_1109_TPDS_2025_3581972 crossref_primary_10_1093_bib_bbab070 crossref_primary_10_1109_ACCESS_2019_2938765 |
| Cites_doi | 10.1093/nar/gkp1137 10.1093/bioinformatics/btp352 10.1186/1471-2164-14-425 10.1109/MSST.2010.5496972 10.1109/ICPP.2011.51 10.1145/989393.989401 10.1109/ICPP.2015.92 10.1101/gr.129684.111 10.1109/ccgrid.2014.111 10.1093/bioinformatics/btp336 10.1093/nar/gkr599 10.1186/1471-2156-9-23 10.1101/gr.088013.108 10.1038/ng.806 10.1186/gb-2009-10-11-r134 10.1093/bioinformatics/btp373 10.1093/bioinformatics/btt314 10.1145/1345206.1345220 10.1109/ipdps.2013.44 10.1007/978-3-319-07581-5_14 10.1186/1471-2105-12-134 10.1002/cpe.3161 10.1007/978-3-319-20119-1_6 10.1186/1471-2105-7-438 |
| ContentType | Journal Article |
| Copyright | Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2018 |
| DBID | 97E RIA RIE AAYXX CITATION 7SC 7SP 8FD JQ2 L7M L~C L~D |
| DOI | 10.1109/TPDS.2018.2839578 |
| DatabaseName | IEEE All-Society Periodicals Package (ASPP) 2005–Present IEEE All-Society Periodicals Package (ASPP) 1998–Present IEEE/IET Electronic Library (IEL) (UW System Shared) CrossRef Computer and Information Systems Abstracts Electronics & Communications Abstracts Technology Research Database ProQuest Computer Science Collection Advanced Technologies Database with Aerospace Computer and Information Systems Abstracts Academic Computer and Information Systems Abstracts Professional |
| DatabaseTitle | CrossRef Technology Research Database Computer and Information Systems Abstracts – Academic Electronics & Communications Abstracts ProQuest Computer Science Collection Computer and Information Systems Abstracts Advanced Technologies Database with Aerospace Computer and Information Systems Abstracts Professional |
| DatabaseTitleList | Technology Research Database |
| Database_xml | – sequence: 1 dbid: RIE name: IEEE/IET Electronic Library (IEL) (UW System Shared) url: https://ieeexplore.ieee.org/ sourceTypes: Publisher |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Engineering Computer Science |
| EISSN | 1558-2183 |
| EndPage | 2567 |
| ExternalDocumentID | 10_1109_TPDS_2018_2839578 8362678 |
| Genre | orig-research |
| GrantInformation_xml | – fundername: State Key Laboratory of Chemo/Biosensing and Chemometrics – fundername: NSFC grantid: 61772543; U1435222; 61625202; 61272056 – fundername: Fundamental Research Funds for the Central Universities; and Guangdong Provincial Department of Science and Technology grantid: 2016B090918122 – fundername: National Key R&D Program of China grantid: 2017YFB0202602; 2017YFC1311003; 2016YFC1302500; 2016YFB0200400; 2017YFB0202104 |
| IEDL.DBID | RIE |
| ISICitedReferencesCount | 4 |
| ISICitedReferencesURI | http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000447046200012&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| ISSN | 1045-9219 |
| IngestDate | Mon Jun 30 04:09:50 EDT 2025 Sat Nov 29 06:06:46 EST 2025 Tue Nov 18 22:31:03 EST 2025 Wed Aug 27 02:52:46 EDT 2025 |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 11 |
| Language | English |
| License | https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html |
| LinkModel | DirectLink |
| Notes | ObjectType-Article-1 SourceType-Scholarly Journals-1 ObjectType-Feature-2 content type line 14 |
| ORCID | 0000-0002-4647-2615 0000-0003-4000-4957 0000-0002-6125-3330 |
| PQID | 2124195195 |
| PQPubID | 85437 |
| PageCount | 11 |
| ParticipantIDs | crossref_citationtrail_10_1109_TPDS_2018_2839578 proquest_journals_2124195195 ieee_primary_8362678 crossref_primary_10_1109_TPDS_2018_2839578 |
| PublicationCentury | 2000 |
| PublicationDate | 2018-11-01 |
| PublicationDateYYYYMMDD | 2018-11-01 |
| PublicationDate_xml | – month: 11 year: 2018 text: 2018-11-01 day: 01 |
| PublicationDecade | 2010 |
| PublicationPlace | New York |
| PublicationPlace_xml | – name: New York |
| PublicationTitle | IEEE transactions on parallel and distributed systems |
| PublicationTitleAbbrev | TPDS |
| PublicationYear | 2018 |
| Publisher | IEEE; The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
| Publisher_xml | – name: IEEE – name: The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
| SSID | ssj0014504 |
| Score | 2.267667 |
| SourceID | proquest crossref ieee |
| SourceType | Aggregation Database Enrichment Source Index Database Publisher |
| StartPage | 2557 |
| SubjectTerms | Bioinformatics Computer memory Computing time Coprocessors Data structures Genomes Genomics Graphics processing units MIC Microsoft Windows Polymorphism SNP detection SOAPsnp sparse matrix vectorization Xeon Phi |
| Title | mSNP: A Massively Parallel Algorithm for Large-Scale SNP Detection |
| URI | https://ieeexplore.ieee.org/document/8362678 https://www.proquest.com/docview/2124195195 |
| Volume | 29 |
| WOSCitedRecordID | wos000447046200012&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |