mSNP: A Massively Parallel Algorithm for Large-Scale SNP Detection

Single Nucleotide Polymorphism (SNP) detection is a fundamental procedure of whole genome analysis. SOAPsnp, a classic tool for detection, would take more than one week to analyze one typical human genome, which limits the efficiency of downstream analyses. In this paper, we present mSNP, an optimiz...

Bibliographic details
Published in: IEEE Transactions on Parallel and Distributed Systems, Vol. 29, No. 11, pp. 2557-2567
Main authors: Cui, Yingbo; Peng, Shaoliang; Lu, Yutong; Zhu, Xiaoqian; Wang, Bingqiang; Wu, Chengkun; Liao, Xiangke
Format: Journal Article
Language: English
Published: New York, IEEE, 2018-11-01
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
ISSN:1045-9219, 1558-2183
Abstract Single Nucleotide Polymorphism (SNP) detection is a fundamental procedure in whole-genome analysis. SOAPsnp, a classic SNP detection tool, would take more than one week to analyze one typical human genome, which limits the efficiency of downstream analyses. In this paper, we present mSNP, an optimized version of SOAPsnp that leverages Intel Xeon Phi coprocessors for large-scale SNP detection. First, we redesigned the essential data structures of SOAPsnp, which significantly reduces the memory footprint and improves computing efficiency. Second, we developed a coordinated parallel framework to achieve higher hardware utilization of both the CPU and the Xeon Phi. Third, we tailored the data structures and operations to exploit the wide VPU of the Xeon Phi and improve data throughput. Last but not least, we proposed a read-based window division strategy to improve throughput and obtain better load balance. mSNP is the first SNP detection tool empowered by Xeon Phi. We achieved a 38x single-thread speedup on the CPU without any loss in precision. Moreover, mSNP successfully scaled to 4,096 nodes on Tianhe-2. Our experiments demonstrate that mSNP is efficient and scalable for large-scale human genome SNP detection.
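The abstract names a read-based window division strategy but does not describe it. The following is only an illustrative sketch of one plausible reading, not the authors' implementation: instead of cutting the genome into windows of equal width (equal numbers of sites), windows are cut so that each one holds roughly the same number of aligned reads, which tends to even out the work per window. The function names, the `reads_per_window` parameter, and the use of read start positions as the only input are all assumptions made for this example.

```python
from bisect import bisect_left
from typing import List, Tuple

Window = Tuple[int, int]  # half-open interval [start, end) of genome positions


def site_based_windows(genome_len: int, window_size: int) -> List[Window]:
    """Fixed-width windows: equal numbers of sites, possibly very unequal read counts."""
    return [(s, min(s + window_size, genome_len))
            for s in range(0, genome_len, window_size)]


def read_based_windows(read_starts: List[int], genome_len: int,
                       reads_per_window: int) -> List[Window]:
    """Cut windows so that each holds about `reads_per_window` reads (hypothetical scheme).

    `read_starts` must be sorted ascending, as in a coordinate-sorted alignment,
    and every start position is assumed to be smaller than `genome_len`.
    """
    if reads_per_window < 1:
        raise ValueError("reads_per_window must be at least 1")
    windows: List[Window] = []
    start, i, n = 0, 0, len(read_starts)
    while i < n:
        j = min(i + reads_per_window, n)
        # Reads that share a start position stay in the same window.
        while j < n and read_starts[j] == read_starts[j - 1]:
            j += 1
        # The window ends where the first read of the next window begins.
        end = read_starts[j] if j < n else genome_len
        windows.append((start, end))
        start, i = end, j
    if not windows:  # no reads at all: a single window covers the genome
        windows.append((0, genome_len))
    return windows


def reads_in(read_starts: List[int], window: Window) -> int:
    """Number of reads whose start position falls inside the window."""
    return bisect_left(read_starts, window[1]) - bisect_left(read_starts, window[0])


if __name__ == "__main__":
    starts = [0, 1, 2, 3, 4, 5, 6, 7, 500, 900]  # toy data: coverage piles up near position 0
    for name, ws in (("site-based", site_based_windows(1000, 500)),
                     ("read-based", read_based_windows(starts, 1000, 5))):
        print(name, [(w, reads_in(starts, w)) for w in ws])
```

On the toy input, the fixed-width split puts most reads into the first window, while the read-based split yields two windows with the same number of reads. A real SNP caller would also have to handle reads that span a window boundary, which this sketch ignores.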
Author_xml – sequence: 1
  givenname: Yingbo
  orcidid: 0000-0003-4000-4957
  surname: Cui
  fullname: Cui, Yingbo
  email: yingbocui@nudt.edu.cn
  organization: College of Computer, National University of Defense Technology, Changsha, China
– sequence: 2
  givenname: Shaoliang
  orcidid: 0000-0002-4647-2615
  surname: Peng
  fullname: Peng, Shaoliang
  email: pengshaoliang@nudt.edu.cn
  organization: College of Computer Science and Electronic Engineering & National Supercomputing Centre in Changsha, Hunan University, Changsha, China
– sequence: 3
  givenname: Yutong
  surname: Lu
  fullname: Lu, Yutong
  email: yutong.lu@nscc-gz.cn
  organization: National Supercomputer Center in Guangzhou, Guangzhou, Guangdong, China
– sequence: 4
  givenname: Xiaoqian
  surname: Zhu
  fullname: Zhu, Xiaoqian
  email: xiaoqianzhu@nudt.edu.cn
  organization: College of Computer, National University of Defense Technology, Changsha, China
– sequence: 5
  givenname: Bingqiang
  surname: Wang
  fullname: Wang, Bingqiang
  email: bingqiang.wang@nscc-gz.cn
  organization: National Supercomputer Center in Guangzhou, Guangzhou, Guangdong, China
– sequence: 6
  givenname: Chengkun
  surname: Wu
  fullname: Wu, Chengkun
  email: chengkun_wu@nudt.edu.cn
  organization: College of Computer, National University of Defense Technology, Changsha, China
– sequence: 7
  givenname: Xiangke
  orcidid: 0000-0002-6125-3330
  surname: Liao
  fullname: Liao, Xiangke
  email: xkliao@nudt.edu.cn
  organization: College of Computer, National University of Defense Technology, Changsha, China
CODEN ITDSEO
CitedBy_id crossref_primary_10_1038_s41598_020_63842_7
crossref_primary_10_1109_TPDS_2025_3581972
crossref_primary_10_1093_bib_bbab070
crossref_primary_10_1109_ACCESS_2019_2938765
ContentType Journal Article
Copyright Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2018
DOI 10.1109/TPDS.2018.2839578
Discipline Engineering
Computer Science
EISSN 1558-2183
EndPage 2567
ExternalDocumentID 10_1109_TPDS_2018_2839578
8362678
Genre orig-research
GrantInformation_xml – fundername: State Key Laboratory of Chemo/Biosensing and Chemometrics
– fundername: NSFC
  grantid: 61772543; U1435222; 61625202; 61272056
– fundername: Fundamental Research Funds for the Central Universities; and Guangdong Provincial Department of Science and Technology
  grantid: 2016B090918122
– fundername: National Key R&D Program of China
  grantid: 2017YFB0202602; 2017YFC1311003; 2016YFC1302500; 2016YFB0200400; 2017YFB0202104
ISICitedReferencesCount 4
ISSN 1045-9219
IsPeerReviewed true
IsScholarly true
Issue 11
Language English
License https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html
ORCID 0000-0002-4647-2615
0000-0003-4000-4957
0000-0002-6125-3330
PageCount 11
PublicationCentury 2000
PublicationDate 2018-11-01
PublicationDateYYYYMMDD 2018-11-01
PublicationDate_xml – month: 11
  year: 2018
  text: 2018-11-01
  day: 01
PublicationDecade 2010
PublicationPlace New York
PublicationPlace_xml – name: New York
PublicationTitle IEEE transactions on parallel and distributed systems
PublicationTitleAbbrev TPDS
PublicationYear 2018
Publisher IEEE
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Publisher_xml – name: IEEE
– name: The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
StartPage 2557
SubjectTerms Bioinformatics
Computer memory
Computing time
Coprocessors
Data structures
Genomes
Genomics
Graphics processing units
MIC
Microsoft Windows
Polymorphism
SNP detection
SOAPsnp
sparse matrix
vectorization
Xeon Phi
Title mSNP: A Massively Parallel Algorithm for Large-Scale SNP Detection
URI https://ieeexplore.ieee.org/document/8362678
https://www.proquest.com/docview/2124195195
Volume 29