Multithread Multistring Burrows-Wheeler Transform and Longest Common Prefix Array

Bibliographic details
Published in: Journal of Computational Biology, Vol. 26, No. 9, p. 948
Main authors: Bonizzoni, Paola; Della Vedova, Gianluca; Pirola, Yuri; Previtali, Marco; Rizzi, Raffaella
Format: Journal Article
Language: English
Published: United States, 1 September 2019
ISSN: 1557-8666
Abstract: Indexing huge collections of strings, such as those produced by the widespread sequencing technologies, relies heavily on multistring generalizations of the Burrows-Wheeler transform (BWT) and the longest common prefix (LCP) array, since efficiently computing both is an essential ingredient of several algorithms on collections of strings, such as those for genome assembly. In this article, we explore a multithread computational strategy for building the BWT and the LCP array. Our algorithm applies a divide-and-conquer approach that leads to parallel computation of the multistring BWT and LCP array.
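To make the two structures concrete, the following is a naive Python sketch that builds a multistring BWT and LCP array by sorting every suffix of every string directly. It assumes a common convention that this record does not state: each string gets its own end-marker (printed as `$`), markers are ordered by string index and are smaller than every ordinary character, the BWT symbol of a suffix is the symbol preceding it in its own string (the string's end-marker for a full-string suffix), and LCP[k] is the longest common prefix of the k-th and (k-1)-th sorted suffixes, with LCP[0] = 0. The function names are hypothetical, and the quadratic-time sort is for illustration only; it is not the authors' algorithm.

```python
from typing import List, Tuple

Suffix = Tuple[int, int]  # (string index, start position)


def encode(strings: List[str]) -> List[Tuple[int, ...]]:
    """Append a distinct end-marker to every string (assumed convention).

    Marker $_i becomes the integer i and character c becomes ord(c) + m,
    so $_0 < $_1 < ... < $_{m-1} < every ordinary character.
    """
    m = len(strings)
    return [tuple(ord(c) + m for c in s) + (i,) for i, s in enumerate(strings)]


def lcp_length(a: Tuple[int, ...], b: Tuple[int, ...]) -> int:
    """Length of the longest common prefix of two encoded suffixes."""
    k = 0
    while k < len(a) and k < len(b) and a[k] == b[k]:
        k += 1
    return k


def naive_multistring_bwt_lcp(strings: List[str]) -> Tuple[str, List[int]]:
    texts = encode(strings)
    m = len(strings)
    # Enumerate every suffix of every string, end-marker included.
    suffixes: List[Suffix] = [(i, j) for i, t in enumerate(texts) for j in range(len(t))]
    if not suffixes:
        return "", []
    # Sort lexicographically; distinct end-markers make all keys distinct.
    suffixes.sort(key=lambda p: texts[p[0]][p[1]:])
    bwt: List[str] = []
    for i, j in suffixes:
        prev = texts[i][j - 1]  # j == 0 wraps to the string's own end-marker
        bwt.append("$" if prev < m else chr(prev - m))
    # LCP[0] = 0; LCP[k] compares the k-th suffix with its predecessor.
    lcp = [0] + [
        lcp_length(texts[i1][j1:], texts[i2][j2:])
        for (i1, j1), (i2, j2) in zip(suffixes, suffixes[1:])
    ]
    return "".join(bwt), lcp


bwt, lcp = naive_multistring_bwt_lcp(["AB", "B"])
print(bwt)  # BB$A$
print(lcp)  # [0, 0, 0, 0, 1]
```

For the collection {AB, B}, the sorted suffixes are $_0, $_1, AB$_0, B$_0, B$_1; only the last pair shares a prefix (the single symbol B), which gives the final LCP value of 1.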
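The abstract does not say how the work is divided among threads. One simple way to see why this computation splits into independent pieces, offered here as a generic illustration rather than the method of the article, is to bucket suffixes by their first symbol. Buckets can be sorted independently, and each one yields its own contiguous slice of the BWT and LCP array; the LCP entry at every bucket boundary is 0 because adjacent buckets start with different symbols. The sketch assumes each string gets a distinct end-marker (printed as `$`) that sorts before every character. It uses Python's `concurrent.futures.ThreadPoolExecutor` only to show the structure: in CPython, the global interpreter lock keeps these CPU-bound threads from running truly in parallel. All names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Sequence, Tuple

Suffix = Tuple[int, int]  # (string index, start position)


def encode(strings: List[str]) -> List[Tuple[int, ...]]:
    # End-marker $_i -> i; character c -> ord(c) + m, so markers sort first.
    m = len(strings)
    return [tuple(ord(c) + m for c in s) + (i,) for i, s in enumerate(strings)]


def lcp_length(a: Tuple[int, ...], b: Tuple[int, ...]) -> int:
    k = 0
    while k < len(a) and k < len(b) and a[k] == b[k]:
        k += 1
    return k


def process_bucket(
    texts: Sequence[Tuple[int, ...]], m: int, bucket: List[Suffix]
) -> Tuple[List[str], List[int]]:
    """Sort one first-symbol bucket and emit its BWT and LCP slices."""
    bucket = sorted(bucket, key=lambda p: texts[p[0]][p[1]:])
    bwt: List[str] = []
    lcp: List[int] = []
    prev = None
    for i, j in bucket:
        c = texts[i][j - 1]  # j == 0 wraps to the string's own end-marker
        bwt.append("$" if c < m else chr(c - m))
        suffix = texts[i][j:]
        # The first suffix of a bucket starts with a different symbol than
        # the last suffix of the previous bucket, so its LCP is 0.
        lcp.append(0 if prev is None else lcp_length(prev, suffix))
        prev = suffix
    return bwt, lcp


def parallel_multistring_bwt_lcp(strings: List[str], workers: int = 4) -> Tuple[str, List[int]]:
    texts = encode(strings)
    m = len(strings)
    # Divide: partition all suffixes by their first symbol.
    buckets: Dict[int, List[Suffix]] = {}
    for i, t in enumerate(texts):
        for j in range(len(t)):
            buckets.setdefault(t[j], []).append((i, j))
    order = sorted(buckets)  # bucket order equals first-symbol order
    # Conquer: buckets are independent, so they can be processed concurrently.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(lambda s: process_bucket(texts, m, buckets[s]), order))
    # Combine: concatenate the slices in bucket order (map preserves order).
    bwt: List[str] = []
    lcp: List[int] = []
    for part_bwt, part_lcp in parts:
        bwt.extend(part_bwt)
        lcp.extend(part_lcp)
    return "".join(bwt), lcp


print(parallel_multistring_bwt_lcp(["AB", "B"]))  # ('BB$A$', [0, 0, 0, 0, 1])
```

Bucketing on a single symbol balances poorly on DNA, where the alphabet has only four letters; partitioning on longer prefixes produces more, smaller buckets at the cost of extra bookkeeping.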
Author affiliation (all five authors): Dipartimento di Informatica Sistemistica e Comunicazione, Università degli Studi di Milano-Bicocca, Milan, Italy
DOI 10.1089/cmb.2018.0230
Discipline Biology
Mathematics
ISICitedReferencesCount 11
IsPeerReviewed true
IsScholarly true
Keywords parallel algorithms
Burrows–Wheeler transform
multithreading
longest common prefix array
PMID 31140836
PublicationTitleAlternate J Comput Biol
URI https://www.ncbi.nlm.nih.gov/pubmed/31140836
https://www.proquest.com/docview/2231852483