Compressed indexing and local alignment of DNA


Bibliographic details
Published in: Bioinformatics, Vol. 24, No. 6, pp. 791-797
Main authors: Lam, T. W., Sung, W. K., Tam, S. L., Wong, C. K., Yiu, S. M.
Format: Journal Article
Language: English
Published: Oxford: Oxford University Press, 15 March 2008
Oxford Publishing Limited (England)
ISSN: 1367-4803, 1367-4811, 1460-2059
Abstract Motivation: Recent experimental studies on compressed indexes (BWT, CSA, FM-index) have confirmed their practicality for indexing very long strings, such as the human genome, in main memory. For example, a BWT index for the human genome (about 3 billion characters) occupies only around 1 GB. However, these indexes are designed for exact pattern matching, which is too stringent for biological applications. The demand is often for finding local alignments (pairs of similar substrings with gaps allowed). Without indexing, one can use dynamic programming to find all local alignments between a text T and a pattern P in O(|T||P|) time, but this is too slow when the text is of genome scale (e.g. aligning a gene with the human genome would take tens to hundreds of hours). In practice, biologists use heuristic-based software such as BLAST, which is very efficient but does not guarantee finding all local alignments. Results: In this article, we show how to build software called BWT-SW that exploits a BWT index of a text T to speed up the dynamic programming for finding all local alignments. Experiments reveal that BWT-SW is very efficient (e.g. aligning a pattern of length 3000 with the human genome takes less than a minute). We have also analyzed BWT-SW mathematically for a simpler similarity model (with gaps disallowed), and we show that the expected running time is O(|T|^0.628 |P|) for random strings. As far as we know, BWT-SW is the first practical tool that can find all local alignments. Yet BWT-SW is not meant to replace BLAST, as BLAST is still several times faster than BWT-SW for long patterns and is accurate enough in most cases (we have used BWT-SW to check the accuracy of BLAST and found that BLAST only rarely misses significant alignments). Availability: www.cs.hku.hk/~ckwong3/bwtsw Contact: twlam@cs.hku.hk
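The abstract describes BWT-SW only at a high level. The following is a toy sketch of the general idea under stated assumptions, not the authors' implementation: it shows exact matching by backward search on a BWT index, the plain O(|T||P|) Smith-Waterman dynamic programming, and a hypothetical variant that walks the distinct substrings of T through a BWT of the reversed text, so that repeated substrings share one DP row, pruning a branch once no cell of its row is positive. All function names, the +1/-1/-1 linear-gap scoring, the naive suffix sorting, and the uncompressed occurrence tables are assumptions made for illustration; the sketch returns only the best score and makes no claim about the O(|T|^0.628 |P|) bound.

```python
# Toy sketch (assumptions, not BWT-SW itself): naive suffix sorting,
# uncompressed occurrence tables, linear gap penalty, best score only.


def bwt_index(text):
    """Return (C, occ, n) for text + '$' using a naively built suffix array."""
    s = text + "$"
    sa = sorted(range(len(s)), key=lambda i: s[i:])
    bwt = "".join(s[i - 1] for i in sa)  # s[-1] == '$' when i == 0
    C, total = {}, 0
    for c in sorted(set(s)):
        C[c] = total  # number of characters in s smaller than c
        total += s.count(c)
    occ = {c: [0] * (len(bwt) + 1) for c in C}  # occ[c][k] = count of c in bwt[:k]
    for k, ch in enumerate(bwt):
        for c in C:
            occ[c][k + 1] = occ[c][k] + (ch == c)
    return C, occ, len(s)


def backward_step(C, occ, lo, hi, c):
    """Map the suffix-array interval [lo, hi) of W to that of cW."""
    return C[c] + occ[c][lo], C[c] + occ[c][hi]


def count_occurrences(text, pattern):
    """Exact matching by backward search: the task compressed indexes target."""
    C, occ, n = bwt_index(text)
    lo, hi = 0, n
    for c in reversed(pattern):
        if c not in C:
            return 0
        lo, hi = backward_step(C, occ, lo, hi, c)
        if lo >= hi:
            return 0
    return hi - lo


def smith_waterman(text, pattern, match=1, mismatch=-1, gap=-1):
    """Best local alignment score by the O(|T||P|) dynamic programming."""
    best, prev = 0, [0] * (len(pattern) + 1)
    for t in text:
        row = [0] * (len(pattern) + 1)
        for j in range(1, len(pattern) + 1):
            s = match if pattern[j - 1] == t else mismatch
            row[j] = max(0, prev[j - 1] + s, prev[j] + gap, row[j - 1] + gap)
            best = max(best, row[j])
        prev = row
    return best


def indexed_local_alignment(text, pattern, match=1, mismatch=-1, gap=-1):
    """Same score as smith_waterman, via a DFS over distinct substrings of text.

    Each DFS node is a distinct substring X of text, so one DP row serves all
    occurrences of X. Prepending c in the reversed text appends c to X, hence
    the index is built on text[::-1]. A branch is pruned once every cell of
    its row is <= 0: the remainder of any alignment crossing such a row scores
    at least as much as the whole, and it is reached from a later start.
    """
    C, occ, n = bwt_index(text[::-1])
    chars = sorted(set(text))
    m, best = len(pattern), 0
    stack = [(0, n, [0] * (m + 1))]  # root: empty X, alignment may start at any j
    while stack:
        lo, hi, prev = stack.pop()
        for c in chars:
            nlo, nhi = backward_step(C, occ, lo, hi, c)
            if nlo >= nhi:
                continue  # X + c does not occur in text
            row = [prev[0] + gap] + [0] * m
            for j in range(1, m + 1):
                s = match if pattern[j - 1] == c else mismatch
                row[j] = max(prev[j - 1] + s, prev[j] + gap, row[j - 1] + gap)
            top = max(row)
            best = max(best, top)
            if top > 0:
                stack.append((nlo, nhi, row))
    return best


print(count_occurrences("ACGTACGT", "ACG"))          # 2
print(smith_waterman("ACGTTGCA", "CGTTG"))           # 5
print(indexed_local_alignment("ACGTTGCA", "CGTTG"))  # 5
```

The design choice being illustrated is that the DP is indexed by distinct substrings of T rather than by text positions, so work on a repeated substring is done once; how much this saves in practice depends on how early the pruning rule fires, which this sketch does not measure.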
Author_xml – sequence: 1
  givenname: T. W.
  surname: Lam
  fullname: Lam, T. W.
  organization: Department of Computer Science, University of Hong Kong, Hong Kong, China and Department of Computer Science, National University of Singapore, Singapore
– sequence: 2
  givenname: W. K.
  surname: Sung
  fullname: Sung, W. K.
  organization: Department of Computer Science, University of Hong Kong, Hong Kong, China and Department of Computer Science, National University of Singapore, Singapore
– sequence: 3
  givenname: S. L.
  surname: Tam
  fullname: Tam, S. L.
  organization: Department of Computer Science, University of Hong Kong, Hong Kong, China and Department of Computer Science, National University of Singapore, Singapore
– sequence: 4
  givenname: C. K.
  surname: Wong
  fullname: Wong, C. K.
  organization: Department of Computer Science, University of Hong Kong, Hong Kong, China and Department of Computer Science, National University of Singapore, Singapore
– sequence: 5
  givenname: S. M.
  surname: Yiu
  fullname: Yiu, S. M.
  organization: Department of Computer Science, University of Hong Kong, Hong Kong, China and Department of Computer Science, National University of Singapore, Singapore
BackLink http://pascal-francis.inist.fr/vibad/index.php?action=getRecordDetail&idt=20196698 (View record in Pascal-Francis)
https://www.ncbi.nlm.nih.gov/pubmed/18227115 (View this record in MEDLINE/PubMed)
CODEN BOINFP
CitedBy_id crossref_primary_10_1109_TCBB_2016_2586070
crossref_primary_10_1007_s00453_013_9794_z
crossref_primary_10_1007_s11280_022_01128_w
crossref_primary_10_1093_bib_bbq015
crossref_primary_10_1186_s12864_017_3734_2
crossref_primary_10_1016_j_ins_2015_08_008
crossref_primary_10_1186_s12864_020_6569_1
crossref_primary_10_1080_13102818_2014_959711
crossref_primary_10_1371_journal_pone_0126409
crossref_primary_10_1007_s13721_014_0067_9
crossref_primary_10_1007_s40484_020_0214_5
crossref_primary_10_1093_bioinformatics_btp336
crossref_primary_10_1186_gb_2009_10_3_r25
crossref_primary_10_1109_JPROC_2015_2455551
crossref_primary_10_1186_1756_0500_5_27
crossref_primary_10_1093_bib_bbt088
crossref_primary_10_1145_1882471_1882478
crossref_primary_10_1186_s12864_019_6241_9
crossref_primary_10_1016_j_jda_2011_01_002
crossref_primary_10_1089_cmb_2009_0169
crossref_primary_10_1109_TIT_2020_2996543
crossref_primary_10_1155_2014_309650
crossref_primary_10_1186_1756_0381_5_6
crossref_primary_10_1016_j_jda_2015_01_004
crossref_primary_10_1109_TCBB_2018_2884701
crossref_primary_10_1016_j_gene_2012_06_014
crossref_primary_10_1109_TSP_2011_2157915
crossref_primary_10_1145_2629691
crossref_primary_10_1145_2635816
crossref_primary_10_1145_2847525
crossref_primary_10_1186_s13059_021_02443_7
crossref_primary_10_1109_TPDS_2021_3051011
crossref_primary_10_1186_1471_2105_12_S9_S15
crossref_primary_10_3389_fgene_2021_615958
crossref_primary_10_1093_bioinformatics_btp324
crossref_primary_10_1016_j_aquaculture_2024_741259
crossref_primary_10_1016_j_ic_2012_02_002
crossref_primary_10_1145_3296957_3173193
crossref_primary_10_1002_spe_2227
crossref_primary_10_1016_j_entcs_2014_01_021
Cites_doi 10.1101/gr.1350803
10.1016/S0022-2836(05)80360-2
10.1109/69.979973
10.1093/nar/25.17.3389
10.1145/321941.321946
10.1016/0022-2836(81)90087-5
10.1093/bioinformatics/18.6.873
10.1007/s007780200064
10.1145/335305.335351
10.1017/CBO9780511574931
10.1145/299432.299460
10.1142/S0219720004000661
10.1016/S0196-6774(03)00087-7
10.1007/s00453-006-1228-8
10.1002/(SICI)1097-024X(199911)29:13<1149::AID-SPE274>3.0.CO;2-O
10.1089/cmb.2005.12.407
ContentType Journal Article
Copyright The Author 2008. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org 2008
2008 INIST-CNRS
DOI 10.1093/bioinformatics/btn032
DatabaseName Istex
CrossRef
Pascal-Francis
Medline
MEDLINE
MEDLINE (Ovid)
MEDLINE
MEDLINE
PubMed
Aluminium Industry Abstracts
Biotechnology Research Abstracts
Ceramic Abstracts
Computer and Information Systems Abstracts
Corrosion Abstracts
Electronics & Communications Abstracts
Engineered Materials Abstracts
Materials Business File
Mechanical & Transportation Engineering Abstracts
Nucleic Acids Abstracts
Oncogenes and Growth Factors Abstracts
Solid State and Superconductivity Abstracts
METADEX
Technology Research Database
ANTE: Abstracts in New Technology & Engineering
Engineering Research Database
Aerospace Database
Copper Technical Reference Library
AIDS and Cancer Research Abstracts
Materials Research Database
ProQuest Computer Science Collection
ProQuest Health & Medical Complete (Alumni)
Civil Engineering Abstracts
Advanced Technologies Database with Aerospace
Computer and Information Systems Abstracts – Academic
Computer and Information Systems Abstracts Professional
Biotechnology and BioEngineering Abstracts
MEDLINE - Academic
DatabaseTitle CrossRef
MEDLINE
Medline Complete
MEDLINE with Full Text
PubMed
MEDLINE (Ovid)
Materials Research Database
Oncogenes and Growth Factors Abstracts
Technology Research Database
Computer and Information Systems Abstracts – Academic
Mechanical & Transportation Engineering Abstracts
Nucleic Acids Abstracts
ProQuest Computer Science Collection
Computer and Information Systems Abstracts
ProQuest Health & Medical Complete (Alumni)
Materials Business File
Aerospace Database
Copper Technical Reference Library
Engineered Materials Abstracts
Biotechnology Research Abstracts
AIDS and Cancer Research Abstracts
Advanced Technologies Database with Aerospace
ANTE: Abstracts in New Technology & Engineering
Civil Engineering Abstracts
Aluminium Industry Abstracts
Electronics & Communications Abstracts
Ceramic Abstracts
METADEX
Biotechnology and BioEngineering Abstracts
Computer and Information Systems Abstracts Professional
Solid State and Superconductivity Abstracts
Engineering Research Database
Corrosion Abstracts
MEDLINE - Academic
DatabaseTitleList
MEDLINE
Materials Research Database
CrossRef

Engineering Research Database
MEDLINE - Academic
Database_xml – sequence: 1
  dbid: NPM
  name: PubMed
  url: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed
  sourceTypes: Index Database
– sequence: 2
  dbid: 7X8
  name: MEDLINE - Academic
  url: https://search.proquest.com/medline
  sourceTypes: Aggregation Database
DeliveryMethod fulltext_linktorsrc
Discipline Biology
EISSN 1460-2059
1367-4811
EndPage 797
ExternalDocumentID 1450217121
18227115
20196698
10_1093_bioinformatics_btn032
10.1093/bioinformatics/btn032
ark_67375_HXZ_Z93LG8D4_X
Genre Evaluation Studies
Research Support, Non-U.S. Gov't
Journal Article
ISICitedReferencesCount 76
ISICitedReferencesURI http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000254010400008&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
ISSN 1367-4803
1367-4811
IsDoiOpenAccess false
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 6
Keywords Bioinformatics
DNA
Indexing
Language English
License CC BY 4.0
LinkModel DirectLink
Notes istex:F39666AAE5C2748BA55BB896AB27E4628DC32053
ark:/67375/HXZ-Z93LG8D4-X
To whom correspondence should be addressed.
ArticleID:btn032
Associate Editor: Thomas Lengauer
OpenAccessLink https://academic.oup.com/bioinformatics/article-pdf/24/6/791/16886605/btn032.pdf
PMID 18227115
PQID 198640131
PQPubID 36124
PageCount 7
PublicationCentury 2000
PublicationDate 2008-Mar-15
PublicationDateYYYYMMDD 2008-03-15
PublicationDecade 2000
PublicationPlace Oxford
PublicationPlace_xml – name: Oxford
– name: England
PublicationTitle Bioinformatics
PublicationTitleAlternate Bioinformatics
PublicationYear 2008
Publisher Oxford University Press
Oxford Publishing Limited (England)
References Giladi (2023020209513941500_B8) 2002; 18
Ferragina (2023020209513941500_B7) 2001
Li (2023020209513941500_B17) 2004; 2
McCreight (2023020209513941500_B19) 1976; 23
Hon (2023020209513941500_B13) 2004
Hunt (2023020209513941500_B14) 2002; 11
Lippert (2023020209513941500_B18) 2005; 12
Healy (2023020209513941500_B11) 2003; 13
Ozturk (2023020209513941500_B21) 2003
Williams (2023020209513941500_B24) 2002; 14
Meek (2023020209513941500_B20) 2003
Ferragina (2023020209513941500_B6) 2000
Altschul (2023020209513941500_B1) 1990; 215
Burrows (2023020209513941500_B4) 1994
Smith (2023020209513941500_B23) 1981; 147
Cao (2023020209513941500_B5) 2005
Kurtz (2023020209513941500_B16) 1999; 29
Altschul (2023020209513941500_B2) 1997; 25
Sadakane (2023020209513941500_B22) 2003; 48
Grossi (2023020209513941500_B9) 2000
Gusfield (2023020209513941500_B10) 1997
Hon (2023020209513941500_B12) 2007; 48
Karlin (2023020209513941500_B15) 1990
Burkhardt (2023020209513941500_B3) 1999
References_xml – volume: 13
  start-page: 2306
  year: 2003
  ident: 2023020209513941500_B11
  article-title: Annotating large genomes with exact word matches
  publication-title: Genome Research
  doi: 10.1101/gr.1350803
– start-page: 31
  year: 2004
  ident: 2023020209513941500_B13
  article-title: Practical aspects of compressed suffix arrays and FM-Index in searching DNA sequences
  publication-title: ALENEX/ANALCO
– start-page: 2264
  year: 1990
  ident: 2023020209513941500_B15
  article-title: Methods for assessing the statistical significance of molecular sequence features by using general scoring schemes
– volume: 215
  start-page: 403
  year: 1990
  ident: 2023020209513941500_B1
  article-title: Basic local alignment search tool
  publication-title: J. Mol. Biol
  doi: 10.1016/S0022-2836(05)80360-2
– start-page: 4
  year: 2005
  ident: 2023020209513941500_B5
  article-title: Indexing DNA sequences using q-grams
  publication-title: DASFAA
– volume: 14
  start-page: 63
  year: 2002
  ident: 2023020209513941500_B24
  article-title: Indexing and retrieval for genomic databases
  publication-title: IEEE Trans. Knowledge Data Eng
  doi: 10.1109/69.979973
– volume: 25
  start-page: 3389
  year: 1997
  ident: 2023020209513941500_B2
  article-title: Gapped BLAST and PSI-BLAST: A new generation of protein database search programs
  publication-title: Nucl. Acids Res
  doi: 10.1093/nar/25.17.3389
– volume: 23
  start-page: 262
  year: 1976
  ident: 2023020209513941500_B19
  article-title: A space-economical suffix tree construction algorithm
  publication-title: J. ACM
  doi: 10.1145/321941.321946
– volume: 147
  start-page: 195
  year: 1981
  ident: 2023020209513941500_B23
  article-title: Identification of common molecular subsequences
  publication-title: J. Mol. Biol
  doi: 10.1016/0022-2836(81)90087-5
– start-page: 269
  year: 2001
  ident: 2023020209513941500_B7
  article-title: An experimental study of an opportunistic index
  publication-title: SODA
– volume: 18
  start-page: 873
  year: 2002
  ident: 2023020209513941500_B8
  article-title: SST: An algorithm for finding near-exact sequence matches in time proportional to the logarithm of the database size
  publication-title: Bioinformatics
  doi: 10.1093/bioinformatics/18.6.873
– start-page: 359
  year: 2003
  ident: 2023020209513941500_B21
  article-title: Effective indexing and filtering for similarity search in large biosequence databases
  publication-title: BIBE
– volume: 11
  start-page: 256
  year: 2002
  ident: 2023020209513941500_B14
  article-title: Database indexing for large DNA and protein sequence collections
  publication-title: The VLDB J
  doi: 10.1007/s007780200064
– start-page: 397
  year: 2000
  ident: 2023020209513941500_B9
  article-title: Compressed suffix arrays and suffix trees with applications to text indexing and string matching
  publication-title: STOC
  doi: 10.1145/335305.335351
– year: 1997
  ident: 2023020209513941500_B10
  article-title: Algorithms on Strings, Trees, and Sequences
  doi: 10.1017/CBO9780511574931
– start-page: 77
  year: 1999
  ident: 2023020209513941500_B3
  article-title: q-Gram based database searching using a suffix array (quasar)
  publication-title: RECOMB
  doi: 10.1145/299432.299460
– volume: 2
  start-page: 417
  year: 2004
  ident: 2023020209513941500_B17
  article-title: PatternHunter II: Highly sensitive and fast homology search
  publication-title: J. Bioinformatics Comput. Biol
  doi: 10.1142/S0219720004000661
– volume: 48
  start-page: 294
  year: 2003
  ident: 2023020209513941500_B22
  article-title: New text indexing functionalities of the compressed suffix arrays
  publication-title: J. Algorithms
  doi: 10.1016/S0196-6774(03)00087-7
– volume: 48
  start-page: 23
  year: 2007
  ident: 2023020209513941500_B12
  article-title: Constructing compressed suffix arrays with large alphabets
  publication-title: Algorithmica
  doi: 10.1007/s00453-006-1228-8
– start-page: 390
  year: 2000
  ident: 2023020209513941500_B6
  article-title: Opportunistic data structures with applications
  publication-title: FOCS
– volume: 29
  start-page: 1149
  year: 1999
  ident: 2023020209513941500_B16
  article-title: Reducing the space requirement of suffix trees
  publication-title: Software - Practice and Exp
  doi: 10.1002/(SICI)1097-024X(199911)29:13<1149::AID-SPE274>3.0.CO;2-O
– year: 1994
  ident: 2023020209513941500_B4
  article-title: A block-sorting lossless data compression algorithm
  publication-title: Technical Report 124, Digital Equipment Corporation
– volume: 12
  start-page: 407
  year: 2005
  ident: 2023020209513941500_B18
  article-title: Space-efficient whole genome comparisons with Burrows-Wheeler transforms
  publication-title: J. Comput. Biol
  doi: 10.1089/cmb.2005.12.407
– start-page: 910
  year: 2003
  ident: 2023020209513941500_B20
  article-title: OASIS: An online and accurate technique for local-alignment searches on biological sequences
  publication-title: VLDB
StartPage 791
SubjectTerms Algorithms
Base Sequence
Biological and medical sciences
Chromosome Mapping - methods
Data Compression - methods
DNA - genetics
Fundamental and applied biological sciences. Psychology
General aspects
Genome, Human - genetics
Humans
Mathematics in biology. Statistical analysis. Models. Metrology. Data processing in biology (general aspects)
Molecular Sequence Data
Sequence Alignment - methods
Sequence Analysis, DNA - methods
Title Compressed indexing and local alignment of DNA
URI https://api.istex.fr/ark:/67375/HXZ-Z93LG8D4-X/fulltext.pdf
https://www.ncbi.nlm.nih.gov/pubmed/18227115
https://www.proquest.com/docview/198640131
https://www.proquest.com/docview/20765896
https://www.proquest.com/docview/70391974
Volume 24
WOSCitedRecordID wos000254010400008
DOI 10.1093/bioinformatics/btn032