Compressed indexing and local alignment of DNA
| Published in: | Bioinformatics Vol. 24; No. 6; pp. 791-797 |
|---|---|
| Main authors: | Lam, T. W.; Sung, W. K.; Tam, S. L.; Wong, C. K.; Yiu, S. M. |
| Format: | Journal Article |
| Language: | English |
| Published: | Oxford: Oxford University Press, 15 March 2008; Oxford Publishing Limited (England) |
| ISSN: | 1367-4803, 1367-4811, 1460-2059 |
| Online access: | Full text |
| Abstract | Motivation: Recent experimental studies on compressed indexes (BWT, CSA, FM-index) have confirmed that they are practical for indexing very long strings, such as the human genome, in main memory. For example, a BWT index of the human genome (about 3 billion characters) occupies only about 1 GB. However, these indexes are designed for exact pattern matching, which is too stringent for biological applications; the usual demand is to find local alignments (pairs of similar substrings, with gaps allowed). Without an index, dynamic programming finds all local alignments between a text T and a pattern P in $O(\lvert T\rvert\,\lvert P\rvert)$ time, but this is too slow when the text is of genome scale (e.g. aligning a gene with the human genome would take tens to hundreds of hours). In practice, biologists use heuristic software such as BLAST, which is very efficient but is not guaranteed to find all local alignments. Results: In this article, we show how to build a software tool, BWT-SW, that exploits a BWT index of a text T to speed up the dynamic programming for finding all local alignments. Experiments show that BWT-SW is very efficient (e.g. aligning a pattern of length 3000 with the human genome takes less than a minute). We have also analyzed BWT-SW mathematically for a simpler similarity model (with gaps disallowed) and show that its expected running time is $O(\lvert T\rvert^{0.628}\lvert P\rvert)$ for random strings. As far as we know, BWT-SW is the first practical tool that can find all local alignments. Yet BWT-SW is not meant to replace BLAST: BLAST is still several times faster than BWT-SW for long patterns, and it is accurate enough in most cases (we used BWT-SW to check the accuracy of BLAST and found that BLAST only rarely misses significant alignments). Availability: www.cs.hku.hk/~ckwong3/bwtsw Contact: twlam@cs.hku.hk |
|---|---|
| Author | Tam, S. L.; Sung, W. K.; Yiu, S. M.; Wong, C. K.; Lam, T. W. |
| Author_xml | – sequence: 1 givenname: T. W. surname: Lam fullname: Lam, T. W. organization: Department of Computer Science, University of Hong Kong, Hong Kong, China and Department of Computer Science, National University of Singapore, Singapore – sequence: 2 givenname: W. K. surname: Sung fullname: Sung, W. K. organization: Department of Computer Science, University of Hong Kong, Hong Kong, China and Department of Computer Science, National University of Singapore, Singapore – sequence: 3 givenname: S. L. surname: Tam fullname: Tam, S. L. organization: Department of Computer Science, University of Hong Kong, Hong Kong, China and Department of Computer Science, National University of Singapore, Singapore – sequence: 4 givenname: C. K. surname: Wong fullname: Wong, C. K. organization: Department of Computer Science, University of Hong Kong, Hong Kong, China and Department of Computer Science, National University of Singapore, Singapore – sequence: 5 givenname: S. M. surname: Yiu fullname: Yiu, S. M. organization: Department of Computer Science, University of Hong Kong, Hong Kong, China and Department of Computer Science, National University of Singapore, Singapore |
| BackLink | http://pascal-francis.inist.fr/vibad/index.php?action=getRecordDetail&idt=20196698 (View record in Pascal Francis); https://www.ncbi.nlm.nih.gov/pubmed/18227115 (View this record in MEDLINE/PubMed) |
| CODEN | BOINFP |
| CitedBy_id | crossref_primary_10_1109_TCBB_2016_2586070 crossref_primary_10_1007_s00453_013_9794_z crossref_primary_10_1007_s11280_022_01128_w crossref_primary_10_1093_bib_bbq015 crossref_primary_10_1186_s12864_017_3734_2 crossref_primary_10_1016_j_ins_2015_08_008 crossref_primary_10_1186_s12864_020_6569_1 crossref_primary_10_1080_13102818_2014_959711 crossref_primary_10_1371_journal_pone_0126409 crossref_primary_10_1007_s13721_014_0067_9 crossref_primary_10_1007_s40484_020_0214_5 crossref_primary_10_1093_bioinformatics_btp336 crossref_primary_10_1186_gb_2009_10_3_r25 crossref_primary_10_1109_JPROC_2015_2455551 crossref_primary_10_1186_1756_0500_5_27 crossref_primary_10_1093_bib_bbt088 crossref_primary_10_1145_1882471_1882478 crossref_primary_10_1186_s12864_019_6241_9 crossref_primary_10_1016_j_jda_2011_01_002 crossref_primary_10_1089_cmb_2009_0169 crossref_primary_10_1109_TIT_2020_2996543 crossref_primary_10_1155_2014_309650 crossref_primary_10_1186_1756_0381_5_6 crossref_primary_10_1016_j_jda_2015_01_004 crossref_primary_10_1109_TCBB_2018_2884701 crossref_primary_10_1016_j_gene_2012_06_014 crossref_primary_10_1109_TSP_2011_2157915 crossref_primary_10_1145_2629691 crossref_primary_10_1145_2635816 crossref_primary_10_1145_2847525 crossref_primary_10_1186_s13059_021_02443_7 crossref_primary_10_1109_TPDS_2021_3051011 crossref_primary_10_1186_1471_2105_12_S9_S15 crossref_primary_10_3389_fgene_2021_615958 crossref_primary_10_1093_bioinformatics_btp324 crossref_primary_10_1016_j_aquaculture_2024_741259 crossref_primary_10_1016_j_ic_2012_02_002 crossref_primary_10_1145_3296957_3173193 crossref_primary_10_1002_spe_2227 crossref_primary_10_1016_j_entcs_2014_01_021 |
| Cites_doi | 10.1101/gr.1350803 10.1016/S0022-2836(05)80360-2 10.1109/69.979973 10.1093/nar/25.17.3389 10.1145/321941.321946 10.1016/0022-2836(81)90087-5 10.1093/bioinformatics/18.6.873 10.1007/s007780200064 10.1145/335305.335351 10.1017/CBO9780511574931 10.1145/299432.299460 10.1142/S0219720004000661 10.1016/S0196-6774(03)00087-7 10.1007/s00453-006-1228-8 10.1002/(SICI)1097-024X(199911)29:13<1149::AID-SPE274>3.0.CO;2-O 10.1089/cmb.2005.12.407 |
| ContentType | Journal Article |
| Copyright | The Author 2008. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org; 2008 INIST-CNRS |
| Copyright_xml | – notice: The Author 2008. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org – notice: 2008 INIST-CNRS |
| DBID | BSCLL AAYXX CITATION IQODW CGR CUY CVF ECM EIF NPM 7QF 7QO 7QQ 7SC 7SE 7SP 7SR 7TA 7TB 7TM 7TO 7U5 8BQ 8FD F28 FR3 H8D H8G H94 JG9 JQ2 K9. KR7 L7M L~C L~D P64 7X8 |
| DOI | 10.1093/bioinformatics/btn032 |
| DatabaseName | Istex CrossRef Pascal-Francis Medline MEDLINE MEDLINE (Ovid) MEDLINE MEDLINE PubMed Aluminium Industry Abstracts Biotechnology Research Abstracts Ceramic Abstracts Computer and Information Systems Abstracts Corrosion Abstracts Electronics & Communications Abstracts Engineered Materials Abstracts Materials Business File Mechanical & Transportation Engineering Abstracts Nucleic Acids Abstracts Oncogenes and Growth Factors Abstracts Solid State and Superconductivity Abstracts METADEX Technology Research Database ANTE: Abstracts in New Technology & Engineering Engineering Research Database Aerospace Database Copper Technical Reference Library AIDS and Cancer Research Abstracts Materials Research Database ProQuest Computer Science Collection ProQuest Health & Medical Complete (Alumni) Civil Engineering Abstracts Advanced Technologies Database with Aerospace Computer and Information Systems Abstracts Academic Computer and Information Systems Abstracts Professional Biotechnology and BioEngineering Abstracts MEDLINE - Academic |
| DatabaseTitle | CrossRef MEDLINE Medline Complete MEDLINE with Full Text PubMed MEDLINE (Ovid) Materials Research Database Oncogenes and Growth Factors Abstracts Technology Research Database Computer and Information Systems Abstracts – Academic Mechanical & Transportation Engineering Abstracts Nucleic Acids Abstracts ProQuest Computer Science Collection Computer and Information Systems Abstracts ProQuest Health & Medical Complete (Alumni) Materials Business File Aerospace Database Copper Technical Reference Library Engineered Materials Abstracts Biotechnology Research Abstracts AIDS and Cancer Research Abstracts Advanced Technologies Database with Aerospace ANTE: Abstracts in New Technology & Engineering Civil Engineering Abstracts Aluminium Industry Abstracts Electronics & Communications Abstracts Ceramic Abstracts METADEX Biotechnology and BioEngineering Abstracts Computer and Information Systems Abstracts Professional Solid State and Superconductivity Abstracts Engineering Research Database Corrosion Abstracts MEDLINE - Academic |
| DatabaseTitleList | MEDLINE Materials Research Database CrossRef Engineering Research Database MEDLINE - Academic |
| Database_xml | – sequence: 1 dbid: NPM name: PubMed url: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed sourceTypes: Index Database – sequence: 2 dbid: 7X8 name: MEDLINE - Academic url: https://search.proquest.com/medline sourceTypes: Aggregation Database |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Biology |
| EISSN | 1460-2059 1367-4811 |
| EndPage | 797 |
| ExternalDocumentID | 1450217121 18227115 20196698 10_1093_bioinformatics_btn032 10.1093/bioinformatics/btn032 ark_67375_HXZ_Z93LG8D4_X |
| Genre | Evaluation Studies Research Support, Non-U.S. Gov't Journal Article |
| GroupedDBID | -~X .2P .I3 482 48X 5GY AAMVS ABGNP ABJNI ABPTD ACGFS ACUFI ADZXQ ALMA_UNASSIGNED_HOLDINGS BSCLL CZ4 EE~ F5P F9B H5~ HAR HW0 IOX KSI KSN NGC Q5Y RD5 ROZ RXO TLC TN5 TOX WH7 ~91 ADRIX BCRHZ KOP ROX --- -E4 .DC 0R~ 1TH 23N 2WC 4.4 53G 5WA 70D AAIJN AAIMJ AAJKP AAJQQ AAKPC AAMDB AAOGV AAPQZ AAPXW AAUQX AAVAP AAVLN AAYXX ABEJV ABEUO ABIXL ABNGD ABNKS ABPQP ABQLI ABWST ABXVV ABZBJ ACIWK ACPRK ACUKT ACUXJ ACYTK ADBBV ADEYI ADEZT ADFTL ADGKP ADGZP ADHKW ADHZD ADMLS ADOCK ADPDF ADRDM ADRTK ADVEK ADYVW ADZTZ AECKG AEGPL AEJOX AEKKA AEKSI AELWJ AEMDU AENEX AENZO AEPUE AETBJ AEWNT AFFNX AFFZL AFGWE AFIYH AFOFC AFRAH AGINJ AGKEF AGQPQ AGQXC AGSYK AHMBA AHXPO AIJHB AJEEA AJEUX AKHUL AKWXX ALTZX ALUQC AMNDL APIBT APWMN ARIXL ASPBG AVWKF AXUDD AYOIW AZFZN AZVOD BAWUL BAYMD BHONS BQDIO BQUQU BSWAC BTQHN C1A C45 CAG CDBKE CITATION COF CS3 DAKXR DIK DILTD DU5 D~K EBD EBS EJD EMOBN FEDTE FHSFR FLIZI FLUFQ FOEOM FQBLK GAUVT GJXCC GROUPED_DOAJ GX1 H13 HVGLF HZ~ J21 JXSIZ KAQDR KQ8 M-Z MK~ ML0 N9A NLBLG NMDNZ NOMLY NU- NVLIB O0~ O9- OAWHX ODMLO OJQWA OK1 OVD OVEED P2P PAFKI PB- PEELM PQQKQ Q1. R44 RNS ROL RPM RUSNO RW1 SV3 TEORI TJP TR2 W8F WOQ X7H YAYTL YKOAZ YXANX ZKX ~KM .-4 .GJ ABEFU AI. AQDSO ATTQO ELUNK IQODW NTWIH O~Y RIG RNI RZF RZO VH1 ZGI ABQTQ AFXEN CGR CUY CVF ECM EIF M49 NPM 7QF 7QO 7QQ 7SC 7SE 7SP 7SR 7TA 7TB 7TM 7TO 7U5 8BQ 8FD F28 FR3 H8D H8G H94 JG9 JQ2 K9. KR7 L7M L~C L~D P64 7X8 |
| ID | FETCH-LOGICAL-c521t-c912f7fef0e457424b084c70ec231c5e1909870e3555adeee6245ea83c15a9073 |
| IEDL.DBID | TOX |
| ISICitedReferencesCount | 76 |
| ISICitedReferencesURI | http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000254010400008&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| ISSN | 1367-4803 1367-4811 |
| IngestDate | Thu Sep 04 19:06:36 EDT 2025 Tue Oct 07 08:19:47 EDT 2025 Mon Oct 06 17:17:26 EDT 2025 Wed Feb 19 01:48:58 EST 2025 Mon Jul 21 09:13:32 EDT 2025 Sat Nov 29 05:33:35 EST 2025 Tue Nov 18 21:56:26 EST 2025 Wed Aug 28 03:24:15 EDT 2024 Sat Sep 20 11:02:38 EDT 2025 |
| IsDoiOpenAccess | false |
| IsOpenAccess | true |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 6 |
| Keywords | Bioinformatics DNA Indexing |
| Language | English |
| License | CC BY 4.0 |
| LinkModel | DirectLink |
| MergedId | FETCHMERGED-LOGICAL-c521t-c912f7fef0e457424b084c70ec231c5e1909870e3555adeee6245ea83c15a9073 |
| Notes | istex:F39666AAE5C2748BA55BB896AB27E4628DC32053 ark:/67375/HXZ-Z93LG8D4-X To whom correspondence should be addressed. ArticleID:btn032 Associate Editor: Thomas Lengauer ObjectType-Article-2 SourceType-Scholarly Journals-1 ObjectType-Feature-1 content type line 14 ObjectType-Article-1 ObjectType-Feature-2 content type line 23 ObjectType-Undefined-1 ObjectType-Feature-3 |
| OpenAccessLink | https://academic.oup.com/bioinformatics/article-pdf/24/6/791/16886605/btn032.pdf |
| PMID | 18227115 |
| PQID | 198640131 |
| PQPubID | 36124 |
| PageCount | 7 |
| ParticipantIDs | proquest_miscellaneous_70391974 proquest_miscellaneous_20765896 proquest_journals_198640131 pubmed_primary_18227115 pascalfrancis_primary_20196698 crossref_primary_10_1093_bioinformatics_btn032 crossref_citationtrail_10_1093_bioinformatics_btn032 oup_primary_10_1093_bioinformatics_btn032 istex_primary_ark_67375_HXZ_Z93LG8D4_X |
| PublicationCentury | 2000 |
| PublicationDate | 2008-Mar-15 |
| PublicationDateYYYYMMDD | 2008-03-15 |
| PublicationDate_xml | – month: 03 year: 2008 text: 2008-Mar-15 day: 15 |
| PublicationDecade | 2000 |
| PublicationPlace | Oxford |
| PublicationPlace_xml | – name: Oxford – name: England |
| PublicationTitle | Bioinformatics |
| PublicationTitleAlternate | Bioinformatics |
| PublicationYear | 2008 |
| Publisher | Oxford University Press Oxford Publishing Limited (England) |
| Publisher_xml | – name: Oxford University Press – name: Oxford Publishing Limited (England) |
| References | Giladi (2023020209513941500_B8) 2002; 18 Ferragina (2023020209513941500_B7) 2001 Li (2023020209513941500_B17) 2004; 2 McCreight (2023020209513941500_B19) 1976; 23 Hon (2023020209513941500_B13) 2004 Hunt (2023020209513941500_B14) 2002; 11 Lippert (2023020209513941500_B18) 2005; 12 Healy (2023020209513941500_B11) 2003; 13 Ozturk (2023020209513941500_B21) 2003 Williams (2023020209513941500_B24) 2002; 14 Meek (2023020209513941500_B20) 2003 Ferragina (2023020209513941500_B6) 2000 Altschul (2023020209513941500_B1) 1990; 215 Burrows (2023020209513941500_B4) 1994 Smith (2023020209513941500_B23) 1981; 147 Cao (2023020209513941500_B5) 2005 Kurtz (2023020209513941500_B16) 1999; 29 Altschul (2023020209513941500_B2) 1997; 25 Sadakane (2023020209513941500_B22) 2003; 48 Grossi (2023020209513941500_B9) 2000 Gusfield (2023020209513941500_B10) 1997 Hon (2023020209513941500_B12) 2007; 48 Karlin (2023020209513941500_B15) 1990 Burkhardt (2023020209513941500_B3) 1999 |
| References_xml | – volume: 13 start-page: 2306 year: 2003 ident: 2023020209513941500_B11 article-title: Annotating large genomes with exact word matches publication-title: Genome Research doi: 10.1101/gr.1350803 – start-page: 31 year: 2004 ident: 2023020209513941500_B13 article-title: Practical aspects of compressed suffix arrays and FM-Index in searching DNA sequences publication-title: ALENEX/ANALC – start-page: 2264 year: 1990 ident: 2023020209513941500_B15 article-title: Methods for assessing the statistical significance of molecular sequence features by using general scoring schemes – volume: 215 start-page: 403 year: 1990 ident: 2023020209513941500_B1 article-title: Basic local alignment search tool publication-title: J. Mol. Biol doi: 10.1016/S0022-2836(05)80360-2 – start-page: 4 year: 2005 ident: 2023020209513941500_B5 article-title: Indexing DNA sequences using q-grams publication-title: DASFAA – volume: 14 start-page: 63 year: 2002 ident: 2023020209513941500_B24 article-title: Indexing and retrieval for genomic databases publication-title: IEEE Trans. Knowledge Data Eng doi: 10.1109/69.979973 – volume: 25 start-page: 3389 year: 1997 ident: 2023020209513941500_B2 article-title: Gapped BLAST and PSI-BLAST: A new generation of protein database search programs publication-title: Nucl. Acids Res doi: 10.1093/nar/25.17.3389 – volume: 23 start-page: 262 year: 1976 ident: 2023020209513941500_B19 article-title: A space-economical suffix tree construction algorithm publication-title: J. ACM doi: 10.1145/321941.321946 – volume: 147 start-page: 195 year: 1981 ident: 2023020209513941500_B23 article-title: Identification of common molecular subsequences publication-title: J. Mol. 
Biol doi: 10.1016/0022-2836(81)90087-5 – start-page: 269 year: 2001 ident: 2023020209513941500_B7 article-title: An experimental study of an opportunistic index publication-title: SODA – volume: 18 start-page: 873 year: 2002 ident: 2023020209513941500_B8 article-title: SST: An algorithm for finding near-exact sequence matches in time proportional to the logarithm of the database size publication-title: Bioinformatics doi: 10.1093/bioinformatics/18.6.873 – start-page: 359 year: 2003 ident: 2023020209513941500_B21 article-title: Effective indexing and filtering for similarity search in large biosequence databases publication-title: BIBE – volume: 11 start-page: 256 year: 2002 ident: 2023020209513941500_B14 article-title: Database indexing for large DNA and protein sequence collections publication-title: The VLDB J doi: 10.1007/s007780200064 – start-page: 397 year: 2000 ident: 2023020209513941500_B9 article-title: Compressed suffix arrays and suffix trees with applications to text indexing and string matching publication-title: STOC doi: 10.1145/335305.335351 – year: 1997 ident: 2023020209513941500_B10 article-title: Algorithms on Strings, Trees, and Sequences doi: 10.1017/CBO9780511574931 – start-page: 77 year: 1999 ident: 2023020209513941500_B3 article-title: q-Gram based database searching using a suffix array (quasar) publication-title: RECOMB doi: 10.1145/299432.299460 – volume: 2 start-page: 417 year: 2004 ident: 2023020209513941500_B17 article-title: PatternHunter II: Highly sensitive and fast homology search publication-title: J. Bioinformatics Comput. Biol doi: 10.1142/S0219720004000661 – volume: 48 start-page: 294 year: 2003 ident: 2023020209513941500_B22 article-title: New text indexing functionalities of the compressed suffix arrays publication-title: J. 
Algorithms doi: 10.1016/S0196-6774(03)00087-7 – volume: 48 start-page: 23 year: 2007 ident: 2023020209513941500_B12 article-title: Constructing compressed suffix arrays with large alphabets publication-title: Algorithmica doi: 10.1007/s00453-006-1228-8 – start-page: 390 year: 2000 ident: 2023020209513941500_B6 article-title: Opportunistic data structures with applications publication-title: FOCS – volume: 29 start-page: 1149 year: 1999 ident: 2023020209513941500_B16 article-title: Reducing the space requirement of suffix trees publication-title: Software - Practice and Exp doi: 10.1002/(SICI)1097-024X(199911)29:13<1149::AID-SPE274>3.0.CO;2-O – year: 1994 ident: 2023020209513941500_B4 article-title: A block-sorting lossless data compression algorithm publication-title: Technical Report 124, Digital Equipment Corporation – volume: 12 start-page: 407 year: 2005 ident: 2023020209513941500_B18 article-title: Space-efficient whole genome comparisons with Burrows-Wheeler transforms publication-title: J. Comput. Biol doi: 10.1089/cmb.2005.12.407 – start-page: 910 year: 2003 ident: 2023020209513941500_B20 article-title: OASIS: An online and accurate technique for local-alignment searches on biological sequences publication-title: VLDB |
| StartPage | 791 |
| SubjectTerms | Algorithms; Base Sequence; Biological and medical sciences; Chromosome Mapping - methods; Data Compression - methods; DNA - genetics; Fundamental and applied biological sciences. Psychology; General aspects; Genome, Human - genetics; Humans; Mathematics in biology. Statistical analysis. Models. Metrology. Data processing in biology (general aspects); Molecular Sequence Data; Sequence Alignment - methods; Sequence Analysis, DNA - methods |
| Title | Compressed indexing and local alignment of DNA |
| URI | https://api.istex.fr/ark:/67375/HXZ-Z93LG8D4-X/fulltext.pdf https://www.ncbi.nlm.nih.gov/pubmed/18227115 https://www.proquest.com/docview/198640131 https://www.proquest.com/docview/20765896 https://www.proquest.com/docview/70391974 |
| Volume | 24 |
| Authors | Lam, T. W.; Sung, W. K.; Tam, S. L.; Wong, C. K.; et al. |
| Issue | 6 |
| EndPage | 797 |
| DOI | 10.1093/bioinformatics/btn032 |