Fixed Block Compression Boosting in FM-Indexes: Theory and Practice
The FM index (Ferragina and Manzini in J ACM 52(4):552–581, 2005) is a widely-used compressed data structure that stores a string T in a compressed form and also supports fast pattern matching queries. In this paper, we describe new FM-index variants that combine nice theoretical properties, simple...
| Published in: | Algorithmica, Vol. 81, No. 4, pp. 1370–1391 |
|---|---|
| Main authors: | , , , , |
| Format: | Journal Article |
| Language: | English |
| Publication details: | New York: Springer US; Springer Nature B.V.; 01.04.2019 |
| Subjects: | |
| ISSN: | 0178-4617, 1432-0541 |
| Summary: | The FM index (Ferragina and Manzini in J ACM 52(4):552–581, 2005) is a widely-used compressed data structure that stores a string T in a compressed form and also supports fast pattern matching queries. In this paper, we describe new FM-index variants that combine nice theoretical properties, simple implementation and improved practical performance. Our main theoretical result is a new technique called fixed block compression boosting, which is a simpler and faster alternative to optimal compression boosting and implicit compression boosting used in previous FM-indexes. We also describe several new techniques for implementing fixed-block boosting efficiently, including a new, fast, and space-efficient implementation of wavelet trees. Our extensive experiments show the new indexes to be consistently fast and small relative to the state-of-the-art, and thus they make a good “off-the-shelf” choice for many applications. |
| DOI: | 10.1007/s00453-018-0475-9 |