Fixed Block Compression Boosting in FM-Indexes: Theory and Practice

Bibliographic details
Published in: Algorithmica, Vol. 81, No. 4, pp. 1370–1391
Main authors: Gog, Simon; Kärkkäinen, Juha; Kempa, Dominik; Petri, Matthias; Puglisi, Simon J.
Format: Journal Article
Language: English
Publication details: New York: Springer US, 1 April 2019; Springer Nature B.V
ISSN: 0178-4617, 1432-0541
Description
Summary: The FM index (Ferragina and Manzini in J ACM 52(4):552–581, 2005) is a widely-used compressed data structure that stores a string T in a compressed form and also supports fast pattern matching queries. In this paper, we describe new FM-index variants that combine nice theoretical properties, simple implementation and improved practical performance. Our main theoretical result is a new technique called fixed block compression boosting, which is a simpler and faster alternative to optimal compression boosting and implicit compression boosting used in previous FM-indexes. We also describe several new techniques for implementing fixed-block boosting efficiently, including a new, fast, and space-efficient implementation of wavelet trees. Our extensive experiments show the new indexes to be consistently fast and small relative to the state-of-the-art, and thus they make a good “off-the-shelf” choice for many applications.
DOI: 10.1007/s00453-018-0475-9