Fixed Block Compression Boosting in FM-Indexes: Theory and Practice

Bibliographic details
Published in: Algorithmica, Vol. 81, No. 4, pp. 1370-1391
Main authors: Gog, Simon; Kärkkäinen, Juha; Kempa, Dominik; Petri, Matthias; Puglisi, Simon J.
Format: Journal Article
Language: English
Published: New York: Springer US, 1 April 2019 (Springer Nature B.V.)
ISSN: 0178-4617, 1432-0541
Description
Abstract: The FM index (Ferragina and Manzini in J ACM 52(4):552–581, 2005) is a widely used compressed data structure that stores a string T in a compressed form and also supports fast pattern matching queries. In this paper, we describe new FM-index variants that combine nice theoretical properties, simple implementation and improved practical performance. Our main theoretical result is a new technique called fixed block compression boosting, which is a simpler and faster alternative to optimal compression boosting and implicit compression boosting used in previous FM-indexes. We also describe several new techniques for implementing fixed-block boosting efficiently, including a new, fast, and space-efficient implementation of wavelet trees. Our extensive experiments show the new indexes to be consistently fast and small relative to the state-of-the-art, and thus they make a good "off-the-shelf" choice for many applications.
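As a rough illustration of the pattern matching query the abstract refers to, the Python sketch below builds a toy FM-index over the Burrows-Wheeler transform (BWT) and counts pattern occurrences by backward search. It assumes, without this being taken from the paper, that "fixed block" means the BWT is cut into blocks of a fixed size b with symbol counts stored at each block boundary, so a rank query costs one stored lookup plus a scan inside a single block. The class `BlockedFMIndex`, its `block_size` parameter and its methods are hypothetical names; the plain strings kept per block stand in for the zero-order compressed structures (for example wavelet trees) that a real compressed index would use.

```python
from collections import Counter


def bwt(text: str) -> str:
    """Burrows-Wheeler transform by sorting all rotations (quadratic; demo only)."""
    assert "$" not in text, "'$' is reserved as the terminator"
    s = text + "$"  # '$' sorts before letters and digits in Python string order
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


class BlockedFMIndex:
    """Toy FM-index whose rank support is cut into fixed-size blocks (hypothetical).

    For each block we keep the cumulative symbol counts up to its start plus
    the block's own symbols. A compressed index would replace the plain block
    strings with zero-order compressed rank structures; this sketch does not.
    """

    def __init__(self, text: str, block_size: int = 4):
        assert block_size >= 1
        self.L = bwt(text)  # last column of the sorted rotation matrix
        self.b = block_size
        # C[c] = number of symbols in L (including '$') that are smaller than c
        self.C, total = {}, 0
        for c, k in sorted(Counter(self.L).items()):
            self.C[c] = total
            total += k
        # Fixed-size blocks of L and the symbol counts preceding each block
        self.blocks = [self.L[i:i + self.b] for i in range(0, len(self.L), self.b)]
        self.before, running = [], Counter()
        for blk in self.blocks:
            self.before.append(dict(running))
            running.update(blk)
        self.before.append(dict(running))  # sentinel entry for i == len(L)

    def rank(self, c: str, i: int) -> int:
        """Occurrences of c in L[0:i]: stored count at the block start + a short scan."""
        j, off = divmod(i, self.b)
        local = self.blocks[j][:off].count(c) if off else 0
        return self.before[j].get(c, 0) + local

    def count(self, pattern: str) -> int:
        """Number of (possibly overlapping) occurrences of pattern, by backward search."""
        lo, hi = 0, len(self.L)  # half-open range of sorted rotations matched so far
        for c in reversed(pattern):
            if c not in self.C:
                return 0
            lo = self.C[c] + self.rank(c, lo)
            hi = self.C[c] + self.rank(c, hi)
            if lo >= hi:
                return 0
        return hi - lo


idx = BlockedFMIndex("abracadabra", block_size=4)
print(idx.L)              # ard$rcaaaabb
print(idx.count("abra"))  # 2
```

In this toy design the block size trades space for query time: larger blocks mean fewer stored count tables but longer scans inside a block, while the search result does not depend on the block size at all.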
DOI: 10.1007/s00453-018-0475-9