A grammar-based compression using a variation of Chomsky normal form of context free grammar


Bibliographic details
Published in: ISITA 2016 : proceedings of 2016 International Symposium on Information Theory and Its Applications : Hyatt Regency Monterey Hotel, Monterey, California, USA, October 30 - November 2, 2016, pp. 246-250
Main author: Arimura, Mitsuharu
Format: Conference paper
Language: English
Publication details: IEICE 01.10.2016
Subject:
Description
Summary: This paper proposes a new class of grammar-based lossless source codes. Grammar-based codes are a class of universal data compression algorithms that use a context-free grammar. A Semi-Chomsky Normal Form (semi-CNF) of context-free grammars, a modified form of the Chomsky normal form (CNF), is newly introduced. The proposed algorithm encodes a given sequence into a binary codeword in three steps. In the first step, a semi-CNF set of production rules is constructed by repeatedly substituting a new variable for a pair of symbols or variables. In the second step, the semi-CNF grammar is translated into an irreducible or smaller grammar by eliminating production rules that are used only once in the other production rules. In the third step, the resulting grammar is encoded into a binary codeword. The LZ78, Multilevel Pattern Matching (MPM), and Byte Pair Encoding (BPE) algorithms can be treated as examples of this class of codes. The LZ78 and MPM algorithms do not use the second step of this procedure; therefore, the proposed method can improve the compression performance of these algorithms through the unified procedure. An advantage of this method is that the transformation from a given sequence to the grammar is quite simple, using the three-step algorithm through semi-CNF.
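The first two steps described in the summary can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the record does not define semi-CNF precisely, so the sketch assumes a start sequence of arbitrary length plus rules whose right-hand sides hold exactly two symbols, and it uses a BPE-style "most frequent pair" choice for step 1. The function names (`build_pair_grammar`, `inline_single_use`, `expand`) are hypothetical, and the third step (binary encoding of the grammar) is omitted because the summary gives no details of it.

```python
from collections import Counter


def build_pair_grammar(text):
    """Step 1 (assumed BPE-style): repeatedly replace the most frequent
    adjacent pair with a new variable. Terminals are one-character
    strings; variables are integers. Every rule has exactly two symbols."""
    seq = list(text)
    rules = {}
    while True:
        # Adjacent pairs are counted with overlaps, e.g. "aaa" counts (a, a) twice.
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:
            break
        var = len(rules)  # next unused variable id
        rules[var] = pair
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(var)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq, rules


def inline_single_use(start, rules):
    """Step 2: remove every rule that is referenced exactly once by
    substituting its right-hand side at the single place it is used."""
    start = list(start)
    rules = {v: list(rhs) for v, rhs in rules.items()}
    while True:
        uses = Counter(
            s for rhs in [start, *rules.values()] for s in rhs
            if isinstance(s, int)
        )
        once = [v for v in rules if uses[v] == 1]
        if not once:
            return start, rules
        v = once[0]
        body = rules.pop(v)
        for rhs in [start, *rules.values()]:
            if v in rhs:
                i = rhs.index(v)
                rhs[i:i + 1] = body  # splice the rule body in place
                break


def expand(symbols, rules):
    """Decode a symbol sequence back to text (used to check losslessness)."""
    return "".join(
        expand(rules[s], rules) if isinstance(s, int) else s
        for s in symbols
    )
```

Under these assumptions, the input `abcabc` first yields two pair rules; the rule that is referenced only once is then inlined, leaving the start sequence `[1, 1]` and the single rule `1 -> a b c`. After step 2 a right-hand side may hold more than two symbols, which matches the summary's description of moving from semi-CNF to a smaller grammar.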