A grammar-based compression using a variation of Chomsky normal form of context free grammar

Bibliographic details
Published in:	ISITA 2016 : proceedings of 2016 International Symposium on Information Theory and Its Applications : Hyatt Regency Monterey Hotel, Monterey, California, USA, October 30 - November 2, 2016, pp. 246-250
Main author:	Arimura, Mitsuharu
Format:	Conference paper
Language:	English
Published:	IEICE 01.10.2016
Subjects:	Context; Data compression; Encoding; Grammar; Mars; Pattern matching; Production

Description
Summary:	This paper proposes a new class of grammar-based lossless source codes. Grammar-based codes are a class of universal data compression algorithms that use a context-free grammar. A Semi-Chomsky Normal Form (semi-CNF) of context-free grammar, which is a modified form of the Chomsky Normal Form (CNF), is newly introduced. The proposed algorithm encodes a given sequence into a binary codeword in three steps. In the first step, a semi-CNF set of production rules is constructed by repeatedly substituting a new variable for a pair of symbols or variables. In the second step, the semi-CNF is translated into an irreducible or smaller grammar by eliminating production rules that are used only once in the other production rules. In the third step, the resulting grammar is encoded into a binary codeword. The LZ78, Multilevel Pattern Matching (MPM), and Byte Pair Encoding (BPE) algorithms can be treated as examples of this class of codes. The LZ78 and MPM algorithms do not use the second step of this procedure. Therefore, the proposed method can improve the compression performance of these algorithms through the unified procedure. An advantage of this method is that the transformation from a given sequence to the grammar is quite simple, using the three-step algorithm through semi-CNF.
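
The first two steps described in the summary can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the summary does not define semi-CNF precisely, so the sketch assumes that every step-1 rule maps a variable to exactly two symbols, and it picks the pair to replace by the BPE-style "most frequent adjacent pair" heuristic (BPE being one of the instances the summary names). The function names `build_pair_grammar`, `inline_single_use`, and `expand_to_string` are hypothetical, terminals are assumed to be single-character strings, and variables are represented as integers. The third step (binary encoding of the grammar) is omitted.

```python
from collections import Counter


def replace_pair(seq, pair, var):
    """Replace non-overlapping occurrences of `pair` in `seq`, left to right."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(var)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def build_pair_grammar(text):
    """Step 1 (assumed form): substitute a new variable for a pair of symbols
    until no adjacent pair is counted twice. Every rule has two symbols."""
    seq = list(text)
    rules = {}  # variable (int) -> (left symbol, right symbol)
    while True:
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:
            break
        var = len(rules)  # fresh variable index
        rules[var] = pair
        seq = replace_pair(seq, pair, var)
    return seq, rules


def inline_single_use(start, rules):
    """Step 2 (assumed form): remove rules whose variable is used only once
    by substituting their right-hand side at the single place of use."""
    uses = Counter(
        s for rhs in [start, *rules.values()] for s in rhs if isinstance(s, int)
    )
    once = {v for v in rules if uses[v] == 1}

    def expand(rhs):
        out = []
        for s in rhs:
            if s in once:
                out.extend(expand(rules[s]))  # inline the single-use rule
            else:
                out.append(s)
        return out

    new_rules = {v: expand(rhs) for v, rhs in rules.items() if v not in once}
    return expand(start), new_rules


def expand_to_string(start, rules):
    """Derive the original sequence back from the grammar (for checking)."""
    def ex(s):
        return s if isinstance(s, str) else "".join(ex(t) for t in rules[s])
    return "".join(ex(s) for s in start)
```

For the input `"abcabcabc"`, step 1 of this sketch produces three two-symbol rules, and step 2 reduces them to a single rule `1 -> a b c` with start sequence `[1, 1, 1]`. Inlining moves a right-hand side rather than copying it, so the use counts of the remaining variables do not change and one pass is sufficient here.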