On Finite Memory Universal Data Compression and Classification of Individual Sequences
| Published in: | IEEE Transactions on Information Theory, Vol. 54, No. 4, pp. 1626-1636 |
|---|---|
| Main author: | |
| Format: | Journal Article |
| Language: | English |
| Publication details: | New York, NY: IEEE, 1 April 2008. The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
| Subjects: | |
| ISSN: | 0018-9448, 1557-9654 |
| Summary: | Consider the case where consecutive blocks of letters of a semi-infinite individual sequence over a finite alphabet are compressed into binary sequences by some one-to-one mapping. No a priori information about the sequence is available at the encoder, which must therefore adopt a universal data-compression algorithm. It is known that if the universal Lempel-Ziv (LZ) data-compression algorithm is applied successively to blocks of the sequence, then the best error-free compression for that particular individual sequence is achieved as the block length tends to infinity. The best possible compression that may be achieved by any universal data-compression algorithm for blocks of finite length is discussed, and it is demonstrated that context tree coding essentially achieves it. Next, consider a device called a classifier (or discriminator) that observes an individual training sequence. The classifier's task is to examine individual test sequences of a given length and decide whether each test sequence has the same features as those captured by the training sequence, or is sufficiently different, according to some appropriate criterion. Here again, it is demonstrated that a particular universal context classifier, with a storage-space complexity that is linear in the test-sequence length, is essentially optimal. This may contribute a theoretical "individual sequence" justification for the Probabilistic Suffix Tree (PST) approach in learning theory and in computational biology. |
| DOI: | 10.1109/TIT.2008.917666 |