Text image compression based on pattern matching

Bibliographic details
Main author: Ye, Yan
Format: Dissertation
Language: English
Published: ProQuest Dissertations & Theses 01.01.2002
ISBN: 0493499369, 9780493499369
Online access: Full text
Description
Summary: Text images contain many repeated text characters. Text image compression based on pattern matching achieves high compression by taking advantage of this high character-level redundancy. In a pattern matching based coding system, the encoder first transmits the dictionary, a representative subset of symbols (bitmaps of characters) selected from all symbols on the input image. The encoder then codes the page based on the dictionary. Compared to entropy coding the image as a binary bit-plane, pattern matching based systems can improve compression by 40%. This dissertation addresses several key encoding issues in a pattern matching based system, the first of which is achieving high compression. We discuss the problem of choosing the subset of symbols to go into the dictionary. We propose three new dictionary design techniques which improve compression by 8% in lossless coding and 16–18% in lossy coding over existing dictionary designs. Although more efficient, pattern matching based systems require more physical memory and longer coding time. Page striping splits the input page into horizontal stripes and processes one stripe at a time. This reduces the system complexity but also incurs a bit rate penalty. We propose dynamic dictionary updating schemes that can reduce this penalty. We further study the trade-offs between coding time and efficiency and between memory usage and efficiency. We propose an adaptive dictionary updating scheme that can resolve both trade-offs favorably at the same time. To save encoding time, we propose three speedup techniques for pattern matching, the most time-consuming encoding activity. These techniques (limited dictionary search, early jump-out, and enhanced prescreening) can reduce the total encoding time by as much as 75% with a bit rate penalty of at most 1.7%. In lossy compression, it is important to be able to correctly recognize the text content in the reconstructed text images. Hence we use the number of substitution errors to measure the reconstructed image quality. We propose enhanced prescreening and feature monitored shape unifying to effectively suppress more than half of the substitution errors.
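
The dictionary-and-match idea in the summary can be illustrated with a small sketch. This is not the dissertation's algorithm: the bitmap representation, the pixel-mismatch criterion, the size-only prescreen, the greedy dictionary construction, and the names `mismatch_count`, `find_match`, and `encode_symbols` are all assumptions made for illustration. It shows only the general shape of matching symbols against a dictionary, with a comparison that gives up early once a candidate cannot be an acceptable match.

```python
from typing import List, Optional, Sequence, Tuple

# Assumed representation: a symbol bitmap is a tuple of rows,
# each row a string of '0' and '1' pixels.
Bitmap = Tuple[str, ...]


def mismatch_count(a: Bitmap, b: Bitmap, limit: int) -> Optional[int]:
    """Count differing pixels, or return None once the count exceeds limit."""
    # Simple prescreen (illustrative): bitmaps of different size never match.
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        return None
    count = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            if pa != pb:
                count += 1
                if count > limit:
                    return None  # early jump-out: stop comparing this entry
    return count


def find_match(symbol: Bitmap, dictionary: Sequence[Bitmap], limit: int) -> Optional[int]:
    """Return the index of the closest dictionary entry within limit, if any."""
    best_index, best_count = None, limit + 1
    for i, entry in enumerate(dictionary):
        # Only a strictly better match than the current best is of interest.
        c = mismatch_count(symbol, entry, best_count - 1)
        if c is not None:
            best_index, best_count = i, c
            if c == 0:
                break  # exact match, no need to search further
    return best_index


def encode_symbols(symbols: Sequence[Bitmap], limit: int) -> Tuple[List[Bitmap], List[int]]:
    """Greedily build a dictionary and map each symbol to a dictionary index."""
    dictionary: List[Bitmap] = []
    refs: List[int] = []
    for s in symbols:
        idx = find_match(s, dictionary, limit)
        if idx is None:
            dictionary.append(s)  # unmatched symbol becomes a new entry
            idx = len(dictionary) - 1
        refs.append(idx)
    return dictionary, refs
```

With `limit` greater than zero, a symbol that differs slightly from its matched entry is represented by that entry's index alone, which corresponds to a lossy substitution. A real codec would also have to code symbol positions and, for lossless output, the difference between each symbol and its match.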