Development of word-based text compression algorithm for Indonesian language document

Information technology is growing very rapidly, in particular for data handling. Data is a valuable asset for everyone, especially for larger companies with branches in several places. Data transmission from headquarters to branch offices make the company must provide good tools to do it. These comp...

Celý popis

Uloženo v:
Podrobná bibliografie
Vydáno v:2015 3rd International Conference on Information and Communication Technology (ICoICT) s. 450 - 454
Hlavní autoři: Sinaga, Ardiles, Adiwijaya, Nugroho, Hertog
Médium: Konferenční příspěvek
Jazyk:angličtina
Vydáno: IEEE 01.05.2015
Témata:
On-line přístup:Získat plný text
Tagy: Přidat tag
Žádné tagy, Buďte první, kdo vytvoří štítek k tomuto záznamu!
Popis
Shrnutí:Information technology is growing very rapidly, in particular for data handling. Data is a valuable asset for everyone, especially for larger companies with branches in several places. Data transmission from headquarters to branch offices make the company must provide good tools to do it. These companies also need tools that can be used to compress data to reduce their size. The main idea of the word-based encoding is to extract each word of the source text, then it is checked whether containing capital letters or not. After that, it is checked if there is a symbol or number. The particle will be separated from the basic word using stemming algorithm. Symbols, numbers and affixes will be indexed in the basic dictionary. The basic word will also be checked whether it exists in the basic dictionary or not. If there is not a match, then the word will be stored in the supplement dictionary. The experiment was conducted on the text file with the size from about 10K bytes up to 500K bytes with 16-bits length codewords. The result shows that the compression ratio of the proposed method is comparable with the previous ones, while its processing time is much better than the Reversed Sequence of Characters on LZW method.
DOI:10.1109/ICoICT.2015.7231466