A compression method of double-array structures using linear functions

A trie is one of the data structures for keyword search algorithms and is utilized in natural language processing, reserved words search for compilers and so on. The double-array and LOUDS are efficient representation methods for the trie. The double-array provides fast traversal at time complexity...

Full description

Saved in:
Bibliographic Details
Published in:Knowledge and information systems Vol. 48; no. 1; pp. 55 - 80
Main Authors: Kanda, Shunsuke, Fuketa, Masao, Morita, Kazuhiro, Aoe, Jun-ichi
Format: Journal Article
Language:English
Published: London Springer London 01.07.2016
Springer Nature B.V
Subjects:
ISSN:0219-1377, 0219-3116
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:A trie is one of the data structures for keyword search algorithms and is utilized in natural language processing, reserved words search for compilers and so on. The double-array and LOUDS are efficient representation methods for the trie. The double-array provides fast traversal at time complexity of O (1), but the space usage of the double-array is larger than that of LOUDS. LOUDS is a succinct data structure with bit-string, and its space usage is extremely compact. However, its traversal speed is not so fast. This paper presents a new compression method of the double-array with keeping the retrieval speed. Our new method compresses the double-array by dividing the double-array into blocks and by using linear functions. Experimental results for varied keywords show that our new method reduced space usage of the double-array up to about 44 %, and the retrieval speed of the new method was 9–14 times faster than that of LOUDS. Moreover, the results show that the construction speed of the new method was faster than that of the conventional method for a large keyword set.
Bibliography:SourceType-Scholarly Journals-1
ObjectType-Feature-1
content type line 14
ObjectType-Article-1
ObjectType-Feature-2
content type line 23
ISSN:0219-1377
0219-3116
DOI:10.1007/s10115-015-0873-0