MultiPLZW: A novel multiple pattern matching search in LZW-compressed data

Bibliographic Details
Published in:	Computer communications Vol. 145; pp. 126-136
Main Authors:	Aldwairi, Monther; Hamzah, Abdulmughni Y.; Jarrah, Moath
Format:	Journal Article
Language:	English
Published:	Elsevier B.V, 1 September 2019
Subjects:	Aho–Corasick algorithm; Algorithm complexity; Compressed data; LZW compression; Pattern matching
ISSN:	0140-3664, 1873-703X
Online Access:	Full text

Description
Abstract:	Searching encrypted or compressed data provides security and privacy without sacrificing efficiency. It has many applications in cloud storage, bioinformatics, IoT, unmanned aerial vehicles and drones. This paper introduces a novel, simple, and efficient algorithm to locate all occurrences of a set of patterns in LZW-compressed data in a single pass. The algorithm comprises a preprocessing phase and a subsequent search phase. It uses a modified version of the generalized suffix tree, a lookup table, a mapping table, and a history tree. The proposed algorithm is superior in terms of time complexity, while maintaining a space complexity of the same order as the best existing algorithms. The time complexity is O(n+m+r), which is proportional to the length of the LZW-compressed data, where n is the length of the compressed data, m is the total size of the patterns, and r is the number of pattern occurrences in the compressed data. The space complexity is O(m²+t+r), where t is the size of the dictionary table that is used during compression. Experimental results show a significant improvement in search time, approximately twice as fast, compared to decompressing and then searching using the Aho–Corasick algorithm. Results on various dataset sizes also demonstrate the algorithm's superior scalability, which improves as the size of the dataset increases.
DOI:	10.1016/j.comcom.2019.06.011
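
The abstract measures the proposed algorithm against a baseline that first decompresses the LZW data and then runs Aho–Corasick over the plain text. The following is a minimal Python sketch of that baseline only, not of MultiPLZW itself. It assumes textbook LZW with a 256-entry initial dictionary, unbounded integer codes, and input characters with code points below 256; all function names are hypothetical and do not come from the paper.

```python
from collections import deque


def lzw_compress(text):
    """Textbook LZW: returns a list of integer codes (assumes chars < 256)."""
    table = {chr(i): i for i in range(256)}
    codes, w = [], ""
    for ch in text:
        if w + ch in table:
            w += ch
        else:
            codes.append(table[w])
            table[w + ch] = len(table)  # next free code
            w = ch
    if w:
        codes.append(table[w])
    return codes


def lzw_decompress(codes):
    """Inverse of lzw_compress, rebuilding the dictionary on the fly."""
    if not codes:
        return ""
    table = {i: chr(i) for i in range(256)}
    w = table[codes[0]]
    out = [w]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == len(table):
            entry = w + w[0]  # code refers to the entry being defined right now
        else:
            raise ValueError("invalid LZW code: %d" % code)
        out.append(entry)
        table[len(table)] = w + entry[0]
        w = entry
    return "".join(out)


def build_automaton(patterns):
    """Aho-Corasick trie with failure links and merged output lists."""
    goto, fail, out = [{}], [0], [[]]
    for p in patterns:
        if not p:
            continue  # ignore empty patterns
        state = 0
        for ch in p:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(p)
    queue = deque(goto[0].values())  # depth-1 states keep fail = 0
    while queue:
        s = queue.popleft()
        for ch, nxt in goto[s].items():
            queue.append(nxt)
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def search_compressed(codes, patterns):
    """Baseline: decompress fully, then scan; returns (start_index, pattern) pairs."""
    text = lzw_decompress(codes)
    goto, fail, out = build_automaton(patterns)
    state, hits = 0, []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for p in out[state]:
            hits.append((i - len(p) + 1, p))
    return hits
```

In this sketch the search cost depends on the length of the decompressed text, whereas the abstract states a bound for the proposed algorithm in terms of n, the length of the compressed data.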