Accelerating the LZ-complexity algorithm

Detailed bibliography
Published in:	Proceedings - International Conference on Parallel and Distributed Systems, pp. 200-207
Main authors:	Ratsaby, Joel; Timashkov, Alexander
Format:	Conference paper
Language:	English
Publication details:	IEEE, 17 December 2023
Subjects:	Approximation algorithms; Complexity theory; CUDA; GPU; Graphics processing units; Length measurement; LZ-complexity; Machine learning; Memory management; Pattern recognition; string distance; UID distance
ISSN:	2690-5965

Description
Summary:	The Lempel-Ziv complexity of a string has recently been used in pattern recognition and classification as part of a string distance function. Its main advantage is that it can measure dissimilarity between a pair of strings of different lengths. This is very useful for machine learning on unstructured data since such data is not restricted to a fixed input dimensionality. The standard computation of LZ-complexity is inherently serial and is not suitable for processing large unstructured data. Hence, we propose a parallel algorithm that computes the LZ-complexity of strings whose length is limited only by the amount of memory, typically in the tens of gigabytes. The algorithm is implemented in CUDA on a GPU. Its speed-up factor is approximately n^(2/3) for strings of length n, for at least up to n = 2Mb. For instance, on 2Mb strings, the speed-up is 150. We compare the execution times of kernel variants with shared and global memory. The more efficient variant obtains approximately 90% GPU utilization.
DOI:	10.1109/ICPADS60453.2023.00038
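
Note:	To illustrate why the summary calls the standard computation inherently serial, the following is a minimal sketch of a serial LZ-complexity count. It assumes the Lempel-Ziv (1976) exhaustive-history parsing, which may differ from the variant used in the paper, and it is not the authors' parallel algorithm. The function name `lz_complexity` is hypothetical. Each phrase can only be determined after the previous one ends, which is the serial dependency.

```python
def lz_complexity(s: str) -> int:
    """Count phrases in the LZ76 exhaustive-history parsing of s.

    Assumption: a phrase starting at position i is the shortest substring
    s[i:i+l] that does not occur starting at any earlier position
    (overlap with the phrase itself is allowed).
    """
    n = len(s)
    i = 0          # start of the current phrase
    phrases = 0
    while i < n:
        l = 1
        # Extend the phrase while it can still be copied from a position < i.
        # Searching in s[0 : i+l-1] forces any match to start before i.
        while i + l <= n and s.find(s[i:i + l], 0, i + l - 1) != -1:
            l += 1
        phrases += 1
        i += l     # the next phrase depends on where this one ended
    return phrases


if __name__ == "__main__":
    # Parses as 0 | 001 | 10 | 100 | 1000 | 101
    print(lz_complexity("0001101001000101"))  # prints 6
```

This naive version rescans the prefix for every extension, so it is slow on long strings; it is meant only to show the definition, not a practical baseline.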