Efficient Exact Online String Matching Through Linked Weak Factors

Bibliographic details
Title: Efficient Exact Online String Matching Through Linked Weak Factors
Authors: Matthew N. Palmer, Simone Faro, Stefano Scafiti
Contributors: Matthew N. Palmer and Simone Faro and Stefano Scafiti
Publisher information: Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2024.
Publication year: 2024
Subjects: String matching, text processing, weak recognition, hashing, experimental algorithms, design and analysis of algorithms, ddc:004
Description: Online exact string matching is a fundamental computational problem in computer science, involving the sequential search for a pattern within a large text without prior access to the entire text. Its significance is underscored by its diverse applications in data compression, data mining, text editing, and bioinformatics, just to cite a few, where efficient substring matching is crucial. While the problem has been a subject of study for years, recent decades have witnessed a heightened focus on experimental solutions, employing various techniques to achieve superior performance. Notably, approaches centered around weak factor recognition have emerged as leaders in experimental settings, gaining increasing attention. This paper introduces Hash Chain, a novel algorithm founded on a robust weak factor recognition approach that links adjacent factors through hashing. Building upon the efficacy of weak recognition techniques, the proposed algorithm incorporates innovative strategies for organizing data structures and optimizations to enhance performance. Despite its quadratic worst-case time complexity, the new proposed algorithm demonstrates sublinear behavior in practice, outperforming currently known algorithms in the literature.
Document type: Conference object
Article
File description: application/pdf
Language: English
DOI: 10.4230/lipics.sea.2024.24
Access URL: https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SEA.2024.24
https://hdl.handle.net/20.500.11769/641810
https://doi.org/10.4230/lipics.sea.2024.24
Rights: CC BY
Accession number: edsair.dedup.wf.002..6e09623261eddb40a7e4f85b8bdfa382
Database: OpenAIRE