Comparative Analysis of Text Similarity Algorithms and Their Practical Applications in Computer Science

Bibliographic details
Published in:	Elektrotehniski Vestnik, Vol. 92, No. 3, pp. 151-156
Main authors:	Poljak, Josip; Crčić, Dražen; Horvat, Tomislav
Format:	Journal Article
Language:	English
Published:	Ljubljana: Elektrotehniski Vestnik, 01.01.2025
Subjects:	Accuracy; Algorithms; Automatic text analysis; Automation; Comparative analysis; Documents; Information retrieval; Linguistics; Machine learning; Measurement techniques; Natural language processing; Plagiarism; Semantics; Sentiment analysis; Similarity; Similarity measures
ISSN:	0013-5852, 2232-3236

Description
Summary:	In an era defined by vast volumes of digital text, the capacity to compare, interpret, and quantify textual similarity is a cornerstone of modern computational linguistics and natural language processing (NLP). Text similarity algorithms support critical applications in information retrieval, plagiarism detection, sentiment analysis, text summarization, and beyond. This paper provides a comprehensive survey and comparative analysis of established text similarity algorithms, including edit-distance-based metrics (Levenshtein and Damerau-Levenshtein), character-based measures (Jaro and Jaro-Winkler), local sequence alignment (Smith-Waterman), vector-based semantic measures (Cosine similarity), and methods reliant on subsequence statistics (N-gram similarity). Each algorithm is analyzed in terms of its underlying theoretical foundations, computational complexity, performance characteristics, and domain-specific suitability. While traditional approaches excel in correcting typographical errors or identifying subtle lexical variations, more robust methods handle semantically rich corpora, larger text bodies, and intricate linguistic phenomena. Moreover, potential avenues for improvement are explored, including hybridization of existing approaches and the integration of emerging machine learning and deep neural models. This holistic examination aims to inform the selection and development of text similarity measures for diverse real-world applications and to guide future research directions in computational linguistics.
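
Illustrative note: the summary names two families of measures, edit distance and vector-based similarity, without showing how they differ in practice. The following is a minimal sketch of one representative of each, not the paper's implementation. The function names (levenshtein, char_ngrams, cosine_similarity) are hypothetical, and building the cosine vectors from character bigram counts is an assumption made here for brevity; the record does not say which vector representation the authors use.

```python
from collections import Counter
from math import sqrt


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # the distance is symmetric; keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


def char_ngrams(text: str, n: int = 2) -> Counter:
    """Count the overlapping character n-grams in text (assumed representation)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine_similarity(a: str, b: str, n: int = 2) -> float:
    """Cosine of the angle between the n-gram count vectors of a and b."""
    va, vb = char_ngrams(a, n), char_ngrams(b, n)
    dot = sum(count * vb[gram] for gram, count in va.items())
    norm = sqrt(sum(c * c for c in va.values())) * sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0  # strings shorter than n have no n-grams


print(levenshtein("kitten", "sitting"))     # 3
print(cosine_similarity("night", "nacht"))  # 0.25
```

The edit distance counts character-level operations, which suits the typo-correction use the summary mentions, while the cosine measure compares whole count vectors and ignores where in the string the n-grams occur.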