Metaheuristički pristup detekciji plagijata

Bibliographic details
Title: Metaheuristički pristup detekciji plagijata
Authors: Kučak, Danijel
Contributors: Juričić, Vedran
Publisher information: Sveučilište u Zagrebu. Filozofski fakultet, 2025.
Publication year: 2025
Subjects: metaheuristics, paralelizacija, computer-aided plagiarism detection, DRUŠTVENE ZNANOSTI. Informacijske i komunikacijske znanosti. Informacijsko i programsko inženjerstvo, Computer science and technology. Computing. Data processing, detekcija sličnosti, plagijat, similarity detection, računalno podržana detekcija plagijata, genetic algorithms, parallelization, SOCIAL SCIENCES. Information and Communication Sciences. Information and Software Engineering, genetski algoritmi, metaheuristike, optimizacija, plagiarism, optimization, Računalna znanost i tehnologija. Računalstvo. Obrada podataka, detekcija plagijata, plagiarism detection
Description: Plagiarism detection has become a critical issue in academia and beyond due to the exponential growth of digital content. Traditional methods for detecting similarities between documents often suffer from computational inefficiency, particularly when applied to large corpora. This dissertation investigates the application of metaheuristic algorithms to optimize computer-aided plagiarism detection systems, aiming to improve both the effectiveness and efficiency of these systems. The primary hypotheses tested in the study are:
H1: A metaheuristic approach can effectively implement a method for computer-aided plagiarism detection.
H2: Parallelization of the developed method improves the system's response time.
H3: Using evolutionary algorithms for parameter selection significantly improves system performance.
Plagiarism, both intentional and unintentional, poses ethical challenges and undermines the integrity of intellectual work. Existing systems for plagiarism detection primarily rely on exhaustive comparisons, which are computationally intensive. Metaheuristics, such as Genetic Algorithms (GA), Ant Colony Optimization (ACO), and Particle Swarm Optimization (PSO), offer innovative solutions for optimizing the selection and comparison of documents. This research integrates these techniques into a framework that addresses the limitations of traditional plagiarism detection methods. The research was conducted in distinct phases:
1. Document Corpus Creation - a repository was established, including preprocessing steps like tokenization, lemmatization and feature extraction using CountVectorizer, TF-IDF, LDA, and Word2Vec;
2. Similarity Method Development - a novel method combining multiple algorithms was developed to compute document similarity efficiently;
3. Metaheuristic Integration - models using GA, ACO and PSO were implemented to intelligently select subsets of documents for comparison;
4. Performance Optimization - GA was employed to tune parameters for the best-performing model;
5. Parallelization - parallel processing techniques were applied to improve the system's response time;
6. Evaluation - comparative analysis using standard measures such as precision, recall, F1-score, and execution time.
The study successfully validated the proposed hypotheses:
H1 Validation: metaheuristic models demonstrated effective implementation for plagiarism detection.
H2 Validation: parallelization reduced response times significantly.
H3 Validation: parameter optimization using GA enhanced detection accuracy and efficiency, leading to statistically significant performance improvements.
The integrated system showed superior results in F1-score, precision and recall metrics while achieving a notable reduction in execution time. This dissertation makes several contributions to the field:
• Development of a hybrid similarity detection method combining algorithms such as CountVectorizer, TF-IDF, LDA and Word2Vec.
• Implementation of a metaheuristic-based framework for optimizing document selection.
• Introduction of parallelization to further enhance computational performance.
• Systematic evaluation of the impact of metaheuristics on plagiarism detection systems.
• Empirical validation of the hypotheses through comprehensive testing and analysis.
The metaheuristic approach to plagiarism detection addresses key limitations of traditional methods, offering improved performance and scalability. The study highlights the potential of combining optimization algorithms with parallel processing for large-scale applications. Future work could explore hybrid metaheuristic strategies and adaptive systems to further refine these models.
The dissertation investigates the application of metaheuristic approaches to computer-aided plagiarism detection, with particular emphasis on system optimization and performance improvement. The main aim of the research is to examine whether computer-aided plagiarism detection models based on metaheuristics can be developed so that the set of documents compared with a suspicious document is selected more efficiently. To this end, the applicability of genetic algorithms, ant colony optimization and particle swarm optimization was analyzed. The research was conducted in five phases: collecting and processing the documents used for testing; developing an optimal method for computing similarity between documents; developing a computer-aided plagiarism detection system that is not based on metaheuristics; developing a computer-aided plagiarism detection system that is based on metaheuristics; and evaluating the performance of the developed systems. The models were evaluated using standard effectiveness measures for computer-aided plagiarism detection systems. For the best-performing model, later phases of the research focused on parameter optimization using genetic algorithms and on parallelizing suitable parts of the system, which further increased efficiency. The results show that metaheuristic-based models can achieve significant improvements over traditional methods without significantly degrading plagiarism detection performance. The dissertation contributes to the field of plagiarism detection through the development of new methodological approaches, an analysis of the application of metaheuristics, and a systematic evaluation of performance.
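The description names document similarity and the evaluation measures precision, recall and F1-score without showing how they are computed. The following is a minimal illustrative sketch in Python, not the dissertation's implementation: it assumes plain whitespace tokenization and raw term counts as a stand-in for the CountVectorizer, TF-IDF, LDA and Word2Vec features, and the function names `cosine_similarity` and `precision_recall_f1` are hypothetical.

```python
from collections import Counter
from math import sqrt


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity between raw term-count vectors of two texts.

    Assumption: lowercased whitespace tokens stand in for the richer
    feature extraction described in the abstract.
    """
    counts_a = Counter(text_a.lower().split())
    counts_b = Counter(text_b.lower().split())
    shared_terms = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[t] * counts_b[t] for t in shared_terms)
    norm_a = sqrt(sum(c * c for c in counts_a.values()))
    norm_b = sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as similar to nothing
    return dot / (norm_a * norm_b)


def precision_recall_f1(flagged: set, true_sources: set) -> tuple:
    """Precision, recall and F1 for a set of flagged source documents."""
    true_positives = len(flagged & true_sources)
    precision = true_positives / len(flagged) if flagged else 0.0
    recall = true_positives / len(true_sources) if true_sources else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

In such a setup, a detector would flag every candidate whose similarity to the suspicious document exceeds some threshold, and the flagged set would then be scored against the known sources. A metaheuristic variant would apply the same scoring to a selected subset of candidates rather than to the whole corpus.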
Document type: Doctoral thesis
File description: application/pdf
Language: Croatian
Access URL: https://repozitorij.ffzg.unizg.hr/islandora/object/ffzg:13538/datastream/PDF
https://repozitorij.ffzg.unizg.hr/islandora/object/ffzg:13538
https://urn.nsk.hr/urn:nbn:hr:131:626503
Rights: URL: http://rightsstatements.org/vocab/InC/1.0/
Accession number: edsair.od......9415..65b3fa55fde6a5caa4a01d35d07acf6a
Database: OpenAIRE