Efficient reinforcement learning-based method for plagiarism detection boosted by a population-based algorithm for pretraining weights
Plagiarism detection (PD) in natural language processing involves locating similar words in two distinct sources. The paper introduces a new approach to plagiarism detection utilizing bidirectional encoder representations from transformers (BERT)-generated embedding, an enhanced artificial bee colon...
Uloženo v:
| Vydáno v: | Expert systems with applications Ročník 238; s. 122088 |
|---|---|
| Hlavní autoři: | , , , , , , |
| Médium: | Journal Article |
| Jazyk: | angličtina |
| Vydáno: |
Elsevier Ltd
15.03.2024
|
| Témata: | |
| ISSN: | 0957-4174, 1873-6793 |
| On-line přístup: | Získat plný text |
| Tagy: |
Přidat tag
Žádné tagy, Buďte první, kdo vytvoří štítek k tomuto záznamu!
|
| Shrnutí: | Plagiarism detection (PD) in natural language processing involves locating similar words in two distinct sources. The paper introduces a new approach to plagiarism detection utilizing bidirectional encoder representations from transformers (BERT)-generated embedding, an enhanced artificial bee colony (ABC) optimization algorithm for pre-training, and a training process based on reinforcement learning (RL). The BERT model can be incorporated into a subsequent task and meticulously refined to function as a model, enabling it to apprehend a variety of linguistic characteristics. Imbalanced classification is one of the fundamental obstacles to PD. To handle this predicament, we present a novel methodology utilizing RL, in which the problem is framed as a series of sequential decisions in which an agent receives a reward at each level for classifying a received instance. To address the disparity between classes, it is determined that the majority class will receive a lower reward than the minority class. We also focus on the training stage, which often utilizes gradient-based learning techniques like backpropagation (BP), leading to certain drawbacks such as sensitivity to initialization. In our proposed model, we utilize a mutual learning-based ABC (ML-ABC) approach that adjusts the food source with the most beneficial results for the candidate by considering a mutual learning factor that incorporates the initial weight. We evaluated the efficacy of our novel approach by contrasting its results with those of population-based techniques using three standard datasets, namely Stanford Natural Language Inference (SNLI), Microsoft Research Paraphrase Corpus (MSRP), and Semantic Evaluation Database (SemEval2014). Our model attained excellent results that outperformed state-of-the-art models. Optimal values for important parameters, including reward function are identified for the model based on experiments on the study dataset. Ablation studies that exclude the proposed ML-ABC and reinforcement learning from the model confirm the independent positive incremental impact of these components on model performance.
•BERT-based plagiarism detection with RL and ML-ABC.•Reward function in RL improves detection of minority plagiarism class.•Model outperforms state-of-the-art on SNLI, MSRP, SemEval2014 datasets.•Ablation studies highlight the impact of ML-ABC and RL on performance. |
|---|---|
| ISSN: | 0957-4174 1873-6793 |
| DOI: | 10.1016/j.eswa.2023.122088 |