Efficient reinforcement learning-based method for plagiarism detection boosted by a population-based algorithm for pretraining weights

Bibliographic details
Published in:	Expert Systems with Applications, Vol. 238; p. 122088
Main authors:	Xiong, Jiale; Yang, Jing; Yan, Lei; Awais, Muhammad; Khan, Abdullah Ayub; Alizadehsani, Roohallah; Acharya, U. Rajendra
Format:	Journal Article
Language:	English
Published:	Elsevier Ltd, 15 March 2024
Subjects:	Artificial bee colony; Bidirectional encoder representations from transformers; Plagiarism detection; Reinforcement learning; Unbalanced classification
ISSN:	0957-4174, 1873-6793

Description
Summary:	Plagiarism detection (PD) in natural language processing involves locating similar words in two distinct sources. The paper introduces a new approach to plagiarism detection that uses embeddings generated by bidirectional encoder representations from transformers (BERT), an enhanced artificial bee colony (ABC) optimization algorithm for pre-training, and a training process based on reinforcement learning (RL). The BERT model can be incorporated into a downstream task and fine-tuned as a task-specific model, enabling it to capture a variety of linguistic characteristics. Imbalanced classification is one of the fundamental obstacles to PD. To handle it, we present a novel RL-based methodology in which the problem is framed as a series of sequential decisions: at each step, an agent receives a reward for classifying a received instance. To address the disparity between classes, the majority class is assigned a lower reward than the minority class. We also focus on the training stage, which often relies on gradient-based learning techniques such as backpropagation (BP) and therefore suffers from drawbacks such as sensitivity to initialization. In our proposed model, we use a mutual learning-based ABC (ML-ABC) approach that adjusts the food source with the most beneficial results for the candidate by considering a mutual learning factor that incorporates the initial weight. We evaluated the efficacy of our approach by comparing its results with those of population-based techniques on three standard datasets, namely Stanford Natural Language Inference (SNLI), Microsoft Research Paraphrase Corpus (MSRP), and Semantic Evaluation Database (SemEval2014). Our model attained excellent results that outperformed state-of-the-art models. Optimal values for important parameters, including the reward function, are identified for the model based on experiments on the study dataset. Ablation studies that exclude the proposed ML-ABC and reinforcement learning from the model confirm the independent positive incremental impact of these components on model performance.
• BERT-based plagiarism detection with RL and ML-ABC.
• Reward function in RL improves detection of minority plagiarism class.
• Model outperforms state-of-the-art on SNLI, MSRP, SemEval2014 datasets.
• Ablation studies highlight the impact of ML-ABC and RL on performance.
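Illustrative note: the summary states only that the majority class receives a lower reward than the minority class. The sketch below shows one way such a class-weighted reward could look. The ±1 reward for minority instances, the scaling of majority rewards by the class ratio `lam`, and all function names are assumptions made for illustration; they are not taken from the paper.

```python
from collections import Counter
from typing import Sequence


def imbalance_ratio(labels: Sequence[int], minority_label: int = 1) -> float:
    """Return (# minority samples) / (# majority samples).

    Assumption: this ratio is used as the reward scale for the majority class.
    """
    counts = Counter(labels)
    minority = counts[minority_label]
    majority = len(labels) - minority
    if minority == 0 or majority == 0:
        raise ValueError("both classes must be present")
    return minority / majority


def reward(action: int, label: int, lam: float, minority_label: int = 1) -> float:
    """Reward for classifying a single instance.

    Assumed scheme: +1/-1 for a correct/incorrect decision on a minority
    instance, +lam/-lam on a majority instance, with 0 < lam < 1.
    """
    scale = 1.0 if label == minority_label else lam
    return scale if action == label else -scale


def episode_return(actions: Sequence[int], labels: Sequence[int], lam: float,
                   minority_label: int = 1) -> float:
    """Sum of per-instance rewards over one pass through the data."""
    return sum(reward(a, y, lam, minority_label) for a, y in zip(actions, labels))


# Toy data: four majority (0) instances and one minority (1) instance.
labels = [0, 0, 0, 0, 1]
lam = imbalance_ratio(labels)

always_majority = [0] * len(labels)   # ignores the minority class entirely
perfect = list(labels)                # classifies every instance correctly

print(episode_return(always_majority, labels, lam))
print(episode_return(perfect, labels, lam))
```

With this weighting, a policy that always predicts the majority class gains nothing overall, because the single minority error cancels the four small majority rewards, whereas a plain accuracy objective would score it at 80%.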
DOI:	10.1016/j.eswa.2023.122088