A parallel hybrid approach integrating clonal selection with artificial bee colony for logistic regression in spam email detection

Bibliographic Details
Published in: Neural Computing & Applications, Vol. 37, no. 27, pp. 22401-22419
Main Authors: Dedeturk, Bilge Kagan; Akay, Bahriye
Format: Journal Article
Language: English
Published: London: Springer London, 01.09.2025
Springer Nature B.V.
ISSN: 0941-0643, 1433-3058
Description
Summary: Spam emails are sent to recipients for advertising and phishing purposes. In either case, they disturb recipients and reduce communication quality. Addressing this issue requires classifying emails on servers as either spam or ham. Numerous methods have been proposed for this classification task. Among them, logistic regression (LR) stands out for its simplicity, speed, and ease of implementation. However, LR suffers from low detection rates caused by the gradient descent algorithm used in its training phase. To overcome this limitation, we propose a novel method based on the clonal selection algorithm (CSA), which is known for its success in optimization problems owing to its local and global search capabilities. Despite its effective optimization performance, CSA suffers from limited robustness and long training times. Therefore, the CSA and artificial bee colony (ABC) algorithms are hybridized to improve CSA’s robustness and are parallelized to reduce the training time significantly. This hybrid method is employed to optimize the weights of LR by minimizing the cost at the output of LR. The empirical results indicate that the proposed method, named CSA–ABC–LR, yields better classification performance than state-of-the-art models reported in previous studies, achieving an accuracy of 99.13% on the Enron-1 dataset, 99.22% on the CSDMC2010 dataset, and 94.49% on the Spambase dataset.
DOI: 10.1007/s00521-024-10505-7
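Illustrative sketch: the summary describes training LR by letting a population-based optimizer, rather than gradient descent, minimize the cost at the LR output. The Python below is a minimal sketch of that general idea only, using a simplified clonal-selection-style search on a synthetic two-cluster dataset. It is not the authors' CSA–ABC–LR: the ABC hybridization and the parallelization are omitted, and every name and parameter here (`lr_cost`, `clonal_search`, `pop_size`, `n_clones`, the mutation step rule, the toy data) is an assumption made for illustration.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def lr_cost(w, X, y, eps=1e-12):
    """Mean cross-entropy cost of logistic regression with weights w."""
    total = 0.0
    for xi, yi in zip(X, y):
        p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)))
        total -= yi * math.log(p + eps) + (1 - yi) * math.log(1 - p + eps)
    return total / len(X)


def clonal_search(X, y, pop_size=10, n_clones=5, n_iter=50, seed=0):
    """Simplified clonal-selection-style search over LR weights (assumed design)."""
    rng = random.Random(seed)
    dim = len(X[0])
    pop = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(pop_size)]
    costs = [lr_cost(w, X, y) for w in pop]
    history = [min(costs)]
    for _ in range(n_iter):
        # Rank candidate weight vectors: lowest cost first.
        ranked = sorted(zip(costs, pop), key=lambda t: t[0])
        costs = [c for c, _ in ranked]
        pop = [w for _, w in ranked]
        for rank in range(pop_size):
            # Better-ranked candidates are mutated with smaller steps.
            step = 0.5 * (rank + 1) / pop_size
            parent = pop[rank]
            best_w, best_c = parent, costs[rank]
            for _ in range(n_clones):
                clone = [wj + rng.gauss(0, step) for wj in parent]
                c = lr_cost(clone, X, y)
                if c < best_c:
                    best_w, best_c = clone, c
            # Keep the best clone only if it improves on its parent.
            pop[rank], costs[rank] = best_w, best_c
        # Replace the current worst candidate with a random newcomer.
        worst = max(range(pop_size), key=costs.__getitem__)
        pop[worst] = [rng.gauss(0, 1) for _ in range(dim)]
        costs[worst] = lr_cost(pop[worst], X, y)
        history.append(min(costs))
    best = min(range(pop_size), key=costs.__getitem__)
    return pop[best], history


# Synthetic two-cluster data (not a spam corpus); first feature is a bias term.
data_rng = random.Random(1)
X, y = [], []
for label, center in ((0, -2.0), (1, 2.0)):
    for _ in range(50):
        X.append([1.0, data_rng.gauss(center, 1.0), data_rng.gauss(center, 1.0)])
        y.append(label)

w_best, history = clonal_search(X, y)
preds = [1 if sigmoid(sum(wj * xj for wj, xj in zip(w_best, xi))) >= 0.5 else 0
         for xi in X]
accuracy = sum(p == t for p, t in zip(preds, y)) / len(y)
print(f"final cost: {history[-1]:.4f}, training accuracy: {accuracy:.2f}")
```

Because the search only needs cost evaluations, the candidate evaluations within one iteration are independent of each other, which is the kind of structure that a parallel implementation could exploit. The accuracy figures quoted in the summary come from the published study and cannot be reproduced with this toy sketch.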