Zobraziť v EDS

A Filter-Based Feature Selection Framework to Detect Phishing URLs Using Stacking Ensemble Machine Learning.

Uložené v:

Podrobná bibliografia
Názov:	A Filter-Based Feature Selection Framework to Detect Phishing URLs Using Stacking Ensemble Machine Learning.
Autori:	Bari, Nimra, Saleem, Tahir, Shah, Munam, Algarni, Abdulmohsen, Patel, Asma, Ullah, Insaf
Zdroj:	Computer Modeling in Engineering & Sciences (CMES); 2025, Vol. 145 Issue 1, p1167-1187, 21p
Predmety:	PHISHING, FEATURE selection, ENSEMBLE learning, ERROR rates, PHISHING prevention, MACHINE learning, DATA analysis, STATISTICAL accuracy
Abstrakt:	Today, phishing is an online attack designed to obtain sensitive information such as credit card and bank account numbers, passwords, and usernames. We can find several anti-phishing solutions, such as heuristic detection, virtual similarity detection, black and white lists, and machine learning (ML). However, phishing attempts remain a problem, and establishing an effective anti-phishing strategy is a work in progress. Furthermore, while most anti-phishing solutions achieve the highest levels of accuracy on a given dataset, their methods suffer from an increased number of false positives. These methods are ineffective against zero-hour attacks. Phishing sites with a high False Positive Rate (FPR) are considered genuine because they can cause people to lose a lot of money by visiting them. Feature selection is critical when developing phishing detection strategies. Good feature selection helps improve accuracy; however, duplicate features can also increase noise in the dataset and reduce the accuracy of the algorithm. Therefore, a combination of filter-based feature selection methods is proposed to detect phishing attacks, including constant feature removal, duplicate feature removal, quasi-feature removal, correlated feature removal, mutual information extraction, and Analysis of Variance (ANOVA) testing. The technique has been tested with different Machine Learning classifiers: Random Forest, Artificial Neural Network (ANN), Ada-Boost, Extreme Gradient Boosting (XGBoost), Logistic Regression, Decision Trees, Gradient Boosting Classifiers, Support Vector Machine (SVM), and two types of ensemble models, stacking and majority voting to gain A low false positive rate is achieved. Stacked ensemble classifiers (gradient boosting, random forest, support vector machine) achieve 1.31% FPR and 98.17% accuracy on Dataset 1, 2.81% FPR and Dataset 3 shows 2.81% FPR and 97.61% accuracy, while Dataset 2 shows 3.47% FPR and 96.47% accuracy. [ABSTRACT FROM AUTHOR]
	Copyright of Computer Modeling in Engineering & Sciences (CMES) is the property of Tech Science Press and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
Databáza:	Complementary Index

Full Text Finder

Nájsť tento článok vo Web of Science

Popis
Abstrakt:	Today, phishing is an online attack designed to obtain sensitive information such as credit card and bank account numbers, passwords, and usernames. We can find several anti-phishing solutions, such as heuristic detection, virtual similarity detection, black and white lists, and machine learning (ML). However, phishing attempts remain a problem, and establishing an effective anti-phishing strategy is a work in progress. Furthermore, while most anti-phishing solutions achieve the highest levels of accuracy on a given dataset, their methods suffer from an increased number of false positives. These methods are ineffective against zero-hour attacks. Phishing sites with a high False Positive Rate (FPR) are considered genuine because they can cause people to lose a lot of money by visiting them. Feature selection is critical when developing phishing detection strategies. Good feature selection helps improve accuracy; however, duplicate features can also increase noise in the dataset and reduce the accuracy of the algorithm. Therefore, a combination of filter-based feature selection methods is proposed to detect phishing attacks, including constant feature removal, duplicate feature removal, quasi-feature removal, correlated feature removal, mutual information extraction, and Analysis of Variance (ANOVA) testing. The technique has been tested with different Machine Learning classifiers: Random Forest, Artificial Neural Network (ANN), Ada-Boost, Extreme Gradient Boosting (XGBoost), Logistic Regression, Decision Trees, Gradient Boosting Classifiers, Support Vector Machine (SVM), and two types of ensemble models, stacking and majority voting to gain A low false positive rate is achieved. Stacked ensemble classifiers (gradient boosting, random forest, support vector machine) achieve 1.31% FPR and 98.17% accuracy on Dataset 1, 2.81% FPR and Dataset 3 shows 2.81% FPR and 97.61% accuracy, while Dataset 2 shows 3.47% FPR and 96.47% accuracy. [ABSTRACT FROM AUTHOR]
ISSN:	15261492
DOI:	10.32604/cmes.2025.070311