SOLVING THE PROBLEM OF DETECTING PHISHING WEBSITES USING ENSEMBLE LEARNING MODELS


Published in: Scientific journal of Astana IT University (Online), Volume 12, pp. 55-64
Main authors: Kaibassova, Dinara; Nurtay, Margulan; Tau, Ardak; Kissina, Mira
Format: Journal Article
Language: English
Publication details: Astana IT University, 30.12.2022
Subjects: ensemble learning; gradient boosting; imbalanced classification; phishing detection
ISSN: 2707-9031, 2707-904X
Abstract Phishing is one of the easiest ways for attackers to obtain personal information, which has made phishing detection an active area of research aimed at countering such attacks. Detecting malicious websites is essential to prevent the spread of malware and to protect end users from becoming victims. Unfortunately, malicious URL detection is still not well understood, owing to a lack of features and inaccurate classification. Relevant sources were reviewed to investigate the subject. Drawing on information collected from previous studies, this study addresses the problem of detecting phishing websites using Ensemble Learning. The aim of the work is to select the best-performing algorithm for classifying phishing websites from among gradient boosting algorithms. AdaBoost, CatBoost, and Gradient Boosting Classifier were chosen as the Ensemble Learning algorithms and were used to improve classifier performance. The parameters of each algorithm were studied experimentally to find the optimal classification model. Research and experiments were carried out on a dataset containing information extracted from the components of a URL: main URL, domain, directory, and file. A thorough Exploratory Data Analysis (EDA) was carried out, in which correlation analysis identified the main dependencies and patterns that characterize phishing resources. The ROC AUC score was chosen as the evaluation metric for the algorithms. The best result for predicting phishing websites was demonstrated by the AdaBoost Classifier algorithm, with an average ROC AUC score of 99%. The results of the experiments are presented as graphs and tables.
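The abstract names the ROC AUC score as the evaluation metric. As a minimal sketch of what that metric measures (not the authors' code), the function below computes ROC AUC as the fraction of (phishing, legitimate) pairs in which the phishing sample receives the higher classifier score, counting ties as half. The function name `roc_auc` and the labels and scores in the example are hypothetical and chosen only for illustration; they are not taken from the paper's dataset.

```python
def roc_auc(labels, scores):
    """Return ROC AUC: the probability that a randomly chosen positive
    (label 1) is scored higher than a randomly chosen negative (label 0).
    Tied scores count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical example: 1 = phishing, 0 = legitimate.
example_labels = [1, 1, 1, 0, 0, 0]
example_scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
example_auc = roc_auc(example_labels, example_scores)  # 8 of 9 pairs ranked correctly
```

Because the value depends only on how positives rank against negatives, it does not change when the ratio of phishing to legitimate samples changes, which is one reason it is a common choice for imbalanced classification. The pairwise loop is quadratic in the number of samples; a rank-based formulation would be preferable on a large dataset.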
Author Kissina, Mira
Nurtay, Margulan
Kaibassova, Dinara
Tau, Ardak
Author_xml – sequence: 1
  givenname: Dinara
  orcidid: 0000-0002-8410-7758
  surname: Kaibassova
  fullname: Kaibassova, Dinara
– sequence: 2
  givenname: Margulan
  orcidid: 0000-0002-0786-6195
  surname: Nurtay
  fullname: Nurtay, Margulan
– sequence: 3
  givenname: Ardak
  orcidid: 0000-0003-4883-6328
  surname: Tau
  fullname: Tau, Ardak
– sequence: 4
  givenname: Mira
  orcidid: 0000-0003-2232-1203
  surname: Kissina
  fullname: Kissina, Mira
ContentType Journal Article
DOI 10.37943/12OYRS4391
EISSN 2707-904X
EndPage 64
ISSN 2707-9031
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Language English
License https://creativecommons.org/licenses/by-nc-nd/4.0
ORCID 0000-0003-4883-6328
0000-0003-2232-1203
0000-0002-8410-7758
0000-0002-0786-6195
OpenAccessLink https://doaj.org/article/b2279f6876a94bb29f795fd69851c4f5
PageCount 10
PublicationCentury 2000
PublicationDate 2022-12-30
PublicationDecade 2020
PublicationTitle Scientific journal of Astana IT University (Online)
PublicationYear 2022
Publisher Astana IT University
Publisher_xml – name: Astana IT University
StartPage 55
SubjectTerms ensemble learning
gradient boosting
imbalanced classification
phishing detection
Title SOLVING THE PROBLEM OF DETECTING PHISHING WEBSITES USING ENSEMBLE LEARNING MODELS
URI https://doaj.org/article/b2279f6876a94bb29f795fd69851c4f5
Volume 12