SOLVING THE PROBLEM OF DETECTING PHISHING WEBSITES USING ENSEMBLE LEARNING MODELS
| Published in: | Scientific journal of Astana IT University (Online), Vol. 12, pp. 55-64 |
|---|---|
| Main authors: | Kaibassova, Dinara; Nurtay, Margulan; Tau, Ardak; Kissina, Mira |
| Format: | Journal Article |
| Language: | English |
| Publication details: | Astana IT University, 30.12.2022 |
| Subject: | ensemble learning; gradient boosting; imbalanced classification; phishing detection |
| ISSN: | 2707-9031, 2707-904X |
| Online access: | Full text: https://doaj.org/article/b2279f6876a94bb29f795fd69851c4f5 |

| Abstract | Phishing is one of the easiest ways for attackers to obtain personal information. Its popularity has made phishing detection an active research area aimed at countering such attacks. Detecting malicious websites is essential to prevent the spread of malware and to protect end users from becoming victims. However, malicious URL detection remains insufficiently understood because of a lack of features and inaccurate classification. Existing sources on the subject were reviewed, and this study builds on that prior work to address the problem of detecting phishing websites with ensemble learning. The aim of the work is to select the optimal algorithm for classifying phishing websites from among gradient boosting algorithms. AdaBoost, CatBoost, and Gradient Boosting Classifier were chosen as the ensemble learning algorithms and were used to improve classifier efficiency. Experiments tuning the parameters of each algorithm to find the optimal classification model are presented. The research and experiments used a dataset of information extracted from the contents of a URL: the main URL, the domain, the directory, and the file. A thorough exploratory data analysis (EDA) was performed, in which correlation analysis identified the main dependencies and patterns that characterize phishing resources. The ROC AUC score was chosen as the evaluation metric for the algorithms. The AdaBoost Classifier achieved the best result for predicting phishing websites, with an average ROC AUC score of 99%. The experimental results are presented as graphs and tables. |
|---|---|
| Author | Kissina, Mira; Nurtay, Margulan; Kaibassova, Dinara; Tau, Ardak |
| Author_xml | 1. Kaibassova, Dinara (ORCID 0000-0002-8410-7758); 2. Nurtay, Margulan (ORCID 0000-0002-0786-6195); 3. Tau, Ardak (ORCID 0000-0003-4883-6328); 4. Kissina, Mira (ORCID 0000-0003-2232-1203) |
| ContentType | Journal Article |
| DBID | AAYXX CITATION DOA |
| DOI | 10.37943/12OYRS4391 |
| DatabaseName | CrossRef DOAJ Directory of Open Access Journals |
| DatabaseTitle | CrossRef |
| DatabaseTitleList | CrossRef |
| Database_xml | DOAJ Directory of Open Access Journals (dbid: DOA; URL: https://www.doaj.org/; source type: Open Website) |
| DeliveryMethod | fulltext_linktorsrc |
| EISSN | 2707-904X |
| EndPage | 64 |
| ExternalDocumentID | oai_doaj_org_article_b2279f6876a94bb29f795fd69851c4f5 10_37943_12OYRS4391 |
| GroupedDBID | AAYXX ALMA_UNASSIGNED_HOLDINGS ARCSS CITATION EN8 GROUPED_DOAJ |
| ID | FETCH-LOGICAL-c1391-d3afaaf269d2eae394aa7a51defde43e2c7d0f24d521a34159fd10998898803a3 |
| IEDL.DBID | DOA |
| ISSN | 2707-9031 |
| IngestDate | Fri Oct 03 12:44:04 EDT 2025 Sat Nov 29 04:09:12 EST 2025 |
| IsDoiOpenAccess | true |
| IsOpenAccess | true |
| IsPeerReviewed | true |
| IsScholarly | true |
| Language | English |
| License | https://creativecommons.org/licenses/by-nc-nd/4.0 |
| LinkModel | DirectLink |
| MergedId | FETCHMERGED-LOGICAL-c1391-d3afaaf269d2eae394aa7a51defde43e2c7d0f24d521a34159fd10998898803a3 |
| ORCID | 0000-0003-4883-6328 0000-0003-2232-1203 0000-0002-8410-7758 0000-0002-0786-6195 |
| OpenAccessLink | https://doaj.org/article/b2279f6876a94bb29f795fd69851c4f5 |
| PageCount | 10 |
| ParticipantIDs | doaj_primary_oai_doaj_org_article_b2279f6876a94bb29f795fd69851c4f5 crossref_primary_10_37943_12OYRS4391 |
| PublicationCentury | 2000 |
| PublicationDate | 2022-12-30 |
| PublicationDateYYYYMMDD | 2022-12-30 |
| PublicationDate_xml | – month: 12 year: 2022 text: 2022-12-30 day: 30 |
| PublicationDecade | 2020 |
| PublicationTitle | Scientific journal of Astana IT University (Online) |
| PublicationYear | 2022 |
| Publisher | Astana IT University |
| Publisher_xml | – name: Astana IT University |
| SSID | ssib050738517 ssj0002873317 |
| SourceID | doaj crossref |
| SourceType | Open Website Index Database |
| StartPage | 55 |
| SubjectTerms | ensemble learning; gradient boosting; imbalanced classification; phishing detection |
| Title | SOLVING THE PROBLEM OF DETECTING PHISHING WEBSITES USING ENSEMBLE LEARNING MODELS |
| URI | https://doaj.org/article/b2279f6876a94bb29f795fd69851c4f5 |
| Volume | 12 |
| hasFullText | 1 |
| inHoldings | 1 |
| isFullTextHit | |
| isPrint | |
| journalDatabaseRights | DOAJ Directory of Open Access Journals (provider: Directory of Open Access Journals, code PRVAON; database code DOA; ISSN 2707-9031; eISSN 2707-904X; identifier ssj0002873317; coverage 2020-01-01 to 2024-12-31; full text; https://www.doaj.org/); ROAD: Directory of Open Access Scholarly Resources (provider: ISSN International Centre, code PRVHPJ; database code M~E; ISSN 2707-9031; eISSN 2707-904X; identifier ssib050738517; coverage from 2020-01-01, open-ended (dateEnd 99991231); full text; https://road.issn.org) |
| linkProvider | Directory of Open Access Journals |
| openUrl | ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=SOLVING+THE+PROBLEM+OF+DETECTING+PHISHING+WEBSITES+USING+ENSEMBLE+LEARNING+MODELS&rft.jtitle=Scientific+journal+of+Astana+IT+University+%28Online%29&rft.au=Kaibassova%2C+Dinara&rft.au=Nurtay%2C+Margulan&rft.au=Tau%2C+Ardak&rft.au=Kissina%2C+Mira&rft.date=2022-12-30&rft.issn=2707-9031&rft.eissn=2707-904X&rft.spage=55&rft.epage=64&rft_id=info:doi/10.37943%2F12OYRS4391&rft.externalDBID=n%2Fa&rft.externalDocID=10_37943_12OYRS4391 |
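According to the abstract, the dataset holds information extracted from four parts of a URL (main URL, domain, directory, file), and correlation analysis was used during EDA. The sketch below shows one possible way to derive such per-part features with the Python standard library and then rank them by correlation with the label. The feature names (for example `qty_dot_domain` or `length_file`), the chosen characters, and the helpers `split_url`, `url_features`, `pearson` and `rank_features` are hypothetical illustrations. They are not the columns or code of the dataset used in the study. URLs are assumed to include a scheme such as `http://`, because without one `urlsplit` does not recognise the domain.

```python
import math
from urllib.parse import urlsplit

# Hypothetical feature set: counts of a few characters plus the length of
# each URL part. The record does not list the dataset's actual columns.
SYMBOLS = {"dot": ".", "hyphen": "-", "slash": "/", "at": "@"}


def split_url(url):
    """Split a URL (with scheme) into main URL, domain, directory and file."""
    parts = urlsplit(url)
    path = parts.path
    cut = path.rfind("/") + 1  # the directory keeps its trailing '/'
    return {
        "url": url,
        "domain": parts.hostname or "",
        "directory": path[:cut],
        "file": path[cut:],  # query string and fragment are ignored here
    }


def url_features(url):
    """Per-part symbol counts and lengths (illustrative feature names)."""
    features = {}
    for part_name, text in split_url(url).items():
        for sym_name, sym in SYMBOLS.items():
            features[f"qty_{sym_name}_{part_name}"] = text.count(sym)
        features[f"length_{part_name}"] = len(text)
    return features


def pearson(xs, ys):
    """Pearson correlation; against a 0/1 label this is the point-biserial r."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return math.nan  # a constant column has no defined correlation
    return cov / (sx * sy)


def rank_features(urls, labels):
    """Sort features by |correlation| with the label, skipping constant ones."""
    rows = [url_features(u) for u in urls]
    scored = []
    for name in rows[0]:
        r = pearson([row[name] for row in rows], labels)
        if not math.isnan(r):
            scored.append((name, r))
    return sorted(scored, key=lambda item: abs(item[1]), reverse=True)
```

For example, `url_features("http://secure-login.example.com/account/update/login.php?id=1")` reports two dots and one hyphen in the domain and `login.php` as the file. Calling `rank_features(urls, labels)` with labels of 1 (phishing) and 0 (legitimate) lists the features most linearly associated with phishing first. Correlation only captures linear association, so this is a screening step, not a substitute for the classifier.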
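AdaBoost produced the best result in the study. To show how the algorithm works, here is a minimal, dependency-free sketch of textbook discrete AdaBoost with decision stumps. It assumes numeric feature vectors and labels of +1 (phishing) and -1 (legitimate). This is not the authors' implementation, and it uses none of their tuned parameters, which the record does not report. The functions `stump_predict`, `fit_stump`, `adaboost_fit` and `adaboost_score` are hypothetical names. In practice one would use a library class such as scikit-learn's `AdaBoostClassifier`, whose default weak learner is also a depth-1 decision tree; the default `n_rounds=50` here mirrors that class's default `n_estimators`.

```python
import math

# A decision stump is (feature index, threshold, polarity): it predicts
# `polarity` when x[feature] <= threshold and `-polarity` otherwise.


def stump_predict(stump, x):
    j, threshold, polarity = stump
    return polarity if x[j] <= threshold else -polarity


def fit_stump(X, y, w):
    """Exhaustively search for the stump with the lowest weighted error."""
    best, best_err = None, math.inf
    for j in range(len(X[0])):
        for threshold in sorted({x[j] for x in X}):
            for polarity in (1, -1):
                stump = (j, threshold, polarity)
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if stump_predict(stump, xi) != yi)
                if err < best_err:
                    best, best_err = stump, err
    return best, best_err


def adaboost_fit(X, y, n_rounds=50):
    """Discrete AdaBoost; y must use +1 (phishing) and -1 (legitimate)."""
    n = len(X)
    w = [1.0 / n] * n  # start from uniform sample weights
    ensemble = []
    for _ in range(n_rounds):
        stump, err = fit_stump(X, y, w)
        if err >= 0.5:
            break  # the best stump is no better than chance
        err = max(err, 1e-10)  # guard against division by zero
        alpha = 0.5 * math.log((1 - err) / err)  # vote weight of this stump
        ensemble.append((alpha, stump))
        if err <= 1e-10:
            break  # this stump classifies every training sample correctly
        # Raise the weight of misclassified samples, lower the rest,
        # then renormalise so the weights sum to 1.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def adaboost_score(ensemble, x):
    """Weighted vote: the sign is the predicted class, the value a ranking score."""
    return sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
```

Each round reweights the training samples so that the next stump concentrates on the URLs the ensemble currently misclassifies. Because `adaboost_score` returns a real-valued margin rather than a hard label, its output can be fed directly into a ranking metric such as ROC AUC. The exhaustive stump search is quadratic in the number of samples and is meant only for small illustrative inputs.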
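The record names ROC AUC as the evaluation metric. ROC AUC equals the probability that a randomly chosen phishing example is scored higher than a randomly chosen legitimate one, with ties counting one half. That probability can be computed from ranks through the Mann-Whitney U statistic in O(n log n). The helper `roc_auc` below is a hypothetical sketch; `sklearn.metrics.roc_auc_score` is the usual library equivalent. The record does not say how the study averaged its scores (for example, over cross-validation folds), so no averaging is shown.

```python
def roc_auc(labels, scores):
    """ROC AUC via the rank-sum (Mann-Whitney U) formula.

    labels: 1 marks the positive class (phishing); any other value is negative.
    scores: higher means "more likely phishing"; tied scores share an average rank.
    """
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Extend j over the run of samples that share the same score.
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # 1-based average rank of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    n_pos = sum(1 for label in labels if label == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("ROC AUC needs at least one positive and one negative")
    rank_sum = sum(r for r, label in zip(ranks, labels) if label == 1)
    # U statistic of the positives divided by the number of positive/negative pairs.
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
```

For instance, `roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` gives 0.75: three of the four positive/negative pairs are ordered correctly. A score of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is the scale on which the study's reported 99% should be read.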