PhiKitA: Phishing Kit Attacks dataset for Phishing Websites Identification
| Published in: | IEEE Access, Vol. 11, p. 1 |
|---|---|
| Main authors: | Castano, Felipe; Fernandez, Eduardo Fidalgo; Alaiz-Rodriguez, Rocio; Alegre, Enrique |
| Format: | Journal Article |
| Language: | English |
| Published: | Piscataway: IEEE; The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 2023-01-01 |
| ISSN: | 2169-3536 |
| Abstract | Recent studies have shown that phishers are using phishing kits to deploy phishing attacks more quickly, more easily and at a larger scale. Detecting phishing kits in deployed websites might help to detect phishing campaigns earlier. To the best of our knowledge, no existing dataset provides phishing kits together with the phishing websites built from them. In this work, we propose PhiKitA, a novel dataset that contains phishing kits as well as phishing websites generated using these kits. We applied MD5 hashes, fingerprints and DOM graph-representation algorithms to obtain baseline results on PhiKitA in three experiments: familiarity analysis of phishing kit samples, phishing website detection, and identification of the source of a phishing website. In the familiarity analysis, we find evidence of different types of phishing kits and of a small phishing campaign. In the binary classification problem for phishing detection, the graph representation algorithm achieved an accuracy of 92.50%, showing that the phishing kit data contain useful information for classifying phishing websites. Finally, the MD5 hash representation obtained an F1 score of 39.54%, which means that this algorithm does not extract enough information to properly distinguish phishing websites and their phishing kit sources. |
|---|---|
| Author | Alaiz-Rodriguez, Rocio; Alegre, Enrique; Castano, Felipe; Fernandez, Eduardo Fidalgo |
| Author details | (1) Castano, Felipe, ORCID 0000-0001-9157-4111; (2) Fernandez, Eduardo Fidalgo, ORCID 0000-0003-1202-5232; (3) Alaiz-Rodriguez, Rocio, ORCID 0000-0003-4164-5887; (4) Alegre, Enrique, ORCID 0000-0003-2081-774X. Affiliation (all authors): Department of Electrical, Systems and Automation Engineering, Universidad de León, León, Spain |
| CODEN | IAECCG |
| ContentType | Journal Article |
| Copyright | Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2023 |
| DOI | 10.1109/ACCESS.2023.3268027 |
| Discipline | Engineering |
| EISSN | 2169-3536 |
| EndPage | 1 |
| Genre | orig-research |
| Funding | Instituto Nacional de Ciberseguridad (Funder ID 10.13039/501100013410) |
| ISICitedReferencesCount | 10 |
| ISSN | 2169-3536 |
| IsDoiOpenAccess | true |
| IsOpenAccess | true |
| IsPeerReviewed | true |
| IsScholarly | true |
| Language | English |
| License | https://creativecommons.org/licenses/by-nc-nd/4.0 |
| ORCID | 0000-0001-9157-4111 0000-0003-1202-5232 0000-0003-4164-5887 0000-0003-2081-774X |
| OpenAccessLink | https://doaj.org/article/6393df99477848e7b2eefb782caf8f32 |
| PageCount | 1 |
| PublicationDate | 2023-01-01 |
| PublicationPlace | Piscataway |
| PublicationTitle | IEEE Access |
| PublicationTitleAbbrev | Access |
| PublicationYear | 2023 |
| Publisher | IEEE; The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
| StartPage | 1 |
| SubjectTerms | Algorithms; Classification algorithms; Computer crime; Computer security; Cyber threat intelligence; Cyber threats; Cybercrime; Cybersecurity; Datasets; Feature extraction; Graph representations; Graphical representations; Internet; Phishing; Phishing kits; Social engineering; Social engineering (security); Uniform resource locators; Websites |
| Title | PhiKitA: Phishing Kit Attacks dataset for Phishing Websites Identification |
| URI | https://ieeexplore.ieee.org/document/10103863; https://www.proquest.com/docview/2808835780; https://doaj.org/article/6393df99477848e7b2eefb782caf8f32 |
| Volume | 11 |
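
The abstract reports an MD5 hash representation for tracing phishing websites back to the phishing kits they were built from. The sketch below illustrates that general idea under stated assumptions and is not the authors' implementation: it assumes each website and each kit is available as a collection of raw file contents, and the scoring rule (the share of a site's distinct file hashes that also occur in a kit) as well as the names `md5_set` and `identify_kit` are hypothetical.

```python
import hashlib
from typing import Dict, Iterable, Set


def md5_set(files: Iterable[bytes]) -> Set[str]:
    """Return the set of MD5 hex digests of the given file contents."""
    return {hashlib.md5(content).hexdigest() for content in files}


def identify_kit(site_files: Iterable[bytes],
                 kits: Dict[str, Iterable[bytes]]) -> str:
    """Return the name of the kit whose file hashes overlap most with the site.

    Hypothetical scoring rule: the fraction of the site's distinct file
    hashes that also appear in the kit. Returns "" if no kits are given;
    on ties, the first kit encountered wins.
    """
    site_hashes = md5_set(site_files)
    if not site_hashes:
        raise ValueError("site has no files to hash")
    best_name, best_score = "", -1.0
    for name, kit_files in kits.items():
        kit_hashes = md5_set(kit_files)
        score = len(site_hashes & kit_hashes) / len(site_hashes)
        if score > best_score:
            best_name, best_score = name, score
    return best_name


if __name__ == "__main__":
    # Toy data: two hypothetical kits and one site deployed from kit_b
    # plus an extra file that is not part of either kit.
    kits = {
        "kit_a": [b"<html>login A</html>", b"body{color:red}"],
        "kit_b": [b"<html>login B</html>", b"body{color:blue}"],
    }
    site = [b"<html>login B</html>", b"body{color:blue}", b"tracking.js"]
    print(identify_kit(site, kits))  # prints "kit_b"
```

Because MD5 only matches byte-identical files, a single edited line (for example, a changed brand name or exfiltration address) produces a completely different digest; this is one plausible reason why an exact-hash representation separates websites from their kit sources poorly.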