PhiKitA: Phishing Kit Attacks dataset for Phishing Websites Identification

Recent studies have shown that phishers use phishing kits to deploy attacks faster, more easily, and at larger scale. Detecting phishing kits in deployed websites might help to detect phishing campaigns earlier. To the best of our knowledge, there are no datasets providing a set of phishing ki...

Bibliographic details
Published in: IEEE Access, Vol. 11, p. 1
Main authors: Castano, Felipe; Fernandez, Eduardo Fidalgo; Alaiz-Rodriguez, Rocio; Alegre, Enrique
Format: Journal Article
Language: English
Published: Piscataway, IEEE, 01.01.2023
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects:
ISSN: 2169-3536
Online access: Full text
Abstract: Recent studies have shown that phishers use phishing kits to deploy attacks faster, more easily, and at larger scale. Detecting phishing kits in deployed websites might help to detect phishing campaigns earlier. To the best of our knowledge, there are no datasets providing a set of phishing kits that were used to build deployed phishing websites. In this work, we propose PhiKitA, a novel dataset that contains phishing kits and the phishing websites generated using these kits. We applied MD5 hashes, fingerprints, and graph-representation DOM algorithms to obtain baseline results on PhiKitA in three experiments: familiarity analysis of phishing kit samples, phishing website detection, and identification of the source of a phishing website. In the familiarity analysis, we find evidence of different types of phishing kits and of a small phishing campaign. In the binary classification problem for phishing detection, the graph representation algorithm achieved an accuracy of 92.50%, showing that the phishing kit data contain useful information for classifying phishing. Finally, the MD5 hash representation obtained an F1 score of 39.54%, which means that this algorithm does not extract enough information to properly distinguish phishing websites and their phishing kit sources.
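
As an illustration of the kind of MD5-based comparison the abstract mentions, the following is a minimal sketch and not the authors' implementation. It assumes a kit and a website are each given as a mapping from file name to file contents, and it assumes Jaccard overlap of the MD5 digests as the similarity measure. The names md5_set, jaccard, and kit_similarity, and the toy data, are hypothetical.

```python
import hashlib


def md5_set(files):
    """Return the set of MD5 hex digests for a mapping of file name -> bytes."""
    return {hashlib.md5(content).hexdigest() for content in files.values()}


def jaccard(a, b):
    """Jaccard similarity of two sets; taken to be 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def kit_similarity(kit_files, site_files):
    """Share of distinct file hashes that a kit and a website have in common.

    Assumption: exact-match hashing, so a one-byte change in a file
    makes it count as a different file.
    """
    return jaccard(md5_set(kit_files), md5_set(site_files))


# Toy, in-memory data (hypothetical): the kit and the site share one file.
kit = {
    "index.php": b"<?php echo 'login'; ?>",
    "post.php": b"<?php mail(); ?>",
    "style.css": b"body{}",
}
site = {
    "index.html": b"<html>login</html>",
    "style.css": b"body{}",
}

score = kit_similarity(kit, site)
print(f"similarity: {score:.2f}")
```

Because exact hashes change whenever a file is modified, even slightly, a comparison of this kind would miss kits that were customized before deployment. That limitation is one plausible reading of the low F1 score reported for the MD5 representation.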
Author Alaiz-Rodriguez, Rocio
Alegre, Enrique
Castano, Felipe
Fernandez, Eduardo Fidalgo
Author_xml – sequence: 1
  givenname: Felipe
  orcidid: 0000-0001-9157-4111
  surname: Castano
  fullname: Castano, Felipe
  organization: Department of Electrical, Systems and Automation Engineering, Universidad de León, León, Spain
– sequence: 2
  givenname: Eduardo Fidalgo
  orcidid: 0000-0003-1202-5232
  surname: Fernandez
  fullname: Fernandez, Eduardo Fidalgo
  organization: Department of Electrical, Systems and Automation Engineering, Universidad de León, León, Spain
– sequence: 3
  givenname: Rocio
  orcidid: 0000-0003-4164-5887
  surname: Alaiz-Rodriguez
  fullname: Alaiz-Rodriguez, Rocio
  organization: Department of Electrical, Systems and Automation Engineering, Universidad de León, León, Spain
– sequence: 4
  givenname: Enrique
  orcidid: 0000-0003-2081-774X
  surname: Alegre
  fullname: Alegre, Enrique
  organization: Department of Electrical, Systems and Automation Engineering, Universidad de León, León, Spain
CODEN IAECCG
CitedBy_id crossref_primary_10_1007_s10462_024_11055_z
crossref_primary_10_1109_ACCESS_2023_3293063
crossref_primary_10_1016_j_ipm_2024_103928
crossref_primary_10_1109_ACCESS_2025_3609252
crossref_primary_10_1007_s10660_025_10029_9
crossref_primary_10_26634_jdf_2_1_20840
crossref_primary_10_1109_ACCESS_2025_3525998
ContentType Journal Article
Copyright Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2023
Copyright_xml – notice: Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2023
DOI 10.1109/ACCESS.2023.3268027
Discipline Engineering
EISSN 2169-3536
EndPage 1
ExternalDocumentID oai_doaj_org_article_6393df99477848e7b2eefb782caf8f32
10_1109_ACCESS_2023_3268027
10103863
Genre orig-research
GrantInformation_xml – fundername: Instituto Nacional de Ciberseguridad
  funderid: 10.13039/501100013410
ISICitedReferencesCount 10
ISSN 2169-3536
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Language English
License https://creativecommons.org/licenses/by-nc-nd/4.0
LinkModel DirectLink
ORCID 0000-0001-9157-4111
0000-0003-1202-5232
0000-0003-4164-5887
0000-0003-2081-774X
OpenAccessLink https://doaj.org/article/6393df99477848e7b2eefb782caf8f32
PQID 2808835780
PQPubID 4845423
PageCount 1
PublicationCentury 2000
PublicationDate 2023-01-01
PublicationDateYYYYMMDD 2023-01-01
PublicationDate_xml – month: 01
  year: 2023
  text: 2023-01-01
  day: 01
PublicationDecade 2020
PublicationPlace Piscataway
PublicationPlace_xml – name: Piscataway
PublicationTitle IEEE access
PublicationTitleAbbrev Access
PublicationYear 2023
Publisher IEEE
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Publisher_xml – name: IEEE
– name: The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
StartPage 1
SubjectTerms Algorithms
Classification algorithms
Computer crime
Computer security
Cyber threat intelligence
Cyber threats
Cybercrime
Cybersecurity
Datasets
Feature extraction
Graph representations
Graphical representations
Internet
Phishing
Phishing kits
Social engineering
Social engineering (security)
Uniform resource locators
Websites
Title PhiKitA: Phishing Kit Attacks dataset for Phishing Websites Identification
URI https://ieeexplore.ieee.org/document/10103863
https://www.proquest.com/docview/2808835780
https://doaj.org/article/6393df99477848e7b2eefb782caf8f32
Volume 11
openUrl ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=PhiKitA%3A+Phishing+Kit+Attacks+Dataset+for+Phishing+Websites+Identification&rft.jtitle=IEEE+access&rft.au=Castano%2C+Felipe&rft.au=Eduardo+Fidalgo+Fernandez&rft.au=Alaiz-Rodriguez%2C+Rocio&rft.au=Alegre%2C+Enrique&rft.date=2023-01-01&rft.pub=The+Institute+of+Electrical+and+Electronics+Engineers%2C+Inc.+%28IEEE%29&rft.eissn=2169-3536&rft.volume=11&rft.spage=40779&rft_id=info:doi/10.1109%2FACCESS.2023.3268027&rft.externalDBID=NO_FULL_TEXT