Feature Extraction for Payload Classification: A Byte Pair Encoding Algorithm

Bibliographic details
Published in: 2022 IEEE 8th International Conference on Computer and Communications (ICCC), pp. 1-5
Main authors: Xu, Tianci; Zhou, Peng
Format: Conference paper
Language: English
Published: IEEE, 09.12.2022
Abstract Payload classification is a form of deep packet inspection that has proven effective for many Internet applications, including intrusion detection and network diagnostics. Feature extraction is typically the first step in payload classification, and it strongly affects the quality and quantity of the classification results. At present, most payload feature extraction adopts the n-gram model. However, the n-gram model generates features of a fixed length n, which can cause information loss during feature extraction. In this paper, we propose a Byte Pair Encoding (BPE) algorithm for payload feature extraction. The algorithm expresses payload features as sub-words, so the feature length is no longer fixed. BPE first initializes a vocabulary of single bytes and then repeatedly updates it by merging the most frequent byte pairs in the payload into new sub-words, until all sub-word pairs reach (approximately) the same frequency, regardless of the lengths of the sub-words. The result is a flexible and scalable vocabulary for feature extraction and payload embedding. Finally, we conduct payload classification experiments on the CIC-IDS2017 dataset to verify the effectiveness of the algorithm. The results confirm that the BPE algorithm achieves better classification performance than traditional n-gram methods.
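The merge procedure summarized in the abstract can be illustrated with a minimal byte-level BPE sketch. This is not the authors' implementation: the function names `train_bpe`, `merge_pair`, and `encode` are hypothetical, and the stopping rule used here (a fixed merge budget, or stopping once no pair occurs at least twice) is an assumption standing in for the frequency-equalization criterion the abstract describes.

```python
from collections import Counter


def merge_pair(tokens, left, right):
    """Replace every adjacent (left, right) occurrence with the joined token."""
    out, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and tokens[i] == left and tokens[i + 1] == right:
            out.append(left + right)
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out


def train_bpe(payloads, num_merges):
    """Learn up to `num_merges` merges from an iterable of bytes payloads."""
    # Initial vocabulary: each payload is a sequence of single-byte tokens.
    corpus = [[bytes([b]) for b in p] for p in payloads]
    merges = []
    for _ in range(num_merges):
        # Count adjacent token pairs across the whole corpus.
        pairs = Counter()
        for tokens in corpus:
            pairs.update(zip(tokens, tokens[1:]))
        if not pairs:
            break
        (left, right), freq = pairs.most_common(1)[0]
        if freq < 2:
            break  # assumed stopping rule: no pair repeats any more
        merges.append((left, right))
        corpus = [merge_pair(tokens, left, right) for tokens in corpus]
    return merges


def encode(payload, merges):
    """Tokenize one payload by replaying the learned merges in order."""
    tokens = [bytes([b]) for b in payload]
    for left, right in merges:
        tokens = merge_pair(tokens, left, right)
    return tokens


# Toy payloads, invented for illustration only (not from CIC-IDS2017).
payloads = [b"GET /index.html", b"GET /login", b"GET /img"]
merges = train_bpe(payloads, num_merges=3)
tokens = encode(b"GET /index.html", merges)
```

Unlike an n-gram extractor, which would cut every payload into windows of the same length n, the tokens produced here vary in length: frequent byte runs collapse into longer sub-words while rare bytes stay as single-byte tokens.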
Author Xu, Tianci
Zhou, Peng
Author_xml – sequence: 1
  givenname: Tianci
  surname: Xu
  fullname: Xu, Tianci
  email: tianei_xu@shu.edu.cn
  organization: School of Mechatronical Engineering and Automation, Shanghai University,Shanghai,China
– sequence: 2
  givenname: Peng
  surname: Zhou
  fullname: Zhou, Peng
  email: pzhou@shu.edu.cn
  organization: School of Mechatronical Engineering and Automation, Shanghai University,Shanghai,China
ContentType Conference Proceeding
DBID 6IE
6IL
CBEJK
RIE
RIL
DOI 10.1109/ICCC56324.2022.10065977
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume
IEEE Xplore All Conference Proceedings
IEEE Xplore
IEEE Proceedings Order Plans (POP All) 1998-Present
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
EISBN 1665450517
9781665450515
EndPage 5
ExternalDocumentID 10065977
Genre orig-research
GrantInformation_xml – fundername: National Natural Science Foundation of China
  grantid: 61972452
  funderid: 10.13039/501100001809
GroupedDBID 6IE
6IL
CBEJK
RIE
RIL
ID FETCH-LOGICAL-i119t-c775d35751dee127ab92a2c5ecb8ef49b0eaa4a1ab5b98bd576a04e27b63ae243
IEDL.DBID RIE
IngestDate Thu Jan 18 11:14:59 EST 2024
IsPeerReviewed false
IsScholarly false
Language English
LinkModel DirectLink
MergedId FETCHMERGED-LOGICAL-i119t-c775d35751dee127ab92a2c5ecb8ef49b0eaa4a1ab5b98bd576a04e27b63ae243
PageCount 5
ParticipantIDs ieee_primary_10065977
PublicationCentury 2000
PublicationDate 2022-Dec.-9
PublicationDateYYYYMMDD 2022-12-09
PublicationDate_xml – month: 12
  year: 2022
  text: 2022-Dec.-9
  day: 09
PublicationDecade 2020
PublicationTitle 2022 IEEE 8th International Conference on Computer and Communications (ICCC)
PublicationTitleAbbrev ICCC
PublicationYear 2022
Publisher IEEE
Publisher_xml – name: IEEE
Score 1.8304089
Snippet Payload classification is a kind of deep packet inspection model that has been proved effective for many Internet applications such as, but not limited to,...
SourceID ieee
SourceType Publisher
StartPage 1
SubjectTerms Byte Pair Encoding (BPE)
Classification algorithms
Encoding
Feature extraction
Inspection
Intrusion detection
Merging
payload classification
sub-word model
Vocabulary
word embedding
word segmentation
Title Feature Extraction for Payload Classification: A Byte Pair Encoding Algorithm
URI https://ieeexplore.ieee.org/document/10065977
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
linkProvider IEEE