Feature Extraction for Payload Classification: A Byte Pair Encoding Algorithm
| Published in: | 2022 IEEE 8th International Conference on Computer and Communications (ICCC) pp. 1 - 5 |
|---|---|
| Main Authors: | Xu, Tianci; Zhou, Peng |
| Format: | Conference Proceeding |
| Language: | English |
| Published: | IEEE, 09.12.2022 |
| Abstract | Payload classification is a form of deep packet inspection that has proven effective for many Internet applications, including intrusion detection and network diagnostics. In typical payload classification, feature extraction is the first and a very important step, with a large impact on the quality and quantity of classification results. At present, most payload feature extraction adopts the n-gram model. However, the n-gram model generates features of a fixed length (n), which may cause information loss during feature extraction. In this paper, we propose a different approach: a Byte Pair Encoding (BPE) algorithm for payload feature extraction. The algorithm introduces the concept of sub-words to express payload features, so the feature length is no longer fixed. With BPE, we first initialize a vocabulary of single bytes and then repeatedly update it by merging the most frequent byte pairs in the payload to form new sub-words, until all sub-word pairs reach (approximately) the same frequency, regardless of the lengths of the sub-words. The result is a flexible and scalable vocabulary for feature extraction and payload embedding. Finally, we conduct payload classification experiments on the CIC-IDS2017 dataset to verify the effectiveness of our algorithm. The results confirm that our BPE algorithm achieves better classification performance than traditional n-gram methods. |
|---|---|
| Author | Xu, Tianci; Zhou, Peng |
| Author_xml | 1. Xu, Tianci (tianei_xu@shu.edu.cn), School of Mechatronical Engineering and Automation, Shanghai University, Shanghai, China; 2. Zhou, Peng (pzhou@shu.edu.cn), School of Mechatronical Engineering and Automation, Shanghai University, Shanghai, China |
| DOI | 10.1109/ICCC56324.2022.10065977 |
| EISBN | 1665450517 9781665450515 |
| EndPage | 5 |
| ExternalDocumentID | 10065977 |
| Genre | orig-research |
| GrantInformation_xml | National Natural Science Foundation of China, grant 61972452 (funder ID 10.13039/501100001809) |
| IsPeerReviewed | false |
| IsScholarly | false |
| PageCount | 5 |
| PublicationDate | 2022-Dec.-9 |
| PublicationDateYYYYMMDD | 2022-12-09 |
| PublicationTitle | 2022 IEEE 8th International Conference on Computer and Communications (ICCC) |
| PublicationTitleAbbrev | ICCC |
| PublicationYear | 2022 |
| Publisher | IEEE |
| StartPage | 1 |
| SubjectTerms | Byte Pair Encoding (BPE); Classification algorithms; Encoding; Feature extraction; Inspection; Intrusion detection; Merging; payload classification; sub-word model; Vocabulary; word embedding; word segmentation |
| Title | Feature Extraction for Payload Classification: A Byte Pair Encoding Algorithm |
| URI | https://ieeexplore.ieee.org/document/10065977 |
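
The merge procedure described in the abstract (start from a single-byte vocabulary, then repeatedly merge the most frequent adjacent pair into a new sub-word) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the names `learn_bpe`, `merge_pair`, `tokenize`, `num_merges`, and `min_freq` are hypothetical, and the stopping rule (a merge budget plus a minimum pair frequency) is an assumed stand-in for the abstract's criterion that all sub-word pairs reach approximately the same frequency.

```python
from collections import Counter


def merge_pair(seq, left, right):
    """Replace each adjacent (left, right) occurrence in seq with left + right."""
    out = []
    i = 0
    while i < len(seq):
        if i + 1 < len(seq) and seq[i] == left and seq[i + 1] == right:
            out.append(left + right)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def learn_bpe(payloads, num_merges, min_freq=2):
    """Learn an ordered list of merges and a sub-word vocabulary from bytes payloads.

    Assumption: stop after num_merges merges, or earlier once the most
    frequent pair occurs fewer than min_freq times.
    """
    # Every payload starts as a sequence of single-byte sub-words.
    seqs = [[bytes([b]) for b in p] for p in payloads]
    vocab = {tok for seq in seqs for tok in seq}
    merges = []
    for _ in range(num_merges):
        # Count adjacent sub-word pairs across all payloads.
        pairs = Counter()
        for seq in seqs:
            pairs.update(zip(seq, seq[1:]))
        if not pairs:
            break
        # Pick the most frequent pair; ties are broken by the pair's byte order.
        (left, right), freq = max(pairs.items(), key=lambda kv: (kv[1], kv[0]))
        if freq < min_freq:
            break
        merges.append((left, right))
        vocab.add(left + right)
        seqs = [merge_pair(seq, left, right) for seq in seqs]
    return merges, vocab


def tokenize(payload, merges):
    """Split a payload into sub-words by replaying the learned merges in order."""
    seq = [bytes([b]) for b in payload]
    for left, right in merges:
        seq = merge_pair(seq, left, right)
    return seq


if __name__ == "__main__":
    merges, vocab = learn_bpe([b"GET /a", b"GET /b", b"GET /c"], num_merges=10)
    print(tokenize(b"GET /z", merges))
```

In this toy run the shared prefix `GET /` ends up as a single variable-length sub-word, whereas a fixed-length n-gram would cut it at arbitrary byte boundaries. The resulting token sequences could then be mapped to indices and fed to an embedding layer or a bag-of-sub-words classifier; that step is outside this sketch.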