Classification of Low- and High-Entropy File Fragments Using Randomness Measures and Discrete Fourier Transform Coefficients.

Bibliographic Details
Title: Classification of Low- and High-Entropy File Fragments Using Randomness Measures and Discrete Fourier Transform Coefficients.
Authors: Skračić, Kristian; Petrović, Juraj; Pale, Predrag
Source: Vietnam Journal of Computer Science (World Scientific); Nov 2023, Vol. 10 Issue 4, p433-462, 30p
Subject Terms: DISCRETE Fourier transforms, SUPPORT vector machines, ARTIFICIAL neural networks, MACHINE learning, COEFFICIENTS (Statistics)
Abstract: This paper presents an approach to improving file fragment classification by proposing new classification features and evaluating them on a dataset that includes both low- and high-entropy file fragments. High-entropy fragments, which belong to compressed and encrypted files, are particularly challenging to classify because they lack exploitable patterns. To address this challenge, the proposed feature vectors are constructed from the byte frequency distribution (BFD) of file fragments, along with discrete Fourier transform coefficients and several randomness measures. These feature vectors are tested using three machine learning models: support vector machines (SVMs), artificial neural networks (ANNs), and random forests (RFs). The proposed approach is evaluated on the govdocs1 dataset, which is freely available and widely used in this field, to enable reproducibility and fair comparison with other published research. The results show that the proposed approach outperforms existing methods and achieves better classification accuracy for both low- and high-entropy file fragments. [ABSTRACT FROM AUTHOR]
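Illustrative note: the abstract names three feature families (BFD, discrete Fourier transform coefficients, randomness measures) but does not give their exact definitions. The following is a minimal sketch of how such a feature vector might be assembled, not the authors' implementation. Several choices are assumptions made only for illustration: Shannon entropy stands in for the "randomness measures", the DFT is applied to the raw byte sequence, only the first `k` coefficient magnitudes are kept, and all function names are hypothetical.

```python
import cmath
import math
from collections import Counter


def byte_frequency_distribution(fragment: bytes) -> list[float]:
    """Relative frequency of each of the 256 possible byte values."""
    counts = Counter(fragment)
    n = len(fragment)
    return [counts.get(b, 0) / n for b in range(256)]


def shannon_entropy(bfd: list[float]) -> float:
    """Entropy in bits per byte: 0 for a constant fragment, 8 for a uniform one.

    Assumption: used here as one example of a randomness measure.
    """
    return -sum(p * math.log2(p) for p in bfd if p > 0)


def dft_magnitudes(fragment: bytes, k: int) -> list[float]:
    """Normalized magnitudes of the first k DFT coefficients of the byte sequence.

    Assumption: the transform is taken over the raw bytes; the paper may
    apply it to a different signal.
    """
    n = len(fragment)
    mags = []
    for f in range(k):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * f * t / n)
            for t, x in enumerate(fragment)
        )
        mags.append(abs(coeff) / n)
    return mags


def feature_vector(fragment: bytes, k: int = 8) -> list[float]:
    """Concatenate BFD (256 values), entropy (1 value), and k DFT magnitudes."""
    if not fragment:
        raise ValueError("fragment must not be empty")
    bfd = byte_frequency_distribution(fragment)
    return bfd + [shannon_entropy(bfd)] + dft_magnitudes(fragment, k)


if __name__ == "__main__":
    # A 512-byte fragment in which every byte value occurs exactly twice.
    sample = bytes(range(256)) * 2
    features = feature_vector(sample)
    print(len(features), features[256])
```

A vector built this way could be passed to any of the three model types the abstract mentions (SVM, ANN, RF); the choice of `k` and of additional randomness measures would need to follow the published paper.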
Copyright of Vietnam Journal of Computer Science (World Scientific) is the property of World Scientific Publishing Company and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
Database: Complementary Index