Classification of Low- and High-Entropy File Fragments Using Randomness Measures and Discrete Fourier Transform Coefficients.
| Title: | Classification of Low- and High-Entropy File Fragments Using Randomness Measures and Discrete Fourier Transform Coefficients. |
|---|---|
| Authors: | Skračić, Kristian, Petrović, Juraj, Pale, Predrag |
| Source: | Vietnam Journal of Computer Science (World Scientific); Nov2023, Vol. 10 Issue 4, p433-462, 30p |
| Subject Terms: | Discrete Fourier transforms, Support vector machines, Artificial neural networks, Machine learning, Coefficients (Statistics) |
| Abstract: | This paper presents an approach to improve file fragment classification by proposing new features for classification and evaluating them on a dataset that includes both low- and high-entropy file fragments. High-entropy fragments, belonging to compressed and encrypted files, are particularly challenging to classify because they lack exploitable patterns. To address this challenge, the proposed feature vectors are constructed from the byte frequency distribution (BFD) of file fragments, along with discrete Fourier transform coefficients and several randomness measures. These feature vectors are tested using three machine learning models: support vector machines (SVMs), artificial neural networks (ANNs), and random forests (RFs). The proposed approach is evaluated on the govdocs1 dataset, which is freely available and widely used in this field, to enable reproducibility and fair comparison with other published research. The results show that the proposed approach outperforms existing methods and achieves better classification accuracy for both low- and high-entropy file fragments. [ABSTRACT FROM AUTHOR] |
| Copyright: | Copyright of Vietnam Journal of Computer Science (World Scientific) is the property of World Scientific Publishing Company and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.) |
| Database: | Complementary Index |
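The abstract describes feature vectors built from a fragment's byte frequency distribution (BFD), discrete Fourier transform coefficients, and several randomness measures. The Python sketch below shows what such a feature extractor might look like. It is not the authors' implementation. The following are all assumptions made for illustration: the particular randomness measures (Shannon entropy, chi-square against a uniform distribution, arithmetic mean, and lag-1 serial correlation, in the spirit of the `ent` utility), computing the DFT over the raw byte values, and keeping the magnitudes of the first 16 coefficients. The function names are hypothetical.

```python
import cmath
import math
import os
from collections import Counter


def byte_frequency_distribution(fragment: bytes) -> list[float]:
    """Normalized BFD: relative frequency of each of the 256 byte values."""
    counts = Counter(fragment)
    n = len(fragment)
    return [counts.get(b, 0) / n for b in range(256)]


def shannon_entropy(bfd: list[float]) -> float:
    """Shannon entropy of the BFD in bits per byte (0.0 to 8.0)."""
    return -sum(p * math.log2(p) for p in bfd if p > 0)


def chi_square_uniform(fragment: bytes) -> float:
    """Chi-square statistic of the byte counts against a uniform distribution."""
    expected = len(fragment) / 256
    counts = Counter(fragment)
    return sum((counts.get(b, 0) - expected) ** 2 / expected for b in range(256))


def serial_correlation(fragment: bytes) -> float:
    """Lag-1 correlation between consecutive bytes, wrapping the last byte to the first."""
    n = len(fragment)
    xs = list(fragment)
    s = sum(xs)
    s2 = sum(x * x for x in xs)
    cross = sum(xs[i] * xs[(i + 1) % n] for i in range(n))
    denom = n * s2 - s * s
    if denom == 0:
        return 0.0  # constant fragment: correlation is undefined; 0.0 is an arbitrary placeholder
    return (n * cross - s * s) / denom


def dft_magnitudes(fragment: bytes, k: int = 16) -> list[float]:
    """Magnitudes of the first k DFT coefficients of the byte sequence, divided by length."""
    n = len(fragment)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * m * t / n) for t, x in enumerate(fragment))) / n
        for m in range(k)
    ]


def feature_vector(fragment: bytes, k: int = 16) -> list[float]:
    """Hypothetical feature vector: 256 BFD values + k DFT magnitudes + 4 randomness measures."""
    if not fragment:
        raise ValueError("fragment must be non-empty")
    bfd = byte_frequency_distribution(fragment)
    randomness = [
        shannon_entropy(bfd),
        chi_square_uniform(fragment),
        sum(fragment) / len(fragment),  # arithmetic mean of byte values
        serial_correlation(fragment),
    ]
    return bfd + dft_magnitudes(fragment, k) + randomness


if __name__ == "__main__":
    low = (b"hello world " * 43)[:512]  # repetitive, text-like (low-entropy) fragment
    high = os.urandom(512)              # stand-in for compressed or encrypted data
    for name, frag in (("low", low), ("high", high)):
        bfd = byte_frequency_distribution(frag)
        print(name, len(feature_vector(frag)), round(shannon_entropy(bfd), 3))
```

With the default `k = 16`, each vector has 256 + 16 + 4 = 276 values. Such vectors could then be passed to any of the three model families the abstract names (SVMs, ANNs, RFs); the sketch deliberately stops at feature extraction.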