Fast Content-Based File Type Identification

Detailed bibliography
Title: Fast Content-Based File Type Identification
Authors: Ahmed, Irfan; Lhee, Kyung-Suk; Shin, Hyun-Jung; Hong, Man-Pyo
Contributors: Information Security Institute, Queensland University of Technology Brisbane (QUT), Ajou University, Gilbert Peterson, Sujeet Shenoi, TC 11, WG 11.9
Source: IFIP Advances in Information and Communication Technology ; 7th Digital Forensics (DF) ; https://inria.hal.science/hal-01569553 ; 7th Digital Forensics (DF), Jan 2011, Orlando, FL, United States. pp.65-75, ⟨10.1007/978-3-642-24212-0_5⟩
Publisher information: CCSD
Springer
Publication year: 2011
Subjects: File type identification, file content classification, byte frequency, [INFO]Computer Science [cs]
Geographic subject: Orlando, FL, United States
Description: Part 2: FORENSIC TECHNIQUES ; International audience ; Digital forensic examiners often need to identify the type of a file or file fragment based on the content of the file. Content-based file type identification schemes typically use a byte frequency distribution with statistical machine learning to classify file types. Most algorithms analyze the entire file content to obtain the byte frequency distribution, a technique that is inefficient and time-consuming. This paper proposes two techniques for reducing the classification time. The first technique selects a subset of features based on the frequency of occurrence. The second speeds up classification by randomly sampling file blocks. Experimental results demonstrate that up to a fifteen-fold reduction in computational time can be achieved with limited impact on accuracy.
Document type: conference object
Language: English
DOI: 10.1007/978-3-642-24212-0_5
Availability: https://inria.hal.science/hal-01569553
https://inria.hal.science/hal-01569553v1/document
https://inria.hal.science/hal-01569553v1/file/978-3-642-24212-0_5_Chapter.pdf
https://doi.org/10.1007/978-3-642-24212-0_5
Rights: http://creativecommons.org/licenses/by/ ; info:eu-repo/semantics/OpenAccess
Accession number: edsbas.54718D05
Database: BASE
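
Illustrative sketch: the description in this record names two ideas, building a byte frequency distribution from randomly sampled file blocks and keeping only a subset of byte values as features based on how often they occur. A minimal Python sketch of those two ideas might look like the following. It is not the authors' implementation: the function names, the 512-byte block size, the number of sampled blocks, the fixed seed, and the ranking by summed frequency are all assumptions made for illustration, and the classifier itself is omitted.

```python
import random
from collections import Counter


def sampled_byte_frequencies(data: bytes, block_size: int = 512,
                             num_blocks: int = 8, seed: int = 0) -> list[float]:
    """Return a normalized 256-bin byte histogram built from randomly chosen blocks.

    block_size, num_blocks and seed are hypothetical parameters, not values
    taken from the paper.
    """
    # Split the content into fixed-size blocks (the last one may be shorter).
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    if not blocks:
        return [0.0] * 256

    # Sample blocks without replacement instead of reading the whole file.
    rng = random.Random(seed)
    chosen = rng.sample(blocks, min(num_blocks, len(blocks)))

    counts = Counter()
    for block in chosen:
        counts.update(block)  # iterating over bytes yields integers 0..255

    total = sum(counts.values())
    return [counts.get(b, 0) / total for b in range(256)]


def top_k_features(histograms: list[list[float]], k: int) -> list[int]:
    """Return the k byte values with the highest summed frequency.

    Ranking by summed frequency is an assumed selection rule used here
    only to illustrate frequency-based feature selection.
    """
    totals = [sum(h[b] for h in histograms) for b in range(256)]
    return sorted(range(256), key=lambda b: totals[b], reverse=True)[:k]
```

In use, one would compute `sampled_byte_frequencies` for each training file, call `top_k_features` on the resulting histograms, and pass only the selected bins to whatever classifier is being evaluated.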