Code Type Revealing Using Experiments Framework

Detailed Bibliography
Title: Code Type Revealing Using Experiments Framework
Authors: Sharon, Rami; Gudes, Ehud
Contributors: Open University of Israel, Ben-Gurion University of the Negev (BGU), Nora Cuppens-Boulahia, Frédéric Cuppens, Joaquin Garcia-Alfaro, TC 11, WG 11.3
Source: Lecture Notes in Computer Science; 26th Conference on Data and Applications Security and Privacy (DBSec), Jul 2012, Paris, France, pp. 193-206; https://inria.hal.science/hal-01534762; ⟨10.1007/978-3-642-31540-4_15⟩
Publisher information: CCSD; Springer
Publication year: 2012
Subjects: File Type, Content type revealing framework, Code type, Byte N-Gram statistical analysis, [INFO] Computer Science [cs]
Geographic subject: Paris, France
Description: Part 6: Data Management; International audience. Identifying the type of code, whether in a file or a byte stream, is a challenge many software companies face. Many applications, security-related and otherwise, base their behavior on the type of code they receive as input. Today's traditional identification methods rely on file extensions, magic numbers, proprietary headers and trailers, or type-specific identification rules. All of these are vulnerable to content tampering, and detecting such tampering requires long, tedious hours of work by professionals. This study aims to find a method for identifying the best settings to automatically create type signatures that effectively overcome the content manipulation problem. In this paper we lay out a framework for creating type signatures based on byte N-Grams. The framework allows setting various parameters, such as N-Gram sizes and windows, selecting statistical tests, and defining rules for score calculation. It serves as a test lab for finding the parameters that satisfy a predefined threshold of type identification accuracy. We demonstrate the framework using basic settings that achieved an F-Measure of 0.996 on 1,400 test files.
Document type: conference object
Language: English
DOI: 10.1007/978-3-642-31540-4_15
Availability: https://inria.hal.science/hal-01534762
https://inria.hal.science/hal-01534762v1/document
https://inria.hal.science/hal-01534762v1/file/978-3-642-31540-4_15_Chapter.pdf
https://doi.org/10.1007/978-3-642-31540-4_15
Rights: http://creativecommons.org/licenses/by/ ; info:eu-repo/semantics/OpenAccess
Accession number: edsbas.D36027C5
Database: BASE
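The description outlines type signatures built from byte N-Grams and evaluated by F-Measure, but this record does not give the paper's N-Gram sizes, windows, statistical tests, or scoring rules. The Python sketch that follows is only a rough illustration under stated assumptions. The function names (`ngram_profile`, `build_signature`, `score`, `classify`, `f_measure`) are hypothetical. The default N-Gram size of 2 and the sliding window with a step of 1 are arbitrary choices. Histogram intersection is swapped in as the scoring rule in place of the paper's statistical tests. Only the F-Measure formula, F1 = 2PR / (P + R), is standard.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Mapping

Profile = Dict[bytes, float]


def ngram_profile(data: bytes, n: int = 2) -> Profile:
    """Relative frequency of every byte n-gram (assumed: sliding window, step 1)."""
    if n < 1 or len(data) < n:
        return {}
    counts = Counter(data[i:i + n] for i in range(len(data) - n + 1))
    total = sum(counts.values())
    return {gram: c / total for gram, c in counts.items()}


def build_signature(samples: Iterable[bytes], n: int = 2) -> Profile:
    """Hypothetical type signature: the mean n-gram profile of training samples of one type."""
    acc: Dict[bytes, float] = defaultdict(float)
    count = 0
    for sample in samples:
        for gram, freq in ngram_profile(sample, n).items():
            acc[gram] += freq
        count += 1
    return {gram: v / count for gram, v in acc.items()} if count else {}


def score(profile: Profile, signature: Profile) -> float:
    """Histogram intersection in [0, 1]; a stand-in, NOT the paper's statistical tests."""
    return sum(min(freq, signature.get(gram, 0.0)) for gram, freq in profile.items())


def classify(data: bytes, signatures: Mapping[str, Profile], n: int = 2) -> str:
    """Return the type whose signature best matches the input's n-gram profile."""
    profile = ngram_profile(data, n)
    return max(signatures, key=lambda t: score(profile, signatures[t]))


def f_measure(tp: int, fp: int, fn: int) -> float:
    """F1 = 2PR / (P + R), the harmonic mean of precision P and recall R."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy training data; real signatures would come from many files per type.
    sigs = {
        "text": build_signature([b"hello world", b"type signatures from text"]),
        "zeros": build_signature([bytes(32), bytes(64)]),
    }
    print(classify(b"hello there world", sigs))  # text
    print(round(f_measure(tp=8, fp=2, fn=2), 3))  # 0.8
```

Because content is profiled rather than trusted from an extension or header, a renamed file would still be scored on its bytes. In the framework described above, the N-Gram size, window, and scoring rule would be the tunable parameters explored against an accuracy threshold.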