edu
| Title: | edu |
|---|---|
| Authors: | Siddharth Gopal, Yiming Yang, Konstantin Salomatin, Jaime Carbonell |
| Other contributors: | The Pennsylvania State University CiteSeerX Archives |
| Source: | http://www.cs.cmu.edu/~sgopal1/papers/ICMLA-draft.pdf. |
| Collection: | CiteSeerX |
| Subject terms: | General Terms: Algorithms, Experimentation, Performance. Keywords: Digital Forensics, File-type Identification, Classification, Scalability, Comparative Evaluation |
| Description: | File-type identification (FTI) is an important problem in digital forensics, intrusion detection, and related fields. Applying state-of-the-art classification techniques to FTI has begun to receive research attention; however, general conclusions have not been reached because thorough comparative evaluations are lacking. This paper presents a systematic investigation of the problem, algorithmic solutions, and an evaluation methodology. Our focus is on comparing the performance of statistical classifiers (e.g., SVM and kNN) with knowledge-based approaches, especially the COTS (Commercial Off-The-Shelf) solutions that currently dominate FTI applications. We analyze the robustness of different methods in handling damaged files and file segments. We propose two alternative criteria for measuring performance: 1) treating file-name extensions as the true labels, and 2) treating the predictions of knowledge-based approaches on intact files as the true labels; because these approaches rely on signature bytes, the signature bytes are removed before each method is tested. In our experiments with simulated file damage, SVM and kNN substantially outperform all the COTS solutions we tested; some COTS methods cannot identify damaged files at all. Our experiments also show that SVM and kNN scale to large applications after adequate feature selection. |
| Publication type: | text |
| File description: | application/pdf |
| Language: | English |
| Relation: | http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.464.4469 |
| Availability: | http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.464.4469 http://www.cs.cmu.edu/~sgopal1/papers/ICMLA-draft.pdf |
| Rights: | Metadata may be used without restrictions as long as the oai identifier remains attached to it. |
| Document code: | edsbas.74A2B199 |
| Database: | BASE |
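
The evaluation idea in the description (classify files from their content after the signature bytes have been removed) can be illustrated with a small sketch. It is not the authors' implementation: the 1-gram byte histogram, the cosine-similarity kNN, the 8-byte signature length, the two toy "file types", and all function names below are assumptions made for illustration only.

```python
import math
import random
from collections import Counter

def byte_histogram(data):
    """Normalized 256-bin byte-frequency vector (a simple 1-gram feature)."""
    counts = Counter(data)
    total = len(data) or 1
    return [counts.get(b, 0) / total for b in range(256)]

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def knn_predict(train, query, k=3):
    """Majority vote among the k training vectors most similar to the query."""
    q = byte_histogram(query)
    ranked = sorted(train, key=lambda item: cosine(item[0], q), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

def strip_signature(data, n=8):
    """Simulate damage: drop the first n bytes (assumed signature length)."""
    return data[n:]

def make_toy_file(kind, rng, size=512):
    """Synthetic stand-ins for two file types, each with a made-up 8-byte signature."""
    if kind == "text":
        body = bytes(rng.choice(b"abcdefghijklmnopqrstuvwxyz \n") for _ in range(size))
        return b"%TXT-1.0" + body
    body = bytes(rng.randrange(256) for _ in range(size))
    return b"\x89BIN\r\n\x1a\n" + body

rng = random.Random(0)
# Train on intact toy files.
train = [(byte_histogram(make_toy_file(kind, rng)), kind)
         for kind in ("text", "binary") for _ in range(10)]

# Test on damaged files: a signature-based matcher would have nothing to match,
# while the content-based classifier still has the byte statistics.
damaged_text = strip_signature(make_toy_file("text", rng))
damaged_binary = strip_signature(make_toy_file("binary", rng))
print(knn_predict(train, damaged_text), knn_predict(train, damaged_binary))
```

A real evaluation would use actual files, richer n-gram features, and feature selection, as the description indicates; the sketch only shows why content statistics can survive the loss of signature bytes.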