Statistical learning for file-type identification
| Title: | Statistical learning for file-type identification |
|---|---|
| Authors: | Siddharth Gopal, Yiming Yang, Konstantin Salomatin, Jaime Carbonell |
| Other contributors: | The Pennsylvania State University CiteSeerX Archives |
| Source: | http://www.cs.cmu.edu/%7Esgopal1/papers/ICMLA12.pdf |
| Collection: | CiteSeerX |
| Subject terms: | File-type Identification, Classification, Comparative Evaluation |
| Description: | File-type Identification (FTI) is an important problem in digital forensics, intrusion detection, and other related fields. Using state-of-the-art classification techniques to solve FTI problems has begun to receive research attention; however, general conclusions have not been reached due to the lack of thorough evaluations for method comparison. This paper presents a systematic investigation of the problem, algorithmic solutions, and an evaluation methodology. Our focus is on comparing the performance of statistical classifiers (e.g. SVM and kNN) with knowledge-based approaches, especially the COTS (Commercial Off-The-Shelf) solutions that currently dominate FTI applications. We analyze the robustness of different methods in handling damaged files and file segments. We propose two alternative criteria for measuring performance: 1) treating filename extensions as the true labels, and 2) treating the predictions of knowledge-based approaches on intact files as the true labels; because these approaches rely on signature bytes, the signature bytes are removed before each method is tested. In our experiments with simulated file damage, SVM and kNN substantially outperform all the COTS solutions we tested; some COTS methods cannot identify damaged files at all. |
| Document type: | text |
| File description: | application/pdf |
| Language: | English |
| Relation: | http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.303.3596; http://www.cs.cmu.edu/%7Esgopal1/papers/ICMLA12.pdf |
| Availability: | http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.303.3596; http://www.cs.cmu.edu/%7Esgopal1/papers/ICMLA12.pdf |
| Rights: | Metadata may be used without restrictions as long as the OAI identifier remains attached to it. |
| Accession number: | edsbas.8B80EED2 |
| Database: | BASE |