Feature mining and classifier selection for API calls-based malware detection.

Bibliographic Details
Title: Feature mining and classifier selection for API calls-based malware detection.
Authors: Balan, Gheorghe; Simion, Ciprian-Alin; Gavriluţ, Dragoş Teodor; Luchian, Henri
Source: Applied Intelligence; Dec 2023, Vol. 53, Issue 23, p. 29094-29108, 15 p.
Subject Terms: MACHINE learning, MALWARE, DATABASES, FEATURE selection, APPLICATION program interfaces, MACHINE performance, DECISION trees
Abstract: This paper deals with a major challenge in cyber-security: the need to respond to the constantly renewed techniques that attackers use to evade detection based on analysing static features of malware. These techniques consist of various changes in file geometry, entropy, and so on. As a consequence, static malware feature sets describe malicious files less and less accurately; hence, the performance of machine learning models in detecting new variants of the same malware family may be severely impaired. The paper focuses on a promising approach to this detection challenge: defining file features based on sequences of OS (operating system) API (Application Program Interface) calls. We explore in detail the detection potential of such features, since the API calls a file needs in order to act maliciously are highly unlikely to be hidden. We studied several tens of thousands of such features, a modest-sized subset of which was subsequently fed to several machine learning models. The database used for training and testing consists of 1.5 million files, including malicious files from the polymorphic families Emotet and Trickbot. Using this database, nearly 4,000 pairings (classifier, feature selection algorithm) were trained and tested. Our experimental results show that the API calls-oriented feature mining process is well suited for detecting polymorphic malware. A comparative discussion of the detection results of the various models is presented; depending on the target optimisation criterion (detection rate / false positive rate / saving resources), three of the nearly 4,000 classification models turn out to be best suited for real-world applications: Random Forest, Legacy Neural Networks and Decision Tree. [ABSTRACT FROM AUTHOR]
Copyright of Applied Intelligence is the property of Springer Nature and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
Database: Complementary Index
FullText Text:
  Availability: 0
CustomLinks:
  – Url: https://resolver.ebscohost.com/openurl?sid=EBSCO:edb&genre=article&issn=0924669X&ISBN=&volume=53&issue=23&date=20231201&spage=29094&pages=29094-29108&title=Applied Intelligence&atitle=Feature%20mining%20and%20classifier%20selection%20for%20API%20calls-based%20malware%20detection.&aulast=Balan%2C%20Gheorghe&id=DOI:10.1007/s10489-023-05086-2
    Name: Full Text Finder
    Category: fullText
    Text: Full Text Finder
    Icon: https://imageserver.ebscohost.com/branding/images/FTF.gif
    MouseOverText: Full Text Finder
  – Url: https://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=EBSCO&SrcAuth=EBSCO&DestApp=WOS&ServiceName=TransferToWoS&DestLinkType=GeneralSearchSummary&Func=Links&author=Balan%20G
    Name: ISI
    Category: fullText
    Text: Find this article in Web of Science
    Icon: https://imagesrvr.epnet.com/ls/20docs.gif
    MouseOverText: Find this article in Web of Science
Header DbId: edb
DbLabel: Complementary Index
An: 173923740
RelevancyScore: 965
AccessLevel: 6
PubType: Academic Journal
PubTypeId: academicJournal
PreciseRelevancyScore: 965.43994140625
IllustrationInfo
Items – Name: Title
  Label: Title
  Group: Ti
  Data: Feature mining and classifier selection for API calls-based malware detection.
– Name: Author
  Label: Authors
  Group: Au
  Data: Balan, Gheorghe; Simion, Ciprian-Alin; Gavriluţ, Dragoş Teodor; Luchian, Henri
– Name: TitleSource
  Label: Source
  Group: Src
  Data: Applied Intelligence; Dec2023, Vol. 53 Issue 23, p29094-29108, 15p
– Name: Subject
  Label: Subject Terms
  Group: Su
  Data: MACHINE learning; MALWARE; DATABASES; FEATURE selection; APPLICATION program interfaces; MACHINE performance; DECISION trees
PLink https://erproxy.cvtisr.sk/sfx/access?url=https://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edb&AN=173923740
RecordInfo BibRecord:
  BibEntity:
    Identifiers:
      – Type: doi
        Value: 10.1007/s10489-023-05086-2
    Languages:
      – Code: eng
        Text: English
    PhysicalDescription:
      Pagination:
        PageCount: 15
        StartPage: 29094
    Subjects:
      – SubjectFull: MACHINE learning
        Type: general
      – SubjectFull: MALWARE
        Type: general
      – SubjectFull: DATABASES
        Type: general
      – SubjectFull: FEATURE selection
        Type: general
      – SubjectFull: APPLICATION program interfaces
        Type: general
      – SubjectFull: MACHINE performance
        Type: general
      – SubjectFull: DECISION trees
        Type: general
    Titles:
      – TitleFull: Feature mining and classifier selection for API calls-based malware detection.
        Type: main
  BibRelationships:
    HasContributorRelationships:
      – PersonEntity:
          Name:
            NameFull: Balan, Gheorghe
      – PersonEntity:
          Name:
            NameFull: Simion, Ciprian-Alin
      – PersonEntity:
          Name:
            NameFull: Gavriluţ, Dragoş Teodor
      – PersonEntity:
          Name:
            NameFull: Luchian, Henri
    IsPartOfRelationships:
      – BibEntity:
          Dates:
            – D: 01
              M: 12
              Text: Dec2023
              Type: published
              Y: 2023
          Identifiers:
            – Type: issn-print
              Value: 0924669X
          Numbering:
            – Type: volume
              Value: 53
            – Type: issue
              Value: 23
          Titles:
            – TitleFull: Applied Intelligence
              Type: main
ResultId 1