Obfuscated JavaScript detection using syntactically and lexically enhanced machine learning ; Perdelenmiş JavaScript kodlarının sözdizimsel ve anlamsal yönden iyileştirilmiş makina öğrenmesi ile tespiti

Detailed bibliography
Title: Obfuscated JavaScript detection using syntactically and lexically enhanced machine learning ; Perdelenmiş JavaScript kodlarının sözdizimsel ve anlamsal yönden iyileştirilmiş makina öğrenmesi ile tespiti
Authors: Kılıç, Eren
Contributors: Sandıkkaya, Mehmet Tahir, 866235, Department of Computer Engineering
Publisher information: Graduate School
Publication year: 2024
Collection: Istanbul Teknik Üniversitesi: İTÜ Akademik Açık Arşiv / ITU Academic Open Archive
Subjects: computer security, computer science, control
Description: Thesis (M.Sc.) -- İstanbul Technical University, Graduate School, 2023 ; Web-based attacks have been a critical security concern for the past few decades. Because JavaScript has been the most widely used programming language in web application development for years, JavaScript attacks have become increasingly popular among malicious actors. These attacks can have serious consequences, such as unauthorized access, theft of personal information, data exposure, financial damage, and service disruption. Attackers frequently apply obfuscation techniques to modify and obscure their malicious source code, making it harder to understand and helping it evade intrusion detection and prevention systems. Obfuscated JavaScript source code is therefore potentially harmful, which makes obfuscation detection a critical task that security systems should support. Identifying obfuscated JavaScript source code is difficult because intruders employ numerous obfuscation techniques. This thesis gives a literature review and background on JavaScript attacks, obfuscation, obfuscation techniques, obfuscation detection, machine learning, and natural language processing. Existing obfuscation detection methods, including static and dynamic analysis, are reviewed along with their advantages and limitations. The thesis then proposes a novel machine learning model built on syntactic and lexical analysis features. The approach introduces two novel features that draw on natural language processing techniques and contribute to the model discussed in previous work. The first feature is the proportion of meaningful words from natural languages such as English to the total number of words in the script. Because clean coding principles favor descriptive variable and function names that are easy to follow and understand, non-obfuscated JavaScript source code is likely to have ...
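Note: the first feature named in the description, the ratio of meaningful natural-language words to all words in a script, can be illustrated with a minimal Python sketch. This is not the thesis implementation. The tokenization rule, the function names `split_identifier` and `meaningful_word_ratio`, and the tiny `SAMPLE_VOCABULARY` set are all assumptions made for illustration; a real system would presumably use a full dictionary and a proper JavaScript lexer.

```python
import re

# Hypothetical stand-in for a natural-language dictionary (assumption:
# the record does not say which word list the thesis uses).
SAMPLE_VOCABULARY = frozenset({
    "get", "total", "price", "item", "list", "user", "name",
    "count", "length", "return", "function", "value",
})


def split_identifier(identifier):
    """Split a camelCase or snake_case identifier into lowercase word candidates."""
    # Either an all-caps run (e.g. "XML") or an optionally capitalized
    # lowercase run (e.g. "get", "User"). Digits, "_" and "$" are skipped.
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", identifier)
    return [part.lower() for part in parts]


def meaningful_word_ratio(script, vocabulary=SAMPLE_VOCABULARY):
    """Return the share of word candidates in `script` found in `vocabulary`."""
    # Crude tokenization (assumption): every identifier-like token counts,
    # including language keywords and text inside string literals.
    identifiers = re.findall(r"[A-Za-z_$][A-Za-z0-9_$]*", script)
    words = [word for ident in identifiers for word in split_identifier(ident)]
    if not words:
        return 0.0
    meaningful = sum(1 for word in words if word in vocabulary)
    return meaningful / len(words)


readable = "function getTotalPrice(itemList) { return itemList.length; }"
obfuscated = "var _0x3a2f = _0x1b9c(qzx);"
print(meaningful_word_ratio(readable), meaningful_word_ratio(obfuscated))
```

Under these assumptions, a script with descriptive names scores higher than one with machine-generated names, which matches the intuition stated in the description.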
Document type: master thesis
File description: application/pdf
Language: English
Relation: https://hdl.handle.net/11527/25804
Availability: https://hdl.handle.net/11527/25804
Accession number: edsbas.3727DAAA
Database: BASE