Obfuscated JavaScript detection using syntactically and lexically enhanced machine learning ; Perdelenmiş JavaScript kodlarının sözdizimsel ve anlamsal yönden iyileştirilmiş makina öğrenmesi ile tespiti
| Title: | Obfuscated JavaScript detection using syntactically and lexically enhanced machine learning ; Perdelenmiş JavaScript kodlarının sözdizimsel ve anlamsal yönden iyileştirilmiş makina öğrenmesi ile tespiti |
|---|---|
| Authors: | Kılıç, Eren |
| Contributors: | Sandıkkaya, Mehmet Tahir, 866235, Department of Computer Engineering |
| Publisher information: | Graduate School |
| Year of publication: | 2024 |
| Collection: | Istanbul Teknik Üniversitesi: İTÜ Akademik Açık Arşiv / ITU Academic Open Archive |
| Subjects: | computer security, computer science, control |
| Description: | Thesis (M.Sc.) -- İstanbul Technical University, Graduate School, 2023 ; Web-based attacks have been a critical security concern over the past few decades. Because JavaScript has been the most widely used programming language in web application development for years, JavaScript attacks have become increasingly popular among malicious actors. These attacks can have serious consequences, such as unauthorized access, theft of personal information, data exposure, financial damage, and service disruption. Attackers frequently apply various obfuscation techniques to modify and obscure their malicious source code, making it harder to understand and helping it evade detection by intrusion prevention and detection systems. This makes obfuscated JavaScript source code potentially harmful and highlights the importance of obfuscation detection, which security systems should support as a critical task. Identifying obfuscated JavaScript source code is difficult because intruders employ numerous obfuscation techniques. This thesis gives a literature review and background information on JavaScript attacks, obfuscation, obfuscation techniques, obfuscation detection, machine learning, and natural language processing. The existing obfuscation detection methods, including static and dynamic analysis, are reviewed with their advantages and limitations. Moreover, a novel machine learning model built on syntactic and lexical analysis features is proposed. This approach presents two novel features that benefit from natural language processing techniques and extend the model discussed in previous work. The first feature is the ratio of meaningful words from natural languages such as English to the total number of words in the script. Owing to clean coding principles, such as using descriptive names for variables and functions that are easy to follow and understand, non-obfuscated JavaScript source code is likely to have ... |
| Document type: | master thesis |
| File description: | application/pdf |
| Language: | English |
| Relation: | https://hdl.handle.net/11527/25804 |
| Availability: | https://hdl.handle.net/11527/25804 |
| Accession number: | edsbas.3727DAAA |
| Database: | BASE |
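
The first feature named in the description, the ratio of meaningful natural-language words to all words in a script, can be illustrated with a minimal sketch. This is not the thesis's implementation: the tokenization rule, the tiny `ENGLISH_WORDS` vocabulary, and the function names `split_words` and `meaningful_word_ratio` are all assumptions made for illustration, and the record does not say how the thesis treats language keywords or which dictionary it uses.

```python
import re

# Hypothetical stand-in vocabulary; a real system would load a full
# English dictionary. Language keywords are included here by assumption.
ENGLISH_WORDS = {
    "function", "return", "var", "get", "set", "user", "name",
    "total", "price", "count", "items", "value", "list",
}


def split_words(script: str) -> list[str]:
    """Extract alphabetic runs and split camelCase into lowercase words."""
    words = []
    for token in re.findall(r"[A-Za-z]+", script):
        # "getUserName" -> get, User, Name; "XMLParser" -> XML, Parser
        parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", token)
        words.extend(part.lower() for part in parts)
    return words


def meaningful_word_ratio(script: str, vocabulary=ENGLISH_WORDS) -> float:
    """Share of extracted words that appear in the vocabulary (0.0 if none)."""
    words = split_words(script)
    if not words:
        return 0.0
    return sum(word in vocabulary for word in words) / len(words)


clean = "function getUserName(user) { return user.name; }"
obfuscated = "var _0x3a1f = _0x5b2c(_0x1d);"
print(meaningful_word_ratio(clean), meaningful_word_ratio(obfuscated))
```

Under these assumptions the readable snippet scores higher than the one with hex-style identifiers, which matches the intuition in the description that descriptive naming leaves more dictionary words in non-obfuscated code.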