Leveraging deep learning for Python version identification

Detailed bibliography
Title: Leveraging deep learning for Python version identification
Authors: Gerhold, Marcus; Solovyeva, Lola; Zaytsev, Vadim
Source: CEUR workshop proceedings. 3567:33-40
Publisher information: Rheinisch Westfälische Technische Hochschule, 2023.
Publication year: 2023
Subjects: Deep Learning, CodeBERT, version identification, Python
Description: Python, recognized for its dynamic and adaptable nature, has found widespread application in a myriad of projects. As the language evolves, determining the Python version employed in a project becomes pivotal to ensure compatibility and facilitate maintenance. Deep learning (DL) has emerged as a promising tool to automate this process. In this research, we assess various DL techniques in determining the minimum Python version for a code snippet. We explore the complexities of handling Python data and the DL techniques to achieve high classification accuracy. Our experimental results show that an LSTM with CodeBERT embeddings achieves an accuracy of 92%. This success can be attributed to the LSTM's proficiency in capturing structural details of the hierarchical nature of source code, complemented by CodeBERT's ability to discern contextual differences between keywords and variable names. This research provides insights into the challenges associated with utilizing programming languages for deep learning models and suggests potential solutions for addressing these issues. The envisioned applications extend to predicting the minimum required version for individual files or an entire code base.
Document type: Article
Language: English
ISSN: 1613-0073
Access URL: https://research.utwente.nl/en/publications/f0f8b2f7-dc79-4515-a968-d55e36f0bbae
Accession number: edsair.dris...02403..749cdc11b2fc3a4b2562daf39d0839b2
Database: OpenAIRE