Programming Language Prediction using Machine Learning
| Title: | Programming Language Prediction using Machine Learning |
|---|---|
| Authors: | Nidhun M, Sona Maria Sebastian, orcid:0000-0001-7784- |
| Publisher information: | Department of Computer Applications, Amal Jyothi College of Engineering Kanjirappally, Kottayam |
| Year of publication: | 2023 |
| Collection: | Zenodo |
| Subjects: | Classification, Machine learning, Random Forest, NLP, Source code Detection |
| Description: | Programming languages are the primary tool of the software development industry. Hundreds of them have been developed since the 1940s, and every day a large volume of new code, written in many different languages, is pushed to active repositories. We consider a source code classifier that can identify the programming language of a given piece of code to be a highly valuable tool for automatic syntax highlighting and label suggestion in systems such as code editors. This motivated us to apply state-of-the-art text classification methods to build a model that categorizes code snippets by language. We developed a new dataset for our empirical investigation using the GitHub Repos Dataset, which includes 131450 code snippets spread over 34 programming languages. |
| Document type: | conference object |
| Language: | unknown |
| Relation: | https://zenodo.org/communities/amaljyothi/; https://zenodo.org/records/7961995; oai:zenodo.org:7961995; https://doi.org/10.5281/zenodo.7961995 |
| DOI: | 10.5281/zenodo.7961995 |
| Availability: | https://doi.org/10.5281/zenodo.7961995 https://zenodo.org/records/7961995 |
| Rights: | Creative Commons Attribution 4.0 International ; cc-by-4.0 ; https://creativecommons.org/licenses/by/4.0/legalcode |
| Accession number: | edsbas.B9989DF |
| Database: | BASE |