Dataset of Program Source Codes Solving Unique Programming Exercises Generated by Digital Teaching Assistant.

Bibliographic details
Title: Dataset of Program Source Codes Solving Unique Programming Exercises Generated by Digital Teaching Assistant.
Authors: Demidova, Liliya A., Andrianova, Elena G., Sovietov, Peter N., Gorchakov, Artyom V.
Source: Data (2306-5729); Jun 2023, Vol. 8, Issue 6, p109, 16p
Subjects: TEACHERS' assistants, PYTHON programming language, CLASSIFICATION algorithms, HIERARCHICAL clustering (Cluster analysis), MARKOV processes, SOURCE code, ALGORITHMS
Abstract: This paper presents a dataset containing automatically collected source codes solving unique programming exercises of different types. The programming exercises were automatically generated by the Digital Teaching Assistant (DTA) system that automates a massive Python programming course at MIREA—Russian Technological University (RTU MIREA). Source codes of the small programs grouped by the type of the solved task can be used for benchmarking source code classification and clustering algorithms. Moreover, the data can be used for training intelligent program synthesizers or benchmarking mutation testing frameworks, and more applications are yet to be discovered. We describe the architecture of the DTA system, aiming to provide detailed insight regarding how and why the dataset was collected. In addition, we describe the algorithms responsible for source code analysis in the DTA system. These algorithms use vector representations of programs based on Markov chains, compute pairwise Jensen–Shannon divergences of programs, and apply hierarchical clustering algorithms in order to automatically discover high-level concepts used by students while solving unique tasks. The proposed approach can be incorporated into massive programming courses when there is a need to identify approaches implemented by students. Dataset: The data presented in this study are openly available in Zenodo at https://doi.org/10.5281/zenodo.7799971 (accessed on 10 June 2023). Dataset License: CC-BY-4.0 [ABSTRACT FROM AUTHOR]
Copyright of Data (2306-5729) is the property of MDPI and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
Database: Complementary Index
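
Illustrative note: the abstract names three analysis steps (Markov-chain-based program vectors, pairwise Jensen–Shannon divergences, hierarchical clustering). The sketch below shows how such a pipeline might be wired together. It is not the authors' implementation. Everything in it is an assumption made for illustration: the function names, the choice of parent-to-child AST node-type transitions as the chain's states, the use of a single normalized distribution over transitions in place of a full transition matrix, average linkage, and the threshold value.

```python
import ast
import math
from collections import Counter
from itertools import combinations


def transition_distribution(source):
    """Map each (parent type, child type) AST edge to its relative frequency.

    Assumption: this joint distribution stands in for the Markov-chain
    vector representation mentioned in the abstract.
    """
    counts = Counter()
    for parent in ast.walk(ast.parse(source)):
        for child in ast.iter_child_nodes(parent):
            counts[(type(parent).__name__, type(child).__name__)] += 1
    total = sum(counts.values())
    if total == 0:  # empty program: no transitions at all
        return {}
    return {edge: n / total for edge, n in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: 0 for identical distributions,
    1 for distributions with disjoint support."""
    total = 0.0
    for key in set(p) | set(q):
        pk, qk = p.get(key, 0.0), q.get(key, 0.0)
        mk = (pk + qk) / 2
        if pk > 0:
            total += 0.5 * pk * math.log2(pk / mk)
        if qk > 0:
            total += 0.5 * qk * math.log2(qk / mk)
    return total


def cluster(programs, threshold=0.3):
    """Average-linkage agglomerative clustering on pairwise JS divergences.

    Merging stops once the closest pair of clusters is farther apart than
    `threshold` (a hypothetical cut-off). Returns lists of program indices.
    """
    vectors = [transition_distribution(src) for src in programs]
    dist = {
        frozenset((i, j)): js_divergence(vectors[i], vectors[j])
        for i, j in combinations(range(len(programs)), 2)
    }

    def linkage(a, b):
        return sum(dist[frozenset((i, j))] for i in a for j in b) / (len(a) * len(b))

    clusters = [[i] for i in range(len(programs))]
    while len(clusters) > 1:
        x, y = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        if linkage(clusters[x], clusters[y]) > threshold:
            break
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # safe: x < y, so index x is unaffected
    return clusters


if __name__ == "__main__":
    solutions = [
        "def f(x):\n    return x + 1\n",
        "def g(y):\n    return y + 2\n",
        "for i in range(10):\n    print(i)\n",
    ]
    print(cluster(solutions))
```

In this toy input the first two programs differ only in identifiers and constants, so their node-type transitions coincide and they merge first, while the loop-based program stays apart. For real workloads, library routines such as SciPy's hierarchical clustering would likely be preferable to the quadratic loop above.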