Big Coding Data: Analysis, Insights, and Applications

Bibliographic Details
Published in: IEEE Access, Vol. 12, pp. 196010-196026
Main Authors: Rahman, Md. Mostafizer; Shirafuji, Atsushi; Watanobe, Yutaka
Format: Journal Article
Language: English
Published: Piscataway: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 2024
ISSN: 2169-3536
Description
Summary: In recent years, the volume of coding data generated on platforms such as programming competition sites and educational institutions has surged. These platforms serve as repositories for substantial volumes of real-world code, problem descriptions, test cases, and activity logs. Despite this wealth of coding data, its potential for advancing software engineering, programming, and research remains largely unexplored. To the best of our knowledge, previous research projects such as CodeNet and AlphaCode have explored and used coding data only partially. There is a compelling need to examine coding data in more depth to uncover its potential for programming and research. Recognizing this gap, our study undertakes a comprehensive analysis of extensive coding data obtained from a programming learning platform, the Aizu Online Judge (AOJ), which provides access to coding data and its associated features. We collected approximately 9 million code evaluation logs, code files, and a substantial number of problem descriptions and input/output test cases for thorough analysis and experimentation. The goal of this study is to explore the full potential of coding data for latent knowledge extraction, programming, and research. We conducted experiments with code evaluation logs, code files, problem descriptions, and test cases to demonstrate the suitability of coding data for various research purposes and applications. Additionally, this study introduces a comprehensive array of features and application programming interfaces (APIs) associated with the AOJ platform. These resources facilitate seamless access to and use of coding data, making them valuable tools for professional and educational initiatives as well as research endeavors.
DOI: 10.1109/ACCESS.2024.3521383