Unsupervised Voting for Detecting the Algorithmic Solving Strategy in Competitive Programming Solutions.

Bibliographic Details
Title: Unsupervised Voting for Detecting the Algorithmic Solving Strategy in Competitive Programming Solutions.
Authors: Stoica, Alexandru Stefan; Babiceanu, Daniel; Mihaescu, Marian Cristian; Rebedea, Traian
Source: Mathematics (2227-7390); Nov2025, Vol. 13 Issue 22, p3589, 19p
Subject Terms: Machine learning; Clustering algorithms; Computer science; Algorithms; Source code
Abstract: The problem of source-code analysis using machine-learning techniques has gained much attention recently, as several powerful code-embedding methods have been developed. The availability of different embedding methods for source code has opened the way to tackling many practical problems in source-code analysis. This paper addresses the problem of determining the number of distinct algorithmic strategies that may be found in a set of correct solutions to a competitive programming problem. To achieve this, we employ a novel unsupervised algorithm that uses a multiview interpretation of the data based on different embedding and clustering methods, a multidimensional assignment problem (MAP) to determine a subset with a higher probability of correctness, and a self-training method based on voting to determine the correct clusters for the remaining set. We investigate two aspects: (1) whether the proposed unsupervised approach outperforms existing methods when the number K of distinct algorithmic strategies is known and (2) whether the approach can also be applied to determine the optimal value of K. We address these using seven embedding methods with three clustering strategies in a data-analysis pipeline, evaluated on a newly created dataset of 15 algorithmic problems. According to the results, for the first aspect, the proposed unsupervised voting algorithm significantly improves on the baseline clustering approach for a known K; this improvement was observed on all problems in the dataset except one. For the second aspect, we show that the proposed method has a negative impact on determining the optimal value of K. Scaling the data-analysis pipeline up to datasets of thousands of problems may make it possible to gain a deeper understanding of the innovative process of correctly designing and writing code in competitive programming, or even in industry code.
[ABSTRACT FROM AUTHOR]
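The abstract outlines three stages: cluster the same solutions under several views (embedding plus clustering method), align the views so that a high-confidence subset can be identified, then settle the remaining solutions by voting. The following is a minimal Python sketch of that flow under stated assumptions, not the authors' implementation. It assumes K is known, labels are integers 0..K-1, and each view supplies both its embeddings and its cluster labels. The paper's multidimensional assignment step is replaced here by a simpler stand-in: each view is aligned pairwise to the first view by brute force over all K! label permutations, and the high-confidence subset is the set of solutions on which all aligned views agree. The self-training step is likewise simplified to a single round: a nearest-centroid classifier per view is fit on that subset, and the views then vote on each disputed solution. All function names and the toy data are hypothetical.

```python
from collections import Counter
from itertools import permutations
import math


def align_to_reference(ref, labels, k):
    """Relabel `labels` with the permutation of 0..k-1 that agrees most with `ref`.

    Brute force over all k! permutations: a simplified stand-in for the MAP
    step, practical only for small k.
    """
    best_perm, best_score = None, -1
    for perm in permutations(range(k)):
        score = sum(1 for r, l in zip(ref, labels) if perm[l] == r)
        if score > best_score:
            best_perm, best_score = perm, score
    return [best_perm[l] for l in labels]


def consensus_split(aligned):
    """Split indices into those all views agree on and the disputed remainder."""
    agreed, disputed = {}, []
    for i in range(len(aligned[0])):
        if len({view[i] for view in aligned}) == 1:
            agreed[i] = aligned[0][i]
        else:
            disputed.append(i)
    return agreed, disputed


def centroids(points, labeled, k):
    """Mean embedding of each cluster, using only the already-labeled points."""
    dim = len(points[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for i, c in labeled.items():
        counts[c] += 1
        for d in range(dim):
            sums[c][d] += points[i][d]
    # A cluster with no labeled points gets no centroid.
    return [[s / counts[c] for s in sums[c]] if counts[c] else None for c in range(k)]


def nearest(point, cents):
    """Index of the closest centroid, or None if no centroid exists."""
    best, best_d = None, math.inf
    for c, cent in enumerate(cents):
        if cent is None:
            continue
        d = math.dist(point, cent)
        if d < best_d:
            best, best_d = c, d
    return best


def unsupervised_vote(embeddings, clusterings, k):
    """embeddings[v][i] is solution i's vector in view v; clusterings[v][i] its label."""
    ref = clusterings[0]
    aligned = [list(ref)] + [align_to_reference(ref, lab, k) for lab in clusterings[1:]]
    final, disputed = consensus_split(aligned)
    # One round of simplified self-training: fit a nearest-centroid classifier
    # per view on the consensus subset, then let the views vote on disputed points.
    cents = [centroids(emb, final, k) for emb in embeddings]
    for i in disputed:
        votes = Counter()
        for v, (emb, cv) in enumerate(zip(embeddings, cents)):
            c = nearest(emb[i], cv)
            # Fall back to the view's aligned label if it has no centroids.
            votes[aligned[v][i] if c is None else c] += 1
        final[i] = votes.most_common(1)[0][0]  # ties go to the label voted first
    return [final[i] for i in range(len(ref))]


# Toy example: 6 solutions, 2 strategies, 3 views.
base = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
embeddings = [base, [(2 * x, 2 * y) for x, y in base], [(x + 5, y - 3) for x, y in base]]
clusterings = [[0, 0, 1, 1, 1, 1],  # view 0 misplaces solution 2
               [1, 1, 1, 0, 0, 0],  # view 1: same split as the truth, label ids swapped
               [0, 0, 0, 1, 1, 1]]
print(unsupervised_vote(embeddings, clusterings, k=2))  # [0, 0, 0, 1, 1, 1]
```

In the toy run, view 1 is first relabeled to match view 0. Solution 2 is the only point the views disagree on, and all three nearest-centroid classifiers place it with solutions 0 and 1. The brute-force alignment keeps the sketch dependency-free, but it grows as K!. A real pipeline with larger K would need a proper assignment solver, as the paper's MAP formulation implies.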
Copyright of Mathematics (2227-7390) is the property of MDPI and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
Database: Complementary Index
ISSN: 2227-7390
DOI: 10.3390/math13223589