Translating to a Low-Resource Language with Compiler Feedback: A Case Study on Cangjie

Bibliographic Details
Published in: IEEE Transactions on Software Engineering, Vol. 51, No. 9, pp. 2671-2692
Main Authors: Wang, Jun; Su, Chenghao; Ou, Yijie; Li, Yanhui; Tan, Jialiang; Chen, Lin; Zhou, Yuming
Format: Journal Article
Language: English
Published: New York: IEEE, 1 September 2025
IEEE Computer Society
ISSN: 0098-5589, 1939-3520
Description
Summary: In the rapidly advancing field of software development, the demand for practical code translation tools has surged, driven by the need for interoperability across different programming environments. Existing learning-based approaches often struggle with low-resource programming languages that lack sufficient parallel code corpora for training. To address this limitation, we propose a novel training framework that begins with monolingual seed corpora, generates parallel datasets via back-translation, and incorporates compiler feedback to optimize the translation model. As a case study, we apply our method to train a code translation model for a newly introduced low-resource programming language, Cangjie. We also construct a parallel test dataset for Java-to-Cangjie translation, together with test cases, to evaluate the effectiveness of our approach. Experimental results demonstrate that compiler feedback greatly enhances the syntactic correctness, semantic accuracy, and test pass rates of the translated Cangjie code. These findings highlight the potential of our method to support code translation in low-resource settings, expanding the capabilities of learning-based models for programming languages with limited data availability.
DOI: 10.1109/TSE.2025.3594908