Translating to a Low-Resource Language with Compiler Feedback: A Case Study on Cangjie

Bibliographic Details
Published in: IEEE Transactions on Software Engineering, Vol. 51, No. 9, pp. 2671-2692
Main authors: Wang, Jun; Su, Chenghao; Ou, Yijie; Li, Yanhui; Tan, Jialiang; Chen, Lin; Zhou, Yuming
Format: Journal Article
Language: English
Published: New York: IEEE, 1 September 2025
IEEE Computer Society
ISSN: 0098-5589, 1939-3520
Description
Summary: In the rapidly advancing field of software development, the demand for practical code translation tools has surged, driven by the need for interoperability across different programming environments. Existing learning-based approaches often struggle with low-resource programming languages that lack sufficient parallel code corpora for training. To address these limitations, we propose a novel training framework that begins with monolingual seed corpora, generates parallel datasets via back-translation, and incorporates compiler feedback to optimize the translation model. As a case study, we apply our method to train a code translation model for a newly introduced low-resource programming language, Cangjie. We also construct a parallel test dataset for Java-to-Cangjie translation, together with test cases, to evaluate the effectiveness of our approach. Experimental results demonstrate that compiler feedback greatly enhances the syntactic correctness, semantic accuracy, and test pass rates of the translated Cangjie code. These findings highlight the potential of our method to support code translation in low-resource settings, expanding the capabilities of learning-based models for programming languages with limited data availability.
DOI: 10.1109/TSE.2025.3594908