Translating to a Low-Resource Language with Compiler Feedback: A Case Study on Cangjie
| Published in: | IEEE Transactions on Software Engineering, Vol. 51, No. 9, pp. 2671-2692 |
|---|---|
| Main authors: | , , , , , , |
| Format: | Journal Article |
| Language: | English |
| Published: | New York: IEEE, 1 September 2025; IEEE Computer Society |
| Subjects: | |
| ISSN: | 0098-5589, 1939-3520 |
| Summary: | In the rapidly advancing field of software development, the demand for practical code translation tools has surged, driven by the need for interoperability across different programming environments. Existing learning-based approaches often struggle with low-resource programming languages that lack sufficient parallel code corpora for training. To address these limitations, we propose a novel training framework that begins with monolingual seed corpora, generates parallel datasets via back-translation, and incorporates compiler feedback to optimize the translation model. As a case study, we apply our method to train a code translation model for a newly introduced low-resource programming language, Cangjie. We also construct a parallel test dataset for Java-to-Cangjie translation, together with test cases, to evaluate the effectiveness of our approach. Experimental results demonstrate that compiler feedback greatly enhances the syntactic correctness, semantic accuracy, and test pass rates of the translated Cangjie code. These findings highlight the potential of our method to support code translation in low-resource settings, expanding the capabilities of learning-based models for programming languages with limited data availability. |
|---|---|
| DOI: | 10.1109/TSE.2025.3594908 |