Translating to a Low-Resource Language with Compiler Feedback: A Case Study on Cangjie

Bibliographic details
Published in: IEEE Transactions on Software Engineering, Vol. 51, No. 9, pp. 2671-2692
Main authors: Wang, Jun; Su, Chenghao; Ou, Yijie; Li, Yanhui; Tan, Jialiang; Chen, Lin; Zhou, Yuming
Format: Journal Article
Language: English
Published: New York: IEEE, 1 September 2025
IEEE Computer Society
ISSN: 0098-5589, 1939-3520
Description
Abstract: In the rapidly advancing field of software development, the demand for practical code translation tools has surged, driven by the need for interoperability across different programming environments. Existing learning-based approaches often need help with low-resource programming languages that lack sufficient parallel code corpora for training. To address these limitations, we propose a novel training framework that begins with monolingual seed corpora, generating parallel datasets via back-translation and incorporating compiler feedback to optimize the translation model. As a case study, we apply our method to train a code translation model for a new-born low-resource programming language, Cangjie. We also construct a parallel test dataset for Java-to-Cangjie translation and test cases to evaluate the effectiveness of our approach. Experimental results demonstrate that compiler feedback greatly enhances syntactical correctness, semantic accuracy, and test pass rates of the translated Cangjie code. These findings highlight the potential of our method to support code translation in low-resource settings, expanding the capabilities of learning-based models for programming languages with limited data availability.
DOI: 10.1109/TSE.2025.3594908