March 3, 2026
Open Access

Advancing Code Translation With Context-Aware Pre-Training in Data-Scarce Environments

Key Points

The adapted model achieves 89.03 BLEU-4 and 78.98 CodeBLEU on Python→Java translation and 81.02 BLEU-4 on C#→Java, with relative gains of up to 65% over prior work.
Translation quality is measured with BLEU-4, CodeBLEU, and CrystalBLEU, complemented by structural and semantic checks.
This study introduces Contextual Code Completion (CCC), a pre-training objective that hides and reconstructs variable-length code spans rather than isolated tokens.
Aligning the pre-training objective with translation and specializing on a single target language (Java) can substantially improve translation accuracy.
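The BLEU-4 scores reported here are n-gram overlap metrics. BLEU-4 takes the geometric mean of clipped 1- to 4-gram precisions and multiplies it by a brevity penalty. The following is a minimal sketch of that standard computation. It assumes whitespace-tokenized code, one reference per example, and no smoothing. The abstract does not state the paper's tokenizer or evaluation script, so this sketch is not expected to reproduce the reported numbers. The function names `ngram_counts` and `corpus_bleu4` are illustrative. CodeBLEU builds on BLEU by adding weighted n-gram, syntax-tree, and data-flow matching. CrystalBLEU, by contrast, discounts n-grams that are trivially shared across code.

```python
import math
from collections import Counter
from typing import List, Sequence


def ngram_counts(tokens: Sequence[str], n: int) -> Counter:
    """Count every contiguous n-gram in a token sequence (illustrative helper)."""
    return Counter(tuple(tokens[i : i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu4(candidates: List[List[str]], references: List[List[str]]) -> float:
    """Unsmoothed corpus-level BLEU-4, one reference per candidate, 0-100 scale.

    A sketch of the textbook formulation, not the paper's evaluation script.
    """
    matches = [0] * 4  # clipped n-gram matches for n = 1..4
    totals = [0] * 4   # candidate n-gram counts for n = 1..4
    cand_len = ref_len = 0
    for cand, ref in zip(candidates, references):
        cand_len += len(cand)
        ref_len += len(ref)
        for n in range(1, 5):
            cand_ngrams = ngram_counts(cand, n)
            ref_ngrams = ngram_counts(ref, n)
            # Clip each candidate n-gram by how often it appears in the reference.
            matches[n - 1] += sum(min(c, ref_ngrams[g]) for g, c in cand_ngrams.items())
            totals[n - 1] += max(0, len(cand) - n + 1)
    if min(matches) == 0:
        return 0.0  # without smoothing, any zero precision zeroes the geometric mean
    # Geometric mean of the four modified precisions, uniform weights.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / 4
    # Brevity penalty: only candidates shorter than their references are penalized.
    bp = 1.0 if cand_len > ref_len else math.exp(1 - ref_len / cand_len)
    return 100 * bp * math.exp(log_precision)
```

For example, suppose a candidate drops the trailing `;` from `return a + b ;`. Every n-gram it contains still matches the reference, so its precision stays perfect. The score is still scaled by the brevity penalty exp(1 - 5/4) ≈ 0.78.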

Abstract

Automated code translation plays an important role in software modernization, interoperability, and productivity. However, existing approaches often face three obstacles: limited availability of parallel training data, pre-training strategies that are not aligned with translation, and models that lack focus on a specific target language. In this work, we introduce a task-adapted pre-training method called Contextual Code Completion (CCC). Instead of masking isolated tokens, CCC hides and reconstructs variable-length code spans, a process that more closely resembles real translation tasks. Building on CodeT5 as the base model, we further increase precision by fixing Java as the sole target language, and we apply few-shot learning to make effective use of the small number of Python→Java and C#→Java pairs that are available. To evaluate translation quality, we use BLEU-4, CodeBLEU, and CrystalBLEU alongside structural and semantic checks. The adapted model achieves 89.03 BLEU-4 and 78.98 CodeBLEU on Python→Java, and 81.02 BLEU-4 with 83.25 CodeBLEU on C#→Java, showing relative gains of up to 65% compared with prior work. These results demonstrate that aligning pre-training objectives with translation and specializing on a single target language can substantially improve the accuracy and reliability of code migration.
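Contextual Code Completion replaces single-token masking with the masking of variable-length spans. The sketch below illustrates that general idea in the T5-style sentinel format that CodeT5 uses. Each hidden span collapses to one `<extra_id_N>` sentinel in the input. The target then lists each sentinel followed by the tokens it hides. This is not the authors' CCC implementation. The abstract does not specify these details, so the function name `mask_code_spans`, the whitespace tokenization, the mask ratio, and the uniform span-length sampling are all hypothetical choices made for illustration.

```python
import random
from typing import List, Tuple


def mask_code_spans(
    tokens: List[str],
    mask_ratio: float = 0.15,
    max_span_len: int = 5,
    seed: int = 0,
) -> Tuple[List[str], List[str]]:
    """Hide random variable-length spans behind T5-style sentinels.

    Hypothetical helper: the mask ratio and span-length sampling are
    assumptions, not the paper's settings. Returns (source, target).
    """
    rng = random.Random(seed)
    n = len(tokens)
    # Number of tokens to hide, clamped so a span never exceeds the input.
    budget = min(n, max(1, round(n * mask_ratio)))
    masked = [False] * n
    hidden = 0
    attempts = 0
    # Cap attempts: the non-touching rule below may make the budget unreachable.
    while hidden < budget and attempts < 100 * n:
        attempts += 1
        span_len = min(rng.randint(1, max_span_len), budget - hidden)
        start = rng.randrange(0, n - span_len + 1)
        # Reject spans that overlap or touch an existing span, so each
        # sentinel stands for exactly one contiguous run of tokens.
        lo, hi = max(0, start - 1), min(n, start + span_len + 1)
        if any(masked[i] for i in range(lo, hi)):
            continue
        for i in range(start, start + span_len):
            masked[i] = True
        hidden += span_len

    source: List[str] = []
    target: List[str] = []
    sentinel_id = 0
    i = 0
    while i < n:
        if masked[i]:
            sentinel = f"<extra_id_{sentinel_id}>"
            sentinel_id += 1
            source.append(sentinel)  # one placeholder per hidden span
            target.append(sentinel)  # target: sentinel, then the span it hides
            while i < n and masked[i]:
                target.append(tokens[i])
                i += 1
        else:
            source.append(tokens[i])
            i += 1
    return source, target
```

For example, consider `mask_code_spans("public int add ( int a , int b ) { return a + b ; }".split(), mask_ratio=0.3, seed=1)`. It returns a source in which a few contiguous token runs are replaced by sentinels. Because the model must regenerate whole runs, such as an expression, rather than single tokens, the objective is closer to producing target-language code. That is the intuition the abstract gives for CCC.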
