Automated code translation plays an important role in software modernization, interoperability, and productivity. However, existing approaches often face three obstacles: limited availability of parallel training data, pre-training strategies that are not aligned with translation, and models that lack focus on a specific target language. In this work, we introduce a task-adapted pre-training method called Contextual Code Completion (CCC). Instead of masking isolated tokens, CCC hides and reconstructs variable-length code spans, a process that more closely resembles real translation tasks. Building on CodeT5 as the base model, we further increase precision by fixing Java as the sole target language, and we apply few-shot learning to make effective use of the small number of Python→Java and C#→Java pairs that are available. To evaluate translation quality, we use BLEU-4, CodeBLEU, and CrystalBLEU alongside structural and semantic checks. The adapted model achieves 89.03 BLEU-4 and 78.98 CodeBLEU on Python→Java, and 81.02 BLEU-4 with 83.25 CodeBLEU on C#→Java, showing relative gains of up to 65% compared with prior work. These results demonstrate that aligning pre-training objectives with translation and specializing on a single target language can substantially improve the accuracy and reliability of code migration.
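To make the span-hiding idea concrete, here is a minimal sketch of variable-length span masking. It follows the T5-style convention of replacing each hidden span with a sentinel token (`<extra_id_0>`, `<extra_id_1>`, ...) and asking the model to emit each sentinel followed by the tokens it covers. The function name `mask_spans`, the whitespace tokenization, the mask ratio, and the uniform span-length distribution are illustrative assumptions, not the exact CCC settings.

```python
import random
from typing import List, Tuple


def mask_spans(tokens: List[str], mask_ratio: float = 0.15,
               mean_span_len: int = 3, seed: int = 0) -> Tuple[List[str], List[str]]:
    """Hide variable-length token spans (hypothetical helper, T5-style sentinels).

    Returns (source, target): `source` has each hidden span replaced by one
    sentinel; `target` lists each sentinel followed by the tokens it hides.
    """
    rng = random.Random(seed)
    n = len(tokens)
    budget = max(1, round(n * mask_ratio))  # total number of tokens to hide
    masked = [False] * n
    hidden, attempts = 0, 0
    while hidden < budget and attempts < 100 * n:
        attempts += 1
        # Assumed span length: uniform on 1..2*mean_span_len-1, capped by the remaining budget.
        length = min(rng.randint(1, 2 * mean_span_len - 1), budget - hidden)
        start = rng.randrange(0, n - length + 1)
        # Reject spans that overlap or touch an existing span, so spans stay distinct.
        if any(masked[max(0, start - 1):min(n, start + length + 1)]):
            continue
        for i in range(start, start + length):
            masked[i] = True
        hidden += length

    source: List[str] = []
    target: List[str] = []
    sentinel, i = 0, 0
    while i < n:
        if masked[i]:
            tag = f"<extra_id_{sentinel}>"
            sentinel += 1
            source.append(tag)
            target.append(tag)
            # Copy the whole contiguous hidden span into the target.
            while i < n and masked[i]:
                target.append(tokens[i])
                i += 1
        else:
            source.append(tokens[i])
            i += 1
    return source, target


if __name__ == "__main__":
    toks = "public int add ( int a , int b ) { return a + b ; }".split()
    src, tgt = mask_spans(toks, mask_ratio=0.3, seed=1)
    print(" ".join(src))
    print(" ".join(tgt))
```

Unlike single-token masking, each sentinel here may stand for several consecutive tokens, so the model must regenerate whole fragments of code from the surrounding context, which is the property the abstract links to translation.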
Naik et al. studied this question.