Existing translation methods fall short in the precision and flexibility of the text they produce. This paper uses a reinforcement learning model based on Proximal Policy Optimization (PPO) to improve the precision of translation models. The translation process is formulated as a sequential decision-making task. The policy network is a pre-trained Transformer, and a composite reward function combining semantic similarity, local matching degree, and language fluency guides the model toward an optimal generation policy. Experiments on WMT14 English-to-German, IWSLT14 German-to-English, and WMT17 Chinese-to-English show that the technique improves translation quality. On the WMT14 English-German task, for example, the BERTScore reaches 89.72%, TER decreases to 48.95, and METEOR increases to 60.03. It is concluded that text translation can be optimized by dynamically regulating the decoding process through the reward mechanism of reinforcement learning.
Ruomu Wang studied this question.
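The two ingredients named in the abstract, a composite reward and a PPO update, can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not confirm: the three reward components are taken to be scores in [0, 1] combined by a weighted sum, the weights in `RewardWeights` are hypothetical placeholders, and the loss is the standard PPO clipped surrogate with an assumed clipping range of 0.2. The names `composite_reward` and `ppo_clipped_objective` are illustrative only.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardWeights:
    # Hypothetical mixing weights; the abstract does not state the values used.
    semantic: float = 0.5
    local: float = 0.3
    fluency: float = 0.2


def composite_reward(semantic_sim, local_match, fluency, w=RewardWeights()):
    """Weighted sum of the three scores (assumed form of the combination).

    Each input is assumed to be a score in [0, 1] for one candidate
    translation: semantic similarity, local matching degree, and fluency.
    """
    return (w.semantic * semantic_sim
            + w.local * local_match
            + w.fluency * fluency)


def ppo_clipped_objective(logp_new, logp_old, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate over a sequence of decoding steps.

    logp_new / logp_old are per-token log-probabilities under the current
    and the previous policy; advantages are per-token advantage estimates.
    The returned value is to be maximized.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(lp_new - lp_old)  # probability ratio of new to old policy
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

In this sketch each generated token is one action of the sequential decision process, and the sequence-level composite reward would feed the advantage estimates. How the reward is assigned to individual decoding steps is left open here, since the abstract does not specify it.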