As globalization accelerates, demand for real-time cross-language interaction is growing. In voice interaction scenarios, however, traditional translation tools suffer from translation errors and delays caused by noise interference. This paper proposes a deep-learning-based system that fuses an English translation robot with speech enhancement technology. A dual-branch parallel architecture (DBPA) performs speech enhancement and translation in separate branches, and a feature alignment module lets the two branches interact dynamically to avoid error accumulation. A dynamic weight allocation strategy adjusts resource allocation in real time according to noise level and speech complexity, improving the efficiency and robustness of the system. A multimodal adversarial training mechanism further enhances the model's generalization to unseen noise. Experimental results show that the system outperforms both the traditional cascade model and the state-of-the-art Noise-Adaptive ST (NAST) in noise robustness and real-time performance, and maintains high translation quality across different noise environments, offering new ideas and solutions for the development of cross-language interaction technology.
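The dynamic weight allocation idea can be illustrated with a minimal sketch. This is not the paper's method: the function name `allocate_weights`, the 0 to 30 dB SNR range, the softmax form, and the `temperature` parameter are all assumptions chosen for illustration. It only shows how a noise estimate and a speech complexity score might be turned into a split of resources between the enhancement branch and the translation branch.

```python
import math


def allocate_weights(snr_db, complexity, temperature=1.0):
    """Return (w_enhance, w_translate), two weights that sum to 1.

    Hypothetical rule: a lower SNR shifts weight toward the speech
    enhancement branch, and a higher complexity score shifts weight
    toward the translation branch.
    """
    # Map SNR to a noise score in [0, 1]; the 0..30 dB range is assumed.
    noise = min(1.0, max(0.0, (30.0 - snr_db) / 30.0))
    # Clamp the complexity score to [0, 1].
    complexity = min(1.0, max(0.0, complexity))

    # Softmax over the two scores, shifted by the max for numerical stability.
    logits = (noise / temperature, complexity / temperature)
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return exps[0] / total, exps[1] / total


if __name__ == "__main__":
    # Noisy input with simple speech: most weight goes to enhancement.
    print(allocate_weights(snr_db=0.0, complexity=0.2))
    # Clean input with complex speech: most weight goes to translation.
    print(allocate_weights(snr_db=30.0, complexity=0.9))
```

A smaller `temperature` makes the split more decisive, while a larger one keeps the two branches closer to an even share.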
Yun Feng (Sun,) studied this question.