What question did this study set out to answer?

The aim is to create a real-time system for translating English audiovisual content into Uzbek while ensuring synchronized voice playback.

March 2, 2026 | Open Access

Development of a System for Real-Time Translation and Voice Synthesis of English Audiovisual Content Into Uzbekistan Based on Artificial Intelligence

Key Points

The aim is to create a real-time system for translating English audiovisual content into Uzbek while ensuring synchronized voice playback.
The system combines automatic speech recognition (ASR), neural machine translation (NMT), and text-to-speech (TTS) technologies.
OpenAI Whisper handles speech recognition, and the Google Translate API handles translation.
Tacotron2 provides high-quality voice synthesis for natural-sounding output.
Voice synthesis was successfully synchronized with the translated content.
Translation and voice output are reported to be more accurate and natural than with previous methods.
User feedback indicates improved comprehension and enjoyment of English content in Uzbek.

Abstract

This paper describes the creation of a real-time, AI-based system that converts English audiovisual content into Uzbek with synchronized voice synthesis. The system combines Automatic Speech Recognition (ASR), Neural Machine Translation (NMT), and Text-to-Speech (TTS) technologies. OpenAI Whisper, the Google Translate API, and Tacotron2 were used as the models to obtain the best output in terms of both accuracy and naturalness of the voice. The proposed system lets users hear English video content in Uzbek with synchronized speech. It is an effective solution for content localization, education, and media applications.
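
The three-stage flow described in the abstract (ASR, then NMT, then TTS) can be sketched as a small generator that passes each recognized segment through translation and synthesis while keeping its original timestamps. This is only an illustrative sketch, not the authors' implementation: the names `Segment`, `DubbedSegment`, and `dub_stream`, the stage signatures, and the stub stages are all hypothetical, and the playback-rate rule for fitting synthesized speech into the original time window is an assumption, since the abstract does not say how synchronization is achieved. In a real system the stubs would be replaced by calls to Whisper, the Google Translate API, and Tacotron2.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List, Tuple


@dataclass
class Segment:
    start: float  # seconds from the start of the video
    end: float
    text: str     # recognized English text


@dataclass
class DubbedSegment:
    start: float
    end: float
    uzbek_text: str
    audio: bytes
    playback_rate: float  # 1.0 = unchanged, above 1.0 = sped up to fit


def dub_stream(
    chunks: Iterable[bytes],
    recognize: Callable[[bytes], List[Segment]],
    translate: Callable[[str], str],
    synthesize: Callable[[str], Tuple[bytes, float]],
) -> Iterator[DubbedSegment]:
    """Run ASR -> NMT -> TTS on each audio chunk as it arrives."""
    for chunk in chunks:
        for seg in recognize(chunk):
            uzbek = translate(seg.text)
            audio, duration = synthesize(uzbek)
            window = seg.end - seg.start
            # Assumed sync rule: speed the clip up only when it would
            # overrun the original segment; never slow it down.
            rate = max(1.0, duration / window) if window > 0 else 1.0
            yield DubbedSegment(seg.start, seg.end, uzbek, audio, rate)


# Hypothetical stub stages standing in for the real models.
def fake_recognize(chunk: bytes) -> List[Segment]:
    return [Segment(start=0.0, end=2.0, text="hello")]


def fake_translate(text: str) -> str:
    return {"hello": "salom"}.get(text, text)


def fake_synthesize(text: str) -> Tuple[bytes, float]:
    # Pretend the synthesized clip lasts 3.0 seconds.
    return b"\x00" * 16, 3.0


result = list(
    dub_stream([b"chunk"], fake_recognize, fake_translate, fake_synthesize)
)
```

With these stubs, the 3.0-second synthesized clip has to fit a 2.0-second window, so the single resulting segment carries a playback rate of 1.5. Passing the stages in as callables keeps the sketch independent of any particular model or API, which also makes each stage easy to swap or test in isolation.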
