This study addresses the challenges of end-to-end (E2E) Speech-to-Text Translation (STT) for the low-resource Fongbe-to-French language pair using a transfer learning approach. We first establish strong baselines by combining state-of-the-art pretrained speech encoders (HuBERT-147, AfriHuBERT, XLS-R, Whisper) with pretrained text decoders (mBART, NLLB). This initial phase identifies the AfriHuBERT–NLLB and XLS-R–NLLB combinations as the most competitive E2E configurations. To further improve performance, we propose and evaluate three hybrid feature fusion strategies: Bidirectional Co-Attention (BCOAT), Feature-wise Linear Modulation (FiLM), and Feature Sum (SUM). These methods fuse intermediate representations extracted from two distinct pretrained encoders, AfriHuBERT and XLS-R, within the E2E architecture, with the aim of enriching the linguistic and tonal information that is critical for accurately translating Fongbe, a tonal language. Experimental results show consistent gains over the baselines: the BLEU score rises from 26.32 to a peak of 27.78 (+1.46) for the AfriHuBERT–NLLB configuration (using FiLM), and from 26.27 to a maximum of 28.05 (+1.78) for the XLS-R–NLLB configuration (using SUM). These findings confirm that translation quality for tonal languages such as Fongbe can be substantially improved by extracting and combining high-quality, complementary features through encoder fusion. Our hybrid feature fusion methods thus represent a meaningful advance in speech translation quality for resource-scarce linguistic settings.
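To make the three fusion operators concrete, here is a minimal, dependency-free sketch of how two encoder streams could be combined by SUM, FiLM, and a simple bidirectional co-attention. It is an illustration under stated assumptions, not the authors' implementation: it assumes both encoder outputs (e.g. from AfriHuBERT and XLS-R) have already been aligned to the same number of frames and projected to the same feature dimension, it uses plain attention with no learned query/key/value projections, and it combines the co-attention outputs by residual summation. All function and parameter names (`feature_sum`, `film`, `w_gamma`, and so on) are hypothetical. A real system would use learned weights on framework tensors.

```python
import math
from typing import List

Matrix = List[List[float]]  # T x d: one feature vector per frame (assumed layout)


def feature_sum(a: Matrix, b: Matrix) -> Matrix:
    """SUM: frame-wise element-wise addition of two aligned encoder outputs."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def linear(v: List[float], w: Matrix, bias: List[float]) -> List[float]:
    """y = W v + bias, where W is stored as out_dim rows of in_dim weights."""
    return [sum(wi * vi for wi, vi in zip(row, v)) + bi for row, bi in zip(w, bias)]


def film(a: Matrix, b: Matrix, w_gamma: Matrix, b_gamma: List[float],
         w_beta: Matrix, b_beta: List[float]) -> Matrix:
    """FiLM: stream b predicts a per-feature scale (gamma) and shift (beta)
    that modulate stream a frame by frame. Which stream conditions which
    is an assumption of this sketch."""
    out = []
    for ra, rb in zip(a, b):
        gamma = linear(rb, w_gamma, b_gamma)
        beta = linear(rb, w_beta, b_beta)
        out.append([g * x + s for g, x, s in zip(gamma, ra, beta)])
    return out


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    e = [math.exp(x - m) for x in xs]
    z = sum(e)
    return [v / z for v in e]


def attend(queries: Matrix, keys: Matrix) -> Matrix:
    """Scaled dot-product attention; keys double as values (no projections)."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * k[j] for w, k in zip(weights, keys)) for j in range(d)])
    return out


def bidirectional_co_attention(a: Matrix, b: Matrix) -> Matrix:
    """a attends to b and b attends to a; each stream keeps a residual copy
    of itself, and the two enriched streams are summed frame-wise
    (requires equal sequence lengths)."""
    a_ctx = attend(a, b)  # a enriched with information from b
    b_ctx = attend(b, a)  # b enriched with information from a
    return feature_sum(feature_sum(a, a_ctx), feature_sum(b, b_ctx))


if __name__ == "__main__":
    afrihubert = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical 2 frames x 2 dims
    xlsr = [[0.5, 0.5], [1.0, -1.0]]       # hypothetical 2 frames x 2 dims
    print(feature_sum(afrihubert, xlsr))   # [[1.5, 0.5], [1.0, 0.0]]
```

SUM adds no parameters, FiLM learns how one encoder should rescale and shift the other, and co-attention lets each stream select relevant frames from the other. This is one plausible reason the three strategies can rank differently depending on the encoder–decoder pairing.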
Fortuné Kponou
Frejus Laleye
Eugène C. Ezin
ACM Transactions on Asian and Low-Resource Language Information Processing
Institut Català d'Arqueologia Clàssica
Institute of Criminology and Social Prevention
Kponou et al. studied this question.
www.synapsesocial.com/papers/69d896406c1944d70ce0789a — DOI: https://doi.org/10.1145/3806836