What question did this study set out to answer?

The aim is to adapt modern speech recognition models to improve performance for the Romanian language, addressing existing gaps.

February 17, 2026 · Open Access

Modern Speech Recognition for Romanian Language

Key Points

Comprehensive analysis of wav2vec 2.0 and Conformer models.
Data collection techniques including weakly supervised learning.
Creation of CRoWL (Crawled Romanian Weakly Labeled), a 9000 h weakly supervised dataset built via automatic transcription.
Evaluation of models on the Echo and CRoWL datasets.
Conformer achieves a 3.01% word error rate (WER) on Echo + CRoWL.
Wav2vec 2.0 reaches 4.04% WER on Echo and 4.17% on Echo + CRoWL.
Models are competitive with or exceed other publicly reported results for Romanian.

Abstract

Despite having approximately 24 million native speakers, Romanian remains a low-resource language for automatic speech recognition (ASR), with few accurate and publicly available systems. To address this gap, this study explores the challenges of adapting modern speech recognition models, such as wav2vec 2.0 and Conformer, to Romanian. Our investigation is a comprehensive analysis of the two models, their capabilities to adapt to Romanian data, and the performance of the trained models. The research also focuses on unique attributes of the Romanian language, data collection techniques, including weakly supervised learning, and processing methodologies. Building on the previously introduced Echo dataset of 378 h, we release CRoWL (Crawled Romanian Weakly Labeled), a weakly supervised dataset of 9000 h created via automatic transcription. We obtain strong results that, to the best of our knowledge, are competitive with or exceed publicly reported results for Romanian under comparable open evaluation settings, with Conformer attaining 3.01% WER on Echo + CRoWL and wav2vec 2.0 reaching 4.04% (Echo) and 4.17% (Echo + CRoWL). In addition to the datasets, we also release our most capable models as open source, along with their training plans, thereby providing a solid foundation for researchers interested in languages with limited representation.
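The results above are reported as word error rate (WER), conventionally defined as the word-level edit distance between the reference and the hypothesis (substitutions, deletions, and insertions) divided by the number of reference words. The following is a minimal sketch of that standard definition, not the authors' evaluation code. The function name `word_error_rate`, the whitespace tokenization, and the absence of any text normalization (casing, punctuation, diacritics) are all assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: words are split on whitespace and compared exactly,
    with no normalization of case, punctuation, or diacritics.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # A missing diacritic counts as a full word substitution under this sketch.
    print(word_error_rate("bună ziua tuturor", "buna ziua tuturor"))
```

Because the comparison is exact, a Romanian transcript that drops diacritics (for example `buna` for `bună`) is penalized as a word error, so the normalization applied before scoring can change a reported WER noticeably.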

Authors

Remus-Dan Ungureanu

Dan Mihailă
