Tibetan speech recognition has important applications in fields such as Tibetan language education and news dissemination. The Lhasa dialect of Tibetan is widely used in Lhasa City and its surrounding regions. However, owing to geographical and other constraints, available Tibetan speech data resources remain limited, and high-quality annotated data are particularly scarce. To address this, this study constructs a professionally designed, standardized speech recognition dataset for the Lhasa dialect of Tibetan. The dataset was recorded in real-world environments using self-developed recording software. It was collected from 51 speakers and contains 24,289 speech samples, with a total duration of 31.61 hours and an average duration of 4.68 seconds per sample. The text content was selected primarily from news-related texts to ensure linguistic standardization and domain representativeness. To guarantee data quality, we implemented a strict quality control process: first, the original texts were segmented into sentences and manually verified; after recording, Voice Activity Detection (VAD) was used to filter the recordings and retain high-quality speech samples; in addition, non-pronounced symbols in the text were normalized to improve speech recognition accuracy. This dataset provides an important foundational resource for Tibetan speech recognition and is expected to facilitate the development of Tibetan speech recognition technology.
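The VAD-based filtering step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which VAD method or thresholds were used, so everything here is an assumption made for illustration: a simple frame-energy detector stands in for the actual VAD, and the names `frame_energies`, `speech_ratio`, `keep_sample`, and the values `frame_len=400`, `energy_threshold=1e-3`, and `min_speech_ratio=0.3` are hypothetical.

```python
import math


def frame_energies(samples, frame_len):
    """Mean squared amplitude of each non-overlapping full frame."""
    return [
        sum(x * x for x in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def speech_ratio(samples, frame_len=400, energy_threshold=1e-3):
    """Fraction of frames whose energy exceeds the threshold.

    Assumption: energy thresholding is only a stand-in for the
    (unspecified) VAD technique used to build the dataset.
    """
    energies = frame_energies(samples, frame_len)
    if not energies:
        return 0.0
    voiced = sum(1 for e in energies if e > energy_threshold)
    return voiced / len(energies)


def keep_sample(samples, min_speech_ratio=0.3, **vad_kwargs):
    """Keep a recording only if enough of its frames look voiced."""
    return speech_ratio(samples, **vad_kwargs) >= min_speech_ratio


# Synthetic demo at an assumed 16 kHz sampling rate:
# one second of a 220 Hz tone (stands in for speech) and one second of silence.
SAMPLE_RATE = 16000
tone = [0.5 * math.sin(2 * math.pi * 220 * n / SAMPLE_RATE)
        for n in range(SAMPLE_RATE)]
silence = [0.0] * SAMPLE_RATE

mostly_voiced = tone + silence        # half of the frames are voiced
mostly_silent = tone + silence * 3    # a quarter of the frames are voiced
```

With these assumed thresholds, `keep_sample(mostly_voiced)` accepts the clip and `keep_sample(mostly_silent)` rejects it. A production pipeline would more likely use a trained or statistical VAD and tune the thresholds on held-out recordings.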
Like MA
Guanyu LI
Chenyu XIE
China Scientific Data
MA et al. (Sun,) studied this question.
www.synapsesocial.com/papers/69ba434a4e9516ffd37a45bb — DOI: https://doi.org/10.11922/11-6035.csd.2025.0122.zh