What question did this study set out to answer?

March 18, 2026

XBMU-bo-Lhasa31: A dataset of speech recognition for the Lhasa Dialect of Tibetan

Key Points

To create a standardized speech recognition dataset for the Lhasa dialect of Tibetan to improve data availability.
Developed a speech recognition dataset recorded from 51 speakers.
Collected 24,289 speech samples totaling 31.61 hours of audio.
Used real-world environments and self-developed recording software for data collection.
Implemented quality control processes including manual verification and voice activity detection.
Created a substantial dataset for Tibetan speech recognition with standardized linguistic content.
Filtered recordings with VAD and normalized non-pronounced symbols in the text to enhance data quality and accuracy.

Abstract

Tibetan speech recognition has important application value in fields such as Tibetan language education and news dissemination. The Lhasa dialect of Tibetan is widely used in Lhasa City and its surrounding regions. However, due to geographical and other constraints, currently available Tibetan speech data resources remain limited, and high-quality annotated data are particularly scarce. For this reason, this study constructs a professionally designed and standardized speech recognition dataset for the Lhasa dialect of Tibetan. The dataset was recorded in real-world environments using self-developed recording software. It was collected from 51 speakers and contains 24,289 speech samples with a total duration of 31.61 hours, an average of 4.68 seconds per sample. The text content was primarily selected from news-related texts to ensure linguistic standardization and domain representativeness. To guarantee data quality, we implemented a strict quality control process: first, the original texts were segmented into sentences and manually verified; after recording, Voice Activity Detection (VAD) was used to filter the recordings and retain high-quality speech samples; in addition, non-pronounced symbols in the text were normalized to improve the accuracy of speech recognition. This dataset provides an important foundational resource for Tibetan speech recognition and is expected to facilitate the development of Tibetan speech recognition technology.
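The abstract does not say which VAD method or which symbol set was used. As a rough illustration only, the Python sketch below uses a simple frame-energy threshold as a stand-in for VAD and strips Tibetan shad marks plus ASCII punctuation from transcripts. The function names, thresholds, and symbol set are all assumptions made for this example, not the authors' pipeline.

```python
import re
import unicodedata

import numpy as np

# Assumed symbol set: TIBETAN MARK SHAD (U+0F0D) and NYIS SHAD (U+0F0E).
# The tsheg (U+0F0B) is kept because it delimits syllables.
TIBETAN_SHAD = "\u0f0d\u0f0e"


def speech_ratio(signal, sample_rate, frame_ms=25, threshold=1e-4):
    """Fraction of non-overlapping frames whose mean energy exceeds threshold.

    A crude energy-based stand-in for VAD; frame_ms and threshold are
    illustrative values, not taken from the paper.
    """
    frame_len = max(1, int(sample_rate * frame_ms / 1000))
    n_frames = len(signal) // frame_len
    if n_frames == 0:
        return 0.0
    frames = np.asarray(signal[: n_frames * frame_len], dtype=np.float64)
    energies = (frames.reshape(n_frames, frame_len) ** 2).mean(axis=1)
    return float((energies > threshold).mean())


def keep_recording(signal, sample_rate, min_speech_ratio=0.3):
    """Keep a recording only if enough of its frames look like speech."""
    return speech_ratio(signal, sample_rate) >= min_speech_ratio


def normalize_text(text, extra_symbols=""):
    """Replace assumed non-pronounced symbols with spaces, then tidy whitespace."""
    drop = set(TIBETAN_SHAD + extra_symbols)
    cleaned = []
    for ch in text:
        is_ascii_punct = ch.isascii() and unicodedata.category(ch).startswith("P")
        cleaned.append(" " if ch in drop or is_ascii_punct else ch)
    return re.sub(r"\s+", " ", "".join(cleaned)).strip()
```

In practice, a trained VAD model would likely replace the energy threshold, and the set of symbols to normalize would come from the dataset's own documentation rather than from this guess.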

Authors

Like MA

Guanyu LI

Chenyu XIE

Journals

China Scientific Data
