Today in the development of voice AI, it faces a big obstacle, called a “linguistic ecosystem imbalance” because the amount of minor language data is very scarce, which makes the model performance difficult, and making it unfair. Grannary: It is Nvidia’s open-sourced speech dataset project announced on August 2025 It has collected around 1 million hours of people’s voice audio. NVIDIA Granary is the first industrial scale speech dataset to cover many minor European languages. It will thus be a landmark for the task. This article seeks to comprehensively understand the development of Granary by researching and comparing versions of the granary: Scale up, language up, quality up, and up to ethics. In this paper, not only to fill the gap on the research about the evolutionary trend of a single dataset, but also to get a concrete industrial-level data building pattern; In the future, it can provide very valuable advice to how to build an even more inclusive, robust and trusted multilingual speech intelligence.
Building similarity graph...
Analyzing shared references across papers
Loading...
Jiongzeng Ye
Building similarity graph...
Analyzing shared references across papers
Loading...
Jiongzeng Ye (Mon,) studied this question.
www.synapsesocial.com/papers/69df2b85e4eeef8a2a6b07fc — DOI: https://doi.org/10.1051/itmconf/20268401006/pdf