This study presents HGRN2-based Flexible Dynamic Encoder Personal VAD (FDE-HGRN2), a recurrent framework for personal voice activity detection (PVAD). Building on the original LSTM-based FDE-RNN backbone, we replace all recurrent modules with the recently introduced HGRN2 gated linear RNN and adopt a cosine-annealing learning rate schedule to improve both detection accuracy and efficiency. HGRN2 uses gated linear recurrence with non-parametric state expansion, enlarging the recurrent state without increasing the number of trainable parameters and enabling more expressive long-range temporal modeling than conventional LSTMs. We evaluate FDE-HGRN2 on a LibriSpeech-derived PVAD benchmark, where multi-speaker mixtures are constructed by concatenating one to three speakers per utterance and randomly designating a target speaker, following established PVAD data construction practices to ensure direct comparability with prior work. The system uses 40-dimensional Mel-filterbank features as acoustic inputs and conditions the detector on 256-dimensional d-vector embeddings extracted from a pretrained speaker verification network. Experimental results show that FDE-HGRN2 consistently outperforms the original FDE-RNN baseline and several state-of-the-art PVAD models in terms of mean Average Precision and frame-level accuracy, while reducing the parameter count of the recurrent backbone by roughly 15% and yielding substantially smaller models than many competing systems. These findings indicate that HGRN2 provides a more temporally expressive and parameter-efficient alternative to LSTM for PVAD, offering a favorable accuracy–efficiency trade-off for real-world, deployment-oriented personalized speech interfaces.
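To make the idea of "gated linear recurrence with non-parametric state expansion" concrete, here is a minimal NumPy sketch of a simplified, single-head HGRN2-style layer, fed with 40-dimensional filterbank frames conditioned on a 256-dimensional d-vector. It is an illustration under stated assumptions, not the authors' implementation. The update S_t = diag(f_t) S_{t-1} + outer(1 - f_t, i_t) with read-out y_t = o_t S_t is our simplified reading of HGRN2. Output gating, normalization, multi-head splitting and the forget-gate lower bound are omitted. Conditioning by concatenating the d-vector to every frame (as in the original Personal VAD) is an assumption, since FDE-HGRN2's actual conditioning path may differ. All function names, widths (`d_k`, `d_v`) and the 3-class head are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_linear_recurrence(x, W_f, W_i, W_o):
    """Simplified HGRN2-style layer (hypothetical sketch, single head).

    x: (T, d_in); W_f, W_o: (d_in, d_k); W_i: (d_in, d_v).
    The state S is a (d_k, d_v) matrix formed from outer products, so its
    size grows as d_k * d_v while the only trainable weights are the three
    input projections (the recurrence itself has no extra parameters).
    """
    T = x.shape[0]
    d_k, d_v = W_f.shape[1], W_i.shape[1]
    S = np.zeros((d_k, d_v))
    y = np.empty((T, d_v))
    for t in range(T):
        f = sigmoid(x[t] @ W_f)   # forget gate in (0, 1), shape (d_k,)
        i = x[t] @ W_i            # input values, shape (d_v,)
        o = x[t] @ W_o            # output projection, shape (d_k,)
        # Decay each row of S by its gate, then write outer(1 - f, i).
        S = f[:, None] * S + np.outer(1.0 - f, i)
        y[t] = o @ S              # linear read-out, shape (d_v,)
    return y

def condition_on_dvector(fbank, dvec):
    """Append the same speaker embedding to every frame (assumed scheme)."""
    return np.concatenate([fbank, np.tile(dvec, (fbank.shape[0], 1))], axis=1)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, d_k, d_v, n_classes = 100, 16, 64, 3     # hypothetical sizes
    fbank = rng.standard_normal((T, 40))        # 40-dim Mel filterbank frames
    dvec = rng.standard_normal(256)             # 256-dim d-vector
    x = condition_on_dvector(fbank, dvec)       # shape (100, 296)
    W_f, W_o = (0.05 * rng.standard_normal((296, d_k)) for _ in range(2))
    W_i = 0.05 * rng.standard_normal((296, d_v))
    W_cls = 0.05 * rng.standard_normal((d_v, n_classes))
    logits = gated_linear_recurrence(x, W_f, W_i, W_o) @ W_cls
    print(logits.shape)                         # (100, 3): per-frame class scores
```

The design point the sketch tries to show is that widening `d_k` enlarges the recurrent state quadratically in the gate and value widths without adding a recurrent weight matrix, unlike an LSTM, whose hidden-to-hidden weights grow with the state size.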
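The cosine-annealing schedule mentioned in the abstract decays the learning rate from a maximum to a minimum along half a cosine period. A minimal sketch of the standard closed form follows. The peak rate, floor and step count in the demo are hypothetical, because the abstract does not report the values used, and whether warm restarts were applied is not stated.

```python
import math

def cosine_annealing_lr(step, total_steps, lr_max, lr_min=0.0):
    """Learning rate at `step` when annealing from lr_max to lr_min.

    lr(step) = lr_min + 0.5 * (lr_max - lr_min) * (1 + cos(pi * step / total_steps))
    """
    if not 0 <= step <= total_steps:
        raise ValueError("step must lie in [0, total_steps]")
    cos_term = 0.5 * (1.0 + math.cos(math.pi * step / total_steps))
    return lr_min + (lr_max - lr_min) * cos_term

if __name__ == "__main__":
    # Hypothetical settings: 100 epochs, peak 1e-3, floor 1e-5.
    for epoch in (0, 25, 50, 75, 100):
        print(epoch, cosine_annealing_lr(epoch, 100, 1e-3, 1e-5))
```

This is the same closed form that PyTorch's `torch.optim.lr_scheduler.CosineAnnealingLR` documents. The slow decay at the start and end of training, with the fastest change mid-way, is the usual motivation for preferring it over a step schedule.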
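The benchmark construction described above (concatenating one to three speakers per utterance and randomly designating a target) can be sketched at the label level as follows. Several assumptions are made here. The three frame classes (non-speech, target-speaker speech, non-target-speaker speech) follow the original Personal VAD formulation, and we assume FDE-HGRN2 uses the same labels. Per-frame speech/non-speech flags are assumed to be precomputed, for example from forced alignments, since the abstract does not say how they were obtained. Only label sequences are built, with no waveform or feature concatenation. The `pool` layout and function name are hypothetical.

```python
import random

NS, TSS, NTSS = 0, 1, 2  # non-speech, target-speaker speech, non-target-speaker speech

def make_pvad_example(pool, rng, max_speakers=3):
    """Concatenate utterances from 1..max_speakers distinct speakers and label frames.

    pool: dict mapping speaker id -> list of utterances, where each utterance
    is a list of per-frame speech flags (True = speech).
    Returns (target_speaker, frame_labels).
    """
    n = rng.randint(1, min(max_speakers, len(pool)))  # inclusive on both ends
    speakers = rng.sample(sorted(pool), n)   # distinct speakers, in concatenation order
    target = rng.choice(speakers)            # randomly designated target speaker
    labels = []
    for spk in speakers:
        utterance = rng.choice(pool[spk])
        for is_speech in utterance:
            if not is_speech:
                labels.append(NS)
            elif spk == target:
                labels.append(TSS)
            else:
                labels.append(NTSS)
    return target, labels

if __name__ == "__main__":
    pool = {
        "spk_a": [[False, True, True, False]],
        "spk_b": [[True, True, False]],
        "spk_c": [[False, True]],
    }
    target, labels = make_pvad_example(pool, random.Random(0))
    print(target, labels)
```

Passing a seeded `random.Random` instance keeps example generation reproducible, which matters when the goal is direct comparability with prior PVAD results.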
Tzu-Wei Wang
Tai-You Chen
Chien-Chia Chiu
Electronics
National Taiwan Normal University
National Chi Nan University
Wang et al. studied this question.
www.synapsesocial.com/papers/69d8955f6c1944d70ce064c3 — DOI: https://doi.org/10.3390/electronics15081561