March 3, 2026 · Open Access

Neural Wavelet Packet-Based Bidirectional Autoencoder for Multi-Resolution Speech Enhancement

Key Points

NWPA achieves state-of-the-art performance in speech enhancement, effectively balancing noise reduction and intelligibility.
Trainable Fast Discrete Wavelet Packet Transform filters decompose both approximation and detail sub-bands, capturing richer time-frequency features than fixed-wavelet approaches.
Comprehensive evaluations on the VoiceBank-DEMAND dataset highlight significant improvements in perceptual quality and intelligibility.
This framework may enable robust applications in diverse noisy environments, enhancing communication quality for various users.

Abstract

Speech enhancement is a critical challenge in signal processing, particularly in noisy environments where preserving intelligibility and perceptual quality is essential. Unlike conventional deep learning-based models that operate exclusively in either the time or frequency domain, we present an adaptive multi-resolution approach that enables superior noise suppression while meticulously preserving critical speech structures across diverse frequency bands. To this end, we introduce the Neural Wavelet Packet-Based Bidirectional Autoencoder (NWPA), a novel framework for multi-resolution speech enhancement. NWPA leverages the Fast Discrete Wavelet Packet Transform with trainable filters that jointly decompose both approximation and detail sub-bands, capturing richer time-frequency features than traditional fixed-wavelet approaches. A bidirectional autoencoder design reduces parameter overhead by unifying the encoding and decoding stages, while an improved Learnable Asymmetric Hard Thresholding function adaptively suppresses noise in the wavelet domain. Furthermore, a Sparsity-Enforcing Loss Function balances reconstruction fidelity with wavelet sparsity, preserving critical speech components across multiple resolutions. Comprehensive evaluations on the VoiceBank-DEMAND dataset demonstrate NWPA’s state-of-the-art performance, underscoring its effectiveness in both noise reduction and intelligibility preservation. These results highlight NWPA’s potential as a robust and scalable solution for speech enhancement under diverse noise conditions. The source code is available at: https://github.com/alaaNfissi/Neural-Wavelet-Packet-Based-Bidirectional-Autoencoder-for-Multi-Resolution-Speech-Enhancement.
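The pipeline described in the abstract (packet decomposition of both approximation and detail sub-bands, asymmetric thresholding in the wavelet domain, and a loss that trades reconstruction fidelity against sparsity) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It rests on several assumptions: fixed Haar filters stand in for NWPA's trainable filters, the thresholds `t_pos` and `t_neg` are plain constants rather than learned parameters, the loss is assumed to be mean squared error plus an L1 penalty weighted by a hypothetical `lam`, and all function names are invented for illustration.

```python
import numpy as np

# Assumption: fixed orthonormal Haar filters. NWPA trains its filters instead.
LOW = np.array([1.0, 1.0]) / np.sqrt(2.0)
HIGH = np.array([1.0, -1.0]) / np.sqrt(2.0)


def split(x):
    """One analysis step: (approximation, detail), each half the length of x."""
    pairs = x.reshape(-1, 2)
    return pairs @ LOW, pairs @ HIGH


def merge(approx, detail):
    """Inverse of split for the orthonormal Haar pair."""
    pairs = np.outer(approx, LOW) + np.outer(detail, HIGH)
    return pairs.reshape(-1)


def wavelet_packet(x, levels):
    """Full packet tree: every sub-band, approximation and detail, is split again.

    len(x) must be divisible by 2 ** levels.
    """
    bands = [np.asarray(x, dtype=float)]
    for _ in range(levels):
        bands = [half for band in bands for half in split(band)]
    return bands


def inverse_wavelet_packet(bands):
    """Merge adjacent sub-bands pairwise until one signal remains."""
    bands = list(bands)
    while len(bands) > 1:
        bands = [merge(bands[i], bands[i + 1]) for i in range(0, len(bands), 2)]
    return bands[0]


def asymmetric_hard_threshold(c, t_pos, t_neg):
    """Zero coefficients in [-t_neg, t_pos]; keep the rest unchanged.

    Assumption: a plain (non-differentiable) hard threshold with separate
    positive and negative cut-offs, not the paper's learnable variant.
    """
    keep = (c > t_pos) | (c < -t_neg)
    return np.where(keep, c, 0.0)


def sparsity_loss(clean, estimate, bands, lam=0.01):
    """Assumed form: reconstruction MSE plus lam times the mean |coefficient|."""
    mse = np.mean((clean - estimate) ** 2)
    l1 = sum(np.abs(b).sum() for b in bands) / sum(b.size for b in bands)
    return mse + lam * l1


def denoise(noisy, levels=3, t_pos=0.5, t_neg=0.5):
    """Decompose, threshold every sub-band, and reconstruct."""
    bands = wavelet_packet(noisy, levels)
    bands = [asymmetric_hard_threshold(b, t_pos, t_neg) for b in bands]
    return inverse_wavelet_packet(bands), bands
```

Because the Haar pair is orthonormal, `inverse_wavelet_packet(wavelet_packet(x, levels))` recovers `x` up to floating-point error, so any change in the output of `denoise` comes from the thresholding step alone. In a trainable setting the filters and thresholds would be optimized against the loss; here they are fixed only to keep the sketch self-contained.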
