Speech enhancement is a critical challenge in signal processing, particularly in noisy environments where preserving intelligibility and perceptual quality is essential. Unlike conventional deep learning-based models, which operate exclusively in either the time or the frequency domain, our adaptive multi-resolution approach suppresses noise while preserving critical speech structures across diverse frequency bands. To this end, we introduce the Neural Wavelet Packet-Based Bidirectional Autoencoder (NWPA), a novel framework for multi-resolution speech enhancement. NWPA uses the Fast Discrete Wavelet Packet Transform with trainable filters that jointly decompose both approximation and detail sub-bands, capturing richer time-frequency features than traditional fixed-wavelet approaches. A bidirectional autoencoder design reduces parameter overhead by unifying the encoding and decoding stages, while an improved Learnable Asymmetric Hard Thresholding function adaptively suppresses noise in the wavelet domain. In addition, a Sparsity-Enforcing Loss Function balances reconstruction fidelity against wavelet sparsity, preserving critical speech components across multiple resolutions. Comprehensive evaluations on the VoiceBank-DEMAND dataset demonstrate NWPA’s state-of-the-art performance in both noise reduction and intelligibility preservation. These results highlight NWPA’s potential as a robust and scalable solution for speech enhancement under diverse noise conditions. The source code is available at: https://github.com/alaaNfissi/Neural-Wavelet-Packet-Based-Bidirectional-Autoencoder-for-Multi-Resolution-Speech-Enhancement.
Nfissi et al. studied this question.
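To make the wavelet-domain pipeline described in the abstract concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes fixed Haar filters in place of NWPA's trainable filters, fixed thresholds in place of the learnable ones, and a simple mean-squared-error term plus a mean L1 penalty as a stand-in for the Sparsity-Enforcing Loss Function. All function names and parameter values are hypothetical, and the signal length is assumed to be divisible by 2 raised to the number of levels.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)

def haar_split(x):
    """One analysis step: split a band into approximation and detail halves."""
    even, odd = x[0::2], x[1::2]
    return (even + odd) / SQRT2, (even - odd) / SQRT2

def haar_merge(approx, detail):
    """Inverse of haar_split: interleave the two halves back into one band."""
    x = np.empty(2 * approx.size)
    x[0::2] = (approx + detail) / SQRT2
    x[1::2] = (approx - detail) / SQRT2
    return x

def wavelet_packet_decompose(x, levels):
    """Full packet tree: every band (approximation AND detail) is split again.

    Returns 2**levels leaf bands in natural (not frequency) order.
    """
    bands = [np.asarray(x, dtype=float)]
    for _ in range(levels):
        bands = [half for band in bands for half in haar_split(band)]
    return bands

def wavelet_packet_reconstruct(bands):
    """Merge sibling bands pairwise until a single signal remains."""
    bands = list(bands)
    while len(bands) > 1:
        bands = [haar_merge(bands[i], bands[i + 1])
                 for i in range(0, len(bands), 2)]
    return bands[0]

def asymmetric_hard_threshold(coeffs, t_pos, t_neg):
    """Zero coefficients inside (-t_neg, t_pos); thresholds are fixed here,
    whereas the abstract describes a learnable variant."""
    keep = (coeffs > t_pos) | (coeffs < -t_neg)
    return np.where(keep, coeffs, 0.0)

def sparsity_regularized_loss(clean, estimate, bands, weight):
    """Assumed form: reconstruction MSE plus a weighted mean L1 penalty."""
    mse = np.mean((clean - estimate) ** 2)
    l1 = sum(np.abs(b).sum() for b in bands) / sum(b.size for b in bands)
    return mse + weight * l1

# Usage: denoise a noisy sinusoid with a 3-level packet tree (8 sub-bands).
rng = np.random.default_rng(0)
t = np.arange(64)
clean = np.sin(2 * np.pi * t / 16)
noisy = clean + 0.1 * rng.standard_normal(64)

bands = wavelet_packet_decompose(noisy, levels=3)
denoised_bands = [asymmetric_hard_threshold(b, t_pos=0.3, t_neg=0.2) for b in bands]
estimate = wavelet_packet_reconstruct(denoised_bands)
loss = sparsity_regularized_loss(clean, estimate, denoised_bands, weight=0.01)
```

Because the Haar split is orthogonal, reconstructing the unthresholded bands recovers the input exactly, so any change in the output comes from the thresholding step alone. In a trainable setting, the filter taps and the two thresholds would become parameters optimized against the loss, with a differentiable surrogate replacing the hard comparison.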