Speech enhancement that preserves spatial cues is crucial for achieving both intelligibility and a "being-there" impression in speaker-dominant Ambisonics audio communication. Data-driven speech enhancement for Ambisonics input-output systems faces two key challenges. First, designing the target signal for reverberation shaping from a purely temporal perspective, as is done in single-channel scenarios, tends to degrade spatial perception. Second, the suitability of various filter matrix formulations for different target signals has not been systematically studied. To address the first challenge, we formulate target signal design as an Ambisonics room impulse response shaping problem and propose a spatial shaping based on maximum directivity, as well as a variant that passes the omnidirectional component losslessly. To estimate these target signals, we establish a neural filtering framework that covers both the spherical harmonic domain and the plane wave domain, with three filter matrix parameterizations: mask, beamform-and-project, and unconstrained matrix. Experiments show that the proposed spatio-temporal reverberation shaping yields a more natural spatial auditory impression of the target signal and further enhances spatial release from masking. They also show that the performance of neural filtering depends primarily on how well the rank of the filter matrix suits the signal spatial covariance matrices, rather than on the spatial domain transformation.
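The three filter matrix parameterizations named in the abstract differ mainly in the rank of the matrix applied to each time-frequency bin. The sketch below is a minimal illustration of that difference only, not the authors' implementation. It assumes a first-order Ambisonics signal with four channels, reads "mask" as a per-channel gain (a diagonal matrix), and reads "beamform-and-project" as a beamformer followed by a re-encoding vector (a rank-1 matrix). Both readings are assumptions, since the abstract does not give exact definitions. The variable names (`W_mask`, `W_bp`, `W_full`) are hypothetical, and random values stand in for the outputs a neural network would predict.

```python
import numpy as np

rng = np.random.default_rng(0)
C = 4  # assumed: first-order Ambisonics, 4 channels

def crandn(*shape):
    """Complex Gaussian samples, standing in for network outputs."""
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

x = crandn(C)  # Ambisonics coefficients of one time-frequency bin

# 1) Mask: one gain per channel, i.e. a diagonal filter matrix.
m = rng.uniform(0.1, 1.0, C)
W_mask = np.diag(m).astype(complex)

# 2) Beamform-and-project: a beamformer w reduces x to one scalar,
#    which a projection vector p spreads back over all channels.
#    The resulting matrix p w^H has rank 1.
w = crandn(C)
p = crandn(C)
W_bp = np.outer(p, w.conj())

# 3) Unconstrained: every entry of the C x C matrix is free.
W_full = crandn(C, C)

filters = {"mask": W_mask, "beamform_and_project": W_bp, "unconstrained": W_full}
ranks = {name: int(np.linalg.matrix_rank(W)) for name, W in filters.items()}
outputs = {name: W @ x for name, W in filters.items()}

for name in filters:
    print(f"{name}: rank {ranks[name]}")
```

Under these assumptions, a rank-1 filter can only reproduce an output whose spatial covariance matrix is itself rank 1, such as a single direct-path source, whereas a target that retains shaped reverberation generally has a higher-rank covariance and needs a higher-rank filter matrix. This is one way to read the abstract's remark that performance depends on how well the filter matrix's rank suits the signal spatial covariance matrices.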
Shiqi Wang
Hongbing Qiu
Xiyu Song
The Journal of the Acoustical Society of America
Guilin University of Electronic Technology
Guilin University of Technology
Wang et al. (Wed,) studied this question.
www.synapsesocial.com/papers/69d896566c1944d70ce07bdf — DOI: https://doi.org/10.1121/10.0043334