Detecting transient “click” sounds during connector insertion is pivotal for automotive assembly quality but remains challenging under high-intensity, non-stationary industrial noise. This paper introduces a physics-aware generative demasking framework that integrates acoustic spatial priors with conditional diffusion modeling. We propose the spatially conditioned diffusion probabilistic model (SC-DPM), in which an ambient reference signal acts as a physical constraint that steers the reverse diffusion process. By exploiting the spatial decay of insertion sounds, this mechanism disentangles the target transient from the background noise manifold and reconstructs high-fidelity spectro-temporal features. Discriminative temporal patterns are then extracted using random convolutional kernels with causal dilations and local proportion of positive values (LPPV) pooling. Experiments on real-world datasets yield a classification accuracy of 93.3%. The proposed “restore-then-classify” paradigm improves robustness against acoustic variability, establishing a scalable methodology for precise industrial monitoring under extreme noise conditions.
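To make the feature-extraction step more concrete, the following is a minimal NumPy sketch of random convolutional kernels applied with causal dilations and pooled by a local proportion of positive values. It is an illustration under stated assumptions, not the authors' implementation. The kernel lengths {7, 9, 11}, the mean-centred Gaussian weights, the uniform bias in [-1, 1] and the exponential dilation sampling are borrowed from the ROCKET family of random-kernel classifiers. "Local" PPV is assumed here to mean PPV computed over non-overlapping windows of a hypothetical size `window`. All function names are hypothetical.

```python
import numpy as np


def causal_dilated_conv(x, weights, bias, dilation):
    """Causal 1-D convolution: y[t] = bias + sum_j w[j] * x[t - j*dilation].

    Left zero-padding ensures y[t] never depends on future samples.
    """
    k = len(weights)
    pad = (k - 1) * dilation
    xp = np.concatenate([np.zeros(pad), np.asarray(x, dtype=float)])
    y = np.full(len(x), bias, dtype=float)
    for j, w in enumerate(weights):
        # Tap j looks j*dilation samples into the past.
        start = pad - j * dilation
        y += w * xp[start:start + len(x)]
    return y


def local_ppv(y, window):
    """Assumed LPPV: fraction of positive outputs in each non-overlapping window."""
    n_win = len(y) // window
    blocks = y[: n_win * window].reshape(n_win, window)
    return (blocks > 0).mean(axis=1)


def random_kernel(rng, series_len):
    """ROCKET-style random kernel (assumed hyperparameters, not from the paper)."""
    k = int(rng.choice([7, 9, 11]))
    w = rng.normal(0.0, 1.0, k)
    w -= w.mean()  # mean-centred weights
    b = rng.uniform(-1.0, 1.0)
    # Dilation sampled on an exponential scale so the receptive field fits the series.
    max_exp = np.log2((series_len - 1) / (k - 1))
    d = int(2 ** rng.uniform(0.0, max_exp))
    return w, b, d


def extract_features(x, n_kernels=100, window=64, seed=0):
    """Concatenate per-window PPV features from n_kernels random causal kernels."""
    rng = np.random.default_rng(seed)
    feats = []
    for _ in range(n_kernels):
        w, b, d = random_kernel(rng, len(x))
        y = causal_dilated_conv(x, w, b, d)
        feats.append(local_ppv(y, window))
    return np.concatenate(feats)
```

For example, `extract_features(signal, n_kernels=10, window=128)` on a 1024-sample `signal` returns 10 × 8 = 80 features, each in [0, 1]. Causal padding keeps every feature dependent only on past and present samples. Windowed PPV, unlike a single global PPV per kernel, preserves coarse information about *when* a transient occurs within the clip.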
Cao et al. studied this question.