Rolling bearings exhibit nonlinear and non-stationary fault signals under complex working conditions, rendering single-modal representation insufficient for accurate diagnosis. To address this limitation, this paper proposes a novel parallel multimodal fusion fault diagnosis model based on a Gated Recurrent Unit (GRU), a Residual Network (ResNet), and a Convolutional Block Attention Module (CBAM). First, a systematic multimodal representation selection framework is introduced, which identifies the Markov Transition Field (MTF) as the optimal two-dimensional (2D) image modality because of its superior texture clarity and noise resistance compared with the other candidate representations. Second, a parallel dual-branch architecture is designed to process the heterogeneous data simultaneously. The 1D GRU branch captures long-range temporal dependencies directly from raw vibration signals, while the 2D ResNet-CBAM branch extracts deep spatial features from the MTF images, adaptively focusing on key fault regions. These heterogeneous features are then fused through concatenation to retain complementary diagnostic information. Experimental validation on the Case Western Reserve University (CWRU) dataset demonstrates that the proposed model achieves 99.57% accuracy on a 10-class classification task. Furthermore, it exhibits significant parameter efficiency and outstanding robustness: accuracy decreases by no more than 1.2% under noise interference and cross-load scenarios, and the model outperforms existing single-modal and advanced fusion methods.
Xu et al. studied this question.
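To illustrate how a 1D vibration segment can be turned into the MTF image consumed by the 2D branch, the following is a minimal sketch of the standard Markov Transition Field construction in plain Python. It is not the authors' implementation: the function name `markov_transition_field`, the quantile binning scheme, and the default `n_bins=4` are assumptions made for illustration, since the abstract does not state the binning or image size that was used.

```python
from bisect import bisect_right


def markov_transition_field(signal, n_bins=4):
    """Return the MTF of a 1D signal as an n x n list of lists (n = len(signal)).

    Hypothetical helper for illustration; the bin count and quantile
    binning are assumptions, not values taken from the paper.
    """
    n = len(signal)
    if n < 2:
        raise ValueError("signal needs at least two samples")

    # Quantile bin edges: n_bins - 1 interior cut points from the sorted samples.
    ordered = sorted(signal)
    edges = [ordered[(k * n) // n_bins] for k in range(1, n_bins)]

    # Map each sample to a bin index in [0, n_bins - 1].
    states = [bisect_right(edges, x) for x in signal]

    # Count first-order transitions between consecutive samples.
    counts = [[0] * n_bins for _ in range(n_bins)]
    for a, b in zip(states, states[1:]):
        counts[a][b] += 1

    # Row-normalise counts into transition probabilities
    # (rows that are never left stay all-zero).
    probs = []
    for row in counts:
        total = sum(row)
        probs.append([c / total if total else 0.0 for c in row])

    # MTF[i][j] = probability of moving from the bin of sample i
    # to the bin of sample j.
    return [[probs[si][sj] for sj in states] for si in states]


# Usage: a short alternating signal, binned into two states.
field = markov_transition_field([0, 1, 0, 1, 0, 1, 0, 1], n_bins=2)
```

The resulting square matrix can be rendered as a grayscale image. In practice, long segments are usually downsampled or block-averaged before being passed to a convolutional network, a step this sketch omits.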