Rolling bearing vibration signals are inherently non-stationary, and their fault-related characteristics appear in complementary forms in the time and frequency domains. Effectively modeling such heterogeneous information remains a key challenge for robust fault diagnosis under complex operating conditions. To address this challenge, this study proposes a hybrid fault diagnosis framework based on parallel dual-channel feature extraction and Transformer-based early fusion. In the proposed architecture, raw vibration signals are processed simultaneously by a time-domain branch and a frequency-domain branch. This preserves domain-specific characteristics and avoids the information loss that can occur when the two domains are processed sequentially. Because the features of both branches are fused at an early stage, before the Transformer encoder, its self-attention mechanism can capture global dependencies and cross-domain interactions, thereby enhancing discriminative capability and robustness in noisy environments. Experimental results on the Case Western Reserve University (CWRU) dataset show that the proposed method achieves a classification accuracy of 99.99%. Furthermore, cross-dataset validation on the XJTU-SY bearing dataset, together with noise robustness and ablation studies, confirms the effectiveness, generalization ability, and structural rationality of the proposed parallel time–frequency fusion strategy. These findings indicate that the proposed framework provides a reliable and practical solution for rolling bearing fault diagnosis in real-world industrial scenarios.
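The abstract does not specify the layers inside each branch or the exact fusion operator. The sketch below is an untrained, minimal illustration of the general idea, assuming one plausible reading of "early fusion": each branch turns signal segments into feature tokens, the two token sequences are concatenated into one sequence, and a single self-attention pass relates time tokens to frequency tokens. The function names (`time_tokens`, `freq_tokens`, `self_attention`, `fuse_and_attend`), the segment length of 16, the four hand-crafted time features, the four-band spectral pooling, the identity Q/K/V projections, and the 0/1 domain flag are all hypothetical stand-ins for learned components. They are not the authors' method.

```python
import cmath
import math


def segment(signal, patch):
    """Split a 1-D signal into non-overlapping segments of length `patch`."""
    return [signal[i:i + patch] for i in range(0, len(signal) - patch + 1, patch)]


def time_tokens(signal, patch):
    """Hypothetical time-domain branch: per-segment mean, std, RMS, and peak."""
    tokens = []
    for seg in segment(signal, patch):
        n = len(seg)
        mean = sum(seg) / n
        std = math.sqrt(sum((x - mean) ** 2 for x in seg) / n)
        rms = math.sqrt(sum(x * x for x in seg) / n)
        peak = max(abs(x) for x in seg)
        tokens.append([mean, std, rms, peak])
    return tokens


def freq_tokens(signal, patch, bands=4):
    """Hypothetical frequency-domain branch: per-segment DFT magnitudes
    (normalized by segment length), averaged over `bands` equal-width
    bands of the one-sided spectrum."""
    tokens = []
    for seg in segment(signal, patch):
        n = len(seg)
        half = n // 2  # one-sided spectrum: bins 0 .. half-1
        mags = [abs(sum(seg[t] * cmath.exp(-2j * math.pi * k * t / n)
                        for t in range(n))) / n
                for k in range(half)]
        width = half // bands
        tokens.append([sum(mags[b * width:(b + 1) * width]) / width
                       for b in range(bands)])
    return tokens


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product attention, softmax(Q K^T / sqrt(d)) V,
    with identity projections (Q = K = V = tokens) as a simplification."""
    d = len(tokens[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for q in tokens:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, tokens))
                    for j in range(d)])
    return out


def fuse_and_attend(signal, patch=16):
    """Early fusion: tag each token with its domain (0 = time, 1 = frequency,
    a crude stand-in for a learned domain embedding), concatenate both
    sequences, attend across all tokens, then mean-pool to one vector."""
    fused = ([tok + [0.0] for tok in time_tokens(signal, patch)]
             + [tok + [1.0] for tok in freq_tokens(signal, patch)])
    attended = self_attention(fused)
    d = len(attended[0])
    return [sum(tok[j] for tok in attended) / len(attended) for j in range(d)]


if __name__ == "__main__":
    # Synthetic signal (not bearing data): one sine cycle per 16-sample segment.
    sig = [math.sin(2 * math.pi * 4 * t / 64) for t in range(64)]
    print(len(time_tokens(sig, 16)), len(freq_tokens(sig, 16)))  # 4 4
    print(fuse_and_attend(sig))  # one 5-dimensional pooled feature vector
```

Concatenating along the token axis, rather than averaging the two branch outputs, is what lets every attention weight compare a time-domain token with a frequency-domain token directly. In a trained model the hand-crafted features, identity projections, and domain flag would be replaced by learned layers, and the pooled vector would feed a classification head.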
Ning et al. (Sun,) studied this question.