Fault diagnosis in rotating machinery is critical for industrial safety, yet existing methods struggle with noise interference and the limitations of unimodal features. This study proposes a Heterogeneous Modal Interaction Transformer (HMIT) that integrates time-series, time-frequency, and spatial features via a sparse transformer, enabling cross-modal fusion. The method adaptively tunes complex fault signals and provides robust feature inputs for diagnostic models. Because unimodal feature data cannot fully capture the health status of mechanical equipment, the study expands the modal features derived from time-series signals, constructing a complementary feature set from time-domain signals, time-frequency energy spectra, and relative position matrices. To enable deep fusion of these heterogeneous modalities, the HMIT framework is optimized across key stages, including feature encoding, multimodal fusion, and global correlation modeling. Case studies on both public and in-house datasets validate the effectiveness of HMIT: the method maintains high accuracy and robustness even under strong noise and complex fault scenarios. Specifically, experiments on the YZU and SEU datasets show a 6.86% improvement in accuracy at an SNR of −9 dB, outperforming conventional models.
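To make the −9 dB SNR condition in the abstract concrete, the sketch below shows one common way to corrupt a signal at a target SNR. It is an illustration only: the abstract does not state how noise was injected, so additive white Gaussian noise is an assumption here, and the function names (`add_noise_at_snr`, `measured_snr_db`) and the 50 Hz test tone are hypothetical and not taken from the paper.

```python
import math
import random


def add_noise_at_snr(signal, snr_db, rng=None):
    """Return (noisy_signal, noise) with white Gaussian noise at the given SNR in dB.

    Assumption: additive white Gaussian noise; the paper's procedure is not stated.
    """
    if rng is None:
        rng = random.Random()
    signal_power = sum(x * x for x in signal) / len(signal)
    # SNR_dB = 10 * log10(P_signal / P_noise)  =>  P_noise = P_signal / 10^(SNR_dB / 10)
    noise_power = signal_power / (10 ** (snr_db / 10))
    raw = [rng.gauss(0.0, 1.0) for _ in signal]
    # Rescale the draw so its empirical power matches the target exactly.
    raw_power = sum(n * n for n in raw) / len(raw)
    scale = math.sqrt(noise_power / raw_power)
    noise = [n * scale for n in raw]
    noisy = [x + n for x, n in zip(signal, noise)]
    return noisy, noise


def measured_snr_db(signal, noise):
    """Compute the SNR in dB from the clean signal and the noise that was added."""
    p_signal = sum(x * x for x in signal) / len(signal)
    p_noise = sum(n * n for n in noise) / len(noise)
    return 10 * math.log10(p_signal / p_noise)


# Hypothetical test signal: a 50 Hz sine sampled at 1 kHz for one second.
fs = 1000
clean = [math.sin(2 * math.pi * 50 * t / fs) for t in range(fs)]
noisy, noise = add_noise_at_snr(clean, snr_db=-9.0, rng=random.Random(0))
snr = measured_snr_db(clean, noise)
power_ratio = (sum(n * n for n in noise) / len(noise)) / (
    sum(x * x for x in clean) / len(clean)
)
```

At −9 dB the noise power is 10^0.9 times the signal power, so the noise carries roughly eight times the energy of the fault signal itself, which is why this setting is a demanding robustness test.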
Lin Zhu
Jianxin Wu
Lintong Liu
Proceedings of the Institution of Mechanical Engineers, Part C: Journal of Mechanical Engineering Science
Yangzhou University
Xi’an Jiaotong-Liverpool University
Zhu et al. studied this question.
synapsesocial.com/papers/69fd7f65bfa21ec5bbf07e15 — DOI: https://doi.org/10.1177/09544062261446736