Purpose: High-reliability systems rarely fail, resulting in extreme fault-data scarcity and severe class imbalance. These limitations hinder the application of machine learning models for Remaining Useful Life (RUL) prediction and fault diagnosis. This study proposes a hybrid data augmentation framework to overcome these challenges.
Methods: A CTGAN-based hybrid augmentation approach was developed by combining environmental sensitivity analysis (XGBoost-SHAP), prototype clustering (K-means), and multi-stage post-processing (ProtoMix, weighted noise injection, Soft Capping, and CDF matching). The framework is designed to generate physically consistent and statistically realistic synthetic fault data.
Results: Under an extreme stress test with 37,982 normal samples and only 15 fault samples, the Hybrid GAN outperformed the Baseline, SMOTE, and Basic GAN approaches. It achieved the highest accuracy (0.942), along with improved F1-score (0.905) and precision (0.964), and reduced false positives by nearly 50% compared with the Basic GAN.
Conclusion: The hybrid framework effectively mitigated class imbalance and delivered robust predictive performance under challenging conditions, demonstrating strong potential for predictive maintenance in high-reliability systems.
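To make one of the post-processing steps named in Methods more concrete, the sketch below illustrates CDF matching as plain empirical quantile mapping, together with the imbalance ratio implied by the reported sample counts. This is only an illustration under stated assumptions: the abstract does not specify how the study implements CDF matching, so the function name `cdf_match`, the single-feature setup, and the randomly generated example data are all hypothetical.

```python
import numpy as np

# Sample counts reported in the Results; the ratio is simple arithmetic.
n_normal, n_fault = 37_982, 15
imbalance_ratio = n_normal / n_fault  # roughly 2532 normal samples per fault sample


def cdf_match(synthetic, reference):
    """Map synthetic values onto the empirical distribution of `reference`.

    Hypothetical helper: generic quantile mapping, not the study's own code.
    """
    synthetic = np.asarray(synthetic, dtype=float)
    reference = np.sort(np.asarray(reference, dtype=float))

    # Rank each synthetic value within its own sample and scale to (0, 1).
    ranks = np.argsort(np.argsort(synthetic))
    quantiles = (ranks + 0.5) / synthetic.size

    # Evaluate the reference's empirical quantile function at those levels.
    # np.interp clamps at the ends, so outputs stay within the reference range.
    ref_levels = (np.arange(reference.size) + 0.5) / reference.size
    return np.interp(quantiles, ref_levels, reference)


# Illustrative data only (assumed, not from the study):
# 15 "real" fault readings and 200 raw generator outputs for one feature.
rng = np.random.default_rng(0)
real_faults = rng.normal(loc=50.0, scale=5.0, size=n_fault)
raw_synthetic = rng.normal(loc=55.0, scale=12.0, size=200)

matched = cdf_match(raw_synthetic, real_faults)
print(f"imbalance ratio ~ {imbalance_ratio:.0f}:1")
print(f"matched range: [{matched.min():.2f}, {matched.max():.2f}]")
```

Mapping through ranks keeps the ordering of the generated samples while pulling their marginal distribution toward the observed fault values. With only 15 reference points the empirical CDF is coarse, which is one reason such a step would plausibly be combined with the other post-processing stages rather than used alone.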
Lee et al. (Wed,) studied this question.