Abstract
Monitoring photovoltaic (PV) plants requires physically meaningful techniques for detecting deviations of observed behavior from expected behavior. Data-driven methods often lack physical consistency, whereas physics-based models may fail to capture practical variations in operating behavior. In this study, we present a physics-driven machine learning methodology for estimating fault severity in grid-connected PV systems. The methodology combines a calibrated PVsyst-based model of expected power generation with supervisory control and data acquisition (SCADA) data collected from two 56.32 kWp PV plants. Linear bias correction of the simulation outputs improved the agreement between predicted and measured power, yielding coefficients of determination (R²) of 0.685 and 0.566 for Plants 1 and 2, respectively. A Gaussian mixture model (GMM) tuned with the Bayesian information criterion (BIC) identified a statistical fault-severity threshold of R_norm = -0.161 for discriminating between normal and severe states. For prediction, a random forest classifier trained on seven physics-aware features achieved an accuracy of 0.920 and a macro-F1 score of 0.907 under walk-forward validation. Benchmarking against XGBoost, support vector machine, and decision tree classifiers showed that the random forest offered a better balance of classification performance. Finally, an explainability analysis confirmed the importance of irradiance, expected power, and conversion efficiency features.
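The abstract does not spell out how the bias correction and severity threshold are computed, so the following is only a minimal pure-Python sketch of one plausible reading, not the authors' implementation. It assumes (i) that R_norm is a normalized residual (P_meas - P_exp) / P_exp, which the abstract does not define; (ii) that "BIC tuning" means choosing the number of mixture components, here from the hypothetical candidates 1, 2 and 3; and (iii) that the threshold is the R_norm value at which the posterior probability of the lowest-mean ("severe") component drops below 0.5. All function names are hypothetical, the hand-written EM loop stands in for a library GMM such as scikit-learn's `GaussianMixture`, and the sample data are synthetic.

```python
import math
import random


def linear_bias_correction(p_sim, p_meas):
    """Fit p_meas ~ a * p_sim + b by ordinary least squares; return (a, b, r2)."""
    n = len(p_sim)
    mx, my = sum(p_sim) / n, sum(p_meas) / n
    sxx = sum((x - mx) ** 2 for x in p_sim)
    sxy = sum((x - mx) * (y - my) for x, y in zip(p_sim, p_meas))
    a = sxy / sxx
    b = my - a * mx
    ss_res = sum((y - (a * x + b)) ** 2 for x, y in zip(p_sim, p_meas))
    ss_tot = sum((y - my) ** 2 for y in p_meas)
    return a, b, 1.0 - ss_res / ss_tot


def normalized_residual(p_meas, p_exp):
    """ASSUMED R_norm definition: (P_meas - P_exp) / P_exp; requires P_exp > 0."""
    return [(m - e) / e for m, e in zip(p_meas, p_exp)]


def normal_pdf(x, mu, var):
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_1d(data, k, iters=100):
    """Plain EM for a 1-D Gaussian mixture; means start evenly spread over the data range."""
    n = len(data)
    lo, hi = min(data), max(data)
    means = [lo + (j + 0.5) * (hi - lo) / k for j in range(k)]
    mu_all = sum(data) / n
    variances = [sum((x - mu_all) ** 2 for x in data) / n] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each sample.
        resp = []
        for x in data:
            p = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            s = sum(p) or 1e-300
            resp.append([pj / s for pj in p])
        # M-step: re-estimate weights, means and variances (floored to avoid collapse).
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var_j, 1e-6)
    loglik = sum(
        math.log(sum(w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)) or 1e-300)
        for x in data
    )
    return weights, means, variances, loglik


def bic(loglik, k, n):
    """BIC of a 1-D GMM with k means, k variances and k - 1 free mixing weights."""
    return -2.0 * loglik + (3 * k - 1) * math.log(n)


def severity_threshold(data, k_candidates=(1, 2, 3), steps=2000):
    """Choose k by minimum BIC, then scan upward from the lowest-mean ("severe")
    component and return the first value where its posterior drops below 0.5."""
    fits = {k: fit_gmm_1d(data, k) for k in k_candidates}
    best_k = min(fits, key=lambda k: bic(fits[k][3], k, len(data)))
    if best_k == 1:
        return best_k, None  # a single mode: no separate severe state
    weights, means, variances, _ = fits[best_k]
    order = sorted(range(best_k), key=lambda j: means[j])
    severe, lo, hi = order[0], means[order[0]], means[order[-1]]
    for i in range(steps + 1):
        x = lo + (hi - lo) * i / steps
        dens = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
        if dens[severe] / (sum(dens) or 1e-300) < 0.5:
            return best_k, x
    return best_k, hi


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic R_norm sample: mostly near 0 (normal), a minority strongly negative (severe).
    r_norm = [rng.gauss(0.0, 0.05) for _ in range(600)] + [rng.gauss(-0.5, 0.1) for _ in range(120)]
    k, thr = severity_threshold(r_norm)
    print("BIC-selected components:", k, "threshold:", thr)
```

On this synthetic sample, BIC should prefer two components and the threshold should fall between the two cluster means; the value is not comparable to the reported R_norm = -0.161, which comes from measured SCADA data. Scanning a grid keeps the boundary search simple: for two components it is equivalent to solving w0 N0(x) = w1 N1(x) between the component means.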
Mohajon et al. studied this question.