What question did this study set out to answer?

This study aims to develop a physics-guided machine learning framework to detect performance deviations in photovoltaic systems and assess fault severity.

May 31, 2026. Open Access

A Physics-Guided Explainable Machine Learning Framework for Residual-Based Performance Deviation Detection and Probabilistic Severity Assessment in Photovoltaic Systems

Key Points

Combined a calibrated PVsyst expected power generation model with supervisory control and data acquisition (SCADA) data from two 56.32 kilowatt-peak photovoltaic plants.
Applied linear bias correction to the simulation results to improve the correlation between predicted and actual power generation.
Trained a random forest classifier on seven physics-aware variables for fault severity prediction.
Achieved coefficient of determination values of 0.685 and 0.566 for Plants 1 and 2, respectively, after bias correction.
Determined a statistical fault-severity threshold of Rnorm = -0.161, using a Gaussian mixture model tuned with the Bayesian information criterion, to separate normal from severe states.
Attained accuracy and macro-F1 scores of 0.920 and 0.907, respectively, for the random forest classifier under walk-forward validation.
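
To make the residual-based idea in the key points concrete, the following is a minimal sketch and not the authors' implementation. It assumes the bias correction is an ordinary least-squares fit of measured power on simulated power, and that the normalized residual is defined as (measured - expected) / expected; the summary does not state either definition. All power values and function names are hypothetical. Only the threshold value -0.161 comes from the study.

```python
from statistics import mean

def fit_linear_bias(simulated, measured):
    """Least-squares fit of measured ~ a * simulated + b (assumed form)."""
    mx, my = mean(simulated), mean(measured)
    sxx = sum((x - mx) ** 2 for x in simulated)
    sxy = sum((x - mx) * (y - my) for x, y in zip(simulated, measured))
    a = sxy / sxx
    b = my - a * mx
    return a, b

def r_squared(measured, predicted):
    """Coefficient of determination of predictions against measurements."""
    my = mean(measured)
    ss_res = sum((y - p) ** 2 for y, p in zip(measured, predicted))
    ss_tot = sum((y - my) ** 2 for y in measured)
    return 1.0 - ss_res / ss_tot

def normalized_residual(measured, expected):
    # ASSUMPTION: the summary does not define Rnorm; this is one common choice.
    return (measured - expected) / expected

SEVERITY_THRESHOLD = -0.161  # value reported in the study

def severity_label(r_norm, threshold=SEVERITY_THRESHOLD):
    """Label a sample 'severe' when its residual falls below the threshold."""
    return "severe" if r_norm < threshold else "normal"

# Hypothetical calibration data in kW (not from the paper).
simulated = [10.0, 20.0, 30.0, 40.0, 50.0]
measured = [9.2, 17.8, 27.1, 35.9, 45.0]

a, b = fit_linear_bias(simulated, measured)
corrected = [a * x + b for x in simulated]
r2 = r_squared(measured, corrected)

# Two hypothetical new observations at a simulated output of 40 kW.
expected_now = a * 40.0 + b
labels = [severity_label(normalized_residual(p, expected_now)) for p in (35.0, 28.0)]
print(a, b, r2, labels)
```

In this toy data the 28 kW reading sits about 22% below the corrected expectation, so it falls under the threshold, while the 35 kW reading does not.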

Abstract

Monitoring photovoltaic plants entails the use of techniques that are physically meaningful for detecting deviations in the observed behavior from what is expected. Data-driven methods frequently suffer from poor physical consistency, whereas physics-based models may fail to account for practical variations in operational behavior. In this study, we present a physics-driven machine learning methodology for estimating fault severity in grid-connected photovoltaic systems. The proposed methodology combines a calibrated PVsyst-based expected power generation model and supervisory control and data acquisition data collected from two 56.32 kilowatt-peak photovoltaic plants. Linear bias correction of the simulation results enhanced the correlation between the predicted and actual power generation levels, with coefficient of determination values of 0.685 and 0.566 for Plants 1 and 2, respectively. The Gaussian mixture model approach coupled with Bayesian information criterion tuning revealed the statistical fault-severity threshold, Rnorm = -0.161, for the discrimination between normal and severe states. For the prediction task, a random forest classifier was trained on seven physics-aware variables and achieved accuracy and macro-F1 scores of 0.920 and 0.907, respectively, through walk-forward validation. Comparative benchmarking with other classifiers, including XGBoost, support vector machine, and decision tree algorithms, showed a better classification performance balance. In addition, the explainability study verified the importance of irradiance, expected power, and conversion efficiency metrics.
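
The evaluation protocol named in the abstract can be illustrated with a short sketch. It assumes an expanding training window with equally sized, time-ordered test blocks; the abstract does not say whether the window expands or rolls, nor how many folds were used. The function names, fold sizes, and labels below are hypothetical. The macro-F1 function follows the standard definition: the unweighted mean of per-class F1 scores.

```python
def walk_forward_splits(n_samples, n_folds, min_train):
    """Yield (train_idx, test_idx) pairs in time order.

    ASSUMPTION: expanding window; each fold trains on everything before
    its test block, so no future sample leaks into training.
    """
    test_size = (n_samples - min_train) // n_folds
    if test_size < 1:
        raise ValueError("not enough samples for the requested folds")
    for k in range(n_folds):
        train_end = min_train + k * test_size
        test_end = train_end + test_size
        yield list(range(train_end)), list(range(train_end, test_end))

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Ten time-ordered samples, three folds, at least four training samples.
splits = list(walk_forward_splits(n_samples=10, n_folds=3, min_train=4))

# Hypothetical labels for one fold (not results from the paper).
y_true = ["normal", "normal", "severe", "severe"]
y_pred = ["normal", "severe", "severe", "severe"]
score = macro_f1(y_true, y_pred)
print(splits, score)
```

Macro-F1 weights each severity class equally regardless of how often it occurs, which is presumably why it is reported alongside accuracy when severe states are rarer than normal ones.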

Cite This Study

Mohajon et al. studied this question.

synapsesocial.com/papers/6a1bd0845783ba022b6fc45f https://doi.org/10.1093/ce/zkag026