What question did this study set out to answer?

This research aims to evaluate the effectiveness of four anomaly detection algorithms in handling extreme class imbalance in SCADA data.

February 14, 2026 · Open Access

Evaluating Reconstruction-Based and Proximity-Based Methods: A Four-Way Comparison (AE, LSTM-AE, OCSVM, IF) in SCADA Anomaly Detection Under Inverted Imbalance

Key Points

Comparison of four unsupervised anomaly detection algorithms: AE, LSTM-AE, OCSVM, and IF.
Analysis focused on telemetry data from an urban wind turbine.
Investigation of performance metrics, including AUC, recall, and F1-score.
AE achieved the best performance (AUC 0.9667), with high recall for both classes (Recall Normal ≈ 95%, Recall Anomaly ≈ 88.5%).
IF showed strong discriminative ability (AUC 0.8616) but failed to recognize the normal class (Recall Normal 0.00) at the optimal F1-score threshold.
IF's effectiveness was limited by the need to apply a classification threshold imposed by the class fractions of the anomaly-dominated dataset.
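
The metrics named in these key points (AUC, per-class recall, macro F1-score) can be computed from ground-truth labels and anomaly scores roughly as in the minimal sketch below. It runs on invented toy values and is not the study's evaluation code; the label convention (1 = anomaly, 0 = normal), the 0.5 threshold, and all function names are assumptions made for illustration.

```python
# Toy illustration of the reported metrics; labels and scores are invented.
# Assumed convention: 1 = anomaly, 0 = normal; higher score = more anomalous.

def recall(y_true, y_pred, cls):
    """Fraction of samples whose true class is `cls` that were predicted as `cls`."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
    total = sum(1 for t in y_true if t == cls)
    return hits / total if total else 0.0

def f1(y_true, y_pred, cls):
    """F1-score treating `cls` as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1-scores (normal and anomaly)."""
    return (f1(y_true, y_pred, 0) + f1(y_true, y_pred, 1)) / 2

def roc_auc(y_true, scores):
    """AUC as the probability that a random anomaly outscores a random normal
    sample (ties count as one half). Pairwise, so only suited to small inputs."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

y_true = [0, 0, 1, 1, 1, 1]
scores = [0.1, 0.6, 0.4, 0.7, 0.8, 0.9]
y_pred = [1 if s > 0.5 else 0 for s in scores]  # hypothetical threshold

print("Recall Normal :", recall(y_true, y_pred, 0))
print("Recall Anomaly:", recall(y_true, y_pred, 1))
print("Macro F1      :", macro_f1(y_true, y_pred))
print("AUC           :", roc_auc(y_true, scores))
```

Note that AUC is computed from the raw scores and needs no threshold, whereas recall and F1 depend on the threshold chosen. This is why a model can have a strong AUC and still show poor per-class recall.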

Abstract

This article investigates and compares four unsupervised anomaly detection algorithms: the Autoencoder (AE), LSTM-Autoencoder (LSTM-AE), One-Class SVM (OCSVM), and the Isolation Forest (IF). The analysis focuses on SCADA telemetry data from an urban wind turbine, characterized by a unique case of extreme inverted class imbalance, where operational anomalies constitute 75.7% of the records. The AE model, trained exclusively on the rare normal state, achieved the best overall performance (AUC 0.9667), maintaining balanced and high classification effectiveness for both classes (Recall Normal ≈ 95%, Recall Anomaly ≈ 88.5%; Macro F1-Score 0.8962). In contrast, the IF model, despite a strong discriminative ability (AUC 0.8616), exhibited a complete inability to correctly recognize the normal class (Recall Normal 0.00) when using the optimal F1-score threshold. This performance degradation was a direct consequence of the necessity to apply a classification threshold imposed by the statistical fraction of the anomaly-dominated dataset. These results empirically demonstrate the methodological superiority of the reconstruction-based approach (AE) in constructing a stable decision boundary independent of the statistically dominant class. The study provides quantitative guidelines for the selection and calibration of algorithms in PHM diagnostic systems where states deviating from the operational norm constitute the majority.
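
To make the thresholding contrast concrete, the sketch below compares two ways of turning anomaly scores into labels: a threshold taken from a quantile of scores on normal-only training data (the style of calibration available to a reconstruction-based model), and a threshold chosen to maximize the anomaly-class F1-score on an anomaly-dominated set. The scores are invented and deliberately overlapping; the function names, the 0.95 quantile, and the rule "score > threshold means anomaly" are assumptions, not details taken from the study.

```python
import math

# Assumed convention: 1 = anomaly, 0 = normal; a sample is flagged as an
# anomaly when its score is strictly greater than the threshold.

def threshold_from_normal(normal_scores, q=0.95):
    """Threshold = empirical q-quantile of scores seen on normal-only data."""
    ordered = sorted(normal_scores)
    idx = max(0, math.ceil(q * len(ordered)) - 1)
    return ordered[idx]

def anomaly_f1(scores, labels, thr):
    """F1-score of the anomaly class at a given threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if s > thr and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s > thr and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s <= thr and y == 1)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def threshold_max_f1(scores, labels):
    """Scan candidate thresholds and keep the one with the best anomaly F1.
    -inf is included so that 'flag everything' is a candidate."""
    candidates = [float("-inf")] + sorted(set(scores))
    return max(candidates, key=lambda t: anomaly_f1(scores, labels, t))

def recall_normal(scores, labels, thr):
    """Fraction of normal samples that are not flagged."""
    normals = [s for s, y in zip(scores, labels) if y == 0]
    return sum(1 for s in normals if s <= thr) / len(normals)

# Invented toy scores: 4 normal and 12 anomalous records (75% anomalies),
# loosely echoing an anomaly-dominated dataset.
normal = [0.2, 0.4, 0.5, 0.6]
anomalous = [0.1, 0.3, 0.45, 0.55, 0.7, 0.75, 0.8, 0.85, 0.9, 0.92, 0.95, 0.99]
scores = normal + anomalous
labels = [0] * len(normal) + [1] * len(anomalous)

thr_normal = threshold_from_normal(normal)   # calibrated on normal data only
thr_f1 = threshold_max_f1(scores, labels)    # driven by the class mix

print("normal-only threshold:", thr_normal,
      "-> Recall Normal", recall_normal(scores, labels, thr_normal))
print("max-F1 threshold     :", thr_f1,
      "-> Recall Normal", recall_normal(scores, labels, thr_f1))
```

On these toy values the F1-optimal rule ends up flagging every record, because with anomalies in the majority that choice already scores well, and normal recall falls to zero. The normal-only quantile does not depend on the class proportions of the evaluated set. How strongly this effect appears on real data depends on how much the score distributions overlap.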


Lukasz Pawlik studied this question.

synapsesocial.com/papers/699010df2ccff479cfe572bb https://doi.org/10.3390/fi18020096
