SCIO-Bench is an open-source anomaly detection benchmark and evaluation framework for off-grid renewable energy IoT systems, specifically solar photovoltaic (PV) and battery storage installations deployed in remote, developing regions such as Indonesia's 3T (Frontier, Outermost, Underdeveloped) areas. Unlike conventional utility-scale smart grid research, which focuses on temperate climates with abundant labeled data, this research tackles the hurdles specific to remote off-grid environments: severe resource constraints on edge devices, extreme tropical weather variability (such as prolonged monsoons), and the absence of labeled fault telemetry.

The Dataset and Evaluation Framework

To overcome the lack of field data, the researchers built the SCIO-Bench dataset from real-world solar generation telemetry recorded in a tropical climate (India) and augmented it with a synthesized, physics-grounded battery state-of-charge model. Six distinct anomaly types were injected into the dataset, ranging from natural panel degradation and sudden drops to adversarial False Data Injection (FDI) cyber-attacks, at a realistic overall contamination rate of under 10%. To expose adversarial manipulation, the researchers engineered invariant relational features based on physical laws, most notably the physics_residual (P - V × I). The authors emphasize that the off-grid battery channels and anomaly labels are synthetic, making SCIO-Bench a semi-synthetic reproducible testbed rather than a direct substitute for field-collected fault data.

The study systematically benchmarks three detection methodologies, with Local Outlier Factor (LOF) as an additional baseline:
An Adaptive-MAD (Median Absolute Deviation) rule-based baseline that uses a rolling 7-day window to build resilience against seasonal concept drift.
Isolation Forest, a classical unsupervised machine learning model.
A semi-supervised Long Short-Term Memory Autoencoder (LSTM-AE), uniquely quantized to INT8 format using TensorFlow Lite for extreme edge deployment feasibility.
Local Outlier Factor (LOF).

Key Finding 1: The "Offline vs. Nighttime" Physical Ambiguity

A profound empirical discovery of this research is the quantification of a conditional detection ceiling for purely electrical telemetry. The study reveals a fundamental physical ambiguity: an off-grid solar device going offline produces zero power and registers zero irradiance, which is mathematically indistinguishable from a perfectly functioning panel during normal nighttime operation when time-of-day context is excluded. Because of this identical measurement signature within the evaluated feature set, all reported methods failed entirely (F1 = 0.00) to detect offline events and sudden panel drops without external meteorological context or a network heartbeat signal. Furthermore, detecting anomalies during the "Tropical Weather Stress Test" (Extended Low Irradiance) proved highly challenging, as thick cloud cover closely mimics hardware failure, causing massive false positive spikes in rigid rule-based systems without adaptive thresholds.

Key Finding 2: High Success in Specific Cyber-Attacks (False Data Injection)

In contrast to the struggles with natural weather anomalies, the research demonstrated high success in a specific cyber-physical discrimination scenario. All evaluated models achieved a perfect F1-Score of 1.000 in detecting the modeled False Data Injection (FDI) attacks. Because this specific FDI injection manipulated individual sensor readings (like voltage and current) to violate the P = V × I relation, the models successfully leveraged the physics_residual feature to perfectly flag these anomalies.
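
The role of the physics residual can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy readings, and the 1 W tolerance `tol_w` are assumptions chosen for illustration, and only the relation P - V × I comes from the text.

```python
import numpy as np


def physics_residual(power_w, voltage_v, current_a):
    """Return P - V*I per sample; close to zero when the readings agree."""
    power_w = np.asarray(power_w, dtype=float)
    voltage_v = np.asarray(voltage_v, dtype=float)
    current_a = np.asarray(current_a, dtype=float)
    return power_w - voltage_v * current_a


def flag_inconsistent(power_w, voltage_v, current_a, tol_w=1.0):
    """Flag samples whose |P - V*I| exceeds tol_w (hypothetical tolerance in watts)."""
    return np.abs(physics_residual(power_w, voltage_v, current_a)) > tol_w


# Toy readings (invented for this sketch): three consistent samples,
# then one sample whose voltage reading has been falsified.
voltage = np.array([18.0, 18.5, 17.9, 12.0])  # last value tampered
current = np.array([5.0, 5.2, 4.8, 5.1])
power = np.array([90.0, 96.2, 85.92, 91.8])   # reported power, untouched

flags = flag_inconsistent(power, voltage, current)
print(flags)
```

In this toy case only the last sample is flagged, because its reported power no longer matches V × I. A tampered sample whose reported power is adjusted to equal V × I would yield a zero residual and pass this check.
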
However, the researchers note this perfect score is partly a consequence of the synthetic injection design and does not establish robustness against more sophisticated FDI variants that might preserve the P = V × I relationship.

Key Finding 3: Hardware-Algorithmic Co-Design and Edge AI Feasibility

To accommodate the strict power budgets of off-grid systems, the researchers proposed a two-layer hierarchical architecture. A continuous, lightweight Layer 1 (L1) detector is designed to run on a low-power microcontroller, which only wakes up a heavier Layer 2 (L2) Random Forest classifier on a single-board computer (like a Raspberry Pi 4) when an anomaly is suspected, keeping the estimated L2 duty cycle under 1%. Crucially, the L1 LSTM-AE model underwent TFLite INT8 quantization, which delivered significant simulated efficiency gains. The quantization reduced the host-measured inference latency 144-fold (from 44.68 ms down to just 0.31 ms) and compressed its size to 150.6 KB with less than 0.001 MB of Python-level peak RAM usage. This indicates the feasibility of deploying advanced deep learning sequential models on resource-deprived microcontrollers like the ESP32-S3 in remote regions, though it remains pending physical on-device validation.

Ultimately, this research provides an essential, reproducible semi-synthetic blueprint for securing and monitoring decentralized renewable energy infrastructure, combining Explainable AI (SHAP) with pragmatic hardware optimization pathways.
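
The event-triggered wake-up in the two-layer architecture can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the authors' code: `run_hierarchy`, `GateStats`, the threshold of 0.5, the toy scores, and the stub L2 classifier are all hypothetical, and the duty cycle is approximated as the fraction of samples that wake L2, ignoring how long each L2 run takes.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class GateStats:
    """Counts how often the L1 gate woke the L2 stage."""
    samples: int = 0
    wakeups: int = 0

    @property
    def duty_cycle(self) -> float:
        # Proxy only: fraction of samples that triggered L2.
        return self.wakeups / self.samples if self.samples else 0.0


def run_hierarchy(
    l1_scores: Sequence[float],
    threshold: float,
    l2_classify: Callable[[int], str],
) -> Tuple[List[Tuple[int, str]], GateStats]:
    """Run L2 only on samples whose L1 anomaly score exceeds the threshold."""
    stats = GateStats()
    results: List[Tuple[int, str]] = []
    for index, score in enumerate(l1_scores):
        stats.samples += 1
        if score > threshold:  # L1 suspects an anomaly: wake L2
            stats.wakeups += 1
            results.append((index, l2_classify(index)))
    return results, stats


# Toy L1 scores (invented): 1000 samples, five of which look suspicious.
scores = [0.1] * 1000
for i in (100, 250, 400, 700, 950):
    scores[i] = 0.9


def stub_l2(index: int) -> str:
    # Stand-in for the heavier L2 classifier; always returns the same label.
    return "suspected_fault"


results, stats = run_hierarchy(scores, threshold=0.5, l2_classify=stub_l2)
print(stats.wakeups, stats.duty_cycle)
```

With these toy scores, L2 is woken for 5 of 1000 samples, which keeps the proxy duty cycle below the 1% figure quoted above. A real estimate would also need the L2 runtime per wake-up and the sampling interval.
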
Riyan et al. (Wed,) studied this question.