Ensuring the consistency of recurring ETL processes is a critical challenge in large-scale financial analytics, where upstream data changes can silently degrade model reliability. Such changes include variable redefinitions, unit conversions (e.g., from days past due to number of overdue installments, or currency changes), and erroneous submissions following source system updates. These risks are amplified in automated modeling environments, where dozens of models are retrained monthly for each financial institution and the number of serviced institutions is expected to grow. This study presents an automated statistical monitoring framework for continuous quality assurance of the monthly ETL outputs used in model development. The approach quantifies drift between a reference dataset and successive data deliveries using descriptive univariate and bivariate statistics combined with a normalized Canberra-based drift score, which is aggregated into interpretable variable-level stability measures. Sensitivity is evaluated through controlled noisification experiments with increasing Gaussian perturbations, which show a monotonic decline in stability scores and consistent directional shifts in complementary metrics such as the Gini coefficient and the Kolmogorov–Smirnov statistic. The results show that the framework detects both subtle and large-scale distributional changes, providing scalable, interpretable, and reproducible monitoring diagnostics that are suitable for fully automated financial data pipelines and can be extended.
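The abstract does not give the exact form of the drift score, so the following is only a minimal sketch of how a normalized Canberra-based stability measure might behave under Gaussian noisification. The choice of descriptive statistics, the normalization by the number of components, the definition of stability as one minus the normalized distance, and all function names (`describe`, `normalized_canberra`, `stability`) are assumptions made for illustration, not the authors' implementation.

```python
import random
import statistics


def describe(values):
    """Hypothetical univariate profile: mean, std, min, quartiles, max."""
    q1, q2, q3 = statistics.quantiles(values, n=4)
    return [
        statistics.fmean(values),
        statistics.pstdev(values),
        min(values),
        q1,
        q2,
        q3,
        max(values),
    ]


def normalized_canberra(p, q):
    """Canberra distance divided by the number of components, so it lies in [0, 1].

    Components where both entries are zero contribute 0 (a common convention,
    assumed here).
    """
    terms = []
    for a, b in zip(p, q):
        denom = abs(a) + abs(b)
        terms.append(abs(a - b) / denom if denom > 0 else 0.0)
    return sum(terms) / len(terms)


def stability(reference, delivery):
    """Assumed stability measure: 1 means identical profiles, 0 maximal drift."""
    return 1.0 - normalized_canberra(describe(reference), describe(delivery))


# Noisification experiment: add Gaussian noise of increasing scale to the
# reference variable and watch the stability score.
rng = random.Random(0)  # fixed seed so the run is reproducible
reference = [rng.gauss(100.0, 15.0) for _ in range(5000)]

for sigma in (0.0, 5.0, 20.0, 80.0):
    noisy = [x + rng.gauss(0.0, sigma) for x in reference]
    print(f"sigma={sigma:5.1f}  stability={stability(reference, noisy):.4f}")
```

In this sketch an unperturbed delivery scores exactly 1, and the score falls as the noise scale grows, because the standard deviation, extremes, and quartiles of the noisy variable move away from those of the reference. A production version would presumably also cover bivariate statistics and categorical variables, which are omitted here.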
Piryankov et al. studied this question.
www.synapsesocial.com/papers/69a287b00a974eb0d3c039d4 — DOI: https://doi.org/10.3390/app16052232
Konstantin Piryankov
Iveta Grigorova
Aleksandar Karamfilov
Applied Sciences
Technical University of Sofia