What question did this study set out to answer?

This research aims to develop an automated monitoring framework to ensure the quality of ETL processes in financial analytics.

February 28, 2026 · Open Access

Automated Data Monitoring Using a Canberra-Based Drift Score

Key Points

Developed a statistical monitoring framework for ETL output quality assurance.
Quantified drift between a reference dataset and incoming data using a Canberra-based drift score.
Conducted controlled noisification experiments to evaluate the sensitivity of drift detection.
Detected both subtle and large-scale distributional changes in the data.
Demonstrated a monotonic decline in stability scores with increasing Gaussian noise.
Showed consistent directional shifts in complementary metrics such as the Gini coefficient and the Kolmogorov–Smirnov statistic.
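The noisification experiment named in the key points can be illustrated with a minimal sketch. It is not the authors' code: the function names, the noise levels, and the choice to scale the noise by the reference standard deviation are all assumptions made for illustration. It perturbs a reference sample with increasing Gaussian noise and tracks the two-sample Kolmogorov–Smirnov statistic, one of the complementary metrics mentioned above.

```python
import bisect
import random
import statistics


def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    d = 0.0
    # The gap between two step functions can only change at a data point,
    # so checking every pooled observation is sufficient.
    for x in a_sorted + b_sorted:
        fa = bisect.bisect_right(a_sorted, x) / len(a_sorted)
        fb = bisect.bisect_right(b_sorted, x) / len(b_sorted)
        d = max(d, abs(fa - fb))
    return d


def noisify(values, level, rng):
    """Add zero-mean Gaussian noise with sd = level * sd(values) (assumed scaling)."""
    sd = statistics.pstdev(values)
    return [v + rng.gauss(0.0, level * sd) for v in values]


def sensitivity_curve(reference, levels, seed=0):
    """Return (noise level, KS statistic) pairs for one reference variable."""
    rng = random.Random(seed)
    return [(lvl, ks_statistic(reference, noisify(reference, lvl, rng)))
            for lvl in levels]
```

Calling `sensitivity_curve(reference, [0.0, 0.5, 1.0, 2.0])` on a numeric column gives a KS statistic of zero at level 0 and, for a roughly bell-shaped variable, values that tend to grow as the noise level increases.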

Abstract

Ensuring the consistency of recurring ETL processes is a critical challenge in large-scale financial analytics, where upstream data changes can silently degrade model reliability. Such changes include variable redefinitions, unit conversions (e.g., from days past due to number of overdue installments, or currency changes), and erroneous submissions following source system updates. These risks are amplified in automated modeling environments, where dozens of models are retrained monthly for each financial institution and the number of serviced institutions is expected to grow. This study presents an automated statistical monitoring framework for continuous quality assurance of monthly ETL outputs used in model development. The approach quantifies drift between a reference dataset and successive data deliveries using descriptive univariate and bivariate statistics combined with a normalized Canberra-based drift score, aggregated into interpretable variable-level stability measures. Sensitivity is evaluated through controlled noisification experiments with increasing Gaussian perturbations, demonstrating a monotonic decline in stability scores and consistent directional shifts in complementary metrics such as the Gini coefficient and the Kolmogorov–Smirnov statistic. The results show that the framework effectively detects both subtle and large-scale distributional changes, providing scalable, interpretable, and reproducible monitoring diagnostics suitable for fully automated financial data pipelines, with flexibility for extension.
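The abstract does not give the exact formula for the drift score, so the following is only a sketch of one plausible reading. It computes a handful of descriptive statistics per variable, compares reference and new values with the Canberra distance, divides by the number of compared terms so the result lies in [0, 1], and reports stability as one minus that drift. The set of statistics in `describe`, the function names, and the "1 minus drift" convention are all assumptions.

```python
import random
import statistics


def describe(values):
    """Hypothetical set of univariate descriptive statistics for one variable."""
    q1, q2, q3 = statistics.quantiles(values, n=4)  # quartiles
    return [statistics.fmean(values), statistics.pstdev(values),
            min(values), q1, q2, q3, max(values)]


def canberra_drift(ref_stats, new_stats):
    """Canberra distance divided by the number of terms, so it lies in [0, 1]."""
    total = 0.0
    for r, n in zip(ref_stats, new_stats):
        denom = abs(r) + abs(n)
        if denom > 0:  # a 0/0 term is treated as "no drift"
            total += abs(r - n) / denom
    return total / len(ref_stats)


def stability(ref_values, new_values):
    """Variable-level stability: 1.0 means identical descriptive statistics."""
    return 1.0 - canberra_drift(describe(ref_values), describe(new_values))


if __name__ == "__main__":
    rng = random.Random(0)
    reference = [rng.gauss(100.0, 15.0) for _ in range(1000)]
    # Hypothetical upstream unit change: every value multiplied by 30.
    rescaled = [v * 30 for v in reference]
    print(stability(reference, reference))  # identical delivery
    print(stability(reference, rescaled))   # silently rescaled delivery
```

Because each Canberra term is a relative difference, a pure rescaling by a factor k moves every nonzero statistic by the same amount, |1 - k| / (1 + k). This is why a unit change such as the one described in the abstract would drive the stability of the affected variable sharply down under this sketch.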

Cite this study

Piryankov et al. (2026) studied this question.

www.synapsesocial.com/papers/69a287b00a974eb0d3c039d4 — DOI: https://doi.org/10.3390/app16052232

Authors

Konstantin Piryankov

Iveta Grigorova

Aleksandar Karamfilov

Journals

Applied Sciences

Institutions

Technical University of Sofia
