Time series data frequently suffer from data quality problems during collection and transmission, such as small-jump dirty points, which existing cleaning methods often fail to detect. Moreover, because existing methods primarily address univariate series, their multivariate extensions often fail to capture complex inter-variable dependencies, which significantly limits their effectiveness. To this end, we propose SHoTClean, a family of four algorithms that bridges hard constraints (i.e., physical limits) and soft constraints (i.e., statistical patterns) within a constrained-optimization framework for effective and efficient multivariate time series cleaning. Specifically, we formulate the cleaning task as minimizing soft-constraint violations while respecting hard-constraint bounds. SHoTClean comprises: (1) SHoTClean-B for offline batch processing, which uses pruned dynamic programming to achieve global optimality; (2) SHoTClean-S and SHoTClean-P for online streaming scenarios, which employ incremental dynamic programming, where SHoTClean-P accelerates SHoTClean-S via CDQ divide-and-conquer and a Fenwick tree to attain near-linear complexity; and (3) SHoTClean-C, which incorporates causal discovery into the soft constraints to capture multivariate dependencies. Extensive experiments across 12 real-world datasets demonstrate that our approaches achieve i) 6.8% to 90.0% and 7.8% to 82.1% improvements in accuracy (RMSE metric) over 10 state-of-the-art baselines in offline and online settings, respectively; ii) an average runtime speed-up of two orders of magnitude on large-scale datasets; and iii) superior robustness, with consistently high performance at an extreme 80% contamination level and on high-dimensional datasets. The code is available at https://github.com/ZJU-DAILY/SHoTClean.
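To illustrate why a Fenwick tree can bring a dynamic program close to linear time, the following is a minimal, generic sketch and not the SHoTClean implementation. It assumes a toy soft constraint (kept points must be non-decreasing) and finds the longest chain of points that satisfies it; the names `FenwickMax` and `longest_consistent_chain` are hypothetical. A naive version compares every pair of points in O(n^2), whereas the prefix-maximum query below reduces the work to O(n log n).

```python
class FenwickMax:
    """Fenwick (binary indexed) tree with point updates and prefix-maximum queries."""

    def __init__(self, size):
        self.size = size
        self.tree = [0] * (size + 1)  # 1-indexed; 0 is the identity for chain lengths

    def update(self, i, value):
        # Raise the stored value at position i (1-indexed) to at least `value`.
        while i <= self.size:
            if self.tree[i] < value:
                self.tree[i] = value
            i += i & -i

    def query(self, i):
        # Maximum stored value over positions 1..i.
        best = 0
        while i > 0:
            if self.tree[i] > best:
                best = self.tree[i]
            i -= i & -i
        return best


def longest_consistent_chain(values):
    """Length of the longest non-decreasing subsequence of `values` (toy constraint)."""
    # Compress values to ranks 1..k so they can index the tree.
    ranks = {v: r for r, v in enumerate(sorted(set(values)), start=1)}
    tree = FenwickMax(len(ranks))
    best = 0
    for v in values:
        r = ranks[v]
        # Best chain ending at any earlier point whose value is <= v, extended by v.
        length = tree.query(r) + 1
        tree.update(r, length)
        best = max(best, length)
    return best


if __name__ == "__main__":
    print(longest_consistent_chain([1, 2, 9, 3, 4]))
```

In this toy setting, points outside the longest chain would be the candidates for repair. The paper's actual constraints, cost function, and use of CDQ divide-and-conquer are not reproduced here.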
Zhenghan Fang
Wei Shao
Zheqi Lu
Proceedings of the ACM on Management of Data
Zhejiang University
Ningbo University
Zhejiang University of Science and Technology
Fang et al. studied this question.
www.synapsesocial.com/papers/69d893406c1944d70ce04443 — DOI: https://doi.org/10.1145/3786698