Time series data frequently suffer from quality problems introduced during collection and transmission, such as small-jump dirty points, which existing cleaning methods often fail to detect. Moreover, existing methods primarily target univariate series, and their multivariate extensions often fail to capture complex inter-variable dependencies, which significantly limits their effectiveness. To this end, we propose SHoTClean, a family of four algorithms that bridges hard constraints (i.e., physical limits) and soft constraints (i.e., statistical patterns) within a constrained-optimization framework for effective and efficient multivariate time series cleaning. Specifically, we formulate the cleaning task as minimizing soft-constraint violations while respecting hard-constraint bounds. SHoTClean comprises: (1) SHoTClean-B for offline batch processing, which uses pruned dynamic programming to achieve global optimality; (2) SHoTClean-S and SHoTClean-P for online streaming scenarios, which employ incremental dynamic programming, with SHoTClean-P accelerating SHoTClean-S via CDQ divide-and-conquer and a Fenwick tree to attain near-linear complexity; and (3) SHoTClean-C, which incorporates causal discovery into the soft constraints to capture multivariate dependencies. Extensive experiments on 12 real-world datasets demonstrate that our approaches achieve i) 6.8%--90.0% and 7.8%--82.1% improvements in accuracy (measured by RMSE) over 10 state-of-the-art baselines in offline and online settings, respectively; ii) an average runtime speed-up of two orders of magnitude on large-scale datasets; and iii) superior robustness, with consistently high performance under an extreme 80% contamination level and on high-dimensional datasets. The code is available at https://github.com/ZJU-DAILY/SHoTClean.
Fang et al. (Thu,) studied this question.
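To give a feel for how a Fenwick tree can accelerate a dynamic program, the following is a minimal Python sketch. It is not the SHoTClean-P algorithm. It is a generic toy problem (the longest non-decreasing subsequence) chosen only because each DP transition needs the best value over a prefix of ranks, which a Fenwick tree answers in O(log n) instead of O(n). The names `MaxFenwick` and `longest_nondecreasing_subsequence` are hypothetical and do not come from the paper or its repository.

```python
from bisect import bisect_left


class MaxFenwick:
    """Fenwick tree over positions 1..n with point updates and prefix-maximum queries."""

    def __init__(self, n):
        self.n = n
        self.tree = [0] * (n + 1)  # 1-indexed; 0 means "no subsequence yet"

    def update(self, i, value):
        # Raise the stored maximum at position i and in every node covering i.
        while i <= self.n:
            if self.tree[i] < value:
                self.tree[i] = value
            i += i & -i

    def query(self, i):
        # Maximum over positions 1..i.
        best = 0
        while i > 0:
            if self.tree[i] > best:
                best = self.tree[i]
            i -= i & -i
        return best


def longest_nondecreasing_subsequence(values):
    """Toy DP: dp[v] = 1 + max(dp[u] for earlier u <= v), in O(n log n) overall."""
    ranks = sorted(set(values))  # coordinate compression of the values
    fen = MaxFenwick(len(ranks))
    best = 0
    for v in values:
        r = bisect_left(ranks, v) + 1     # 1-based rank of v
        length = fen.query(r) + 1         # best chain ending at a value <= v
        fen.update(r, length)
        best = max(best, length)
    return best
```

For instance, `longest_nondecreasing_subsequence([3, 1, 2, 2, 5, 4])` evaluates to 4 (one such subsequence is 1, 2, 2, 5). The same pattern, replacing a linear scan over earlier states with a logarithmic prefix query, is the general reason such structures can bring a quadratic DP close to linear time.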