Which imputation method performs best for continuous ECG time-series data under realistic missingness patterns?
40 patients from a pilot cohort with continuous 12-lead Holter recordings (ranging from 2.5 to 4 hours per patient)
Seven imputation methods (global mean, linear interpolation, K-Nearest Neighbors [KNN], Multiple Imputation by Chained Equations [MICE], softImpute, xgbooSt MIssing vaLues in timE Series [SMILES], and Self-Attention-based Imputation for Time Series [SAITS])
Comparison among imputation methods and between random vs. observed pattern-based missingness
Mean absolute error (MAE) across masking approaches, missingness levels, and missingness patterns
Self-Attention-based Imputation for Time Series (SAITS) provides superior imputation for continuous ECG data, and evaluating models using realistic pattern-based missingness is crucial as random masking may underestimate errors.
Continuous ECG monitoring has become an integral part of modern hospital-based care. However, missing data pose significant challenges for deploying real-time ECG-based predictive systems. Research on applying imputation techniques to time-series ECG data is limited. Furthermore, the performance of imputation techniques is typically benchmarked using random masking, which may not reflect the real-world missingness patterns encountered in clinical practice. This study aimed to evaluate and benchmark a range of imputation methods, from conventional statistical approaches to state-of-the-art deep learning models, using continuous ECG time-series data under different missingness conditions, including both random (conventional) and observed pattern-based (realistic) missingness. Time-domain features were extracted from continuous 12-lead Holter recordings (ranging from 2.5 to 4 hours per patient) from a pilot cohort of 40 patients. Missingness was introduced using random and pattern-based masking. We compared seven imputation methods: global mean, linear interpolation, K-Nearest Neighbors (KNN), Multiple Imputation by Chained Equations (MICE), softImpute, xgbooSt MIssing vaLues in timE Series (SMILES), and Self-Attention-based Imputation for Time Series (SAITS). Performance was evaluated using mean absolute error (MAE) across masking approaches, missingness levels, and missingness patterns. Overall, MAEs for all seven imputation methods were higher under pattern-based masking than under random masking. SAITS achieved the best performance across both masking approaches (MAEs of 0.277 and 0.146; standard deviations of absolute error of 0.398 and 0.252 for observed-pattern and random masking, respectively). Simpler methods such as softImpute and KNN showed comparable performance across both masking approaches, particularly at certain missingness levels.
Artificial random masking may underestimate the error of time-series imputation in real-world scenarios. Our findings underscore the importance of context-based imputation strategies (i.e., masking approach and imputation method) and of balancing model complexity with practical considerations (e.g., resources, costs, and level of missingness) for real-time deployment.
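The evaluation idea described above (hide known values, impute them, and score the imputations by MAE on the hidden points only) can be sketched in a few lines. This is an illustrative sketch and not the authors' pipeline. The synthetic sine signal, the 20% missingness level, the helper names (`random_mask`, `block_mask`, `impute_mean`, `impute_linear`, `mae`), and the use of contiguous blocks as a simplified stand-in for observed-pattern masking are all assumptions made for illustration. Only two of the seven methods (global mean and linear interpolation) are shown.

```python
import numpy as np


def random_mask(n, frac, rng):
    """Hide a fraction of points chosen independently at random."""
    mask = np.zeros(n, dtype=bool)
    k = int(round(frac * n))
    mask[rng.choice(n, size=k, replace=False)] = True
    return mask


def block_mask(n, frac, block_len, rng):
    """Hide at least the same fraction, but in contiguous runs.

    Assumption: contiguous gaps are only a crude proxy for the
    observed missingness patterns used in the study.
    """
    mask = np.zeros(n, dtype=bool)
    target = int(round(frac * n))
    while mask.sum() < target:
        start = rng.integers(0, n - block_len + 1)
        mask[start:start + block_len] = True
    return mask


def impute_mean(x, mask):
    """Replace hidden points with the mean of the visible points."""
    out = x.copy()
    out[mask] = x[~mask].mean()
    return out


def impute_linear(x, mask):
    """Replace hidden points by linear interpolation between visible neighbors."""
    idx = np.arange(x.size)
    out = x.copy()
    out[mask] = np.interp(idx[mask], idx[~mask], x[~mask])
    return out


def mae(truth, imputed, mask):
    """Mean absolute error computed on the hidden points only."""
    return float(np.mean(np.abs(truth[mask] - imputed[mask])))


rng = np.random.default_rng(0)
n = 2000
t = np.arange(n)
# Synthetic stand-in for one time-domain feature series (not real ECG data).
signal = np.sin(2 * np.pi * t / 200) + 0.05 * rng.standard_normal(n)

masks = {
    "random": random_mask(n, 0.2, rng),
    "block": block_mask(n, 0.2, 100, rng),
}
methods = {"mean": impute_mean, "linear": impute_linear}

results = {}
for mask_name, mask in masks.items():
    for method_name, impute in methods.items():
        score = mae(signal, impute(signal, mask), mask)
        results[(mask_name, method_name)] = score
        print(f"{mask_name:>6} masking, {method_name:>6} imputation: MAE = {score:.3f}")
```

In this toy setting, linear interpolation recovers isolated random gaps far better than long contiguous gaps, which mirrors the direction of the reported finding that random masking yields lower MAEs than pattern-based masking. The magnitudes here carry no meaning for real ECG data.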
Sukardi Suba
Alexander Novak
Xiaojuan Xia
University of California, San Francisco
University of Rochester
Suba et al. studied this question.
synapsesocial.com/papers/69a75a3cc6e9836116a1fd48 — DOI: https://doi.org/10.64898/2026.01.14.26344164