Advances in information and network communication technologies have produced massive, multi-scale, high-dimensional, and dynamic data sets, but their use is frequently hampered by data incompleteness, a consequence of complex network structures and constrained sensor capabilities. Because effective data analysis and mining require complete data, robust preprocessing techniques are essential. This survey systematically reviews missing value interpolation methods for time series flow network data, organizing them into four principal categories: classical statistical algorithms, matrix/tensor-based interpolation methods, nearest-neighbor-weighted methods, and deep learning generative models. We detail the evolution and technical underpinnings of these approaches, including mean imputation, the ARMA family, matrix factorization, KNN variants, and recent deep generative paradigms such as GANs, VAEs, normalizing flows, autoregressive models, diffusion probabilistic models, causal generative models, and reinforcement learning generative models. By delineating the strengths and weaknesses of each category, the survey provides a structured foundation and a forward-looking perspective on state-of-the-art techniques for missing data generation and imputation in complex networks.
Shao et al. studied this question.