Because cluster analysis methods usually cannot be applied directly to data with missing values, various approaches have been investigated to handle such data. Multiple imputation is one of the standard procedures for addressing missing data. In cluster analysis, Rubin's rules, which pool parameter estimates rather than partitions, are replaced by cluster ensemble methods that combine the clustering results obtained from each imputed dataset. However, it remains unclear which cluster ensemble algorithm performs best when integrated with multiple imputation. We therefore conducted numerical comparisons of several algorithms for integrating the results of k-means++ clustering on multiply imputed datasets, and applied the combined approaches to two real datasets. Our results suggest that the non-negative matrix factorization algorithm may be suitable for scenarios with balanced classes, whereas the greedy and agglomerative cluster algorithms may be suitable for scenarios with class imbalance. Before applying these approaches to actual datasets, we still recommend performing simulation experiments in scenarios that reflect the characteristics of the data and the assumed missing-data mechanism.
Tomo et al. (Mon,) studied this question.
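To make the combined procedure concrete, the following is a minimal sketch of the general pipeline: impute several times, cluster each completed dataset with k-means++, then merge the partitions with a cluster ensemble step. It is written under stated assumptions and is not the implementation evaluated in the study. All function names (impute_once, kmeans_pp, consensus, cluster_multiply_imputed) are hypothetical. The imputation step is a simple random draw from observed column values, used only as a stand-in for a proper multiple-imputation model, and the ensemble step is average-linkage agglomeration on a co-association matrix, which is one possible agglomerative-style consensus; the non-negative matrix factorization and greedy algorithms are not shown.

```python
import random

def impute_once(rows, rng):
    """Fill each None with a random observed value from the same column.

    Assumption: a crude stand-in for one draw of a real imputation model.
    """
    observed = [[v for v in col if v is not None] for col in zip(*rows)]
    return [[rng.choice(observed[j]) if v is None else v
             for j, v in enumerate(row)] for row in rows]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_pp(points, k, rng, n_iter=50):
    """Lloyd's algorithm with k-means++ seeding; returns one label per point."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        # Next center is sampled with probability proportional to the squared
        # distance to the nearest center chosen so far (assumes the points
        # are not all identical, otherwise every weight would be zero).
        d2 = [min(sq_dist(p, c) for c in centers) for p in points]
        centers.append(rng.choices(points, weights=d2, k=1)[0])
    labels = None
    for _ in range(n_iter):
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                      for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster empties
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def consensus(partitions, k):
    """Merge partitions by average-linkage agglomeration on co-association."""
    n, m = len(partitions[0]), len(partitions)
    # co[i][j]: fraction of partitions that put items i and j together.
    co = [[sum(p[i] == p[j] for p in partitions) / m for j in range(n)]
          for i in range(n)]
    clusters = [[i] for i in range(n)]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                sim = (sum(co[i][j] for i in clusters[a] for j in clusters[b])
                       / (len(clusters[a]) * len(clusters[b])))
                if best is None or sim > best[0]:
                    best = (sim, a, b)
        _, a, b = best
        clusters[a] += clusters.pop(b)  # merge the most similar pair
    labels = [0] * n
    for c, members in enumerate(clusters):
        for i in members:
            labels[i] = c
    return labels

def cluster_multiply_imputed(rows, k, m=10, seed=0):
    """Impute m times, cluster each completed dataset, return consensus labels."""
    rng = random.Random(seed)
    partitions = [kmeans_pp(impute_once(rows, rng), k, rng) for _ in range(m)]
    return consensus(partitions, k)
```

A call such as cluster_multiply_imputed(rows, k=2), where rows is a list of equal-length numeric lists with None marking missing entries, returns one consensus label per row. Because the label numbering of each k-means++ run is arbitrary, the co-association matrix is used here so that partitions can be combined without aligning labels across imputations.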