What question did this study set out to answer?

The aim is to develop a robust method for customer segmentation using hierarchical clustering on mixed-type data.

April 15, 2026 · Open Access

Subsampling-Based Consensus Hierarchical Clustering for Robust Customer Segmentation with Mixed-Type Data

Key Points

The aim is to develop a robust method for customer segmentation using hierarchical clustering on mixed-type data.
Agglomerative hierarchical clustering with Gower dissimilarity is used to handle mixed-type data.
MICE and Winsorization are integrated into preprocessing to handle missing values and outliers.
Silhouette analysis and the Davies–Bouldin Index are used to assess cluster stability and select the number of clusters.
A subsampling-based consensus approach validates the clustering, which is also compared with baseline algorithms (PAM with Gower dissimilarity and K-prototypes).
The method identifies distinct customer segments with significant behavioral differences.
Cluster robustness is confirmed through statistical tests and a consensus clustering framework.
The results provide a data-driven basis for customer targeting and engagement strategies.
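
To make the Gower dissimilarity mentioned in the key points concrete, here is a minimal sketch in Python. It is not the authors' implementation: the function name `gower_matrix`, the toy customer records, and the assumption of complete data (no missing values, equal feature weights) are all illustrative. Numeric features contribute a range-scaled absolute difference, categorical features contribute 0 for a match and 1 for a mismatch, and the result is the average over features.

```python
import numpy as np

def gower_matrix(numeric, categorical):
    """Pairwise Gower dissimilarity for complete mixed-type data.

    numeric:     (n, p) array of numeric features
    categorical: (n, q) array of category labels
    Assumes no missing values and equal feature weights (illustrative only).
    """
    numeric = np.asarray(numeric, dtype=float)
    categorical = np.asarray(categorical)
    n = numeric.shape[0]
    total = np.zeros((n, n))

    # Numeric features: absolute difference scaled by the feature's range.
    ranges = numeric.max(axis=0) - numeric.min(axis=0)
    ranges[ranges == 0] = 1.0  # a constant column then contributes 0 everywhere
    for k in range(numeric.shape[1]):
        col = numeric[:, k]
        total += np.abs(col[:, None] - col[None, :]) / ranges[k]

    # Categorical features: 0 if the labels match, 1 otherwise.
    for k in range(categorical.shape[1]):
        codes = np.unique(categorical[:, k], return_inverse=True)[1]
        total += (codes[:, None] != codes[None, :]).astype(float)

    # Average the per-feature contributions.
    return total / (numeric.shape[1] + categorical.shape[1])

# Hypothetical customers: (age, yearly spend) plus preferred channel.
numeric = [[20, 100], [30, 100], [40, 500]]
categorical = [["web"], ["web"], ["store"]]
D = gower_matrix(numeric, categorical)
```

The resulting matrix `D` is symmetric with a zero diagonal and values in [0, 1], so it can be passed to any clustering routine that accepts a precomputed dissimilarity matrix.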

Abstract

Hierarchical clustering is an unsupervised framework that organizes observations according to pairwise similarity relationships. In this study, an agglomerative hierarchical approach combined with Gower dissimilarity is employed to accommodate mixed-type customer data. To address data quality issues such as missing values and outliers, Multiple Imputation by Chained Equations (MICE) and Winsorization are incorporated into the preprocessing pipeline. To validate cluster stability and identify the optimal number of clusters, we employ silhouette analysis, the Davies–Bouldin Index (DBI), the Proportion of Ambiguous Clustering (PAC), and a subsampling-based consensus clustering framework. A consensus-based hierarchical tree derived from the consensus matrix is employed to assess the robustness of the segmentation structure. The resulting clusters are further evaluated through comparisons with baseline algorithms for mixed-type data, including Partitioning Around Medoids (PAM) based on Gower dissimilarity and the K-prototypes method, together with statistical tests confirming significant behavioral differences between the identified segments. From an application standpoint, these results provide a data-driven basis for customer targeting by identifying distinct behavioral patterns, thereby supporting more effective engagement strategies and optimized resource allocation.
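
The subsampling-based consensus step and the PAC score described in the abstract can be sketched as follows. This is an illustration under stated assumptions, not the paper's code: the abstract does not give the linkage method, the number of subsamples, the subsample fraction, or the PAC thresholds, so average linkage, 50 runs, an 80% fraction, and the interval (0.1, 0.9) are placeholders. The helper names `consensus_matrix` and `pac` are hypothetical, and the toy distance matrix stands in for a Gower dissimilarity matrix.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def consensus_matrix(dist, k, n_runs=50, frac=0.8, seed=0):
    """Fraction of subsamples in which each pair of points shares a cluster."""
    rng = np.random.default_rng(seed)
    n = dist.shape[0]
    together = np.zeros((n, n))  # times i and j landed in the same cluster
    sampled = np.zeros((n, n))   # times i and j were drawn in the same subsample
    size = int(round(frac * n))
    for _ in range(n_runs):
        idx = rng.choice(n, size=size, replace=False)
        sub = dist[np.ix_(idx, idx)]
        # Agglomerative clustering on the subsample (average linkage is an assumption).
        tree = linkage(squareform(sub, checks=False), method="average")
        labels = fcluster(tree, t=k, criterion="maxclust")
        same = (labels[:, None] == labels[None, :]).astype(float)
        together[np.ix_(idx, idx)] += same
        sampled[np.ix_(idx, idx)] += 1.0
    # Pairs never drawn together are given consensus 0.
    return np.divide(together, sampled, out=np.zeros((n, n)), where=sampled > 0)

def pac(consensus, lower=0.1, upper=0.9):
    """Share of point pairs whose consensus lies strictly between lower and upper."""
    vals = consensus[np.triu_indices_from(consensus, k=1)]
    return float(np.mean((vals > lower) & (vals < upper)))

# Toy dissimilarities with two well-separated groups (stand-in for a Gower matrix).
points = np.array([0.0, 0.1, 0.2, 0.3, 5.0, 5.1, 5.2, 5.3])
dist = np.abs(points[:, None] - points[None, :])

C = consensus_matrix(dist, k=2)
ambiguity = pac(C)

# Consensus-based tree: cluster again, using 1 - consensus as the dissimilarity.
consensus_tree = linkage(squareform(1.0 - C, checks=False), method="average")
final_labels = fcluster(consensus_tree, t=2, criterion="maxclust")
```

A low PAC value means that most pairs are either almost always or almost never clustered together across subsamples, which is the sense in which a candidate number of clusters is considered stable. Repeating the computation over several values of `k` and comparing the PAC scores is one common way to use it for selecting the number of clusters.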

Authors

Nooshin Marefat

Purificación Galindo‐Villardón

Purificación Vicente-Galindo

Journals

Mathematics

Institutions

Universidad de Salamanca

Escuela Superior Politecnica del Litoral

Universidad Estatal de Milagro
