Microbial communities are complex ecosystems of microorganisms that play crucial roles in human health and environmental balance. Understanding their diversity and structure is key to revealing associations with disease and physiological function. This study developed an integrated computational pipeline to analyze microbiome datasets and uncover patterns related to health status. The workflow comprises data preprocessing, alpha and beta diversity estimation, dimensionality reduction by principal component analysis (PCA), hierarchical clustering, and Random Forest-based feature selection. Together, these approaches address major analytical challenges such as high dimensionality, sparsity, and inter-sample variability. Healthy samples showed higher Shannon alpha diversity, indicating greater microbial richness and evenness. Beta diversity analysis and PCA showed clear separation between healthy and diseased groups, and hierarchical clustering confirmed consistent community patterns. Random Forest classification identified specific Operational Taxonomic Units (OTUs) as key discriminative features, suggesting their potential as microbial biomarkers. The study thus provides a comprehensive and interpretable framework for microbiome data analysis. Its novelty lies in integrating statistical, multivariate, and machine learning methods into a single workflow, enabling robust biological interpretation and supporting applications in biomarker discovery and microbial community profiling.
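The diversity estimates named in the workflow can be illustrated with a minimal sketch. The abstract specifies Shannon alpha diversity but does not name the beta diversity metric, so Bray-Curtis dissimilarity is assumed here purely for illustration. The function names and the OTU counts below are hypothetical and are not taken from the study.

```python
import math


def shannon_index(counts):
    """Shannon diversity H' = -sum(p_i * ln(p_i)) over taxa with nonzero counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two equal-length count vectors.

    Assumed beta diversity metric; the abstract does not state which one was used.
    """
    if len(u) != len(v):
        raise ValueError("count vectors must have the same length")
    denom = sum(u) + sum(v)
    if denom == 0:
        return 0.0
    return sum(abs(a - b) for a, b in zip(u, v)) / denom


# Hypothetical OTU count vectors (illustrative only, not data from the study).
samples = {
    "healthy_1": [25, 25, 25, 25],   # even community
    "diseased_1": [97, 1, 1, 1],     # community dominated by one OTU
}

alpha = {name: shannon_index(counts) for name, counts in samples.items()}
beta = bray_curtis(samples["healthy_1"], samples["diseased_1"])
```

In this toy input the even community reaches the maximum Shannon value for four taxa, ln(4), while the dominated community scores much lower. In a pipeline like the one described, the pairwise dissimilarities would typically feed the clustering and ordination steps.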
Çağın KANDEMİR ÇAVAŞ studied this question.