Heart disease, a leading cause of death worldwide, accounts for 31% of global fatalities and requires effective early detection methods to combat its rising prevalence. Early detection and prediction of heart disease remains one of the most pressing challenges in current healthcare. In recent years, machine learning (ML) technologies have offered opportunities to address this challenge by improving heart disease detection and prediction capabilities. This study offers a comparative evaluation of seven machine learning models for classifying heart disease: Logistic Regression (LR), Decision Tree (DT), Random Forest (RF), Naïve Bayes (NB), Support Vector Machine (SVM), Artificial Neural Networks (ANN), and K-Nearest Neighbors (KNN). Using the ‘BRFSS 2020 Heart Disease Dataset’, this research examines how dataset balancing, combined either with various feature selection techniques or with a bagging-based ensemble method, affects classification and prediction accuracy. Three feature selection methods (ANOVA, Chi-Square, and Regression Analysis) were tested in eight combinations based on the union and intersection of their selected features: (i) ANOVA ∪ Chi-Square, (ii) ANOVA ∪ Regression, (iii) Chi-Square ∪ Regression, (iv) ANOVA ∪ Chi-Square ∪ Regression, (v) ANOVA ∩ Chi-Square, (vi) ANOVA ∩ Regression, (vii) Chi-Square ∩ Regression, and (viii) ANOVA ∩ Chi-Square ∩ Regression. Experimental results demonstrate that with a balanced dataset, RF and DT achieved the highest accuracies of 85% and 82%, respectively. On the balanced dataset with feature selection, ANOVA-based feature selection was associated with higher performance under the ANOVA ∪ Chi-Square and ANOVA ∪ Chi-Square ∪ Regression feature combinations, where RF reached the highest accuracy (92%), recall (93%), and AUC score (0.92). Additionally, bagging-based ensemble techniques improved performance for certain high-variance models (DT, RF, and ANN) when applied to the balanced dataset, although the impact varied across models. Despite promising accuracy when dataset balancing was combined with the ensemble method, the recall and AUC scores were relatively low, indicating that many positive cases were missed. Overall, dataset balancing combined with feature selection techniques showed comparatively better performance across several evaluation metrics under the specific experimental setup. These findings provide comparative insights into preprocessing strategies and suitable machine learning models for heart disease classification, which may be helpful for future research.
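The eight union and intersection combinations listed above can be enumerated mechanically. The following is a minimal sketch, not the authors' code: it assumes each selection method has already produced a set of chosen feature names. The helper name `combine_feature_sets` is hypothetical, and the feature names and the assignment of features to methods are placeholders, not results from the study.

```python
from itertools import combinations


def combine_feature_sets(selected):
    """Build the union and intersection for every group of two or more selectors.

    `selected` maps a selector name to the set of features it chose.
    Returns a dict mapping a label such as "ANOVA ∪ Chi-Square" to a feature set.
    """
    combos = {}
    names = sorted(selected)
    for size in range(2, len(names) + 1):
        for group in combinations(names, size):
            sets = [set(selected[name]) for name in group]
            combos[" ∪ ".join(group)] = set.union(*sets)
            combos[" ∩ ".join(group)] = set.intersection(*sets)
    return combos


# Hypothetical selector outputs: placeholder feature names, not the study's results.
selected = {
    "ANOVA": {"BMI", "SleepTime", "PhysicalHealth"},
    "Chi-Square": {"Smoking", "Stroke", "BMI"},
    "Regression": {"BMI", "Stroke", "SleepTime"},
}

combos = combine_feature_sets(selected)
for label, features in combos.items():
    print(f"{label}: {sorted(features)}")
```

With three selectors this yields three pairwise groups and one three-way group, each taken as a union and as an intersection, which gives the eight feature subsets on which a classifier could then be trained and compared.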
Jinat Ara
Hanif Bhuiyan
Isfara Islam Roza
Scientific Reports
University of Pannonia
Ahsanullah University of Science and Technology
Gold Coast City Council
Ara et al. studied this question.
www.synapsesocial.com/papers/69d893a86c1944d70ce0497f — DOI: https://doi.org/10.1038/s41598-026-47691-4