Cardiovascular diseases (CVDs) account for a large share of worldwide morbidity, disability, and premature mortality, posing a critical challenge to public health. Early identification and proactive treatment can greatly reduce the risk and severity of these conditions. A central part of this effort is estimating the probability that an individual will experience a major cardiovascular event. Machine learning offers a promising alternative to conventional risk models and can improve the accuracy of health outcome predictions. This research presents a machine learning pipeline that predicts heart disease using the XGBoost algorithm, advanced feature selection techniques, and automated hyperparameter tuning with Optuna. The dataset was first examined for class imbalance; important features were then derived using XGBoost-based importance scores, which improved model interpretability and reduced dimensionality. Optuna’s Tree-structured Parzen Estimator (TPE) sampler was used to explore the hyperparameter space efficiently and optimize the classification model. On the test dataset, the final model achieved 99.02% accuracy, 99.813% precision, 100% recall, a 99.05% F1-score, and a ROC-AUC of 0.9998. The dataset was obtained from Kaggle, where instances from four original UCI datasets (Cleveland, Hungary, Switzerland, and Long Beach V) were pre-merged and made publicly available as the Kaggle heart dataset, each with 14 features. The results highlight that integrating ensemble learning, feature selection, and hyperparameter tuning enhances the reliability of predictive models for cardiovascular disease detection.
Rao et al. (Wed,) studied this question.
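For readers less familiar with the evaluation metrics reported in the abstract, the following minimal sketch shows how accuracy, precision, recall, and F1-score are conventionally computed from binary confusion-matrix counts. It is not the authors' evaluation code: the names `ConfusionCounts` and `classification_metrics` are hypothetical, and the counts in the usage example are made up for illustration rather than taken from the study.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts (positive class = heart disease present)."""
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives
    tn: int  # true negatives


def classification_metrics(c: ConfusionCounts) -> Dict[str, float]:
    """Return accuracy, precision, recall, and F1 using the standard definitions.

    Any ratio whose denominator is zero is reported as 0.0 (an illustrative
    convention chosen here, not something stated in the source).
    """
    total = c.tp + c.fp + c.fn + c.tn
    accuracy = (c.tp + c.tn) / total if total else 0.0
    precision = c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0
    recall = c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    counts = ConfusionCounts(tp=90, fp=10, fn=0, tn=100)
    for name, value in classification_metrics(counts).items():
        print(f"{name}: {value:.4f}")
```

ROC-AUC is omitted from this sketch because it depends on the ranking of predicted probabilities across all thresholds and cannot be recovered from a single set of confusion counts.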