What question did this study set out to answer?

This research aims to enhance heart disease prediction accuracy using machine learning techniques.

March 7, 2026 | Open Access

Unified approach for accurate heart disease prediction using machine learning techniques

Key Points

This research aims to enhance heart disease prediction accuracy using machine learning techniques.
Utilized the XGBoost algorithm for predicting heart disease.
Employed advanced feature selection based on importance scores from XGBoost.
Applied Optuna for automated hyperparameter tuning.
Analyzed a public Kaggle dataset that merges four original UCI heart disease datasets.
Achieved 99.02% accuracy in predicting heart disease.
Reported 99.813% precision and 100% recall in model performance.
Reached a 99.05% F1-score and ROC-AUC of 0.9998, indicating high reliability.
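
For readers less familiar with the metrics listed above, the following minimal sketch shows how accuracy, precision, recall, and F1-score are derived from confusion-matrix counts. The function name `classification_metrics` and the example counts are hypothetical and chosen only for illustration; they are not taken from the study.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute standard binary-classification metrics from confusion-matrix counts."""
    total = tp + fp + fn + tn
    # Precision: share of predicted positives that are truly positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: share of true positives that the model actually found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    # Accuracy: share of all predictions that are correct.
    accuracy = (tp + tn) / total if total else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts for illustration only (not the study's confusion matrix).
metrics = classification_metrics(tp=90, fp=10, fn=0, tn=100)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

A recall of 100% means no positive case was missed (zero false negatives), which is often the priority in a screening setting.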

Abstract

Cardiovascular diseases (CVDs) account for a large share of worldwide morbidity, disability, and premature mortality, posing a critical challenge to public health. Early identification and proactive treatment can greatly reduce the risk and severity of these conditions. Much of this effort has focused on estimating the probability that an individual will experience a major cardiovascular event. Machine learning offers a promising alternative to conventional risk models and can improve the accuracy of health outcome predictions. This research presents a machine learning pipeline that predicts heart disease using the XGBoost algorithm, advanced feature selection techniques, and automated hyperparameter tuning with Optuna. The dataset was first examined for class imbalance; important features were then derived from XGBoost-based importance scores, which improved model interpretability and reduced dimensionality. Optuna’s Tree-structured Parzen Estimator (TPE) sampler was used to explore the hyperparameter space efficiently and optimize the classification model. On the test dataset, the final model achieved 99.02% accuracy, 99.813% precision, 100% recall, a 99.05% F1-score, and an ROC-AUC of 0.9998. The dataset, obtained from Kaggle, contains instances from four original UCI datasets (Cleveland, Hungary, Switzerland, and Long Beach V) that were pre-merged and made publicly available as the Kaggle heart dataset, each with 14 features. The results highlight that integrating ensemble learning, feature selection, and hyperparameter tuning enhances the reliability of predictive models for cardiovascular disease detection.
