April 15, 2026

Interpretable Ensemble Learning for Cardiovascular Disease Prediction: A Large-Scale SHAP-Based Analysis

Key Points

The study aimed to develop an interpretable ensemble learning model that predicts cardiovascular disease from a large database of over 70,000 patients.
Developed three gradient boosting models.
Applied the Synthetic Minority Over-sampling Technique (SMOTE) to improve class balance.
Used Bayesian hyperparameter tuning to optimize the models.
The best-performing model, a Voting Ensemble, achieved a test accuracy of 73.61%, below the 78.42% benchmark.
The same model reached an Area Under the Curve (AUC) of 0.8022.
SHAP analysis identified systolic blood pressure as the most important feature.
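
The oversampling step can be illustrated with a minimal pure-Python sketch of the core SMOTE idea: each synthetic sample lies on the line segment between a minority-class sample x and one of its k nearest minority-class neighbours x_nn, that is x_new = x + λ(x_nn - x) with λ drawn uniformly from [0, 1). The page does not describe how the study implemented SMOTE. The function name `smote`, the default k, the use of Euclidean distance, the random choice of the seed sample (a simplification of the original algorithm) and the toy feature values are all assumptions for illustration; real pipelines would normally use a maintained library such as imbalanced-learn.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours.

    Hypothetical helper for illustration; not the study's implementation.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than exist
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        x = minority[i]
        # Rank the other minority samples by Euclidean distance to x.
        others = sorted(
            (m for j, m in enumerate(minority) if j != i),
            key=lambda m: math.dist(x, m),
        )
        neighbour = rng.choice(others[:k])
        lam = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([a + lam * (b - a) for a, b in zip(x, neighbour)])
    return synthetic


# Toy minority class: hypothetical (systolic BP in mmHg, age in years) pairs.
minority = [[130.0, 50.0], [145.0, 58.0], [160.0, 62.0], [138.0, 55.0]]
for point in smote(minority, n_synthetic=3, k=2, seed=42):
    print([round(v, 1) for v in point])
```

Because every synthetic point is a convex combination of two real minority samples, it stays inside the region spanned by that class. Oversampling is normally fitted on the training split only, so that synthetic points derived from test patients do not leak into the evaluation.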

Abstract

Cardiovascular disease is the leading cause of death worldwide, yet machine learning models for predicting it are often built on small databases and generalize poorly, which remains a significant barrier to their use. This study developed an interpretable ensemble learning model using a large database of over 70,000 patients. Three advanced gradient boosting models were built, with the Synthetic Minority Over-sampling Technique (SMOTE) used to improve class balance and Bayesian hyperparameter tuning used to optimize the models. The best-performing model, a Voting Ensemble, achieved a test accuracy of 73.61% and an Area Under the Curve (AUC) of 0.8022. SHAP (SHapley Additive exPlanations) analysis showed that systolic blood pressure was the most important feature and revealed clinically meaningful thresholds at 120 mmHg for blood pressure and 48 years for age, consistent with established medical standards. Although the model's accuracy was lower than the 78.42% benchmark, its greater interpretability makes it a promising tool for clinical decision support. These results indicate that combining powerful ensemble methods with clear explanations of individual predictions can produce more trustworthy AI systems for cardiovascular medicine.
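
The Voting Ensemble and the two headline metrics can be sketched in the same hedged way. The abstract does not say whether the ensemble used soft voting (averaging predicted probabilities) or hard voting (majority of predicted labels). The sketch below assumes soft voting with optional weights, computes accuracy at an assumed 0.5 decision threshold, and computes the AUC, assumed here to be the ROC AUC as is usual for binary classifiers, as the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, with ties counted as half. The function names and the toy probabilities are hypothetical.

```python
# Hypothetical helpers illustrating soft voting, accuracy and ROC AUC;
# not taken from the study.

def soft_vote(prob_lists, weights=None):
    """Weighted average of each model's predicted positive-class probability."""
    if weights is None:
        weights = [1.0] * len(prob_lists)
    total = sum(weights)
    n_samples = len(prob_lists[0])
    return [
        sum(w * probs[i] for w, probs in zip(weights, prob_lists)) / total
        for i in range(n_samples)
    ]


def accuracy(labels, scores, threshold=0.5):
    """Fraction of cases whose thresholded score matches the 0/1 label."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def roc_auc(labels, scores):
    """ROC AUC as P(score of a random positive > score of a random negative),
    counting ties as 0.5 (the normalised Mann-Whitney U statistic)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: hypothetical probabilities from three boosted models.
labels = [1, 0, 1, 0, 1, 0]
model_probs = [
    [0.91, 0.22, 0.64, 0.61, 0.73, 0.35],
    [0.84, 0.31, 0.45, 0.62, 0.66, 0.18],
    [0.77, 0.12, 0.58, 0.57, 0.81, 0.39],
]
ensemble = soft_vote(model_probs)
print(f"accuracy = {accuracy(labels, ensemble):.3f}")  # prints: accuracy = 0.833
print(f"AUC      = {roc_auc(labels, ensemble):.3f}")  # prints: AUC      = 0.889
```

Accuracy depends on the chosen threshold while the AUC does not, which is one reason studies report both. The pairwise AUC loop is O(n_pos × n_neg), fine for illustration but slow for a 70,000-patient dataset; a rank-based computation gives the same value in O(n log n).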

Authors

Yifeng Wang