Obesity is a major public health concern because of its high prevalence and its association with cardiometabolic comorbidities. This study compared nine ensemble and meta-ensemble learning models for multiclass obesity-status classification using the Obesity Dataset, which comprises 1610 records, 14 predictors, and four body-weight status classes. To ensure leakage-aware evaluation, all preprocessing and resampling steps were embedded within the validation workflow. Standardization, one-hot encoding, and RandomOverSampler were applied only within the training folds; SMOTE and no-resampling configurations were retained as configurable alternatives but were not used to generate the reported results. Model performance was assessed using complementary classification, discrimination, agreement, and calibration metrics: accuracy, balanced accuracy, weighted F1-score, macro F1-score, weighted ROC-AUC, Matthews correlation coefficient (MCC), Brier score, and multiclass expected calibration error (ECE). Overall, the ensemble models achieved strong discriminative performance: eight of the nine classifiers exceeded 82% accuracy and obtained weighted ROC-AUC values close to or above 94%. LightGBM showed the strongest mean metric-based profile, with an accuracy of 85.41 ± 2.85%, a weighted F1-score of 85.25 ± 2.88%, a weighted ROC-AUC of 95.58 ± 1.52%, and an MCC of 0.779 ± 0.042. Random Forest and Stacking achieved comparable classification performance, although Stacking showed poorer calibration. The Friedman test detected significant global differences among classifiers (χ² = 38.7733, p = 0.000005). However, the Nemenyi post hoc test indicated that Stacking, Random Forest, LightGBM, Voting, Gradient Boosting, and Extra Trees belonged to the same high-performance statistical group.
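The leakage-aware idea above (fit every preprocessing and resampling step on the training folds only, then apply it unchanged to the validation fold) can be sketched as follows. This is a minimal, stdlib-only illustration, not the authors' code: RandomOverSampler and SMOTE are class names from the imbalanced-learn library, but the paper's actual pipeline is not shown here. Every helper name below is hypothetical, a nearest-centroid classifier stands in for the ensemble models, and one-hot encoding is omitted for brevity (its category vocabulary would likewise be learned from the training fold only).

```python
import random
from collections import Counter, defaultdict
from statistics import mean, pstdev


def stratified_folds(y, k, rng):
    """Split row indices into k folds that roughly preserve class proportions."""
    folds = [[] for _ in range(k)]
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)
    for idx in by_class.values():
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def fit_scaler(rows):
    """Learn per-column mean and std from the rows passed in (training fold only)."""
    cols = list(zip(*rows))
    return [mean(c) for c in cols], [pstdev(c) or 1.0 for c in cols]  # 1.0 guards zero variance


def apply_scaler(rows, scaler):
    means, stds = scaler
    return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]


def random_oversample(X, y, rng):
    """Duplicate randomly chosen minority rows until every class matches the majority count."""
    by_class = defaultdict(list)
    for xi, yi in zip(X, y):
        by_class[yi].append(xi)
    target = max(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        X_out.extend(rows + extra)
        y_out.extend([label] * target)
    return X_out, y_out


def fit_centroids(X, y):
    """Hypothetical stand-in classifier: one mean vector per class."""
    groups = defaultdict(list)
    for xi, yi in zip(X, y):
        groups[yi].append(xi)
    return {label: [mean(c) for c in zip(*rows)] for label, rows in groups.items()}


def predict(centroids, X):
    def sq_dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return [min(centroids, key=lambda c: sq_dist(x, centroids[c])) for x in X]


def leakage_aware_cv(X, y, k=5, seed=0):
    rng = random.Random(seed)
    accuracies = []
    for test_idx in stratified_folds(y, k, rng):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        X_tr, y_tr = [X[i] for i in train_idx], [y[i] for i in train_idx]
        X_te, y_te = [X[i] for i in test_idx], [y[i] for i in test_idx]
        scaler = fit_scaler(X_tr)  # statistics come from the training fold only
        X_tr, X_te = apply_scaler(X_tr, scaler), apply_scaler(X_te, scaler)
        X_tr, y_tr = random_oversample(X_tr, y_tr, rng)  # the validation fold is never resampled
        preds = predict(fit_centroids(X_tr, y_tr), X_te)
        accuracies.append(sum(p == t for p, t in zip(preds, y_te)) / len(y_te))
    return accuracies


if __name__ == "__main__":
    data_rng = random.Random(42)
    centers = {"A": (0.0, 0.0), "B": (3.0, 3.0), "C": (0.0, 4.0)}  # synthetic, imbalanced toy data
    sizes = {"A": 60, "B": 25, "C": 15}
    X, y = [], []
    for label, (cx, cy) in centers.items():
        for _ in range(sizes[label]):
            X.append([data_rng.gauss(cx, 1.0), data_rng.gauss(cy, 1.0)])
            y.append(label)
    print("class counts:", Counter(y))
    print("fold accuracies:", [round(a, 3) for a in leakage_aware_cv(X, y)])
```

The key design choice is ordering: the scaler is fitted and the oversampler is run inside the loop, after the validation indices have been set aside, so neither the scaling statistics nor the duplicated minority rows can carry information from the validation fold.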
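The Friedman and Nemenyi procedures mentioned above work on per-fold ranks rather than raw scores. The stdlib sketch below, with hypothetical helper names, ranks the classifiers within each fold (rank 1 = best, ties averaged), computes the Friedman statistic without a tie correction, and evaluates the chi-square upper tail with a closed form that holds only for even degrees of freedom. With nine classifiers the test has k − 1 = 8 degrees of freedom, so the closed form applies; plugging in the reported χ² = 38.7733 gives p ≈ 5.4 × 10⁻⁶, consistent with the reported p = 0.000005. The Nemenyi critical value q_alpha must be taken from a studentized-range table (about 3.102 for nine classifiers at α = 0.05 in commonly used tables); the number of folds used in the paper is not stated here.

```python
import math


def average_ranks(scores):
    """Rank classifiers within one fold: rank 1 = highest score, ties share the mean rank."""
    order = sorted(range(len(scores)), key=lambda j: -scores[j])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of 0-based positions i..j, converted to a 1-based rank
        for m in range(i, j + 1):
            ranks[order[m]] = shared
        i = j + 1
    return ranks


def friedman_statistic(table):
    """table[n][j] = score of classifier j on fold n (higher is better); no tie correction."""
    n_blocks, k = len(table), len(table[0])
    rank_rows = [average_ranks(row) for row in table]
    mean_ranks = [sum(r[j] for r in rank_rows) / n_blocks for j in range(k)]
    stat = 12 * n_blocks / (k * (k + 1)) * (sum(r * r for r in mean_ranks) - k * (k + 1) ** 2 / 4)
    return stat, mean_ranks


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; this closed form is valid only for even df."""
    if df % 2:
        raise ValueError("closed form requires even degrees of freedom")
    half = x / 2
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i  # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total


def nemenyi_cd(q_alpha, k, n_blocks):
    """Critical difference: two mean ranks further apart than this differ significantly."""
    return q_alpha * math.sqrt(k * (k + 1) / (6 * n_blocks))


if __name__ == "__main__":
    print("p for reported statistic:", chi2_sf_even_df(38.7733, 8))
```

Two classifiers whose mean ranks differ by less than `nemenyi_cd(...)` fall into the same statistical group, which is how several models can share one high-performance group even when the global Friedman test is significant.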
Therefore, LightGBM was selected as the final model based on its practical balance of predictive performance, calibration behavior, stability, and implementation feasibility, rather than on unequivocal statistical superiority. On the independent holdout set, LightGBM maintained strong generalization, achieving accuracy = 0.8447, weighted F1-score = 0.8435, MCC = 0.7653, and weighted ROC-AUC = 0.9464. Calibration was moderate, with Brier score = 0.2575 and multiclass ECE = 0.1070, indicating that predicted probabilities should be interpreted cautiously when used to support threshold-based decisions.
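The two calibration measures reported for the holdout set can be sketched in a few lines. This is an illustration under stated assumptions, not the paper's implementation: the multiclass Brier score is taken as the mean over samples of the summed squared differences between the predicted probability vector and the one-hot true label (some conventions also divide by the number of classes), and ECE is assumed to be the top-label variant with 10 equal-width confidence bins, because the abstract does not say which definition was used. The class labels and probabilities in the demo are hypothetical.

```python
def multiclass_brier(probs, labels, classes):
    """Mean over samples of sum_c (p_c - y_c)^2, with y the one-hot true label."""
    total = 0.0
    for p, y in zip(probs, labels):
        total += sum((pc - (1.0 if c == y else 0.0)) ** 2 for pc, c in zip(p, classes))
    return total / len(labels)


def top_label_ece(probs, labels, classes, n_bins=10):
    """Top-label ECE: bin by max probability, then weight |accuracy - confidence| by bin size."""
    bins = [[0, 0.0, 0.0] for _ in range(n_bins)]  # [count, summed confidence, summed correct]
    for p, y in zip(probs, labels):
        conf = max(p)
        pred = classes[p.index(conf)]
        b = min(int(conf * n_bins), n_bins - 1)  # confidence 1.0 falls into the last bin
        bins[b][0] += 1
        bins[b][1] += conf
        bins[b][2] += 1.0 if pred == y else 0.0
    n = len(labels)
    return sum(cnt / n * abs(s_acc / cnt - s_conf / cnt) for cnt, s_conf, s_acc in bins if cnt)


if __name__ == "__main__":
    classes = ["c0", "c1", "c2", "c3"]  # hypothetical labels for the four weight-status classes
    probs = [
        [0.70, 0.20, 0.05, 0.05],
        [0.10, 0.60, 0.20, 0.10],
        [0.25, 0.25, 0.40, 0.10],
        [0.05, 0.05, 0.10, 0.80],
    ]
    labels = ["c0", "c2", "c2", "c3"]
    print("Brier:", round(multiclass_brier(probs, labels, classes), 4))
    print("ECE:", round(top_label_ece(probs, labels, classes), 4))
```

Lower is better for both measures. A model can rank classes well (high ROC-AUC) while still assigning over- or under-confident probabilities, which is why the abstract advises caution when these probabilities feed threshold-based decisions.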
Daniel Andrade-Girón
Universidad Nacional José Faustino Sánchez Carrión
William Marin-Rodriguez
Universidad Nacional José Faustino Sánchez Carrión
Américo Peña
Universidad Nacional José Faustino Sánchez Carrión
Informatics
Universidad Nacional José Faustino Sánchez Carrión
Andrade-Girón et al. studied this question.
synapsesocial.com/papers/6a21fbe500d082f62f96ec79 — DOI: https://doi.org/10.3390/informatics13060080