3,114 US counties with data on diagnosed diabetes incidence (2004-2019) and 34 sociodemographic factors
Machine learning models (elastic net regression, extreme gradient boosting [XGBoost], support vector machine [SVM])
Model performance for estimating diabetes incidence and classifying higher-burden counties (incidence >12.6 per 1000 persons)
Machine learning models demonstrated high discrimination in identifying US counties with a high burden of diabetes using sociodemographic factors, highlighting variables like children living with grandparent householders as key predictors.
Objective: To evaluate county-level incidence of diagnosed diabetes and key sociodemographic factors in a high-dimensional, nonlinear setting.

Methods: This temporally aggregated observational study used US CDC data on county-level incidence of diagnosed diabetes from 2004–2019 and 34 sociodemographic factors from public databases. We defined counties as higher-burden if diabetes incidence was >12.6 per 1000 persons (1 standard deviation [SD] above the sample mean). As relationships between sociodemographic factors and diabetes incidence may be nonlinear and involve complex interactions, we trained three machine learning models to estimate incidence (elastic net regression), classify counties as higher-burden (extreme gradient boosting [XGBoost], support vector machine [SVM]), and identify feature importance. Model performance was evaluated using 5-fold cross-validation, with stratified folds for the XGBoost and SVM models.

Results: Overall, 500 of 3114 counties (16.1%) were higher-burden. Elastic net regression showed good predictive performance for estimating diabetes incidence (R² 0.78 [95% CI, 0.75–0.80]). For classification of higher-burden counties, SVM and XGBoost showed high discrimination, with AUROC of 0.962 (95% CI, 0.948–0.974) and 0.957 (95% CI, 0.941–0.971), respectively. Sensitivity analyses using alternative definitions of higher-burden counties (mean + 0.75×SD; mean + 1.25×SD) yielded comparable results. Across all three models, the key county-level features contributing to model predictions were the percentages of children living with grandparent householders and of people with Limited English.

Conclusions: Machine learning models demonstrated consistent performance in estimating and classifying county-level diabetes incidence, with high discrimination for identifying higher-burden counties. Sociodemographic factors, including children living with grandparent householders, may inform tailored public health interventions.
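The labeling and evaluation steps described in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' code: the function names (`higher_burden_labels`, `stratified_folds`, `auroc`) and the toy incidence values are hypothetical, the use of the sample SD (n − 1 denominator) is an assumption, and the elastic net, XGBoost, and SVM models themselves are not reproduced here.

```python
import statistics


def higher_burden_labels(incidence, k=1.0):
    """Return (cutoff, labels) where label 1 means incidence > mean + k * SD."""
    mean = statistics.fmean(incidence)
    sd = statistics.stdev(incidence)  # assumption: sample SD (n - 1 denominator)
    cutoff = mean + k * sd
    return cutoff, [1 if value > cutoff else 0 for value in incidence]


def stratified_folds(labels, n_folds=5):
    """Assign a fold index to each item, round-robin within each class,
    so every fold keeps roughly the overall class ratio."""
    folds = [0] * len(labels)
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        for position, index in enumerate(members):
            folds[index] = position % n_folds
    return folds


def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs in which the
    positive item has the higher score; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical incidence values per 1000 persons (not study data).
toy_incidence = [8.0, 9.0, 10.0, 11.0, 12.0, 20.0]
for k in (0.75, 1.0, 1.25):  # main definition plus the two sensitivity variants
    cutoff, labels = higher_burden_labels(toy_incidence, k)
    print(f"k={k}: cutoff={cutoff:.2f}, higher-burden count={sum(labels)}")
```

In a real analysis the items would typically be shuffled before fold assignment, and a library routine would replace the pairwise AUROC loop, which scales quadratically with the number of counties.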
Alexander S. Keigley
Shant Ayanian
Sagar Dugani
American Journal of Medicine Open
Mayo Clinic in Arizona
Mayo Clinic in Florida
Keigley et al. studied this question.
www.synapsesocial.com/papers/69ca1280883daed6ee09502e — DOI: https://doi.org/10.1016/j.ajmo.2026.100132