This study aimed to develop a machine learning model for early prediction of gestational diabetes mellitus (GDM) using routinely available first-trimester clinical and ultrasound data in Northern Chinese women. In this multicenter prospective cohort study, we enrolled pregnant women from three hospitals in Northern China. We integrated first-trimester maternal characteristics with standardized ultrasound measurements of subcutaneous (SAT) and visceral (VAT) adipose tissue thickness. After addressing potential selection bias via inverse probability weighting, we employed a genetic algorithm (GA) for robust feature selection and evaluated five machine learning classifiers (XGBoost, ANN, SVM, MLR, RF). Model performance was assessed on an internal test set and an independent external validation set, with AUC as the primary metric. The GA consistently selected BMI, SAT, and VAT as core predictive features. The model combining GA-selected features with XGBoost demonstrated the highest performance, achieving an AUC of 0.962 on the internal test set and 0.878 on the external validation set, with corresponding sensitivities of 0.90 and 0.70 and specificities of 0.942 and 0.935, respectively. This model significantly outperformed models using other feature selection methods or classifiers (all P < 0.001) and remained stable across various sensitivity analyses. A machine learning model based on readily accessible first-trimester indicators provides an effective tool for early, opportunistic GDM risk stratification in Northern Chinese women.
Zhai et al. studied this question.
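For readers less familiar with the metrics reported above, the following is a minimal sketch of how AUC, sensitivity, and specificity can be computed from binary outcomes and predicted risks. It is not the authors' code. The function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration and do not come from the study data.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    y_true: Sequence[int], y_score: Sequence[float], threshold: float = 0.5
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) at a given risk threshold.

    The default threshold of 0.5 is an illustrative assumption.
    """
    tp = fn = tn = fp = 0
    for label, score in zip(y_true, y_score):
        predicted = 1 if score >= threshold else 0
        if label == 1 and predicted == 1:
            tp += 1
        elif label == 1:
            fn += 1
        elif predicted == 0:
            tn += 1
        else:
            fp += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC as the probability that a random case outscores a random non-case.

    Ties between a case and a non-case count as half a win.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy data: 1 = GDM, 0 = no GDM; scores are predicted risks.
    y_true = [1, 1, 1, 0, 0, 0, 0]
    y_score = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
    sens, spec = sensitivity_specificity(y_true, y_score)
    print(f"AUC={auc(y_true, y_score):.3f} "
          f"sensitivity={sens:.3f} specificity={spec:.3f}")
```

The pairwise definition of AUC used here is quadratic in the number of cases, which is fine for illustration. A rank-based implementation would be preferable for cohort-sized data.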