March 12, 2026 · Open Access

Opportunistic screening data for early prediction of GDM in Northern Chinese women: a multicenter machine learning study

Key Points

This study aims to develop a machine learning model for early prediction of gestational diabetes mellitus (GDM) using clinical and ultrasound data.
The authors conducted a multicenter prospective cohort study of pregnant women from three hospitals in Northern China.
The model integrated first-trimester maternal characteristics with ultrasound measurements of subcutaneous (SAT) and visceral (VAT) adipose tissue thickness.
A genetic algorithm (GA) was applied for feature selection, and five machine learning classifiers were evaluated.
Model performance was assessed on an internal test set and an independent external validation set.
The best model (GA-selected features with XGBoost) achieved an AUC of 0.962 on the internal test set and 0.878 on the external validation set.
Sensitivity was 0.90 on the internal test set and 0.70 on the external validation set.
Specificity was 0.942 on the internal test set and 0.935 on the external validation set.
This model significantly outperformed models built with other feature selection methods or classifiers (all P < 0.001).
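The Key Points report AUC, sensitivity and specificity for the internal and external sets. As a minimal illustration of how these three metrics are conventionally defined, the sketch below computes them from hypothetical risk scores on a tiny made-up example (not study data). The 0.5 decision threshold and the helper names (`classify`, `sensitivity_specificity`, `auc`) are assumptions for illustration only; the study's operating threshold is not stated here. AUC is computed through its rank interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case.

```python
from typing import List, Sequence, Tuple


def classify(scores: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Turn risk scores into 0/1 predictions (the threshold is a hypothetical choice)."""
    return [1 if s >= threshold else 0 for s in scores]


def sensitivity_specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as P(random positive outscores random negative); ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = developed GDM, 0 = did not.
y_true = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.3, 0.6, 0.2, 0.1, 0.05]

sens, spec = sensitivity_specificity(y_true, classify(scores))
print(f"sensitivity={sens:.3f} specificity={spec:.3f} auc={auc(y_true, scores):.3f}")
# prints: sensitivity=0.667 specificity=0.750 auc=0.917
```

Note that sensitivity and specificity depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is one reason a study may report AUC as its primary metric.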

Abstract

To develop a machine learning model for early prediction of gestational diabetes mellitus (GDM) using routinely available first-trimester clinical and ultrasound data in Northern Chinese women.

This multicenter prospective cohort study enrolled pregnant women from three hospitals in Northern China. We integrated first-trimester maternal characteristics and standardized ultrasound measurements of subcutaneous (SAT) and visceral (VAT) adipose tissue thickness. After addressing potential selection bias with inverse probability weighting, we used a genetic algorithm (GA) for feature selection and evaluated five machine learning classifiers (XGBoost, artificial neural network [ANN], support vector machine [SVM], MLR, and random forest [RF]). Model performance was assessed on an internal test set and an independent external validation set, with the area under the receiver operating characteristic curve (AUC) as the primary metric.

The GA consistently selected body mass index (BMI), SAT, and VAT as core predictive features. The model combining GA-selected features with XGBoost performed best, achieving an AUC of 0.962 on the internal test set and 0.878 on the external validation set, with sensitivities of 0.90 and 0.70 and specificities of 0.942 and 0.935, respectively. It significantly outperformed models using other feature selection methods or classifiers (all P < 0.001) and remained stable across various sensitivity analyses.

A machine learning model based on readily accessible first-trimester indicators provides an effective tool for early, opportunistic GDM risk stratification in Northern Chinese women.
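The abstract states that a genetic algorithm (GA) was used for feature selection and consistently retained BMI, SAT and VAT, but it does not describe the GA's encoding, operators or fitness function. The sketch below is a generic, minimal GA for feature-subset selection under stated assumptions: each candidate is a bitmask over features, parents are drawn from the top quarter of the population (truncation selection), children are produced by one-point crossover and bit-flip mutation, and the best masks are carried over unchanged (elitism). The fitness function (`make_fitness`: AUC of an unweighted sum of the selected features minus a per-feature penalty), all function names, the hyperparameters and the toy data are hypothetical. The study itself paired GA-selected features with XGBoost, which this sketch does not use.

```python
import random
from typing import Callable, List, Sequence, Tuple

Mask = Tuple[int, ...]  # 1 = feature selected, 0 = dropped


def auc(y: Sequence[int], scores: Sequence[float]) -> float:
    """Rank-based ROC AUC (ties count 0.5)."""
    pos = [s for t, s in zip(y, scores) if t == 1]
    neg = [s for t, s in zip(y, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def make_fitness(X: Sequence[Sequence[float]], y: Sequence[int],
                 penalty: float = 0.01) -> Callable[[Mask], float]:
    """Hypothetical fitness: AUC of the unweighted sum of selected features,
    minus a small penalty per selected feature to favour compact subsets."""
    def fitness(mask: Mask) -> float:
        if not any(mask):
            return 0.0
        scores = [sum(v for v, m in zip(row, mask) if m) for row in X]
        return auc(y, scores) - penalty * sum(mask)
    return fitness


def ga_select(n_features: int, fitness: Callable[[Mask], float], pop_size: int = 12,
              generations: int = 25, p_mut: float = 0.1,
              seed: int = 0) -> Tuple[Mask, float, List[float]]:
    """Elitist GA over feature bitmasks; returns (best mask, its fitness, best per generation)."""
    rng = random.Random(seed)
    pop = [tuple(rng.randint(0, 1) for _ in range(n_features)) for _ in range(pop_size)]
    history: List[float] = []
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        history.append(fitness(ranked[0]))
        elite = ranked[:max(2, pop_size // 4)]  # truncation selection
        children = list(elite)  # elitism: the best masks survive unchanged
        while len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randint(1, n_features - 1)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Bit-flip mutation with probability p_mut per gene.
            children.append(tuple(1 - g if rng.random() < p_mut else g for g in child))
        pop = children
    best = max(pop, key=fitness)
    return best, fitness(best), history


# Synthetic toy data (not study data): column 0 separates the classes,
# the other columns carry little or no signal.
X = [[5, 3, 2, 1], [6, 1, 9, 2], [7, 4, 1, 3], [8, 2, 5, 4],
     [1, 2, 8, 1], [2, 4, 3, 2], [3, 1, 7, 3], [4, 3, 4, 4]]
y = [1, 1, 1, 1, 0, 0, 0, 0]

fitness = make_fitness(X, y)
best, best_fit, history = ga_select(n_features=4, fitness=fitness)
print("selected feature indices:", [i for i, g in enumerate(best) if g], "fitness:", round(best_fit, 3))
```

In a real pipeline the fitness would more plausibly be a cross-validated score from the downstream classifier; the unweighted feature sum is used here only to keep the example dependency-free.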
