Background: Thalassemia is an inherited autosomal recessive hematologic disorder characterized by chronic hemolytic anemia due to impaired synthesis of globin chains. A three-level prevention system has effectively reduced the birth rate of affected children and enabled early intervention in Guangxi, China. However, conventional blood tests lack accuracy, and hemoglobin electrophoresis has low predictive value for α-thalassemia with low HbA₂. While genetic testing remains the gold standard, its high cost limits large-scale use. Thus, a fast, low-cost, and reliable screening method is urgently needed for effective thalassemia control. Methods: An AI-driven detection system for intelligent recognition and quantitative analysis of erythrocyte morphology was developed using a large-scale dataset. This system was applied to both thalassemia and control samples to identify erythrocytes. eXtreme Gradient Boosting (XGB) and logistic regression (LR) models were then developed to distinguish thalassemia cases from controls, classify thalassemia subtypes, and differentiate thalassemia from iron deficiency anemia (IDA), based on the morphological features of red blood cells (RBCs) identified by the system. Results: The XGB and LR models achieved area under the receiver operating characteristic curve (AUROC) values of 0.92 and 0.98, respectively, for distinguishing thalassemia cases from controls. Subclass analysis showed that the XGB model reached an AUROC of 0.99 for α-thalassemia and 0.98 for β-thalassemia, while the LR model achieved an AUROC of 0.95 for α-thalassemia and 0.99 for β-thalassemia. The LR model outperformed the XGB model (AUROC 0.88 vs. 0.75) in differentiating IDA from thalassemia. Conclusions: Both the XGB and LR models demonstrate high accuracy in predicting thalassemia and distinguishing its subtypes, using RBC morphological features extracted by an AI-driven detection system.
Gui et al. studied this question.
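The Results above are reported as AUROC values. As a reading aid, the following is a minimal sketch of how an AUROC can be computed from binary labels and model scores, using the pairwise (Mann-Whitney) formulation. It is not the authors' code: the function name `auroc` and the toy labels and scores are hypothetical and chosen only for illustration.

```python
def auroc(labels, scores):
    """Area under the ROC curve via the pairwise (Mann-Whitney) formulation.

    labels: sequence of 0/1 values (1 = positive class, e.g. thalassemia)
    scores: sequence of model scores, where higher means "more likely positive"
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")

    # Count positive/negative pairs in which the positive sample scores higher;
    # tied pairs contribute one half.
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, not taken from the study.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
print(auroc(labels, scores))
```

Read this way, an AUROC of 0.98 means that a randomly chosen case receives a higher score than a randomly chosen control in about 98% of case-control pairs. The quadratic loop is kept for clarity; a rank-based implementation would be preferable for large datasets.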