This study aimed to develop an efficient, low-cost diagnostic approach using natural language processing and explainable machine learning models. A total of 11,863 influenza-like illness cases from Huzhou City, China, were collected, covering four common respiratory viruses: SARS-CoV-2, influenza, respiratory syncytial virus (RSV), and adenovirus. Natural language processing techniques were employed to extract and normalize symptom features from unstructured clinical text. Five machine learning algorithms were evaluated using AUC, accuracy, sensitivity, and specificity to select the best-performing model. Subgroup analyses by age, sex, and fever status assessed model robustness, and SHAP values were calculated for interpretability. Compared with existing diagnostic tools, our model demonstrated higher accuracy and better predictive performance, with AUCs of 0.856 (95% CI: 0.830–0.881) for SARS-CoV-2, 0.737 (95% CI: 0.713–0.760) for influenza, 0.801 (95% CI: 0.744–0.857) for RSV, and 0.782 (95% CI: 0.748–0.816) for adenovirus; discrimination was strongest for SARS-CoV-2 and RSV. Subgroup analyses showed especially strong discriminative performance in pediatric or afebrile patients. This study demonstrates the feasibility of integrating natural language processing and machine learning to identify respiratory viruses based solely on symptoms. The approach offers a low-cost and efficient alternative to PCR testing that can reduce reliance on resource-intensive testing, enhance early detection in clinical practice, and support early screening and resource allocation in both clinical and public health settings.
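As a rough illustration of the two steps the abstract names (normalizing symptom mentions from free text, then scoring a classifier with AUC, sensitivity, and specificity), a minimal Python sketch might look like the following. It is not the authors' pipeline: the English lexicon, the simple negation rule, and the function names `extract_symptoms`, `sensitivity_specificity`, and `auc` are all assumptions made for the example, and the toy labels and scores are invented.

```python
import re

# Hypothetical lexicon mapping surface phrases to normalized symptom names.
# The study's actual vocabulary and language are not given in the abstract.
SYMPTOM_LEXICON = {
    "fever": "fever",
    "febrile": "fever",
    "cough": "cough",
    "coughing": "cough",
    "sore throat": "sore_throat",
    "runny nose": "rhinorrhea",
    "rhinorrhea": "rhinorrhea",
}

# Assumed, very simple negation rule: a cue word directly before the phrase.
NEGATION = re.compile(r"\b(no|denies|without)\s+$")


def extract_symptoms(note):
    """Return the set of normalized, non-negated symptoms found in a note."""
    text = note.lower()
    found = set()
    for phrase, name in SYMPTOM_LEXICON.items():
        for match in re.finditer(r"\b" + re.escape(phrase) + r"\b", text):
            # Skip mentions immediately preceded by a negation cue.
            if not NEGATION.search(text[:match.start()]):
                found.add(name)
    return found


def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data, invented for the example only.
symptoms = extract_symptoms("Fever for 3 days, no cough, sore throat.")
y_true = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
```

The pairwise AUC is quadratic in the number of cases, which is fine for a sketch; a dataset of the size reported here would more typically use a rank-based implementation from a statistics library.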
Mingqing Xie
Suyi Zhang
Jianyong Shen
Infectious Disease Modelling
Fudan University
Zhejiang Center for Disease Control and Prevention
Xie et al. studied this question.
www.synapsesocial.com/papers/69db36a04fe01fead37c49b2 — DOI: https://doi.org/10.1016/j.idm.2026.04.006