What question did this study set out to answer?

This research aimed to develop a prediction model for assessing COVID-19 infection risk in international travelers.

February 28, 2026 | Open Access

Acute respiratory infection (COVID-19) risk prediction in travelers: A random forest model

Key Points

Constructed prediction models using data from passenger itineraries.
Applied random forest and multivariate logistic regression techniques for modeling.
Evaluated model performance using sensitivity, specificity, accuracy, AUC, and Brier score.
Conducted variable importance analysis to identify key predictive factors.
The random forest model demonstrated superior discriminative ability compared to the multivariate logistic regression model.
Key predictors were close contacts, flight risk, and sojourn risk; infection prevalence differed across the levels of each.
Infection prevalence was higher among high-risk than low-risk travelers (flight risk: 1.4% vs 0.7%; sojourn risk: 2.0% vs 0.7%).
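
The evaluation metrics named in the key points can be computed directly from true labels and predicted probabilities. The following is a minimal sketch in plain Python, not the study's code: the function name `evaluate`, the 0.5 decision threshold, and the toy data are assumptions made for illustration, and the sketch assumes both classes are present in the labels.

```python
from typing import Dict, Sequence


def evaluate(y_true: Sequence[int], y_prob: Sequence[float],
             threshold: float = 0.5) -> Dict[str, float]:
    """Return sensitivity, specificity, accuracy, AUC and Brier score.

    y_true holds 0/1 labels (1 = infected); y_prob holds predicted
    probabilities of infection. The threshold is an assumed cut-off.
    """
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        pred = 1 if p >= threshold else 0
        if y == 1 and pred == 1:
            tp += 1
        elif y == 0 and pred == 1:
            fp += 1
        elif y == 0 and pred == 0:
            tn += 1
        else:
            fn += 1

    # AUC as the probability that a randomly chosen positive is scored
    # higher than a randomly chosen negative (ties count as one half).
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in pos for b in neg)

    # Brier score: mean squared gap between probability and outcome
    # (lower is better).
    brier = sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)

    return {
        "sensitivity": tp / (tp + fn),   # recall for the positive class
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(y_true),
        "auc": wins / (len(pos) * len(neg)),
        "brier": brier,
    }


# Hypothetical toy data: five travelers, two of them infected.
toy_labels = [1, 1, 0, 0, 0]
toy_probs = [0.9, 0.4, 0.6, 0.2, 0.1]
toy_metrics = evaluate(toy_labels, toy_probs)
```

Calling `evaluate` once per model on the same held-out travelers would give a like-for-like comparison of the kind the key points describe. Sensitivity, specificity, and accuracy depend on the chosen threshold, whereas AUC and the Brier score are computed from the probabilities themselves.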

Abstract

Early screening during outbreaks of acute respiratory infections (ARIs) is critical for controlling disease spread among international travelers. However, the massive volume of traveler data generated in a short timeframe makes manual screening of suspected cases impractical for health quarantine officers. Prediction models for infection offer a promising solution to this challenge. Key predictive variables, including travel history and seat numbers, were extracted from passenger itineraries to construct the risk assessment model. A random forest algorithm and multivariate logistic regression were each used to build a prediction model of COVID-19 infection. Their performance was compared using sensitivity (recall for the positive class), specificity, accuracy, AUC, and Brier score. Variables were ranked by importance using the random forest algorithm. The random forest model exhibited better discriminative ability and calibration. Variable importance analysis revealed travel history-derived factors as the top predictors: close contacts (0.419), flight risk (0.286), and sojourn risk (0.182). Infection prevalence stratified by risk level was as follows: flight risk, low vs high: 0.7% vs 1.4%; sojourn risk, low vs high: 0.7% vs 2.0%; close contacts vs non-close contacts: 0.3% vs 2.4%. The prediction model based on the random forest algorithm performed better at identifying infected passengers than the multivariate logistic regression model. Variables derived from epidemiological history deserve more attention when building prediction models for respiratory infectious diseases. This model demonstrates strong potential for effectively responding to future outbreaks of acute infectious diseases such as COVID-19.
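
The stratified prevalence figures in the abstract are simple group-wise proportions. The sketch below shows that calculation in plain Python. The function name `prevalence_by_group` and the traveler counts are hypothetical: the abstract reports only percentages, so the counts were chosen merely to reproduce the 0.7% vs 1.4% flight-risk contrast.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def prevalence_by_group(records: Iterable[Tuple[str, int]]) -> Dict[str, float]:
    """Return the infected fraction within each risk group.

    Each record is (group_label, infected), with infected given as 0 or 1.
    """
    totals: Dict[str, int] = defaultdict(int)
    cases: Dict[str, int] = defaultdict(int)
    for group, infected in records:
        totals[group] += 1
        cases[group] += int(infected)
    return {group: cases[group] / totals[group] for group in totals}


# Hypothetical counts (not from the study): 7 of 1000 low-risk and
# 7 of 500 high-risk travelers infected.
toy_records = (
    [("low", 1)] * 7 + [("low", 0)] * 993
    + [("high", 1)] * 7 + [("high", 0)] * 493
)
toy_prevalence = prevalence_by_group(toy_records)

# Ratio of high-risk to low-risk prevalence for the toy data.
toy_ratio = toy_prevalence["high"] / toy_prevalence["low"]
```

Applying the same function to sojourn risk or close-contact status only requires changing the group label attached to each record.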
