Early screening during outbreaks of acute respiratory infections (ARIs) is critical for controlling disease spread among international travelers. However, the massive volume of traveler data generated in a short timeframe makes manual screening for suspected cases impractical for health quarantine officers. Infection prediction models offer a promising solution to this challenge. Key predictive variables, including travel history and seat numbers, were extracted from passenger itineraries to construct a risk assessment model. A random forest algorithm and multivariate logistic regression were used separately to build prediction models of COVID-19 infection. Their performance was compared using sensitivity (recall for the positive class), specificity, accuracy, the area under the receiver operating characteristic curve (AUC), and the Brier score, and variable importance was ranked with the random forest algorithm. The random forest model exhibited better discrimination and calibration. Variable importance analysis identified factors derived from travel history as the top predictors: close contacts (0.419), flight risk (0.286), and sojourn risk (0.182). Infection prevalence stratified by risk level was 0.7% (low risk) vs. 1.4% (high risk) for flight risk, 0.7% (low risk) vs. 2.0% (high risk) for sojourn risk, and 0.3% vs. 2.4% for close contacts vs. non-close contacts. The random forest model therefore performed better than the multivariate logistic regression model in identifying infected passengers. Variables derived from epidemiological history deserve greater attention when building prediction models for respiratory infectious diseases. This model shows strong potential for responding effectively to future outbreaks of acute infectious diseases such as COVID-19.
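The comparison above rests on five evaluation metrics. The following is a minimal, dependency-free Python sketch of how they can be computed for one model. It is not the authors' code. It assumes 0/1 labels (1 = infected), a predicted infection probability per passenger, and a hypothetical 0.5 cutoff for turning probabilities into class predictions. AUC is computed in its rank form, as the probability that a randomly chosen infected passenger receives a higher score than a randomly chosen uninfected one. A lower Brier score indicates better calibration.

```python
"""Sketch of the evaluation metrics named in the abstract.

Assumptions (not from the study): labels are 0/1 with 1 = infected,
each model outputs a predicted probability, and a hypothetical 0.5
threshold converts probabilities into class predictions.
"""
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN) for binary labels and predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def rank_auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """AUC as P(score of a positive > score of a negative); ties count 0.5."""
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - t) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true)


def evaluate(y_true: Sequence[int], y_prob: Sequence[float], threshold: float = 0.5) -> Dict[str, float]:
    """Compute sensitivity, specificity, accuracy, AUC and Brier score."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return {
        "sensitivity": tp / (tp + fn),  # recall for the positive (infected) class
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(y_true),
        "auc": rank_auc(y_true, y_prob),
        "brier": brier_score(y_true, y_prob),
    }


if __name__ == "__main__":
    # Toy, made-up data purely for illustration; not study results.
    labels = [1, 0, 1, 0, 0, 1]
    probs = [0.9, 0.2, 0.4, 0.6, 0.1, 0.8]
    print(evaluate(labels, probs))
```

To compare two models, as in the study, one would call `evaluate` on the same held-out labels with each model's predicted probabilities and contrast the resulting dictionaries. The nested-loop AUC is quadratic in the number of passengers, which is acceptable for a sketch. For large datasets, a sort-based rank computation or a library routine would be preferable.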
Yu et al. (Sun,) studied this question.