Accurate forecasting of the COVID-19 pandemic’s trajectory is crucial for informing effective prevention and control strategies. The aim of this study was to develop and compare COVID-19 forecasting models (SEIR, ARIMAX, and LSTNet) incorporating multi-symptom Google Trends signals, and to evaluate whether PCA-derived components improve out-of-sample prediction of daily confirmed cases. We conducted a correlation analysis between the confirmed case numbers from Johns Hopkins University’s COVID-19 database and the symptom data from Google Trends. SEIR, ARIMAX, and LSTNet models were then built on these data, and their predictive performance was compared. Symptom-related Google Trends series were strongly correlated (e.g. taste loss vs smell loss r = 0.96, 95% compatibility interval (CI): 0.952, 0.967; p = 1.59 × 10⁻²⁶; weakest pair: shortness of breath vs cough r = 0.49, CI: 0.414, 0.559; p = 4.12 × 10⁻²⁷). During the Delta-predominant period, a 1-unit increase in taste-loss Google Trends was associated with 2335.57 additional daily cases (CI: 1738.47, 2932.67; p = 2.4 × 10⁻¹³); during the Omicron-predominant period, the estimate was 3361.13 (CI: 2649.43, 4072.83; p = 6.06 × 10⁻¹⁹). These results suggest that the correlation remained strong across the epidemic periods of different SARS-CoV-2 variants. Furthermore, principal component analysis (PCA) was conducted, and the cumulative contribution rate reached 93.04%. The Google Trends data after PCA (PCAGT) were then introduced into the prediction models to improve their performance. The SEIR model predicted the number of daily newly confirmed cases over the next 7 days with a mean absolute error (MAE) of 5.30 × 10³ and a root mean square error (RMSE) of 6.60 × 10³.
Comparing predicted and actual values within 5 weeks, incorporating PCAGT reduced the error of the ARIMAX model from 16.5% to 0.2% and the error of the LSTNet model from 11.3% to 10.1%, so ARIMAX benefited more from PCAGT. The ARIMAX model incorporating PCAGT therefore had the best prediction performance, with a prediction error of only 0.2% within 5 weeks. Symptom-related Google Trends were robustly associated with COVID-19 case trends across pandemic periods, and PCA-derived aggregated signals improved short-term forecasting performance, with ARIMAX+PCAGT performing best.
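To make the PCA step and the error metrics concrete, the following is a minimal sketch and not the study's implementation. It assumes the symptom series are standardized before PCA and that enough components are kept to reach a chosen cumulative contribution rate. The data are synthetic, and the names `pca_scores`, `mae`, `rmse`, and the 0.90 threshold are hypothetical choices made for illustration only.

```python
import numpy as np

def pca_scores(X, var_target=0.90):
    """Standardize columns, run PCA via SVD, and keep the fewest
    components whose cumulative contribution rate reaches var_target."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = S**2 / np.sum(S**2)          # contribution rate per component
    cumulative = np.cumsum(explained)        # cumulative contribution rate
    k = int(np.searchsorted(cumulative, var_target)) + 1
    k = min(k, len(cumulative))              # guard against rounding at 1.0
    return Z @ Vt[:k].T, cumulative

def mae(y_true, y_pred):
    """Mean absolute error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_true - y_pred)))

def rmse(y_true, y_pred):
    """Root mean square error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Synthetic stand-in for four correlated symptom search series (assumption:
# one shared latent trend plus independent noise; not real Google Trends data).
rng = np.random.default_rng(0)
latent = rng.normal(size=200)
loadings = np.array([1.0, 0.9, 0.8, 0.5])
X = latent[:, None] * loadings + 0.3 * rng.normal(size=(200, 4))

scores, cumulative = pca_scores(X, var_target=0.90)
print("components kept:", scores.shape[1])
print("cumulative contribution rates:", np.round(cumulative, 4))

# Toy forecast check with made-up values.
print("MAE:", mae([1, 2, 3], [2, 2, 5]))
print("RMSE:", rmse([1, 2, 3], [2, 2, 5]))
```

The resulting `scores` columns play the role of the aggregated signal that a model such as ARIMAX could take as exogenous regressors. How the study selected components and fed them to each model is not specified in this abstract.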
Ma et al. (Thu,) studied this question.