Abstract:
Objective To construct a risk assessment model for the cross-regional transmission of respiratory infectious diseases, providing a theoretical basis, practical experience, and technical support for evaluating such risk.
Methods Official data on the COVID-19 epidemic in China from January 2020 to December 2022 were collected from the National Health Commission, Baidu Migration, and provincial statistical bureaus. Nine epidemic events were selected: the 2020 Wuhan, Hubei province epidemic; the 2021 Shijiazhuang, Hebei province epidemic; the 2021 Xi'an, Shaanxi province epidemic; the 2022 Shanghai epidemic; the 2022 Beihai, Guangxi Zhuang Autonomous Region epidemic; the 2022 Sanya, Hainan province epidemic; the 2022 Hohhot, Inner Mongolia Autonomous Region epidemic; the 2022 Zhengzhou, Henan province epidemic; and the 2022 Shijiazhuang, Hebei province epidemic. Principal component analysis (PCA) was used to derive a comprehensive input (importation) risk index formula for urban epidemics, which quantifies the overall risk of cross-regional epidemic importation. The principal components with the highest variance contribution rates served as input features for a support vector machine (SVM) model, with the actual intercity epidemic transmission status as the target variable. The SVM was used for prediction, and the SHapley Additive exPlanations (SHAP) method was used to interpret the model.
Results PCA extracted five factors: the local socioeconomic activity factor (F1), the epidemic source socioeconomic activity factor (F2), the urban policy implementation factor (F3), the urban population inflow and geographical distance-related factor (F4), and the urban epidemic transmission index factor (F5), with a cumulative variance contribution rate of 82.32%. Based on the principal component score coefficients and variance contribution rates, the comprehensive input risk index score was calculated as: F = 0.2178×F1 + 0.1841×F2 + 0.1556×F3 + 0.1419×F4 + 0.1238×F5. The SVM model achieved a prediction accuracy of 84.88%, a precision of 75.00%, a recall of 57.14%, and an F1-score (the harmonic mean of precision and recall, not to be confused with factor F1) of 64.86%. SHAP analysis showed that the SHAP values of F1, F2, F3, F4, and F5 were 0.15, 0.09, 0.08, 0.08, and 0.12, respectively, corresponding to contributions of 29.23%, 16.72%, 15.26%, 16.09%, and 22.70% to the assessment of transmission risk.
Conclusions The cross-regional transmission risk assessment model for respiratory infectious diseases constructed in this study, based on COVID-19 epidemic data, demonstrates good applicability and accuracy in quantifying the risk of cross-regional importation and predicting transmission, providing theoretical support and practical reference for the scientific formulation of respiratory infectious disease prevention and control strategies.
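As an illustrative check of the arithmetic reported in the Results, the following minimal Python sketch evaluates the comprehensive input risk index as a weighted sum of the five factor scores and recomputes the F1-score from the reported precision and recall. Only the weights, precision, and recall are taken from the abstract; the function names and the example factor scores are hypothetical and are not part of the study.

```python
# Weights for F1..F5 as given in the index formula in the Results.
WEIGHTS = [0.2178, 0.1841, 0.1556, 0.1419, 0.1238]


def composite_risk_index(factor_scores):
    """Return F = sum(w_i * F_i) for the five principal component scores."""
    if len(factor_scores) != len(WEIGHTS):
        raise ValueError("expected one score per factor (F1..F5)")
    return sum(w * s for w, s in zip(WEIGHTS, factor_scores))


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (unrelated to factor F1)."""
    return 2 * precision * recall / (precision + recall)


# The weights sum to the reported cumulative variance contribution rate.
cumulative = sum(WEIGHTS)

# Hypothetical factor scores for one city, for illustration only.
example_scores = [1.2, -0.4, 0.3, 0.8, -0.1]
example_index = composite_risk_index(example_scores)

# F1-score recomputed from the reported precision (75.00%) and recall (57.14%).
reported_f1 = f1_score(0.7500, 0.5714)

print(f"cumulative weight: {cumulative:.4f}")
print(f"example index:     {example_index:.4f}")
print(f"F1-score:          {reported_f1:.4f}")
```

Under these assumptions, the weights add up to the reported cumulative variance contribution rate (0.8232), and the recomputed F1-score agrees with the reported 64.86% to within rounding.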