Citation: LI Le, ZHOU Zi-hao, WU Qun-hong, MENG Xiang-wei, QI Xin-ye, WANG Xing, SHAO Ying-qi, LI Chen-xi. Association of specific keywords index in Baidu website with influenza monitoring data during 2012 – 2020 and potential use of the index for influenza epidemic prediction[J]. Chinese Journal of Public Health, 2021, 37(12): 1813-1818. DOI: 10.11847/zgggws1132684

Association of specific keywords index in Baidu website with influenza monitoring data during 2012 – 2020 and potential use of the index for influenza epidemic prediction

  • Abstract:
      Objective  To analyze how the correlation between the Baidu index of influenza-related keywords and influenza surveillance data in China has changed over time, and to examine how well influenza prediction models built on different keywords perform.
      Methods  Weekly counts of confirmed influenza virus-positive cases in China from week 1 of 2012 through week 12 of 2020 (429 weeks in total) were collected from the Global Influenza Surveillance and Response System (GISRS). Influenza-related keywords in four domains (disease name, prevention, treatment, and symptoms) were screened from the literature, and the nationwide daily Baidu index of each keyword for the same period was retrieved from the Baidu Index database (http://index.Baidu.com/). Taking 2017, the year in which the scale of influenza outbreaks in China changed markedly, as the cut point, correlation coefficients between each keyword's Baidu index and the influenza data were calculated for the periods before and after it using SPSS 22.0, and multiple linear regression models of the influenza data on the corresponding keywords were built with Eviews 8.
      Results  A total of 70 keywords were screened. Before 2017, 18 keywords had a correlation coefficient > 0.5 between their Baidu index and the weekly number of influenza-positive cases; after 2017 there were 30 such keywords, 28 of which correlated more strongly after 2017 than before. The four keywords with the highest correlation before 2017 were "what is influenza A" (甲流是什么), "influenza" (流行性感冒), "influenza A" (甲型流感), and "fever" (发烧); after 2017 they were "symptoms of influenza A" (甲流的症状), "influenza symptoms" (流感症状), "what medicine to take for influenza" (流感吃什么药), and "Tylenol" (泰诺, a brand name for paracetamol). Against the backdrop of the COVID-19 epidemic, the regression model whose independent variables included the non-specific keyword "high fever" (高烧) produced predictions that were too high, and replacing it with a specific keyword reduced the prediction bias.
      Conclusion  In China, the range of influenza-related keywords usable for monitoring based on internet big data has been expanding, and their correlation with influenza data has also increased. The public's online search for influenza information has gradually shifted from the concept of influenza toward its symptoms and treatment. When influenza is monitored through keywords, the selected keywords should be updated promptly and keywords with higher specificity should be chosen.
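
The two analysis steps named in Methods, a correlation computed separately on each side of the 2017 cut point and a multiple linear regression on keyword indices, can be sketched in plain Python. This is only an illustration under stated assumptions, not the authors' SPSS or Eviews workflow: the abstract does not say which correlation coefficient was used (Pearson is assumed here), whether 2017 itself counts as "before" or "after" (it is placed in "after" here), or how the daily Baidu index was aggregated to weeks (weekly values are taken as given). The function names and the toy numbers are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient (assumed; the abstract names no type)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def correlation_by_period(years, index, cases, cutoff_year=2017):
    """Correlate one keyword's weekly index with weekly case counts,
    separately for weeks before cutoff_year and from cutoff_year onward.
    Putting the cutoff year in the second period is an assumption."""
    before = [(i, c) for y, i, c in zip(years, index, cases) if y < cutoff_year]
    after = [(i, c) for y, i, c in zip(years, index, cases) if y >= cutoff_year]
    return tuple(pearson([i for i, _ in part], [c for _, c in part])
                 for part in (before, after))


def fit_ols(rows, y):
    """Least-squares fit of y = b0 + b1*x1 + ... + bk*xk.
    rows holds one list of keyword indices per week; returns [b0, b1, ...].
    Solves the normal equations by Gauss-Jordan elimination."""
    X = [[1.0] + [float(v) for v in row] for row in rows]  # intercept column
    k = len(X[0])
    # Augmented matrix [X^T X | X^T y]
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * t for x, t in zip(X, y))] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Toy numbers invented for illustration only (not data from the study).
years = [2016] * 4 + [2017] * 4
keyword_index = [10, 20, 30, 40, 10, 20, 30, 40]
weekly_cases = [5, 3, 6, 4, 2, 4, 6, 8]
r_before, r_after = correlation_by_period(years, keyword_index, weekly_cases)
print(f"r before 2017: {r_before:.2f}, r from 2017 onward: {r_after:.2f}")

# Two hypothetical keyword indices per week as predictors of case counts.
coefficients = fit_ols([[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]],
                       [9, 8, 19, 18, 26])
print("intercept and slopes:", [round(b, 3) for b in coefficients])
```

Swapping a non-specific keyword for a specific one, as described in Results, would amount to changing one column of the predictor rows and refitting; comparing the two fits on held-out weeks would then show whether the prediction bias shrinks.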

     
