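Fu-yan SHI, Jie MA, Lu HUANG, Xiao-shan XU, Na SUN, Wei-jing MENG, Su-zhen WANG, Li-ping YANG. Application of expectation maximization with bootstrapping in multiple imputation of quantitative variables for cross-sectional health examination data[J]. Chinese Journal of Public Health (中国公共卫生), 2019, 35(11): 1536-1539. DOI: 10.11847/zgggws1119777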
Fu-yan SHI, Jie MA, Lu HUANG, et al. Application of expectation maximization with bootstrapping in multiple imputation of quantitative variables for cross-sectional health examination data[J]. Chinese Journal of Public Health, 2019, 35(11): 1536-1539. DOI: 10.11847/zgggws1119777

Application of expectation maximization with bootstrapping in multiple imputation of quantitative variables for cross-sectional health examination data

    Abstract:
      Objective  To evaluate how well multiple imputation based on expectation maximization with bootstrapping (EMB) fills in missing quantitative variables in cross-sectional health examination data, and to provide evidence for choosing an appropriate multiple imputation method for such data.
      Methods  We used measured cross-sectional health examination data from 1 634 employees who underwent routine physical examination at the Health Checkup Center of Xijing Hospital in Xi′an, Shaanxi province, from January to December 2013. Missing values were filled in by EMB multiple imputation using the Amelia II package in R 3.5.0.
      Results  For cross-sectional quantitative health examination data under three missing-at-random scenarios, with single-variable missing rates of < 10%, 20%, and 70%, EMB multiple imputation gave lower estimation errors than listwise deletion in every scenario. For the same data set, imputation performance varied with the number of imputations, and m = 10 was the most suitable number for the data in this study. Probability density curves before and after imputation showed that, at m = 10, the imputed values agreed well with the observed values. Overimputation diagnostic plots further showed that, at m = 10, the 90% confidence intervals for most observations of each variable contained the best-fit line and were narrow. The multivariate logistic regression models built on the two analysis data sets, one processed by listwise deletion and the other by EMB multiple imputation, included different variables.
      Conclusion  For quantitative variables missing at random at different missing rates, EMB multiple imputation performs better than listwise deletion. The optimal number of imputations differs between data sets with different missing-data profiles.
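To make the EMB idea in the Methods more concrete, the following is a minimal Python sketch. It is not the Amelia II implementation used in the study (which runs in R and handles general multivariate data). It assumes a deliberately simplified setting: one fully observed variable x, one variable y with values missing at random, and a bivariate normal model. The function names `em_bivariate` and `emb_impute`, the default settings, and the synthetic data are all hypothetical illustrations and are not taken from the article.

```python
import math
import random


def em_bivariate(rows, n_iter=50):
    """EM for a bivariate normal (x, y) where some y values are None.

    Returns (intercept, slope, residual_variance) of the implied
    regression of y on x. Simplifying assumption: x is never missing.
    """
    n = len(rows)
    xs = [x for x, _ in rows]
    mx = sum(xs) / n
    vx = sum((x - mx) ** 2 for x in xs) / n
    observed = [y for _, y in rows if y is not None]
    # Starting values: complete-case moments of y, zero covariance.
    my = sum(observed) / len(observed)
    vy = sum((y - my) ** 2 for y in observed) / len(observed)
    cxy = 0.0
    for _ in range(n_iter):
        slope = cxy / vx
        intercept = my - slope * mx
        resid = max(vy - slope * cxy, 1e-12)
        # E-step: expected y and y^2 for each row given current parameters.
        sy = syy = sxy = 0.0
        for x, y in rows:
            if y is None:
                ey = intercept + slope * x
                eyy = ey * ey + resid
            else:
                ey, eyy = y, y * y
            sy += ey
            syy += eyy
            sxy += x * ey
        # M-step: update the moments from the expected sufficient statistics.
        my = sy / n
        vy = syy / n - my * my
        cxy = sxy / n - mx * my
    slope = cxy / vx
    return my - slope * mx, slope, max(vy - slope * cxy, 1e-12)


def emb_impute(rows, m=10, seed=0):
    """Return m completed copies of rows (bootstrap, then EM, then draw)."""
    rng = random.Random(seed)
    n = len(rows)
    completed = []
    for _ in range(m):
        # Bootstrap: resample rows with replacement to reflect
        # uncertainty in the parameter estimates.
        boot = [rows[rng.randrange(n)] for _ in range(n)]
        while sum(y is not None for _, y in boot) < 3:
            boot = [rows[rng.randrange(n)] for _ in range(n)]
        a, b, s2 = em_bivariate(boot)
        sd = math.sqrt(s2)
        # Impute each missing y with a draw from its conditional normal;
        # observed values are left untouched.
        completed.append(
            [(x, y if y is not None else rng.gauss(a + b * x, sd))
             for x, y in rows]
        )
    return completed


# Synthetic illustration only (not the study's data): about 20% of y missing.
_rng = random.Random(42)
data = []
for _ in range(300):
    x = _rng.gauss(50, 10)
    y = 2.0 + 0.5 * x + _rng.gauss(0, 3)
    data.append((x, None if _rng.random() < 0.2 else y))

imputations = emb_impute(data, m=10, seed=1)
```

Each of the m completed data sets would then be analyzed separately and the results combined. Listwise deletion, by contrast, would simply discard every row in which y is missing.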

     
