Fu-yan SHI, Jie MA, Lu HUANG, . Application of expectation maximization with bootstrapping in multiple imputation of quantitative variables for cross-sectional health examination data[J]. Chinese Journal of Public Health, 2019, 35(11): 1536-1539. DOI: 10.11847/zgggws1119777

Application of expectation maximization with bootstrapping in multiple imputation of quantitative variables for cross-sectional health examination data

  •   Objective  To evaluate the performance of expectation maximization with bootstrapping (EMB) in the multiple imputation of quantitative variables in cross-sectional health examination data, and to provide evidence for choosing an appropriate multiple imputation method for such data.
      Methods  Data were collected from 1 634 people who underwent routine physical examinations at the Xijing Hospital Health Checkup Center in Xi'an, Shaanxi Province, from January to December 2013. The data were analyzed with the Amelia II package in R 3.5.0, and the EMB multiple imputation method was used to fill in the missing values in the data set.
      Results  For single quantitative variables with missing rates below 10%, 20%, or 70%, the estimation errors after EMB multiple imputation were lower than those after listwise deletion. The performance of EMB multiple imputation varied with the number of imputations, and the appropriate number of imputations for this data set was 10. The probability density curves of the data before and after imputation showed that the imputed values agreed well with the observed values once 10 imputations had been completed. The overimputation diagnostic plot further showed that, for each variable, most of the 90% confidence intervals of the observations covered the line of best fit and that the intervals were narrow. The multivariate logistic regression models fitted to the same data set retained different variables depending on whether the data had been processed by listwise deletion or by EMB multiple imputation.
      Conclusion  For quantitative variables with different rates of random missingness, EMB-based multiple imputation performs better than listwise deletion, and the optimal number of imputations varies across data sets with different missingness profiles.
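
The EMB procedure named in the abstract can be illustrated with a minimal sketch. The study itself used the Amelia II package in R; the Python code below is not that implementation. It assumes a simplified setting with one fully observed variable x and one partially missing variable y that are jointly normal, and the function names `em_bivariate` and `emb_impute` are hypothetical. Each of the m imputations resamples the rows with replacement (bootstrap), estimates the mean and covariance by EM on the resampled data, and then draws the missing y values from the conditional normal distribution given x.

```python
import math
import random


def em_bivariate(rows, n_iter=200, tol=1e-8):
    """EM estimate of the mean and covariance of (x, y); y may be None (missing).

    Assumes x is fully observed and (x, y) is bivariate normal.
    """
    n = len(rows)
    xs = [x for x, _ in rows]
    ys_obs = [y for _, y in rows if y is not None]
    # x is complete, so its mean and variance are fixed at their ML estimates.
    mx = sum(xs) / n
    sxx = sum((x - mx) ** 2 for x in xs) / n
    # Start the y parameters from the observed cases, with zero covariance.
    my = sum(ys_obs) / len(ys_obs)
    syy = sum((y - my) ** 2 for y in ys_obs) / len(ys_obs)
    sxy = 0.0
    for _ in range(n_iter):
        beta = sxy / sxx
        resid_var = max(syy - beta * sxy, 0.0)
        # E-step: expected sufficient statistics given the current parameters.
        s_y = s_yy = s_xy = 0.0
        for x, y in rows:
            if y is None:
                ey = my + beta * (x - mx)
                eyy = ey * ey + resid_var
            else:
                ey, eyy = y, y * y
            s_y += ey
            s_yy += eyy
            s_xy += x * ey
        # M-step: update the parameters from the expected statistics.
        new_my = s_y / n
        new_syy = s_yy / n - new_my ** 2
        new_sxy = s_xy / n - mx * new_my
        change = abs(new_my - my) + abs(new_syy - syy) + abs(new_sxy - sxy)
        my, syy, sxy = new_my, new_syy, new_sxy
        if change < tol:
            break
    return mx, my, sxx, syy, sxy


def emb_impute(rows, m=10, seed=0):
    """Return m completed copies of rows, one per bootstrap + EM run."""
    rng = random.Random(seed)
    n = len(rows)
    completed = []
    for _ in range(m):
        # Bootstrap: resample rows with replacement until the sample is usable.
        while True:
            boot = [rows[rng.randrange(n)] for _ in range(n)]
            enough_y = sum(y is not None for _, y in boot) >= 2
            if enough_y and len({x for x, _ in boot}) > 1:
                break
        mx, my, sxx, syy, sxy = em_bivariate(boot)
        beta = sxy / sxx
        sd = math.sqrt(max(syy - beta * sxy, 0.0))
        # Impute: draw each missing y from its conditional normal given x.
        completed.append([
            (x, y if y is not None else rng.gauss(my + beta * (x - mx), sd))
            for x, y in rows
        ])
    return completed
```

Calling `emb_impute(rows, m=10)` on a list of `(x, y)` tuples, with `None` marking a missing y, returns ten completed data sets. Observed values are left untouched, and the spread of the imputed values across the ten copies reflects the uncertainty about the missing data.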