LI Ang, WEN Qi, GU Xing-bo, et al. Study on method of missing data imputation for SNPs test[J]. Chinese Journal of Public Health, 2014, 30(12): 1576-1582. DOI: 10.11847/zgggws2014-30-12-26

Study on method of missing data imputation for SNPs test

Objective: To study the performance of missing data imputation for single nucleotide polymorphism (SNP) testing and the factors that influence it, and to provide a scientific basis for using SNP data in gene-disease association studies.

Methods: Human genome data from the International HapMap Project were used as raw data, and Haploview software was used for tag SNP screening. HAPGEN2 software was used to simulate the SNP reference data and the research data with simulated missing values. The research data were then imputed with IMPUTE2 software based on the reference data, and the imputation error rates under different conditions (four levels of the missing ratio and of the reference data sample size) were compared.

Results: The imputation error rate was positively associated with the proportion of missing data and inversely associated with the reference data sample size: the error rates were 7.01%, 5.92%, 5.67%, and 5.26% for reference sample sizes of 50, 100, 150, and 200, respectively. When the missing proportion was large (r² = 0.825), the error rate of random-site imputation (5.64%) was lower than that of tag SNP imputation (9.10%); when the missing proportion was small (r² = 0.9), tag SNP imputation filled in the data at a lower error rate (4.96%). Overall, IMPUTE2 yielded low error rates (3%-13%) across the situations tested.

Conclusion: The proportion of missing data, the reference data sample size, and the missing pattern all influence the imputation error rate. Selecting a subset of the target gene and then imputing the data is a good strategy in analyses.
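
The evaluation design described in the abstract (hide known genotypes, impute them, compare against the truth) can be illustrated with a minimal Python sketch. The abstract does not define the error rate precisely, so this sketch assumes it is the fraction of masked genotypes whose imputed call differs from the true call. The function names `mask_genotypes` and `imputation_error_rate` and the 0/1/2 genotype coding are hypothetical and chosen only for illustration; the imputation step itself (done with IMPUTE2 in the study) is not reproduced here.

```python
import random


def mask_genotypes(genotypes, missing_ratio, seed=0):
    """Hide a fraction of genotype calls completely at random.

    genotypes: list of per-sample lists, coded 0/1/2 (assumed coding).
    Returns (masked copy with None at hidden sites, list of hidden (row, col)).
    """
    rng = random.Random(seed)
    positions = [(i, j) for i, row in enumerate(genotypes) for j in range(len(row))]
    n_mask = round(len(positions) * missing_ratio)
    masked_positions = rng.sample(positions, n_mask)
    masked = [list(row) for row in genotypes]
    for i, j in masked_positions:
        masked[i][j] = None
    return masked, masked_positions


def imputation_error_rate(truth, imputed, masked_positions):
    """Fraction of masked sites where the imputed call differs from the truth.

    This definition is an assumption; the paper may use a different one.
    """
    if not masked_positions:
        raise ValueError("no masked positions to evaluate")
    wrong = sum(1 for i, j in masked_positions if truth[i][j] != imputed[i][j])
    return wrong / len(masked_positions)
```

In use, `mask_genotypes` would be applied to the simulated research data, the masked data passed to an imputation tool, and `imputation_error_rate` called on the result; repeating this over several missing ratios and reference panel sizes gives a comparison of the kind the abstract reports.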