饶夫阳, 宋艳平, 吕芯芮, 白旭, 覃伟, 刘欢, 曹琦, 李子孝, 刘宝花, 姜勇. 基于机器学习模型缺血性脑卒中1年死亡预测效果评价[J]. 中国公共卫生, 2019, 35(9): 1187-1191. DOI: 10.11847/zgggws1122724
Fu-yang RAO, Yan-ping SONG, Xin-rui LÜ, et al. Prediction of mortality among ischemic stroke patients one year after hospital discharge based on machine learning model[J]. Chinese Journal of Public Health, 2019, 35(9): 1187-1191. DOI: 10.11847/zgggws1122724

Prediction of mortality among ischemic stroke patients one year after hospital discharge based on machine learning model

    Abstract:
      Objective  To evaluate how well four machine learning models, support vector machine (SVM), random forest, extreme gradient boosting (XGBoost), and adaptive boosting (AdaBoost), predict death among ischemic stroke (IS) patients one year after hospital discharge.
      Methods  Data on 12 418 ischemic stroke patients were extracted from the first wave of the China National Stroke Registry (CNSR), collected between September 2007 and August 2008. The models were trained repeatedly on three different groupings of the data. The four machine learning models were trained and validated with Python 3.7, and logistic regression analysis was performed with SAS 9.4. The models were compared on the F1-score for the death outcome, the area under the receiver operating characteristic curve (AUC), and accuracy.
      Results  Ranked by accuracy from highest to lowest, the models were XGBoost (88.55 ± 0.18)%, random forest (84.02 ± 0.53)%, AdaBoost (82.58 ± 0.17)%, SVM (80.91 ± 0.28)%, and logistic regression (77.03 ± 0.37)%. Ranked by F1-score for the death outcome, they were XGBoost (50.14 ± 0.43)%, random forest (49.40 ± 1.00)%, AdaBoost (48.72 ± 0.63)%, SVM (46.42 ± 0.45)%, and logistic regression (44.81 ± 0.50)%. Ranked by AUC, they were random forest (81.68 ± 0.42)%, logistic regression (81.39 ± 0.66)%, XGBoost (81.24 ± 0.44)%, AdaBoost (81.20 ± 0.41)%, and SVM (79.71 ± 0.37)%.
      Conclusion  SVM, random forest, XGBoost, and AdaBoost all predicted death one year after hospital discharge well and gave stable results. All four outperformed the conventional logistic regression model in accuracy and F1-score. In AUC, SVM scored lowest and the remaining models, including logistic regression, differed little.
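
The three evaluation metrics named in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function names, the 0.5 decision threshold, and the toy labels and scores below are all assumptions made for illustration, and none of the numbers are CNSR data. The abstract also does not say how the "±" spread over the three groupings was computed; the sample standard deviation used here is an assumption.

```python
from statistics import mean, stdev


def accuracy(y_true, y_pred):
    """Fraction of patients whose predicted outcome matches the observed one."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_positive(y_true, y_pred, positive=1):
    """F1-score for one class (here, the death outcome coded as 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auc(y_true, scores, positive=1):
    """AUC as the probability that a random positive case outscores
    a random negative case (ties count one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case of each class")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def summarize(values):
    """Mean and sample standard deviation across repeated groupings
    (the choice of sample SD is an assumption)."""
    return mean(values), stdev(values)


# Toy data (invented): 1 = died within one year, 0 = survived.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.2, 0.1, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print(f"accuracy = {accuracy(y_true, y_pred):.2f}")
print(f"F1 (death) = {f1_positive(y_true, y_pred):.2f}")
print(f"AUC = {auc(y_true, scores):.2f}")
```

On this toy data, accuracy is 0.80, F1 for the death class is about 0.67, and AUC is about 0.86. Accuracy and F1 depend on the chosen threshold while AUC depends only on how the scores rank the patients, which is one reason a model can lead on one metric and not on another, as in the rankings above.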

     
