Objective To evaluate the performance of four machine learning models (support vector machine [SVM], random forest, extreme gradient boosting [XGBoost], and adaptive boosting [AdaBoost]) in predicting mortality among ischemic stroke (IS) patients one year after hospital discharge.
Methods Data on 12 418 ischemic stroke patients were extracted from the first wave of the China National Stroke Registry (CNSR) between September 2007 and August 2008. Repeated grouping was performed three times to train and validate the models. The four machine learning models were trained and validated with Python 3.7, and logistic regression analysis was performed with SAS 9.4. The performance of each model in predicting mortality among IS patients one year after hospital discharge was evaluated with accuracy, F1-score, and the area under the receiver operating characteristic curve (AUC).
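As a rough illustration of the three evaluation metrics named in the Methods, the following minimal Python sketch computes accuracy, F1-score, and AUC from scratch and summarizes a metric across repeated groupings. The toy labels and scores, the 0.5 decision threshold, the function names, and the use of the sample standard deviation for the "±" term are all assumptions made for illustration; none of them come from the study.

```python
from statistics import mean, stdev


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred):
    """F1 for the positive class (1 = died), i.e. 2TP / (2TP + FP + FN)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def auc(y_true, scores):
    """AUC as the probability that a random positive case is scored
    higher than a random negative case (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def summarize(values):
    """Mean and sample standard deviation across repeated groupings
    (assumed meaning of the 'mean ± x' figures)."""
    return mean(values), stdev(values)


# Hypothetical toy data: 1 = died within one year, 0 = survived.
y_true = [1, 0, 1, 0, 0, 1, 0, 0]
scores = [0.9, 0.2, 0.6, 0.7, 0.1, 0.4, 0.3, 0.8]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print(accuracy(y_true, y_pred), f1_score(y_true, y_pred), auc(y_true, scores))
```

Accuracy and F1-score depend on the chosen decision threshold, whereas AUC depends only on how the scores rank the patients, which is one reason the three metrics can order the same models differently.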
Results Ranked by accuracy, the models in descending order were XGBoost (88.55 ± 0.18%), random forest (84.02 ± 0.53%), AdaBoost (82.58 ± 0.17%), SVM (80.91 ± 0.28%), and logistic regression (77.03 ± 0.37%). Ranked by F1-score, they were XGBoost (50.14 ± 0.43%), random forest (49.40 ± 1.00%), AdaBoost (48.72 ± 0.63%), SVM (46.42 ± 0.45%), and logistic regression (44.81 ± 0.50%). Ranked by AUC, they were random forest (81.68 ± 0.42%), logistic regression (81.39 ± 0.66%), XGBoost (81.24 ± 0.44%), AdaBoost (81.20 ± 0.41%), and SVM (79.71 ± 0.37%).
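The three rankings can be reproduced directly from the reported mean values. The short Python sketch below stores those means (in percent) and sorts the models by each metric; the dictionary layout and the helper name `rank_models` are illustrative assumptions, not part of the original analysis.

```python
# Mean values (in percent) as reported in the Results.
reported_means = {
    "accuracy": {
        "XGBoost": 88.55, "random forest": 84.02, "AdaBoost": 82.58,
        "SVM": 80.91, "logistic regression": 77.03,
    },
    "F1-score": {
        "XGBoost": 50.14, "random forest": 49.40, "AdaBoost": 48.72,
        "SVM": 46.42, "logistic regression": 44.81,
    },
    "AUC": {
        "random forest": 81.68, "logistic regression": 81.39,
        "XGBoost": 81.24, "AdaBoost": 81.20, "SVM": 79.71,
    },
}


def rank_models(metric):
    """Return model names sorted by the given metric, best first."""
    values = reported_means[metric]
    return sorted(values, key=values.get, reverse=True)


for metric in reported_means:
    print(metric, "->", ", ".join(rank_models(metric)))

# Spread of AUC among all models except SVM, in percentage points.
auc_without_svm = [v for k, v in reported_means["AUC"].items() if k != "SVM"]
auc_spread = round(max(auc_without_svm) - min(auc_without_svm), 2)
print("AUC spread excluding SVM:", auc_spread)
```

Laying the numbers out this way makes it easy to see that logistic regression ranks last on accuracy and F1-score but second on AUC, and that the AUC values of all models other than SVM lie within about half a percentage point of one another.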
Conclusion SVM, random forest, XGBoost, and AdaBoost all perform well in predicting mortality among IS patients one year after hospital discharge, and all four models are stable. The four models are superior to logistic regression in accuracy and F1-score; in AUC, SVM performs worst and the other models perform similarly.