Objective To explore the value of regional health information platform data in reducing garbage code rates in cause-of-death records and improving data quality at the source.
Methods We analyzed 8 717 death records of registered residents of Putuo District, Shanghai, from March 1, 2023 to February 29, 2024. Based on the GBD 2019 ICD-10 garbage code classification, we extracted cases whose underlying cause of death was classified as a garbage code and examined their baseline characteristics. For these cases, we retrieved supplementary medical records from the regional health information platform. Two trained mortality coders then revised the underlying causes of death, and the resulting improvement in data quality was evaluated.
Results Garbage code rates for the underlying cause of death were higher among individuals < 60 years (14.14%), cases documented in resident death inference certificates (12.74%), and deaths occurring at locations categorized as dead on arrival (23.81%), at home (13.43%), and other locations (13.04%). Using the regional health information platform, medical records were successfully matched for 55.32% of cases with garbage-coded underlying causes, and the effective correction rate for underlying causes reached 50.00%. After correction, the overall garbage code rate decreased from 7.01% to 5.28% (χ² = 22.683, P < 0.001). Significant reductions were observed in the garbage code rates for individuals ≥ 60 years, cases recorded in official death certificates and in death inference certificates, and deaths occurring in hospital wards, at home, and in nursing homes (all P < 0.05). By garbage code level, the pre-correction rates for Level Ⅰ (severe impact) to Level Ⅳ (limited impact) were 1.96%, 2.90%, 1.28%, and 0.86%, respectively (χ² = 336.015, P < 0.001), with Level Ⅱ (major impact) codes predominating at 41.41% of garbage-coded cases. After correction, the rates for Level Ⅰ to Level Ⅲ decreased (all P < 0.05), with 26.48% of Level Ⅱ codes and 29.46% of Level Ⅲ codes reclassified as non-garbage codes. High-frequency Level Ⅱ codes, such as hypertension and hypertensive encephalopathy, and Level Ⅲ codes, including other diseases of the lung and other specified respiratory conditions, decreased significantly (χ² = 15.599, P < 0.001; χ² = 4.838, P = 0.028).
Conclusions The regional health information platform effectively targets major mortality reporting sources and significantly reduces garbage code rates, thus substantially improving the overall data quality in mortality statistics.
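As a rough plausibility check on the overall before/after comparison reported in the Results, the following Python sketch recomputes a Pearson chi-square statistic for a 2×2 table. It rests on assumptions not stated in the abstract: the garbage-code counts are reconstructed by rounding the reported rates (7.01% and 5.28%) against the 8 717 records, the test is taken to be an uncorrected Pearson chi-square that treats the pre- and post-correction records as two independent groups, and the function name `pearson_chi2_2x2` is purely illustrative.

```python
import math


def pearson_chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns the statistic and its upper-tail p-value with 1 degree of freedom.
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


TOTAL = 8717  # death records analyzed (from the Methods)

# Assumption: counts reconstructed by rounding the reported percentages.
before = round(TOTAL * 0.0701)  # garbage-coded records before correction
after = round(TOTAL * 0.0528)   # garbage-coded records after correction

chi2, p = pearson_chi2_2x2(before, TOTAL - before, after, TOTAL - after)
print(f"garbage codes: {before} -> {after}")
print(f"chi2 = {chi2:.3f}, p = {p:.2e}")
```

Under these assumptions the statistic comes out close to the reported χ² = 22.683, which suggests the reported rates and test statistic are internally consistent. Because the same records are compared before and after correction, a paired test such as McNemar's would be an alternative design choice, but it would require the record-level reclassification counts, which the abstract does not give.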