Objective To develop an R program that processes complex work unit entries in the Notifiable Disease Surveillance and Reporting System and enables active searching for school-based cluster epidemics.
Methods Using 2024 data on acute hemorrhagic conjunctivitis, pertussis, and other infectious diarrheal diseases in Guangdong Province as an example, we designed an active searching algorithm in R 4.1.0. The program standardizes unstructured school names via text mining algorithms, counts cases by normalized school name, actively searches for school-based cluster epidemics with the fixed threshold method, and compares them with the epidemics reported in the Public Health Emergency Management Information System (PHEMIS).
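The Methods steps above (name standardization, counting by normalized school name, fixed-threshold search) can be sketched in base R as follows. This is a minimal illustration, not the authors' program: the column names, the `normalize_school()` and `find_clusters()` helpers, the sample records, and the threshold of 3 cases within 7 days are all assumptions made for the example and are not the PHEMIS criteria.

```r
# Hypothetical case records; work_unit is free text as entered at reporting.
cases <- data.frame(
  disease   = rep("pertussis", 5),
  work_unit = c("Sunshine Primary School Class 3",
                "sunshine primary school",
                " Sunshine  Primary School (Grade 2)",
                "Riverside Middle School",
                "Sunshine Primary School"),
  onset     = as.Date(c("2024-03-01", "2024-03-02", "2024-03-04",
                        "2024-03-03", "2024-03-20")),
  stringsAsFactors = FALSE
)

# Illustrative normalization rules (assumed): lower-case, drop parenthetical
# notes and class/grade suffixes, collapse repeated whitespace.
normalize_school <- function(x) {
  x <- tolower(trimws(x))
  x <- gsub("\\(.*?\\)", "", x, perl = TRUE)
  x <- gsub("\\b(class|grade)\\s*\\d+\\b", "", x, perl = TRUE)
  x <- gsub("\\s+", " ", x)
  trimws(x)
}

# Fixed-threshold search: flag a disease/school pair once at least k cases
# have onset dates inside one window of window_days days.
find_clusters <- function(df, k = 3, window_days = 7) {
  df$school <- normalize_school(df$work_unit)
  groups <- split(df, list(df$disease, df$school), drop = TRUE)
  hits <- lapply(groups, function(g) {
    d <- sort(g$onset)
    for (i in seq_along(d)) {
      in_window <- d[d >= d[i] & d < d[i] + window_days]
      if (length(in_window) >= k) {
        # Signal date is the onset date of the k-th case in the window.
        return(data.frame(disease = g$disease[1], school = g$school[1],
                          n_cases = length(in_window),
                          signal_date = in_window[k]))
      }
    }
    NULL
  })
  out <- do.call(rbind, hits)
  if (!is.null(out)) rownames(out) <- NULL
  out
}

clusters <- find_clusters(cases, k = 3, window_days = 7)
```

In this sketch, the three differently written "Sunshine Primary School" entries collapse to a single key before counting, which is the step that lets the threshold be applied per school rather than per spelling. Lowering `k` corresponds to the lowered-threshold scenario.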
Results When aligned with PHEMIS reporting thresholds, the R program had a positive predictive value of 100.00%. Of the school-based cluster epidemics reported in PHEMIS, 67.65% (23/34) were detected by the R program. Of these 23, the R program identified 60.87% (14/23) later than PHEMIS and 39.13% (9/23) at the same time as PHEMIS. In addition, the R program identified 11 clusters that met PHEMIS criteria but had not been reported. When the thresholds were lowered, more potential cluster epidemics were detected, and 2 signals were raised earlier than the PHEMIS alerts.
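The proportions in the Results can be reproduced from the stated counts with a few lines of R. The variable names and the `pct()` helper are illustrative; only the counts come from the text.

```r
reported <- 34  # school-based cluster epidemics reported in PHEMIS
detected <- 23  # of those, also detected by the R program
later    <- 14  # detected later than PHEMIS
same     <- 9   # detected at the same time as PHEMIS

# Format a proportion as a percentage with two decimals.
pct <- function(num, den) sprintf("%.2f%%", 100 * num / den)

detection_rate <- pct(detected, reported)  # share of reported clusters detected
later_share    <- pct(later, detected)     # share of detected clusters found later
same_share     <- pct(same, detected)      # share found at the same time
```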
Conclusions This active searching framework can serve as a powerful supplement to PHEMIS, demonstrating operational value for early detection and containment of school-based cluster epidemics.