Objective To compare the accuracy of multiple large language models (LLMs) with that of expert judgment by public health physicians in predicting infectious disease trends.
Methods Using monthly report data on 17 infectious diseases in Guangzhou from January to December 2024, five LLMs and public health physicians each predicted disease trends, and the predictions were compared with the actual trends.
Results A total of 99 predictions were made for the 17 infectious diseases. Public health physicians achieved a prediction accuracy of 84.85%, compared with 73.74% for the AI-integrated prediction. Among the individual AI products, accuracy in descending order was Kimi (70.71%), Doubao (68.69%), Lingxi (68.69%, tied with Doubao), ERNIE Bot (65.66%), and DeepSeek (63.64%). In identifying epidemic inflection points, based on 99 reference data items related to dengue fever, norovirus infection, hand, foot, and mouth disease, pertussis, and influenza, public health physicians achieved an accuracy of 50.0%, followed by Kimi (30.0%), AI integration (20.0%), and Doubao (20.0%, tied with AI integration). By F1 score, public health physicians reached 0.56, followed by Kimi (0.40), Doubao (0.38), and AI integration (0.27).
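The accuracy and F1 figures above follow from standard definitions, which the Python sketch below illustrates. It is not the study's analysis code: the function names, trend labels ("up", "down", "flat"), and toy data are hypothetical, and the count of 84 correct predictions is back-calculated from the reported 84.85% rather than stated in the source.

```python
def accuracy(predicted, actual):
    """Share of predictions that match the observed trend."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


def f1_score(predicted, actual):
    """F1 for binary labels, where 1 marks an epidemic inflection point."""
    tp = sum(p == 1 and a == 1 for p, a in zip(predicted, actual))
    fp = sum(p == 1 and a == 0 for p, a in zip(predicted, actual))
    fn = sum(p == 0 and a == 1 for p, a in zip(predicted, actual))
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def percent(correct, total):
    """Accuracy as a percentage rounded to two decimals, as in the abstract."""
    return round(100 * correct / total, 2)


# Hypothetical toy data, not taken from the study.
trend_actual = ["up", "down", "flat", "up"]
trend_predicted = ["up", "down", "up", "up"]
print(accuracy(trend_predicted, trend_actual))  # 3 of 4 trends match

inflection_actual = [0, 1, 0, 0, 1, 0, 1, 0]
inflection_predicted = [0, 1, 1, 0, 0, 0, 1, 0]
print(f1_score(inflection_predicted, inflection_actual))

# Inferred, not reported: 84 correct out of 99 rounds to the stated 84.85%.
print(percent(84, 99))
```

Accuracy and F1 can rank methods differently because F1 ignores true negatives, so it is the more informative of the two when inflection points are rare among the items being evaluated.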
Conclusions At present, public health physicians predict infectious disease trends more accurately than AI, especially in identifying the impact of complex social factors.