Objective To develop an intelligent question-answering model based on retrieval-augmented generation (RAG) that helps public health professionals extract key information efficiently and accurately from technical documents such as regulations, guidelines, manuals, and literature, thereby improving the efficiency of knowledge acquisition and the consistency with which that knowledge is applied.
Methods On the basis of RAG, this study integrates an automated text preprocessing mechanism and employs the LangChain framework to construct a context-aware unified retrieval-augmented question-answering model powered by large language models (CURA-LLM). This model combines a threshold-managed information retrieval module with a response generation module supported by historical dialogue records.
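The flow described in Methods (similarity-thresholded retrieval followed by generation conditioned on dialogue history) can be illustrated with a minimal sketch. This is not the authors' implementation and does not use LangChain. The names `embed`, `retrieve`, and `build_prompt`, the bag-of-words embedding, and the default threshold of 0.3 are all assumptions made for illustration only.

```python
import math
from collections import Counter


def embed(text):
    # Toy bag-of-words vector; a stand-in for a real embedding model (assumption).
    return Counter(text.lower().split())


def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, chunks, threshold=0.3, top_k=3):
    # Keep only chunks whose similarity to the query reaches the threshold,
    # then return at most top_k of them, best match first.
    q = embed(query)
    scored = [(cosine(q, embed(c)), c) for c in chunks]
    kept = [(s, c) for s, c in scored if s >= threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in kept[:top_k]]


def build_prompt(query, passages, history):
    # Combine earlier dialogue turns, retrieved passages, and the new question
    # into one prompt string that would be sent to the language model.
    turns = "\n".join(f"User: {q}\nAssistant: {a}" for q, a in history)
    context = "\n".join(f"- {p}" for p in passages) or "(no passage passed the threshold)"
    return (
        f"Conversation so far:\n{turns}\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )
```

In this sketch the threshold keeps weakly related passages out of the prompt, so when nothing relevant is found the model is given an explicit "no passage" marker instead of off-topic context.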
Results CURA-LLM performed strongly on three representative public health datasets covering operational guidelines, testing procedures, and legal regulations, with F1 scores of 0.941, 0.891, and 0.947 and cosine similarities of 0.968, 0.929, and 0.963, respectively. Moreover, CURA-LLM outperformed three existing domain-specific question-answering models in both retrieval relevance and answer accuracy.
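The abstract does not state how the two reported metrics were computed. As a hedged illustration, the sketch below shows one common way to score a generated answer against a reference answer: a token-overlap F1 and a cosine similarity between two embedding vectors. The function names and the whitespace tokenization are assumptions and may differ from the study's evaluation protocol.

```python
import math
from collections import Counter


def token_f1(prediction, reference):
    # Token-overlap F1 between a generated answer and a reference answer,
    # using lowercase whitespace tokens (an assumed tokenization).
    pred = prediction.lower().split()
    ref = reference.lower().split()
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def cosine_similarity(u, v):
    # Cosine similarity between two dense vectors of equal length,
    # e.g. sentence embeddings of the generated and reference answers.
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0
```

Both scores lie between 0 and 1 for non-negative inputs; F1 rewards exact wording overlap, while cosine similarity on embeddings is more tolerant of paraphrase.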
Conclusions CURA-LLM provides effective support for the intelligent interpretation and rapid extraction of knowledge from public health technical documents. It can help personnel respond quickly and follow standardized procedures in scenarios such as emergency response, routine management, and professional training, and it shows strong potential for practical application.