BACKGROUND: Large language models (LLMs) have the potential to provide clinical infection advice, but variation in prevalent pathogens and antimicrobial resistance requires models to be adapted to local contexts. We evaluated a retrieval-augmented generation (RAG) approach to providing antibiotic and infection advice explicitly constrained to local guidelines. METHODS: Relevant guideline sections from Oxford University Hospitals were identified by combining keyword matching with a medical embedding model. A locally deployed LLM (gpt-oss-20b) generated answers using the retrieved context. Performance was assessed using 200 simulated questions with an LLM-as-judge and 66 human-written questions reviewed by ≥2 infection specialists. RESULTS: The model attempted to answer 186/200 (93%) simulated clinical advice queries; of these 186 responses, 162 (87%) were judged fully correct, 14 (8%) partially correct, and 10 (5%) incorrect. Performance was lower in complex scenarios, e.g., when renal impairment was present. Of the 57 human-written questions covered by guidelines, 46 (81%) single-stage responses were fully correct and 10 (18%) partially correct. Of 9 out-of-scope questions, 5 (56%) were correctly identified as such. A multi-stage pipeline modestly improved performance (84% fully correct). Median answer generation time was 12 s (single-stage) and 15 s (multi-stage). LLMs without RAG-based local guideline context performed worse: 21/186 (11%) answers to simulated questions were fully correct with the same locally deployed LLM and 92/200 (46%) with a current frontier model (gpt-5.4). CONCLUSION: An LLM grounded in local antimicrobial guidelines can deliver mostly accurate, concise infection advice but still generates occasional errors and does not always recognise out-of-scope queries. Further optimisation and safety mechanisms are required before routine clinical deployment.
Journal article
2026-06-09T00:00:00+00:00
93
Antibiotic advice, Antibiotic guidelines, Artificial intelligence, Chatbot