BACKGROUND: Large language models (LLMs) have the potential to provide clinical infection advice, but variations in prevalent pathogens and antimicrobial resistance require models to be adapted to local contexts. We evaluated a retrieval-augmented generation (RAG) approach to provide antibiotic and infection advice explicitly constrained to local guidelines.

METHODS: Relevant guideline sections from Oxford University Hospitals were identified by combining keyword-matching and a medical embedding model. A locally-deployed LLM (gpt-oss-20b) generated answers using the retrieved context. Performance was assessed using 200 simulated questions with an LLM-as-judge, and 66 human-written questions reviewed by ≥2 infection specialists.

RESULTS: The model attempted to answer 186/200 (93%) simulated clinical advice queries, of which 162 (87%) responses were judged fully-correct, 14 (8%) partially-correct, and 10 (5%) incorrect. Performance was lower in complex scenarios, e.g., when renal impairment was present. For 57 human-written questions covered by guidelines, 46 (81%) single-stage responses were fully-correct and 10 (18%) partially-correct. Of 9 out-of-scope questions, 5 (56%) were correctly identified. A multi-stage pipeline modestly improved performance (84% fully-correct). Median answer generation time was 12 s (single-stage) and 15 s (multi-stage). LLMs without RAG-based local guideline context performed worse: 21/186 (11%) answers to simulated questions were fully-correct with the same locally-deployed LLM, and 92/200 (46%) with a current frontier model (gpt-5.4).

CONCLUSION: An LLM grounded in local antimicrobial guidelines can deliver mostly accurate, concise infection advice, but it still generates occasional errors and does not always recognise out-of-scope queries. Further optimisation and safety mechanisms are required before routine clinical deployment.
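
The retrieval step described in the methods (keyword-matching combined with an embedding model, followed by generation constrained to the retrieved context) might look roughly like the sketch below. This is an illustration under stated assumptions, not the authors' pipeline: the bag-of-words `embed` function is a toy stand-in for a medical embedding model, the weighting `alpha`, the threshold `min_score`, the prompt wording, and the placeholder guideline text are all hypothetical, and the call to the LLM itself is omitted.

```python
import math
import re
from collections import Counter

# Placeholder sections; real guideline text is NOT reproduced here (assumption).
GUIDELINES = {
    "Urinary tract infection": "placeholder guidance on urinary tract infection in adults",
    "Cellulitis": "placeholder guidance on cellulitis and skin infection",
}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def keyword_score(query, section):
    """Fraction of distinct query terms that also appear in the section."""
    q_terms = set(tokenize(query))
    if not q_terms:
        return 0.0
    return len(q_terms & set(tokenize(section))) / len(q_terms)

def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(tokenize(text))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def retrieve(query, sections, top_k=2, alpha=0.5, min_score=0.2):
    """Rank sections by a weighted mix of keyword and embedding similarity.

    alpha and min_score are hypothetical tuning values, not taken from the study.
    """
    q_vec = embed(query)
    scored = []
    for title, text in sections.items():
        score = alpha * keyword_score(query, text) + (1 - alpha) * cosine(q_vec, embed(text))
        if score >= min_score:
            scored.append((score, title))
    scored.sort(reverse=True)
    return [title for _, title in scored[:top_k]]

def build_prompt(query, sections, titles):
    """Build a context-constrained prompt, or return None if nothing was retrieved."""
    if not titles:
        return None  # caller can treat this as an out-of-scope question
    context = "\n\n".join(f"[{t}]\n{sections[t]}" for t in titles)
    return (
        "Answer using ONLY the guideline excerpts below. "
        "If they do not cover the question, say so.\n\n"
        f"{context}\n\nQuestion: {query}"
    )
```

Returning no prompt when nothing scores above the threshold is one simple way to flag out-of-scope questions before generation; the abstract does not say how the evaluated system makes that decision.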

More information

DOI

10.1016/j.jinf.2026.106789

Type

Journal article

Publication Date

2026-06-09

Volume

93

Keywords

Antibiotic advice, Antibiotic guidelines, Artificial intelligence, Chatbot