Abstract:
Background: Multicentre training may reduce biases in medical artificial intelligence; however, ethical, legal and technical considerations constrain hospitals' ability to share data. Federated learning (FL) allows institutions to participate in algorithm development while retaining custody of their data, but uptake in hospitals has been limited because deployment requires specialist expertise at each site. We previously developed an AI-driven screening test for COVID-19 in emergency departments using clinical data routinely available within 1 h of arrival (vital signs and blood tests; CURIAL-Lab). Here, we aimed to federate our COVID-19 screening test, developing a novel and easy-to-use embedded system ('full stack FL') to train and evaluate a model across four UK hospital groups without centralising patient data.
Methods: We supplied a Raspberry Pi 4 Model B preloaded with our FL pipeline to four NHS hospital groups or their locally linked research university (Oxford University Hospitals/University of Oxford (OUH), University Hospitals Birmingham/University of Birmingham (UHB), Bedfordshire Hospitals (BH) and Portsmouth Hospitals University (PUH) NHS Trusts). OUH, PUH and UHB participated in federated training and calibration, training a deep neural network (DNN) and a logistic regressor to predict COVID-19 status using clinical data for pre-pandemic (COVID-19-negative) admissions and COVID-19-positive cases from the first wave. We performed federated prospective evaluation at PUH and OUH, and federated external evaluation at BH, evaluating the resultant global and site-tuned models for admissions during the second wave. The primary outcome was overall model performance, measured as AUROC. Removable microSD storage was destroyed on study completion.
Findings: Routinely collected clinical data from a total of 130,941 patients (1,772 COVID-19 positive) across three hospital groups were included in federated training. OUH, PUH and BH participated in evaluation, which included a total of 32,986 patient admissions (3,549 positive) during the second wave. Federated training improved DNN performance by a mean of 27.6% (SD 2.20) in terms of AUROC compared with models trained locally, from AUROCs of 0.574 and 0.622 at OUH and PUH to 0.872 and 0.876 for the federated global model. The performance improvement was more modest for the logistic regressor, with a mean AUROC increase of 13.9% (SD 0.5%). During federated external evaluation at BH, the global DNN model achieved an AUROC of 0.917 (0.893-0.942), with 89.7% sensitivity (83.6-93.6) and 76.7% specificity (73.9-79.1). Site-specific tuning of the global model did not significantly improve performance (AUROC change <0.01).
Interpretation: We present the development of a COVID-19 screening test across four UK hospital groups, without centralising patient data, using our novel full-stack FL platform alongside micro-computing hardware. Federation improved model performance and generalisability. An easy-to-use embedded system can allow hospitals to contribute to AI development without specialist technical expertise.
Funding: University of Oxford Medical & Life Sciences Translational Fund/Wellcome
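The federated training described in the Methods can be illustrated with a minimal sketch of federated averaging for a logistic regressor. This is not the study's pipeline: the abstract does not state the aggregation rule, so sample-size-weighted averaging is an assumption, the three "sites" are synthetic stand-ins, and every function name here (`local_update`, `federated_average`, `make_site`, `run_federated_training`) is hypothetical. The point it shows is that only model weights, never patient rows, leave each site.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def local_update(weights, features, labels, lr=0.1, epochs=5):
    """Run a few gradient-descent epochs on one site's data.

    Only the updated weight vector is returned to the coordinator.
    """
    w = list(weights)
    n = len(features)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in zip(features, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x))) - y
            for j, xj in enumerate(x):
                grad[j] += err * xj / n
        w = [wi - lr * gj for wi, gj in zip(w, grad)]
    return w


def federated_average(site_weights, site_sizes):
    """Average site models, weighting each by its sample count (assumed rule)."""
    total = sum(site_sizes)
    dim = len(site_weights[0])
    return [
        sum(w[j] * n for w, n in zip(site_weights, site_sizes)) / total
        for j in range(dim)
    ]


def make_site(rng, n, shift):
    """Synthetic stand-in for one hospital: a bias term plus two features."""
    features, labels = [], []
    for _ in range(n):
        x1 = rng.gauss(shift, 1.0)
        x2 = rng.gauss(0.0, 1.0)
        p = sigmoid(2.0 * x1 - 1.0 * x2)
        features.append([1.0, x1, x2])
        labels.append(1 if rng.random() < p else 0)
    return features, labels


def run_federated_training(rounds=20, seed=0):
    """Alternate local training and server-side averaging for a few rounds."""
    rng = random.Random(seed)
    sites = [make_site(rng, n, s) for n, s in [(200, -0.5), (300, 0.0), (100, 0.5)]]
    global_w = [0.0, 0.0, 0.0]
    for _ in range(rounds):
        updates = [local_update(global_w, X, y) for X, y in sites]
        global_w = federated_average(updates, [len(y) for _, y in sites])
    return global_w


global_weights = run_federated_training()
print("global weights:", [round(w, 3) for w in global_weights])
```

In a real deployment the local step would run on hardware held at each site, and the averaging step would run on a coordinating server that sees only the weight vectors.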
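The reported DNN improvement of 27.6% (SD 2.20) can be checked against the four AUROC values given in the Findings. The sketch below assumes, as one reading that matches the figures, that the improvement is the absolute AUROC gain in percentage points averaged over OUH and PUH, with a population standard deviation; the abstract itself does not state how the statistic was computed.

```python
from statistics import mean, pstdev

# AUROC values quoted in the Findings for the DNN.
local_auroc = {"OUH": 0.574, "PUH": 0.622}
federated_auroc = {"OUH": 0.872, "PUH": 0.876}

# Absolute gain per site, expressed in percentage points of AUROC.
gains = [(federated_auroc[site] - local_auroc[site]) * 100 for site in local_auroc]

mean_gain = mean(gains)
sd_gain = pstdev(gains)  # population SD over the two sites (assumed)

print(f"per-site gains: {[round(g, 1) for g in gains]}")
print(f"mean gain: {mean_gain:.1f} points, SD: {sd_gain:.2f}")
```

Under these assumptions the per-site gains are 29.8 and 25.4 points, giving a mean of 27.6 and an SD of 2.20. A relative improvement would give much larger values, roughly 52% and 41%.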
Journal article
The Lancet. Digital Health
Elsevier Ltd
31/10/2023