Analysis of big electronic health record data on dementia and cerebrovascular disease to investigate associations with brain atrophy
Supervisors: Sarah Parish, Matthew Arnold and William Whiteley
Studies to date have failed to identify pathways for prevention of dementia and there is an urgent need to bring larger scale evidence to bear. Electronic health records (EHR) provide a source of outcomes that avoids the problem of poor response rates to surveys among those with cognitive impairment. However, use of EHR requires careful validation [1]. Hospital Episode Statistics (HES) data are more widely available than primary care data, but there is likely to be a delay before disease manifests in hospitalisation. UK Biobank, a prospective population study of 0.5 million participants [2], is already linked to HES data and will release region-based primary care records for 40% of the population during 2018.
The aim of the project is to consolidate and compare HES and primary care data (including coded diagnoses and prescriptions) in UK Biobank for four outcomes affecting cognitive status: dementia, stroke, transient ischaemic attack and atrial fibrillation. It will assess concordance between the two sources, the delay in diagnosis between them and the factors influencing that delay. It will also investigate the potential bias caused by such delay and how to counteract it. The approach will then be applied to examine associations of key derived markers from MRI brain imaging (such as brain volumes and white matter damage) with the consolidated outcomes.
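As an illustration only, the comparison of the two sources could be sketched as follows. This is a minimal sketch under stated assumptions, not the project's method: the function name, the participant identifiers, the toy dates and the choice of a simple overlap proportion as the concordance measure are all hypothetical, and none of them correspond to actual UK Biobank fields or records.

```python
from datetime import date
from statistics import median


def compare_sources(primary_care, hes):
    """Compare first-diagnosis dates from two record sources.

    Both arguments map a (hypothetical) participant ID to the date of the
    first recorded diagnosis in that source. Participants without a
    diagnosis in a source are simply absent from that mapping.
    """
    ids_pc = set(primary_care)
    ids_hes = set(hes)
    both = ids_pc & ids_hes      # diagnosed in both sources
    either = ids_pc | ids_hes    # diagnosed in at least one source

    # Consolidated outcome: earliest date recorded in either source.
    consolidated = {
        pid: min(d for d in (primary_care.get(pid), hes.get(pid)) if d is not None)
        for pid in either
    }

    # Delay in days from primary care diagnosis to hospital diagnosis;
    # negative values mean the hospital record came first.
    delays = [(hes[pid] - primary_care[pid]).days for pid in both]

    return {
        "n_both": len(both),
        "n_primary_care_only": len(ids_pc - ids_hes),
        "n_hes_only": len(ids_hes - ids_pc),
        # Assumed concordance measure: share of cases found in both sources.
        "concordance": len(both) / len(either) if either else None,
        "median_delay_days": median(delays) if delays else None,
        "consolidated": consolidated,
    }


# Toy data, invented for illustration.
primary_care = {
    "p1": date(2012, 3, 1),
    "p2": date(2014, 6, 15),
    "p3": date(2015, 1, 10),
}
hes = {
    "p1": date(2013, 3, 1),
    "p3": date(2015, 1, 5),
    "p4": date(2016, 9, 30),
}

result = compare_sources(primary_care, hes)
print(result["concordance"], result["median_delay_days"])
```

In practice the delay would be modelled against participant characteristics rather than summarised by a single median, and a chance-corrected agreement statistic could replace the overlap proportion used here.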
The findings will be a valuable guide to analysis for a wide range of large-scale studies at NDPH and elsewhere in the UK that are linked to HES but not to primary care data. Automatically derived brain biomarkers that are pre-clinical indicators of disease may be useful surrogate endpoints for trials of dementia prevention because they could allow the benefits of interventions to be detected more quickly, making such trials more feasible.
1. Brown et al. Emerg Themes Epidemiol (2016) 13:11
2. UK Biobank http://www.ukbiobank.ac.uk/
Research experience, research methods and training
Learning from working within a multi-disciplinary team with experience of analysing large-scale data, including statisticians, statistical programmers, a neuroscientist and a neurologist. Developing planning and design skills for future research.
Field work, secondments, industry placements and training
Further training in statistical programming through a range of courses run by NDPH, Oxford University and the SAS Institute. Opportunities to present research findings at relevant meetings.
The project involves statistical analysis of big data to improve population health and therefore requires previous training or experience in data analysis or statistical programming (e.g. in SQL, R or SAS), together with an interest in developing these skills further. Examples of suitable prior qualifications are an MSc in Data Science or Medical/Applied Statistics.