Supervisors: Sarah Parish, Matthew Arnold and William Whiteley

Studies to date have failed to identify pathways for the prevention of dementia, and there is an urgent need to bring larger-scale evidence to bear. Electronic health records (EHR) provide a source of outcomes that avoids the problem of poor survey response rates among those with cognitive impairment. However, use of EHR requires careful validation [1]. Hospital episode statistics (HES) data are more widely available than primary care data, but a disease is likely to take longer to manifest in hospitalisation than in primary care. The UK Biobank prospective population study of 0.5M participants [2] is already linked to HES data and will release region-based primary care records for 40% of the population during 2018.

The aim of the project will be to consolidate and compare HES and primary care data (including coded diagnoses and prescriptions) in UK Biobank for four outcomes affecting cognitive status: dementia, stroke, transient ischaemic attack and atrial fibrillation. It will assess the concordance between the two sources, the delay in diagnosis between them and the factors influencing that delay. It will also investigate the potential bias caused by such delay and how to counteract it. The approach will then be applied to look at associations of key derived markers from MRI brain imaging (such as brain volumes and white matter damage) with the consolidated outcomes.
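As a rough illustration of the comparison described above, the sketch below computes agreement between the two sources and the delay between them for a single outcome. It is not the project's method: the participant records, the field names `hes` and `gp`, and the choice of Cohen's kappa and the median delay as summary measures are all assumptions made for the example.

```python
from datetime import date
from statistics import median

# Hypothetical first-diagnosis dates for one outcome, per participant.
# None means the outcome was never recorded in that source.
first_diagnosis = {
    "p1": {"hes": date(2015, 6, 1), "gp": date(2014, 12, 1)},
    "p2": {"hes": None, "gp": date(2016, 3, 15)},
    "p3": {"hes": date(2017, 1, 10), "gp": None},
    "p4": {"hes": None, "gp": None},
    "p5": {"hes": date(2016, 9, 1), "gp": date(2016, 8, 2)},
}


def cohen_kappa(pairs):
    """Cohen's kappa for two binary indicators given as (in_hes, in_gp) pairs."""
    n = len(pairs)
    # Observed agreement: both sources record the outcome, or neither does.
    p_obs = sum(1 for a, b in pairs if a == b) / n
    # Agreement expected by chance from each source's marginal rate.
    p_hes = sum(1 for a, _ in pairs if a) / n
    p_gp = sum(1 for _, b in pairs if b) / n
    p_exp = p_hes * p_gp + (1 - p_hes) * (1 - p_gp)
    if p_exp == 1:
        return 1.0  # both indicators are constant and identical
    return (p_obs - p_exp) / (1 - p_exp)


# Concordance: does each source record the outcome at all?
pairs = [
    (rec["hes"] is not None, rec["gp"] is not None)
    for rec in first_diagnosis.values()
]
kappa = cohen_kappa(pairs)

# Delay in days from the first primary care record to the first HES record,
# restricted to participants who appear in both sources.
delays = [
    (rec["hes"] - rec["gp"]).days
    for rec in first_diagnosis.values()
    if rec["hes"] is not None and rec["gp"] is not None
]
median_delay = median(delays)

print(f"kappa = {kappa:.3f}, median delay = {median_delay} days")
```

Restricting the delay to participants present in both sources is itself a simplification: those recorded in only one source are exactly the cases where delay-related bias would need separate handling.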

The findings will be a valuable guide to analysis for a wide range of large-scale studies at NDPH and elsewhere in the UK that are linked to HES but not to primary care data. Automated derived brain biomarkers that are pre-clinical indicators of disease may be useful surrogate endpoints for trials of dementia prevention because they could allow benefits of interventions to be detected more quickly, aiding feasibility of such trials.

1. Brown et al. Emerg Themes Epidemiol (2016) 13:11

2. UK Biobank

research experience, research methods and training

Learning from working within a multi-disciplinary team including statisticians, statistical programmers, a neuroscientist and a neurologist, all with experience of analysing large-scale data. Developing planning and design skills for future research.

field work, secondments, industry placements and training 

Further training in statistical programming through a range of courses run by NDPH, Oxford University and the SAS Institute. Opportunities to present research findings at relevant meetings.  

prospective candidate

The project involves statistical analysis of large-scale data to improve population health and therefore requires previous training or experience in data analysis or statistical programming (e.g. in SQL, R or SAS), together with an interest in developing these skills further. Examples of suitable prior qualifications are an MSc in Data Science or in Medical/Applied Statistics.


  • Sarah Parish
    Emeritus Professor of Medical Statistics and Epidemiology