Information on clinical diagnoses and outcomes derived from electronic health records (EHRs) is of increasing relevance for both clinicians and researchers. These records represent a rich source of clinical information, collected at minimal cost, in large numbers of people and with potential for linkage to other data sources. For example, large-scale cohort studies have traditionally relied on linkage to disease registries to identify outcomes, but there is now the opportunity to link such studies to EHRs. The wealth of clinical data in EHRs (such as outpatient letters, discharge summaries, laboratory results, and imaging) would greatly enhance the value of these studies for understanding the causes of vascular and other diseases. However, there are major methodological challenges in analysing EHR datasets. This project will use the Oxford Research Data Warehouse (a large dataset of electronic health records from secondary care in Oxfordshire) to develop and validate high-resolution phenotyping algorithms for selected vascular diseases, such as heart failure or atrial fibrillation. It will then apply these algorithms in analyses of a subset of participants from UK Biobank (a prospective study of 0.5 million participants) with linked EHRs. The specific DPhil project will be agreed through further discussion and will depend on the candidate's interests.


This project will involve detailed analysis and interpretation of existing data. The student will work within a multi-disciplinary team and will gain research experience in literature review, epidemiological and statistical methodology, programming and data analysis (including machine learning techniques). The student will be expected to attend regular research meetings and workshops and to present research findings at them.


The project will provide a range of training opportunities in statistical analysis, interpretation and programming. By the end of the DPhil, it is expected that you will be competent to plan, undertake and interpret statistical analysis of large-scale epidemiological data, and to report your findings. The project will be based at the Big Data Institute and the Clinical Trial Service Unit, Nuffield Department of Population Health, which have excellent facilities and a world-class community of statistical and clinical scientists. There will also be opportunities to participate in Health Data Research UK (HDR UK) research meetings and training activities.


Candidates should have a strong background in a mathematical or biomedical discipline and postgraduate training in computer science, statistics or epidemiology (or be willing to do the MSc in Global Health Science at Oxford in preparation for such a project). The project will involve large-scale data and statistical analyses, so candidates should have an interest in, and aptitude for, developing skills in these areas.