Very large datasets of anonymised electronic health records are increasingly used to construct observational studies with many millions of patients. Their size, and the detailed information they contain on clinical events, mean that such studies have great potential for understanding the causes of vascular and other chronic diseases. However, analysing these datasets poses major methodological challenges. The first is the sheer size of the available data, with large amounts of information collected on each patient over many years. The second is that the data were not collected for research purposes. Measurements are often taken only when patients visit their doctor, and such visits are frequently prompted by the patient's health status at the time. The third is how to deal with missing data in this context. This project will use the Clinical Practice Research Datalink (CPRD; an anonymised database of the primary care records of five million current UK patients) and the Oxford Research Data Warehouse (a large dataset of electronic health records from primary and secondary care in Oxfordshire) to investigate the major determinants of vascular disease. For comparison, a set of complementary analyses will be conducted using UK Biobank. The precise focus of the DPhil project will be agreed through further discussion and will reflect the student's interests.


This project will involve detailed analysis and interpretation of existing data. The student will work within a multidisciplinary team and will gain research experience in literature review, epidemiological and statistical methodology, programming and data analysis (including machine learning). Regular research meetings and workshops will be held, and the candidate will be expected to attend them and to present research findings.


The project will provide a range of training opportunities in statistical analysis, interpretation and programming. By the end of the DPhil, you are expected to be competent to plan, undertake and interpret statistical analyses of large-scale epidemiological data, and to report your findings. The project will be based at the Big Data Institute and the Clinical Trial Service Unit, Nuffield Department of Population Health, which has excellent facilities and a world-class community of statistical and clinical scientists. There will also be opportunities to participate in Health Data Research UK (HDR UK) research meetings and training activities.


Candidates should have a strong background in a mathematical or biomedical discipline and postgraduate training in epidemiology, statistics or public health (or be willing to undertake the MSc in Global Health Science at Oxford in preparation for such a project). Because the project involves large-scale data and statistical analyses, candidates should have an interest in, and aptitude for, extending these skills, as well as a strong interest in non-communicable disease epidemiology.