Analysing big data from electronic health records to understand the determinants of cardiovascular disease
NDPH/19/48
BACKGROUND
Very large datasets of anonymised electronic health records are increasingly used to construct observational studies of many millions of patients. The size of these datasets and their detailed information on clinical events mean that such studies hold great potential for understanding the causes of vascular and other chronic diseases. However, analysing these datasets poses major methodological challenges. One key issue is the sheer volume of data, with large amounts collected on each patient over many years. A second issue is that the data were not collected for research: measurements are often taken only when patients visit their doctor, and such visits are often prompted by the patient's health status at the time. A third issue is how to deal with missing data in this context. This project will use the Clinical Practice Research Datalink (CPRD; an anonymised database of the primary care records of five million current UK patients) and the Oxford Research Data Warehouse (a large dataset of electronic health records from primary and secondary care in Oxfordshire) to investigate the major determinants of vascular disease. For comparison, a set of complementary analyses will be conducted using UK Biobank. The specific DPhil project will depend on further discussion and on the student's own interests.
RESEARCH EXPERIENCE, RESEARCH METHODS AND TRAINING
This project will involve detailed analysis and interpretation of existing data. The student will work within a multi-disciplinary team and will gain research experience in literature review, epidemiological and statistical methodology, programming and data analysis (including machine learning). Regular research meetings and workshops will be held, which the student will be expected to attend and at which they will present research findings.
FIELD WORK, SECONDMENTS, INDUSTRY PLACEMENTS AND TRAINING
The project will provide a range of training opportunities in statistical analysis, interpretation and programming. By the end of the DPhil, the student is expected to be competent to plan, undertake and interpret statistical analyses of large-scale epidemiological data, and to report the findings. The project will be based at the Big Data Institute and the Clinical Trial Service Unit, Nuffield Department of Population Health, which has excellent facilities and a world-class community of statistical and clinical scientists. There will also be opportunities to participate in Health Data Research UK (HDR UK) research meetings and training activities.
PROSPECTIVE CANDIDATE
Candidates should have a strong background in a mathematical or biomedical discipline and postgraduate training in epidemiology, statistics or public health (or be willing to do the MSc in Global Health Science at Oxford in preparation for such a project). The project will involve large-scale data and statistical analyses. Candidates should therefore have an interest in, and aptitude for, developing skills in these areas, as well as a strong interest in non-communicable disease epidemiology.