There is a growing body of evidence to suggest that complex diseases, such as heart attacks, asthma and chronic obstructive pulmonary disease (COPD), are composed of distinct subtypes with different risk factor and prognostic profiles. Artificial intelligence (AI) holds particular promise in identifying, describing and evaluating such novel disease subtypes. Identifying subtypes will improve our understanding of the causes of disease and enhance personalised treatments. This doctoral project will use large linked datasets of electronic health records (such as the Oxford Research Data Warehouse), together with AI methods, to identify novel subtypes of common diseases. A wide range of approaches will be used, including both unsupervised and supervised learning. We will explore recent machine learning methods, such as deep learning and auxiliary learning. The findings will then be applied to linked electronic health records in UK Biobank, a prospective study of 0.5 million participants, to increase the understanding of the causes of these diseases. There is scope to tailor the project to the student’s interests and background, including engagement in international collaborations.


The student will work within the rich academic environment of the Nuffield Department of Population Health and affiliated institutions, gaining research experience and skills training in epidemiology and statistics. The successful candidate will have access to several large datasets, including the Oxford Research Data Warehouse (a large dataset of electronic health records from secondary care in Oxfordshire) and UK Biobank, a prospective cohort study of 0.5 million middle-aged and older adults. The student will be supported through regular research meetings and will have the opportunity to participate in training and seminars offered by the unit.


By the end of the DPhil, it is expected that the candidate will be able to plan, undertake and interpret statistical analyses of large-scale epidemiological data, and to report their findings. The candidate will have acquired transferable skills, including drafting project proposals and presenting research findings at national and international meetings. The candidate will be encouraged to publish peer-reviewed papers as lead author.

Prospective student

Candidates should have an MSc degree in statistics/epidemiology/machine learning, or equivalent mathematical background.