Large number of explanatory variables for understanding and predicting risk of disease
- 8 September 2025 to 2 December 2025
- Project No: D26053
- DPhil Project 2026
- Clinical Trial Service Unit and Epidemiological Studies Unit (CTSU)
Background
In many applications in epidemiology, genetics, or population health, it is of interest to understand the relationships among large numbers of variables, which may have complex patterns of correlation or relationships informed by prior biological knowledge. The aim of this project is to explore methods for selecting and modelling variables in such settings to explain and predict the risk of cancer and other diseases, motivated by applications in large cohort studies such as the UK Biobank. Specific objectives may include:
- To implement and develop novel statistical methods for identifying sets of variables that explain or predict the risk of disease or other traits, such as methods that aim to identify a confidence set of models.
- To compare different approaches that may be appropriate for selecting and modelling variables, in particular in the presence of highly correlated variables and/or heterogeneity among individuals.
- To explore approaches for incorporating prior biological knowledge from databases containing information on known biological processes and pathways.
- To investigate statistical methods that may be used in the analysis of data with rare outcomes.
RESEARCH EXPERIENCE, RESEARCH METHODS AND SKILLS TRAINING
The project will involve work in statistical methods, data analysis and literature review. There will be opportunities to receive training to develop the skills required.
FIELD WORK, SECONDMENTS, INDUSTRY PLACEMENTS AND TRAINING
There are various opportunities for training within the department and externally. There will be opportunities to work with diverse teams in the department with a range of backgrounds and skills, and to take part in regular research activities such as seminars.
PROSPECTIVE STUDENT
The ideal candidate will have a Bachelor’s/Master’s degree in statistics, mathematics or a related area, and an interest in epidemiology, population health, or genetics.

