
External supervisor

Ruth Keogh, London School of Hygiene and Tropical Medicine


Risk prediction models estimate the risk of an outcome, such as a disease diagnosis, for individuals based on their characteristics. They are widely used to inform people of their risk of a condition and to identify those at high risk. A key aspect of developing a prediction model is evaluating its predictive performance.

Prediction models are often developed using data from large cohorts or routinely collected health data. However, such data can pose computational difficulties, and some important predictors may not be available for the entire cohort. This project focuses on the use of case-cohort studies for the development of prediction models. Case-cohort (or case-subcohort) studies are nested within prospective cohort studies: they use data on all individuals who experience the event of interest (the cases), together with a randomly selected subcohort of the full cohort, so that only a subset of the non-cases is included. This makes it possible to estimate associations between exposures and outcomes efficiently while measuring the variables of interest in only a subset of participants. Such designs are particularly useful when some predictors are too expensive to measure in all individuals, for example biomarkers measured in stored samples. Several extensions of this design exist, such as stratified designs, which may be used to improve efficiency or address confounding. Associations are estimated using the survival analysis methods used in prospective cohort studies, such as Cox regression, modified to take the sampling scheme into account. Models developed in this way within case-subcohort studies can then be used to predict the risk of a particular disease.
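The sampling scheme and the idea of modifying Cox regression to account for it can be illustrated with a short simulation. The Python sketch below is hypothetical throughout: the function names, the simulation settings (one normally distributed covariate, exponential event times, a sampling fraction of 0.1) and the weighting are assumptions for illustration, not the method this project will necessarily use. It applies one simple inverse-probability weighting scheme, in which cases receive weight 1 and subcohort non-cases receive weight 1/α, and maximises the weighted partial likelihood by Newton-Raphson. Other schemes, such as those of Prentice or Barlow, weight the risk sets differently; tie handling and robust standard errors are not shown.

```python
import numpy as np


def simulate_cohort(n, beta, base_rate, follow_up, rng):
    """Hypothetical full cohort: one covariate, exponential event times.

    Hazard = base_rate * exp(beta * x); everyone is censored at `follow_up`.
    """
    x = rng.normal(size=n)
    t_event = rng.exponential(scale=1.0 / (base_rate * np.exp(beta * x)))
    time = np.minimum(t_event, follow_up)
    event = t_event <= follow_up
    return x, time, event


def case_cohort_sample(event, alpha, rng):
    """Draw a random subcohort (sampling fraction alpha) and add all cases.

    Returns a selection mask and assumed inverse-probability weights:
    cases are always sampled (weight 1); non-cases enter only through the
    subcohort, with probability alpha (weight 1 / alpha).
    """
    in_subcohort = rng.random(event.size) < alpha
    selected = in_subcohort | event
    weights = np.where(event, 1.0, 1.0 / alpha)
    return selected, weights


def weighted_cox_beta(x, time, event, w, n_iter=50, tol=1e-10):
    """Newton-Raphson for a weighted Cox partial likelihood, one covariate.

    Assumes no tied event times. Weighting the risk-set sums by w lets the
    subcohort stand in for all non-cases of the full cohort.
    """
    order = np.argsort(-time)  # latest times first, so cumsum gives risk-set sums
    x, time, event, w = x[order], time[order], event[order], w[order]
    beta = 0.0
    for _ in range(n_iter):
        r = w * np.exp(beta * x)
        s0 = np.cumsum(r)          # sum of w_j exp(beta x_j) over j with t_j >= t_i
        s1 = np.cumsum(r * x)
        s2 = np.cumsum(r * x * x)
        xbar = s1 / s0
        score = np.sum(event * w * (x - xbar))
        info = np.sum(event * w * (s2 / s0 - xbar ** 2))
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta


rng = np.random.default_rng(2024)
x, time, event = simulate_cohort(20_000, beta=0.7, base_rate=0.01, follow_up=10.0, rng=rng)
selected, weights = case_cohort_sample(event, alpha=0.1, rng=rng)

# Full-cohort fit (all weights 1) versus the weighted case-cohort fit.
beta_full = weighted_cox_beta(x, time, event, np.ones_like(x))
beta_cc = weighted_cox_beta(x[selected], time[selected], event[selected], weights[selected])

print(f"cases: {event.sum()}, case-cohort sample: {selected.sum()} of {x.size}")
print(f"full-cohort beta: {beta_full:.3f}, case-cohort beta: {beta_cc:.3f}")
```

With these assumed settings the case-cohort estimate should be close to both the full-cohort estimate and the simulated value of 0.7, even though covariates are needed for only roughly a fifth of the cohort.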


The project will be developed according to the student’s interests and may include:


  • Application of methods for risk prediction to data from case-cohort studies within large epidemiological cohort studies
  • Review and comparison of methods for selecting variables for risk prediction in case-cohort studies, including through simulation studies
  • Review and development of methods for assessing the predictive ability of models in case-cohort studies
  • Optimal design of case-cohort studies for risk prediction
  • Extensions of existing methods to more complex designs, such as case-cohort studies with stratified subcohorts, or nested case-control studies with various sampling schemes
  • Estimation and prediction taking into account competing risks in case-cohort studies


The project will provide the student with experience in the development of statistical methodology motivated by applications in epidemiology, statistical analysis of large datasets, and study design. Training in statistics, epidemiology and research methods will be available. It is anticipated that the project will lead to publications.


The student will be based at the Big Data Institute Building. There are excellent facilities and a world-class community of population health, statistics, epidemiology, and genomic medicine researchers.


The ideal candidate will have a degree in statistics or a related discipline.