  • 8 September 2025 to 2 December 2025
  • Project No: D26048
  • DPhil Project 2026
  • China Kadoorie Biobank (CKB) Clinical Trial Service Unit and Epidemiological Studies Unit (CTSU)

Background

Risk prediction models estimate an individual's risk of an outcome, such as a disease diagnosis, from their characteristics. They are widely used to inform people of their risk of a condition and to identify people at high risk. Advances in high-throughput plasma proteomics assays now enable assessment of thousands of proteins in plasma, and several recent studies have demonstrated that plasma proteomics can improve risk prediction for many diseases.

Despite the use of standard protocols for sample processing and data normalisation, there remains nuisance variability in protein measurements. This includes biological fluctuations, short-term environmental effects and technical variability. Thus, a single assay measurement of protein levels for an individual may not capture true or usual blood or tissue levels of proteins. Such measurement error can negatively affect the performance of risk prediction models. Further complexities in proteomics data, such as missing data, limits of detection and quality control flags, must also be addressed when using such data for prediction. 
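
To illustrate how a single noisy assay can weaken a protein's apparent association with an outcome, here is a minimal simulation sketch. It is not part of the project materials and rests on simplifying assumptions: a continuous outcome rather than a disease diagnosis, one standardised protein, and independent Gaussian measurement error. The function names and parameter values are hypothetical.

```python
import random
import statistics


def ols_slope(x, y):
    """Least-squares slope of y regressed on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulated_slope(n=20000, noise_sd=1.0, n_repeats=1, seed=0):
    """Slope of the outcome on the measured protein level.

    Assumptions (illustrative only): the true level is N(0, 1), the outcome
    is the true level plus N(0, 1) noise (so the true slope is 1.0), and each
    assay adds independent N(0, noise_sd^2) error. The measured level is the
    mean of n_repeats assays per individual.
    """
    rng = random.Random(seed)
    true_levels = [rng.gauss(0.0, 1.0) for _ in range(n)]
    outcome = [x + rng.gauss(0.0, 1.0) for x in true_levels]
    measured = [
        statistics.fmean([x + rng.gauss(0.0, noise_sd) for _ in range(n_repeats)])
        for x in true_levels
    ]
    return ols_slope(measured, outcome)


if __name__ == "__main__":
    print("single assay:     ", round(simulated_slope(n_repeats=1), 3))
    print("mean of 4 assays: ", round(simulated_slope(n_repeats=4), 3))
```

Under these assumptions the fitted slope is expected to shrink from the true value of 1.0 towards 1 / (1 + noise_sd^2 / n_repeats), which is roughly 0.5 for a single assay and 0.8 for the mean of four. This is the classical attenuation effect; real proteomics data may depart from it where errors are correlated, non-Gaussian, or affected by limits of detection.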

The aims of this project will be developed according to the student’s interests and may include:

  1. Investigating the impact of measurement error on the performance of existing proteomics-based disease risk prediction models.
  2. Assessing existing and developing novel methods to account for measurement error in proteomics.
  3. Assessing how single or repeated assays may be used to account for measurement error and variability.
  4. Investigating how other biological information or genetic data can supplement proteomics for risk prediction.
  5. Validating and developing methods for handling missing, imprecise, or low-quality protein measurements in prediction.

RESEARCH EXPERIENCE, RESEARCH METHODS AND SKILLS TRAINING

The research will include both statistical and machine learning based approaches to prediction. The work may also extend to other omics data, including metabolomics, and other prediction problems such as ageing clocks. 

FIELD WORK, SECONDMENTS, INDUSTRY PLACEMENTS AND TRAINING

The student will be based at the Big Data Institute Building, which offers excellent facilities and a world-class community of researchers in population health, data science, epidemiology, and genomic medicine. There will be in-house training in epidemiology, statistics, and genetics.

PROSPECTIVE STUDENT

The ideal candidate will have a Master’s degree in a relevant area (e.g. statistics, machine learning, or epidemiology), or a bachelor’s degree and some work experience.