  • 8 September 2025 to 2 December 2025
  • Project No: D26060
  • DPhil Project 2026

Background

Large-scale cohort studies such as UK Biobank (~500,000 participants) rely heavily on linked health records to define health outcomes at scale. These records provide unparalleled opportunities for research but were collected primarily for administrative or clinical purposes and are subject to limitations such as missingness, coding errors, misclassification, and incomplete capture across care settings. Accurate phenotyping of common health outcomes is therefore essential for reliable epidemiological analyses and for maximising the scientific and public health value of UK Biobank. 

Linked data sources (including primary care, hospital admissions, cancer and death registrations, and prescription/dispensing data) offer the opportunity to compare and validate outcome definitions across multiple record types. Additional layers of data (e.g. genetic associations, biomarker profiles, and self-reported information) provide powerful tools for triangulating outcome validity, helping to identify the most robust approaches. Improved outcome validation will strengthen analyses of disease determinants, medication adherence, multimorbidity, and health inequalities, and will help ensure findings can be translated into public health and clinical practice.

RESEARCH EXPERIENCE, RESEARCH METHODS AND SKILLS TRAINING

The student will:

  1. Systematically evaluate approaches to outcome validation for common conditions (e.g. cardiovascular disease, diabetes, depression, migraine).
  2. Compare definitions and case ascertainment across different linked health data sources.
  3. Use genetic, biomarker, and questionnaire data as external anchors to assess the accuracy and robustness of case definitions.
  4. Investigate the potential impact of misclassification and incomplete outcome capture on epidemiological analyses, and consider how these might be addressed.
  5. Develop reproducible algorithms and phenotyping resources to define validated health outcomes in UK Biobank, to be shared with the wider research community.

(Optional, tailored to the student’s interest) Explore how improved outcome validation alters estimates of associations with key risk factors, medication adherence, or health inequalities.
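
As a rough illustration of what comparing case ascertainment between two linked sources might involve, the sketch below computes sensitivity, positive predictive value, and Cohen's kappa for one source against another. It is written in Python purely for brevity; the function name `compare_sources`, the `Agreement` container, and the participant IDs are hypothetical and are not part of UK Biobank or of the project itself. Treating one source as the reference is itself an assumption, since no single linked record type is a gold standard.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Agreement:
    sensitivity: float  # share of reference cases also found by the candidate
    ppv: float          # share of candidate cases confirmed by the reference
    kappa: float        # Cohen's kappa (chance-corrected agreement)


def compare_sources(reference: set[int], candidate: set[int],
                    cohort: set[int]) -> Agreement:
    """Compare a candidate case definition with a reference definition.

    Hypothetical helper: both case sets are assumed to be non-empty
    subsets of `cohort`, with at least one non-case in the cohort.
    """
    n = len(cohort)
    tp = len(reference & candidate)   # cases in both sources
    fp = len(candidate - reference)   # cases only in the candidate source
    fn = len(reference - candidate)   # cases only in the reference source
    tn = n - tp - fp - fn             # cases in neither source

    sensitivity = tp / (tp + fn)
    ppv = tp / (tp + fp)

    # Cohen's kappa: observed agreement corrected for chance agreement.
    observed = (tp + tn) / n
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2
    kappa = (observed - expected) / (1 - expected)
    return Agreement(sensitivity, ppv, kappa)


# Made-up participant IDs, for illustration only.
cohort = set(range(1, 11))
hospital_cases = {1, 2, 3, 4}      # e.g. cases from hospital admissions
primary_care_cases = {3, 4, 5}     # e.g. cases from primary care records

result = compare_sources(hospital_cases, primary_care_cases, cohort)
print(result)
```

In practice the same comparison would be repeated with each source in turn as the reference, and the external anchors named in objective 3 (genetic, biomarker, and questionnaire data) would help judge which definition is more credible.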

FIELD WORK, SECONDMENTS, INDUSTRY PLACEMENTS AND TRAINING

By the end of the DPhil, it is expected that the candidate will be able to plan, undertake, and interpret statistical analyses of large-scale epidemiological data, and to report their findings. The candidate will have acquired transferable skills, including drafting project proposals and presenting research findings at national and international meetings. The candidate will be encouraged to publish peer-reviewed papers as lead author.

PROSPECTIVE STUDENT

The ideal candidate will have a Master’s degree in epidemiology, statistics, or data science. They will be expected to have knowledge and experience of epidemiological study design and related concepts, and to be adept in statistical analysis. The student will analyse the data on UK Biobank’s Research Analysis Platform, using R (via RStudio) as the main statistical software.