Advancing causal understanding of cancer through integrative proteomic and genetic analysis
- 8 September 2025 to 31 July 2026
- Project No: D26072
- DPhil Project 2026
- Cancer Epidemiology Unit (CEU)
BACKGROUND
Understanding how biological pathways contribute to cancer development is a major goal in population health research. Advances in high-throughput proteomics now enable measurement of thousands of circulating proteins that reflect subtle biological processes and may reveal both early markers of disease and potential aetiological targets for future prevention trials. Because proteomic associations alone cannot always distinguish correlation from causation, integrating complementary genetic data can strengthen causal inference and help identify proteins with more robust evidence for an aetiological role in cancer development.
This project will focus on integrating complex proteomic and proteogenomic data from large population-based studies, including the European Prospective Investigation into Cancer and Nutrition (EPIC) and the UK Biobank, to identify key proteins and biological pathways involved in the development of common cancers.
The specific aims of the project can be adapted to the student’s interests and may include:
- Develop approaches to harmonise proteomic data across large cohorts to improve comparability and data quality.
- Develop integrative methods that combine proteomic and proteogenomic findings with biological knowledge from publicly available resources, to improve understanding of cancer aetiology and prioritise candidate protein targets.
- Generate robust, interpretable networks of proteins and pathways involved in cancer development by combining data from multiple cohorts.
RESEARCH EXPERIENCE, RESEARCH METHODS AND SKILLS TRAINING
The student will gain experience in analysing large, high-dimensional molecular and epidemiological datasets, developing practical skills in data harmonisation, advanced statistical modelling, and causal inference. The work will involve computational and quantitative approaches to integrate rich, multivariate data across studies and to assess associations between proteins, genetic variants, and cancer risk.
Training will be provided in molecular epidemiology, advanced statistical programming (e.g. R and Python), and modern causal inference methods. The project offers opportunities to contribute to international collaborations and to develop transferable skills in reproducible research, data visualisation, and scientific communication.
FIELD WORK, SECONDMENTS, INDUSTRY PLACEMENTS AND TRAINING
The student will be based at the Richard Doll Building, within a vibrant research environment that includes experts in epidemiology, statistics, bioinformatics, and molecular epidemiology. They will attend departmental seminars and courses, collaborate with multidisciplinary teams, and engage with partners involved in major cohort studies such as EPIC and UK Biobank.
PROSPECTIVE STUDENT
The ideal candidate will have an MSc in a quantitative discipline such as statistics, bioinformatics, or data science, with strong coding skills and an interest in applying them to population health and cancer research. Some background in molecular or genetic epidemiology would be an advantage.
