Integrating whole‑genome sequencing, proteogenomics, and phenome‑wide analyses to discover, validate, and repurpose drug targets
- 8 September 2025 to 2 December 2025
- Project No: D26040
- DPhil Project 2026
- China Kadoorie Biobank (CKB) Clinical Trial Service Unit and Epidemiological Studies Unit (CTSU)
BACKGROUND
Functional genetic variants modify the expression and/or activity of proteins that may represent potential drug targets. These natural experiments in human populations can improve the drug development process, for example by helping to prioritise targets based on predicted efficacy, assess safety, identify alternative indications and inform clinical trial design. Large prospective biobank studies, such as China Kadoorie Biobank (CKB) and UK Biobank (UKB), are uniquely positioned to fulfil these goals.
In CKB and UKB, linkage to electronic health records captures deaths and hospitalisation episodes for thousands of different diseases. Genome-wide genotype data are currently available for 100,000 CKB participants and all UKB participants. Whole-genome sequence data are available for all participants in both CKB and UKB. Proteomic data are currently available for ~10,000 proteins in 4,000 CKB participants and for ~3,000 proteins in 54,000 UKB participants, with both datasets expected to expand considerably over the next 2–3 years. These are complemented by other blood biomarkers (e.g. clinical biochemistry, metabolomics, gut microbiome, serology) in all or subsets of participants. Previous research highlights the benefits of assessing drug targets using genetic data from diverse populations.
RESEARCH EXPERIENCE, RESEARCH METHODS AND SKILLS TRAINING
The DPhil project will assess the biological pathways and clinical outcomes associated with genetic variation in potential therapeutic targets, and will identify novel protein targets for certain diseases. The specific area of research will be developed according to the student’s interests and aptitude, and may include the following key objectives:
- Identifying potential novel therapeutic targets by analysing whole‑genome sequencing data from CKB, UKB and other cohorts, to associate common and aggregated rare variants with specific diseases;
- Prioritising protein targets through integrated proteogenomic analyses, combining protein quantitative trait locus (pQTL) mapping, Mendelian randomisation and colocalisation analyses to identify genetically supported drug targets for specific diseases;
- Using a phenome-wide association study (PheWAS) approach to assess the efficacy, safety, and alternative indications (i.e. repurposing) of established and emerging drug targets at different stages of clinical development, using relevant functional genetic variants and pQTLs;
- Exploring the biological relevance of potential drug targets using multi-omics biomarker datasets, and using machine learning tools to assess target druggability.
The student will work within a multi-disciplinary team. Training will be provided in genetics, epidemiology and statistical analysis, with attendance at relevant courses as required. By the end of the DPhil, the student will be able to plan, undertake and interpret analyses of large-scale genetic, proteomic, and epidemiological data, and report research findings, including through conference presentations and publications as the lead author in peer-reviewed journals.
FIELD WORK, SECONDMENTS, INDUSTRY PLACEMENTS AND TRAINING
The project will be based within the CKB research group, which is part of the Nuffield Department of Population Health (NDPH) and located in the Big Data Institute. There are excellent facilities and a world-class community of population health, data science and genomic medicine researchers. There may be opportunities to work with external partners from industry and other research institutions.
PROSPECTIVE STUDENT
The ideal candidate will have a good first degree (2.1 or higher) and an MSc or equivalent experience in a relevant subject, with a strong interest in epidemiology, genetics, or statistics. The project will involve large-scale data analyses and requires previous statistical and programming experience.
