Genomics investigation of the causes of disease in diverse populations
- 8 September 2025 to 2 December 2025
- Project No: D26041
- DPhil Project 2026
- China Kadoorie Biobank (CKB) Clinical Trial Service Unit and Epidemiological Studies Unit (CTSU)
BACKGROUND
Genetic analyses provide powerful tools for understanding the causes of disease. Genetic differences between individuals that are associated with disease risk highlight biological processes involved in disease; enable identification of individuals at high risk of disease; and potentially contribute to the development of new drugs for disease prevention or treatment. The power of these investigations can be enhanced by combining data from many different populations, and by identifying similarities and differences between them. Large prospective biobank studies, such as China Kadoorie Biobank (CKB) and UK Biobank (UKB), are well positioned to fulfil these goals (http://www.ckbiobank.org/achievements/genetic-collaborations).
In CKB and UKB, linkage to electronic health records captures deaths and hospitalisation episodes for thousands of different diseases. Genome-wide genotype data are currently available for 100,000 CKB participants and all UKB participants. Whole-genome sequence data are available for all participants in both CKB and UKB. Many other data are also available, including proteomic data for ~10,000 proteins in 4,000 CKB participants and for ~3,000 proteins in 54,000 UKB participants (with both datasets expected to expand considerably over the next 2–3 years) and blood biomarkers (e.g., clinical biochemistry, metabolomics, gut microbiome, serology) in all or subsets of participants. Together, these provide unprecedented opportunities for comparing the genetic architecture of disease in Europeans and East Asians, and for using genetics to investigate the contribution of genetic and other risk factors to a wide range of diseases and traits.
RESEARCH EXPERIENCE, RESEARCH METHODS AND SKILLS TRAINING
A wide range of genetics and genomics projects are available. The specific areas of research will be developed in discussion with the student according to their interests and aptitudes, but will potentially include several of the following:
- Genome-wide association analysis of relevant diseases or traits
- Trans-ancestry meta-analysis
- Impact of genomic structural variants on disease and risk factors
- Construction and application of genetic and/or polygenic scores
- Formulation and coding/programming of novel analytical approaches
- Mendelian randomisation, including traditional and genetic epidemiology
Examples of possible project areas include:
- Stroke and stroke subtypes
- Spondylosis (arthritis of the spine)
- Respiratory disease
- Infection and immunity
- Population genetics and natural selection
The student will work within a multi-disciplinary team. There will be training in statistical and computational genetics, epidemiology, and statistical analysis, with attendance at relevant courses as required. By the end of the DPhil, the student will be able to plan, undertake and interpret analyses of large-scale genetic data, and to report research findings through conference presentations and lead-author publications in peer-reviewed journals.
FIELD WORK, SECONDMENTS, INDUSTRY PLACEMENTS AND TRAINING
The project will be based within the CKB research group, which is part of NDPH and located in the Big Data Institute. There are excellent facilities and a world-class community of population health, data science and genomic medicine researchers. There may be opportunities to work with external partners from industry and other research institutions.
PROSPECTIVE STUDENT
The ideal candidate will have a good first degree (2.1 or higher) and an MSc or equivalent experience in a relevant subject, with a strong interest in one or more of genetics, statistics, computational biology, or epidemiology. The project will involve large-scale data and statistical analyses, so it requires some aptitude for data handling and programming.
