Understanding the link between genetic burden, ancestry and health outcomes in diverse populations
- 8 September 2025 to 2 December 2025
- Project No: D26055
- DPhil Project 2026
- China Kadoorie Biobank (CKB), Mexico City Prospective Study (MCPS)
BACKGROUND
Loss-of-function (LoF) mutations disrupt gene function and can have serious health consequences. Tolerance to LoF variants varies substantially across genes and can be quantified using metrics such as Shet, a score that estimates the strength of selection against heterozygous LoF variants in each gene. Summing the Shet scores of the genes in which an individual carries LoF variants yields that individual's genetic burden, which has previously been linked to reduced male reproductive success, increased risk of neuropsychiatric disorders, and lower educational attainment. However, these findings have been obtained almost exclusively from European cohorts. To extend previous research, this project will leverage two large, diverse studies, the Mexico City Prospective Study (MCPS, n = 144 000 sequenced participants) and the China Kadoorie Biobank (CKB, n = 510 000), to examine how genetic burden affects health-related traits across ancestries.
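As a rough illustration of the burden score described above, the Python sketch below sums Shet over the distinct genes in which each individual carries at least one LoF variant. It assumes that LoF variants have already been called and mapped to genes, and that per-gene Shet estimates are available as a dictionary. The function name, gene names and Shet values are hypothetical toy inputs, and this simple additive definition is one common convention rather than necessarily the exact score the project will use.

```python
from typing import Dict, List


def genetic_burden(
    lof_genes_by_individual: Dict[str, List[str]],
    shet: Dict[str, float],
) -> Dict[str, float]:
    """Return each individual's burden: the sum of Shet over the distinct
    genes in which they carry at least one LoF variant.

    Genes are deduplicated, so two LoF variants in the same gene count once.
    Genes without a Shet estimate are skipped.
    """
    return {
        person: sum((shet[g] for g in set(genes) if g in shet), 0.0)
        for person, genes in lof_genes_by_individual.items()
    }


if __name__ == "__main__":
    # Hypothetical toy inputs; real analyses would use published per-gene estimates.
    shet = {"GENE_A": 0.30, "GENE_B": 0.05, "GENE_C": 0.01}
    lof = {"p1": ["GENE_A", "GENE_C"], "p2": ["GENE_B", "GENE_B"], "p3": []}
    print(genetic_burden(lof, shet))
```

Deduplicating genes means that a second LoF variant in an already-hit gene adds nothing, which reflects the gene-level interpretation of Shet.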
This project could include any or all of the following aims:
- Identify key selective pressures influencing genetic constraint: Investigate the primary factors driving variability in gene tolerance (Shet scores) across the genome, focusing on factors such as reproductive success, cognitive abilities, mental health, and mortality before reproductive age. This extends previous findings predominantly derived from European cohorts (e.g., Gardner et al., 2022) and aims to understand evolutionary influences across diverse populations.
- Evaluate associations between genetic burden and diverse health outcomes: Conduct a comprehensive assessment of the association between individual genetic burden (aggregated Shet scores) and a wide range of health-related outcomes, including metabolic traits, mental health, anthropometric measures, and reproductive success. Analyses will leverage the substantial number of sibling pairs in the CKB and MCPS cohorts to perform within-family analyses that control for shared environmental and socioeconomic confounders.
- Examine ancestry-specific differences in genetic burden: Determine whether genetic burden systematically differs across segments of Indigenous American, East Asian, and European ancestry within the MCPS and CKB cohorts, and assess the degree to which these differences explain variation in health-related outcomes.
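The within-family design in the second aim can be sketched as a sibling-difference regression: for each sibling pair, the difference in outcome is regressed on the difference in genetic burden, so that factors the siblings share, such as household environment and socioeconomic status, cancel out. The minimal Python version below assumes one pair per family, a continuous outcome and no covariates. The function name and input layout are hypothetical, and a real analysis would also adjust for covariates such as age and sex and handle families with more than two siblings.

```python
import math
from typing import List, Tuple

# One sibling pair: ((burden_sib1, outcome_sib1), (burden_sib2, outcome_sib2)).
SiblingPair = Tuple[Tuple[float, float], Tuple[float, float]]


def sibling_difference_slope(pairs: List[SiblingPair]) -> Tuple[float, float]:
    """Regress within-pair outcome differences on within-pair burden
    differences through the origin; return (slope, standard error).

    Family-level factors shared by both siblings cancel in the differences,
    so the slope reflects within-family variation in burden only. Pairs with
    identical burden contribute nothing to the slope.
    """
    if len(pairs) < 2:
        raise ValueError("need at least two sibling pairs")
    dx = [s1[0] - s2[0] for s1, s2 in pairs]
    dy = [s1[1] - s2[1] for s1, s2 in pairs]
    sxx = sum(x * x for x in dx)
    if sxx == 0:
        raise ValueError("no within-pair variation in burden")
    slope = sum(x * y for x, y in zip(dx, dy)) / sxx
    # Residual variance with n - 1 degrees of freedom (one slope, no intercept).
    resid = [y - slope * x for x, y in zip(dx, dy)]
    sigma2 = sum(r * r for r in resid) / (len(pairs) - 1)
    return slope, math.sqrt(sigma2 / sxx)


if __name__ == "__main__":
    # Hypothetical toy data: outcome = 2 * burden + a family-specific offset.
    pairs = [((0.0, 10.0), (1.0, 12.0)), ((3.0, 1.0), (1.0, -3.0))]
    print(sibling_difference_slope(pairs))  # prints (2.0, 0.0)
```

Because the regression has no intercept, the arbitrary ordering of siblings within a pair does not affect the slope: swapping them flips the sign of both differences.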
RESEARCH EXPERIENCE, RESEARCH METHODS AND SKILLS TRAINING
The student will gain hands-on training in statistical and genetic methods, including local-ancestry inference (e.g. RFMix), phenotype curation, and burden-phenotype association testing (including family-based designs).
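One way to connect local-ancestry inference to the third aim is to split each person's burden by the ancestry of the haplotype segment on which each LoF allele sits. The sketch below assumes local-ancestry calls have already been converted into simple per-haplotype segment lists; it does not parse RFMix output, and the segment format, ancestry labels (AMR, EAS, EUR) and function names are hypothetical. Counting each gene at most once per ancestry is likewise only one possible convention.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical formats: a local-ancestry segment is (chrom, start, end, ancestry)
# covering the half-open interval [start, end); an LoF allele is
# (haplotype_index, chrom, position, gene) for a single individual.
Segment = Tuple[str, int, int, str]
LofAllele = Tuple[int, str, int, str]


def ancestry_at(segments: List[Segment], chrom: str, pos: int) -> Optional[str]:
    """Return the ancestry label of the segment containing (chrom, pos), if any."""
    for seg_chrom, start, end, ancestry in segments:
        if seg_chrom == chrom and start <= pos < end:
            return ancestry
    return None


def ancestry_partitioned_burden(
    lof_alleles: List[LofAllele],
    segments_by_haplotype: Dict[int, List[Segment]],
    shet: Dict[str, float],
) -> Dict[str, float]:
    """Sum Shet separately for each local ancestry, counting a gene at most
    once per ancestry. Alleles outside any called segment are skipped."""
    genes_by_ancestry: Dict[str, Set[str]] = defaultdict(set)
    for hap, chrom, pos, gene in lof_alleles:
        ancestry = ancestry_at(segments_by_haplotype.get(hap, []), chrom, pos)
        if ancestry is not None and gene in shet:
            genes_by_ancestry[ancestry].add(gene)
    return {
        ancestry: sum((shet[g] for g in genes), 0.0)
        for ancestry, genes in genes_by_ancestry.items()
    }


if __name__ == "__main__":
    segments = {
        0: [("chr1", 0, 1000, "AMR"), ("chr1", 1000, 2000, "EUR")],
        1: [("chr1", 0, 2000, "EAS")],
    }
    shet = {"GENE_A": 0.30, "GENE_B": 0.05}
    lof = [(0, "chr1", 500, "GENE_A"), (0, "chr1", 1500, "GENE_B"), (1, "chr1", 1500, "GENE_B")]
    print(ancestry_partitioned_burden(lof, segments, shet))
    # prints {'AMR': 0.3, 'EUR': 0.05, 'EAS': 0.05}
```

With this convention, a gene hit on haplotypes of two different ancestries contributes to both totals, so the partitioned values need not add up to the unpartitioned burden. The linear scan over segments is for clarity only; genome-wide data would call for a sorted interval lookup.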
FIELD WORK, SECONDMENTS, INDUSTRY PLACEMENTS AND TRAINING
The project will be based within the MCPS and CKB groups in the Big Data Institute, a community for population health research. In-house training in statistical and epidemiological methods, programming, and scientific writing will be provided, and attendance at in-house workshops and lectures will be expected.
PROSPECTIVE STUDENT
The ideal candidate will hold a bachelor’s (or master’s) degree in statistics, epidemiology, public health, computational biology or a related field. They should have strong quantitative and programming skills, familiarity with genetic data formats, and an interest in applying population-health approaches to genomic research. Excellent communication skills and the ability to work collaboratively in large, interdisciplinary teams are essential.