Improved methodology for CVD risk prediction using high dimensional data
- 8 September 2025 to 2 December 2025
- Project No: D26039
- DPhil Project 2026
- China Kadoorie Biobank (CKB) Clinical Trial Service Unit and Epidemiological Studies Unit (CTSU)
Background
Cardiovascular disease (CVD), mainly coronary heart disease and stroke, is collectively among the leading causes of death worldwide. Traditional clinical predictors, such as blood pressure, body mass index (BMI), cholesterol levels, and medical and family history, are used to estimate individual disease risk, but their predictive accuracy remains limited. There is therefore a need to develop more precise prediction methods that incorporate state-of-the-art omics technologies such as proteomics. However, because of cost, multi-omics data are currently available only for smaller subsets of participants, and a small development sample can compromise model performance when the model is applied in a new setting, particularly its calibration. Risk prediction models fitted by maximum likelihood estimation (MLE) are often overfitted, producing predictions that are too extreme and a calibration slope (CS) of less than 1.
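The calibration slope can be illustrated with a short sketch. For a binary outcome with predicted risks p̂ in a validation sample, the CS is the coefficient b in the logistic recalibration model logit P(y = 1) = a + b · logit(p̂). A value of b < 1 means the predictions are too extreme. The function name `calibration_slope`, the Newton-Raphson fit and the simulated data below are illustrative assumptions, not the project's methods. For time-to-event outcomes such as incident CVD, an analogous slope would instead be estimated from a Cox model.

```python
import numpy as np


def _sigmoid(eta):
    # numerically stable logistic function
    return 0.5 * (1.0 + np.tanh(0.5 * eta))


def calibration_slope(y, p_hat, n_iter=50, tol=1e-10):
    """Fit logit P(y = 1) = a + b * logit(p_hat) by maximum likelihood
    (Newton-Raphson) and return (a, b). b is the calibration slope;
    b < 1 means the predicted risks are too extreme."""
    y = np.asarray(y, dtype=float)
    p_hat = np.clip(np.asarray(p_hat, dtype=float), 1e-12, 1 - 1e-12)
    lp = np.log(p_hat / (1.0 - p_hat))              # predictions on the logit scale
    X = np.column_stack([np.ones_like(lp), lp])      # intercept + linear predictor
    coef = np.zeros(2)
    for _ in range(n_iter):
        mu = _sigmoid(X @ coef)
        grad = X.T @ (y - mu)                        # score vector
        hess = X.T @ (X * (mu * (1.0 - mu))[:, None])  # observed information
        step = np.linalg.solve(hess, grad)
        coef += step
        if np.max(np.abs(step)) < tol:
            break
    return coef[0], coef[1]


# Simulated validation data (illustrative only, not CKB data)
rng = np.random.default_rng(0)
n = 200_000
true_lp = rng.normal(-2.0, 1.0, n)                  # true log-odds of the event
y = rng.binomial(1, _sigmoid(true_lp))

p_good = _sigmoid(true_lp)                           # correctly specified model
p_overfit = _sigmoid(2.0 * true_lp)                  # logits twice as extreme

_, slope_good = calibration_slope(y, p_good)
_, slope_overfit = calibration_slope(y, p_overfit)
print(f"well calibrated: CS = {slope_good:.2f}")    # expected to be near 1
print(f"overfitted:      CS = {slope_overfit:.2f}")  # expected to be near 0.5
```

Doubling the logits mimics an overfitted model. Because the true log-odds equal half of the inflated linear predictor, the estimated slope falls to about 0.5.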
This project aims to:
- Compare the utility of standard penalized regression methods for risk prediction using conventional risk factor and proteomic data;
- Use a modified tuning procedure to select tuning parameters, in order to improve the performance of penalized risk prediction models for conventional and proteomic data;
- Investigate models for recurrent events using conventional and proteomic data;
- Propose recommendations on sample size, analysis strategies and the most appropriate performance metrics.
Research experience, research methods and skills training
The student will work within a multi-disciplinary team and will receive in-house training in epidemiology, statistical programming, risk prediction modelling and machine learning, as well as attend relevant courses. By the end of the DPhil, the student will be competent to plan, undertake and interpret analyses of large datasets, to report research findings in peer-reviewed journals and to present at conferences.
PROSPECTIVE STUDENT
The ideal candidate should have a first degree and relevant postgraduate experience in epidemiology, statistics or data science. Candidates should have strong analytical skills and experience in handling large-scale epidemiological and high-dimensional data.
