This DPhil project will assess the feasibility of using centralised sets of real-world data regarding drugs prescribed and dispensed in the National Health Service in England as a primary data source for concomitant drug use in a phase III randomised controlled trial of cardiovascular disease. 


Randomised controlled trials (RCTs) are the gold standard for assessing the efficacy and safety of medical interventions. Data used in RCTs are usually purpose-collected over long periods and at a high level of detail, which makes conducting high-quality RCTs both complex and costly.

However, data routinely collected for health care purposes closely resemble those collected for medical research, which has sparked interest in using routinely collected data (RCD) to inform the design and conduct of RCTs.

Harnessing RCD might provide enhanced data integrity and completeness for planning, recruitment and follow-up purposes, bearing the promise of reduced costs and improved external validity of results. 

In England, NHS England is the body responsible for developing and maintaining the information collected in the National Health Service (NHS), including RCD generated from routine clinical care. In 2020, NHS England released a new dataset on drug dispensing, collected for reimbursement purposes by the NHS Business Services Authority (NHSBSA).

Although these data are already in use for pharmacovigilance and regulatory purposes, and in some instances for medical research, this is the first time that the full scope of the data collected at record-level will be available. 

This new dataset therefore has the potential to be one of the largest and most comprehensive data sources of its kind, and may have a major impact on future research using RCD. Nevertheless, it is unclear whether it is feasible to use it as a primary data source in clinical research, and therefore how scientifically valuable it is.

Aims and methods

The main aim of this project is to determine the feasibility of using these contemporary datasets as primary data sources in RCTs. To address this question, we will investigate the components and quality of the new dataset, how it compares with similar data collected from other sources, and whether it is ready for linkage to data collected for other purposes.

Finally, if this dataset is to be used in the future in RCTs, it will be important to understand how these data can be aligned with the Clinical Data Interchange Standards Consortium (CDISC) standards, as required for regulatory submission in the United States and Japan, and what improvements could be made to the CDISC standards to accommodate data from these sources. 

Secondary outputs of the project will include the description, validation, mapping and curation of this resource according to CDISC standards, which will inform future applications of the data.
