INTRODUCTION: Electronic health records are invaluable for pregnancy-related studies. The Clinical Practice Research Datalink (CPRD) Pregnancy Register (PR) identifies pregnancies in primary care records, including uncertain cases.

OBJECTIVES: This paper outlines a method to reduce uncertainty in identifying pregnancies within CPRD GOLD PR data, exemplified through a study investigating the provision of pre-pregnancy care.

METHODS: We used CPRD Mother Baby Link (MBL) and Maternity Hospital Episode Statistics (HES) to clean and augment the CPRD PR data. The study included all women aged 18-48 years who were registered at an English GP practice within CPRD on 01/01/2017, had a year of prior registration, and were eligible for hospital data linkage. We developed a cleaning and combining algorithm and further applied strict data quality criteria to form three populations: 'as provided', 'derived' (using our algorithm) and 'strictly derived' (with stricter data quality criteria). We compared characteristics and outcomes across these populations, examining potential biases in effect estimates obtained from the 'as provided' population.

RESULTS: Our algorithm added 22,270 (~7%) pregnancies from hospital data to the CPRD PR (1997-2021), eliminated conflicting pregnancies and pregnancies with unknown outcomes, and minimised potentially non-contemporaneous records of past pregnancies and partial records of pregnancies. Across all pregnancies in women's reproductive histories, the 'strictly derived' population, characterised by better data quality, showed a higher prevalence of pre-existing medical conditions and more pre-pregnancy care. In this population, recording of both exposure and outcome was better, and the magnitude of the association between exposure and outcome was smaller than in the 'as provided' population.

CONCLUSION: PR data requires cleaning before use. This study presents a pragmatic and practical method to identify pregnancies using existing CPRD data and linked records, without needing additional data. Researchers should carefully consider their studies' specific requirements and may adapt our proposed methodology to align with their research questions.
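The cohort eligibility criteria stated in the methods can be illustrated with a minimal Python sketch. It is not the authors' code: the `Patient` record, its field names, and the `is_eligible` helper are hypothetical, and real CPRD extracts are structured differently (for example, a full date of birth is assumed here purely for illustration). The sketch also reads "a year of prior registration" as registration starting on or before 01/01/2016, which is an assumption.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

INDEX_DATE = date(2017, 1, 1)  # index date given in the abstract


@dataclass
class Patient:
    """Hypothetical patient record; field names are illustrative only."""
    patient_id: str
    sex: str                           # "F" or "M" (assumed coding)
    date_of_birth: date                # assumed to be available in full
    registration_start: date
    registration_end: Optional[date]   # None means still registered
    practice_in_england: bool
    hes_linkage_eligible: bool


def age_on(date_of_birth: date, on: date) -> int:
    """Age in completed years on the given date."""
    had_birthday = (on.month, on.day) >= (date_of_birth.month, date_of_birth.day)
    return on.year - date_of_birth.year - (0 if had_birthday else 1)


def is_eligible(p: Patient) -> bool:
    """Apply the inclusion criteria described in the methods."""
    one_year_before = date(INDEX_DATE.year - 1, INDEX_DATE.month, INDEX_DATE.day)
    registered_on_index = p.registration_start <= INDEX_DATE and (
        p.registration_end is None or p.registration_end >= INDEX_DATE
    )
    return (
        p.sex == "F"
        and 18 <= age_on(p.date_of_birth, INDEX_DATE) <= 48
        and p.practice_in_england
        and p.hes_linkage_eligible
        and registered_on_index
        and p.registration_start <= one_year_before  # at least a year of prior registration
    )
```

A filter of this kind would run before any pregnancy-level cleaning, so that the 'as provided', 'derived' and 'strictly derived' populations are all drawn from the same set of women.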

DOI

10.23889/ijpds.v10i1.2471

Type

Journal article

Publication Date

2025-01-01

Volume

10

Keywords

Clinical Practice Research Datalink (CPRD), Hospital Episode Statistics (HES) Maternity, Mother Baby Link, Pregnancy Register, electronic health records, electronic medical records, methodological study, pregnancies, routine National Health Service (NHS) data, Humans, Female, Pregnancy, Adult, Registries, Algorithms, Young Adult, Uncertainty, Adolescent, Electronic Health Records, Middle Aged, England, Data Accuracy