Researchers at Oxford Population Health’s Demographic Science Unit and the Leverhulme Centre for Demographic Science have developed Phenofhy, a new open-source Python package designed to help scientists process and quality-control phenotypic data within the Our Future Health Trusted Research Environment (TRE) more efficiently. The toolkit is introduced in a paper published in Nature Medicine.
TREs are secure, cloud-based systems now used as standard infrastructure by major biobanks. They improve data security and privacy by allowing researchers to analyse de-identified health data remotely without the data leaving the TRE, so that data providers retain oversight and control.
Phenofhy addresses a practical gap for researchers working within TREs – the need to support exploratory, trial-and-error work locally, enabling researchers to develop the computational pipelines that are essential for biomedical discovery, particularly when working with large-scale real-world health data.
Vincent Straub, DPhil candidate at Oxford Population Health, said ‘Alongside increasing trust, which seems to be the critical challenge facing UK health data operators and biobanks, we need systems that not only minimise the risk of data leaks but also ones that support the often messy and exploratory reality of scientific discovery.
‘TREs have become a widely adopted technical solution, but we now also need to pay attention to their sociotechnical nature, as they can still involve high costs, limited tooling, and may deter exploratory analyses, crucial for students and discovery. We hope to help change that.’
Phenofhy has been developed as part of the Our Future Health Early Adopters Programme, and is available to all approved Our Future Health researchers via the TRE. Researchers can apply for access to Our Future Health by first becoming “registered researchers”.
Phenofhy is part of a broader set of recommendations the authors make for improving TRE usability, including training environments that mirror real data structures and cost estimators (like the one provided by UK Biobank) that give tailored guidance on the costs of running various types of analyses.
Professor Melinda Mills, Director of the Demographic Science Unit at Oxford Population Health, added ‘Large-scale health research requires access to data for bona fide researchers to realise the true potential of the data. Systems like TREs are crucial to secure access to sensitive health data, preserve privacy, public trust and the long-term viability of scientific discovery.
‘Data that are locked away benefit no one, and TREs are crucial. But, if we are to realise the full scientific potential of data, we need systems that are not only secure, but also accessible, practical and sustainable for researchers of all means and career stages. Phenofhy is one more step in the direction of making these systems more accessible for all.’
The Phenofhy package is available as a beta release on GitHub, from where it can be downloaded and imported into a researcher’s TRE project space via the Our Future Health Airlock process.
