Exploiting unstructured data in clinical trial settings
NDPH/MT18/41
FUNDING
The MRC are advertising one fully funded non-clinical studentship award, and this is one of the projects for which candidates can apply. The closing date for consideration for MRC HTMR funding is Monday 18 December 2017, and interviews will be held with the HTMR directors on Tuesday 23 January 2018 in London. Details on how to apply are available at: http://www.methodologyhubs.mrc.ac.uk/about/phd-studentships/
Please note:
Any applicant who is successful in obtaining MRC funding would also need to be successful in their separate DPhil in Population Health application.
Any applicant who is awarded both MRC funding and a place on the DPhil in Population Health would be required to start the programme early, in Trinity term (April) 2018.
If you’re interested in this project, please contact Michael Lay (michael.lay@ndph.ox.ac.uk) as soon as possible for an informal discussion.
OTHER SUPERVISORS
Professor Jim Davies, Professor of Software Engineering, Department of Computer Science
BACKGROUND
Important information about diagnosis, treatment, and outcomes is often available only in the form of unstructured data: in clinical or laboratory reports, in patient notes, or as free text responses on case report forms. Even where the information also exists in coded form, there may be questions about the accuracy or completeness of the coding.
RESEARCH EXPERIENCE, RESEARCH METHODS AND TRAINING
The studentship will be focussed upon the development and evaluation of methodologies for the management and transformation of unstructured data. This will involve:
1. a systematic review of literature in natural language processing, domain-specific modelling, model-driven transformation, data governance, trials design and compliance;
2. the design and implementation of techniques for automatic analysis, de-identification, and quality assurance;
3. the development of metrics for measuring the applicability of these techniques to different classes of unstructured data;
4. the development of domain-specific modelling languages and ontologies for the classification and management of information contained within and derived from unstructured data;
5. the establishment of key properties of these languages and ontologies, in terms of mathematical foundations and relationships to alternative approaches.
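As a purely illustrative example of what a de-identification technique might look like at its simplest, the sketch below masks a few kinds of identifier in free text using regular expressions. The patterns, the placeholder labels and the function name `redact` are assumptions made for this illustration and are not drawn from the trials or from any existing tool; a real approach would need far broader coverage and formal evaluation.

```python
import re

# Hypothetical patterns for illustration only; real clinical text needs
# much wider coverage (names, addresses, other date formats, and so on).
PATTERNS = [
    (re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"), "[EMAIL]"),
    (re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"), "[DATE]"),
    # Ten digits, optionally grouped 3-3-4 with spaces or hyphens.
    (re.compile(r"\b\d{3}[ -]?\d{3}[ -]?\d{4}\b"), "[ID]"),
]


def redact(text: str) -> str:
    """Replace every match of each pattern with its placeholder label."""
    for pattern, placeholder in PATTERNS:
        text = pattern.sub(placeholder, text)
    return text


if __name__ == "__main__":
    note = "Seen 03/11/2016, ID 943 476 5919, contact j.smith@example.org"
    print(redact(note))
```

Rule-based masking of this kind is easy to audit but will miss identifiers it has no pattern for, which is one reason the applicability metrics listed above matter.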
The HPS2-THRIVE and HPS3-REVEAL trials constitute a valuable resource for the development and evaluation of techniques. These trials have collected large quantities of unstructured data, including more than 42,000 medication reports, and this data has been manually interpreted against an agreed ontology, at considerable expense in terms of time and clinical effort. Both the raw, unstructured data and the coded interpretation will be made available to support this research.
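Because both the raw text and a manual coding are available, an automatic technique can be scored against the manual interpretation. The sketch below shows one common way to do this, micro-averaged precision, recall and F1 over sets of codes per report. The report identifiers, code labels and the function name `micro_prf` are hypothetical and chosen only for illustration; they do not reflect the trials' actual ontology or data format.

```python
from typing import Dict, Set, Tuple


def micro_prf(
    gold: Dict[str, Set[str]], predicted: Dict[str, Set[str]]
) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 of predicted codes.

    `gold` maps each report ID to its manually assigned codes.
    Predictions for reports absent from `gold` are ignored.
    """
    tp = fp = fn = 0
    for report_id, gold_codes in gold.items():
        pred_codes = predicted.get(report_id, set())
        tp += len(gold_codes & pred_codes)   # codes found by both
        fp += len(pred_codes - gold_codes)   # predicted but not in manual coding
        fn += len(gold_codes - pred_codes)   # in manual coding but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical medication reports and code labels.
    gold = {"r1": {"statin"}, "r2": {"aspirin", "ace_inhibitor"}}
    predicted = {"r1": {"statin"}, "r2": {"aspirin", "beta_blocker"}}
    print(micro_prf(gold, predicted))
```

Treating the manual coding as the reference standard is itself an assumption, since the accuracy and completeness of coding can be open to question.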
FIELD WORK, SECONDMENTS, INDUSTRY PLACEMENTS AND TRAINING
No field work is required, although opportunities for collaboration will be available through the MRC Hub Network, the Farr Institute, the UK Healthcare Text Analytics Research Network, the Oxford Big Data Institute, and existing research collaborations, including Stanford University (NIH National Center for Biomedical Ontology; Stanford Center for Biomedical Informatics Research), Vanderbilt University (PheKB, eMERGE consortium), and the University of Washington/Fred Hutchinson Cancer Research Center.
PROSPECTIVE CANDIDATE
The candidate needs to demonstrate:
1. Proven academic excellence in computer science or a related discipline (i.e., a first-class or upper second-class undergraduate degree, or international equivalent; a master’s degree)
2. Proficiency in English and excellent communication skills
3. Research or employment experience relevant to population health would be beneficial, as would experience with unstructured data.