This repository contains the various routines used to develop the Clinical Trial Selector (CTS). The data loading functions are based on the Synthetic Veteran Suicide Dataset (SVSD).
The basic workflow for the CTS parser is as follows:
- Collect the .csv files that make up the SVSD
- Extract the electronic health record (EHR) for a patient
- Preprocess the EHR using NLP techniques
- Run a named entity recognition (NER) model on the preprocessed data
- Map the extracted entities to medical concepts
- Generate a list of conditions that characterize the patient, based on the entities from the previous step
- Query for clinical trials based on those conditions
- Preprocess the eligibility criteria for each clinical trial using NLP techniques
- Run the NER model on the eligibility criteria for each clinical trial
- Map the terms in the eligibility criteria to medical concepts
- Calculate the Sørensen-Dice Index (SDI) between the patient and each clinical trial
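The last step compares two sets of medical concepts. Assuming the patient and each trial have already been reduced to sets of concept identifiers, the SDI is 2|A ∩ B| / (|A| + |B|): 1.0 for identical sets, 0.0 for disjoint ones. A minimal Python sketch follows; the concept strings are hypothetical placeholders, not output from the CTS.

```python
def sorensen_dice(a: set, b: set) -> float:
    """Sorensen-Dice index between two sets: 2|A & B| / (|A| + |B|)."""
    if not a and not b:
        # Convention chosen for this sketch: two empty sets share nothing.
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


# Hypothetical concept sets for one patient and one trial.
patient_concepts = {"depression", "insomnia", "hypertension"}
trial_concepts = {"depression", "insomnia", "asthma", "diabetes"}

# 2 shared concepts, 3 + 4 total: 2 * 2 / 7, roughly 0.571
print(sorensen_dice(patient_concepts, trial_concepts))
```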
Currently, the function returns a dataframe containing ALL queried clinical trials. Future versions will return only the clinical trials whose SDI exceeds a predefined threshold.
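The planned threshold filter can be sketched as below. This is an illustration only: `rank_trials`, the trial IDs, and the 0.5 threshold are hypothetical, and rows are plain dictionaries standing in for the dataframe to keep the example dependency-free. Passing `threshold=None` reproduces the current behavior of returning every queried trial.

```python
from typing import Dict, List, Optional, Set


def sorensen_dice(a: Set[str], b: Set[str]) -> float:
    """Sorensen-Dice index between two concept sets."""
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def rank_trials(
    patient: Set[str],
    trials: Dict[str, Set[str]],
    threshold: Optional[float] = None,
) -> List[dict]:
    """Score each trial against the patient and sort by SDI, highest first.

    threshold=None keeps all trials; otherwise only trials whose SDI is
    strictly above the threshold are returned.
    """
    rows = [
        {"trial_id": trial_id, "sdi": sorensen_dice(patient, concepts)}
        for trial_id, concepts in trials.items()
    ]
    if threshold is not None:
        rows = [row for row in rows if row["sdi"] > threshold]
    return sorted(rows, key=lambda row: row["sdi"], reverse=True)


# Hypothetical patient and trial concept sets.
patient = {"depression", "insomnia", "chronic pain"}
trials = {
    "TRIAL-1": {"depression", "insomnia"},                # SDI = 0.8
    "TRIAL-2": {"depression", "hypertension", "asthma"},  # SDI = 1/3
    "TRIAL-3": {"asthma"},                                # SDI = 0.0
}

all_trials = rank_trials(patient, trials)             # current behavior
selected = rank_trials(patient, trials, threshold=0.5)
print([row["trial_id"] for row in selected])
```

With pandas, the same filter would be a boolean mask on an SDI column followed by a descending sort.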