This project develops statistical models for detecting smoking episodes, using self-report and sensor data to measure them. Time-varying covariates such as urge and stress are measured via self-report. Interventions are delivered during the study to reduce proximal stress. The proposed model aims to account for uncertainty in event times when jointly modeling smoking risk and time-varying health processes (e.g., urge and stress), and when assessing the impact of intervention on the proximal risk of smoking.
This repository includes the code needed to reproduce the results: (A) exploratory data analysis, (B) algorithmic development, and (C) application of the models to the cleaned datasets. If you use this code, please cite the paper using the following BibTeX entry:
@article{dempsey:2020,
author = {Dempsey, Walter},
title = {Hierarchical point process and multi-scale measurements: data integration for latent recurrent event analysis under uncertainty},
journal = {arXiv},
year = {2020}}
To run the code, follow these steps:
- Dependencies: all code is developed in Python using Anaconda.
- The Anaconda environment can be created from bayesian.yml. See the conda documentation for instructions on creating an environment from a file. Open an Anaconda shell, navigate to the GitHub repository, and run:
conda env create -f bayesian.yml
- Symbolic links: Please set up symbolic links to the Box directory, since the data location can vary across operating systems and machines.
- Requires 4 symbolic links:
- cleaned_data maps to location ...\Box\MD2K Northwestern\Processed Data\smoking-lvm-cleaned-data
- data_streams maps to location ...\Box\MD2K Northwestern\Processed Data\Data streams
- data_streams_backup maps to location ...\Box\MD2K Northwestern\Processed Data\Data streams - phone backup files
- final_data maps to location ...\Box\MD2K Northwestern\Processed Data\smoking-lvm-cleaned-data\final
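- For example, on Windows, running the following from the home directory creates the symbolic link for final_data (the quotes are needed because the path contains spaces):
mklink /d final_data "...\Box\MD2K Northwestern\Processed Data\smoking-lvm-cleaned-data\final"
- On Mac or Linux, use ln -s in place of mklink /d and swap the argument order: ln -s takes the target path first and the link name second.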
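The same links can also be created from Python, which avoids the difference in argument order between mklink and ln. The following is a minimal sketch, not part of the repository: the helper name `make_links` and its arguments are hypothetical, and the target paths must be supplied by the user because they depend on where Box is mounted. On Windows, creating symbolic links may require administrator rights or Developer Mode.

```python
import os
from pathlib import Path


def make_links(targets, link_dir):
    """Create one directory symlink per entry in `targets`.

    `targets` maps a link name (e.g. "final_data") to the directory it
    should point at. Names that already exist in `link_dir` are skipped.
    Returns the list of link paths that were created.
    """
    created = []
    for name, target in targets.items():
        link = Path(link_dir) / name
        # Skip anything already present, including dangling links.
        if link.is_symlink() or link.exists():
            continue
        os.symlink(target, link, target_is_directory=True)
        created.append(link)
    return created
```

A call would pass a dictionary with the four names above (cleaned_data, data_streams, data_streams_backup, final_data) as keys and the corresponding Box directories on the local machine as values, together with the directory in which the links should live.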
- Data access, preprocessing, and exploratory data analysis
- Data is stored on Box and is owned by PI Bonnie Spring. Access is limited to the study team.
- Data preprocessing converts the raw data into a set of data files
- Exploratory data analysis is presented as a set of IPython notebooks. Descriptive statistics are used to inform the prior on the measurement-error models used in the analysis phase.
- The methods directory contains all algorithms for MCMC estimation. Algorithms are developed within PyMC3. At a high level, the algorithm performs the following steps:
- Sample event times given observations and parameters (using reversible-MCMC adjustment)
- Sample parameters given latent event times (using PyMC3)
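To illustrate the two-step structure, here is a toy sketch in NumPy. It is not the model in the methods directory. It assumes a fixed number of events, each observed twice: a noisy self-report `y` with unknown variance `sigma2` and a sensor reading `z` with known variance `tau2`. Step 1 uses plain random-walk Metropolis in place of the reversible-MCMC adjustment, and step 2 uses a closed-form inverse-gamma draw in place of PyMC3. All names and default values are hypothetical.

```python
import numpy as np


def log_target(t, y, z, sigma2, tau2):
    # Log density (up to a constant) of each latent time given both measurements.
    return -0.5 * ((y - t) ** 2 / sigma2 + (z - t) ** 2 / tau2)


def alternating_sampler(y, z, tau2=0.01, n_iter=2000, step=0.2,
                        a=2.0, b=1.0, seed=0):
    """Alternate between latent event times and the self-report variance."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    z = np.asarray(z, dtype=float)
    n = y.size
    t = z.copy()      # initialise latent times at the sensor readings
    sigma2 = 1.0      # initial value for the unknown variance
    draws = np.empty(n_iter)
    for it in range(n_iter):
        # Step 1: sample event times given observations and parameters
        # (one random-walk Metropolis update per event).
        prop = t + step * rng.standard_normal(n)
        log_ratio = (log_target(prop, y, z, sigma2, tau2)
                     - log_target(t, y, z, sigma2, tau2))
        accept = np.log(rng.uniform(size=n)) < log_ratio
        t = np.where(accept, prop, t)
        # Step 2: sample the parameter given latent event times
        # (conjugate inverse-gamma update with prior InvGamma(a, b)).
        shape = a + n / 2.0
        rate = b + 0.5 * np.sum((y - t) ** 2)
        sigma2 = 1.0 / rng.gamma(shape, 1.0 / rate)
        draws[it] = sigma2
    return t, draws
```

The function returns the last state of the latent times and the chain of `sigma2` draws. The actual model additionally handles uncertainty in the number of events, which this sketch ignores.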
- All evaluation functions can be found in the evaluation directory. In particular, we perform posterior predictive checks to confirm model fit to the data.
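As a rough illustration of what a posterior predictive check does, the sketch below compares a summary statistic of observed counts with the same statistic computed on data replicated from posterior draws. It is not the code in the evaluation directory: the Poisson count model, the function name `ppc_pvalue`, and its arguments are assumptions made for the example.

```python
import numpy as np


def ppc_pvalue(observed, rate_draws, stat=np.mean, seed=0):
    """Posterior predictive p-value for `stat` under a Poisson count model.

    For each posterior draw of the rate, simulate a replicated dataset of
    the same size as `observed` and record its statistic. The returned value
    is the fraction of replicated statistics at least as large as the
    observed one.
    """
    rng = np.random.default_rng(seed)
    observed = np.asarray(observed)
    t_obs = stat(observed)
    t_rep = np.array([
        stat(rng.poisson(rate, size=observed.size)) for rate in rate_draws
    ])
    return float(np.mean(t_rep >= t_obs))
```

Values near 0 or 1 suggest that the model does not reproduce the chosen statistic well, while values away from the extremes are consistent with adequate fit for that statistic.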
- The final project report can be found in the write-up directory.