ldliao / jointvip

Prioritize variables in observational study design through the joint variable importance plot; shiny app: https://ldliao.shinyapps.io/jointVIP/

Home Page: https://ldliao.github.io/jointVIP/

License: Other

R 7.21% TeX 0.33% HTML 92.42% CSS 0.04%

causal-inference observational-study study-design rstats r

jointvip's Introduction

Joint variable importance plot

Joint variable importance plot (jointVIP) visualizes each variable’s outcome importance via Pearson’s correlation and treatment importance via cross-sample standardized mean differences. Bias curves enable comparisons to support variable prioritization among potential confounders.
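To make the two axes concrete, the following base R sketch computes both quantities for one simulated covariate. It is an illustration only, not the package's internal code: the data are simulated, the helper names (`cross_sample_smd`, `outcome_cor`) are hypothetical, and it assumes that the mean difference is taken in the analysis sample and scaled by the pilot-sample standard deviation, with the outcome correlation computed in the pilot sample.

```r
# Toy illustration with simulated data; helper names are hypothetical and
# are NOT part of the jointVIP API.
set.seed(1)
n <- 500
pilot <- data.frame(x = rnorm(n))            # stands in for external controls
pilot$y <- 0.5 * pilot$x + rnorm(n)          # outcome observed in the pilot sample
analysis <- data.frame(trt = rbinom(n, 1, 0.5))
analysis$x <- rnorm(n, mean = 0.3 * analysis$trt)

# Treatment importance: treated-minus-control mean difference in the analysis
# sample, scaled by the pilot-sample standard deviation (assumed denominator).
cross_sample_smd <- function(x_analysis, trt, x_pilot) {
  (mean(x_analysis[trt == 1]) - mean(x_analysis[trt == 0])) / sd(x_pilot)
}

# Outcome importance: Pearson correlation with the outcome in the pilot sample.
outcome_cor <- function(x_pilot, y_pilot) {
  cor(x_pilot, y_pilot, method = "pearson")
}

smd <- cross_sample_smd(analysis$x, analysis$trt, pilot$x)
rho <- outcome_cor(pilot$x, pilot$y)
c(smd = smd, outcome_cor = rho)
```

A variable that is large on both quantities is a natural candidate for prioritization; one that is large on only one axis matters less.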

Installation

You can install the jointVIP package from CRAN, or the development version from GitHub, using:

# for version on CRAN
install.packages("jointVIP")

# for development version on github
devtools::install_github("ldliao/jointVIP")

BRFSS Example

To demonstrate, we use the 2015 Behavioral Risk Factor Surveillance System (BRFSS) data to answer the causal question: Does smoking increase the risk of chronic obstructive pulmonary disease (COPD)? The data and background are inspired by Clay Ford’s work at the University of Virginia Library. First, the data are cleaned to contain only numeric variables, i.e., all factor variables are transformed via one-hot encoding. The treatment variable smoke contains only 0 (control) and 1 (treatment).
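As a rough illustration of the cleaning step, the sketch below one-hot encodes a factor column with base R's `model.matrix()`. The data frame `raw` and its values are made up for this example and are not the BRFSS data; the cleaning actually used to produce `brfss` may differ.

```r
# Hypothetical raw data; columns and values are invented for illustration.
raw <- data.frame(
  smoke = c(1, 0, 0, 1),
  age_group = factor(c("18to24", "25to34", "over65", "25to34")),
  average_drinks = c(2, 0, 1, 3)
)

# `~ age_group - 1` drops the intercept so every level gets its own 0/1 column.
indicators <- model.matrix(~ age_group - 1, data = raw)
colnames(indicators) <- sub("^age_group", "age_", colnames(indicators))

# Replace the factor with its indicator columns; the result is all numeric.
cleaned <- cbind(raw[, setdiff(names(raw), "age_group")], indicators)
str(cleaned)
```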

With the cleaned data, you can specify details in the function create_jointVIP() like so:

library(jointVIP)
## basic example code

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following object is masked from 'package:testthat':
#> 
#>     matches
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

# load data
data('brfss', package='jointVIP')

treatment = 'smoke'
outcome = 'COPD'
covariates = names(brfss)[!names(brfss) %in% c(treatment, outcome)]

## select the pilot sample from random portion
## pilot data here are considered as 'external controls'
## can be a separate dataset; should be chosen with caution
set.seed(1234895)
pilot_prop = 0.2
pilot_sample_num = sample(which(brfss %>% pull(treatment) == 0),
                          length(which(brfss %>% pull(treatment) == 0)) *
                            pilot_prop)

## set up pilot and analysis data
## we want to make sure these two data are non-overlapping

pilot_df = brfss[pilot_sample_num, ]
analysis_df = brfss[-pilot_sample_num, ]


## minimal example
brfss_jointVIP = create_jointVIP(treatment = treatment,
                                 outcome = outcome,
                                 covariates = covariates,
                                 pilot_df = pilot_df,
                                 analysis_df = analysis_df)
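
The pilot/analysis split can be sanity-checked with a few assertions. The sketch below uses a small simulated data frame so that it runs without the package; `toy` and its columns are invented, and `floor()` is added here only to make the sample size an explicit integer.

```r
# Simulated stand-in for a cleaned data set (not the BRFSS data).
set.seed(42)
toy <- data.frame(smoke = rbinom(200, 1, 0.3), x = rnorm(200))

pilot_prop <- 0.2
control_rows <- which(toy$smoke == 0)
pilot_rows <- sample(control_rows, floor(length(control_rows) * pilot_prop))

pilot_toy <- toy[pilot_rows, ]
analysis_toy <- toy[-pilot_rows, ]

# The pilot sample holds controls only, and the two sets partition the rows.
stopifnot(all(pilot_toy$smoke == 0))
stopifnot(nrow(pilot_toy) + nrow(analysis_toy) == nrow(toy))
stopifnot(length(intersect(rownames(pilot_toy), rownames(analysis_toy))) == 0)
```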

Generic functions can be used on the jointVIP object to extract information at a glance, with summary() and print().

summary(brfss_jointVIP)
#> Max absolute bias is 0.032
#> 3 variables are above the desired 0.01 absolute bias tolerance
#> 13 variables can be plotted

print(brfss_jointVIP)
#>                 bias
#> age_over65     0.032
#> average_drinks 0.031
#> age_25to34     0.012

plot(brfss_jointVIP)

In this example, age_over65 and average_drinks are the two most important variables to adjust for. At a bias tolerance of 0.01, three variables (age_over65, average_drinks, and age_25to34) are above the tolerance threshold, and the first two are of higher importance for adjustment than age_25to34. Although race_black and age_over65 have similar absolute standardized mean differences (0.322 and 0.333, respectively), age_over65 is more important to adjust for because it is more strongly correlated with the outcome.

Acknowledgement

Centers for Disease Control and Prevention (CDC). Behavioral Risk Factor Surveillance System Survey Questionnaire. Atlanta, Georgia: U.S. Department of Health and Human Services, Centers for Disease Control and Prevention, 2015.
Ford, C. 2018. “Getting Started with Matching Methods.” UVA Library StatLab. https://library.virginia.edu/data/articles/getting-started-with-matching-methods/ (accessed Jan 29, 2024).

jointvip's Issues

JOSS review

Code/Repo

Note that the example in the README doesn't work immediately when copy-pasted into an R session (I often do this, presumably others too) because the object df used in the example is created in a hidden Rmd code block, which downloads a CSV of a dataset that is then cleaned to produce df. A cleaner solution would be to move this hidden block to a standalone R script (stored in the package) and distribute the cleaned dataset as part of your package, as described at https://r-pkgs.org/data.html; this has the advantage of making the package self-contained wrt data, as changes in the availability of that CSV won't affect this package's examples/demos.
Code organization: Most of the package's core functions seem to be organized in a file general.R, which is fine but may prove challenging to work with for contributors. It's recommended that functions be stored in individual files or at least thematically organized (e.g., classes.R to store your S3 class definitions); see https://r-pkgs.org/code.html#sec-code-organising.
The suite of unit tests seems to be reasonably diverse but it's hard to get a sense without a code coverage check in place. On a related note, GitHub Actions seems to be activated but is only being used to produce the JOSS PDF. A better use of the Actions workflow would be to automate the running of unit tests (activated on push and PRs) and code coverage checks; some examples of actions for R are at https://github.com/r-lib/actions
Per JOSS review guidelines, some statement about community guidelines is necessary. This could be either as a separate markdown file in the repo or a new section in the existing README file.

Paper

There's a minor typo in the abstract (line 14 of JOSS PDF): "The joint variable importance plot (jointVIP) package to guides decisions about which variables to prioritize for adjustment by quantifying and visualizing each variable's relationship to both treatment and outcome" (emphasis mine).
On "Development" (lines 49-50 of JOSS PDF): It would be clearer to note that the package exposes a new S3 class, called jointVIP, and exposes methods of the generic functions print(), summary(), and plot() for this class (as opposed to saying that it "leverages system generic functions").
On "Usage", a few typos: "outcomeis" instead of "outcome is" and "functionto" instead of "function to"
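
For readers unfamiliar with the S3 pattern mentioned in the "Development" comment, here is a minimal sketch of a class with its own `print()` and `summary()` methods. The class name `toyVIP`, its constructor, its fields, and the `other_var` entry are hypothetical and do not reflect how jointVIP is implemented.

```r
# Hypothetical S3 class for illustration; not the jointVIP implementation.
new_toyVIP <- function(bias, tol = 0.01) {
  stopifnot(is.numeric(bias), !is.null(names(bias)))
  structure(list(bias = bias, tol = tol), class = "toyVIP")
}

# Method for the print() generic: show variables above the tolerance.
print.toyVIP <- function(x, ...) {
  above <- sort(x$bias[abs(x$bias) > x$tol], decreasing = TRUE)
  print(data.frame(bias = above))
  invisible(x)
}

# Method for the summary() generic: report headline numbers.
summary.toyVIP <- function(object, ...) {
  n_above <- sum(abs(object$bias) > object$tol)
  cat("Max absolute bias is", max(abs(object$bias)), "\n")
  cat(n_above, "variables are above the desired", object$tol,
      "absolute bias tolerance\n")
  invisible(n_above)
}

# Illustrative values only; `other_var` is an invented entry.
obj <- new_toyVIP(c(age_over65 = 0.032, average_drinks = 0.031,
                    other_var = 0.002))
summary(obj)
print(obj)
```

Because dispatch happens on the class attribute, calling `summary(obj)` or `print(obj)` reaches these methods without the user needing to know their names.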

JOSS Review: @jackmwolf

Hi @ldliao! I'm done with my initial review of jointVIP for openjournals/joss-reviews/issues/6093. The functionality and purpose are great overall. I've listed several minor suggestions for improvement below:

Based on the provided examples and my own testing, it appears that create_jointVIP() does not support categorical variables with >2 levels and such variables must be converted into a set of indicator variables first. I suggest being more explicit about the input requirements in the documentation for create_jointVIP(). Currently, the only listed requirements are that pilot_df and analysis_df are data.frames.
Similarly, create_jointVIP() returns an error if treatment is not a binary 0/1 variable (e.g., if it is coded as the strings 'treatment' and 'control'). This expectation could be made explicit in the documentation.
Consider adding documentation for how others can contribute to your software somewhere in your root directory (e.g., CONTRIBUTING.md). You can find an example here.
The GitHub repository contains both LICENSE and LICENSE.md. The former only lists the year and copyright holder and is not a software license.
Can you expand on the statement that "[b]ias curves enable comparisons to support prioritization" in README.md? What is prioritized?
(Very minor) Line 62 of paper.md says "functionto" instead of "function to."
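
The input requirements raised in the first two points could be checked before calling `create_jointVIP()`. The helper below is a hypothetical sketch (`check_vip_inputs` is not a package function) and assumes the requirements are exactly those inferred in this review: every used column is numeric and treatment is coded 0/1.

```r
# Hypothetical pre-flight check; not part of the jointVIP package.
check_vip_inputs <- function(df, treatment, outcome, covariates) {
  stopifnot(is.data.frame(df))
  needed <- c(treatment, outcome, covariates)
  missing_cols <- setdiff(needed, names(df))
  if (length(missing_cols) > 0) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "))
  }
  non_numeric <- needed[!vapply(df[needed], is.numeric, logical(1))]
  if (length(non_numeric) > 0) {
    stop("Convert to numeric indicators first: ",
         paste(non_numeric, collapse = ", "))
  }
  if (!all(df[[treatment]] %in% c(0, 1))) {
    stop("Treatment must be coded 0 (control) / 1 (treated)")
  }
  invisible(TRUE)
}

# Recode a string treatment to 0/1 before checking (toy data, invented).
toy <- data.frame(arm = c("treatment", "control", "control"),
                  y = c(1, 0, 0), x = c(2.5, 1.0, 3.2))
toy$arm <- as.integer(toy$arm == "treatment")
check_vip_inputs(toy, treatment = "arm", outcome = "y", covariates = "x")
```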
