GithubHelp home page GithubHelp logo

ldliao / jointvip Goto Github PK

View Code? Open in Web Editor NEW
6.0 2.0 0.0 9.16 MB

Prioritize variables in observational study design through the joint variable importance plot; shiny app: https://ldliao.shinyapps.io/jointVIP/

Home Page: https://ldliao.github.io/jointVIP/

License: Other

R 7.21% TeX 0.33% HTML 92.42% CSS 0.04%
causal-inference observational-study study-design rstats r

jointvip's Introduction

Joint variable importance plot

CRAN_Status_Badge CRAN_Downloads_Badge test-coverage Project Status: Active – The project has reached a stable, usable state and is being actively developed. MIT license

Joint variable importance plot (jointVIP) visualizes each variable’s outcome importance via Pearson’s correlation and treatment importance via cross-sample standardized mean differences. Bias curves enable comparisons to support variable prioritization among potential confounders.

Installation

You can install the jointVIP package on CRAN using:

# for version on CRAN
install.packages("jointVIP")

# for development version on github
devtools::install_github("ldliao/jointVIP")

BRFSS Example

To demonstrate, we use the 2015 Behavioral Risk Factor Surveillance System (BRFSS) example to answer the causal question: Does smoking increase the risk of chronic obstructive pulmonary disease (COPD)? The data and background is inspired by Clay Ford’s work from University of Virginia Library. First, the data is cleaned to only have numeric variables, i.e., all factored variables are transformed via one-hot-encoding. Treatment variable smoke only contains 0 (control) and 1 (treatment).

With the cleaned data, you can specify details in the function create_jointVIP() like so:

library(jointVIP)
## basic example code

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following object is masked from 'package:testthat':
#> 
#>     matches
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
# load data
data('brfss', package='jointVIP')

treatment = 'smoke'
outcome = 'COPD'
covariates = names(brfss)[!names(brfss) %in% c(treatment, outcome)]

## select the pilot sample from random portion
## pilot data here are considered as 'external controls'
## can be a separate dataset; should be chosen with caution
set.seed(1234895)
pilot_prop = 0.2
pilot_sample_num = sample(which(brfss %>% pull(treatment) == 0),
                          length(which(brfss %>% pull(treatment) == 0)) *


## set up pilot and analysis data
## we want to make sure these two data are non-overlapping

pilot_df = brfss[pilot_sample_num, ]
analysis_df = brfss[-pilot_sample_num, ]


## minimal example
brfss_jointVIP = create_jointVIP(treatment = treatment,
                                 outcome = outcome,
                                 covariates = covariates,
                                 pilot_df = pilot_df,
                                 analysis_df = analysis_df)

Generic functions can be used for the jointVIP object to extract information as a glance with summary() and print().

summary(brfss_jointVIP)
#> Max absolute bias is 0.032
#> 3 variables are above the desired 0.01 absolute bias tolerance
#> 13 variables can be plotted
print(brfss_jointVIP)
#>                 bias
#> age_over65     0.032
#> average_drinks 0.031
#> age_25to34     0.012
plot(brfss_jointVIP)

In this example, age_over65 and average_drinks are two most important variables to adjust. At a bias tolerance of 0.01, 3 variables: age_over65, average_drinks, and age_25to34 are above the tolerance threshold. Moreover, age_over65 and average_drinks are of higher importance for adjustment than age_25to34. Although race_black and age_over65 have similar absolute standardized mean differences (0.322 and 0.333, respectively), age_over65 is more important to adjust for since its highly correlated with the outcome.

Acknowledgement

  • Centers for Disease Control and Prevention (CDC). Behavioral Risk Factor Surveillance System Survey Questionnaire. Atlanta, Georgia: U.S. Department of Health and Human Services, Centers for Disease Control and Prevention, 2015.
  • Ford, C. 2018. “Getting Started with Matching Methods.” UVA Library StatLab. https://library.virginia.edu/data/articles/getting-started-with-matching-methods/ (accessed Jan 29, 2024).

jointvip's People

Contributors

ldliao avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar

jointvip's Issues

JOSS review

Code/Repo

  • Note that the example in the README doesn't work immediately when copy-pasted into an R session (I often do this, presumably others too) because the object df used in the example is created in a hidden Rmd code block, which downloads a CSV of a dataset that is then cleaned to produce df. A cleaner solution would be to move this hidden block to a standalone R script (stored in the package) and distribute the cleaned dataset as part of your package, as described at https://r-pkgs.org/data.html; this has the advantage of making the package self-contained wrt data, as changes in the availability of that CSV won't affect this package's examples/demos.
  • Code organization: Most of the package's core functions seem to be organized in a file general.R, which is fine but may prove challenging to work with for contributors. It's recommended that functions be stored in individual files or at least thematically organized (e.g., classes.R to store your S3 class definitions); see https://r-pkgs.org/code.html#sec-code-organising.
  • The suite of unit tests seems to be reasonably diverse but it's hard to get a sense without a code coverage check in place. On a related note, GitHub Actions seems to be activated but is only being used to produce the JOSS PDF. A better use of the Actions workflow would be to automate the running of unit tests (activated on push and PRs) and code coverage checks; some examples of actions for R are at https://github.com/r-lib/actions
  • Per JOSS review guidelines, some statement about community guidelines is necessary. This could be either as a separate markdown file in the repo or a new section in the existing README file.

Paper

  • There's a minor typo in the abstract (line 14 of JOSS PDF): "The joint variable importance plot (jointVIP) package to guides decisions about which variables to prioritize for adjustment by quantifying and visualizing each variable's relationship to both treatment and outcome" (emphasis mine).
  • On "Development" (lines 49-50 of JOSS PDF): It would be clearer to note that the package exposes a new S3 class, called jointVIP, and exposes methods of the generic functions print(), summary(), and plot() for this class (as opposed to saying that it "leverages system generic functions").
  • On "Usage", a few typos: "outcomeis" instead of "outcome is" and "functionto" instead of "function to"

JOSS Review: @jackmwolf

Hi @ldliao! I'm done with my initial review of jointVIP for openjournals/joss-reviews/issues/6093. The functionality and purpose are great overall. I've listed several minor suggestions for improvement below:

  1. Based on the provided examples and my own testing, it appears that create_jointVIP() does not support categorical variables with >2 levels and such variables must be converted into a set of indicator variables first. I suggest being more explicit about the input requirements in the documentation for create_jointVIP(). Currently, the only listed requirements are that pilot_df and analysis_df are data.frames.

  2. Similarly, the create_jointVIP() returns an error if treatment is not binary (e.g., if it is coded as the strings 'treatment' and 'control'). This expectation could be made explicit in the documentation.

  3. Consider adding documentation for how others can contribute to your software somewhere in your root directory (e.g., CONTRIBUTING.md). You can find an example here.

  4. The GitHub repository contains both LICENSE and LICENSE.md. The former only lists the year and copyright holder and is not a software license.

  5. Can you expand on the statement that "[b]ias curves enable comparisons to support prioritization" in README.md? What is prioritized?

  6. (Very minor) Line 62 of paper.md says "functionto" instead of "function to."

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.