
:package: Non-parametric Causal Effects Based on Modified Treatment Policies :crystal_ball:

Home Page: http://www.beyondtheate.com

License: GNU Affero General Public License v3.0

causal-inference nonparametric-statistics longitudinal-data statistics machine-learning robust-statistics targeted-learning censored-data stochastic-interventions survival-analysis

lmtp's Introduction

lmtp


Non-parametric Causal Effects of Feasible Interventions Based on Modified Treatment Policies

Nick Williams and Ivan Diaz


lmtp is an R package that provides an estimation framework for the causal effects of feasible interventions based on point-treatment and longitudinal modified treatment policies, as described in Diaz, Williams, Hoffman, and Schenck (2020). Two primary estimators are supported: a targeted maximum likelihood (TML) estimator and a sequentially doubly robust (SDR) estimator. (A G-computation estimator and an inverse probability of treatment weighting estimator are also provided for completeness, but the TML and SDR estimators are recommended instead.) Both binary and continuous outcomes, with or without censoring, are allowed. lmtp is built atop the SuperLearner package to utilize ensemble machine learning for estimation. The treatment mechanism is estimated via a density-ratio classification procedure regardless of treatment variable type, which reduces computation time when treatment is continuous. Dynamic treatment regimes are also supported.
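The density-ratio classification procedure mentioned above can be illustrated with a minimal sketch (the data and variable names here are invented for illustration; lmtp's internal implementation differs): stack the natural and shifted copies of the data, label them 0 and 1, fit a probabilistic classifier, and convert predicted probabilities into a ratio.

```r
# Minimal sketch of density-ratio estimation via classification.
# Toy data for illustration only; not lmtp's internal code.
set.seed(1)
n <- 500
natural <- data.frame(a = rnorm(n))          # observed treatment values
shifted <- data.frame(a = natural$a - 0.5)   # treatment under the shift d(a) = a - 0.5

# Stack the two copies; label natural rows 0 and shifted rows 1
stacked <- rbind(natural, shifted)
stacked$lab <- rep(c(0, 1), each = n)

# A classifier's predicted probability p yields the ratio p / (1 - p)
fit <- glm(lab ~ a, data = stacked, family = binomial())
p <- predict(fit, newdata = natural, type = "response")
ratio <- p / (1 - p)  # estimated density ratio at the natural treatment values
```

Because the classifier only needs to discriminate natural from shifted rows, the same machinery applies whether the treatment is binary, categorical, or continuous.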

A list of papers using lmtp is here.

For an in-depth look at the package’s functionality, please consult the accompanying technical paper in Observational Studies.

Installation

lmtp can be installed from CRAN with:

install.packages("lmtp")

The development version can be installed from GitHub with:

devtools::install_github("nt-williams/lmtp@devel")

A version allowing different covariate sets for the treatment, censoring, and outcome regressions can be installed with:

devtools::install_github("nt-williams/lmtp@separate-variable-sets")

What even is a modified treatment policy?

Modified treatment policies (MTPs) are interventions that can depend on the natural value of the treatment (the value the treatment would take in the absence of intervention). A key assumption for causal inference is the positivity assumption, which states that all observations have a positive probability of receiving every treatment value of interest. When working with continuous or multivalued treatments, violations of the positivity assumption are likely to occur. MTPs offer a solution to this problem.
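As a concrete illustration (a hypothetical example, not taken from the package documentation), an MTP that reduces a continuous exposure by a fixed amount delta only where doing so is feasible can be written as a function of the natural treatment value:

```r
# Hypothetical MTP: reduce exposure by `delta`, but only for units whose
# natural value is at least `low + delta`, so the shifted value never
# drops below `low`. All other units keep their natural value, which is
# what sidesteps positivity problems near the boundary.
shift_down <- function(a, delta = 1, low = 0) {
  ifelse(a - delta >= low, a - delta, a)
}

shift_down(c(0.2, 1.5, 3))
#> [1] 0.2 0.5 2.0
```

A static intervention would instead assign the same value to everyone, regardless of the natural value; the dependence on `a` is what makes this a modified treatment policy.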

Can lmtp estimate other effects?

Yes! lmtp can estimate the effects of deterministic, static interventions (such as those defining the ATE) and deterministic, dynamic treatment regimes for binary, continuous, and survival outcomes.

Features

  • Point treatment
  • Longitudinal treatment
  • Modified treatment intervention
  • Incremental propensity score intervention (using the risk ratio)
  • Static intervention
  • Dynamic intervention
  • Continuous treatment
  • Binary treatment
  • Categorical treatment
  • Multivariate treatment
  • Missingness in treatment
  • Continuous outcome
  • Binary outcome
  • Censored outcome
  • Mediation
  • Survey weights
  • Super learner
  • Clustered data
  • Parallel processing
  • Progress bars

Example

library(lmtp)
#> Loading required package: mlr3superlearner
#> Loading required package: mlr3learners
#> Warning: package 'mlr3learners' was built under R version 4.2.3
#> Loading required package: mlr3
#> Warning: package 'mlr3' was built under R version 4.2.3

# the data: 4 treatment nodes with time varying covariates and a binary outcome
head(sim_t4)
#>   ID L_1 A_1 L_2 A_2 L_3 A_3 L_4 A_4 Y
#> 1  1   2   3   0   1   1   1   1   3 0
#> 2  2   2   1   1   4   0   3   1   2 0
#> 3  3   1   0   1   3   1   2   1   1 1
#> 4  4   1   0   0   3   1   3   1   2 0
#> 5  5   3   3   1   1   0   1   1   2 0
#> 6  6   1   0   0   2   0   3   1   4 0

We’re interested in a treatment policy, d, where exposure is decreased by 1 only among subjects whose exposure won’t go below 1 if intervened upon. The true population outcome under this policy is about 0.305.

# a treatment policy function to be applied at all time points
policy <- function(data, trt) {
  (data[[trt]] - 1) * (data[[trt]] - 1 >= 1) + data[[trt]] * (data[[trt]] - 1 < 1)
}
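To see what this policy does, we can apply it to a few hand-picked values (a toy data frame, not the sim_t4 data; the definition is repeated so the snippet runs standalone): values of 1 are left alone, since shifting them would go below 1, while larger values are reduced by 1.

```r
# The policy from above, repeated so this snippet is self-contained
policy <- function(data, trt) {
  (data[[trt]] - 1) * (data[[trt]] - 1 >= 1) + data[[trt]] * (data[[trt]] - 1 < 1)
}

# Toy check on hand-picked values (not sim_t4 itself)
toy <- data.frame(A_1 = c(1, 2, 3, 4))
policy(toy, "A_1")
#> [1] 1 1 2 3
```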

In addition to specifying a treatment policy, we need to specify our treatment variables and time-varying covariates.

# treatment nodes, a character vector of length 4
A <- c("A_1", "A_2", "A_3", "A_4")
# time varying nodes, a list of length 4
L <- list(c("L_1"), c("L_2"), c("L_3"), c("L_4"))

We can now estimate the effect of our treatment policy, d. In this example, we’ll use the cross-validated TML estimator with 10 folds.

lmtp_tmle(sim_t4, A, "Y", time_vary = L, shift = policy, intervention_type = "mtp", folds = 10)
#> LMTP Estimator: TMLE
#>    Trt. Policy: (policy)
#> 
#> Population intervention estimate
#>       Estimate: 0.2526
#>     Std. error: 0.0223
#>         95% CI: (0.2089, 0.2962)

Data structure

  • Single time point
  • Time-varying exposure and confounders, non-survival outcome
  • Single exposure, survival outcome
  • Time-varying exposure and confounders, survival outcome

Similar Implementations

A variety of other R packages perform similar tasks as lmtp. However, lmtp is the only R package currently capable of estimating causal effects for binary, categorical, and continuous exposures in both the point treatment and longitudinal setting using traditional causal effects or modified treatment policies.

Citation

Please cite the following when using lmtp in publications. Citation should include both the R package article and the paper establishing the statistical methodology.

@article{williams2023lmtp,
  title = {lmtp: An R package for estimating the causal effects of modified treatment policies},
  author = {Nicholas T Williams and Iván Díaz},
  journal = {Observational Studies},
  year = {2023},
  url = {https://muse.jhu.edu/article/883479}
}

@article{diaz2021lmtp,
  author = {Iván Díaz and Nicholas Williams and Katherine L. Hoffman and Edward J. Schenck},
  title = {Non-parametric causal effects based on longitudinal modified treatment policies},
  journal = {Journal of the American Statistical Association},
  year  = {2021},
  doi = {10.1080/01621459.2021.1955691},
  URL = {https://doi.org/10.1080/01621459.2021.1955691},
}

References

Iván Díaz, Nicholas Williams, Katherine L. Hoffman & Edward J. Schenck (2021) Non-parametric causal effects based on longitudinal modified treatment policies, Journal of the American Statistical Association, DOI: 10.1080/01621459.2021.1955691


lmtp's Issues

Reconfigure node list implementation

Currently, the node-list process forces all variables from before t to be used in estimation. The best option is likely to require the user to create their time-ordered node list themselves.

Missing data check

Need to implement a check of missing data among the covariates. Would issue error such as:

Missing data found in treatment and/or covariate nodes. Either impute (recommended) or only use observations with complete covariates. Missing outcomes are okay.

Nodes argument name is ambiguous

Describe the issue
The nodes argument in estimators is ambiguous and does not naturally map to its role of specifying time-varying variables. Renaming it to time_vary would be more evocative.

learner fails warning message

Is your feature request related to a problem? Please describe.
When sl3 learners fail they are removed from the ensemble. A warning message should be output when this occurs.

The baseline parameter argument is non-intuitive

Describe the issue
If the data generating process is not assumed to contain a Markov property, the use of the baseline argument will throw an error. It would make more sense to always specify baseline covariates here and leave the nodes argument only for time-varying covariates.

Handle binary exposure by latching onto censoring engine?

Not necessary for the initial release, but may be a good future addition that would really expand package capabilities. This is already possible through the density-ratio classification system, but that approach is more complicated than necessary for a binary exposure.

New package name

Because lmtp is boring.

My current favorites:

  • palantir: the seeing stone in Lord of the Rings
  • pythia: the oracle of Delphi

Naming consistency with paper

pseudo <- paste0("m", tau)

I think this should be named with tau + 1, for consistency with the notation in the paper. It is also probably a good idea to call it something other than m, since it is not actually m but a pseudo-outcome; phi or something like that would work.

Common error accumulation

Common errors/issues that aren't explicitly checked for should now be listed here:

  • user calls make_learner() instead of make_learner_stack() with multiple learners.
  • Factor levels in shift function don't respect the levels in the unshifted data resulting in different column names when expanded into dummy codes.

Allow user to pass non-quoted column names to trt, outcome, and baseline variables

If possible (without introducing a bunch of dplyr dependencies), it would be nice to change the trt, outcome and baseline args of the lmtp_sub/ipw/tmle/sdr estimators to allow quoted or non-quoted variable names to be passed through. If not possible, the help files should be changed so that it is clear that the column names need to be passed as character strings.

Pass multiple treatment policies

Is your feature request related to a problem? Please describe.
Currently, you must run estimators multiple times to test different treatment policies.

Describe the solution you'd like
It would be useful to be able to pass multiple shift functions, as a list, to the shift parameter for estimators.

redundant function as eif

lmtp/R/utils.R

Lines 89 to 95 in f1b98ec

transform_sdr <- function(r, tau, max, shifted, natural) {
  natural[is.na(natural)] <- -999
  shifted[is.na(shifted)] <- -999
  m <- shifted[, (tau + 2):(max + 1), drop = FALSE] - natural[, (tau + 1):max, drop = FALSE]
  out <- rowSums(r * m, na.rm = TRUE) + shifted[, tau + 1]
  return(out)
}

Fill in examples and add a vignette

As the package moves toward an initial release alongside the arXiv paper, comprehensive examples need to be added throughout. This should also include the addition of a vignette.

Change censoring parameter name

Via @hoffmakl, the requirement that the censoring nodes be set equal to 1 if observed and 0 if lost to follow-up (and not vice versa) is confusing, given that the parameter for these nodes is called cens.

I think the requirement that these nodes be 1 if observed should remain; however, to make this less confusing, it may be worthwhile to rename the cens argument to something like observed, which is more intuitive for the variable coding.
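Under the current convention, a censoring node is 1 while a unit remains observed and 0 once lost to follow-up. A hypothetical data-preparation snippet (the variable names here are invented) would code it as:

```r
# Hypothetical follow-up data: last_seen is the final time a unit was observed
df <- data.frame(id = 1:4, last_seen = c(3, 1, 3, 2))

# C_t = 1 if still observed at time t, 0 once lost to follow-up
for (t in 1:3) {
  df[[paste0("C_", t)]] <- as.integer(df$last_seen >= t)
}
df
```

Under the proposed rename, the same coding would read naturally: observed = 1 means the unit is still in the study.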

M-matrix is created using the non-transformed outcome

get_folded_data(cbind(matrix(
  nrow = nrow(data), ncol = length(nodes)
), data[[outcome]]),
folds)

This is the cause of the incorrect standard errors and CIs I noticed with continuous outcomes in SDR (issue #14)

Needs to be changed to

      self$m <-
        get_folded_data(cbind(matrix(
          nrow = nrow(data), ncol = length(nodes)
        ), scale_y_continuous(data[[outcome]],
                             y_bounds(data[[outcome]],
                                      outcome_type,
                                      bounds))),
        folds)

Rethink main parameter names

While parameter names like A and Y are commonplace and are what's used in the papers, using more informative names in the actual functions may increase usability.

Remove the nodes argument requirement

The requirement that the nodes argument is always supplied, even when there are no time varying covariates, presents a barrier to UX. The requirement exists because internally lmtp uses the length of this list to determine how many time points there are. This task could fall to the trt parameter, but this presents an issue in the case of a time-invariant survival analysis where there are multiple time points but the length of the trt argument would be 1.

I think the best solution would be to modify the outcome_type argument so as to accept a 3rd option, survival. When outcome_type is set to "binary" or "continuous" the number of time points will be derived from the trt parameter. If it is set to "survival", then the number of time points will be derived from the length of the vector supplied to cens. This would also open up a check that if the user sets outcome_type = "survival" that the outcome argument is supplied a vector.
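The proposed dispatch rule could look like this minimal sketch (a hypothetical helper written for illustration, not the package's code):

```r
# Hypothetical sketch of the proposed rule: with a "survival" outcome type
# the number of time points comes from the censoring nodes; otherwise it
# comes from the treatment nodes.
n_timepoints <- function(trt, cens, outcome_type) {
  if (outcome_type == "survival") length(cens) else length(trt)
}

# Time-invariant treatment, survival outcome: 3 time points from cens
n_timepoints(trt = "A", cens = c("C_1", "C_2", "C_3"), outcome_type = "survival")
#> [1] 3
```

This handles the time-invariant survival case described above, where trt has length 1 even though there are multiple time points.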

Criteria for initial release

A non-exhaustive to-do list:

  • Point treatment binary/continuous outcomes
  • Time varying binary/continuous outcomes
  • Censoring
  • Survival outcomes
  • Crossfit estimates
  • Contrasts
  • Dynamic stochastic regimes
  • Account for clustering?
  • >90% test coverage
  • Vignette covering typical use cases

Elaborate on error for lmtp_contrast

Currently the error for lmtp_contrast when using lmtp_sub or lmtp_ipw objects is that lmtp_contrast is not implemented for these estimators. It may be helpful to elaborate that this is purposeful, since we do not have any asymptotic linearity theory to support SEs from these estimators when ensemble learning is used. Also, maybe an explicit recommendation that "the authors of lmtp recommend using the tmle or sdr estimation methods for optimal results" would be good? Perhaps a reference would be helpful, too. Right now I can only think of Sherri Rose's Intro to TMLE paper as a good ref for an approachable explanation of the cons of IPW/G comp, but there's probably other good ones! https://academic.oup.com/aje/article/185/1/65/2662306

How does SDR handle censored outcome if transformation used in training relies on observed data?

lmtp/R/utils.R

Lines 95 to 101 in 7bb1aac

transform_sdr <- function(r, tau, max, shifted, natural) {
  natural[is.na(natural)] <- -999
  shifted[is.na(shifted)] <- -999
  m <- shifted[, (tau + 2):(max + 1), drop = FALSE] - natural[, (tau + 1):max, drop = FALSE]
  out <- rowSums(r * m, na.rm = TRUE) + shifted[, tau + 1]
  return(out)
}

Doesn't this mean that training is only ever conducted on those who always remained uncensored? @idiazst
