romainkp / stremr Goto Github PK

Streamlined Estimation for Static, Dynamic and Stochastic Treatment Regimes in Longitudinal Data

License: MIT License

R 99.84% TeX 0.16%

propensity-scores censoring-events ipw-msm grid-search tuning-parameters tmle survival machine-learning targeted-learning time-varying-confounding

stremr's Issues

ID cannot be a factor, all character variables will be ignored

Add a test that the ID variable provided by the user is NOT a factor within importData and return an error to the user otherwise.

Make it clear in the doc that the ID variable should be not be provided as a factor. Otherwise, the current implementation will create dummies for all study ID.

Make it clear in the doc that all character variables are ignored by stremr (i.e., they won't be converted to factors).

Replace IPAW with IPW

In the report and in the argument values of the make report routine

This is to be match the most common terminology used in the literature

IPW for Categorical Exposure with 4 levels

IPW_for_Categorical_Exposure_with_4_Levels.docx

fitPropensity has error when the exposure is categorical with more than 2 levels (‘max’ not meaningful for factors). Please see the example bellow

stremr version 0.8.99 and Installing stremr package from GitHub and loading the packages


# ----------------------------------------------------------------------
# Instal stremr Version 0.8.99 Data
# ----------------------------------------------------------------------
#knitr::opts_chunk$set(echo = TRUE)

library(devtools)
#install_github("osofr/stremr", ref = "experimental_master")

# ----------------------------------------------------------------------
# Get Libraries
# ----------------------------------------------------------------------
library(stremr)
library(data.table)
library(magrittr)
library(h2o)
options(stremr.verbose=TRUE)
sessionInfo()

R version 3.4.2 (2017-09-28)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] rmarkdown_1.8 repmis_0.5 sl3_1.0.0 h2o_3.16.0.2
[5] magrittr_1.5 data.table_1.10.5 devtools_1.13.4 haven_1.1.0
[9] stremr_0.8.99

Get Source Data from another Github repository

Read sampleAD.RData from Soudi00 GitHub repository

library(repmis)

source_data("https://github.com/Soudi00/Multi-Treatment-Causal-Modeling/blob/master/sampleAD.RData?raw=True")

AD = as.data.table(AD, key= c(ID,SEQ))

use importData to prepare the data to get porpensity scores, also define censoring and exposrure regressions

# ----------------------------------------------------------------------
# Import Data
# ----------------------------------------------------------------------
OData.2  <-  importData(AD, ID = "ID", t_name = "SEQ", 
                        covars = c("CAT_VAR1","CAT_VAR2","CONT_VAR1"),           
                        CENS = c("CNS","ADM_CNS"), 
                        TRT = "TRTN",
                        MONITOR = NULL, OUTCOME = "STATUS",
                        weights = NULL, remove_extra_rows = TRUE,
                        verbose = getOption("stremr.verbose"))

# ----------------------------------------------------------------------
# Look at the input data object
# ----------------------------------------------------------------------
print(OData.2)

# ----------------------------------------------------------------------
# Access the input data
# ----------------------------------------------------------------------
get_data(OData.2)

# ----------------------------------------------------------------------
# Regression formula for Right Censoring and Administrative
# Censoring and  Exposure
# ----------------------------------------------------------------------
gform_CENS <- "CNS + ADM_CNS ~ CAT_VAR1 + CONT_VAR1"
gform_TRT = "TRTN ~ CAT_VAR1 + CAT_VAR2 + CONT_VAR1"

#Error in fitPropensity (not meaningful for factors)

I tried different options in fitPropensity but none of them works for categorical with more than 2 levels.


# ----------------------------------------------------------------------
# Estimate Propensity Scores
# fitPRopensity score with all defult option has an error
# ----------------------------------------------------------------------

OData.2 <- fitPropensity(OData.2, gform_CENS = gform_CENS,ngform_TRT = gform_TRT )

Using the default regression formula: TRTN ~ CAT_VAR1_2+CAT_VAR1_3+CAT_VAR1_4+CAT_VAR1_6+CAT_VAR1_7+CAT_VAR2_2+CAT_VAR2_3+CAT_VAR2_4+CAT_VAR2_5+CONT_VAR1
[1] "New 'ModelBinomial' regression defined:"
[1] "P(CNS|CAT_VAR1_2, CAT_VAR1_3, CAT_VAR1_4, CAT_VAR1_6, CAT_VAR1_7, CONT_VAR1);\ outvar.class: binomial;\ Stratify: ;\ N: NA"
[1] "New 'ModelBinomial' regression defined:"
[1] "P(ADM_CNS|CNS, CAT_VAR1_2, CAT_VAR1_3, CAT_VAR1_4, CAT_VAR1_6, CAT_VAR1_7, CONT_VAR1);\ outvar.class: binomial;\ Stratify: CNS == 0;\ N: NA"
[1] "New 'ModelCategorical' regression defined:"
[1] "P(TRTN|CAT_VAR1_2, CAT_VAR1_3, CAT_VAR1_4, CAT_VAR1_6, CAT_VAR1_7, CAT_VAR2_2, CAT_VAR2_3, CAT_VAR2_4, CAT_VAR2_5, CONT_VAR1);\ outvar.class: categorical;\ Stratify: ;\ N: NA"
[1] "fitting the model: P(CNS|CAT_VAR1_2, CAT_VAR1_3, CAT_VAR1_4, CAT_VAR1_6, CAT_VAR1_7, CONT_VAR1);\ outvar.class: binomial;\ Stratify: ;\ N: NA"
[1] "fitting the model: P(ADM_CNS|CNS, CAT_VAR1_2, CAT_VAR1_3, CAT_VAR1_4, CAT_VAR1_6, CAT_VAR1_7, CONT_VAR1);\ outvar.class: binomial;\ Stratify: CNS == 0;\ N: NA"
[1] "fitting the model: P(TRTN|CAT_VAR1_2, CAT_VAR1_3, CAT_VAR1_4, CAT_VAR1_6, CAT_VAR1_7, CAT_VAR2_2, CAT_VAR2_3, CAT_VAR2_4, CAT_VAR2_5, CONT_VAR1);\ outvar.class: categorical;\ Stratify: ;\ N: NA"
Failed on Lrnr_condensier_c("equal.mass", "equal.len", "dhist")_5_20_FALSE_NA_FALSE_NULL
Error in Summary.factor(structure(c(3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, :
‘max’ not meaningful for factors

sl3 error debugging info:
[1] "Error in Summary.factor(structure(c(3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, : \n ‘max’ not meaningful for factors\n"
attr(,"class")
[1] "try-error"
attr(,"condition")
<simpleError in Summary.factor(structure(c(3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 1L, 1L, 2L, 2L, 1L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 3L, 3L, 3L, 1L, 1L, 3L, 3L, 3L, 1L, 3L, 3L, 3L, 1L, 3L, 1L, 1L, 3L, 1L, 3L, 1L, 3L, 1L, 1L, 1L, 1L, 3L, 3L, 3L, 3L, 1L, 1L, 3L, 3L, 3L, 1L, 1L, 3L, 1L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 2L, 2L, 3L, 1L, 1L, 1L, 2L, 2L, 3L, 4L, 2L, 2L, 2L, 2L, 3L, 1L, 2L, 3L, 3L, 1L, 1L, 1L, 2L, 2L, 3L, 4L, 3L, 2L, 3L, 1L, 2L, 3L, 2L, 3L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 1L, 2L, 2L, 3L, 2L, 2L, 2L, 3L, 1L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 3L, 1L, 1L, 2L, 1L, 3L, 3L, 3L, 1L, 1L, 3L, 3L, 3L, 1L, 3L, 3L, 3L, 3L, 1L, 3L, 1L, 1L, 3L, 1L, 3L, 3L, 3L, 1L, 2L, 1L, 1L, 2L, 1L, 3L, 4L, 2L, 2L, 2L, 2L, 2L, 4L, 2L, 2L, 4L, 4L, 4L, 2L, 2L, 2L, 4L, 2L, 4L, 4L, 4L, 2L, 2L, 2L, 4L, 4L, 4L, 2L, 4L, 2L, 2L, 4L, 2L, 4L, 4L, 4L, 2L, 2L, 2L, 4L, 4L, 2L, 2L, 2L), .Label = c("1", "2", "3", "4"), class = "factor"), na.rm = FALSE): 'max' not meaningful for factors>
...trying to run Lrnr_glm_fast as a backup...
Warning in Ops.factor(y, mu): '-' not meaningful for factors
Warning in Ops.factor(eta, offset): '-' not meaningful for factors
Warning in Ops.factor(y, mu): '-' not meaningful for factors
Warning in Ops.factor(y, mu): '-' not meaningful for factors
Warning in Ops.factor(weights, y): '*' not meaningful for factors
Warning in Ops.factor(y, mu): '-' not meaningful for factors
Warning in Ops.factor(y, mu): '-' not meaningful for factors
Error in private$PsAsW.models[[k_i]]$predictAeqa(newdata = newdata, n = n, : some of the modeling predictions resulted in NAs, which indicates an error in a prediction routine

tried modeling treatment with Gradient Boosting machines same error


# ----------------------------------------------------------------------
# Fitting treatment model with Gradient Boosting machines:
# ----------------------------------------------------------------------
require("h2o")
h2o::h2o.init(nthreads = -1)
gform_CENS <- "CNS + ADM_CNS ~ CAT_VAR1 + CONT_VAR1"
models_TRT <- sl3::Lrnr_h2o_grid$new(algorithm = "gbm")
OData.2 <- fitPropensity(OData.2, gform_CENS = gform_CENS,
                        gform_TRT = gform_TRT,
                        models_TRT = models_TRT)

# Use `H2O-3` distributed implementation of GLM for treatment model estimator:
models_TRT <- sl3::Lrnr_h2o_glm$new(family = "multinomial")
OData.2 <- fitPropensity(OData.2, gform_CENS = gform_CENS,
                        gform_TRT = gform_TRT,
                        models_TRT = models_TRT)

Time dependent propensity models

Different from stratified ps models: how do I specify different variable sets in g_form_TRT?

new make_report argument

Currently, a call to make_report_rmd leads to a browser or a pdf reader to open up automatically and display the report of analysis results. This is problematic when make_report_rmd is called several times from a larger program and one ends up with a cluttered desktop.

Is it possible to add an argument that prevents the pop-up of the display windows?

better docs for the output

Describe the output results table(s) for each function
Define each column and describe what it contains)

For example:

     est_name time   sum_Y_IPW    sum_IPW St.directIPW rule.name
 1: directIPW    0    9.828827   225.6710    0.9564462   TI.set1
 2: directIPW    1   14.841714   308.6067    0.9519073   TI.set1
 3: directIPW    2   21.627479   430.0012    0.9497037   TI.set1
 4: directIPW    3   21.627479   606.0012    0.9643112   TI.set1
 5: directIPW    4   43.571094   868.0025    0.9498030   TI.set1
 6: directIPW    5   61.760385  1241.4314    0.9502507   TI.set1
 7: directIPW    6   66.382274  1802.6454    0.9631751   TI.set1
 8: directIPW    7  116.322109  2638.1985    0.9559085   TI.set1
 9: directIPW    8  198.486284  3871.7562    0.9487348   TI.set1
10: directIPW    9  254.941362  5714.0214    0.9553832   TI.set1
11: directIPW   10  254.941362  8456.0006    0.9698508   TI.set1
12: directIPW   11  397.563386 12563.9928    0.9683569   TI.set1
13: directIPW   12 1225.200094 18784.3937    0.9347756   TI.set1
14: directIPW   13 1816.300699 27581.8941    0.9341488   TI.set1
15: directIPW   14 1816.300699 40628.9102    0.9552954   TI.set1
16: directIPW   15 4927.103677 60323.6409    0.9183222   TI.set1
17: directIPW   16 4927.103677 88105.7038    0.9440774   TI.set1

using stremr without monitoring

Is it possible to use stremr without a monitoring variable? There does not seem to be a default for MONITOR in defineIntervedTRT(), so do I have to construct a pseudo monitoring variable for the function to work?

Thanks

IC routine crash

when the MSM parameterization results in no event for a time interval under a given rule

contrast of joint (dx,bar{n}) interventions defined by same dx but different bar{n}

Example: UACR/TI 90days.

dx with x=7.5
bar{n}* = continous 1, 101010..., 1001001...

In this example, we are interested in an MSM for 3 rules.

survMSM however ignores the fact that each intervention is indexed by a particular monitoring intervention and defines model terms solely based on the various levels of treatment interventions. In this case, there is only one dx so survMSM gives:

MSM.IPAW$MSM.fit
$coef
Periods.0to0_new.d7.5 Periods.1to1_new.d7.5 Periods.2to2_new.d7.5 Periods.3to3_new.d7.5
-5.362056 -3.953137 -4.067354 -3.152000
Periods.4to4_new.d7.5 Periods.5to5_new.d7.5 Periods.6to6_new.d7.5 Periods.7to7_new.d7.5
-6.936838 -3.870630 -4.138810 -2.931822
Periods.8to8_new.d7.5 Periods.9to17_new.d7.5
-5.218133 -5.028533

$linkfun
[1] "logit_linkinv"

$fitfunname
[1] "h2o.glm"

I'll email template code that describes the problem in more details

SL as option for PS including screener

stratifyQ_by_rule

The documentation in fit_GCOMP() for stratifyQ_by_rule says,

Set to 'TRUE' for stratifying the fit of Q (the outcome model) by rule-followers only.

Does this mean that non-rule-followers are excluded at each time point from the hazard estimate? I don't see separate "stratified" estimates.

Thanks,

package dependencies

Is there documentation for all of the package dependencies of stremr?

Add percentiles of IP weights in report

Add 95th, 99th, and 99.9th percentiles and maximum value of IP weights in relevant section of report

Add CI and p value for MSM coefficients

stremr output

Can you help me understand the output of stremr? In the optimal dynamic treatment rule example, in the output table IPW_estimates, there are the variables: sum_Y_IPAW, sum_all_IPAW, ht, St.IPTW, ht.KM. Can you help me understand how to interpret each of these variables?

factor with 1 level only

When a factor variable can take on only one value, importData creates a dummy of class "list" which cannot be handled by subsequent stremr routines (Error message is produced).

Stremr should catch the problem, print a warning and create a single dummy variable of class "integer".

allow stratified modeling with Q learning

All the functionality is already there, except that extraction of the prediction results requires writing custom access functions. Specifically we need to extract the final predictions for each strata of Q prediction and combine them into N predictions.

Add truncation level and % truncated in report

Allow input of counterfactual (intervention) probabilities for TRT and MONITOR

Current set-up only permits the input of probabilities/indicators of observation following an intervention-specific treatment and/or monitoring rule (arguments gstar.TRT and gstar.MONITOR).

In some cases it might be more intuitive to have as input the counterfactual values of treatment and monitoring based on some intervention rule (the current functionality of ltmle and tmlenet). For dynamic rules, these would be evaluated for each observation and could also be specified as probabilities of TRT[t]=1 and MONITOR[t]=1, under some intervention rule. In this case, the rule followers/non-followers (gstar.TRT & gstar.N) would be evaluated automatically by comparing the observed TRT/MONITOR to their corresponding counterfactual values/probabilities.

automated reports 3

How to draw Doubly robust adjusted Kaplan Meier curve and estimate log-rank test?

I inted to submit paper about observational study in urological field.
I think TMLE is double estimation. If that is right, I could not understand the meaning of the result from TMLE.
Please, tell me, the script for drawing the Doubly robust adjusted Kaplan Meier curve (if possible 95% CI) and estimate log-rank test.

Problem with CVTMLE

Hi, thanks a lot for this wonderful package. I wanted to run CVTMLE, and I am getting this error:
Error in UseMethod("predict_SL") :
no applicable method for 'predict_SL' applied to an object of class "c('Lrnr_sl', 'Lrnr_base', 'R6')"

I guess it is something to do with gridsl::predict_SL.
My function:
params <- gridisl::defModel(estimator = "speedglm__glm")
IPW <- getIPWeights(Odata, intervened_TRT = "Intervention_1", holdout = TRUE)
qwigh <- quantile(IPW[cum.IPAW>0,][["cum.IPAW"]], c(0.99))
gcomp_est <- fit_GCOMP(Odata, tvals = tvals,
TMLE = TRUE,
CVTMLE = TRUE,
Qforms = Qforms,
intervened_TRT = "Intervention_1"
models = params,
stratifyQ_by_rule = FALSE,
stratify_by_last = TRUE,
trunc_weights = qwigh,
fit_method = "cv",
byfold_Q = FALSE,
IPWeights = IPW)

Regards!

UACR/TI 90days - continuous N=1 - error

The code below leads to this error:

Error in runglmMSM(OData, wts_data, all_dummies, Ynode, verbose) :
trying to get slot "model" from an object (class "try-error") that is not an S4 object

Note that the function g.Nstatic10001 below implies that the monitoring intervention is 11111 when number.0.between.1s <- 0

# ESTIMATE SURVIVAL FOR ONE TREATMENT REGIMEN WITH STATIC (1,0,0,0,1,0,...) INTERVENTIONS ON MONITORING REGIME
# get the likelihood for the following static g^* N(t): 1,0,0,0,1,0,0,0,1,0
g.Nstatic10001 <- function(){
  function(OdataDT, gn.N = "g0.N", SHIFTED.OUTCOME = "outcome.tplus1", ID = "ID", MONITOR = "N", t = "t", ...){
    ID.expression <- as.name(ID)
    Odata_sel <- OdataDT[, c(ID, MONITOR, t, SHIFTED.OUTCOME, gn.N), with = FALSE]
    number.0.between.1s <- 0 ## every k weeks means k-1 0's between 1's and this shoudl be set to k-1
    N.star.t <- as.integer((Odata_sel[[t]]%%(number.0.between.1s+1))%in%number.0.between.1s) 
    Odata_sel[, g.N := as.numeric(as.integer(Odata_sel[[MONITOR]]) == N.star.t)] 
    return(Odata_sel[["g.N"]])
  }
}
# Define N(t) rule followers under static N.g.star: (1,0,1,0) and a column to the observed data.table in OData:
g.star.N.follow <- g.Nstatic10001()
OData$dat.sVar[, N.star.stat10001 := g.star.N.follow(OData$dat.sVar, gn.N = "g0.N", SHIFTED.OUTCOME = "outcome.tplus1", ID = "StudyID", MONITOR = "N", t = "intnum")]

#1. Obtain weighted data sets by rule:
wts.St.d7 <- getIPWeights(OData, gstar_TRT = "new.d7", gstar_MONITOR = "N.star.stat10001")
wts.St.d7.5 <- getIPWeights(OData, gstar_TRT = "new.d7.5", gstar_MONITOR = "N.star.stat10001")
wts.St.d8 <- getIPWeights(OData, gstar_TRT = "new.d8", gstar_MONITOR = "N.star.stat10001")
wts.St.d8.5 <- getIPWeights(OData, gstar_TRT = "new.d8.5", gstar_MONITOR = "N.star.stat10001")
wts.all <- list(d7 = wts.St.d7, d7.5 = wts.St.d7.5, d8 = wts.St.d8, d8.5 = wts.St.d8.5)
print(object.size(wts.all), units = "MB")
wts.all <- rbindlist(wts.all)
wts.all <- wts.all[!is.na(cumm.IPAW) & !is.na(outcome.tplus1) & (cumm.IPAW > 0), ]
print(object.size(wts.all), units = "MB")

data.table::setthreads(20) # will help load data into h2o
# MSM for hazard with regular weights:
t.breaks.byquarter <- c(1:9)-1 ## need to change or will get error due to bins with no event when computing IC - need to add as bug on github
MSM.IPAW <- survMSM(OData, wts_data = wts.all, t_breaks = t.breaks.byquarter, use_weights = TRUE, est_name = "IPAW", getSEs = getSEs)
# MSM for hazard with truncated weights:
MSM.trunc <- survMSM(OData, wts_data = wts.all, t_breaks = t.breaks.byquarter, use_weights = TRUE, trunc_weights = 20, est_name = "IPAWtrunc", getSEs = getSEs)
# crude MSM for hazard without any weights:
MSM.crude <- survMSM(OData, wts_data = wts.all, t_breaks = t.breaks.byquarter, use_weights = FALSE, est_name = "crude", getSEs = getSEs)
# save(list = c("MSM.IPAW", "MSM.trunc", "MSM.crude"), file = "./MSMs.Rdata")

bug in defineMONITORvars

    DT <- data.table(data, key = c("ID", "t"))

I think "ID" and "t" should be replaced with IDvar and tvar

Add p value for the test of a difference between areas under any two curves

automated reports 2

inconsistency between doc and implementation for getIPWeights

Documentation of getIPWeights states:

"The output is person-specific data with evaluated weights, ‘wts.DT’, only observation-times with non-zero weight are kept"

The returned object can actually contain person-time obs with cum.IPAW = 0

Better docs of the function output

The results tables provided by estimating functions are poorly documented.

This could be used as a start: https://github.com/osofr/stremr/#simulated-data-example

Bug in `QlearnModel` when ML-based regression fails

When doing Q learning, especially with stratification, a single algorithm might fail. In this case a fall back fitting either with glm or speedglm should be performed. For some reason this doesn't happen in this case (./tests/examples/2_building_blocks_example.R):

The following error is produced:

unable to run randomForest with h2o for: intercept only models or designmat with zero rows or  constant outcome (y) ...
Error in UseMethod("predictP1") : 
  no applicable method for 'predictP1' applied to an object of class "try-error"
In addition: Warning messages:

The code from ./tests/examples/2_building_blocks_example.R:

data(OdataNoCENS)
OdataDT <- as.data.table(OdataNoCENS, key=c(ID, t))
OdataDT[, ("N.tminus1") := shift(get("N"), n = 1L, type = "lag", fill = 1L), by = ID]
OdataDT[, ("TI.tminus1") := shift(get("TI"), n = 1L, type = "lag", fill = 1L), by = ID]
OdataDT[, ("TI.set1") := 1L]
OdataDT[, ("TI.set0") := 0L]
OData <- importData(OdataDT, ID = "ID", t = "t", covars = c("highA1c", "lastNat1", "N.tminus1"),
                    CENS = "C", TRT = "TI", MONITOR = "N", OUTCOME = "Y.tplus1")
gform_CENS <- "C ~ highA1c + lastNat1"
gform_TRT = "TI ~ CVD + highA1c + N.tminus1"
gform_MONITOR <- "N ~ 1"
stratify_CENS <- list(C=c("t < 16", "t == 16"))

require("h2o")
h2o::h2o.init(nthreads = -1)

params_TRT = list(fit.package = "h2o", fit.algorithm = "gbm", ntrees = 50,
    learn_rate = 0.05, sample_rate = 0.8, col_sample_rate = 0.8,
    balance_classes = TRUE)
params_CENS = list(fit.package = "speedglm", fit.algorithm = "glm")
params_MONITOR = list(fit.package = "speedglm", fit.algorithm = "glm")
OData <- fitPropensity(OData,
            gform_CENS = gform_CENS, stratify_CENS = stratify_CENS, params_CENS = params_CENS,
            gform_TRT = gform_TRT, params_TRT = params_TRT,
            gform_MONITOR = gform_MONITOR, params_MONITOR = params_MONITOR)

t.surv <- c(0:5)
Qforms <- rep.int("Q.kplus1 ~ CVD + highA1c + N + lastNat1 + TI + TI.tminus1", (max(t.surv)+1))
params_Q = list(fit.package = "h2o", fit.algorithm = "randomForest",
                ntrees = 100, learn_rate = 0.05, sample_rate = 0.8,
                col_sample_rate = 0.8, balance_classes = TRUE)
tmle_est <- fitTMLE(OData, t_periods = t.surv, intervened_TRT = "TI.set1",
            Qforms = Qforms, params_Q = params_Q,
            stratifyQ_by_rule = TRUE)

Defining counterfactual dynamic treatment nodes with multiple dummy exposure

Multivaraite_TRT__4_dummy_exposure_.docx

Install stremr Package Version: 0.8.99 and get Libraries


# ----------------------------------------------------------------------
# Instal stremr Version 0.8.99 Data
# ----------------------------------------------------------------------
#knitr::opts_chunk$set(echo = TRUE)

library(devtools)
#install_github("osofr/stremr", ref = "experimental_master")

# ----------------------------------------------------------------------
# Instal stremr Version 0.8.99 Data
# ----------------------------------------------------------------------
library(stremr)
library(data.table)
library(magrittr)
library(h2o)
options(stremr.verbose=TRUE)
sessionInfo()

Get Source Data from another Github Repository

library(repmis)

source_data("https://github.com/Soudi00/Multi-Treatment-Causal-Modeling/blob/master/sampleAD.RData?raw=True")

AD = as.data.table(AD, key= c(ID,SEQ))

#I have 4 different treatment , should I only use 3 of the dummies in the importData or I should include all of them?


# ----------------------------------------------------------------------
# Define intervention (always on TRT1):
# ----------------------------------------------------------------------
AD[, ("zero.set1") := 0L]
AD[, ("zero.set2") := 0L]
AD[, ("zero.set3") := 0L]
AD[, ("TRT1.set") := 1L]

# ----------------------------------------------------------------------
# Import Data in to stremr object
# ----------------------------------------------------------------------
OData.1  <-  importData(AD, ID = "ID", t_name = "SEQ", 
                        covars = c("CAT_VAR1","CAT_VAR2","CONT_VAR1"),           
                        CENS = c("CNS","ADM_CNS"), 
                        TRT = c("TRT1","TRT2","TRT3","TRT4"),
                        MONITOR = NULL, OUTCOME = "STATUS",
                        weights = NULL, remove_extra_rows = TRUE,
                        verbose = getOption("stremr.verbose"))

# ----------------------------------------------------------------------
# Look at the input data object
# ----------------------------------------------------------------------
print(OData.1)

# ----------------------------------------------------------------------
# Access the input data
# ----------------------------------------------------------------------
get_data(OData.1)

# ----------------------------------------------------------------------
# Model the Right Censroing and Adminstrative Censoring and Exposure
# ----------------------------------------------------------------------
gform_CENS <- "CNS + ADM_CNS ~ CAT_VAR1 + CONT_VAR1"
gform_TRT = "TRT1+TRT2+TRT3+TRT4 ~ CAT_VAR1 + CAT_VAR2 + CONT_VAR1"

# ----------------------------------------------------------------------
# Fit Propensity Scores
# ----------------------------------------------------------------------

OData.1 <- fitPropensity(OData.1, gform_CENS = gform_CENS,ngform_TRT = gform_TRT )

What should be the dimension of the intervened_TRT when we are using multiple dummy treatment

I have my own defined dynamic treatment patterns of interest (5 dummy variables for the 5 patterns). That is:

Always TRT1 (PATH1)
Always TRT2 (PATH2)
Always TRT3 (PATH3)
Start TRT1, switch at any time to TRT3 (PATH4)
Start TRT2, switch at any time to TRT3 (PATH5)

# ----------------------------------------------------------------------
#  Error: length(intervened_NODE) not equal to length(NodeNames)
# ----------------------------------------------------------------------

wts.DT.1 <- getIPWeights(OData = OData.1, intervened_TRT="PATH1")

# ----------------------------------------------------------------------
# Error in modelfit.g$getPsAsW.models()[[i]] : subscript out of bounds
# ----------------------------------------------------------------------

wts.DT.1 <- getIPWeights(OData = OData.1, intervened_TRT=c("TRT1.set","zero.set1","zero.set2","zero.set3"))

# ----------------------------------------------------------------------
# useing diffrent intervened_TRT didnt make a diffrence in the result
# ----------------------------------------------------------------------

wts.DT.1 <- getIPWeights(OData = OData.1, useonly_t_TRT="PATH1==1",rule_name ="Only TRT1")
wts.DT.1

wts.DT.2 <- getIPWeights(OData = OData.1, useonly_t_TRT="PATH2==1", rule_name = "Only TRT2")
wts.DT.2

older version of stremr and new version of sl3 are not compatible

:when calling fitPropensity:

Version 0.8.99 of stremr when the treatment is continuous give this error:
Error: 'Lrnr_condensier' is not an exported object from 'namespace:sl3'

When I install version 1 or 2, then even the bionary treatment does not work. All of the cases give me this error:
Error: outvar is not a character vector

my project is on hold until I can solve this problem. Your help is appreciated. Best Regards

column name of h2o table overlap in a report stored in pdf format

pdf_format_h2o_issue.docx

get_MSMRDs with a single t0

RDtables.t0.1.2year <- get_MSM_RDs(MSM.IPAW, c(3,7), getSEs = TRUE) ## BUG
RDtables.t0.1.year <- get_MSM_RDs(MSM.IPAW, c(3), getSEs = TRUE) ## BUG

Error in se.RDscale.Sdt.K[d1.idx, d2.idx] <- getSE.RD.d1.minus.d2(nID = nID, :
replacement has length zero

Note that when there is only a single t0 specified in the argument (value 3), there is an error while the routine runs without a problem with 2+ t0's.

Fit treatment propensity based only on initial time point

Hello,

I'm trying to conduct an intent-to-treat analysis and am having difficulty fitting the treatment propensity score using only t == 0 (baseline) and then using that estimated baseline treatment propensity across all time points with time-varying censoring.

It looks like when I stratify and only calculate it on t == 0, the treatment propensity score for all remaining time points (1 to 170) may be set to 1 by default? Any recs or hints on how to apply the t = 0 propensity score to all future time points? Censoring on the other hand I do want to be estimated at each time point (or in a pooled manner). Also just fyi at t == 0, 8.7% of the sample is treated.

This is using scrambled data from Romain and attempting to replicate Marcus, J. L., Neugebauer, R. S., Leyden, W. A., Chao, C. R., Xu, L., Quesenberry Jr, C. P., ... & Silverberg, M. J. (2016). Use of abacavir and risk of cardiovascular disease among HIV-infected individuals. JAIDS Journal of Acquired Immune Deficiency Syndromes, 71(4), 413-419.

Here is my rough code:

vars = list(
  id = "ID",
  time = "intnum",
  treatment = "exposure",
  censoring = "censor",
  outcome = "outcome",
  covars_time_varying =
    c("cd4", "I.cd4", "vl", "I.vl", "egfr.low", "I.egfr.low", "diabetes", "htn.meds",
      "llt.meds")
  # covars_baseline will be defined after factors_to_indicators conversion.
)

odata =
  importData(dt,
             ID = vars$id,
             t = vars$time,
             covars = c(vars$covars_baseline, vars$covars_time_varying),
             CENS = vars$censoring,
             TRT = vars$treatment,
             OUTCOME = vars$outcome)

# Try simpler propensity score specification to get a better distribution.
(gform_TRT = paste(vars$treatment, "~", paste(c("eversmoke", "everdrug", "sex_m", "art.naive"), collapse = " + ")))

# For ITT parameter only estimate treatment propensity at baseline.
# (uses rlang::list2 and !! so that we can substitute the treatment variable into the list name)
(stratify_TRT = list2(!!vars$treatment := paste0(vars$time, " == 0L")))

(gform_CENS = paste(vars$censoring, "~", paste(c(vars$covariates), collapse = " + ")))

odata = fitPropensity(odata,
                      gform_TRT = gform_TRT,
                      gform_CENS = gform_CENS,
                      stratify_TRT = stratify_TRT)

# Excessive right skewing: 1st quartile to max are all 1.0
# Min is 6.7%.
# Perhaps because we need to examine only t = 0?
summary(odata$g_preds$g0.A)
qplot(odata$g_preds$g0.A)

Thank you and happy to provide any more context if it would be helpful. Sorry if this is an easy fix.

automatic report sample

sequential randomization

The trial is a sequentially randomized trial with possible re-randomization based on previous response. At baseline, participants are randomized 1:1:1 into three arms, texting intervention, mobile app intervention, and standard of care. At 3 month increments participants in the text and app arms who do not "improve" their risk behaviors are eligible for re-randomization into a third intervention, e-coaching.

Because it's a randomized trial, at each time I know the probability of receiving each treatment given the past. I don't necessarily have to input the known probabilities, but I can't figure out how to properly stratify the fitPropensity command and can't digest what the error messages are actually telling me.

Attached is an example data table.

ID = partic. id
covariate = a binary baseline covariate
Y = the outcome at month 12
txt = on the texting intervention
app = on the app intervention
t = time (0 = baseline, 3, 6, 9, 12 months)
monitor = whether they are seen at t + 3
risk = risk behavior score, if risk at t+3 is bigger than or equal to risk at t (and risk at t != 0) and -
txt == 1 | app == 1, then they are eligible at t+3 for rerandomizaiton to e-coaching. Y is the sum of risk at 3, 6, 9, and 12.
trt = multinomial treatment indicator 1 = txt, 2 = app
Y_scale = Y scaled by maximum value of Y
trt.setXX = treatment levels for rules I'm interested in (no int, all txt int, all app int, txt + e-coaching when risk doesn't improve, app + e-coaching when risk doesn't improve)
eligible = whether or not someone is eligible for re-randomization at t
ecoach = on the e-coach intervention
coach.tminus1 = on e-coach intervention at previous time point
cens = censoring = 0 for everyone, since it seemed like providing monitoring status made stremr happier than censoring + missing values

Here's what I've tried:

gform_MONITOR <- "monitor ~ risk + covariate"
# would perhaps prefer this to be intercept only, but ~1 syntax seems to piss everything off
gform_TRT <- "trt ~ risk"
stratify_TRT <- list(trt = c(
  "t == 0", # baseline randomization prob.
  "t == 3 & eligible == 1", # at time 3 prob. of getting e-coaching amongst those eligible (should be 1/2)
  "t == 6 & eligible == 1", # at time 6 prob. of getting e-coaching amongst those eligible (should be 1/2)
  "t == 9 & eligible == 1", # at time 9, etc... 
  "t == 3 & txt == 1 & eligible == 0", # at time 3, txt folks should stay on txt with prob. 1 if not eligible for e-coach
  "t == 6 & txt == 1 & eligible == 0", # at time 6, txt folks should stay on txt with prob. 1 if not eligible for e-coach
  "t == 9 & txt == 1 & eligible == 0", # etc...
  "t == 3 & app == 1 & eligible == 0", # ditto app people who are not eligible
  "t == 6 & app == 1 & eligible == 0", # etc 
  "t == 9 & app == 1 & eligible == 0", # etc
  "t == 3 & app == 0 & txt == 0", # ditto standard of care people, i.e. trt == 0
  "t == 6 & app == 0 & txt == 0", # etc
  "t == 9 & app == 0 & txt == 0", # etc
  "t == 6 & ecoach.tminus1 == 1",  # once you get to e-coaching, you stay on e-coaching with prob. 1
  "t == 9 & ecoach.tminus1 == 1"))

I'm just wondering how to properly specify these regressions and whether there is a way to make them intercept only or not.

testDT.RData.zip

add discreteSL example to readme.md

Return a warning when g^*=NA

Sometimes the user may set the intervention node (g^*) to NA. If this value is actually being applied, then we should produce a warning, since the resulting weights will be also NA.

The best place to catch it is probably here:
https://github.com/osofr/stremr/blob/master/R/DeterministicBinaryOutcomeModel.R#L29-L44

    fit = function(overwrite = FALSE, data, ...) { # Move overwrite to a field? ... self$overwrite
      self$n <- data$nobs
      self$define.subset.idx(data)
      private$probA1 <- data$get.outvar(TRUE, self$gstar.Name)
      # private$.isNA.probA1 <- is.na(private$probA1)
      # self$subset_idx <- rep.int(TRUE, self$n)
      self$subset_idx <- seq_len(self$n)
      private$.outvar <- data$get.outvar(TRUE, self$getoutvarnm) # Always a vector of 0/1
      # private$.isNA.outvar <- is.na(private$.outvar)
      self$is.fitted <- TRUE
      # **********************************************************************
      # to save RAM space when doing many stacked regressions wipe out all internal data:
      # self$wipe.alldat
      # **********************************************************************
      invisible(self)
    },

Or here (private$probA1 is the g^*):
https://github.com/osofr/stremr/blob/master/R/DeterministicBinaryOutcomeModel.R#L53-L67

  predictAeqa = function(newdata, ...) { # P(A^s[i]=a^s|W^s=w^s) - calculating the likelihood for indA[i] (n vector of a`s)
      assert_that(self$is.fitted)
      if (missing(newdata)) {
        indA <- self$getoutvarval
      } else {
        indA <- newdata$get.outvar(self$getsubset, self$getoutvarnm) # Always a vector of 0/1
      }
      assert_that(is.integerish(indA)) # check that observed exposure is always a vector of integers
      probAeqa <- rep.int(1L, self$n) # for missing values, the likelihood is always set to P(A = a) = 1.
      # probA1 <- private$probA1[self$getsubset]
      probA1 <- private$probA1
      probAeqa[self$getsubset] <- probA1^(indA) * (1 - probA1)^(1L - indA)
      self$wipe.alldat # to save RAM space when doing many stacked regressions wipe out all internal data:
      return(probAeqa)
    },
`

Allowing data with no censoring and/or no monitoring and/or no exposure

When either one of the arguments CENS, TRT,MONITOR` is not specified, do not to intervene on it and do not attempt to fit the corresponding model for censoring, treatment or monitoring. Currently will return an error if either is not specified.

option : return_wts = TRUE dose not work for survNPMSM and only return results when used in directIPW

Please see the example from the documentation of the functions bellow:

the return_wts option dose not return value when survNPMSM function is called.

----------------------------------------------------------------------

Simulated Data

----------------------------------------------------------------------

data(OdataNoCENS)
OdataDT <- as.data.table(OdataNoCENS, key=c(ID, t))

define lagged N, first value is always 1 (always monitored at the first time point):

OdataDT[, ("N.tminus1") := shift(get("N"), n = 1L, type = "lag", fill = 1L), by = ID]
OdataDT[, ("TI.tminus1") := shift(get("TI"), n = 1L, type = "lag", fill = 1L), by = ID]

----------------------------------------------------------------------

Define intervention (always treated):

----------------------------------------------------------------------

OdataDT[, ("TI.set1") := 1L]
OdataDT[, ("TI.set0") := 0L]

----------------------------------------------------------------------

Import Data

----------------------------------------------------------------------

OData <- importData(OdataDT, ID = "ID", t = "t", covars = c("highA1c", "lastNat1", "N.tminus1"),
CENS = "C", TRT = "TI", MONITOR = "N", OUTCOME = "Y.tplus1")

----------------------------------------------------------------------

Look at the input data object

----------------------------------------------------------------------

print(OData)

----------------------------------------------------------------------

Access the input data

----------------------------------------------------------------------

get_data(OData)

----------------------------------------------------------------------

Model the Propensity Scores

----------------------------------------------------------------------

gform_CENS <- "C ~ highA1c + lastNat1"
gform_TRT = "TI ~ CVD + highA1c + N.tminus1"
gform_MONITOR <- "N ~ 1"
stratify_CENS <- list(C=c("t < 16", "t == 16"))

----------------------------------------------------------------------

Fit Propensity Scores

----------------------------------------------------------------------

OData <- fitPropensity(OData, gform_CENS = gform_CENS,
gform_TRT = gform_TRT,
gform_MONITOR = gform_MONITOR,
stratify_CENS = stratify_CENS)

----------------------------------------------------------------------

IPW Ajusted KM or Saturated MSM

----------------------------------------------------------------------

require("magrittr")
AKME.St.1 <- getIPWeights(OData, intervened_TRT = "TI.set1") %>%
survNPMSM(OData, return_wts = TRUE) %$%
estimates
AKME.St.1$wts_data

----------------------------------------------------------------------

Bounded IPW

----------------------------------------------------------------------

IPW.St.1 <- getIPWeights(OData, intervened_TRT = "TI.set1") %>%
directIPW(OData, return_wts = TRUE)
IPW.St.1$wts_data

add option for isotonic regression on cumulative risks estimates over time

See stats::isoreg(y = y, x = ...)

add xgboost and h2oEnsemble as learners

xgboost
ensemble

Add option for observation weights

Hello,

I mentioned this in person but thought I'd start a quick feature request issue to support observation weights. I believe Mark is on board with simply passing the weights through to the Q & g estimation as well as the fluctuation. For the variance of the IC he suggested that the weights be normalized to sum to 1 (possibly also applied to the Q & g estimation as well, not sure of the specifics).

Personally I could use this for a study where we have 30 million observations but only a few categorical covariates, so we can aggregate the replicated observations and incorporate observation weights to drastically speed up the superlearning & reduce memory usage.

Susan's tmle package doesn't support observation weights but ltmle does, so I'm using that right now. Mark suggested that observation weights would generally be important to support because they can be used to solve many different problems.

Thanks,
Chris

getIPWeights - need to add a check that user has specified PS model for all nodes on which we intervene

Say user specifies stratified PS models for A (resp. C or N) nodes such that the union of the person-time observations on A that fall in each stratum is not equal to the total number of person-time observations in the dataset.

Say that the user then asks stremr to compute an effect estimate for interventions on a subset of the A nodes (resp. C or N) such some elements of the subsets of person-time observations on A do not fall into any of the strata specified for gA.

Stremr should then return an error message to warn the user than they did not specify a sufficient number of strata for gA. Currently, stremr will return an output which will be incorrect.

Idea to fix this: add an argument check in getIPWeights that checks that the A (resp. C and N) person-time observations on which the user wish to intervene are all contained in the union of the person-time observations on A that fall within each stratum specified for gA. If not, stremr should give an error that asks the user to modify/add how they specified the strata for the PS model for gA.

Error when running example code

Seems some definitions are missing in stratify_ if when following your example code.

> OData <- fitPropensity(OData, gform_CENS = gform_CENS, gform_TRT = gform_TRT, gform_MONITOR = gform_MONITOR, stratify_CENS = stratify_CENS)
Error in create_subset_expr(outvars = res$outvars, stratify.EXPRS = stratify.EXPRS) : 
  Could not locate the appropriate regression variable(s) within the supplied stratification list stratify_CENS, stratify_TRT or stratify_MONITOR.
The regression outcome variable(s) specified in gform_CENS, gform_TRT or gform_MONITOR were: ( 'C,TI,N' )
However, the item names in the matching stratification list were: ( 'C' )

romainkp / stremr Goto Github PK

stremr's Issues

fitPropensity has error when the exposure is categorical with more than 2 levels (‘max’ not meaningful for factors). Please see the example bellow

Get Source Data from another Github repository

tried modeling treatment with Gradient Boosting machines same error

Install stremr Package Version: 0.8.99 and get Libraries

Get Source Data from another Github Repository

What should be the dimension of the intervened_TRT when we are using multiple dummy treatment

----------------------------------------------------------------------

Simulated Data

----------------------------------------------------------------------

define lagged N, first value is always 1 (always monitored at the first time point):

----------------------------------------------------------------------

Define intervention (always treated):

----------------------------------------------------------------------

----------------------------------------------------------------------

Import Data

----------------------------------------------------------------------

----------------------------------------------------------------------

Look at the input data object

----------------------------------------------------------------------

----------------------------------------------------------------------

Access the input data

----------------------------------------------------------------------

----------------------------------------------------------------------

Model the Propensity Scores

----------------------------------------------------------------------

----------------------------------------------------------------------

Fit Propensity Scores

----------------------------------------------------------------------

----------------------------------------------------------------------

IPW Ajusted KM or Saturated MSM

----------------------------------------------------------------------

----------------------------------------------------------------------

Bounded IPW

----------------------------------------------------------------------

Recommend Projects

Recommend Topics

Recommend Org

Jobs