
smartdata-analysis-and-statistics / precmed

Stars: 4, Watchers: 3, Forks: 0, Size: 46.18 MB

A doubly robust precision medicine approach to estimate and validate conditional average treatment effects

Home Page: https://smartdata-analysis-and-statistics.github.io/precmed/

License: Apache License 2.0

Languages: R 99.67%, TeX 0.33%
Topics: precision-medicine, cran, r, r-package

precmed's Introduction

precmed


Overview

precmed was developed to help researchers implement precision medicine in R. A key objective of precision medicine is to determine the optimal treatment separately for each patient instead of applying a common treatment to all patients. Personalizing treatment decisions becomes particularly relevant when treatment response differs across patients, or when patients have different preferences about benefits and harms. This package offers statistical methods to develop and validate prediction models for estimating individualized treatment effects. These treatment effects are also known as conditional average treatment effects (CATEs) and describe how different subgroups of patients respond to the same treatment. Presently, precmed focuses on personalizing the choice between two competing treatments, using either randomized data from a clinical trial (Zhao et al. 2013) or real-world data (RWD) from a non-randomized study (Yadlowsky et al. 2020).

Installation

The precmed package can be installed from CRAN as follows:

install.packages("precmed")

The latest version can be installed from GitHub as follows:

install.packages("devtools")
devtools::install_github(repo = "smartdata-analysis-and-statistics/precmed")

Package capabilities

The main functions in the precmed package are:

  • catefit(): estimation of the conditional average treatment effect (CATE)
  • atefit(): doubly robust estimator for the average treatment effect (ATE)
  • catecv(): development and cross-validation of the CATE
  • abc(): area between the average treatment difference curves of competing models for the CATE (Zhao et al. 2013)
  • plot(): two side-by-side line plots of validation curves from the precmed object
  • boxplot(): proportion of subjects with an estimated treatment effect no less than $c$, over a range of values for $c$ (Zhao et al. 2013)

For more info: https://smartdata-analysis-and-statistics.github.io/precmed/

Recommended workflow

We recommend the following workflow to develop a model for estimating the CATE in order to identify treatment effect heterogeneity:

  1. Compare up to five modelling approaches (e.g., Poisson regression, boosting) for estimating the CATE using cross-validation with catecv().
  2. Select the best modelling approach using three criteria:
    • Compare the steepness of the validation curves in the validation samples across methods using plot(). Two side-by-side plots are generated, visualizing the estimated average treatment effects in a series of nested subgroups: the left plot shows the curves for the training set and the right plot shows them for the validation set. Each line in the plots represents one scoring method (e.g., boosting, randomForest) specified in the argument score.method.
    • The area between curves (ABC), computed with abc(), quantifies a model's ability to capture treatment effect heterogeneity. Higher ABC values are preferable, as they indicate that the scoring method captures more treatment effect heterogeneity.
    • Compare the distribution of the estimated ATE across different levels of the CATE score percentiles using boxplot().
  3. Apply the best modelling approach to the original data or to a new external dataset using catefit().
  4. Optional: use atefit() to estimate the ATE between two treatment groups with a doubly robust estimator, and estimate the variability of the ATE with a bootstrap approach.
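The ABC criterion in step 2 can be illustrated with a small base-R sketch. For illustration only, it assumes the ABC is the area between two things: the curve of subgroup ATEs, plotted against the proportion of patients in each nested subgroup, and a horizontal reference line at the overall ATE. The area is approximated with the trapezoidal rule. The helper name `abc_trapezoid()` and all numbers are hypothetical. The package's abc() may use a different reference line, scale or sign convention.

```r
# Hypothetical helper: area between a validation curve and a horizontal
# reference line, approximated with the trapezoidal rule.
abc_trapezoid <- function(prop, ate_subgroup, ate_overall) {
  stopifnot(length(prop) == length(ate_subgroup), !is.unsorted(prop))
  gap <- ate_subgroup - ate_overall            # vertical distance to the line
  sum(diff(prop) * (head(gap, -1) + tail(gap, -1)) / 2)
}

# Illustrative numbers only: ATE (e.g. a log rate ratio) in nested subgroups
# containing the top 20%, 40%, ..., 100% of patients ranked by CATE score.
prop <- c(0.2, 0.4, 0.6, 0.8, 1.0)
ate  <- c(-0.9, -0.7, -0.55, -0.45, -0.4)
abc_trapezoid(prop, ate, ate_overall = ate[length(ate)])
```

In this toy example lower outcomes are better, so a more heterogeneous curve gives a larger negative area. Flipping the sign so that larger values mean "more heterogeneity captured" is also an assumption of this sketch.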

In the vignettes, we adopt a different workflow that gradually introduces the user to methods of increasing complexity.

User input

When applying catefit() or catecv(), the user must specify at least the following arguments:

  • response: type of outcome/response (either count or survival)
  • data: a data frame with individual patient data
  • score.method: methods to estimate the CATE (e.g., boosting, poisson, twoReg, contrastReg)
  • cate.model: a formula describing the outcome model (e.g., outcome ~ age + gender + previous_treatment)
  • ps.model: a formula describing the propensity score model to adjust for confounding (e.g., treatment ~ age + previous_treatment)
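The two model formulas are ordinary R formulas. Before they are passed to catefit() or catecv() together with response, data and score.method, a quick pre-flight check can confirm that every variable they mention exists in the data. The helper `check_model_vars()` and the toy data frame are hypothetical. The variable names follow the package's count example used elsewhere on this page.

```r
# Model formulas in the style of the package examples
cate.model <- y ~ age + female + previous_treatment + offset(log(years))
ps.model   <- trt ~ age + previous_treatment

# Hypothetical helper: stop if any variable used in the formulas is missing
# from `data`; otherwise return the variable names invisibly.
check_model_vars <- function(data, ...) {
  needed <- unique(unlist(lapply(list(...), all.vars)))  # all.vars() skips offset()/log()
  missing_vars <- setdiff(needed, names(data))
  if (length(missing_vars) > 0) {
    stop("Variables not found in 'data': ",
         paste(missing_vars, collapse = ", "), call. = FALSE)
  }
  invisible(needed)
}

# Small made-up data frame, for illustration only
toy <- data.frame(y = c(0, 2, 1), trt = c("drug0", "drug1", "drug1"),
                  age = c(40, 52, 61), female = c(1, 0, 1),
                  previous_treatment = c("drugA", "drugB", "drugA"),
                  years = c(1, 2, 1.5))
check_model_vars(toy, cate.model, ps.model)
```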

Vignettes

  1. Examples with count outcome of the entire workflow
  2. Examples with survival outcome of the entire workflow
  3. Additional examples for the precmed package
  4. Theoretical details

References

Yadlowsky, Steve, Fabio Pellegrini, Federica Lionetto, Stefan Braune, and Lu Tian. 2020. “Estimation and Validation of Ratio-Based Conditional Average Treatment Effects Using Observational Data.” Journal of the American Statistical Association, 1–18.

Zhao, Lihui, Lu Tian, Tianxi Cai, Brian Claggett, and Lee-Jen Wei. 2013. “Effectively Selecting a Target Population for a Future Comparative Study.” Journal of the American Statistical Association 108 (502): 527–39. https://doi.org/10.1080/01621459.2013.770705.

precmed's People

Contributors

jcaldasmagalhaes, nightlordtw, stanwijn, tdebray123


precmed's Issues

Extensive vignettes and CRAN

The vignettes take a long time to build, which lengthens the R package checks. We may have to disable checking of the vignettes, or even exclude them from the CRAN submission.
@jcaldasmagalhaes Could you please check whether CRAN imposes any limitations on vignettes (e.g., in terms of build time or amount of information presented)?
@StanWijn Could you please check how we can disable the checking of vignettes in RStudio?
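One common pattern is to evaluate the expensive vignette chunks only when the package is checked locally, and to skip them on CRAN. devtools::check() sets the environment variable NOT_CRAN to "true" and CRAN does not, so a setup chunk can key off it. The helper name `run_heavy_chunks()` is hypothetical. Alternatively, checks can skip vignettes entirely with `R CMD check --ignore-vignettes` (or `devtools::check(vignettes = FALSE)`), and those options can be added under RStudio's build tool configuration.

```r
# Evaluate heavy vignette chunks only when NOT_CRAN is "true"
# (devtools::check() sets it; on CRAN it is normally unset).
run_heavy_chunks <- function() {
  identical(Sys.getenv("NOT_CRAN"), "true")
}

# In the setup chunk of a vignette (only if knitr is available):
if (requireNamespace("knitr", quietly = TRUE)) {
  knitr::opts_chunk$set(eval = run_heavy_chunks())
}
```

The trade-off is that the CRAN copy of the vignettes then shows code without evaluated output.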

Warning in plotsurv

 > # Try:
   > plotsurv(x = output_cv2, ylab = "RMTL ratio of drug1 vs drug0 in each subgroup")
   Error in plotsurv(x = output_cv2, ylab = "RMTL ratio of drug1 vs drug0 in each subgroup") : 
     could not find function "plotsurv"
   Execution halted

Possibly, the function plotsurv is not exported in the NAMESPACE; this would need to be fixed in the roxygen documentation.
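A quick way to test this hypothesis is to ask R which objects the installed namespace exports. The helper below is a hypothetical wrapper around base R's `getNamespaceExports()`. If `plotsurv` is not in the list, adding an `#' @export` tag to its roxygen block and re-running `devtools::document()` should regenerate the NAMESPACE entry.

```r
# Returns TRUE if `fun` is exported from the namespace of package `pkg`.
is_exported <- function(fun, pkg) {
  fun %in% getNamespaceExports(pkg)
}

# Check the package in question (only if it is installed):
if (requireNamespace("precmed", quietly = TRUE)) {
  print(is_exported("plotsurv", "precmed"))
}
```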

Add refs to DESCRIPTION

Request Beni Altmann (CRAN):

If there are references describing the methods in your package, please add these in the description field of your DESCRIPTION file in the form authors (year) <doi:...> authors (year) <arXiv:...> authors (year, ISBN:...) or if those are not available: <https:...> with no space after 'doi:', 'arXiv:', 'https:' and angle brackets for auto-linking. (If you want to add a title as well please put it in quotes: "Title")
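The requested form is easy to get slightly wrong, for example by adding a space after "doi:" or dropping the angle brackets. A tiny formatter helps, sketched below with a hypothetical helper name. It uses the Zhao et al. (2013) DOI from the reference list above. The Yadlowsky et al. (2020) entry in that list has no DOI, so its DOI would need to be looked up rather than guessed.

```r
# Hypothetical helper: format a reference as CRAN requests for the
# Description field: "authors (year) <doi:...>", no space after "doi:".
cran_ref <- function(authors, year, doi) {
  sprintf("%s (%d) <doi:%s>", authors, as.integer(year), doi)
}

cran_ref("Zhao et al.", 2013, "10.1080/01621459.2013.770705")
```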

Change default value for verbose

The argument verbose is an integer value indicating whether intermediate progress messages and histograms should be printed. Possible values differ between methods and include:

  • 2: progress bar, run time, and all errors and warnings are printed
  • 1: progress messages are printed
  • 0: no output is printed

Let's set the default value of verbose to 0 for all methods?
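If the levels above are kept, a single gate function makes a silent default (verbose = 0) easy to apply consistently across methods. The sketch below assumes that level 1 covers progress messages and level 2 adds run-time details, as listed above. The names `progress_msg()` and `fit_example()` are hypothetical.

```r
# Hypothetical helper: emit a message only if `verbose` is at least `level`.
# Returns TRUE/FALSE invisibly to indicate whether anything was printed.
progress_msg <- function(..., verbose = 0L, level = 1L) {
  if (verbose >= level) {
    message(...)
    return(invisible(TRUE))
  }
  invisible(FALSE)
}

# Stand-in for a package function using the proposed default verbose = 0
fit_example <- function(x, verbose = 0L) {
  progress_msg("fitting model..", verbose = verbose, level = 1L)
  res <- mean(x)
  progress_msg("run time details", verbose = verbose, level = 2L)
  res
}

fit_example(1:3)               # silent
fit_example(1:3, verbose = 2)  # prints both messages
```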

More things to resolve suggested by Stan

  • outputs_count.R, line 263: "# TODO: now score.method has no default (mandatory argument)". Resolve: set to "poisson"?
  • CATE_count.R, lines 253 & 333: "# TODO: if model has a single predictor, GBM must have cv.folds = 0". Resolve: see zoonproject/zoon#130.
  • outputs_continuous.R:
    • line 287: "##TODO: Insert n.trees.rf"
    • line 570: "# TODO: is it right?"
    • line 1032: "# TODO: Phoebe I think we removed the arg.check on interactions but here it is?"
    • line 1043: "#TODO: Needs to be specified"
    • line 1088: "#TODO: modify this..can we use chisq?"
  • Plots.R, line 635: "# TODO: Temporary solutions as cbind not working when cv.n = 1"

Add function to generate example data

The example dataset has a fixed size; we may want to use the helper function instead. Also, the current data contain negative ages; it may be better to use a Gamma distribution to avoid this.
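A Gamma distribution is strictly positive and can be parameterised directly by a target mean and standard deviation. For Gamma(shape k, rate r), the mean is k/r and the variance is k/r², so k = (mean/sd)² and r = mean/sd². The helper name and the default mean of 45 and SD of 10 are illustrative assumptions, not values taken from the package data.

```r
# Hypothetical helper: simulate strictly positive ages from a Gamma
# distribution with a given mean and standard deviation.
simulate_age <- function(n, mean_age = 45, sd_age = 10) {
  shape <- (mean_age / sd_age)^2   # k = (mean / sd)^2
  rate  <- mean_age / sd_age^2     # r = mean / sd^2
  rgamma(n, shape = shape, rate = rate)
}

set.seed(999)
age <- simulate_age(1000)
summary(age)
```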

Revise Count-examples

It would be helpful to restructure the example section as follows:

  1. Introduction of the example (please add a baseline table)
  2. Estimation of the average treatment effect (atefit) => introduce the concept of confounding adjustment (formula for the PS with logistic regression), prognostic prediction (formula for y; use of Poisson regression; and some explanation of why we adjust for prognostic covariates when estimating the ATE), and the need for bootstrapping when estimating the standard error.
  3. Estimation of the individual treatment effect (catefit). Explain what kind of ITE model is being estimated; how do we allow for treatment-covariate interaction?
  4. Internal validation (catecv): Why do we need it? What can we compare? What is the output?
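For step 2, a toy example of the bootstrap idea may help readers before they meet atefit(). The sketch below is not the package's doubly robust estimator. It fits a covariate-adjusted Poisson regression with a follow-up offset on simulated data, then resamples patients with replacement to obtain a standard error for the log rate ratio. All data and coefficients are made up.

```r
# Simulated count data (not countExample): age affects both treatment
# assignment and the relapse rate, i.e. it is a confounder.
set.seed(1)
n <- 500
age   <- rnorm(n, 45, 10)
trt   <- rbinom(n, 1, plogis(-2 + 0.04 * age))
years <- runif(n, 0.5, 2)
y     <- rpois(n, years * exp(-1 + 0.02 * age - 0.4 * trt))
dat   <- data.frame(y, trt, age, years)

# Covariate-adjusted log rate ratio from a Poisson model with an offset
log_rr <- function(d) {
  fit <- glm(y ~ trt + age + offset(log(years)), family = poisson, data = d)
  unname(coef(fit)["trt"])
}

# Nonparametric bootstrap: resample patients with replacement
B <- 200
boot_est <- replicate(B, log_rr(dat[sample.int(n, n, replace = TRUE), ]))

est <- log_rr(dat)
se  <- sd(boot_est)
c(rate_ratio = exp(est), se_log_rr = se)
```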

Source: https://github.com/smartdata-analysis-and-statistics/precmed/blob/main/vignettes/Count-examples.Rmd

OOB warning from GBM function when model has single predictor

In eb93cb2 I fixed the error that occurred when using "boosting" with a model that has a single predictor, by setting cv.folds = 0. This resulted in an error in the gbm.perf function (lines 346 & 351 of CATE_count.R), because its default method, "cv", requires cv.folds > 1. Therefore, in the case of a single predictor, I changed the method for gbm.perf to "OOB".

Although this fixed the error, it will now repeat the same warning 10 times:

OOB generally underestimates the optimal number of iterations although predictive performance is reasonably competitive. Using cv_folds>1 when calling gbm usually results in improved predictive performance.

I tried wrapping gbm.perf in tryCatch or withCallingHandlers, but this does not seem to resolve the issue, so I assume the warning comes from another function that uses the object generated by gbm.perf?

I don't think this is a major issue (it only occurs with "boosting" and a single predictor), but we may want to fix it in the future.
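Two details may explain why the wrapping attempts did not help. First, tryCatch() unwinds on the first handled condition, so the wrapped call never returns its value. Second, a handler registered for warnings does not see conditions signalled with message(). The repeated text appears without a "Warning message:" prefix, which suggests it may be a message, but we have not verified this in the gbm source. The hypothetical helper below silences only conditions whose text matches a pattern, whether they are warnings or messages, and lets the wrapped call continue. `noisy()` is a stand-in for the gbm.perf() call.

```r
# Hypothetical helper: evaluate `expr`, muffling only warnings or messages
# whose text contains `pattern`; all other conditions pass through.
muffle_matching <- function(expr, pattern) {
  withCallingHandlers(
    expr,
    warning = function(w) {
      if (grepl(pattern, conditionMessage(w), fixed = TRUE))
        invokeRestart("muffleWarning")
    },
    message = function(m) {
      if (grepl(pattern, conditionMessage(m), fixed = TRUE))
        invokeRestart("muffleMessage")
    }
  )
}

# Stand-in for the gbm.perf() call: one matching message, one other warning
noisy <- function() {
  message("OOB generally underestimates the optimal number of iterations ...")
  warning("unrelated warning")
  42
}

res <- muffle_matching(noisy(), "OOB generally underestimates")  # the unrelated warning still surfaces
```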

Missing documentation

> checking for missing documentation entries ... WARNING
  Undocumented code objects:
    'meanExample'
  Undocumented data sets:
    'meanExample'
  All user-level objects in a package should have documentation entries.
  See chapter 'Writing R documentation files' in the 'Writing R
  Extensions' manual.

Update license

Gabrielle received approval to publish precmed as open source under the Apache 2.0 license. Biogen should be the copyright holder. Please update the information files (e.g., DESCRIPTION and LICENSE) accordingly.
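In DESCRIPTION, copyright holders are declared through Authors@R with the role "cph". The sketch below builds only that extra entry with utils::person(). The existing author list would stay as it is, with this entry appended. For the license itself, usethis::use_apache_license() can write the License field and a LICENSE.md file. That call is mentioned but not run here, because it modifies package files.

```r
# Authors@R entry declaring Biogen as copyright holder ("cph")
cph <- utils::person(given = "Biogen", role = "cph")
format(cph)

# Not run: usethis::use_apache_license()  # sets up the Apache 2.0 license files
```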

Interpretation of output survival

It is not entirely clear what the groups "high", "low" and "exclusive" refer to when survival models are used; this needs to be clarified in the vignettes.

init.model for continuous endpoints

For continuous outcomes, the user needs to specify init.model. Previously, this argument defaulted to an undefined global variable (init.model = init.model). Since init.model is not defined anywhere in precmed, I have set its default to NULL. However, this issue still needs to be fixed.

#### PRE-PROCESSING ####
  out <- data.preproc.mean(fun = "pm", cate.model = cate.model, init.model = init.model, ps.model = ps.model,
                           score.method = score.method, data = data, prop.cutoff = prop.cutoff, ps.method = ps.method)
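With a NULL default, an early argument check can turn the later crash (the "object 'x.init' not found" error in the pmmean examples) into an informative error. The helper name and the message text are hypothetical. The sketch only enforces that init.model is supplied for continuous outcomes and, when given, is a formula.

```r
# Hypothetical argument check for init.model
check_init_model <- function(init.model, response) {
  if (identical(response, "continuous") && is.null(init.model)) {
    stop("'init.model' must be supplied for continuous outcomes, ",
         "e.g. init.model = y ~ age + female.", call. = FALSE)
  }
  if (!is.null(init.model) && !inherits(init.model, "formula")) {
    stop("'init.model' must be a formula.", call. = FALSE)
  }
  invisible(TRUE)
}

check_init_model(y ~ age + female, "continuous")
```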

examples of pmmean crash

Error in intxmean(y = y, trt = trt, x.cate = x.cate, x.init = x.init, :
object 'x.init' not found
In addition: Warning message:
In data.preproc.mean(fun = "pm", cate.model = cate.model, init.model = init.model, :

Error in intxmean(y = y, trt = trt, x.cate = x.cate, x.init = x.init, :
object 'x.init' not found

Check argument order

After we restructured the function arguments to put mandatory arguments before optional ones, the argument order in the functions no longer matches the order in the help files and examples.

I have already restructured the outputs_main() file, but we need to check all the other files to ensure consistency between the functions, help files, examples and vignettes.

Note: the vignettes all match the new structure already

what.ratio variable in plot: correct name?

plot.R at line 215:

 # Define y-axis if default is NULL
  if (is.null(ylab)){
    if (x$response == "count"){
      what.ratio <- "Rate ratio"
    } else if (x$response == "survival"){
      if (plot.hr == TRUE){
        what.ratio <- "Hazard ratio"
      } else {
        what.ratio <- "RMTL ratio"
      }
    } else if (x$response == "continuous"){
      ##TODO: what.ratio? correct name?
      what.ratio <- "Mean difference"
    }

Suggestion: rename what.ratio to plot.ratio
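The nested if/else above could also collapse into a single lookup. Because the continuous case is a mean difference rather than a ratio, a neutral name (for example a hypothetical effect.label) might describe all three cases better than plot.ratio. The helper below is a sketch and not the package's code.

```r
# Hypothetical replacement for the nested if/else: default y-axis label
# for each outcome type.
default_effect_label <- function(response, plot.hr = FALSE) {
  switch(response,
         count      = "Rate ratio",
         survival   = if (isTRUE(plot.hr)) "Hazard ratio" else "RMTL ratio",
         continuous = "Mean difference",
         stop("Unknown response type: ", response, call. = FALSE))
}

default_effect_label("survival", plot.hr = TRUE)
```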

Need for generic functions

It seems that generic functions are needed to manipulate objects generated by PrecMed; for instance, in Precision-Medicine-R-package-Count-outcome-functions.html there are calls to output_pm$coefficients to access the model coefficients. A coef() function would be preferable. This would require returning the output of certain functions as a dedicated object (rather than a plain list), and defining new functions that can interact with these objects (which would need to be exported in the NAMESPACE).
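A minimal S3 sketch of the proposal: give the returned list a class and add a method for the existing stats::coef() generic, so users can write coef(fit) instead of fit$coefficients. The class name "precmed_fit", the constructor and the coefficient values are hypothetical. In the package, tagging the method with `#' @export` lets roxygen2 register it as an S3 method in the NAMESPACE.

```r
# Hypothetical constructor: wrap the current list output in an S3 class
new_precmed_fit <- function(coefficients, response) {
  structure(list(coefficients = coefficients, response = response),
            class = "precmed_fit")
}

# Method for the stats::coef() generic
coef.precmed_fit <- function(object, ...) {
  object$coefficients
}

fit <- new_precmed_fit(coefficients = c("(Intercept)" = -1.2, age = 0.03),
                       response = "count")
coef(fit)
```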

Verification of examples

Examples: check which examples take too long to run (the examples are listed in the documentation above each method and should be included by roxygen using the @examples tag). You can test the examples manually by loading all the R files into memory and running the example code in RStudio.
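Manual timing can use base R's system.time(). The hypothetical helper below flags an expression that exceeds the 5-second guideline CRAN mentions for unwrapped examples (see the \donttest request further down). As an alternative, devtools::check() passes "--timings" by default, and R CMD check should then write per-example timings to a `<pkg>-Ex.timings` file in the check directory.

```r
# Hypothetical helper: time one example expression and flag slow ones.
time_example <- function(expr, limit = 5) {
  elapsed <- unname(system.time(expr)[["elapsed"]])
  list(elapsed = elapsed, too_slow = elapsed > limit)
}

time_example(Sys.sleep(0.2))
```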

Warning in example (data.preproc.surv)

   > # Survival outcome
   > tau0 <- with(survivalExample,
   +              min(quantile(y[trt == "drug1"], 0.95), quantile(y[trt == "drug0"], 0.95)))
   > 
   > output_cv2 <- cv(response = "survival",
   +                  cate.model = survival::Surv(y, d) ~ age + female
   +                                                          + previous_cost + previous_number_relapses,
   +                  ps.model = trt ~ age + previous_treatment,
   +                  ipcw.model = ~ age + previous_cost + previous_treatment,
   +                  data = survivalExample,
   +                  score.method = c("poisson", "randomForest"),
   +                  followup.time = NULL,
   +                  tau0 = tau0,
   +                  surv.min = 0.025,
   +                  higher.y = TRUE,
   +                  cv.n = 5,
   +                  initial.predictor.method = "randomForest",
   +                  plot.gbmperf = FALSE,
   +                  seed = 999)
   Warning in data.preproc.surv(fun = "cv", cate.model = cate.model, ps.model = ps.model,  :
     Variable trt was recoded to 0/1 with drug0->0 and drug1->1.
   

Object-oriented programming

The main output of the R package is currently a plain list. I would give this list a class (e.g., a "precmed" object), for which we can later create various generic functions.
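Assigning a class does not require changing the list itself. The contents stay accessible with $, and generics such as print() or plot() can dispatch on the class. The constructor below is a sketch under that assumption. It adds an outcome-specific subclass so that methods could differ per outcome type. The names and values are hypothetical.

```r
# Hypothetical constructor: keep the current list, but attach a class
new_precmed <- function(x, response = c("count", "survival", "continuous")) {
  stopifnot(is.list(x))
  response <- match.arg(response)
  structure(x, response = response,
            class = c(paste0("precmed_", response), "precmed"))
}

out <- new_precmed(list(ate = -0.35, n = 4000), response = "count")
class(out)
out$ate   # list elements remain accessible as before
```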

Interpretation of plots

It would be helpful to clarify how the different plots should be interpreted. Perhaps it is best to add this explanation to vignettes 2 and 3, which introduce the count and survival examples.

score.method has no default (mandatory argument)

At line 265 of outputs_count.R:

# TODO: now score.method has no default (mandatory argument)

In output_count, score.method has no default. Let's resolve the TODO.
Should we set score.method = "poisson", similar to the example? Or leave it without a default and remove the TODO?

This also needs to be resolved in pmcount (line 738 of outputs_count.R) and catefitmean (line 812 of outputs_continuous.R).
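If score.method stays mandatory, a clear error for a missing value and match.arg() validation for supplied values would remove the need for the TODO. The alternative is simply a default of score.method = "poisson" in the signature. The helper name is hypothetical. The list of choices comes from the examples in the README and is illustrative, not the package's full set.

```r
# Hypothetical check for a mandatory score.method argument
validate_score_method <- function(score.method,
                                  choices = c("boosting", "poisson", "twoReg", "contrastReg")) {
  if (missing(score.method)) {
    stop("'score.method' is mandatory; choose one or more of: ",
         paste(choices, collapse = ", "), call. = FALSE)
  }
  match.arg(score.method, choices, several.ok = TRUE)
}

validate_score_method(c("poisson", "boosting"))
```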

Check use of wrapper functions

Wrappers already exist (e.g., the cv function); we should check whether they are properly defined and used consistently in the examples.

Usually, a wrapper function would be very small, and forward function calls to the dedicated functions. An example is given below:

uvmeta <- function(r, r.se, r.vi, method="REML", test="knha", labels, na.action, 
                   n.chains=4, pars, verbose=FALSE, ...) 
  UseMethod("uvmeta")

We can specify in the cv function which outcome is being modelled, and then use a switch statement to forward the function call to the relevant cv subfunction.
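The outcome type is passed as a string rather than encoded in the class of the first argument, so switch() fits better here than UseMethod(), which dispatches on class. cvsurv() and cvmean() appear in the logs on this page. cvcount() is assumed by analogy. All three are replaced below by hypothetical stand-ins that only report which branch was taken.

```r
# Stand-ins for the outcome-specific implementations
cvcount <- function(...) "count"
cvsurv  <- function(...) "survival"
cvmean  <- function(...) "continuous"

# Thin wrapper: validate `response`, then forward all other arguments
cv_wrapper <- function(response = c("count", "survival", "continuous"), ...) {
  response <- match.arg(response)
  switch(response,
         count      = cvcount(...),
         survival   = cvsurv(...),
         continuous = cvmean(...))
}

cv_wrapper("survival")
```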

Errors and warnings of main functions

Two main errors/warnings still occur. Check with Biogen whether we should resolve these.

1) Error in cvsurv() when estimating the ATE in nested subgroups using "poisson" and "randomForest".

Error(s) occurred when estimating the ATEs in the nested subgroup using "poisson", "randomForest";
return NAs in the corresponding subgroup.
Warning(s) occurred when estimating the ATEs in the nested subgroup using "poisson", "randomForest"

The downstream functions (plot, abc, outcomes) do not seem to be affected by these errors. Is this really an error, or can we reduce or resolve it in some way?

Example:

library(precmed)
tau0 <- with(survivalExample, min(quantile(y[trt == "drug1"], 0.95), quantile(y[trt == "drug0"], 0.95)))
output_cv2 <- cv(response = "survival",
                 cate.model = survival::Surv(y, d) ~ age +
                                                     female +
                                                     previous_cost +
                                                     previous_number_relapses,
                 ps.model = trt ~ age + previous_treatment,
                 ipcw.model = ~ age + previous_cost + previous_treatment,
                 data = survivalExample,
                 score.method = c("poisson", "randomForest"),
                 followup.time = NULL,
                 tau0 = tau0,
                 surv.min = 0.025,
                 higher.y = TRUE,
                 cv.n = 5,
                 initial.predictor.method = "randomForest",
                 plot.gbmperf = FALSE,
                 seed = 999)

Output:
  |                                                                                                            |   0%
cv = 1 
  splitting the data..
  training..
    Error(s) occurred when estimating the ATEs in the nested subgroup using "randomForest";
    return NAs in the corresponding subgroup. 
    Warning(s) occurred when estimating the ATEs in the nested subgroup using "poisson", "randomForest" 
  validating..
    Error(s) occurred when estimating the ATEs in the nested subgroup using "poisson", "randomForest";
    return NAs in the corresponding subgroup. 
    Warning(s) occurred when estimating the ATEs in the nested subgroup using "poisson", "randomForest" 
   Wed Aug 31 17:01:04 2022 
  |======================                                                                                      |  20%
cv = 2 
  splitting the data..
  training..
    Error(s) occurred when estimating the ATEs in the nested subgroup using "poisson", "randomForest";
    return NAs in the corresponding subgroup. 
    Warning(s) occurred when estimating the ATEs in the nested subgroup using "poisson", "randomForest" 
  validating..
    Error(s) occurred when estimating the ATEs in the nested subgroup using "poisson", "randomForest";
    return NAs in the corresponding subgroup. 
    Warning(s) occurred when estimating the ATEs in the nested subgroup using "poisson", "randomForest" 
   Wed Aug 31 17:01:23 2022 
  |===========================================                                                                 |  40%
cv = 3 
  splitting the data..
  training..
    Error(s) occurred when estimating the ATEs in the nested subgroup using "randomForest";
    return NAs in the corresponding subgroup. 
    Warning(s) occurred when estimating the ATEs in the nested subgroup using "poisson", "randomForest" 
  validating..
    Error(s) occurred when estimating the ATEs in the nested subgroup using "poisson", "randomForest";
    return NAs in the corresponding subgroup. 
    Warning(s) occurred when estimating the ATEs in the nested subgroup using "poisson", "randomForest" 
   Wed Aug 31 17:01:42 2022 
  |=================================================================                                           |  60%
cv = 4 
  splitting the data..
  training..
    Error(s) occurred when estimating the ATEs in the nested subgroup using "poisson", "randomForest";
    return NAs in the corresponding subgroup. 
    Warning(s) occurred when estimating the ATEs in the nested subgroup using "poisson", "randomForest" 
  validating..
    Error(s) occurred when estimating the ATEs in the nested subgroup using "poisson", "randomForest";
    return NAs in the corresponding subgroup. 
    Warning(s) occurred when estimating the ATEs in the nested subgroup using "poisson", "randomForest" 
   Wed Aug 31 17:02:01 2022 
  |======================================================================================                      |  80%
cv = 5 
  splitting the data..
  training..
    Error(s) occurred when estimating the ATEs in the nested subgroup using "randomForest";
    return NAs in the corresponding subgroup. 
    Warning(s) occurred when estimating the ATEs in the nested subgroup using "poisson", "randomForest" 
  validating..
    Error(s) occurred when estimating the ATEs in the nested subgroup using "poisson", "randomForest";
    return NAs in the corresponding subgroup. 
    Warning(s) occurred when estimating the ATEs in the nested subgroup using "poisson", "randomForest" 
   Wed Aug 31 17:02:21 2022 
  |============================================================================================================| 100%


2) Warnings regarding the conversion of drug0 and drug1 to 0/1

data.preproc converts the variable trt from drug0/drug1 to 0/1 (this check and conversion is performed on line 314 of utility_count.R), probably with interpretation in mind. Do we want to show this warning in the examples?
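One option: a deliberate recoding is informational rather than a sign of trouble, so it could be reported as a message governed by verbose instead of as a warning. The user could also be allowed to name the control level. The sketch below is hypothetical. When no control level is given, it uses the alphabetically first level as control, which reproduces drug0->0 and drug1->1 for the example data. The package's actual rule is not reproduced here.

```r
# Hypothetical recoding of a two-level treatment variable to 0/1
recode_trt <- function(trt, control, verbose = 0L) {
  trt <- as.character(trt)
  lv <- sort(unique(trt))
  if (length(lv) != 2L) {
    stop("'trt' must have exactly two distinct values.", call. = FALSE)
  }
  if (missing(control)) control <- lv[1L]
  if (!control %in% lv) {
    stop("'control' must be one of: ", paste(lv, collapse = ", "), call. = FALSE)
  }
  treated <- setdiff(lv, control)
  if (verbose > 0) {
    message(sprintf("Variable trt was recoded to 0/1 with %s->0 and %s->1.",
                    control, treated))
  }
  as.integer(trt == treated)
}

recode_trt(c("drug0", "drug1", "drug1"), verbose = 1)
```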

Example:

 output_cv <- cv(response = "count",
                 cate.model = y ~ age + female + previous_treatment +
                                      previous_cost + previous_number_relapses + offset(log(years)),
                 ps.model = trt ~ age + previous_treatment,
                 data = countExample,
                 higher.y = FALSE,
                 score.method = "poisson",
                 cv.n = 5,
                 plot.gbmperf = FALSE,
                 seed = 999)

Output:
cv = 1 
  splitting the data..
  training..
  validating..

cv = 2 
  splitting the data..
  training..
  validating..

cv = 3 
  splitting the data..
  training..
  validating..

cv = 4 
  splitting the data..
  training..
  validating..

cv = 5 
  splitting the data..
  training..
  validating..

Total runtime : 13.28 secs 
Warning message:
In data.preproc(fun = "cv", cate.model = cate.model, ps.model = ps.model,  :
  Variable trt was recoded to 0/1 with drug0->0 and drug1->1.

Reduce title length

Request Beni Altmann (CRAN): Please reduce the length of the title to less than 65 characters.
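The title length can be checked before resubmitting. The helper below is hypothetical. It reads the Title field with base R's read.dcf() when run from the package root and collapses any line breaks before counting characters. For comparison, the repository description at the top of this page is 106 characters long.

```r
# Check a title against CRAN's request of fewer than 65 characters
title_ok <- function(title, max_chars = 65) {
  nchar(title) < max_chars
}

# Read the Title from DESCRIPTION, if run from the package root
if (file.exists("DESCRIPTION")) {
  title <- read.dcf("DESCRIPTION", fields = "Title")[1, "Title"]
  title <- gsub("\\s+", " ", title)   # collapse continuation lines
  cat(nchar(title), "characters; OK:", title_ok(title), "\n")
}
```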

Rd \usage sections warnings

> checking Rd \usage sections ... WARNING
  Undocumented arguments in documentation object 'cvmean'
    'error.maxNR' 'max.iterNR' 'tune'
  
  Undocumented arguments in documentation object 'intxmean'
    'n.trees.rf'
  
  Documented arguments not in \usage in documentation object 'pmcount':
    'prop.multi'
  
  Undocumented arguments in documentation object 'pmmean'
    'n.trees.rf' 'error.maxNR' 'max.iterNR' 'tune'
  
  Documented arguments not in \usage in documentation object 'pmsurv':
    'prop.multi'
  
  Functions with \usage entries need to have the appropriate \alias
  entries, and all their arguments documented.
  The \usage entries must correspond to syntactically valid R code.
  See chapter 'Writing R documentation files' in the 'Writing R
  Extensions' manual.

Use of donttest instead of dontrun

Request CRAN:

\dontrun{} should only be used if the example really cannot be executed (e.g. because of missing additional software, missing API keys, ...) by the user. That's why wrapping examples in \dontrun{} adds the comment ("# Not run:") as a warning for the user. Does not seem necessary.
Please replace \dontrun with \donttest.

Please unwrap the examples if they are executable in < 5 sec, or replace \dontrun{} with \donttest{}.

R code problems

> checking R code for possible problems ... NOTE
  scoremean: warning in suppressMessages({: partial argument match of
    'class' to 'classes'
  scoremean: warning in predict0 <- predict(object = fit0.boosting,
    newdata = datanew, n.trees = best0.iter): partial argument match of
    'class' to 'classes'
  scoremean: warning in predict1 <- predict(object = fit1.boosting,
    newdata = datanew, n.trees = best1.iter): partial argument match of
    'class' to 'classes'
  scoremean: warning in }, class = "message"): partial argument match of
    'class' to 'classes'
  drmean.inference: no visible binding for global variable 'init.model'
  ipcw.surv: no visible global function definition for 'pnorm'
  onearmglmmean.dr: no visible global function definition for 'lm'
  pm: no visible binding for global variable 'verbose'
  twoarmglmmean.dr: no visible global function definition for 'lm'
  Undefined global functions or variables:
    init.model lm pnorm verbose
  Consider adding
    importFrom("stats", "lm", "pnorm")
  to your NAMESPACE file.
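Two of these NOTEs have mechanical fixes. The first is the missing imports for lm and pnorm, which a roxygen @importFrom tag can declare. The second is likely the 'class' to 'classes' partial match. The flagged lines end in `class = "message")`, which suggests `class =` is being passed to suppressMessages(). Since R 4.0.0 that function's argument is named `classes`, so writing the full name should remove the match. The undefined globals (init.model, verbose) are better passed in as function arguments. That fix depends on the code and is not shown. The helper name `quietly()` is hypothetical.

```r
# In any package R file: roxygen2 turns this into importFrom(stats, lm)
# and importFrom(stats, pnorm) in NAMESPACE.
#' @importFrom stats lm pnorm
NULL

# Spell out the `classes` argument of suppressMessages() (R >= 4.0.0)
quietly <- function(expr) {
  suppressMessages(expr, classes = "message")
}

quietly({
  message("this message is suppressed")
  "result"
})
```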

Warning in cross-validation (estimation of ATE using "randomForest")

 > output_cv2 <- cv(response = "survival",
   +                  cate.model = survival::Surv(y, d) ~ age + female
   +                                                          + previous_cost + previous_number_relapses,
   +                  ps.model = trt ~ age + previous_treatment,
   +                  ipcw.model = ~ age + previous_cost + previous_treatment,
   +                  data = survivalExample,
   +                  score.method = c("poisson", "randomForest"),
   +                  followup.time = NULL,
   +                  tau0 = tau0,
   +                  surv.min = 0.025,
   +                  higher.y = TRUE,
   +                  cv.n = 5,
   +                  initial.predictor.method = "randomForest",
   +                  plot.gbmperf = FALSE,
   +                  seed = 999)
   Warning in data.preproc.surv(fun = "cv", cate.model = cate.model, ps.model = ps.model,  :
     Variable trt was recoded to 0/1 with drug0->0 and drug1->1.
   
   
     |                                                                            
     |                                                                      |   0%
   cv = 1 
     splitting the data..
     training..
       Error(s) occurred when estimating the ATEs in the nested subgroup using "randomForest";
       return NAs in the corresponding subgroup. 
   Warning in cvsurv(cate.model = cate.model, ps.model = ps.model, data = data,  :
     Error(s) occurred when estimating the ATEs in the nested subgroup in the training set using "randomForest" in cross-validation iteration 1. NAs are returned for RMTL ratio and HR in the corresponding subgroup; see 'errors/warnings'.
       Warning(s) occurred when estimating the ATEs in the nested subgroup using "poisson", "randomForest" 
   Warning in cvsurv(cate.model = cate.model, ps.model = ps.model, data = data,  :
     Warning(s) occurred when estimating the ATEs in the nested subgroup in the training set using "poisson", "randomForest" in cross-validation iteration 1; see 'errors/warnings'.
     validating..
       Error(s) occurred when estimating the ATEs in the nested subgroup using "poisson", "randomForest";
       return NAs in the corresponding subgroup. 
   Warning in cvsurv(cate.model = cate.model, ps.model = ps.model, data = data,  :
     Error(s) occurred when estimating the ATEs in the nested subgroup in the validation set using "poisson", "randomForest" in cross-validation iteration 1. NAs are returned for RMTL ratio and HR in the corresponding subgroup; see 'errors/warnings'.
       Warning(s) occurred when estimating the ATEs in the nested subgroup using "poisson", "randomForest" 
   Warning in cvsurv(cate.model = cate.model, ps.model = ps.model, data = data,  :
     Warning(s) occurred when estimating the ATEs in the nested subgroup in the validation set using "poisson", "randomForest" in cross-validation iteration 1; see 'errors/warnings'.

Add return documentation

Request Beni Altmann (CRAN):

Please add \value to .Rd files regarding exported methods and explain the functions results in the documentation. Please write about the structure of the output (class) and also what the output means. (If a function does not return a value, please document that too, e.g.
\value{No return value, called for side effects} or similar) Missing Rd-tags:
plot.atefit.Rd: \value
print.atefit.Rd: \value
print.catefit.Rd: \value
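In roxygen, the @return tag generates the \value section. Print methods conventionally return their input invisibly, so the documentation can say exactly that. For plot methods called only for their side effect, CRAN's suggested wording "No return value, called for side effects" applies. The class "atefit_demo" and the method below are hypothetical, so they do not clash with the package's own print.atefit().

```r
#' Print a short summary (illustrative only)
#'
#' @param x An object of the hypothetical class "atefit_demo".
#' @param ... Ignored.
#' @return Returns \code{x} invisibly; called for its side effect of
#'   printing a summary to the console.
#' @export
print.atefit_demo <- function(x, ...) {
  cat("Estimated ATE:", format(x$estimate, digits = 3), "\n")
  invisible(x)
}

obj <- structure(list(estimate = -0.351), class = "atefit_demo")
print(obj)
```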

Change default values for initial.predictor.method

Functions that have initial.predictor.method as an argument currently use rather time-consuming default values (e.g., boosting or randomForest). Let's replace them with faster alternatives:

  • For survival outcomes, let's set initial.predictor.method = "logistic" by default
  • For count outcomes, let's set initial.predictor.method = "poisson" by default
  • For continuous outcomes, let's set initial.predictor.method = "gaussian" by default
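One way to keep the proposed defaults consistent across functions is to define them in one place. The helper below is a hypothetical sketch. Setting the defaults directly in each function signature would be equally valid.

```r
# Hypothetical single source of truth for the proposed defaults
default_initial_predictor <- function(response = c("survival", "count", "continuous")) {
  response <- match.arg(response)
  switch(response,
         survival   = "logistic",
         count      = "poisson",
         continuous = "gaussian")
}

default_initial_predictor("count")
```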

Release precmed 1.0

First release:

Prepare for release:

  • urlchecker::url_check()
  • devtools::check(remote = TRUE, manual = TRUE)
  • devtools::check_win_devel()
  • rhub::check_for_cran()
  • Draft blog post

Submit to CRAN:

  • usethis::use_version('major')
  • devtools::submit_cran()
  • Approve email

Wait for CRAN...

  • Accepted 🎉
  • usethis::use_github_release()
  • usethis::use_dev_version()
  • usethis::use_news_md()
  • Finish blog post
  • Tweet
  • Add link to blog post in pkgdown news menu

Revise Survival-examples

It would be helpful to restructure the example section as follows:

  1. Introduction of the example (please add a baseline table)
  2. Estimation of the average treatment effect (atefit) => introduce the concept of confounding adjustment (formula for the PS with logistic regression), prognostic prediction (formula for y; use of Poisson regression; and some explanation of why we adjust for prognostic covariates when estimating the ATE), and the need for bootstrapping when estimating the standard error.
  3. Estimation of the individual treatment effect (catefit). Explain what kind of ITE model is being estimated; how do we allow for treatment-covariate interaction?
  4. Internal validation (catecv): Why do we need it? What can we compare? What is the output?

Source: https://github.com/smartdata-analysis-and-statistics/precmed/blob/main/vignettes/Survival-examples.Rmd
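This sketch illustrates the concepts from item 2 using generic augmented inverse probability weighting (AIPW) on simulated data. It is not atefit's implementation: it uses a continuous outcome with `lm()` instead of a survival outcome, and all variable names are made up. Each part maps to one concept:

- the propensity score model (logistic regression) adjusts for confounding;
- the outcome model adds prognostic information;
- the bootstrap supplies a standard error.

```r
# Generic AIPW sketch on simulated data (NOT precmed's atefit() implementation)
set.seed(2023)
n <- 2000
x <- rnorm(n)                          # single prognostic covariate / confounder
trt <- rbinom(n, 1, plogis(0.5 * x))   # larger x -> more likely treated (confounding)
y <- 1 + 2 * trt + 1.5 * x + rnorm(n)  # continuous outcome, true ATE = 2

# Naive difference in means ignores confounding and is biased upward here
naive <- mean(y[trt == 1]) - mean(y[trt == 0])

aipw_ate <- function(y, trt, x) {
  d <- data.frame(y = y, trt = trt, x = x)
  # Propensity score: logistic regression of treatment on the confounder
  ps <- fitted(glm(trt ~ x, family = binomial, data = d))
  # Prognostic (outcome) model, used to predict both potential outcomes
  fit <- lm(y ~ trt + x, data = d)
  mu1 <- predict(fit, newdata = transform(d, trt = 1))
  mu0 <- predict(fit, newdata = transform(d, trt = 0))
  # Doubly robust combination: outcome-model contrast plus weighted residuals
  mean(mu1 - mu0 + trt * (y - mu1) / ps - (1 - trt) * (y - mu0) / (1 - ps))
}

est <- aipw_ate(y, trt, x)

# Nonparametric bootstrap for the standard error: resample patients and refit
boot <- replicate(200, {
  i <- sample.int(n, replace = TRUE)
  aipw_ate(y[i], trt[i], x[i])
})
se <- sd(boot)
```

The outcome model improves precision. The estimate also remains consistent if either the propensity score model or the outcome model is correctly specified; this is the doubly robust property.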

Random Forest arguments

Currently, all random forest arguments are listed individually; we could use `...` to pass RF arguments through automatically. I don't think we need to specify them explicitly in the function call.
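Here is a minimal sketch of the `...` approach. It assumes the underlying learner is `randomForest::randomForest()`, although precmed may call a different random forest implementation for some outcome types. The wrapper name `fit_rf()` is hypothetical.

```r
# Hypothetical wrapper: random forest tuning arguments (ntree, mtry,
# nodesize, ...) are not listed in the signature but passed through `...`
fit_rf <- function(formula, data, ...) {
  if (!requireNamespace("randomForest", quietly = TRUE)) {
    stop("Package 'randomForest' is required for this sketch.")
  }
  randomForest::randomForest(formula, data = data, ...)
}

# Usage: any randomForest() argument can be supplied by the caller
if (requireNamespace("randomForest", quietly = TRUE)) {
  set.seed(1)
  fit <- fit_rf(mpg ~ ., data = mtcars, ntree = 50, nodesize = 3)
}
```

One trade-off: `randomForest()` itself accepts `...`, so a misspelled argument name may be silently ignored instead of raising an error. Documenting the pass-through with `@param ...` and a link to `?randomForest::randomForest` would help users.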

Use of unstated dependencies

> checking for unstated dependencies in vignettes ... NOTE
  '::' or ':::' import not declared from: 'htmltools'
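The usual fix, assuming htmltools is only used in the vignettes, is to declare it under `Suggests` in DESCRIPTION. Calling `usethis::use_package("htmltools", type = "Suggests")` makes the same edit. Packages already listed under `Suggests` are omitted from this fragment.

```
Suggests:
    htmltools
```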

Warning in example (data.preproc)

v  checking files in 'vignettes' ... 
E  checking examples (1m 40s)
   Running examples in 'PrecMed-Ex.R' failed
   The error most likely occurred in:
   
   > base::assign(".ptime", proc.time(), pos = "CheckExEnv")
   > ### Name: cv
   > ### Title: Cross-validation of the conditional average treatment effect
   > ###   (CATE) score for count, survival or continuous outcomes
   > ### Aliases: cv
   > 
   > ### ** Examples
   > 
   > # Count outcome
   > output_cv <- cv(response = "count",
   +                 cate.model = y ~ age + female + previous_treatment +
   +                                      previous_cost + previous_number_relapses + offset(log(years)),
   +                 ps.model = trt ~ age + previous_treatment,
   +                 data = countExample,
   +                 higher.y = FALSE,
   +                 score.method = "poisson",
   +                 cv.n = 5,
   +                 plot.gbmperf = FALSE,
   +                 seed = 999)
   Warning in data.preproc(fun = "cv", cate.model = cate.model, ps.model = ps.model,  :
     Variable trt was recoded to 0/1 with drug0->0 and drug1->1.
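If the goal is a warning-free example, one option is to recode the treatment explicitly before calling `cv()`. This assumes `data.preproc()` only warns when the treatment variable is not already coded 0/1. The toy data frame below is a hypothetical stand-in for `countExample` with made-up values.

```r
# Hypothetical stand-in for countExample (values are made up)
toy <- data.frame(trt = factor(c("drug0", "drug1", "drug1", "drug0")),
                  y = c(2, 0, 1, 3))

# Recode explicitly (drug0 -> 0, drug1 -> 1), mirroring the recoding
# that data.preproc() reports in its warning
toy$trt <- as.integer(toy$trt == "drug1")
```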
   

Rd line widths

The issues below should be fixed in the roxygen documentation in the R files.

N  checking Rd line widths ... 
   Rd file 'abc.Rd':
     \examples lines wider than 100 characters:
        cv.mean <- cvmean(cate.model = y ~ age  + previous_treatment + previous_cost + previous_status_measure,
                          init.model = y ~ age  + previous_treatment + previous_cost + previous_status_measure,
   
   Rd file 'boxplot.PrecMed.Rd':
     \examples lines wider than 100 characters:
        cv.mean <- cvmean(cate.model = y ~ age  + previous_treatment + previous_cost + previous_status_measure,
                          init.model = y ~ age  + previous_treatment + previous_cost + previous_status_measure,
   
   Rd file 'cv.Rd':
     \examples lines wider than 100 characters:
                        cate.model = y ~ age  + previous_treatment + previous_cost + previous_status_measure,
                        init.model = y ~ age  + previous_treatment + previous_cost + previous_status_measure,
   
   Rd file 'dr.inference.Rd':
     \examples lines wider than 100 characters:
                               cate.model = y ~ age  + previous_treatment + previous_cost + previous_status_measure,
   
   Rd file 'plot.PrecMed.Rd':
     \examples lines wider than 100 characters:
        cv.mean <- cvmean(cate.model = y ~ age  + previous_treatment + previous_cost + previous_status_measure,
                          init.model = y ~ age  + previous_treatment + previous_cost + previous_status_measure,
   
   Rd file 'pm.Rd':
     \examples lines wider than 100 characters:
                        cate.model = y ~ age  + previous_treatment + previous_cost + previous_status_measure,
                        init.model = y ~ age  + previous_treatment + previous_cost + previous_status_measure,
   
   Rd file 'pmsurv.Rd':
     \examples lines wider than 100 characters:
        pm <- pmsurv(cate.model = survival::Surv(y, d) ~ age + female + previous_cost + previous_number_relapses,
   
   These lines will be truncated in the PDF manual.
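Breaking a long formula after a `+` does not change the model, so the example lines can be wrapped below 100 characters. Assigning the formula once and reusing it for both `cate.model` and `init.model` would shorten the examples further. The object names below are illustrative.

```r
# One-line version; in the Rd examples it follows `cvmean(cate.model = `,
# which pushes the line past 100 characters
f_long <- y ~ age  + previous_treatment + previous_cost + previous_status_measure

# Wrapped after a `+`: each line stays well under 100 characters
f_wrapped <- y ~ age + previous_treatment + previous_cost +
  previous_status_measure

# Both versions parse to the same formula object
same_formula <- identical(f_long, f_wrapped)
```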

Dictionary of arguments

Please make a dictionary of the arguments used in the public functions (to make sure they are consistent). There is no need to include functions that are not intended to be used directly by external users (and are not exported in the NAMESPACE).
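A starting point for the dictionary could be generated from the package namespace. The approach lists every exported function with its formal arguments, then groups by argument name to spot inconsistent names. `arg_dictionary()` is a hypothetical helper that uses only base R. The usage example runs it on `stats`; for this task you would pass `"precmed"`.

```r
# Hypothetical helper: one row per (exported function, argument) pair
arg_dictionary <- function(pkg) {
  ns <- asNamespace(pkg)
  exports <- sort(getNamespaceExports(pkg))
  rows <- lapply(exports, function(f) {
    obj <- get(f, envir = ns)
    args <- if (is.function(obj)) names(formals(obj)) else NULL
    # Skip non-functions, primitives and functions without arguments
    if (length(args) == 0) return(NULL)
    data.frame(fun = f, arg = args, stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

# Usage: replace "stats" with "precmed"
dict <- arg_dictionary("stats")
# For each argument name, which exported functions use it?
by_arg <- split(dict$fun, dict$arg)
```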
