smartdata-analysis-and-statistics / precmed

A doubly robust precision medicine approach to estimate and validate conditional average treatment effects

Home Page: https://smartdata-analysis-and-statistics.github.io/precmed/

License: Apache License 2.0

R 99.67% TeX 0.33%

precision-medicine cran r r-package

precmed's Introduction

precmed

Overview

precmed was developed to help researchers with the implementation of precision medicine in R. A key objective of precision medicine is to determine the optimal treatment separately for each patient instead of applying a common treatment to all patients. Personalizing treatment decisions becomes particularly relevant when treatment response differs across patients, or when patients have different preferences about benefits and harms. This package offers statistical methods to develop and validate prediction models for estimating individualized treatment effects. These treatment effects are also known as conditional average treatment effects (CATEs) and describe how different subgroups of patients respond to the same treatment. Presently, precmed focuses on the personalization of two competing treatments using randomized data from a clinical trial (Zhao et al. 2013) or real-world data (RWD) from a non-randomized study (Yadlowsky et al. 2020).

Installation

The precmed package can be installed from CRAN as follows:

install.packages("precmed")

The latest version can be installed from GitHub as follows:

install.packages("devtools")
devtools::install_github(repo = "smartdata-analysis-and-statistics/precmed")

Package capabilities

The main functions in the precmed package are:

Function	Description
catefit()	Estimation of the conditional average treatment effect (CATE)
atefit()	Doubly robust estimator for the average treatment effect (ATE)
catecv()	Development and cross-validation of the CATE
abc()	Compute the area between the average treatment difference curve of competing models for the CATE (Zhao et al. 2013)
plot()	Two side-by-side line plots of validation curves from the `precmed` object
boxplot()	Plot the proportion of subjects with an estimated treatment effect no less than $c$ over a range of values for $c$ (Zhao et al. 2013).

For more info: https://smartdata-analysis-and-statistics.github.io/precmed/

Recommended workflow

We recommend the following workflow to develop a model for estimating the CATE in order to identify treatment effect heterogeneity:

1. Compare up to five modelling approaches (e.g., Poisson regression, boosting) for estimating the CATE using cross-validation through catecv().
2. Select the best modelling approach using three metrics:
- Compare the steepness of the validation curves in the validation samples across methods using plot(). Two side-by-side plots are generated, visualizing the estimated average treatment effects in a series of nested subgroups. The left plot shows the curves for the training set and the right plot shows the curves for the validation set. Each line in the plots represents one scoring method (e.g., boosting, randomForest) specified in the argument score.method.
- The area between curves (ABC), computed with abc(), quantifies a model’s ability to capture treatment effect heterogeneity. Higher ABC values are preferable, as they indicate that more treatment effect heterogeneity is captured by the scoring method.
- Compare the distribution of the estimated ATE across different levels of the CATE score percentiles using boxplot().
3. Apply the best modelling approach to the original data or to a new external dataset using catefit().
4. Optional: use atefit() to estimate the ATE between two treatment groups with a doubly robust estimator, and estimate the variability of the ATE with a bootstrap approach.
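As a rough numerical illustration of what the ABC measures, the sketch below takes a validation curve (the ATE in nested subgroups, indexed by the proportion of patients included) and computes the signed area between it and the horizontal line at the overall ATE with the trapezoidal rule. This definition is an assumption made for illustration, the helper name area_between_curves() is hypothetical, and the curve values are invented; precmed's abc() may differ, for example by working on the log scale for ratio-based outcomes, by flipping the sign depending on higher.y, or by integrating over a different range of proportions.

```r
# Hypothetical helper: signed area between a validation curve and the
# horizontal line at the overall ATE, computed with the trapezoidal rule.
area_between_curves <- function(prop, ate_subgroup, ate_overall) {
  stopifnot(length(prop) == length(ate_subgroup), !is.unsorted(prop))
  gap <- ate_subgroup - ate_overall                    # vertical distance to the overall ATE
  widths <- diff(prop)                                 # spacing between adjacent proportions
  sum(widths * (head(gap, -1) + tail(gap, -1)) / 2)    # trapezoidal rule
}

# Invented curve: nested subgroups from 50% to 100% of patients, where the
# smaller (higher-scoring) subgroups show a larger treatment effect.
prop <- c(0.5, 0.6, 0.7, 0.8, 0.9, 1.0)
ate_sub <- c(0.40, 0.35, 0.30, 0.25, 0.22, 0.20)
area_between_curves(prop, ate_sub, ate_overall = 0.20)
```

A flat curve (no heterogeneity captured) gives an area of zero, which is why a larger ABC points to a scoring method that separates patients better.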

In the vignettes, we adopt a different workflow that gradually introduces the user to methods of increasing complexity.

User input

When applying catefit() or catecv(), the user must specify at least:

response: type of outcome/response (either count or survival)
data: a data frame with individual patient data
score.method: methods to estimate the CATE (e.g., boosting, poisson, twoReg, contrastReg)
cate.model: a formula describing the outcome model (e.g., outcome ~ age + gender + previous_treatment)
ps.model: a formula describing the propensity score model to adjust for confounding (e.g., treatment ~ age + previous_treatment)

Vignettes

References

Yadlowsky, Steve, Fabio Pellegrini, Federica Lionetto, Stefan Braune, and Lu Tian. 2020. “Estimation and Validation of Ratio-Based Conditional Average Treatment Effects Using Observational Data.” Journal of the American Statistical Association, 1–18.

Zhao, Lihui, Lu Tian, Tianxi Cai, Brian Claggett, and Lee-Jen Wei. 2013. “Effectively Selecting a Target Population for a Future Comparative Study.” Journal of the American Statistical Association 108 (502): 527–39. https://doi.org/10.1080/01621459.2013.770705.

precmed's Issues

check verbose consistency

Check all output functions to see whether the use of verbose (0, 1, or 2) is consistent.

change name in vignettes (PrecMed to precmed)

In line with CRAN guidelines, Stan changed the name of the package from PrecMed to precmed, but this change still needs to be reflected in the vignettes.

Extensive vignettes and CRAN

The vignettes take a long time to build, which increases the duration of R package checks. We may have to disable checking of the vignettes, or even exclude them from the CRAN submission.
@jcaldasmagalhaes Could you please check if CRAN has any limitations on vignettes (e.g., in terms of build time, or amount of information presented)
@StanWijn Could you please check how we can disable the checking of vignettes in RStudio?

Warning in plotsurv

 > # Try:
   > plotsurv(x = output_cv2, ylab = "RMTL ratio of drug1 vs drug0 in each subgroup")
   Error in plotsurv(x = output_cv2, ylab = "RMTL ratio of drug1 vs drug0 in each subgroup") : 
     could not find function "plotsurv"
   Execution halted

Possibly the function plotsurv is not exported in the NAMESPACE; this would need to be fixed in the roxygen2 documentation (e.g., with an @export tag).

Use of non-standard directories

> checking top-level files ... NOTE
  Non-standard file/directory found at top level:
    'data-raw'

@jcaldasmagalhaes Please check what is in this folder and where the corresponding data are used.

Add refs to DESCRIPTION

Request from Beni Altmann (CRAN):

If there are references describing the methods in your package, please add these in the description field of your DESCRIPTION file in the form authors (year) <doi:...> authors (year) <arXiv:...> authors (year, ISBN:...) or if those are not available: <https:...> with no space after 'doi:', 'arXiv:', 'https:' and angle brackets for auto-linking. (If you want to add a title as well please put it in quotes: "Title")

Change default value for verbose

The argument verbose is an integer value indicating whether intermediate progress messages and histograms should be printed. Possible values differ between methods, and include:

2: progress bar, run time, and all errors and warnings are printed
1: messages are printed
0: no output is printed

Let's set the default verbose to 0 for all methods?
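A minimal sketch of how a shared verbose convention with a silent default could look, using a hypothetical function fit_with_verbose(); the progress bar of level 2 is left out, and the mapping of levels to output is an assumption based on the list above.

```r
# Hypothetical function sketching a shared verbose convention:
# 0 = silent (proposed default), 1 = progress messages, 2 = messages plus run time.
fit_with_verbose <- function(n.iter = 3, verbose = 0) {
  stopifnot(verbose %in% 0:2)
  start <- Sys.time()
  for (i in seq_len(n.iter)) {
    # ... one fitting step would go here ...
    if (verbose >= 1) message("iteration ", i, " of ", n.iter)
  }
  if (verbose >= 2) {
    elapsed <- as.numeric(difftime(Sys.time(), start, units = "secs"))
    message("Total runtime: ", round(elapsed, 2), " secs")
  }
  invisible(n.iter)
}

fit_with_verbose()               # prints nothing
fit_with_verbose(verbose = 2)    # progress messages and run time
```

Using message() rather than cat() for progress output means users can also silence it with suppressMessages().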

review get started

https://smartdata-analysis-and-statistics.github.io/precmed/articles/precmed.html

Remove function description from vignettes

More things to resolve suggested by Stan

script	line	TODO comment	Resolve
outputs_count.R	263	# TODO: now score.method has no default (mandatory argument)	Set to “poisson”?
CATE_count.R	253 & 333	# TODO: if model has a single predictor, GBM must have cv.folds = 0 zoonproject/zoon#130
outputs_continuous.R	287	##TODO: Insert n.trees.rf
	570	# TODO: is it right?
	1032	# TODO: Phoebe I think we removed the arg.check on interactions but here it is?
	1043	#TODO: Needs to be specified
	1088	#TODO: modify this..can we use chisq?
Plots.R	635	# TODO: Temporary solutions as cbind not working when cv.n = 1

Add function to generate example data

The example dataset has a fixed size; we may want to use the helper function instead. Also, the data currently contain negative ages; it may be better to use a Gamma distribution to avoid this.
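A sketch of a size-flexible generator with strictly positive ages. The function name simulate_count_data() is hypothetical, the column names follow the countExample variables used in the examples below, and all distributions and coefficients are invented for illustration. A Gamma distribution with mean m and standard deviation s has shape (m/s)^2 and scale s^2/m, so ages stay positive while keeping the requested mean and spread.

```r
# Hypothetical generator; distributions and effect sizes are illustrative only.
simulate_count_data <- function(n = 500, age.mean = 45, age.sd = 10, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  # Gamma with the requested mean and SD: shape = (mean/sd)^2, scale = sd^2/mean.
  age <- rgamma(n, shape = (age.mean / age.sd)^2, scale = age.sd^2 / age.mean)
  female <- rbinom(n, size = 1, prob = 0.7)
  trt <- factor(sample(c("drug0", "drug1"), n, replace = TRUE),
                levels = c("drug0", "drug1"))
  years <- runif(n, min = 0.5, max = 3)            # follow-up time, used as offset
  # Poisson counts whose rate depends on age and treatment (invented effects).
  rate <- exp(-1 + 0.01 * (age - age.mean) - 0.3 * (trt == "drug1"))
  y <- rpois(n, lambda = rate * years)
  data.frame(age = age, female = female, trt = trt, years = years, y = y)
}

d <- simulate_count_data(n = 1000, seed = 999)
summary(d$age)
```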

Revise Count-examples

It would be helpful to restructure the example section as follows:

Introduction of the example (please add a baseline table)
Estimation of the average treatment effect (atefit) => introduce the concept of confounding adjustment (PS formula with logistic regression), prognostic prediction (formula for y; use of Poisson regression; and some explanation of why we adjust for prognostic covariates when estimating the ATE), and the need for bootstrapping when estimating the standard error.
Estimation of the individual treatment effect (catefit). Explain what kind of ITE model is being estimated; how do we allow for treatment-covariate interaction?
Internal validation (catecv): Why do we need it? What can we compare? What is the output?

Source: https://github.com/smartdata-analysis-and-statistics/precmed/blob/main/vignettes/Count-examples.Rmd

OOB warning from GBM function when model has single predictor

In eb93cb2 I fixed the error that occurred when using "boosting" with a model that has a single predictor, by setting cv.folds = 0. This caused an error in the gbm.perf function (lines 346 and 351 of CATE_count.R), because its default method, "cv", requires cv.folds > 1. Therefore, in the case of a single predictor, I changed the method for gbm.perf to "OOB".

Although this fixed the error, the same warning is now printed 10 times:

OOB generally underestimates the optimal number of iterations although predictive performance is reasonably competitive. Using cv_folds>1 when calling gbm usually results in improved predictive performance.
(the same line is printed 10 times in total)

I tried to wrap gbm.perf in tryCatch or withCallingHandlers, but this does not seem to resolve the issue, so I assume the warnings come from another function that uses the object generated by gbm.perf?

I don't think this is a major issue (it only occurs when using "boosting" with a single predictor), but we may want to fix it in the future.
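One possible approach, sketched under assumptions: a calling handler installed around the outer call (for example the whole cross-validation loop) sees conditions raised by any nested function, so it does not matter where exactly the notice originates. The sketch muffles only conditions whose text matches a pattern and lets everything else through. noisy_fit() is a hypothetical stand-in for the code path that calls gbm.perf(); whether gbm signals this text as a warning or as a message is not confirmed here, so both are handled.

```r
# Hypothetical stand-in for the code path that repeatedly triggers the OOB notice.
noisy_fit <- function(n.cv = 10) {
  for (i in seq_len(n.cv)) {
    warning("OOB generally underestimates the optimal number of iterations")
  }
  "fitted"
}

# Muffle only conditions whose message matches `pattern`; others propagate.
muffle_matching <- function(expr, pattern) {
  withCallingHandlers(
    expr,
    warning = function(w) {
      if (grepl(pattern, conditionMessage(w))) invokeRestart("muffleWarning")
    },
    message = function(m) {
      if (grepl(pattern, conditionMessage(m))) invokeRestart("muffleMessage")
    }
  )
}

res <- muffle_matching(noisy_fit(), pattern = "^OOB generally underestimates")
```

An alternative design is to muffle the repeated notice and emit a single warning of our own once the loop has finished, so the user is still informed that OOB was used.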

review index

https://smartdata-analysis-and-statistics.github.io/precmed/index.html

Missing documentation

> checking for missing documentation entries ... WARNING
  Undocumented code objects:
    'meanExample'
  Undocumented data sets:
    'meanExample'
  All user-level objects in a package should have documentation entries.
  See chapter 'Writing R documentation files' in the 'Writing R
  Extensions' manual.

Default value for tau in cvcatefitsurv

The default value for tau is NULL, but some of the examples in the vignettes do not work with this default. We may have to adapt the default value.
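One option is to keep NULL as the default but resolve it internally to the rule the survival examples already use for tau0, namely the smaller of the two arms' 95th percentiles of the observed times. The helper resolve_tau0() below is a hypothetical sketch of that fallback, not existing precmed code, and whether this rule is an appropriate package-wide default is still an open question.

```r
# Hypothetical helper: if tau0 is NULL, fall back to the rule used in the
# survival examples, i.e. the smaller of the per-arm 95th percentiles of y.
resolve_tau0 <- function(tau0 = NULL, y, trt, prob = 0.95) {
  if (!is.null(tau0)) return(tau0)
  trt <- as.character(trt)
  arms <- unique(trt)
  if (length(arms) != 2) stop("Expected exactly two treatment arms.", call. = FALSE)
  per_arm <- vapply(arms, function(a) unname(quantile(y[trt == a], probs = prob)),
                    numeric(1))
  min(per_arm)
}
```

With precmed loaded, resolve_tau0(NULL, survivalExample$y, survivalExample$trt) should match the tau0 computed in the examples, since both take the minimum over the two arms.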

Update license

Gabrielle received approval to publish precmed as open source under the Apache 2.0 license. Biogen should be the copyright holder. Please update the information files accordingly.
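A minimal sketch of the copyright-holder entry for Authors@R, assuming Biogen is listed as an organisation (given name only) with the role "cph", which is also what the release checklist below asks for. The existing authors are omitted here and must be kept from the current DESCRIPTION; utils::person() is the function used to build Authors@R entries.

```r
# Copyright-holder entry for Authors@R; the remaining authors are omitted here.
# In DESCRIPTION this corresponds to: Authors@R: c(<existing authors>, person("Biogen", role = "cph"))
cph <- utils::person("Biogen", role = "cph")

# Render the entry to check how it will appear in the package metadata.
format(cph, include = c("given", "role"))
```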

URL in DESCRIPTION is invalid

URL: https://github.biogen.com/pages/pjiang/PMMS <- DEAD LINK
Replace it with the GitHub or CRAN link.

Implement feedback on the vignettes

Interpretation of output survival

It is not entirely clear what the groups "high", "low" and "exclusive" refer to when survival models are used; this needs to be clarified in the vignettes.

init.model for continuous endpoints

For continuous outcomes, the user needs to specify init.model. Previously, this variable was initialized from an undefined global variable as follows: init.model = init.model. Since init.model is not defined anywhere in precmed, I have set it to NULL by default. However, this issue still needs to be fixed.

#### PRE-PROCESSING ####
  out <- data.preproc.mean(fun = "pm", cate.model = cate.model, init.model = init.model, ps.model = ps.model,
                           score.method = score.method, data = data, prop.cutoff = prop.cutoff, ps.method = ps.method)

examples of pmmean crash

Error in intxmean(y = y, trt = trt, x.cate = x.cate, x.init = x.init, :
object 'x.init' not found
In addition: Warning message:
In data.preproc.mean(fun = "pm", cate.model = cate.model, init.model = init.model, :

Error in intxmean(y = y, trt = trt, x.cate = x.cate, x.init = x.init, :
object 'x.init' not found

Check argument order

After we restructured the function arguments to put mandatory arguments before optional ones, the argument order in the functions no longer matches the order in the help files and examples.

I have already restructured the outputs_main() file, but we need to check all other files to ensure consistency between the functions, help files, examples and vignettes.

Note: the vignettes all match the new structure already

what.ratio variable in plot: correct name?

plot.R at line 215:

  # Define y-axis if default is NULL
  if (is.null(ylab)){
    if (x$response == "count"){
      what.ratio <- "Rate ratio"
    } else if (x$response == "survival"){
      if (plot.hr == TRUE){
        what.ratio <- "Hazard ratio"
      } else {
        what.ratio <- "RMTL ratio"
      }
    } else if (x$response == "continuous"){
      ##TODO: what.ratio? correct name?
      what.ratio <- "Mean difference"
    }
    # ... (remainder of the block omitted)
  }

Suggestion: rename what.ratio to plot.ratio
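The same label logic can be written as a single switch(), which keeps the mapping in one place and stops with an error for an unsupported response instead of leaving the variable undefined. A hypothetical sketch (default_plot_ratio() is not an existing precmed function); note that "Mean difference" is not a ratio, which is part of why a more neutral name such as plot.ratio fits better.

```r
# Hypothetical helper mirroring the plot.R logic: default y-axis label per response.
default_plot_ratio <- function(response, plot.hr = FALSE) {
  switch(response,
    count = "Rate ratio",
    survival = if (isTRUE(plot.hr)) "Hazard ratio" else "RMTL ratio",
    continuous = "Mean difference",
    stop("Unknown response type: ", response, call. = FALSE)
  )
}

default_plot_ratio("survival", plot.hr = TRUE)  # "Hazard ratio"
```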

Need for generic functions

It seems that generic functions are needed to manipulate objects generated by PrecMed; for instance, Precision-Medicine-R-package-Count-outcome-functions.html calls output_pm$coefficients to access model coefficients. A coef() function would be preferable. This would require returning the output of certain functions as a dedicated object (rather than a plain list), and defining new functions that operate on these objects (which would need to be exported in the NAMESPACE).
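A minimal S3 sketch, assuming the output list stores its coefficients under `coefficients` (as the output_pm$coefficients calls suggest). The constructor new_precmed() and the coefficient values are hypothetical. Because the object is still a list underneath, existing `$` access keeps working while coef() becomes available. Inside the package the method would also need to be registered; roxygen2's @export on coef.precmed generates S3method(coef, precmed) in the NAMESPACE.

```r
# Hypothetical constructor: keep the existing list, but give it a class so that
# generic functions such as coef() can dispatch on it.
new_precmed <- function(x, subclass = NULL) {
  stopifnot(is.list(x))
  structure(x, class = c(subclass, "precmed"))
}

# coef() method: users call coef(fit) instead of fit$coefficients.
coef.precmed <- function(object, ...) {
  object$coefficients
}

fit <- new_precmed(list(coefficients = c(age = 0.10, female = -0.20)))
coef(fit)
```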

Verification of examples

Examples: check which examples take too much time. The examples are listed in the documentation above each method and should be included by roxygen2 via the @examples tag. You can test the examples manually by loading all the R files into memory (e.g., with devtools::load_all()) and running the example code in RStudio.

Warning in example (data.preproc.surv)

   > # Survival outcome
   > tau0 <- with(survivalExample,
   +              min(quantile(y[trt == "drug1"], 0.95), quantile(y[trt == "drug0"], 0.95)))
   > 
   > output_cv2 <- cv(response = "survival",
   +                  cate.model = survival::Surv(y, d) ~ age + female
   +                                                          + previous_cost + previous_number_relapses,
   +                  ps.model = trt ~ age + previous_treatment,
   +                  ipcw.model = ~ age + previous_cost + previous_treatment,
   +                  data = survivalExample,
   +                  score.method = c("poisson", "randomForest"),
   +                  followup.time = NULL,
   +                  tau0 = tau0,
   +                  surv.min = 0.025,
   +                  higher.y = TRUE,
   +                  cv.n = 5,
   +                  initial.predictor.method = "randomForest",
   +                  plot.gbmperf = FALSE,
   +                  seed = 999)
   Warning in data.preproc.surv(fun = "cv", cate.model = cate.model, ps.model = ps.model,  :
     Variable trt was recoded to 0/1 with drug0->0 and drug1->1.

Object-oriented programming

The main output of the R package is currently a list. I would give this list a dedicated class (e.g., a "precmed" object) for which we can later define methods for various generic functions.

Interpretation of plots

It would be helpful to clarify how the different plots should be interpreted. It is probably best to add this explanation to vignettes 2 and 3, which introduce the count and survival examples.

score.method has no default (mandatory argument)

At line 265 of outputs_count.R:

# TODO: now score.method has no default (mandatory argument)

In output_count, score.method has no default. Let's resolve the TODO.
Should we set score.method = "poisson" by default, as in the example? Or leave it without a default and remove the TODO?

The same TODO also needs to be resolved in pmcount (line 738 of outputs_count.R) and catefitmean (line 812 of outputs_continuous.R).
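A minimal sketch of the "mandatory argument" option: fail early with an informative error when score.method is missing, and validate the supplied values with match.arg(several.ok = TRUE) so that several methods can be requested at once. The function name pm_count_sketch() is hypothetical, and the allowed set is an assumption that only collects the methods mentioned in this document. The alternative option amounts to writing score.method = "poisson" in the signature.

```r
# Hypothetical entry point; `allowed` collects methods mentioned in this document.
pm_count_sketch <- function(score.method) {
  if (missing(score.method)) {
    stop("'score.method' must be specified, e.g. score.method = \"poisson\".",
         call. = FALSE)
  }
  allowed <- c("boosting", "poisson", "twoReg", "contrastReg", "randomForest")
  # several.ok = TRUE accepts a vector such as c("poisson", "randomForest").
  match.arg(score.method, choices = allowed, several.ok = TRUE)
}

pm_count_sketch(c("poisson", "randomForest"))
```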

Check use of wrapper functions

Wrappers already exist (e.g., the cv function); we should check whether they are properly defined and used consistently in the examples.

Usually, a wrapper function would be very small, and forward function calls to the dedicated functions. An example is given below:

uvmeta <- function(r, r.se, r.vi, method="REML", test="knha", labels, na.action, 
                   n.chains=4, pars, verbose=FALSE, ...) 
  UseMethod("uvmeta")

We can specify in the cv function which outcome is being modelled, and then use a switch statement to forward the function call to the relevant cv subfunction.
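A minimal sketch of the switch-based wrapper. The subfunctions here are hypothetical stubs standing in for the outcome-specific functions (e.g., cvsurv and cvmean, which appear in the check logs in this document); they only report what they received, so the forwarding can be inspected.

```r
# Hypothetical stand-ins for the outcome-specific cv subfunctions.
cv_count_stub <- function(cate.model, data, ...) list(type = "count", extra = list(...))
cv_surv_stub <- function(cate.model, data, ...) list(type = "survival", extra = list(...))

# Thin wrapper: choose the subfunction from `response`, forward everything else.
cv_sketch <- function(response, cate.model, data, ...) {
  response <- match.arg(response, c("count", "survival"))
  fun <- switch(response,
    count = cv_count_stub,
    survival = cv_surv_stub
  )
  fun(cate.model = cate.model, data = data, ...)
}

res <- cv_sketch("survival", cate.model = y ~ age, data = NULL, cv.n = 5)
res$type  # "survival"
```

The design differs from the uvmeta example: UseMethod() dispatches on the class of the first argument, whereas switch() dispatches on a string the user supplies, which fits the existing response argument without requiring outcome-specific classes on the input.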

Errors and warnings of main functions

Two main errors/warnings still occur. Check with Biogen whether we should resolve these.

1) Error in cvsurv() when estimating the ATE in nested subgroups using "poisson" and "randomForest".

Error(s) occurred when estimating the ATEs in the nested subgroup using "poisson", "randomForest";
return NAs in the corresponding subgroup.
Warning(s) occurred when estimating the ATEs in the nested subgroup using "poisson", "randomForest"

The subsequent functions (plot, abc, outcomes) do not seem to be affected by these errors. Is this really an error, and can we reduce or resolve it in any way?

Example:

library(precmed)
tau0 <- with(survivalExample, min(quantile(y[trt == "drug1"], 0.95), quantile(y[trt == "drug0"], 0.95)))
output_cv2 <- cv(response = "survival",
                 cate.model = survival::Surv(y, d) ~ age +
                                                     female +
                                                     previous_cost +
                                                     previous_number_relapses,
                 ps.model = trt ~ age + previous_treatment,
                 ipcw.model = ~ age + previous_cost + previous_treatment,
                 data = survivalExample,
                 score.method = c("poisson", "randomForest"),
                 followup.time = NULL,
                 tau0 = tau0,
                 surv.min = 0.025,
                 higher.y = TRUE,
                 cv.n = 5,
                 initial.predictor.method = "randomForest",
                 plot.gbmperf = FALSE,
                 seed = 999)

Output:
  |                                                                                                            |   0%
cv = 1 
  splitting the data..
  training..
    Error(s) occurred when estimating the ATEs in the nested subgroup using "randomForest";
    return NAs in the corresponding subgroup. 
    Warning(s) occurred when estimating the ATEs in the nested subgroup using "poisson", "randomForest" 
  validating..
    Error(s) occurred when estimating the ATEs in the nested subgroup using "poisson", "randomForest";
    return NAs in the corresponding subgroup. 
    Warning(s) occurred when estimating the ATEs in the nested subgroup using "poisson", "randomForest" 
   Wed Aug 31 17:01:04 2022 
  |======================                                                                                      |  20%
cv = 2 
  splitting the data..
  training..
    Error(s) occurred when estimating the ATEs in the nested subgroup using "poisson", "randomForest";
    return NAs in the corresponding subgroup. 
    Warning(s) occurred when estimating the ATEs in the nested subgroup using "poisson", "randomForest" 
  validating..
    Error(s) occurred when estimating the ATEs in the nested subgroup using "poisson", "randomForest";
    return NAs in the corresponding subgroup. 
    Warning(s) occurred when estimating the ATEs in the nested subgroup using "poisson", "randomForest" 
   Wed Aug 31 17:01:23 2022 
  |===========================================                                                                 |  40%
cv = 3 
  splitting the data..
  training..
    Error(s) occurred when estimating the ATEs in the nested subgroup using "randomForest";
    return NAs in the corresponding subgroup. 
    Warning(s) occurred when estimating the ATEs in the nested subgroup using "poisson", "randomForest" 
  validating..
    Error(s) occurred when estimating the ATEs in the nested subgroup using "poisson", "randomForest";
    return NAs in the corresponding subgroup. 
    Warning(s) occurred when estimating the ATEs in the nested subgroup using "poisson", "randomForest" 
   Wed Aug 31 17:01:42 2022 
  |=================================================================                                           |  60%
cv = 4 
  splitting the data..
  training..
    Error(s) occurred when estimating the ATEs in the nested subgroup using "poisson", "randomForest";
    return NAs in the corresponding subgroup. 
    Warning(s) occurred when estimating the ATEs in the nested subgroup using "poisson", "randomForest" 
  validating..
    Error(s) occurred when estimating the ATEs in the nested subgroup using "poisson", "randomForest";
    return NAs in the corresponding subgroup. 
    Warning(s) occurred when estimating the ATEs in the nested subgroup using "poisson", "randomForest" 
   Wed Aug 31 17:02:01 2022 
  |======================================================================================                      |  80%
cv = 5 
  splitting the data..
  training..
    Error(s) occurred when estimating the ATEs in the nested subgroup using "randomForest";
    return NAs in the corresponding subgroup. 
    Warning(s) occurred when estimating the ATEs in the nested subgroup using "poisson", "randomForest" 
  validating..
    Error(s) occurred when estimating the ATEs in the nested subgroup using "poisson", "randomForest";
    return NAs in the corresponding subgroup. 
    Warning(s) occurred when estimating the ATEs in the nested subgroup using "poisson", "randomForest" 
   Wed Aug 31 17:02:21 2022 
  |============================================================================================================| 100%

Warnings regarding the conversion of drug0 and drug1 to 0/1

data.preproc converts the variable trt from drug0/drug1 to 0/1 (this check and conversion is performed on line 314 of utility_count.R), probably with interpretation in mind. Do we want to show the resulting warning in the examples?

Example:

 output_cv <- cv(response = "count",
                 cate.model = y ~ age + female + previous_treatment +
                                      previous_cost + previous_number_relapses + offset(log(years)),
                 ps.model = trt ~ age + previous_treatment,
                 data = countExample,
                 higher.y = FALSE,
                 score.method = "poisson",
                 cv.n = 5,
                 plot.gbmperf = FALSE,
                 seed = 999)

Output:
cv = 1 
  splitting the data..
  training..
  validating..

cv = 2 
  splitting the data..
  training..
  validating..

cv = 3 
  splitting the data..
  training..
  validating..

cv = 4 
  splitting the data..
  training..
  validating..

cv = 5 
  splitting the data..
  training..
  validating..

Total runtime : 13.28 secs 
Warning message:
In data.preproc(fun = "cv", cate.model = cate.model, ps.model = ps.model,  :
  Variable trt was recoded to 0/1 with drug0->0 and drug1->1.
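One option is to keep the recoding but report it through message() instead of warning(), so that examples and users can silence it with suppressMessages() or a verbose argument. The helper recode_trt() below is a hypothetical sketch, not data.preproc's actual code; it reuses the wording from the output above and maps levels in factor-level order (alphabetical for character input), so drug0 becomes 0 and drug1 becomes 1.

```r
# Hypothetical helper: recode a two-level treatment to 0/1 and report the
# mapping as a message (silenceable) rather than a warning.
recode_trt <- function(trt, verbose = 1) {
  if (is.numeric(trt) && all(trt %in% c(0, 1))) return(trt)  # already 0/1
  trt <- factor(trt)
  if (nlevels(trt) != 2) stop("Treatment must have exactly two levels.", call. = FALSE)
  lev <- levels(trt)
  if (verbose >= 1) {
    message("Variable trt was recoded to 0/1 with ", lev[1], "->0 and ", lev[2], "->1.")
  }
  as.integer(trt) - 1L
}

suppressMessages(recode_trt(c("drug1", "drug0", "drug1")))
```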

Reduce title length

Request from Beni Altmann (CRAN): Please reduce the length of the title to less than 65 characters.

Add logo to the package and/or github page

A logo is available but is currently missing.

Rd \usage sections warnings

> checking Rd \usage sections ... WARNING
  Undocumented arguments in documentation object 'cvmean'
    'error.maxNR' 'max.iterNR' 'tune'
  
  Undocumented arguments in documentation object 'intxmean'
    'n.trees.rf'
  
  Documented arguments not in \usage in documentation object 'pmcount':
    'prop.multi'
  
  Undocumented arguments in documentation object 'pmmean'
    'n.trees.rf' 'error.maxNR' 'max.iterNR' 'tune'
  
  Documented arguments not in \usage in documentation object 'pmsurv':
    'prop.multi'
  
  Functions with \usage entries need to have the appropriate \alias
  entries, and all their arguments documented.
  The \usage entries must correspond to syntactically valid R code.
  See chapter 'Writing R documentation files' in the 'Writing R
  Extensions' manual.

Use of donttest instead of dontrun

Request from CRAN:

\dontrun{} should only be used if the example really cannot be executed (e.g. because of missing additional software, missing API keys, ...) by the user. That's why wrapping examples in \dontrun{} adds the comment ("# Not run:") as a warning for the user. Does not seem necessary.
Please replace \dontrun with \donttest.

Please unwrap the examples if they are executable in < 5 sec, or replace dontrun{} with \donttest{}.
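A sketch of how a long-running example could be split in the roxygen2 source, using a hypothetical function slow_fit(): the fast part stays unwrapped and the slow part goes inside \donttest{}. Note that since R 4.0.0, R CMD check --as-cran also runs \donttest{} code, so the wrapped example must still run without errors.

```r
#' Fit a slow example model (hypothetical function, for illustration only)
#'
#' @param n Number of iterations.
#' @return The sum 1 + ... + n (placeholder for a real fitted object).
#' @examples
#' # Quick example: unwrapped, runs in well under 5 seconds.
#' slow_fit(n = 10)
#' \donttest{
#' # Long-running example: wrapped in \donttest{} instead of \dontrun{}.
#' slow_fit(n = 1e6)
#' }
#' @export
slow_fit <- function(n = 10) {
  total <- 0
  for (i in seq_len(n)) total <- total + i   # stand-in for expensive work
  total
}
```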

R code problems

> checking R code for possible problems ... NOTE
  scoremean: warning in suppressMessages({: partial argument match of
    'class' to 'classes'
  scoremean: warning in predict0 <- predict(object = fit0.boosting,
    newdata = datanew, n.trees = best0.iter): partial argument match of
    'class' to 'classes'
  scoremean: warning in predict1 <- predict(object = fit1.boosting,
    newdata = datanew, n.trees = best1.iter): partial argument match of
    'class' to 'classes'
  scoremean: warning in }, class = "message"): partial argument match of
    'class' to 'classes'
  drmean.inference: no visible binding for global variable 'init.model'
  ipcw.surv: no visible global function definition for 'pnorm'
  onearmglmmean.dr: no visible global function definition for 'lm'
  pm: no visible binding for global variable 'verbose'
  twoarmglmmean.dr: no visible global function definition for 'lm'
  Undefined global functions or variables:
    init.model lm pnorm verbose
  Consider adding
    importFrom("stats", "lm", "pnorm")
  to your NAMESPACE file.
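A sketch of the three fixes this NOTE points to, using base R only; pm_sketch() and its signature are hypothetical. Because the partial-match warnings in scoremean all sit inside one suppressMessages({ ... }, class = "message") call, spelling out the argument name once should clear all four.

```r
# 1) suppressMessages() has a `classes` argument (R >= 4.0.0); writing
#    `class = "message"` only works through partial matching, hence the NOTE.
res <- suppressMessages({
  message("fitting boosting models...")
  42
}, classes = "message")

# 2) Use explicit namespaces for stats functions, or declare them in NAMESPACE
#    (roxygen2: #' @importFrom stats lm pnorm).
fit <- stats::lm(mpg ~ wt, data = mtcars)
p <- stats::pnorm(1.96)

# 3) Pass `verbose` and `init.model` explicitly instead of reading globals.
pm_sketch <- function(data, init.model = NULL, verbose = 0) {
  if (verbose >= 1) {
    message("init.model: ", if (is.null(init.model)) "none" else deparse(init.model))
  }
  nrow(data)
}
pm_sketch(mtcars, init.model = mpg ~ wt, verbose = 1)
```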

Warning in cross-validation (estimation of ATE using "randomForest")

 > output_cv2 <- cv(response = "survival",
   +                  cate.model = survival::Surv(y, d) ~ age + female
   +                                                          + previous_cost + previous_number_relapses,
   +                  ps.model = trt ~ age + previous_treatment,
   +                  ipcw.model = ~ age + previous_cost + previous_treatment,
   +                  data = survivalExample,
   +                  score.method = c("poisson", "randomForest"),
   +                  followup.time = NULL,
   +                  tau0 = tau0,
   +                  surv.min = 0.025,
   +                  higher.y = TRUE,
   +                  cv.n = 5,
   +                  initial.predictor.method = "randomForest",
   +                  plot.gbmperf = FALSE,
   +                  seed = 999)
   Warning in data.preproc.surv(fun = "cv", cate.model = cate.model, ps.model = ps.model,  :
     Variable trt was recoded to 0/1 with drug0->0 and drug1->1.
   
   
     |                                                                            
     |                                                                      |   0%
   cv = 1 
     splitting the data..
     training..
       Error(s) occurred when estimating the ATEs in the nested subgroup using "randomForest";
       return NAs in the corresponding subgroup. 
   Warning in cvsurv(cate.model = cate.model, ps.model = ps.model, data = data,  :
     Error(s) occurred when estimating the ATEs in the nested subgroup in the training set using "randomForest" in cross-validation iteration 1. NAs are returned for RMTL ratio and HR in the corresponding subgroup; see 'errors/warnings'.
       Warning(s) occurred when estimating the ATEs in the nested subgroup using "poisson", "randomForest" 
   Warning in cvsurv(cate.model = cate.model, ps.model = ps.model, data = data,  :
     Warning(s) occurred when estimating the ATEs in the nested subgroup in the training set using "poisson", "randomForest" in cross-validation iteration 1; see 'errors/warnings'.
     validating..
       Error(s) occurred when estimating the ATEs in the nested subgroup using "poisson", "randomForest";
       return NAs in the corresponding subgroup. 
   Warning in cvsurv(cate.model = cate.model, ps.model = ps.model, data = data,  :
     Error(s) occurred when estimating the ATEs in the nested subgroup in the validation set using "poisson", "randomForest" in cross-validation iteration 1. NAs are returned for RMTL ratio and HR in the corresponding subgroup; see 'errors/warnings'.
       Warning(s) occurred when estimating the ATEs in the nested subgroup using "poisson", "randomForest" 
   Warning in cvsurv(cate.model = cate.model, ps.model = ps.model, data = data,  :
     Warning(s) occurred when estimating the ATEs in the nested subgroup in the validation set using "poisson", "randomForest" in cross-validation iteration 1; see 'errors/warnings'.

Add return documentation

Request from Beni Altmann (CRAN):

Please add \value to .Rd files regarding exported methods and explain the functions results in the documentation. Please write about the structure of the output (class) and also what the output means. (If a function does not return a value, please document that too, e.g.
\value{No return value, called for side effects} or similar) Missing Rd-tags:
plot.atefit.Rd: \value
print.atefit.Rd: \value
print.catefit.Rd: \value
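A sketch of an @return entry for a print method, which by convention returns its input invisibly. The class "example_fit" and the printed field are hypothetical; the same pattern would apply to print.atefit and print.catefit, while the \value text for plot.atefit depends on what that method actually returns.

```r
#' Print a fitted object (hypothetical class "example_fit", for illustration)
#'
#' @param x An object of class "example_fit".
#' @param ... Ignored.
#' @return The input \code{x}, invisibly. Called for its side effect of
#'   printing a short summary to the console.
#' @export
print.example_fit <- function(x, ...) {
  cat("ATE estimate:", format(x$estimate, digits = 3), "\n")
  invisible(x)
}

obj <- structure(list(estimate = 0.1234), class = "example_fit")
print(obj)
```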

Change default values for initial.predictor.method

Functions that have initial.predictor.method as an argument currently use rather time-consuming default values (e.g., boosting or randomForest). Let's replace them with faster alternatives:

For survival outcomes, let's set initial.predictor.method = "logistic" by default
For count outcomes, let's set initial.predictor.method = "poisson" by default
For continuous outcomes, let's set initial.predictor.method = "gaussian" by default
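A sketch of how the proposed defaults could be resolved from a NULL argument. The helper name and the alternative methods in `choices` are assumptions; only the three defaults come from this proposal.

```r
# Proposed defaults per outcome type (from this issue).
initial_predictor_defaults <- c(survival = "logistic", count = "poisson",
                                continuous = "gaussian")

# Hypothetical helper: use the proposed default unless the user picked a method.
resolve_initial_predictor <- function(response, initial.predictor.method = NULL) {
  response <- match.arg(response, names(initial_predictor_defaults))
  if (is.null(initial.predictor.method)) {
    return(unname(initial_predictor_defaults[response]))
  }
  # Alternatives listed here are assumptions, not the package's full set.
  choices <- c(initial_predictor_defaults[[response]], "boosting", "randomForest")
  match.arg(initial.predictor.method, choices)
}

resolve_initial_predictor("count")  # "poisson"
```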

examples: how much time do they take

Check how long each example takes, so that we can take action if any example takes too long.
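A sketch of a quick timing pass: store each example as a quoted expression, time it with system.time(), and flag anything above the 5-second threshold CRAN mentions for unwrapped examples. The helper time_examples() and the two toy expressions are hypothetical; in practice the expressions would be the example code copied from the @examples sections.

```r
# Hypothetical helper: time each quoted example expression and flag slow ones.
time_examples <- function(examples, limit = 5, envir = new.env()) {
  elapsed <- vapply(examples, function(ex) {
    unname(system.time(eval(ex, envir = envir))["elapsed"])
  }, numeric(1))
  data.frame(example = names(examples), seconds = elapsed,
             too_slow = elapsed > limit, row.names = NULL)
}

timings <- time_examples(list(
  quick = quote(sum(1:1000)),
  sleepy = quote(Sys.sleep(0.2))
))
timings[order(-timings$seconds), ]
```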

Release precmed 1.0

First release:

usethis::use_cran_comments()
Update (aspirational) install instructions in README
Proofread Title: and Description:
Check that all exported functions have @return and @examples
Check that Authors@R: includes a copyright holder (role 'cph')
Check licensing of included files
Review https://github.com/DavisVaughan/extrachecks

Prepare for release:

Submit to CRAN:

usethis::use_version('major')
devtools::submit_cran()
Approve email

Wait for CRAN...

Revise Survival-examples

It would be helpful to restructure the example section as follows:

Introduction of the example (please add a baseline table)
Estimation of average treatment effect (atefit) => introduce the concept of confounding adjustment (formula for the PS with logistic regression), prognostic prediction (formula for y; use of Poisson regression; and some explanation of why we adjust for prognostic covariates when estimating the ATE), and the need for bootstrapping when estimating the standard error.
Estimation of individual treatment effect (catefit). Explain what kind of ITE model is being estimated; how do we allow for treatment-covariate interaction?
Internal validation (catecv): Why do we need it? What can we compare? What is the output?

Source: https://github.com/smartdata-analysis-and-statistics/precmed/blob/main/vignettes/Survival-examples.Rmd

Rename functions

Check function names: dr.inference might be confusing.

Random Forest arguments

Currently all random forest arguments are listed individually; we could use (...) to pass RF arguments on automatically. I don't think we need to specify them in the function call.
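A minimal sketch of the (...) approach: package defaults are merged with whatever tuning arguments the user supplies, and everything is forwarded to randomForest::randomForest() without listing each argument. The helper names `rf_args()` and `fit_rf()` and the default `ntree = 500` are hypothetical placeholders, not the package's current defaults.

```r
# Hypothetical helper: user-supplied arguments in `...` override the
# package defaults of the same name; new arguments are simply added.
rf_args <- function(..., defaults = list(ntree = 500)) {
  utils::modifyList(defaults, list(...))
}

# Hypothetical fitting wrapper (requires the randomForest package).
# Extra tuning arguments such as mtry or nodesize pass straight through.
fit_rf <- function(formula, data, ...) {
  # The formula is passed unnamed so the randomForest() generic dispatches
  # to its formula method.
  args <- c(list(formula, data = data), rf_args(...))
  do.call(randomForest::randomForest, args)
}

rf_args(ntree = 100, nodesize = 5)  # ntree overridden, nodesize added
```

Keeping the merge in one helper means a new randomForest argument never requires changing the signatures of the public functions.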

add stop to all continuous functions

All functions for continuous outcomes are included, but they need a stop() call to prevent users from using them.
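A sketch of such a guard, assuming each function can check its outcome type at the top. Both `stop_if_continuous()` and `cate_dispatch()` are hypothetical names; the second only stands in for a real fitting function.

```r
# Hypothetical guard: abort early for continuous outcomes until they are supported.
stop_if_continuous <- function(response) {
  if (identical(response, "continuous")) {
    stop("Continuous outcomes are not supported yet in this version of precmed.",
         call. = FALSE)
  }
  invisible(TRUE)
}

# Hypothetical stand-in for a function that handles several outcome types.
cate_dispatch <- function(response) {
  response <- match.arg(response, c("count", "survival", "continuous"))
  stop_if_continuous(response)
  paste("fitting", response, "outcome")  # placeholder for the real work
}

cate_dispatch("count")
try(cate_dispatch("continuous"))
```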

Use of unstated dependencies

> checking for unstated dependencies in vignettes ... NOTE
  '::' or ':::' import not declared from: 'htmltools'
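The usual fix for this NOTE is to declare htmltools in the Suggests field of DESCRIPTION (for example with usethis::use_package("htmltools", type = "Suggests")). The sketch below checks whether a package is declared, using base read.dcf() on a hypothetical DESCRIPTION excerpt; the `declared_packages()` helper and the field contents are illustrative only.

```r
# Return the package names declared in Depends/Imports/Suggests
# (version requirements such as "(>= 3.5.0)" are stripped).
declared_packages <- function(desc_file) {
  fields <- c("Depends", "Imports", "Suggests")
  dcf <- read.dcf(desc_file, fields = fields)
  entries <- unlist(strsplit(dcf[1, !is.na(dcf[1, ])], ","))
  pkgs <- trimws(sub("\\(.*\\)", "", entries))
  setdiff(pkgs[nzchar(pkgs)], "R")
}

# Hypothetical DESCRIPTION excerpt with htmltools added to Suggests.
desc_text <- c(
  "Package: precmed",
  "Imports: stats,",
  "    survival",
  "Suggests: knitr,",
  "    rmarkdown,",
  "    htmltools"
)
desc <- textConnection(desc_text)
pkgs <- declared_packages(desc)
close(desc)
"htmltools" %in% pkgs
```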

Warning in example (data.preproc)

v  checking files in 'vignettes' ... 
E  checking examples (1m 40s)
   Running examples in 'PrecMed-Ex.R' failed
   The error most likely occurred in:
   
   > base::assign(".ptime", proc.time(), pos = "CheckExEnv")
   > ### Name: cv
   > ### Title: Cross-validation of the conditional average treatment effect
   > ###   (CATE) score for count, survival or continuous outcomes
   > ### Aliases: cv
   > 
   > ### ** Examples
   > 
   > # Count outcome
   > output_cv <- cv(response = "count",
   +                 cate.model = y ~ age + female + previous_treatment +
   +                                      previous_cost + previous_number_relapses + offset(log(years)),
   +                 ps.model = trt ~ age + previous_treatment,
   +                 data = countExample,
   +                 higher.y = FALSE,
   +                 score.method = "poisson",
   +                 cv.n = 5,
   +                 plot.gbmperf = FALSE,
   +                 seed = 999)
   Warning in data.preproc(fun = "cv", cate.model = cate.model, ps.model = ps.model,  :
     Variable trt was recoded to 0/1 with drug0->0 and drug1->1.
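One way to silence this warning in the example is to recode the treatment variable to 0/1 before calling the function, using the same mapping data.preproc() reports (drug0 -> 0, drug1 -> 1). This is a sketch on a toy data frame, assuming trt is stored as character or factor labels; whether the shipped countExample data should be recoded instead is a separate decision, and the recoding does not by itself address the failed check.

```r
# Toy stand-in for countExample: treatment stored as labels "drug0"/"drug1".
toy <- data.frame(trt = c("drug0", "drug1", "drug1", "drug0"),
                  y   = c(2, 0, 1, 3),
                  stringsAsFactors = TRUE)

# Recode to 0/1 up front so there is nothing left for data.preproc() to recode.
# as.character() makes this work whether trt is a factor or a character vector.
toy$trt <- as.integer(as.character(toy$trt) == "drug1")
table(toy$trt)
```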

Rd line widths

The issues below should be fixed in the roxygen documentation in the R files.

N  checking Rd line widths ... 
   Rd file 'abc.Rd':
     \examples lines wider than 100 characters:
        cv.mean <- cvmean(cate.model = y ~ age  + previous_treatment + previous_cost + previous_status_measure,
                          init.model = y ~ age  + previous_treatment + previous_cost + previous_status_measure,
   
   Rd file 'boxplot.PrecMed.Rd':
     \examples lines wider than 100 characters:
        cv.mean <- cvmean(cate.model = y ~ age  + previous_treatment + previous_cost + previous_status_measure,
                          init.model = y ~ age  + previous_treatment + previous_cost + previous_status_measure,
   
   Rd file 'cv.Rd':
     \examples lines wider than 100 characters:
                        cate.model = y ~ age  + previous_treatment + previous_cost + previous_status_measure,
                        init.model = y ~ age  + previous_treatment + previous_cost + previous_status_measure,
   
   Rd file 'dr.inference.Rd':
     \examples lines wider than 100 characters:
                               cate.model = y ~ age  + previous_treatment + previous_cost + previous_status_measure,
   
   Rd file 'plot.PrecMed.Rd':
     \examples lines wider than 100 characters:
        cv.mean <- cvmean(cate.model = y ~ age  + previous_treatment + previous_cost + previous_status_measure,
                          init.model = y ~ age  + previous_treatment + previous_cost + previous_status_measure,
   
   Rd file 'pm.Rd':
     \examples lines wider than 100 characters:
                        cate.model = y ~ age  + previous_treatment + previous_cost + previous_status_measure,
                        init.model = y ~ age  + previous_treatment + previous_cost + previous_status_measure,
   
   Rd file 'pmsurv.Rd':
     \examples lines wider than 100 characters:
        pm <- pmsurv(cate.model = survival::Surv(y, d) ~ age + female + previous_cost + previous_number_relapses,
   
   These lines will be truncated in the PDF manual.
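A sketch of how the long example lines could be wrapped in the roxygen @examples: R keeps reading a formula that ends with an operator such as `+` on the next line, so splitting there leaves the model unchanged. The object names below are illustrative, and the cvmean() call is only indicated in a comment.

```r
# Split long formulas after "+" so each line stays well under 100 characters.
cate.model <- y ~ age + previous_treatment +
  previous_cost + previous_status_measure
init.model <- y ~ age + previous_treatment +
  previous_cost + previous_status_measure

# The wrapped formulas can then be passed on in the example, e.g.
# cv.mean <- cvmean(cate.model = cate.model, init.model = init.model, ...)
all.vars(cate.model)
```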

Dictionary of arguments

Please make a dictionary of arguments used in the public functions (to make sure they are used consistently). There is no need to include functions that are not intended to be used directly by external users (and are not exported in the NAMESPACE).
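A first draft of the dictionary could be generated automatically from the function signatures with formals(), listing for each argument the functions that use it. The sketch below runs on toy functions; the `arg_dictionary()` helper and the toy signatures are hypothetical, and in the package the input could be the exported functions, e.g. via mget(getNamespaceExports("precmed"), envir = asNamespace("precmed")).

```r
# For each argument name, list the functions whose signature contains it.
arg_dictionary <- function(fns) {
  fns <- Filter(is.function, fns)
  pairs <- do.call(rbind, lapply(names(fns), function(fn) {
    args <- names(formals(fns[[fn]]))
    if (length(args) == 0) return(NULL)
    data.frame(argument = args, fun = fn, stringsAsFactors = FALSE)
  }))
  used_by <- tapply(pairs$fun, pairs$argument,
                    function(f) paste(sort(f), collapse = ", "))
  data.frame(argument = names(used_by), used_by = as.vector(used_by),
             stringsAsFactors = FALSE, row.names = NULL)
}

# Toy signatures standing in for the exported functions (hypothetical).
toy_fns <- list(
  catefit = function(response, data, score.method, seed = NULL) NULL,
  catecv  = function(response, data, score.method, cv.n = 10) NULL,
  atefit  = function(response, data, n.boot = 500) NULL
)
dict <- arg_dictionary(toy_fns)
print(dict)
```

Descriptions and default values would still need to be filled in by hand, but the generated table makes inconsistent names or defaults easy to spot.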