psfmi: Predictor Selection Functions for Logistic and Cox regression models in multiply imputed datasets

Topics: predictor-selection, logistic-regression, imputation, imputed-datasets, spline, pool, cox-regression, spline-predictors

psfmi's Introduction

psfmi


The package provides functions to apply pooling and backward and forward selection of linear, logistic and Cox regression models across multiply imputed data sets using Rubin’s Rules (RR). The D1, D2, D3 and D4 methods and the median p-values method can be used to pool the significance of categorical variables (multiparameter tests). The model can contain continuous, dichotomous, categorical and restricted cubic spline predictors, as well as interaction terms between all these types of variables. Variables can also be forced into the model during selection.
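As background, the Rubin's Rules pooling used throughout the package can be illustrated with a minimal base-R sketch. The function and the numbers below are hypothetical and only show the arithmetic; the package performs this internally:

```r
# Rubin's Rules (RR) for pooling one regression coefficient across
# m imputed data sets: combine per-imputation estimates and standard
# errors into a single pooled estimate and standard error.
pool_rubin <- function(estimates, std_errors) {
  m     <- length(estimates)
  qbar  <- mean(estimates)            # pooled estimate
  ubar  <- mean(std_errors^2)         # within-imputation variance
  b     <- var(estimates)             # between-imputation variance
  total <- ubar + (1 + 1 / m) * b     # total variance
  r     <- (1 + 1 / m) * b / ubar     # relative increase in variance
  df    <- (m - 1) * (1 + 1 / r)^2    # classical RR degrees of freedom
  list(estimate = qbar, se = sqrt(total), df = df)
}

# Hypothetical coefficients for one predictor from m = 5 imputations
res <- pool_rubin(c(0.52, 0.48, 0.55, 0.50, 0.47),
                  c(0.11, 0.12, 0.10, 0.11, 0.12))
res$estimate  # 0.504
```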

Validation of the prediction models can be performed with cross-validation or bootstrapping across multiply imputed data sets, and pooled model performance measures such as the AUC, reclassification measures, R-squared, the Hosmer and Lemeshow test, the scaled Brier score and calibration plots are generated. A function to externally validate logistic prediction models across multiply imputed data sets and a function to compare models in multiply imputed data are also available.
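As a sketch of how pooled performance measures can be obtained with the package's `pool_performance` function (argument and result names may differ between package versions; check `?pool_performance` for your installed version):

```r
library(psfmi)

# Pooled performance of a logistic model across the 5 imputed data
# sets in the example data lbpmilr. The formula is illustrative;
# argument names follow the package documentation but should be
# verified against your installed version.
perf <- pool_performance(data = lbpmilr, nimp = 5, impvar = "Impnr",
                         formula = Chronic ~ Pain + factor(Satisfaction) +
                           Tampascale + Smoking,
                         cal.plot = TRUE, groups_cal = 10,
                         model_type = "binomial")

perf$ROC_pooled  # pooled AUC with 95% CI
perf$R2_pooled   # pooled R-squared
```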

Installation

You can install the released version of psfmi with:

install.packages("psfmi")

And the development version from GitHub with:

# install.packages("devtools")
devtools::install_github("mwheymans/psfmi")

Citation

Cite the package as:

Martijn W Heymans (2021). psfmi: Prediction Model Pooling, Selection and Performance Evaluation 
Across Multiply Imputed Datasets. R package version 1.1.0. https://mwheymans.github.io/psfmi/ 

Examples

This example shows you how to pool a logistic regression model across 5 multiply imputed data sets. The model includes two restricted cubic spline terms and a categorical, a continuous and a dichotomous predictor. The pooling method used is D1.

library(psfmi)

pool_lr <- psfmi_lr(data=lbpmilr, formula = Chronic ~ rcs(Pain, 3) + 
                      JobDemands + rcs(Tampascale, 3) + factor(Satisfaction) + 
                      Smoking, nimp=5, impvar="Impnr", method="D1")

pool_lr$RR_model
#> $`Step 1 - no variables removed -`
#>                            term      estimate  std.error  statistic        df
#> 1                   (Intercept) -21.374498123 7.96491209 -2.6835824  65.71094
#> 2                    JobDemands  -0.007500147 0.05525835 -0.1357288  38.94021
#> 3                       Smoking   0.072207184 0.51097303  0.1413131  47.98415
#> 4         factor(Satisfaction)2  -0.506544055 0.56499941 -0.8965391 139.35335
#> 5         factor(Satisfaction)3  -2.580503376 0.77963853 -3.3098715 100.66273
#> 6              rcs(Pain, 3)Pain  -0.090675006 0.50510774 -0.1795162  26.92182
#> 7             rcs(Pain, 3)Pain'   1.183787048 0.55697046  2.1254036  94.79276
#> 8  rcs(Tampascale, 3)Tampascale   0.583697990 0.22707747  2.5704796  77.83368
#> 9 rcs(Tampascale, 3)Tampascale'  -0.602128298 0.29484065 -2.0422160  31.45559
#>       p.value           OR    lower.EXP  upper.EXP
#> 1 0.009206677 5.214029e-10 6.460344e-17 0.00420815
#> 2 0.892734942 9.925279e-01 8.875626e-01 1.10990663
#> 3 0.888214212 1.074878e+00 3.847422e-01 3.00295282
#> 4 0.371511077 6.025744e-01 1.971829e-01 1.84141687
#> 5 0.001296125 7.573587e-02 1.612863e-02 0.35563604
#> 6 0.858876729 9.133145e-01 3.239353e-01 2.57503035
#> 7 0.036152843 3.266722e+00 1.081155e+00 9.87043962
#> 8 0.012063538 1.792655e+00 1.140659e+00 2.81733025
#> 9 0.049589266 5.476448e-01 3.002599e-01 0.99885104

pool_lr$multiparm
#> $`Step 1 - no variables removed -`
#>                      p-values D1 F-statistic
#> JobDemands           0.892487763  0.01842230
#> Smoking              0.887968553  0.01996939
#> factor(Satisfaction) 0.002611518  6.04422205
#> rcs(Pain,3)          0.014630986  4.84409246
#> rcs(Tampascale,3)    0.130741167  2.24870192

This example shows you how to apply forward selection to the above model using a p-value threshold of 0.05.

library(psfmi)

pool_lr <- psfmi_lr(data=lbpmilr, formula = Chronic ~ rcs(Pain, 3) + 
                      JobDemands + rcs(Tampascale, 3) + factor(Satisfaction) + 
                      Smoking, p.crit = 0.05, direction="FW", 
                      nimp=5, impvar="Impnr", method="D1")
#> Entered at Step 1 is - rcs(Pain,3)
#> Entered at Step 2 is - factor(Satisfaction)
#> 
#> Selection correctly terminated, 
#> No new variables entered the model

pool_lr$RR_model_final
#> $`Final model`
#>                    term   estimate std.error  statistic        df     p.value
#> 1           (Intercept) -3.6027668 1.5427414 -2.3353018  60.25659 0.022875170
#> 2 factor(Satisfaction)2 -0.4725289 0.5164342 -0.9149838 145.03888 0.361718841
#> 3 factor(Satisfaction)3 -2.3328994 0.7317131 -3.1882707 122.95905 0.001815476
#> 4      rcs(Pain, 3)Pain  0.6514983 0.4028728  1.6171315  51.09308 0.112008088
#> 5     rcs(Pain, 3)Pain'  0.4703811 0.4596490  1.0233483  75.29317 0.309419924
#>           OR   lower.EXP upper.EXP
#> 1 0.02724823 0.001245225 0.5962503
#> 2 0.62342367 0.224644070 1.7301016
#> 3 0.09701406 0.022793375 0.4129150
#> 4 1.91841309 0.854476033 4.3070942
#> 5 1.60060402 0.640677978 3.9987846

pool_lr$multiparm
#> $`Step 0 - selected - rcs(Pain,3)`
#>                        p-value D1
#> JobDemands           7.777737e-01
#> Smoking              9.371529e-01
#> factor(Satisfaction) 9.271071e-01
#> rcs(Pain,3)          3.282999e-07
#> rcs(Tampascale,3)    2.780012e-06
#> 
#> $`Step 1 - selected - factor(Satisfaction)`
#>                       p-value D1
#> JobDemands           0.952900908
#> Smoking              0.769394518
#> factor(Satisfaction) 0.004738608
#> rcs(Tampascale,3)    0.125280292

More examples for logistic, linear and Cox regression models as well as internal and external validation of prediction models can be found on the package website or in the online book Applied Missing Data Analysis.

psfmi's People

Contributors: lionel-, mwheymans


psfmi's Issues

psfmi_coxr does not work with stratified cox model

Hi all,
thank you for providing this amazing package! I have tried to use psfmi to run a stratified Cox model, but it seems this is not implemented. Interactions and splines are covered; is there any plan to include this feature in the future? Thank you!

Question about installing

I would like to thank you for the wonderful work you have done.

There is a problem installing the package.

I am currently using macOS 12.3.

The following steps were followed.

1)devtools::install_github("mwheymans/psfmi")

2)Enter one or more numbers, or an empty line to skip updates:1(All)

3)Do you want to install from sources the packages which need compilation? Yes

Then I saw the following error message:

installing to /Users/jaeman/Library/R/4.0/library/00LOCK-minqa/00new/minqa/libs

** R

** byte-compile and prepare package for lazy loading

** help

*** installing help indices

** building package indices

** testing if installed package can be loaded from temporary location

Error: package or namespace load failed for ‘minqa’ in dyn.load(file, DLLpath = DLLpath, ...):

unable to load shared object '/Users/jaeman/Library/R/4.0/library/00LOCK-minqa/00new/minqa/libs/minqa.so':

dlopen(/Users/jaeman/Library/R/4.0/library/00LOCK-minqa/00new/minqa/libs/minqa.so, 0x0006): symbol not found in flat namespace 'bobyqa'

Error: loading failed

Execution halted

ERROR: loading failed

  • removing ‘/Users/jaeman/Library/R/4.0/library/minqa’

  • restoring previous ‘/Users/jaeman/Library/R/4.0/library/minqa’

Error: Failed to install 'psfmi' from GitHub:

(converted from warning) installation of package ‘minqa’ had non-zero exit status

Question on the pooled CI for AUC from the mivalext_lr() function: extra square root?

I have imputed data in which several variables are non-missing and therefore not imputed. However, when I use mivalext_lr() to obtain the pooled AUC and 95% CI of my logistic regression, the pooled 95% CI is wider than the one I get from each individual imputed data set (which are all identical, since the variables used were not imputed); in fact, the pooled upper and lower limits are exactly the square roots of the CI limits obtained from each imputed data set. I understand the AUC should be pooled after logit transformation with Rubin's rules, combining the within-imputation variance of each AUC with the between-imputation variance. So I looked into the source code and added some print statements: the between-imputation variance (b.roc.logit) does print as 0, but I think p.se.roc.logit has an extra sqrt. Can someone help me take a look? Unless it is supposed to have a larger SE after pooling even for non-imputed variables. I tried psfmi_perform() in the same package and got the same wide CI.
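For reference, the logit-scale pooling described here can be sketched in base R with hypothetical numbers. With zero between-imputation variance, the pooled SE should simply equal the common within-imputation SE, so the pooled CI should not widen:

```r
# Rubin's Rules pooling of an AUC after logit transformation.
# auc: per-imputation AUCs; se: per-imputation SEs of the AUC.
pool_auc_logit <- function(auc, se) {
  m        <- length(auc)
  logit    <- log(auc / (1 - auc))
  se_logit <- se / (auc * (1 - auc))       # delta method on the logit scale
  q <- mean(logit)                         # pooled logit-AUC
  u <- mean(se_logit^2)                    # within-imputation variance
  b <- var(logit)                          # between-imputation variance
  se_pooled <- sqrt(u + (1 + 1 / m) * b)   # NOTE: a single sqrt only
  ci <- q + c(-1.96, 1.96) * se_pooled
  plogis(c(AUC = q, lower = ci[1], upper = ci[2]))  # back-transform
}

# Identical AUCs across imputations (variables not imputed):
# between-imputation variance is 0, so the pooled CI matches the
# per-imputation CI rather than widening.
pool_auc_logit(rep(0.80, 5), rep(0.02, 5))
```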

incorrect p-values?

I performed pooled logistic regression to pick out my predictor variables using the forward approach. The ORs seem correct to me based on other analyses, but I find the p-values very strange: according to the 95% CIs, many of the variables are not significantly associated with my outcome, yet the p-values tell another story. I think there is a problem with my p-values, but I am not sure what I did wrong or what needs to be changed to get the correct p-values. Here are my code and my results:

Code :

pool_lr <- psfmi_lr(data=dataset_mom, nimp=25, impvar="x_imputation", Outcome="M2_M_PPD", predictors=c("M0_M_nation", "M0_P_nation", "M2_P_PPD", "M00M2_PEREACC", "mother_medicine", "fchild", "number_household", "relative_poverty"), p.crit = 0.25, cat.predictors = c("M02M_CONGPAT", "Fwanted_child", "Mwanted_child", "M0_P_age", "M2_conflict","M0_siblingbis", "mother_diploma", "dad_profession", "mom_profession","M0_zone", "conab","condp","conbp"), method="D2", keep.predictors = "M02M_CONGPAT", direction = "FW")

and here are the results (found in the link):
boosting model results

Any ideas why the p-values look like this, or what I can fix?
Thank you so much for your help!

plot the pooled ROC curve

The package 'psfmi' is very useful for calculating the pooled C-statistic (AUC) with 'pool_performance'. I am wondering if there is a way to plot the pooled ROC curve (like below)? Thank you very much!

[image: ROC curve plot for a no-skill classifier and a logistic regression model]

psfmi_validate parallelisation

Hello,

Is it possible to parallelise the psfmi_validate function?

This would provide very useful speed improvements for large datasets.

Thank you
