Tidy multiple models at once (broom, closed)

tidymodels commented on July 19, 2024
Tidy multiple models at once

from broom.

Comments (8)

dgrtwo commented on July 19, 2024

I frequently have the same need or similar needs. A simple version of this function would be straightforward enough to write as:

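library(broom)
library(dplyr)

# Models must be passed as named arguments; the names become the `model` column.
straighten <- function(...) {
    straight <- lapply(list(...), tidy)
    straight <- lapply(names(straight), function(n) cbind(model=n, straight[[n]]))
    bind_rows(straight)  # replaces rbind_all(), which dplyr deprecated
}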

(For now this gives a warning for turning strings into factors; I should consider turning all tidy output into character vectors so they can be easily recombined).

However, I'd be interested in seeing whether this can be done using dplyr's tools rather than adding a new utility to broom. I have a common pattern wherein I write a function that performs one model fit or simulation, and makes its decisions based on simple character/numeric arguments. Then I create a table of my parameter combinations, using either data.frame or expand.grid, and use group_by and do to perform the model:

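library(broom)
library(dplyr)

# Fit one model; the `model` argument selects which kind.
fitmod <- function(..., model="linear") {
    if (model == "linear") lm(...)
    else if (model == "logit") glm(..., family="binomial")
    else stop("unknown model: ", model)
}

# `dat` is assumed to be a data frame with columns y and x.
results <- data.frame(model=c("linear", "logit"), stringsAsFactors=FALSE) %>%
    group_by(model) %>%
    do(tidy(fitmod(y ~ x, dat, model=.$model)))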

On its own, this looks much more complicated than the use of straighten. However, consider that you could add any number of parameters, or factorial combinations of parameters, and it would perform and label all combinations (straighten would need to give each model a unique name). If there are ways to further simplify this kind of pattern I'd be interested in supporting them.
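To illustrate the factorial case, here is a minimal sketch of the same pattern with two parameters. The simulated `dat`, the helper `fit_one`, and the `degree` parameter are all made up for this example and are not part of broom or dplyr.

```r
library(broom)
library(dplyr)

# Simulated data, purely for illustration.
set.seed(1)
dat <- data.frame(x = rnorm(100))
dat$y <- rbinom(100, 1, plogis(dat$x))

# Hypothetical fitting function driven by two simple arguments.
fit_one <- function(data, model, degree) {
    f <- if (degree == 1) y ~ x else y ~ poly(x, 2)
    if (model == "linear") lm(f, data = data)
    else glm(f, data = data, family = "binomial")
}

# One row per parameter combination.
params <- expand.grid(model = c("linear", "logit"), degree = 1:2,
                      stringsAsFactors = FALSE)

# Fit and tidy once per combination; do() labels each block of rows
# with the grouping columns.
results <- params %>%
    group_by(model, degree) %>%
    do(tidy(fit_one(dat, .$model, .$degree)))
```

Each combination of `model` and `degree` ends up as its own labelled block of coefficient rows, so no unique model names have to be invented by hand.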

markdanese commented on July 19, 2024

One other area where this comes into play is with multinomial regression (package nnet). In this case there are essentially n-1 models, where n is the number of levels in the outcome variable. For example, one could model preferences for ice cream as a function of covariates, with responses of vanilla, chocolate, and strawberry. In this case, the model essentially outputs coefficients for the covariates for two models: strawberry vs. chocolate and vanilla vs. chocolate (it uses the first level of the factor that defines the outcome as the reference group in the model). So, in this case, it is essentially the same as running separate logistic regression models. However, it is all in one model, so you could save the output to a list of data.frames.
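A rough sketch of that idea, using iris as a stand-in for the ice cream data (an assumption made only so the example runs): the coefficient matrix of a multinom fit has one row per non-reference level, so it can be split into one data frame per implied comparison.

```r
library(nnet)

# Illustrative fit: Species has three levels, so there are two sets of coefficients.
fit <- multinom(Species ~ Sepal.Length + Sepal.Width, data = iris, trace = FALSE)

co <- coef(fit)  # matrix: one row per non-reference level, one column per term

# One data frame per comparison against the reference (first) level.
per_level <- lapply(rownames(co), function(lv) {
    data.frame(term = colnames(co), estimate = unname(co[lv, ]),
               stringsAsFactors = FALSE)
})
names(per_level) <- paste(rownames(co), "vs", levels(iris$Species)[1])
```

This only carries the point estimates; standard errors would have to be pulled from `summary(fit)` in the same way.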

A list of data frames, one per model, would generalize to running multiple models. In the simple case of two models, one could then merge the two data frames on the coefficient column and create an expanded data frame with all of the coefficients from both models, and all of the output for all of the models. (Not my idea, but someone in my office suggested this.) One would have to use all = TRUE in merge and it would return NA for coefficients that are in one model but not another. Also, one would have to have a way of differentiating the columns that apply to each model (perhaps using .1, .2, etc. as a suffix). There is no question it gets complicated.
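For the two-model case, the merge described above might look like the following sketch. The two mtcars fits are arbitrary examples chosen so that each model has one term the other lacks.

```r
library(broom)

fit1 <- lm(mpg ~ wt + disp, data = mtcars)
fit2 <- lm(mpg ~ wt + hp, data = mtcars)

# Outer join on the coefficient name; terms missing from one model are
# filled with NA, and the suffixes mark which model each column came from.
wide <- merge(tidy(fit1), tidy(fit2), by = "term", all = TRUE,
              suffixes = c(".1", ".2"))
```

Here `disp` has NA in the `.2` columns and `hp` has NA in the `.1` columns.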

The package texreg does a nice job with aligning table output for side-by-side models. Might be worth a look.

Anyway, these are just random thoughts in case they are helpful.

mbojan commented on July 19, 2024

This is an old issue, but I have run into this myself.

Should the long-term solution to this be an outer join of multiple tidied models on the term column with adjusted column names (e.g. estimate1, estimate2 if estimates come from models 1, 2, and so on)?
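One way such a join could be prototyped for any number of models, as a sketch rather than a proposed broom API: number every column except `term` by model position, then fold the tidied tables together with successive outer joins. The three mtcars fits are placeholders.

```r
library(broom)

models <- list(
    lm(mpg ~ wt, data = mtcars),
    lm(mpg ~ wt + disp, data = mtcars),
    lm(mpg ~ wt + disp + hp, data = mtcars)
)

# Tidy each model and append its position to every column except `term`.
tidied <- lapply(seq_along(models), function(i) {
    td <- as.data.frame(tidy(models[[i]]))
    other <- names(td) != "term"
    names(td)[other] <- paste0(names(td)[other], i)
    td
})

# Successive outer joins on `term`.
wide <- Reduce(function(a, b) merge(a, b, by = "term", all = TRUE), tidied)
```

The result has one row per distinct term and columns such as `estimate1`, `estimate2`, `estimate3`, with NA wherever a model does not include that term.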

nutterb commented on July 19, 2024

Something along these lines?

library(broom)
library(dplyr)     # provides %>%, which straighten() uses below
library(survival)
fit1 <- lm(mpg ~ qsec + wt + am, data = mtcars)
fit2 <- glm(am ~ mpg + qsec + factor(gear), data = mtcars, family = binomial)
fit3 <- coxph(Surv(futime, fustat)~ age + resid.ds, data = ovarian)


straighten <- function(..., fn = tidy){
  fits <- list(...)
  if (is.null(names(fits))) names(fits) <- character(length(fits))
  
  # If a fit isn't named, use the object name
  dots <- match.call(expand.dots = FALSE)$...
  obj_nms <- vapply(dots, deparse, character(1))
  names(fits)[names(fits) == ""] <- obj_nms[names(fits) == ""]
  
  purrr::map2(.x = fits,
              .y = names(fits),
              .f = function(x, n){
                data.frame(model = n, 
                           fn(x),
                           stringsAsFactors = FALSE)
              }) %>%
    dplyr::bind_rows()
}

straighten(x = fit1, fit2, ovarian = fit3)



fit1 <- lm(mpg ~ wt + disp, data = mtcars)
fit2 <- lm(mpg ~ wt + disp + factor(gear), data = mtcars)

library(dplyr)
library(reshape2)
straighten(fit1, fit2) %>% 
  select(model, term, estimate) %>% 
  dcast(term ~ model, 
        value.var = "estimate")  


straighten(fit1, fit2, fn = glance)

alexpghayes commented on July 19, 2024

Okay, I've been thinking about this for a while and finally have some thoughts. One feature request that keeps popping up again and again is a tidying method that works on multiple models at once (#202, #206, several other places I don't recall off the top of my head).

I'm increasingly of the opinion that broom verbs should work on a single model, mostly because working with a single model plays really well with purrr::map in a very explicit way. For example, the code examples above become:

library(survival)

fit1 <- lm(mpg ~ qsec + wt + am, data = mtcars)
fit2 <- glm(am ~ mpg + qsec + factor(gear), data = mtcars, family = binomial)
fit3 <- coxph(Surv(futime, fustat)~ age + resid.ds, data = ovarian)

models <- list(fit1 = fit1, fit2 = fit2, fit3 = fit3)
purrr::map_df(models, broom::tidy, .id = "model")
#>    model          term     estimate    std.error    statistic      p.value
#> 1   fit1   (Intercept)    9.6177805 6.959593e+00  1.381945832 1.779152e-01
#> 2   fit1          qsec    1.2258860 2.886696e-01  4.246675671 2.161737e-04
#> 3   fit1            wt   -3.9165037 7.112016e-01 -5.506882344 6.952711e-06
#> 4   fit1            am    2.9358372 1.410905e+00  2.080819191 4.671551e-02
#> 5   fit2   (Intercept) 1050.1649470 7.025103e+05  0.001494875 9.988073e-01
#> 6   fit2           mpg   32.1663723 1.796845e+04  0.001790158 9.985717e-01
#> 7   fit2          qsec  -99.1600110 5.862305e+04 -0.001691485 9.986504e-01
#> 8   fit2 factor(gear)4  126.1569921 8.824948e+04  0.001429549 9.988594e-01
#> 9   fit2 factor(gear)5  -60.5404408 1.555674e+05 -0.000389159 9.996895e-01
#> 10  fit3           age    0.1445394 5.141628e-02  2.811161152 4.936306e-03
#> 11  fit3      resid.ds    0.6141357 7.335776e-01  0.837178889 4.024920e-01
#>       conf.low conf.high
#> 1           NA        NA
#> 2           NA        NA
#> 3           NA        NA
#> 4           NA        NA
#> 5           NA        NA
#> 6           NA        NA
#> 7           NA        NA
#> 8           NA        NA
#> 9           NA        NA
#> 10  0.04376539 0.2453135
#> 11 -0.82365000 2.0519214

The output will be even nicer once we return tibbles. Since this is the workflow that I'd like to promote, I'm hesitant to also have a straighten that does the same thing.

More broadly, I think there's a big need at the moment to document the purrr::map workflow for modelling as pertains to broom (#353).
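As a rough sketch of what that workflow looks like beyond a hand-built list of models (the split of mtcars by `cyl` is just an illustrative choice): fit one model per subset with purrr::map, then row-bind the tidied or glanced output with an id column.

```r
library(broom)
library(purrr)

# Fit the same regression within each cylinder group of mtcars.
fits <- split(mtcars, mtcars$cyl) %>%
    map(~ lm(mpg ~ wt, data = .x))

coefs     <- map_df(fits, tidy, .id = "cyl")    # one row per term per group
fit_stats <- map_df(fits, glance, .id = "cyl")  # one row per group
```

Because each verb handles a single model, swapping `tidy` for `glance` (or `augment`) needs no change to the surrounding code.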

nutterb commented on July 19, 2024

I just cut straighten from my branch. If you're able to comment on #115, I'll be able to commit and update my pull request.

nutterb commented on July 19, 2024

In that case, I recommend closing this in favor of #353 (and then you'll be under 90 open issues!!! :) )

github-actions commented on July 19, 2024

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.
