I frequently have the same need or similar needs. A simple version of this function would be straightforward enough to write as:
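library(broom)
library(dplyr)
straighten <- function(...) {
  # Tidy each model; models must be passed as named arguments, e.g. straighten(a = fit1, b = fit2)
  straight <- lapply(list(...), tidy)
  # Prefix each tidy data frame with a column holding the model's name
  straight <- lapply(names(straight), function(n) cbind(model = n, straight[[n]]))
  # rbind_all() has since been removed from dplyr; bind_rows() replaces it
  bind_rows(straight)
}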
(For now this gives a warning about strings being turned into factors; I should consider returning character vectors rather than factors from all tidy output so the results can be recombined easily.)
However, I'd be interested in seeing whether this can be done using dplyr's tools rather than adding a new utility to broom. I have a common pattern in which I write a function that performs one model fit or simulation and makes its decisions based on simple character/numeric arguments. Then I create a table of my parameter combinations, using either data.frame or expand.grid, and use group_by and do to fit the model for each combination:
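library(dplyr)
library(broom)
# Fit one model, chosen by a simple character argument
fitmod <- function(..., model = "linear") {
  if (model == "linear") lm(...)
  else if (model == "logit") glm(..., family = "binomial")
  else stop("unknown model: ", model)
}
# `dat` is assumed to be a data frame with columns y and x
results <- data.frame(model = c("linear", "logit"), stringsAsFactors = FALSE) %>%
  group_by(model) %>%
  do(tidy(fitmod(y ~ x, dat, model = .$model)))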
On its own, this looks much more complicated than the use of straighten. However, consider that you could add any number of parameters, or factorial combinations of parameters, and it would perform and label all combinations (straighten would need to give each model a unique name). If there are ways to further simplify this kind of pattern I'd be interested in supporting them.
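To make the factorial point concrete, here is a minimal base-R sketch of the same pattern. It assumes `mtcars` with `am` as the outcome and adds a second, hypothetical `predictor` parameter; `tidy_coef()` is a hypothetical stand-in for `broom::tidy()` built on `coef(summary(fit))`, and `Map()` plus `do.call(rbind, ...)` stand in for `group_by()`/`do()`. Each row of the parameter grid is fitted once, and its parameter values become label columns.

```r
# Hypothetical stand-in for broom::tidy(): coefficient table as a data frame
tidy_coef <- function(fit) {
  cf <- coef(summary(fit))
  data.frame(term = rownames(cf), estimate = cf[, 1], std.error = cf[, 2],
             row.names = NULL, stringsAsFactors = FALSE)
}

# Fit one model chosen by simple character arguments (mirrors fitmod() above)
fitmod <- function(formula, data, model = "linear") {
  if (model == "linear") lm(formula, data = data)
  else if (model == "logit") glm(formula, data = data, family = binomial)
  else stop("unknown model: ", model)
}

# Every combination of model type and (hypothetical) predictor
params <- expand.grid(model = c("linear", "logit"), predictor = c("mpg", "wt"),
                      stringsAsFactors = FALSE)

# Fit each combination and prepend its parameter values as label columns
results <- do.call(rbind, unname(Map(function(model, predictor) {
  fit <- fitmod(reformulate(predictor, response = "am"), mtcars, model = model)
  data.frame(model = model, predictor = predictor, tidy_coef(fit),
             stringsAsFactors = FALSE)
}, params$model, params$predictor)))
```

Adding another parameter only means adding a column to `params` and an argument to the mapped function; the labelling comes for free, which is the advantage over giving every model a unique name by hand.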
One other area where this comes into play is multinomial regression (package nnet). In this case there are essentially n-1 models, where n is the number of levels of the outcome factor. For example, one could model preferences for ice cream as a function of covariates, with responses of vanilla, chocolate, and strawberry. The model then outputs coefficients for the covariates for two comparisons: strawberry vs. chocolate and vanilla vs. chocolate (it uses the first level of the outcome factor, here chocolate, as the reference group). So it is similar in spirit to running separate logistic regressions against the reference group, although the coefficients are estimated jointly. Because it is all in one model, you could save the output to a list of data frames, one per comparison.
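A minimal sketch of the "list of data frames" idea, assuming `nnet::multinom()` and using `iris$Species` (levels setosa, versicolor, virginica) as a stand-in for the hypothetical ice-cream outcome, since no such data set is given here. The helper name `by_level` is illustrative only.

```r
library(nnet)  # recommended package that ships with R

# iris$Species stands in for the ice-cream outcome;
# setosa, the first factor level, is the reference group
fit <- multinom(Species ~ Sepal.Length, data = iris, trace = FALSE)

# With 3+ outcome levels, coef() returns a matrix:
# one row per non-reference level, one column per term
cf <- coef(fit)

# Split it into a named list of data frames, one per comparison with the reference
by_level <- lapply(setNames(rownames(cf), rownames(cf)), function(lvl) {
  data.frame(term = colnames(cf), estimate = unname(cf[lvl, ]),
             stringsAsFactors = FALSE)
})
```

Note that with only two outcome levels `coef()` returns a plain vector rather than a matrix, so a real tidier would need to handle that case separately.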
A list of data frames, one per model, would generalize to running multiple models. In the simple case of two models, one could then merge the two data frames on the coefficient column and create an expanded data frame with all of the coefficients and all of the output from both models. (Not my idea; someone in my office suggested this.) One would have to use all = TRUE in merge, and it would return NA for coefficients that are in one model but not the other. Also, one would need a way of telling apart the columns that belong to each model (perhaps using .1, .2, etc. as a suffix). There is no question it gets complicated.
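The two-model merge described above can be sketched in base R as follows. `tidy_coef()` is a hypothetical helper standing in for `broom::tidy()`, and the two `lm()` fits on `mtcars` are chosen only so that each model has one term the other lacks.

```r
# Hypothetical stand-in for broom::tidy(): coefficient table as a data frame
tidy_coef <- function(fit) {
  cf <- coef(summary(fit))
  data.frame(term = rownames(cf), estimate = cf[, 1], std.error = cf[, 2],
             row.names = NULL, stringsAsFactors = FALSE)
}

fit1 <- lm(mpg ~ wt + disp, data = mtcars)
fit2 <- lm(mpg ~ wt + hp, data = mtcars)

# Outer join on the term column: all = TRUE keeps terms found in only one model
# (filled with NA), and suffixes tell the two models' columns apart
combined <- merge(tidy_coef(fit1), tidy_coef(fit2), by = "term", all = TRUE,
                  suffixes = c(".1", ".2"))
```

Here `disp` gets NA in the `.2` columns and `hp` gets NA in the `.1` columns.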
The package texreg does a nice job with aligning table output for side-by-side models. Might be worth a look.
Anyway, these are just random thoughts in case they are helpful.
This is an old issue, but I run into this myself.
Should the long-term solution be an outer join of multiple tidied models on the `term` column, with adjusted column names (e.g. `estimate1`, `estimate2` if estimates come from models 1, 2, and so on)?
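One way to sketch that proposal for any number of models, assuming base R only: rename each model's value columns with its position, then fold `merge(..., all = TRUE)` over the list with `Reduce()`. Both `tidy_coef()` (a stand-in for `broom::tidy()`) and `join_tidied()` are hypothetical names, not part of broom.

```r
# Hypothetical stand-in for broom::tidy(): coefficient table as a data frame
tidy_coef <- function(fit) {
  cf <- coef(summary(fit))
  data.frame(term = rownames(cf), estimate = cf[, 1], std.error = cf[, 2],
             row.names = NULL, stringsAsFactors = FALSE)
}

# Outer-join any number of models on `term`, numbering each model's columns
# (estimate1, std.error1, estimate2, ...)
join_tidied <- function(fits) {
  tables <- lapply(seq_along(fits), function(i) {
    df <- tidy_coef(fits[[i]])
    # `term` is the first column; every other column gets the model's index
    names(df)[-1] <- paste0(names(df)[-1], i)
    df
  })
  Reduce(function(x, y) merge(x, y, by = "term", all = TRUE), tables)
}

fits <- list(lm(mpg ~ wt, data = mtcars),
             lm(mpg ~ wt + hp, data = mtcars),
             lm(mpg ~ wt + qsec, data = mtcars))
wide <- join_tidied(fits)
```

Because the columns are renamed before joining, `merge()` never needs its `suffixes` argument, which keeps the names predictable as more models are added.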
Something along these lines?
library(broom)
library(survival)
library(dplyr)  # provides %>%, which straighten() uses
fit1 <- lm(mpg ~ qsec + wt + am, data = mtcars)
fit2 <- glm(am ~ mpg + qsec + factor(gear), data = mtcars, family = binomial)
fit3 <- coxph(Surv(futime, fustat) ~ age + resid.ds, data = ovarian)
# Apply fn (tidy, glance, ...) to each model and stack the results, labelling
# each row with the argument name or, for unnamed arguments, the object name
straighten <- function(..., fn = tidy){
  fits <- list(...)
  if (is.null(names(fits))) names(fits) <- character(length(fits))
  # If a fit isn't named, use the object name
  dots <- match.call(expand.dots = FALSE)$...
  obj_nms <- vapply(dots, deparse, character(1))
  names(fits)[names(fits) == ""] <- obj_nms[names(fits) == ""]
  purrr::map2(.x = fits,
              .y = names(fits),
              .f = function(x, n){
                data.frame(model = n,
                           fn(x),
                           stringsAsFactors = FALSE)
              }) %>%
    dplyr::bind_rows()
}
straighten(x = fit1, fit2, ovarian = fit3)
fit1 <- lm(mpg ~ wt + disp, data = mtcars)
fit2 <- lm(mpg ~ wt + disp + factor(gear), data = mtcars)
library(dplyr)
library(reshape2)
straighten(fit1, fit2) %>%
select(model, term, estimate) %>%
dcast(term ~ model,
value.var = "estimate")
straighten(fit1, fit2, fn = glance)
Okay, I've been thinking about this for a while and finally have some thoughts. One feature request that keeps popping up is a tidying method that works on multiple models at once (#202, #206, and several other places I don't recall off the top of my head).
I'm increasingly of the opinion that `broom` verbs should work on a single model, mostly because working with a single model plays really well with `purrr::map` in a very explicit way. For example, the code examples above become:
library(survival)
fit1 <- lm(mpg ~ qsec + wt + am, data = mtcars)
fit2 <- glm(am ~ mpg + qsec + factor(gear), data = mtcars, family = binomial)
fit3 <- coxph(Surv(futime, fustat)~ age + resid.ds, data = ovarian)
models <- list(fit1 = fit1, fit2 = fit2, fit3 = fit3)
purrr::map_df(models, broom::tidy, .id = "model")
#> model term estimate std.error statistic p.value
#> 1 fit1 (Intercept) 9.6177805 6.959593e+00 1.381945832 1.779152e-01
#> 2 fit1 qsec 1.2258860 2.886696e-01 4.246675671 2.161737e-04
#> 3 fit1 wt -3.9165037 7.112016e-01 -5.506882344 6.952711e-06
#> 4 fit1 am 2.9358372 1.410905e+00 2.080819191 4.671551e-02
#> 5 fit2 (Intercept) 1050.1649470 7.025103e+05 0.001494875 9.988073e-01
#> 6 fit2 mpg 32.1663723 1.796845e+04 0.001790158 9.985717e-01
#> 7 fit2 qsec -99.1600110 5.862305e+04 -0.001691485 9.986504e-01
#> 8 fit2 factor(gear)4 126.1569921 8.824948e+04 0.001429549 9.988594e-01
#> 9 fit2 factor(gear)5 -60.5404408 1.555674e+05 -0.000389159 9.996895e-01
#> 10 fit3 age 0.1445394 5.141628e-02 2.811161152 4.936306e-03
#> 11 fit3 resid.ds 0.6141357 7.335776e-01 0.837178889 4.024920e-01
#> conf.low conf.high
#> 1 NA NA
#> 2 NA NA
#> 3 NA NA
#> 4 NA NA
#> 5 NA NA
#> 6 NA NA
#> 7 NA NA
#> 8 NA NA
#> 9 NA NA
#> 10 0.04376539 0.2453135
#> 11 -0.82365000 2.0519214
The output will be even nicer once we return tibbles. Since this is the workflow that I'd like to promote, I'm hesitant to also have a `straighten()` that does the same thing.
More broadly, I think there's a big need at the moment to document the `purrr::map` workflow for modelling as it pertains to `broom` (#353).
I just cut `straighten` from my branch. If you're able to comment on #115, I'll be able to commit and update my pull request.
In that case, I recommend closing this in favor of #353 (and then you'll be under 90 open issues!!! :) )
This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.
Related Issues
- tidy.anova fails with long predictor names (two lines?): Logical subscript `idx` must be size 1 or 2, not 3.
- tidy.varest contains wrong (repeated) coefficients?
- transition to cli errors
- add / polish alt text
- Regression Model Accuracy Metrics for Objects of class oohbchoice
- No tidy method for objects of class npregression
- Issues regarding CI computing for lm.beta models
- augment error with `na.action = na.exclude` in `lm`
- add conf.int and exponentiate arguments to `tidy.cch()`
- Error: No tidy method for objects of class npregression
- Possible bug with `tidy()` function on `lm.beta` object
- augment() fails
- address GHA failures re: minimum R version
- Bug in `tidy.survfit()` coming in the next release of the {survival} pkg
- Create tidy method
- Support for glmtoolbox (GEE)
- Allow choosing the type of residuals in `tidy.betareg()`
- GHA failure re: archived package
- dependency install errors on GHA
- Release broom 1.0.6