Comments (11)

bbolker commented on August 20, 2024

It's actually reasonably easy to get tidy.lm working with polr objects - one just has to define a family function for polr objects, then the rest of the machinery works.

library(MASS)
library(broom)

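# Stub family() method for polr objects so that tidy.lm's machinery runs
family.polr <- function(object, ...) NULL
# Reuse broom's lm tidier (accessed with `:::`) as the polr method
tidy.polr <- broom:::tidy.lm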

house.plr <- polr(Sat ~ Infl + Type + Cont, weights = Freq, data = housing)
tidy(house.plr)
##            term   estimate std.error statistic
## 1    InflMedium  0.5663937 0.1046528  5.412123
## 2      InflHigh  1.2888191 0.1271561 10.135720
## 3 TypeApartment -0.5723501 0.1192380 -4.800064
## 4    TypeAtrium -0.3661866 0.1551733 -2.359855
## 5   TypeTerrace -1.0910149 0.1514860 -7.202083
## 6      ContHigh  0.3602841 0.0955358  3.771195
## 7    Low|Medium -0.4961353 0.1248472 -3.973939
## 8   Medium|High  0.6907083 0.1254719  5.504883
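A variant that avoids reaching into broom's internals is to build the same table directly from the model summary. This is only a minimal sketch: `tidy_polr_sketch` is a hypothetical helper name, not a broom function, and it assumes that `coef(summary(fit))` for a polr fit returns a matrix with columns "Value", "Std. Error" and "t value".

```r
library(MASS)

# Hypothetical helper (not part of broom): tidy a polr fit from its summary table
tidy_polr_sketch <- function(fit) {
  # Assumed layout: matrix with columns "Value", "Std. Error", "t value"
  co <- coef(summary(fit))
  data.frame(
    term      = rownames(co),
    estimate  = co[, "Value"],
    std.error = co[, "Std. Error"],
    statistic = co[, "t value"],
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

# Hess = TRUE stores the Hessian so summary() does not have to refit the model
house.plr <- polr(Sat ~ Infl + Type + Cont, weights = Freq,
                  data = housing, Hess = TRUE)
res <- tidy_polr_sketch(house.plr)
print(res)
```

The result has one row per regression coefficient followed by one row per threshold (Low|Medium, Medium|High), the same layout as the output above.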

from broom.

bbolker commented on August 20, 2024

bump?

dgrtwo commented on August 20, 2024

Sorry, I've been unavailable! This work and ideas are terrific and as soon as I can in the next two days I'm going to do them justice. I was hoping for exactly this kind of exploration of models from someone.

ashander commented on August 20, 2024

Great topic. I wanted to take a look at integrating coefplot2 capabilities to extend broom to get (at least) Wald CIs. For full integration there is a license issue -- both coefplot2 and arm are GPL-equivalent as far as I can tell. Importing coefplot2::coeftab doesn't seem wise, as coefplot2 is not on CRAN.

Thoughts?

bbolker commented on August 20, 2024

If broom's interface can be extended appropriately, I have no problem ditching coefplot2 entirely (if one is willing to use ggplot, 99% of the effort of producing the coefficient plot is in tidying the coefficient tables; if desired, it would be pretty easy to write a basic-plot plot method for broom objects). I also don't have a problem copying code over, as long as I can make sure it's code I wrote myself (and hence have the right to re-license) as opposed to copied from arm. Frankly, though, nothing in arm::coefplot or coefplot2::coefplot is that hard to replicate.

Alternatively, if broom does all of the back-end stuff that coefplot2::coeftab does, I might gut coefplot2 and make it a thin layer on top of broom.

Can you give a more specific use case?

dgrtwo commented on August 20, 2024

I apologize for the late response, I have been distracted (mostly by finishing my PhD thesis). Your ideas and potential contributions could be one of the most important things to happen to broom and I'm very thankful!

Some responses:

Model types

broom can handle lm, glm, and merMod, but not bugs, polr, mcmc, glmmadmb, glmmML, MCMCglmm, or rjags. I would love for these to be added to broom and will do everything I can to help. (Of course you would get co-authorship on the package.)

Interface issues

parameter types

At the moment, the tidy method for merMod objects offers the choice of "fixed" or "random". The "fixed" case makes sense, but the other possibilities get a little bit interesting. My own most common use case for random-effects parameters is retrieving the standard deviations and correlations of the random effects, not the estimated values of the coefficients for each level of the grouping variable(s).

I think I thought about this when I was looking into tidy.merMod output, but this is the area of modeling I am least familiar with. It sounds to me like there are two "modes" of tidying: tidying the per-level estimates, or tidying the standard deviations and correlations of random effects. Perhaps this calls for a mode argument; I'm fine with the latter being the default.

If you opened up an issue or a fork we could talk about this one more.

confidence intervals etc

While many model types have well-defined and useful standard errors, others don't. In particular the Wald standard errors for GLMs can be very unreliable; profile confidence intervals are more reliable. The same is often true for random effects parameters. It would be nice to have the option to incorporate lower and upper confidence intervals in a tidy data frame, although it's a little hard to know how to get this -- would you pass a confidence-interval data frame or a likelihood profile? (For most models including glm objects it's easy and lightweight to recompute the profile confidence intervals via confint(), but for merMod objects this can be an expensive operation ...)

One thing to think about here is that the default column names that R uses for confidence intervals (2.5 %, 97.5 %) are a nuisance -- I usually use lwr and upr, although these don't specify the level -- maybe lwr_0.025, upr_0.975?

Right now a number of tidying methods take an argument conf.int that optionally returns the confidence interval, which then leads to conf.low and conf.high columns being added.
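To make the Wald versus profile distinction concrete, here is a rough sketch that puts both kinds of interval for a glm fit into one tidy data frame. The dataset and the `wald.low`/`wald.high` column names are chosen only for illustration; `conf.low`/`conf.high` follow the naming mentioned above. It assumes `confint()` on a glm object returns profile-likelihood intervals (in older R versions this relies on MASS being installed).

```r
# Poisson GLM on a built-in dataset, purely for illustration
fit <- glm(breaks ~ wool + tension, family = poisson, data = warpbreaks)

# Wald intervals: estimate +/- normal quantile * standard error
wald <- confint.default(fit, level = 0.95)

# Profile-likelihood intervals via the glm method of confint()
prof <- suppressMessages(confint(fit, level = 0.95))

ci <- data.frame(
  term      = names(coef(fit)),
  estimate  = unname(coef(fit)),
  wald.low  = wald[, 1],
  wald.high = wald[, 2],
  conf.low  = prof[, 1],
  conf.high = prof[, 2],
  row.names = NULL
)
print(ci)
```

For a well-behaved model like this one the two sets of intervals should be close; the differences matter more when the likelihood is far from quadratic.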

In my view we actually shouldn't include the confidence level in the column names. Instead, if you stored multiple confidence levels in the same object, there should be a third column, conf.level, that contains the level. That would allow, for example, a ggplotted curve to have both a "thin" confidence interval and a "fat" one.

I would leave the conf.level to be created by the user, though, in the case that she is combining multiple confidence levels. By convention I avoid having broom return values that describe what was passed to the model rather than what is returned (for example, glance.glm does not return a column with family).
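A sketch of what the user-side combination could look like, using only base R. `tidy_ci` is a hypothetical helper standing in for a tidy call with conf.int; the user adds the conf.level column and stacks the results.

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Hypothetical helper: one tidy data frame of intervals at a given level
tidy_ci <- function(level, fit) {
  ci <- confint(fit, level = level)
  data.frame(
    term       = rownames(ci),
    estimate   = unname(coef(fit)),
    conf.low   = ci[, 1],
    conf.high  = ci[, 2],
    conf.level = level,   # added by the user, not returned by the tidier
    row.names  = NULL
  )
}

# Stack a "fat" 50% interval and a "thin" 95% interval in long format
both <- do.call(rbind, lapply(c(0.50, 0.95), tidy_ci, fit = fit))
print(both)
```

With the level stored as a column rather than in the column names, a plotting layer can map conf.level to line width without reshaping.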

There are sometimes additional decisions: for Bayesian models, should highest posterior density (coda::HPDinterval) or quantiles of the marginal posterior be used?

That's a good question; it should probably be set as an option to the tidy function.
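To illustrate why the choice matters, here is a small base-R sketch comparing the two interval types on simulated, skewed draws. `hpd_interval` is a hypothetical helper that takes the shortest interval covering the requested share of sorted samples; it is meant to mirror the idea behind coda::HPDinterval, not to replace it.

```r
# Hypothetical helper: shortest interval containing about `prob` of the samples
hpd_interval <- function(x, prob = 0.95) {
  x <- sort(x)
  n <- length(x)
  gap <- max(1, min(n - 1, round(prob * n)))   # number of order statistics spanned
  widths <- x[(gap + 1):n] - x[1:(n - gap)]    # width of every candidate interval
  i <- which.min(widths)
  c(lower = x[i], upper = x[i + gap])
}

set.seed(1)
draws <- rexp(10000)   # stand-in for a skewed marginal posterior sample

hpd <- hpd_interval(draws, prob = 0.95)
qi  <- quantile(draws, probs = c(0.025, 0.975))
print(hpd)
print(qi)
```

For a right-skewed sample like this, the HPD interval starts closer to zero and is shorter than the equal-tailed quantile interval, so an option on the tidy function would change the reported bounds noticeably.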

Wish Lists

  • automatically rescale parameters, i.e. take a model that was fitted without centering and scaling parameters and adjust the parameters and SEs accordingly. This is fairly straightforward if the means and standard deviations of the original predictor variables are known (so is the inverse transformation). arm::coefplot does this, I think, but only by re-fitting the model, which is an unnecessary expense.

Interesting; I'm unfamiliar with this process. Could a rescale_model function work directly on the tidied output and the original data? How much would it need to know about the model?
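For the simplest case the answer seems to be yes. The sketch below assumes a linear model with main effects only (no interactions or polynomial terms) and rescales slopes only: multiplying each estimate and standard error by the standard deviation of its predictor reproduces the fit on centered and scaled predictors without refitting. The refit appears here only as a check.

```r
fit_raw <- lm(mpg ~ wt + hp, data = mtcars)

# Standard deviations of the original predictors
sds <- sapply(mtcars[c("wt", "hp")], sd)

# Rescale slopes and their standard errors from the coefficient table
co <- coef(summary(fit_raw))
rescaled <- data.frame(
  term      = names(sds),
  estimate  = co[names(sds), "Estimate"] * sds,
  std.error = co[names(sds), "Std. Error"] * sds,
  row.names = NULL
)
print(rescaled)

# Check against an explicit refit on centered and scaled predictors
fit_std <- lm(mpg ~ scale(wt) + scale(hp), data = mtcars)
print(coef(summary(fit_std))[-1, c("Estimate", "Std. Error")])
```

The intercept is left out on purpose: after centering it shifts by a combination of the slopes and predictor means, and its standard error needs the coefficient covariance matrix, which the tidied output alone does not carry.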

  • automatically change scale of parameter/confidence intervals; for example, Wald estimates of variance parameters are much better on the log scale, and this conversion is easy [i.e. if the std dev estimate is b and its standard error is s, then the confidence intervals based on a log scale are exp(log(b) +/- 1.96*s/b)]

This sounds like it works well on tidy output. We already give this option (exponentiate = TRUE) for tidy.lm and a few other tidying methods. It could be an option for any tidy method.
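The log-scale formula quoted above is short enough to write out. In this sketch `log_scale_ci` is a hypothetical helper, and the values of b and s are made up to show the effect; s/b is the delta-method standard error of log(b).

```r
# Hypothetical helper: Wald interval computed on the log scale, then back-transformed
log_scale_ci <- function(b, s, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)
  se_log <- s / b   # delta-method standard error of log(b)
  c(conf.low  = exp(log(b) - z * se_log),
    conf.high = exp(log(b) + z * se_log))
}

# Made-up standard deviation estimate with a large standard error
b <- 2.0
s <- 1.5

ci    <- log_scale_ci(b, s)
naive <- b + c(-1, 1) * qnorm(0.975) * s   # Wald interval on the original scale
print(ci)
print(naive)
```

The naive interval dips below zero, which is impossible for a standard deviation, while the log-scale interval stays positive and is asymmetric around b.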

Alternatively, if broom does all of the back-end stuff that coefplot2::coeftab does, I might gut coefplot2 and make it a thin layer on top of broom.

I think this sounds like a great idea. I've been interested in general methods for plotting standard tidy outputs and could try to contribute my own as well.

ashander commented on August 20, 2024

@bbolker just thinking of a subset of your use cases (getting confidence intervals in the output of tidy with effects="fixed" for easy plotting). The point you raise with the current output of tidy for random effects is a good one. Ultimately it might make sense for tidy.merMod and other mixed models to tidy both fixed and random into one data.frame, or have the option to output only fixed or only random.

I also didn't mean to imply that arm::coeftab or coefplot2::coeftab would be super hard to reproduce; I was just noting that actually integrating their code might raise a license issue. A quick and dirty approach would be to import the methods and use them within tidy methods for the object types that work with the two coefplot functions. In the long term this isn't desirable: for coefplot2 because it isn't on CRAN, and more generally for maintainability. It might be a good strategy, though, for developing tests against which broom-specific methods can be written. These could also take the conf.int argument mentioned.

Both your notions -- of writing plot methods for broom objects, or making coefplot2 a thin wrapper if broom can handle all tidying -- seem great. The strength of broom is its clear goal and well-defined outputs (data.frame), but lightweight plotting seems like a natural addition.

bbolker commented on August 20, 2024

I haven't had a chance to absorb everything here, but just wanted to point out that I have started to tackle some of it on my own broom fork; I will presumably submit a pull request eventually, but feel free to drop in and comment on what I'm trying there.

RMHogervorst commented on August 20, 2024

I would still like the extensions for polr etc.

Two questions:

  1. Is there a manual or something to help with extending broom?
  2. Since the output of summary(polr) looks just like that of normal models, is there a way to force broom to use its default lm method?

alexpghayes commented on August 20, 2024

@bbolker Are there lingering issues in this thread or is it safe to close?

github-actions commented on August 20, 2024

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.
