Factor variables (broom issue, 6 comments, closed)

matthieugomez commented on August 20, 2024
Factor variables

from broom.

Comments (6)

dgrtwo commented on August 20, 2024

I see what you mean, but this seems like something best handled in post-tidying dplyr steps. An analysis could be arbitrarily complicated, with multiple interaction terms in different combinations, and any operation to plot them would need to filter and manipulate terms anyway.

Also, I can't reproduce the above example. Why does the z factor have one term rather than one for each level? And it's not clear what the z column in the output represents: why is it z in one case but 1/2/3 in others? (If it's a factor, does it simply have an extra level relative to the z input?) Could you suggest a reproducible example?
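
The post-tidying steps suggested above could look something like the following. This is only a minimal sketch on the built-in mtcars data; the model, the `interactions` name, and the `level` column are illustrative assumptions, not anything from the original post.

```r
library(broom)
library(dplyr)

# Illustrative model: a numeric predictor interacted with a factor.
fit <- lm(mpg ~ wt * factor(cyl), data = mtcars)

# Keep only the interaction rows, then pull the factor level out of the
# term name by dropping everything up to the closing parenthesis.
interactions <- tidy(fit) %>%
  filter(grepl(":", term, fixed = TRUE)) %>%
  mutate(level = sub(".*\\)", "", term))

interactions
```

The string matching here is tied to how this particular formula names its terms, so a different model would likely need a different pattern.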

matthieugomez commented on August 20, 2024

I see. I now handle that using regex, but I was just wondering if something cleaner was possible. It corresponds a little bit to tidyr::separate.

Attached are examples. (Edited: the second row in the first post corresponds to the intercept.)

library(data.table)
library(broom)

N=1e3; K=100
set.seed(1)
DT <- data.table(
  z = sample(3, N/K, TRUE),
  x =  sample(5, N, TRUE),
  y =  sample(1e6, N, TRUE)                       
)
tidy(lm(y~ x+ x:as.factor(z), DT))
#>              term   estimate std.error  statistic      p.value
#> 1     (Intercept) 501831.257 22033.010 22.7763371 9.041980e-93
#> 2               x  -8988.640  7681.250 -1.1702053 2.421983e-01
#> 3 x:as.factor(z)2   8682.732  6670.360  1.3016885 1.933239e-01
#> 4 x:as.factor(z)3   5488.104  7472.208  0.7344688 4.628359e-01

tidy(lm(y~ as.factor(z)+ x:as.factor(z), DT))
#>              term     estimate std.error    statistic      p.value
#> 1     (Intercept) 539837.42605  41834.66 12.904069970 2.464753e-35
#> 2   as.factor(z)2 -38945.31370  54239.54 -0.718024374 4.729110e-01
#> 3   as.factor(z)3 -70280.63443  57378.57 -1.224858572 2.209187e-01
#> 4 as.factor(z)1:x -19071.47164  12166.52 -1.567537331 1.173074e-01
#> 5 as.factor(z)2:x    -52.63436  10312.06 -0.005104157 9.959285e-01
#> 6 as.factor(z)3:x   5886.15152  12694.31  0.463684419 6.429754e-01

tidy(lm(y~ x*as.factor(z), DT))
#>              term  estimate std.error  statistic      p.value
#> 1     (Intercept) 539837.43  41834.66 12.9040700 2.464753e-35
#> 2               x -19071.47  12166.52 -1.5675373 1.173074e-01
#> 3   as.factor(z)2 -38945.31  54239.54 -0.7180244 4.729110e-01
#> 4   as.factor(z)3 -70280.63  57378.57 -1.2248586 2.209187e-01
#> 5 x:as.factor(z)2  19018.84  15948.75  1.1924969 2.333511e-01
#> 6 x:as.factor(z)3  24957.62  17583.22  1.4194002 1.560959e-01
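
For comparison, the tidyr::separate idea mentioned above might be sketched as follows. The data frame `df`, the column names `term1`, `term2` and `z_level`, and the hard-coded `as.factor(z)` prefix are all assumptions made for illustration; the data are generated with base R here, so the estimates will not match the output above.

```r
library(broom)
library(tidyr)

set.seed(1)
n <- 1000
df <- data.frame(
  z = sample(3, n, TRUE),
  x = sample(5, n, TRUE),
  y = sample(1e6, n, TRUE)
)

tbl <- tidy(lm(y ~ x * as.factor(z), data = df))

# Split "a:b" into two columns; main effects get NA in the second one.
tbl <- separate(tbl, term, into = c("term1", "term2"), sep = ":",
                fill = "right", remove = FALSE)

# Strip the factor prefix from whichever component carries it,
# leaving only the level (NA for terms that do not involve z).
prefix <- "^as\\.factor\\(z\\)"
tbl$z_level <- ifelse(
  grepl(prefix, tbl$term1), sub(prefix, "", tbl$term1),
  ifelse(grepl(prefix, tbl$term2), sub(prefix, "", tbl$term2), NA_character_)
)

tbl[, c("term", "term1", "term2", "z_level")]
```

This still relies on a regular expression to recover the level, so it is arguably tidier than a pure regex approach rather than a replacement for it.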

simonthelwall commented on August 20, 2024

I agree with @matthieugomez; I would also like to see tidy handle interactions. My current workflow is to perform the regression and then calculate stratum-specific effects with the survey package for each interaction term. If this could all be wrapped up in a tidy() function, with exponentiation and confidence intervals, it would be much neater.

I've got no idea how it could be achieved, though.

counts <- c(18,17,15,20,10,20,25,13,12)
outcome <- gl(3,1,9)
treatment <- gl(3,3)
# print(d.AD <- data.frame(treatment, outcome, counts))

glm.D93 <- glm(counts ~ outcome * treatment, family = poisson())
#anova(glm.D93)
summary(glm.D93)

library(survey)
# Linear combination of coefficients on the log scale, then exponentiated
svycontrast(glm.D93, c("outcome2" = 1, "outcome3:treatment3" = 1))
exp(svycontrast(glm.D93, c("outcome2" = 1, "outcome3:treatment3" = 1))[[1]])
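
The same kind of combined estimate, with a confidence interval, can also be sketched in base R from coef() and vcov(). This is an illustrative assumption about what a tidy-style result might contain, not an existing broom feature. The contrast chosen here (outcome 2 within treatment 3) and the names `w` and `stratum_effect` are hypothetical, and the interval is a simple Wald interval.

```r
counts <- c(18, 17, 15, 20, 10, 20, 25, 13, 12)
outcome <- gl(3, 1, 9)
treatment <- gl(3, 3)
fit <- glm(counts ~ outcome * treatment, family = poisson())

# Weight vector over all coefficients: 1 for the terms to be summed.
w <- setNames(numeric(length(coef(fit))), names(coef(fit)))
w[c("outcome2", "outcome2:treatment3")] <- 1

# Point estimate and standard error of the linear combination (log scale).
est <- sum(w * coef(fit))
se <- sqrt(drop(t(w) %*% vcov(fit) %*% w))

# Wald 95% interval, exponentiated to the rate-ratio scale.
stratum_effect <- data.frame(
  term = "outcome2 within treatment3",
  estimate = exp(est),
  conf.low = exp(est - qnorm(0.975) * se),
  conf.high = exp(est + qnorm(0.975) * se)
)
stratum_effect
```

Returning a one-row data frame keeps the result in the same shape as tidy() output, so several such contrasts could be stacked with rbind().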

nutterb commented on August 20, 2024

@matthieugomez, here's a sort-of solution that gets the two columns you want using regular expressions. Based on the common rules for variable names, I think that this will work for nearly all factor specifications because there really isn't a whole lot of transforming you can do to a factor. (I bring this up because I don't handle things like + and ^ in the regular expressions. I can't think of an example where one would use such a thing with a factor.)

But I'll agree with @dgrtwo that this is probably not something suited to what broom is trying to accomplish, at least not in my understanding. Also, this solution is model-specific, so while broom can combine the results of multiple models, the code below has to be applied to each model independently.

library(broom)

# In mtcars, am is coded 0 = automatic, 1 = manual
mtcars2 <- transform(mtcars,
                     am = factor(am, 0:1, c("Automatic", "Manual")))
Hmisc::label(mtcars2[, c("mpg", "am", "qsec", "wt")], self=FALSE) <- 
  c("Miles per Gallon", "Transmission", "Quarter Mile Time", "Weight")

fit0 <- lm(mpg ~ factor(am) * qsec + wt, data=mtcars)
tbl0 <- tidy(fit0)

makeRegex <- function(fit){
  fctr <- attributes(terms(fit))$dataClasses
  fctr <- names(fctr)[fctr == "factor"]
  fctr_regex <- paste0(fctr, collapse="|")
  fctr_regex <- gsub("[(]", "[(]", fctr_regex)
  fctr_regex <- gsub("[)]", "[)]", fctr_regex)
  fctr_regex <- gsub("[.]", "[.]", fctr_regex)
  fctr_regex <- paste0("(", fctr_regex, ")")
  return(fctr_regex)
}

fr <- makeRegex(fit0)

tbl0$level <- gsub(fr, "", tbl0$term)
tbl0$term_alt <- sapply(tbl0$term, 
                        function(x, reg){
                          x <- unlist(strsplit(x, ":"))
                          x <- ifelse(grepl(reg, x), 
                                      stringr::str_extract(x, reg),
                                      x)
                          x <- paste0(x, collapse=":")
                        },
                        fr)
tbl0

fit <- lm(mpg ~ am * qsec + wt, data=mtcars2)
tbl <- tidy(fit)

fr <- makeRegex(fit)

tbl$level <- gsub(fr, "", tbl$term)
tbl$term_alt <- sapply(tbl$term, 
                        function(x, reg){
                          x <- unlist(strsplit(x, ":"))
                          x <- ifelse(grepl(reg, x), 
                                      stringr::str_extract(x, reg),
                                      x)
                          x <- paste0(x, collapse=":")
                        },
                        fr)
tbl

alexpghayes commented on August 20, 2024

I agree that this would be nice, but I think it's out of scope for broom at the moment. I see broom as tidying summary methods, etc., that produce non-tidy output. Making those methods produce more useful information in the first place is hugely valuable, but not something for broom to take on at the moment.

As an aside, I think there's potential for a blockbuster package if you reimagined the interface for linear models and got it right.

github-actions commented on August 20, 2024

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.
