Factor variables (broom issue, closed)

matthieugomez commented on August 20, 2024

When the formula contains terms such as `:` or `,`, it would be nice to split the term in two.

Comments (6)

dgrtwo commented on August 20, 2024

I see what you mean, but this seems like something that would best be handled in post-tidying dplyr steps. An analysis could be arbitrarily complicated, with multiple interaction terms in different combinations, and any operation to plot them would need to filter out and manipulate terms anyway.

Also, I can't reproduce the above example. Why does the z factor have one term rather than one for each level? And it's not clear what the z column in the output represents: why is it z in one case but 1/2/3 in others? (If it's a factor, does it simply have an extra level relative to the z input?) Could you suggest a reproducible example?

matthieugomez commented on August 20, 2024

I see. I now handle that using regex, but I was just wondering if something cleaner was possible. It corresponds a little bit to `tidyr::separate()`.

Attached are examples (the second row in the first post corresponds to the intercept; edited):

```r
library(data.table)
library(broom)

N <- 1e3; K <- 100
set.seed(1)
DT <- data.table(
  z = sample(3, N/K, TRUE),
  x = sample(5, N, TRUE),
  y = sample(1e6, N, TRUE)
)
tidy(lm(y ~ x + x:as.factor(z), DT))
#>              term   estimate std.error  statistic      p.value
#> 1     (Intercept) 501831.257 22033.010 22.7763371 9.041980e-93
#> 2               x  -8988.640  7681.250 -1.1702053 2.421983e-01
#> 3 x:as.factor(z)2   8682.732  6670.360  1.3016885 1.933239e-01
#> 4 x:as.factor(z)3   5488.104  7472.208  0.7344688 4.628359e-01

tidy(lm(y ~ as.factor(z) + x:as.factor(z), DT))
#>              term     estimate std.error    statistic      p.value
#> 1     (Intercept) 539837.42605  41834.66 12.904069970 2.464753e-35
#> 2   as.factor(z)2 -38945.31370  54239.54 -0.718024374 4.729110e-01
#> 3   as.factor(z)3 -70280.63443  57378.57 -1.224858572 2.209187e-01
#> 4 as.factor(z)1:x -19071.47164  12166.52 -1.567537331 1.173074e-01
#> 5 as.factor(z)2:x    -52.63436  10312.06 -0.005104157 9.959285e-01
#> 6 as.factor(z)3:x   5886.15152  12694.31  0.463684419 6.429754e-01

tidy(lm(y ~ x*as.factor(z), DT))
#>              term  estimate std.error  statistic      p.value
#> 1     (Intercept) 539837.43  41834.66 12.9040700 2.464753e-35
#> 2               x -19071.47  12166.52 -1.5675373 1.173074e-01
#> 3   as.factor(z)2 -38945.31  54239.54 -0.7180244 4.729110e-01
#> 4   as.factor(z)3 -70280.63  57378.57 -1.2248586 2.209187e-01
#> 5 x:as.factor(z)2  19018.84  15948.75  1.1924969 2.333511e-01
#> 6 x:as.factor(z)3  24957.62  17583.22  1.4194002 1.560959e-01
```
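
One way to do this split as a post-tidying step without hand-written regular expressions is to take the factor names that `lm()` stores in `fit$xlevels` and match them as literal prefixes of each `:`-separated component of a coefficient name. The sketch below is base R only; the helper name `split_terms` is hypothetical (not part of broom), and it assumes R's default treatment-contrast naming, where a coefficient name is the factor name immediately followed by the level.

```r
# Hypothetical helper: split lm() coefficient names into variable and level.
# Assumes treatment-contrast naming, e.g. "as.factor(z)2" = "as.factor(z)" + "2".
split_terms <- function(fit) {
  # Factor variables as named in the model frame, longest first, so a factor
  # whose name is a prefix of another factor's name is not matched by mistake
  fct <- as.character(names(fit$xlevels))
  fct <- fct[order(nchar(fct), decreasing = TRUE)]
  rows <- lapply(names(coef(fit)), function(term) {
    parts <- strsplit(term, ":", fixed = TRUE)[[1]]
    vars <- parts
    lvls <- character(0)
    for (i in seq_along(parts)) {
      # Literal prefix match: no need to escape "(", ")" or "." in the names
      hit <- fct[startsWith(parts[i], fct) & nchar(parts[i]) > nchar(fct)]
      if (length(hit) > 0) {
        vars[i] <- hit[1]
        lvls <- c(lvls, substring(parts[i], nchar(hit[1]) + 1))
      }
    }
    data.frame(term = term,
               variable = paste(vars, collapse = ":"),
               level = paste(lvls, collapse = ":"),
               stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

fit <- lm(mpg ~ factor(cyl) * wt, data = mtcars)
split_terms(fit)
```

For `mpg ~ factor(cyl) * wt`, a coefficient such as `factor(cyl)6:wt` is split into variable `factor(cyl):wt` and level `6`, while non-factor terms like `wt` get an empty level. With ordered factors (polynomial contrasts) the "level" part would be `.L`, `.Q`, etc. rather than a level label.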

simonthelwall commented on August 20, 2024

I agree with @matthieugomez; I would also like to see `tidy()` handle interactions. My current workflow is to perform the regression and then calculate stratum-specific effects with the survey package for each interaction term. If this could all be wrapped up in a `tidy()` function, with exponentiation and confidence intervals, it would be much neater.

I've got no idea how it could be achieved though.

```r
library(survey)

counts <- c(18,17,15,20,10,20,25,13,12)
outcome <- gl(3,1,9)
treatment <- gl(3,3)
# print(d.AD <- data.frame(treatment, outcome, counts))

glm.D93 <- glm(counts ~ outcome * treatment, family = poisson())
# anova(glm.D93)
summary(glm.D93)

# Linear combination of two coefficients, then its exponentiated estimate
svycontrast(glm.D93, c("outcome2" = 1, "outcome3:treatment3" = 1))
exp(svycontrast(glm.D93, c("outcome2" = 1, "outcome3:treatment3" = 1))[[1]])
```
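
The contrast above is a linear combination of model coefficients with a standard error of the form sqrt(w' V w), so the same kind of result, plus an exponentiated confidence interval, can be sketched in base R from `coef()` and `vcov()`. The helper `lincom()` is hypothetical (not part of broom or survey), and it assumes a Wald-type interval based on the normal distribution rather than whatever survey does internally.

```r
# Hypothetical helper: estimate and Wald CI for a linear combination of
# model coefficients, e.g. outcome2 + outcome3:treatment3
lincom <- function(fit, weights, level = 0.95, exponentiate = TRUE) {
  b <- coef(fit)
  V <- vcov(fit)
  stopifnot(all(names(weights) %in% names(b)))
  # Weight vector aligned with coef(fit); coefficients not named get weight 0
  w <- setNames(numeric(length(b)), names(b))
  w[names(weights)] <- weights
  est <- sum(w * b)
  se <- sqrt(drop(t(w) %*% V %*% w))  # sqrt(w' V w)
  z <- qnorm(1 - (1 - level) / 2)     # normal quantile, about 1.96 for 95%
  out <- c(estimate = est, std.error = se,
           conf.low = est - z * se, conf.high = est + z * se)
  if (exponentiate) {
    # Exponentiate estimate and CI limits (rate ratios for a Poisson model);
    # std.error is left on the log scale
    keep <- c("estimate", "conf.low", "conf.high")
    out[keep] <- exp(out[keep])
  }
  out
}

counts <- c(18,17,15,20,10,20,25,13,12)
outcome <- gl(3,1,9)
treatment <- gl(3,3)
glm.D93 <- glm(counts ~ outcome * treatment, family = poisson())
lincom(glm.D93, c("outcome2" = 1, "outcome3:treatment3" = 1))
```

Calling it once per interaction of interest and binding the rows together gives a tidy-style table of stratum-specific effects.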

nutterb commented on August 20, 2024

@matthieugomez, here's a sort-of solution that gets the two columns you want using regular expressions. Given the usual rules for variable names, I think this will work for nearly all factor specifications, because there really isn't a whole lot of transforming you can do to a factor. (I bring this up because I don't handle characters like `+` and `^` in the regular expressions; I can't think of an example where one would use such a thing with a factor.)

That said, I agree with @dgrtwo that this is probably not suited to what broom is trying to accomplish, at least as I understand it. Also, this solution is model-specific: while broom can combine the results of multiple models, the code below has to be applied to each model independently.

```r
library(broom)

# In mtcars, am is coded 0 = automatic, 1 = manual
mtcars2 <- transform(mtcars,
                     am = factor(am, 0:1, c("Automatic", "Manual")))
Hmisc::label(mtcars2[, c("mpg", "am", "qsec", "wt")], self=FALSE) <-
  c("Miles per Gallon", "Transmission", "Quarter Mile Time", "Weight")

fit0 <- lm(mpg ~ factor(am) * qsec + wt, data=mtcars)
tbl0 <- tidy(fit0)

# Build a regex matching the name of any factor in the model, escaping
# "(", ")" and "." so that names like factor(am) match literally
makeRegex <- function(fit){
  fctr <- attributes(terms(fit))$dataClasses
  fctr <- names(fctr)[fctr == "factor"]
  fctr_regex <- paste0(fctr, collapse="|")
  fctr_regex <- gsub("[(]", "[(]", fctr_regex)
  fctr_regex <- gsub("[)]", "[)]", fctr_regex)
  fctr_regex <- gsub("[.]", "[.]", fctr_regex)
  fctr_regex <- paste0("(", fctr_regex, ")")
  return(fctr_regex)
}

fr <- makeRegex(fit0)

# level: the term with the factor name stripped (e.g. "1" or "1:qsec")
tbl0$level <- gsub(fr, "", tbl0$term)
# term_alt: each factor component replaced by the bare factor name
tbl0$term_alt <- sapply(tbl0$term,
                        function(x, reg){
                          x <- unlist(strsplit(x, ":"))
                          x <- ifelse(grepl(reg, x),
                                      stringr::str_extract(x, reg),
                                      x)
                          paste0(x, collapse=":")
                        },
                        fr)
tbl0

# Same steps for a model whose factor is already stored in the data
fit <- lm(mpg ~ am * qsec + wt, data=mtcars2)
tbl <- tidy(fit)

fr <- makeRegex(fit)

tbl$level <- gsub(fr, "", tbl$term)
tbl$term_alt <- sapply(tbl$term,
                       function(x, reg){
                         x <- unlist(strsplit(x, ":"))
                         x <- ifelse(grepl(reg, x),
                                     stringr::str_extract(x, reg),
                                     x)
                         paste0(x, collapse=":")
                       },
                       fr)
tbl
```
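
A regex-free way to recover which formula term each coefficient belongs to is the `"assign"` attribute of the model matrix: it maps every column (and hence every coefficient) to the index of its term label in `terms(fit)`, with 0 denoting the intercept. A minimal base-R sketch, reusing the `mpg ~ factor(am) * qsec + wt` model; the helper name `coef_terms` is hypothetical.

```r
# Hypothetical helper: map each coefficient to the formula term it comes from,
# using the "assign" attribute of the model matrix instead of string parsing
coef_terms <- function(fit) {
  mm <- model.matrix(fit)
  idx <- attr(mm, "assign")  # 0 = intercept, k = k-th term label
  labels <- c("(Intercept)", attr(terms(fit), "term.labels"))
  data.frame(coefficient = colnames(mm),
             term = labels[idx + 1],
             estimate = unname(coef(fit)),
             stringsAsFactors = FALSE)
}

fit0 <- lm(mpg ~ factor(am) * qsec + wt, data = mtcars)
coef_terms(fit0)
```

Because the mapping comes from R's own model-matrix bookkeeping, it does not depend on which characters appear in the term labels (e.g. `I(x^2)`); the level part still has to be obtained by stripping the factor name from the coefficient name.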

alexpghayes commented on August 20, 2024

I agree that this would be nice, but I think it's out of scope for broom at the moment. I see broom as tidying the output of summary methods and the like that produce non-tidy output. Making those methods produce more useful information in the first place is hugely valuable, but not something for broom to take on at the moment.

As an aside, I think there's potential for a blockbuster package if you reimagined the interface for linear models and got it right.

github-actions commented on August 20, 2024

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.
