GithubHelp home page GithubHelp logo

ICC report about report HOT 17 OPEN

easystats avatar easystats commented on May 9, 2024
ICC report

from report.

Comments (17)

strengejacke avatar strengejacke commented on May 9, 2024

I think the issue here is more difficult. The ICC that is mentioned in the paper from Koo & Li refers to Anovas, and it's calcualted in a very different way and can't be interpreted in the same way (although the "Anova" ICC and the mixed model ICC go into similar directions).

It's not optimal to have an index with identical names, though they refer to something different. For mixed models, an ICC above .15 or .2 is "high", while an ICC below .05 is "low" - although these are very rough rules of thumbs.

For mixed models, the ICC is not related to reliability. It just tells you how much of the explained variance of a model can be accounted to the random effects (i.e. the grouping structure). Ad far as I know, there are just a few software packages that can calculate the ICC for mixed models, and the approach from Nagakawa is only available in R (and via scripts in Stata, I think).

If the ICC is low, the outcome of a model has almost no variation in the defined groups (random effects), while the outcome has a large variation depending on the grouping factor when the ICC is high. For a low ICC, a simple model would return similar estimates as compared to a mixed model. For a high ICC, the estimates would differ more. But there's no general rule what ICC you need. In fact, you can justify a mixed model just by theoretical consideration, even if the ICC is low. You never lose anything when using a mixed model instead of a simple model. I personally use the ICC to see the variation of the outcome in different groups, and maybe investigate this variation further, if it's of interest. Here is a paper from me where we used the ICC to analyse variation of our outcome between countries.

from report.

DominiqueMakowski avatar DominiqueMakowski commented on May 9, 2024

I see. Within this context, would you recommend the reporting of the ICC for mixed models? If so, what would be a good generic, yet informative sentence?

from report.

strengejacke avatar strengejacke commented on May 9, 2024

I would report the ICC for mixed models. Maybe something like

"X% of the variance can explained by the model's grouping structure (random effects)."

If the ICC is below 5%, I would probably add:

"The ICC is rather low, indicating not much variation in the outcome between the groups."

And then the ICC value. However, CI for the ICC is a bit more difficult, we would need something like bootstrapping or so...

from report.

strengejacke avatar strengejacke commented on May 9, 2024

I guess, the time consuming part is performance::null_model(), where the model is updated to calculate the NULL model. This may take up to some seconds, and bootstrapping multiplicates this by a large factor. Currently I have no good idea how to speed up this process...

from report.

DominiqueMakowski avatar DominiqueMakowski commented on May 9, 2024

"X% of the variance can be explained by the model's grouping structure (random effects)."

Although consistent with its statistical definition, this formulation provides information that could be seen as overlapping with the two R2s (marginal and conditional). Would you recommend reporting both the R2s and ICC? Or only one or the other? If both, how to emphasize the eventual difference between the two?

On a loosely related note, have you ever heard or seen the use of a ratio between the variance explained by the random effects and the total variance explained by the model, that would be used to understand the relative weight of random effects to the model (independently of its absolute explanatory power)? Would it make sense?

from report.

strengejacke avatar strengejacke commented on May 9, 2024

ratio between the variance explained by the random effects and the total variance explained by the model

That's actually the definition of the ICC for mixed models.

The ICC is calculated by dividing the between-group-variance (random intercept variance) by the total variance (i.e. sum of between-group-variance and within-group (residual) variance).

The difference between ICC and R2 is that ICC relates to the variance of the random effects, while R2 relates to the "overall" variance of the models (i.e. including fixed effects variances). They're indeed very similar, but still relate to different parts of the model.

I would report both ICC and marginal and conditional R2. The wording would differ only slightly: the ICC refers to the grouping structure of the model (random effects), i.e. how much of the total model variance (R2) is explained by the random effects (ICC).

from report.

DominiqueMakowski avatar DominiqueMakowski commented on May 9, 2024

Aaaaa I missed that part! It is much clearer now. I got confused somehow by the meaning of total variance.

Anyway, I will try to add a basis for reports for frequentists mixed models, it will be easier to see how to add the ICC then.

from report.

strengejacke avatar strengejacke commented on May 9, 2024

I got confused somehow by the meaning of total variance.

Especially when reading the paper from Nagakawa et al. (2017), where he extended the compuation to models other than Gaussian. Here we have also the distributional variance etc., so it's indeed a bit confusing. And I'm no expert in reading formulas, as I'm no statistican but mostly self-taught. ;-)

from report.

DominiqueMakowski avatar DominiqueMakowski commented on May 9, 2024

[Moved from #15] Here's the current short and long report for lmer:

> report(circus::download_model("lmerMod_1"))

We fitted a linear mixed model to predict wt with cyl. The model included gear as random effects. The 
model's total explanatory power is substantial (conditional R2 = 0.63) and the part related to the fixed 
effects alone (marginal R2) is of 0.55. The model's intercept is at 0.65. 

Within this model: 
  - cyl is significant (beta = 0.40, 95% CI [-0.39, 1.20], p < .001) and medium (std. beta = 0.74).



> report(circus::download_model("lmerMod_1")) %>% to_fulltext()

We fitted a linear mixed model to predict wt with cyl (formula = wt ~ cyl). The model included gear 
as random effects (formula = ~1 | gear). Confidence intervals (CIs) and p values were computed using 
Wald approximation. Effect sizes were labelled following Cohen's (1988) recommendations. The model's 
total explanatory power is substantial (conditional R2 = 0.63) and the part related to the fixed effects 
alone (marginal R2) is of 0.55. The model's intercept, corresponding to cyl = 0, is at 0.65 (t = 1.29, 95% CI 
[-0.63, 1.93], p > .1). 

Within this model: 
  - cyl is positive, significant (beta = 0.40, SE = 0.08, t = 5.29, 95% CI [-0.39, 1.20], p < .001) and 
medium (std. beta = 0.74, std. SE = 0.14, std. 95% CI [, ]).

How could we phrase the ICC report?

"X% of the variance can explained by the model's grouping structure (random effects)."

It seems to me a bit close/overlapping with the R2s, but maybe that's because I am quite new to ICCs

from report.

strengejacke avatar strengejacke commented on May 9, 2024

Well, that actually fit's well into the report in that phrase, and would complement the missing part of "information on variances", I think.

from report.

DominiqueMakowski avatar DominiqueMakowski commented on May 9, 2024

Re-looking at it, I think there is still something I don't get. I have trouble clarifying the difference between the ICC interpreted this way and the R2...

Let's take this model for example:

> model_performance(circus::lmerMod_1)
       AIC      BIC R2_conditional R2_marginal ICC_adjusted ICC_conditional
1 71.59892 77.46187      0.6275509   0.5502792    0.1718216      0.07727173

From what I understand from this:

  • The full model (fixed + random) explains 62% (conditional R2) of the variance of the outcome
  • The fixed parts (total - random) explains 55% (marginal R2) of the variance of the outcome

In that case, I would deduct that the random effects explain 62-55=7% of the variance of the outcome, which is itself 7/62=11% of the total variance explained by the model. However, the suggested values based on ICC would say "the proportion of the variance explained by the grouping structure" (the random?) is 17% (adjusted ICC), which appears as not coherent with the previous values. Moreover, as the ICC conditional is supposed to encompass both fixed and random effects, how can it be lower than the adjusted ICC?

from report.

strengejacke avatar strengejacke commented on May 9, 2024

We should look at the formulas closely. The major difference is that ICC has not the fixed effects variance in the denominator. From the "unexplained" variances, i.e. the variances not related to the fixed part, x% are explained by the random effects. R2 has the fixed effects variance in the denominator and refers to the total model variance.

Nakagawa only report the conditional ICC once in their paper. I'm not sure when this makes sense, but usually we want to know the adjusted ICC.

That's how I understand the difference.

from report.

DominiqueMakowski avatar DominiqueMakowski commented on May 9, 2024

But if the adjusted ICC can be inferred straightforwardly from the two R2s, isn't it a redundant index? And in that case, isn't the ratio between the explained variance by the random (ICC conditional = R2 cond - R2 marg = 0.077 in the above example) and the variance explained by the model (R2 cond = 0.62), in this case 12%, a more interesting relative index of the importance of the grouping structure to the model rather than to the outcome?

from report.

strengejacke avatar strengejacke commented on May 9, 2024

The adjusted ICC cannot be directly inferred from the r-squared values.

Originally, the ICC was only calculated for the NULL models, ie without fixed effects. And only for random intercept models. Thus there were only the residual and the random effects variation. ICC was the random effects variation divided by the sum of residual and random effects variation.

Later the ICC was extended to models with fixed effects and also with more complex random effects structures, that's why it's now more similar to an r-squared value, unlike the "original" ICC.

ICC and R2 are related, but still can't be derived from each other. The ICC relates to the random effects and to the between subject or between group variation. That's probably the leading distinction.

from report.

strengejacke avatar strengejacke commented on May 9, 2024

Quote from Nakagawa 2017

it is not surprising that the coefficient of determination R2 is a commonly reported statistic, because it represents the proportion of variance explained by a linear model. The intra-class correlation coefficient (ICC) is a related statistic that quantifies the proportion of variance explained by a grouping (random) factor in multilevel/hierarchical data.

from report.

DominiqueMakowski avatar DominiqueMakowski commented on May 9, 2024

@strengejacke I am truly sorry to annoy you with this particular index, but I am trying to get it right (and I reckon that other users might have the same questions), and reading Nakagawa over and over doesn't seem to help.

I understand that ICC and R2s are not computed in the same way and not statistically related. However, in all of the following models, the conditional ICC can be derived from the two R2s:

perf <- performance::model_performance(circus::lmerMod_1)
testthat::expect_equal(perf$R2_conditional-perf$R2_marginal, perf$ICC_conditional)

perf <- performance::model_performance(circus::merMod_1)
testthat::expect_equal(perf$R2_conditional-perf$R2_marginal, perf$ICC_conditional)

perf <- performance::model_performance(circus::merMod_2)
testthat::expect_equal(perf$R2_conditional-perf$R2_marginal, perf$ICC_conditional)

Is it because there are no random slopes in these models (only random intercepts?)

Anyway, about the ICC report, for the following model:

perf <- performance::model_performance(lme4::lmer(Sepal.Length ~ Sepal.Width + (1|Species), data=iris))
perf
       AIC      BIC R2_conditional R2_marginal ICC_adjusted ICC_conditional
1 202.6361 214.6787      0.8560323  0.09061311     0.841687       0.7654192

If I follow you, you would suggest a report along those lines:

"The total model explained a proportion of variance (R2 conditional) of .86. Within this model, the fixed factors alone (R2 marginal), and the grouping random factor (ICC adjusted), explained a proportion explained a proportion of variance of respectively .09 and .84"

is that correct?

from report.

strengejacke avatar strengejacke commented on May 9, 2024

Ok, let's try to summarize a bit. Maybe your example is a bit misleading, because you get the ICC results by adding / substracting R2-results. But I think this is just coincidence.

r2_marginal <- vars$var.fixef / (vars$var.fixef + vars$var.ranef + vars$var.resid)
r2_conditional <- (vars$var.fixef + vars$var.ranef) / (vars$var.fixef + vars$var.ranef + vars$var.resid)
icc_adjusted <- vars$var.ranef / (vars$var.ranef + vars$var.resid)
icc_conditional <- vars$var.ranef / (vars$var.fixef + vars$var.ranef + vars$var.resid)

The denominator for both R2 and conditional ICC are the same, but the nominator between R2 and ICC are different. That's why you usually should not be able to directly derive ICC values from R2 and vice versa.

Here's an example with a simple random intercept model and the "old" ICC-function, as well as the new approach from Nakagawa:

library(lme4)
data("sleepstudy")
m1 <- lme4::lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)

sjstats::icc(m1)
#>   ICC (Subject): 0.5893

sjstats::icc(m1, adjusted = TRUE)
#>    Adjusted ICC: 0.5893
#> Conditional ICC: 0.4244

performance::icc(m1)
#> $ICC_adjusted
#> [1] 0.5893089
#> 
#> $ICC_conditional
#> [1] 0.4243698

performance::r2_nakagawa(m1)
#> $R2_conditional
#> [1] 0.7042555
#> 
#> $R2_marginal
#> [1] 0.2798856

As you can see, adjusted ICC and "normal" ICC (i.e. random effects variance divided by random effects + residual variance) are identical (that's to be expected).

I think the difficulty is that the ICC refers to the ratio of the between-group-variance and sum of between-group and within-group variance, and thus only accounts for the random parts of the model. The residual variance we are talking here about is not comparable to a "classical" residual variance from a simple linear model, because the residual variance in mixed models (also) refers to the random effects. This is why it is difficult to have a r-squared measure for mixed models. I think, Nakagawa came from ICC, added the fixed effects variances to the formula to approximate something that can be considered as "R2"...

For binomial models, the "classical" and Nakagawa ICC are also identical:

library(lme4)
gm1 <- glmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
              data = cbpp, family = binomial)

performance::icc(gm1)
#> $ICC_adjusted
#> [1] 0.1113561
#> 
#> $ICC_conditional
#> [1] 0.1018504

sjstats::icc(gm1)
#>   ICC (herd): 0.1114

sjstats::icc(gm1, adjusted = TRUE)
#>    Adjusted ICC: 0.1114
#> Conditional ICC: 0.1019

The adjusted ICC differs from the "classical" ICC for models with nested random effects and / or random slopes, and for models other than Gaussian or binomial, but this is just a sidenote.

from report.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.