lindeloev / tests-as-linear
Common statistical tests are linear models (or: how to teach stats)
Home Page: https://lindeloev.github.io/tests-as-linear/
Usually, we use a symmetric matrix for the Sigma parameter of the MASS::mvrnorm function.
# Fixed correlation
D_correlation = data.frame(MASS::mvrnorm(30, mu = c(0.9, 0.9),
Sigma = matrix(c(1, 0.8, 1, 0.8), ncol = 2), empirical = TRUE)) # Correlated data
I think it should be Sigma = matrix(c(1, 0.8, 0.8, 1), ncol = 2), i.e. a symmetric matrix with unit variances on the diagonal.
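A quick numpy check of the same point (my own sketch, not code from the notebook): R's matrix() fills column-major, so c(1, 0.8, 1, 0.8) produces an asymmetric matrix, which is not a valid covariance matrix.

```python
import numpy as np

# The reported Sigma, matrix(c(1, 0.8, 1, 0.8), ncol = 2), fills column-major
# in R, giving [[1.0, 1.0], [0.8, 0.8]] -- not symmetric.
bad_sigma = np.array([[1.0, 1.0],
                      [0.8, 0.8]])
assert not np.allclose(bad_sigma, bad_sigma.T)

# The intended correlation-0.8 matrix with unit variances is symmetric:
good_sigma = np.array([[1.0, 0.8],
                       [0.8, 1.0]])
assert np.allclose(good_sigma, good_sigma.T)

# numpy's sampler accepts it without complaint:
rng = np.random.default_rng(0)
data = rng.multivariate_normal(mean=[0.9, 0.9], cov=good_sigma, size=30)
print(data.shape)  # (30, 2)
```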
I find the "Exact?" column of the "Common statistical tests are linear models" pdf to be somewhat misleading, since the "Exact?" column links to simulations that show correspondence for sufficiently large n. My concerns would be alleviated if the column were renamed "Correspondence" or "Equivalence".
I really appreciate this project: nice work!
First, just want to say that I love this! So thanks for the work.
Maybe you can add a little note for the section on correlations:
There is a difference between corr(x, y) and lm(y ~ 1 + x). Correlation is commutative, corr(x, y) = corr(y, x), but lm(y ~ 1 + x) ≠ lm(x ~ 1 + y). It's especially relevant when both x and y contain measurement error.
A good reference for this is here:
https://elifesciences.org/articles/00638
The tls package in R provides one option for computing this.
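To make the asymmetry concrete, here is a small numpy demonstration (my sketch, with arbitrary simulated data): the correlation is symmetric in its arguments, but the two regression slopes differ because each divides the covariance by a different variance.

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.normal(size=200)
y = 0.5 * x + rng.normal(scale=1.5, size=200)  # y carries extra noise

# Correlation is commutative:
r_xy = np.corrcoef(x, y)[0, 1]
r_yx = np.corrcoef(y, x)[0, 1]
assert np.isclose(r_xy, r_yx)

# The regression slopes are not: slope(y ~ x) = cov/var(x),
# slope(x ~ y) = cov/var(y), and these differ unless |r| = 1.
slope_yx = np.polyfit(x, y, 1)[0]  # lm(y ~ 1 + x)
slope_xy = np.polyfit(y, x, 1)[0]  # lm(x ~ 1 + y)
print(slope_yx, slope_xy)          # the two slopes generally differ
```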
The HTML formatting seems to have disappeared for all but Kruskal-Wallis.
BAD:
https://lindeloev.github.io/tests-as-linear/simulate_spearman.html
https://lindeloev.github.io/tests-as-linear/simulate_mannwhitney.html
https://lindeloev.github.io/tests-as-linear/simulate_wilcoxon.html
GOOD:
https://lindeloev.github.io/tests-as-linear/simulate_kruskall.html
Following the tweet, I have been made aware of many excellent resources. This issue just serves to collect them before I add them somewhere.
https://www.middleprofessor.com/files/applied-biostatistics_bookdown/_book/ looks like a solid intro to linear modeling, equivalent to the stats-101 models. Downsides: there is little visualization, no mention of non-parametric tests (I think?), and a lot more sampling theory. Check if there are worked examples.
https://siminab.github.io/2018/01/10/everything-in-statistical-modeling-can-be-seen-as-a-regression/ contains the basics, but is likely too superficial.
https://www.ncbi.nlm.nih.gov/pubmed/20063905 looks like an excellent academic discussion of rote learning vs. modeling.
Under section 5.1.4 in https://lindeloev.github.io/tests-as-linear/#51_independent_t-test_and_mann-whitney_u, it says to notice the identical t, df and estimates, but the df are not identical. Is the t even in the tables? Is it the mean?
Or is that meant to refer to the results of the Mann Whitney U in section 5.1.5?
Or am I confused about the same thing as #17?
Make these figures, and include them in the cheat sheet.
Hi
This link https://www.uni-tuebingen.de/fileadmin/Uni_Tuebingen/SFB/SFB_833/A_Bereich/A1/Christoph_Scheepers_-_Statistikworkshop.pdf doesn't work anymore. Thanks!
Thanks very much for the R code and explanation of the GLM!
I think it's pretty cool to let people understand all those statistics in the GLM way.
Though the R code is easy and clear, is it possible to add Python code for reference?
I list below those I can find (mostly scipy), but the syntax is NOT as beautiful as R...
[P] Pearson correlation : scipy.stats.pearsonr(x, y)
https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.stats.pearsonr.html
[N] Spearman correlation : scipy.stats.spearmanr
https://en.wikipedia.org/wiki/Spearman's_rank_correlation_coefficient
https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.spearmanr.html
[P] Two-sample t test : scipy.stats.ttest_ind
https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ttest_ind.html
[P] Welch's t-test : scipy.stats.ttest_ind with equal_var=False
Or DIY: https://pythonfordatascience.org/welch-t-test-python-pandas/
[N] Mann-Whitney rank test : scipy.stats.mannwhitneyu
https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.mannwhitneyu.html
[P] One-way ANOVA : scipy.stats.f_oneway
https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.f_oneway.html
[N] Kruskal-Wallis : scipy.stats.kruskal
https://docs.scipy.org/doc/scipy-0.19.1/reference/generated/scipy.stats.kruskal.html
[P] One-way ANCOVA : smf.ols(formula='y ~ a + b + c' , data=df).fit()
[P] Two-way ANOVA : smf.ols(formula='y ~ C(a)*C(b)', data=df).fit()
[N] Chi-squared test : scipy.stats.chi2_contingency
https://docs.scipy.org/doc/scipy-0.15.1/reference/generated/scipy.stats.chi2_contingency.html
example https://pythonfordatascience.org/chi-square-test-of-independence-python/
[N] Goodness of fit : scipy.stats.chisquare
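The linear-model correspondence itself is also easy to verify in Python. Here is my own sketch (not code from the notebook) showing that the classic two-sample t-test matches the linear model y ~ 1 + group fit with plain least squares:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
a = rng.normal(0.0, 1.0, 50)
b = rng.normal(0.5, 1.0, 50)

# Classic two-sample t-test (equal variances assumed):
t_classic, p_classic = stats.ttest_ind(a, b)

# Same model as lm(y ~ 1 + group): an intercept plus a group dummy.
y = np.concatenate([a, b])
group = np.concatenate([np.zeros(50), np.ones(50)])
X = np.column_stack([np.ones(100), group])
beta, rss, *_ = np.linalg.lstsq(X, y, rcond=None)

# t statistic for the group coefficient:
df = 100 - 2
sigma2 = rss[0] / df
se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
t_lm = beta[1] / se
p_lm = 2 * stats.t.sf(abs(t_lm), df)

# Same |t| and p-value, up to the sign convention of the dummy coding:
assert np.isclose(abs(t_classic), abs(t_lm))
assert np.isclose(p_classic, p_lm)
```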
As seen in the opening source code block in section 2 (Settings and toy data), rnorm_fixed is a function defined as rnorm_fixed = function(N, mu = 0, sd = 1) scale(rnorm(N)) * sd + mu. Scaling something only to unscale it right after is confusing; rnorm(N, mean = mu, sd = sd) should do just fine.
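For what it's worth, the scale-then-rescale construction is not a no-op: it pins the empirical mean and SD to exactly mu and sd, whereas rnorm(N, mean = mu, sd = sd) only matches them in expectation. A Python analogue (my sketch, not the author's code):

```python
import numpy as np

def rnorm_fixed(n, mu=0.0, sd=1.0, rng=None):
    """Draw n normals, then force the SAMPLE mean and SD to be exactly mu and sd."""
    if rng is None:
        rng = np.random.default_rng()
    x = rng.normal(size=n)
    z = (x - x.mean()) / x.std(ddof=1)  # R's scale() uses the n-1 denominator
    return z * sd + mu

x = rnorm_fixed(30, mu=5.0, sd=2.0, rng=np.random.default_rng(0))
# The sample statistics are exact, not just approximately right:
assert np.isclose(x.mean(), 5.0)
assert np.isclose(x.std(ddof=1), 2.0)
```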
Maybe I'm misunderstanding it, but shouldn't the one-way ANOVA null hypothesis be
Add this to the worked examples. Also, perhaps demonstrate/discuss how to model unequal expected frequencies cf https://twitter.com/matthewlewis896/status/1115545191300120576
Make a good figure and icon, and include it in the cheat sheet.
Hi!
In the first paragraph of Section 7, there is a statement:
See this nice introduction to Chi-Square tests as linear models.
for which the link is broken.
I have not been able to find the document elsewhere.
Thanks for this wonderful resource.
The simulated data are currently balanced and normal, with approximately equal variances and no correlation. The results should generalize under deviations from these. If this can be implemented in a way that does not obfuscate the real message/argument, it would be an improvement.
Hi, thanks for this great resource! I'm working through the book now.
Can I confirm that the p-values published in the table of section '4.1.3 R code: Wilcoxon signed-rank test' are correct? I get different p-values for both the Wilcoxon test and the linear model using signed ranks (0.2628 and 0.2650 respectively). I have been able to replicate all other tests in the book so far using the toy data set. Thanks.
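For anyone comparing numbers here, the signed-rank construction is easy to reproduce (my own sketch, with arbitrary simulated data, not the book's toy data set): rank the absolute values, reattach the signs, then run a one-sample t-test on the signed ranks. This approximates the Wilcoxon signed-rank test for moderate n; it is not exact, so small p-value discrepancies are expected.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
d = rng.normal(0.3, 1.0, 40)

# Signed ranks: rank |d|, then restore the sign of each observation.
signed_rank = np.sign(d) * stats.rankdata(np.abs(d))

# t-test on signed ranks ~ the linear-model version of the Wilcoxon test:
t_stat, p_lm = stats.ttest_1samp(signed_rank, 0.0)
w_stat, p_wilcoxon = stats.wilcoxon(d)

print(round(p_lm, 4), round(p_wilcoxon, 4))  # close, but not identical
```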
This is a great cheat sheet and comparison of the methods that you've made. Thanks for taking the time to think about it and write it up!
One small comment....
I'm sure you're aware, but aov is just a wrapper for lm with some specific settings (e.g. Helmert contrasts) and print/summary methods to approximate a classical ANOVA table, so it'd be difficult for the models to return something different... the way section 6.1.3 is written at the moment feels a bit like you're surprised that they yield the same thing.
Cheers!
This site is fantastic - please can I share a couple of minor errors:
In the table, the degrees of freedom for the t test are showing as 48, instead of 98.
The confidence intervals are also mismatched, but I can make them match exactly if the linear-model CIs on beta_1 are used (i.e. if the directionality is reversed on either the t-test or the lm).
Thank you!
It currently says "one intercept per group". Make it clear that only one group (plus the intercept/reference group) is "turned on" for any given y.
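One way to state it: in the dummy-coded design matrix, each row carries the intercept column plus at most one group indicator set to 1. A sketch with hypothetical data (treatment coding, group 'a' as reference):

```python
import numpy as np

groups = np.array(['a', 'b', 'b', 'c', 'a', 'c'])
intercept = np.ones(len(groups))
dummy_b = (groups == 'b').astype(float)
dummy_c = (groups == 'c').astype(float)
X = np.column_stack([intercept, dummy_b, dummy_c])

# For every observation, at most one dummy is "turned on";
# reference-group rows ('a') have all dummies off.
assert np.all(dummy_b + dummy_c <= 1)
print(X)
```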
The second line of your table (i.e. the lm row) shows wrong values, but the results from R are right.
Congratulations for this great tutorial!
Thanks for this great resource! In the section on Pearson correlation, what is rank(x)? Did I miss it? If not, I suggest elaborating on this in the text, as I am probably not the only one with this question. Awesome work!
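For reference, rank(x) replaces each value by its position in the sorted order (1 = smallest), and the Spearman correlation is just the Pearson correlation computed on those ranks. A quick scipy check (my own sketch, arbitrary simulated data):

```python
import numpy as np
from scipy import stats

# rank(x): each value's position in the sorted order.
x = np.array([3.1, 0.2, 5.7, 1.4])
print(stats.rankdata(x))  # [3. 1. 4. 2.]

# Spearman correlation == Pearson correlation of the ranks (no ties here):
rng = np.random.default_rng(3)
a = rng.normal(size=50)
b = a + rng.normal(size=50)
rho, _ = stats.spearmanr(a, b)
r_of_ranks = np.corrcoef(stats.rankdata(a), stats.rankdata(b))[0, 1]
assert np.isclose(rho, r_of_ranks)
```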
Common name should be Kruskal-Wallis (sheet has an extra L)
Linear Model in Words should read "Same, but it predicts the rank of y" (currently reads "signed rank")