Comments (22)
from ggstatsplot.
No I don't think any practitioner will be confused. I just teach and tend to be very careful until it sinks into their heads. Today will be slow (meetings) but I'll keep slogging along.
And to your earlier comment about having users simply use scale
as part of the formula to lm
that does not produce the classic standardized coefficients I'm used to in my discipline but as Gelman notes there are actually multiple ways you can standardize. I'm used to this one:
lm(formula = mpg ~ wt * disp, data = mtcars)
#>
#> Call:
#> lm(formula = mpg ~ wt * disp, data = mtcars)
#>
#> Coefficients:
#> (Intercept) wt disp wt:disp
#> 44.08200 -6.49568 -0.05636 0.01171
lm(formula = scale(mpg) ~ scale(wt) * scale(disp), data = mtcars)
#>
#> Call:
#> lm(formula = scale(mpg) ~ scale(wt) * scale(disp), data = mtcars)
#>
#> Coefficients:
#> (Intercept) scale(wt) scale(disp)
#> -0.2026 -0.6161 -0.3845
#> scale(wt):scale(disp)
#> 0.2355
lm.beta::lm.beta(lm(formula = mpg ~ wt * disp, data = mtcars))
#>
#> Call:
#> lm(formula = mpg ~ wt * disp, data = mtcars)
#>
#> Standardized Coefficients::
#> (Intercept) wt disp wt:disp
#> 0.000000 -1.054555 -1.158954 1.296692
lsr::standardCoefs(lm(formula = mpg ~ wt * disp, data = mtcars))
#> b beta
#> wt -6.49567966 -1.054555
#> disp -0.05635816 -1.158954
#> wt:disp 0.01170542 1.296692
Created on 2018-10-09 by the reprex package (v0.2.1)
from ggstatsplot.
I'll try and get at least some done before Thanksgiving.
from ggstatsplot.
@ibecav I think, for now, we can stop tweaking the vignettes because the package size is now 3.3 MB
, way below the CRAN
limit of 5 MB
.
I think the thing that really needs to be implemented asap is tests; only 14% of the code has code coverage, which is far from ideal.
Do you mind working on adding more tests? I will work in parallel on adding a new function for dot plots.
from ggstatsplot.
from ggstatsplot.
Sounds good. Also, note that purrr_examples
vignette has also changed over the weekend.
from ggstatsplot.
from ggstatsplot.
@ibecav In light of the discussion about lm_effsize_ci
bug, I was thinking that it'll be a good idea if you can give the vignette for ggcoefstats
a read. It can really benefit from your statistical expertise as I am not really well-versed in all nuances surrounding various kinds of regression models.
(Please don't remove any examples because, right now, this is the only way to check which regression models are supported by this function. We can downsize it only if the need arises.)
from ggstatsplot.
I took a look today and definitely see the need. Wow that's a lot of vignette! I'll see if I can add some value.
from ggstatsplot.
from ggstatsplot.
Yeah, I agree that that vignette is pretty big, but that was deliberate on my part. I just want the user to have the impression that any regression model they can think of is supported by this function!
Thanks for pointing me to CGPfunctions
. Looks pretty cool!
I especially love the ggslopegraph
function. I am definitely considering including slopegraphs in ggstatsplot
to show categorical data since a lot of people don't want to use pie charts, for good reasons of course.
I've been thinking about adding support for more complex factorial designs (not just 2-way, but any kind of design the user can think of) in ggstatsplot
by 1.0.0.
release, something along the lines of ggbetweenstats
in terms of plot design. The closest thing I have seen is the new afex_plot()
function that is mighty general in terms of what kinds of factorial designs it can display and also satisfying with regards to the plot design:
https://cran.r-project.org/web/packages/afex/vignettes/afex_plot_introduction.html
from ggstatsplot.
Thanks for the tip on afex. A little disappointed that it doesn't accept many of the standard model objects.
I have been going over the vignette for ggcoefstats. Some points for discussion before I start editing:
-
Is more or less a feature request. Would you consider supporting standardized coefficients? If the predictors are even one or two orders of magnitude different like wt & disp in mtcars it throws the plot off
-
Any problem if I significantly reorder things?
-
Long term would you consider breaking the vignette up into smaller chunks if it made sense topically?
Thanks.
set.seed(123)
# your current example
ggstatsplot::ggcoefstats(x = lm(formula = mpg ~ cyl * am, data = mtcars))
# a different exmaple some would say more classic regression
ggstatsplot::ggcoefstats(x = lm(formula = mpg ~ wt * disp, data = mtcars))
# Note R squared
summary(lm(formula = mpg ~ cyl * am, data = mtcars))
#>
#> Call:
#> lm(formula = mpg ~ cyl * am, data = mtcars)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -6.5255 -1.2820 -0.0191 1.6301 5.9745
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 30.8735 3.1882 9.684 1.95e-10 ***
#> cyl -1.9757 0.4485 -4.405 0.000141 ***
#> am 10.1754 4.3046 2.364 0.025258 *
#> cyl:am -1.3051 0.7070 -1.846 0.075507 .
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 2.939 on 28 degrees of freedom
#> Multiple R-squared: 0.7852, Adjusted R-squared: 0.7621
#> F-statistic: 34.11 on 3 and 28 DF, p-value: 1.73e-09
# Note R squared
summary(lm(formula = mpg ~ wt * disp, data = mtcars))
#>
#> Call:
#> lm(formula = mpg ~ wt * disp, data = mtcars)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -3.267 -1.677 -0.836 1.351 5.017
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 44.081998 3.123063 14.115 2.96e-14 ***
#> wt -6.495680 1.313383 -4.946 3.22e-05 ***
#> disp -0.056358 0.013239 -4.257 0.00021 ***
#> wt:disp 0.011705 0.003255 3.596 0.00123 **
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 2.455 on 28 degrees of freedom
#> Multiple R-squared: 0.8501, Adjusted R-squared: 0.8341
#> F-statistic: 52.95 on 3 and 28 DF, p-value: 1.158e-11
# stanardized coefficents mean the x axis scale is less problematic
# and make results clearer when thing are on different scales
lsr::standardCoefs(lm(formula = mpg ~ cyl * am, data = mtcars))
#> b beta
#> cyl -1.975735 -0.5854553
#> am 10.175407 0.8424555
#> cyl:am -1.305116 -0.5871095
lsr::standardCoefs(lm(formula = mpg ~ wt * disp, data = mtcars))
#> b beta
#> wt -6.49567966 -1.054555
#> disp -0.05635816 -1.158954
#> wt:disp 0.01170542 1.296692
Created on 2018-10-08 by the reprex package (v0.2.1)
from ggstatsplot.
- Yes, this is something I've would like to implement for sure. Maybe even for the
0.0.7
release.
(With this approach: http://www.stat.columbia.edu/~gelman/research/published/standardizing7.pdf)
This is already implemented in this function from dotwhisker
package, but I am wondering if I should write it myself and avoid another dependency or just import this function:
https://github.com/fsolt/dotwhisker/blob/master/R/by_2sd.R
I was kind of hoping that the users would do this themselves. For instance, for the example you provided:
ggstatsplot::ggcoefstats(x = lm(
formula = scale(mpg) ~ scale(wt) * scale(disp),
data = mtcars
))
-
No problem if you reorder, but please don't remove any of the existing models.
-
I kind of like the current "one vignette per function" approach. If you are worried about the increasing size of the vignette and the toll it's going to take on the speed of
CRAN
checks, we can just move this vignette to a website article and point to it in the additional vignette.
I've already started doing this:
https://indrajeetpatil.github.io/ggstatsplot/articles/additional.html
from ggstatsplot.
from ggstatsplot.
from ggstatsplot.
beta versus b hat has nothing to do with standardizing more about population versus sample
from ggstatsplot.
from ggstatsplot.
@ibecav Thinking of doing a minor release by the end of this month to make some important bug fixes available to the users. Will you have time to add a few more tests, especially for the subtitle making functions that are at the heart of everything (https://indrajeetpatil.github.io/ggstatsplot/reference/index.html#section-helper-functions-statistics-subtitles-)? Their code is currently not covered by any tests.
It will be pretty straightforward, just like with the tests you had written for the effect size functions and shouldn't take a lot of time.
I've been working on a new function to display post-hoc comparisons for ggbetweenstats
and that will keep me occupied until the next release.
from ggstatsplot.
I did an initial pull request for one test. Take a look and then I can work the others.
from ggstatsplot.
Just had a closer look at all the testthat files you added and modified few things.
For future reference, can you please structure all tests so that they follow the points listed below-
- Try not to have any lints in the code. So, for example, no line should go above 90 characters limit; better to have each argument on a separate line. The
lintr
bot will catch it, but better to do it on our side anyway. - Add a comment for each test and not for a full block of tests.
- If you are copy-pasting tests from one file to another, make sure to change the comments (e.g., you were talking about checking bayes factor in a file containing tests for robust anovas and Pearson's chi-squared test).
- Always have a test for sample size
n
. This is going to be crucial whenNA
s are present in the dataset. - All subtitle making functions are exported, so no need to have
:::
call.
This will also make them easier to debug or change in case the functions being tested themselves change.
We are making a good headway towards increating the code coverage!
https://indrajeetpatil.github.io/ggstatsplot/articles/tests_and_coverage.html
from ggstatsplot.
from ggstatsplot.
@ibecav I have added some new tests in for ggbetweenstats
helpers, which brought to my attention that all the new helpers I had added for pairwise comparisons have 0% code coverage! :(
https://github.com/IndrajeetPatil/ggstatsplot/blob/master/R/helpers_pairwise_comparison.R
If you get time, can you add tests for these as well? I am working on adding tests for ggcoefstats
, which is going to take a lot of tests to have complete code coverage.
I am planning to submit 0.0.7
to CRAN
after thanksgiving, most probably on the 28th of November. So, if you add more tests until then, they will be part of this release. If not, 0.0.8
.
from ggstatsplot.
Related Issues (20)
- [ggbetweenstats] mtcars example: BA ($caption_data) not reported HOT 4
- ggbetweenstats: Welch's ANOVA producing NAs HOT 5
- zero-length inputs cannot be mixed with those of non-zero length with ggbetweenstats because of StatsExpression 1.5.2 HOT 3
- text = element_text(size = .) can not change sample size label size HOT 2
- package ‘ggstatsplot’ is not available (for R version 3.6.3) HOT 1
- Set group.var from string failed HOT 3
- p-value arrow-heads not being displayed in graph HOT 1
- Missing specify_decimal_p() function HOT 1
- Outlier values included within min-max range of boxplot HOT 4
- User question about dynamic names
- ggstatsplot installation issue. HOT 3
- Setting scale for histograms in ggscatterstats
- Significance is indicated by “*” instead of a specific value HOT 1
- Simpler statistical results HOT 3
- Pairwise comparisons not showing HOT 3
- "Removing" the violin plot from `ggbetweenstats` does not really remove it, but rather adds a thin line on the plot. HOT 5
- ggpiestats Cramer's V upper confidence intervals is always 1, and it shouldn't be HOT 2
- packages not installing in Rstudio HOT 1
- Invalid class "ddenseModelMatrix" object HOT 1
- Error in `filter()`: ! In argument: `!is.na(x)`. Caused by error: HOT 1
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from ggstatsplot.