GithubHelp home page GithubHelp logo

goals for 0.0.7 about ggstatsplot HOT 22 CLOSED

indrajeetpatil avatar indrajeetpatil commented on June 25, 2024
goals for 0.0.7

from ggstatsplot.

Comments (22)

ibecav avatar ibecav commented on June 25, 2024 1

from ggstatsplot.

ibecav avatar ibecav commented on June 25, 2024 1

No I don't think any practitioner will be confused. I just teach and tend to be very careful until it sinks into their heads. Today will be slow (meetings) but I'll keep slogging along.

And to your earlier comment about having users simply use scale as part of the formula to lm that does not produce the classic standardized coefficients I'm used to in my discipline but as Gelman notes there are actually multiple ways you can standardize. I'm used to this one:

screen shot 2018-10-09 at 08 43 19

lm(formula = mpg ~ wt * disp, data = mtcars)
#> 
#> Call:
#> lm(formula = mpg ~ wt * disp, data = mtcars)
#> 
#> Coefficients:
#> (Intercept)           wt         disp      wt:disp  
#>    44.08200     -6.49568     -0.05636      0.01171
lm(formula = scale(mpg) ~ scale(wt) * scale(disp), data = mtcars)
#> 
#> Call:
#> lm(formula = scale(mpg) ~ scale(wt) * scale(disp), data = mtcars)
#> 
#> Coefficients:
#>           (Intercept)              scale(wt)            scale(disp)  
#>               -0.2026                -0.6161                -0.3845  
#> scale(wt):scale(disp)  
#>                0.2355
lm.beta::lm.beta(lm(formula = mpg ~ wt * disp, data = mtcars))
#> 
#> Call:
#> lm(formula = mpg ~ wt * disp, data = mtcars)
#> 
#> Standardized Coefficients::
#> (Intercept)          wt        disp     wt:disp 
#>    0.000000   -1.054555   -1.158954    1.296692
lsr::standardCoefs(lm(formula = mpg ~ wt * disp, data = mtcars))
#>                   b      beta
#> wt      -6.49567966 -1.054555
#> disp    -0.05635816 -1.158954
#> wt:disp  0.01170542  1.296692

Created on 2018-10-09 by the reprex package (v0.2.1)

from ggstatsplot.

ibecav avatar ibecav commented on June 25, 2024 1

I'll try and get at least some done before Thanksgiving.

from ggstatsplot.

IndrajeetPatil avatar IndrajeetPatil commented on June 25, 2024

@ibecav I think, for now, we can stop tweaking the vignettes because the package size is now 3.3 MB, way below the CRAN limit of 5 MB.

I think the thing that really needs to be implemented asap is tests; only 14% of the code has code coverage, which is far from ideal.

Do you mind working on adding more tests? I will work in parallel on adding a new function for dot plots.

from ggstatsplot.

ibecav avatar ibecav commented on June 25, 2024

from ggstatsplot.

IndrajeetPatil avatar IndrajeetPatil commented on June 25, 2024

Sounds good. Also, note that purrr_examples vignette has also changed over the weekend.

from ggstatsplot.

ibecav avatar ibecav commented on June 25, 2024

from ggstatsplot.

IndrajeetPatil avatar IndrajeetPatil commented on June 25, 2024

@ibecav In light of the discussion about lm_effsize_ci bug, I was thinking that it'll be a good idea if you can give the vignette for ggcoefstats a read. It can really benefit from your statistical expertise as I am not really well-versed in all nuances surrounding various kinds of regression models.

(Please don't remove any examples because, right now, this is the only way to check which regression models are supported by this function. We can downsize it only if the need arises.)

from ggstatsplot.

ibecav avatar ibecav commented on June 25, 2024

I took a look today and definitely see the need. Wow that's a lot of vignette! I'll see if I can add some value.

from ggstatsplot.

ibecav avatar ibecav commented on June 25, 2024

from ggstatsplot.

IndrajeetPatil avatar IndrajeetPatil commented on June 25, 2024

Yeah, I agree that that vignette is pretty big, but that was deliberate on my part. I just want the user to have the impression that any regression model they can think of is supported by this function!

Thanks for pointing me to CGPfunctions. Looks pretty cool!
I especially love the ggslopegraph function. I am definitely considering including slopegraphs in ggstatsplot to show categorical data since a lot of people don't want to use pie charts, for good reasons of course.

I've been thinking about adding support for more complex factorial designs (not just 2-way, but any kind of design the user can think of) in ggstatsplot by 1.0.0. release, something along the lines of ggbetweenstats in terms of plot design. The closest thing I have seen is the new afex_plot() function that is mighty general in terms of what kinds of factorial designs it can display and also satisfying with regards to the plot design:
https://cran.r-project.org/web/packages/afex/vignettes/afex_plot_introduction.html

from ggstatsplot.

ibecav avatar ibecav commented on June 25, 2024

Thanks for the tip on afex. A little disappointed that it doesn't accept many of the standard model objects.

I have been going over the vignette for ggcoefstats. Some points for discussion before I start editing:

  1. Is more or less a feature request. Would you consider supporting standardized coefficients? If the predictors are even one or two orders of magnitude different like wt & disp in mtcars it throws the plot off

  2. Any problem if I significantly reorder things?

  3. Long term would you consider breaking the vignette up into smaller chunks if it made sense topically?

Thanks.

set.seed(123)
# your current example
ggstatsplot::ggcoefstats(x = lm(formula = mpg ~ cyl * am, data = mtcars))

# a different exmaple some would say more classic regression
ggstatsplot::ggcoefstats(x = lm(formula = mpg ~ wt * disp, data = mtcars))

# Note R squared
summary(lm(formula = mpg ~ cyl * am, data = mtcars))
#> 
#> Call:
#> lm(formula = mpg ~ cyl * am, data = mtcars)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -6.5255 -1.2820 -0.0191  1.6301  5.9745 
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)  30.8735     3.1882   9.684 1.95e-10 ***
#> cyl          -1.9757     0.4485  -4.405 0.000141 ***
#> am           10.1754     4.3046   2.364 0.025258 *  
#> cyl:am       -1.3051     0.7070  -1.846 0.075507 .  
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 2.939 on 28 degrees of freedom
#> Multiple R-squared:  0.7852, Adjusted R-squared:  0.7621 
#> F-statistic: 34.11 on 3 and 28 DF,  p-value: 1.73e-09
# Note R squared
summary(lm(formula = mpg ~ wt * disp, data = mtcars))
#> 
#> Call:
#> lm(formula = mpg ~ wt * disp, data = mtcars)
#> 
#> Residuals:
#>    Min     1Q Median     3Q    Max 
#> -3.267 -1.677 -0.836  1.351  5.017 
#> 
#> Coefficients:
#>              Estimate Std. Error t value Pr(>|t|)    
#> (Intercept) 44.081998   3.123063  14.115 2.96e-14 ***
#> wt          -6.495680   1.313383  -4.946 3.22e-05 ***
#> disp        -0.056358   0.013239  -4.257  0.00021 ***
#> wt:disp      0.011705   0.003255   3.596  0.00123 ** 
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 2.455 on 28 degrees of freedom
#> Multiple R-squared:  0.8501, Adjusted R-squared:  0.8341 
#> F-statistic: 52.95 on 3 and 28 DF,  p-value: 1.158e-11
# stanardized coefficents mean the x axis scale is less problematic
# and make results clearer when thing are on different scales
lsr::standardCoefs(lm(formula = mpg ~ cyl * am, data = mtcars))
#>                b       beta
#> cyl    -1.975735 -0.5854553
#> am     10.175407  0.8424555
#> cyl:am -1.305116 -0.5871095
lsr::standardCoefs(lm(formula = mpg ~ wt * disp, data = mtcars))
#>                   b      beta
#> wt      -6.49567966 -1.054555
#> disp    -0.05635816 -1.158954
#> wt:disp  0.01170542  1.296692

Created on 2018-10-08 by the reprex package (v0.2.1)

from ggstatsplot.

IndrajeetPatil avatar IndrajeetPatil commented on June 25, 2024
  1. Yes, this is something I've would like to implement for sure. Maybe even for the 0.0.7 release.
    (With this approach: http://www.stat.columbia.edu/~gelman/research/published/standardizing7.pdf)

This is already implemented in this function from dotwhisker package, but I am wondering if I should write it myself and avoid another dependency or just import this function:
https://github.com/fsolt/dotwhisker/blob/master/R/by_2sd.R

I was kind of hoping that the users would do this themselves. For instance, for the example you provided:

ggstatsplot::ggcoefstats(x = lm(
  formula = scale(mpg) ~ scale(wt) * scale(disp),
  data = mtcars
))

  1. No problem if you reorder, but please don't remove any of the existing models.

  2. I kind of like the current "one vignette per function" approach. If you are worried about the increasing size of the vignette and the toll it's going to take on the speed of CRAN checks, we can just move this vignette to a website article and point to it in the additional vignette.

I've already started doing this:
https://indrajeetpatil.github.io/ggstatsplot/articles/additional.html

from ggstatsplot.

ibecav avatar ibecav commented on June 25, 2024

from ggstatsplot.

IndrajeetPatil avatar IndrajeetPatil commented on June 25, 2024

from ggstatsplot.

ibecav avatar ibecav commented on June 25, 2024

beta versus b hat has nothing to do with standardizing more about population versus sample

https://stats.stackexchange.com/questions/210543/what-is-the-difference-between-beta-1-and-hat-beta-1

from ggstatsplot.

IndrajeetPatil avatar IndrajeetPatil commented on June 25, 2024

from ggstatsplot.

IndrajeetPatil avatar IndrajeetPatil commented on June 25, 2024

@ibecav Thinking of doing a minor release by the end of this month to make some important bug fixes available to the users. Will you have time to add a few more tests, especially for the subtitle making functions that are at the heart of everything (https://indrajeetpatil.github.io/ggstatsplot/reference/index.html#section-helper-functions-statistics-subtitles-)? Their code is currently not covered by any tests.

It will be pretty straightforward, just like with the tests you had written for the effect size functions and shouldn't take a lot of time.

I've been working on a new function to display post-hoc comparisons for ggbetweenstats and that will keep me occupied until the next release.

from ggstatsplot.

ibecav avatar ibecav commented on June 25, 2024

I did an initial pull request for one test. Take a look and then I can work the others.

from ggstatsplot.

IndrajeetPatil avatar IndrajeetPatil commented on June 25, 2024

Just had a closer look at all the testthat files you added and modified few things.

For future reference, can you please structure all tests so that they follow the points listed below-

  1. Try not to have any lints in the code. So, for example, no line should go above 90 characters limit; better to have each argument on a separate line. The lintr bot will catch it, but better to do it on our side anyway.
  2. Add a comment for each test and not for a full block of tests.
  3. If you are copy-pasting tests from one file to another, make sure to change the comments (e.g., you were talking about checking bayes factor in a file containing tests for robust anovas and Pearson's chi-squared test).
  4. Always have a test for sample size n. This is going to be crucial when NAs are present in the dataset.
  5. All subtitle making functions are exported, so no need to have ::: call.

This will also make them easier to debug or change in case the functions being tested themselves change.

We are making a good headway towards increating the code coverage!
https://indrajeetpatil.github.io/ggstatsplot/articles/tests_and_coverage.html

from ggstatsplot.

ibecav avatar ibecav commented on June 25, 2024

from ggstatsplot.

IndrajeetPatil avatar IndrajeetPatil commented on June 25, 2024

@ibecav I have added some new tests in for ggbetweenstats helpers, which brought to my attention that all the new helpers I had added for pairwise comparisons have 0% code coverage! :(
https://github.com/IndrajeetPatil/ggstatsplot/blob/master/R/helpers_pairwise_comparison.R

If you get time, can you add tests for these as well? I am working on adding tests for ggcoefstats, which is going to take a lot of tests to have complete code coverage.

I am planning to submit 0.0.7 to CRAN after thanksgiving, most probably on the 28th of November. So, if you add more tests until then, they will be part of this release. If not, 0.0.8.

from ggstatsplot.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.