rsquaredacademy / descriptr

Generate descriptive statistics

Home Page: https://descriptr.rsquaredacademy.com/

License: Other

R 100.00%
descriptive-statistics rstats eda summary-statistics

descriptr's People

Contributors

aravindhebbali, gegznav

descriptr's Issues

ftable returned by freq_table must be data.frame or tibble

ftable returned by freq_table() must be a data.frame or tibble instead of matrix.

> ecom <- readr::read_csv('https://raw.githubusercontent.com/rsquaredacademy/datasets/master/web.csv', 
+   col_types = list(col_integer(), 
+   col_factor(levels = c('bing', 'direct', 'google', 'social', 'yahoo')), 
+   col_factor(levels = c('tablet', 'laptop', 'mobile')), 
+   col_factor(levels = c('true', 'false')), col_integer(), col_double(), 
+   col_double(), col_character(), col_factor(levels = c('true', 'false')), 
+   col_double(), col_double())            
+ )
> freq_table(ecom$referrer)
$ftable
     Levels   Frequency Cum Frequency Percent Cum Percent
[1,] "bing"   "194"     "194"         "19.4"  "19.4"     
[2,] "direct" "191"     "385"         "19.1"  "38.5"     
[3,] "google" "208"     "593"         "20.8"  "59.3"     
[4,] "social" "200"     "793"         "20"    "79.3"     
[5,] "yahoo"  "207"     "1000"        "20.7"  "100"      

$varname
[1] "referrer"

attr(,"class")
[1] "freq_table"
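
A minimal base R sketch of the requested shape, for illustration only. The helper `freq_df()` and its column names are hypothetical and are not part of descriptr; the point is that a data.frame keeps counts and percentages numeric, whereas the matrix above coerces every cell to character.

```r
# Hypothetical helper (not a descriptr function): frequency table as a data.frame.
freq_df <- function(x) {
  counts <- table(x)                 # table() drops NA values by default
  freq   <- as.integer(counts)
  data.frame(
    levels      = names(counts),
    frequency   = freq,
    cum_freq    = cumsum(freq),
    percent     = round(100 * freq / sum(freq), 2),
    cum_percent = round(100 * cumsum(freq) / sum(freq), 2),
    stringsAsFactors = FALSE
  )
}

# Small made-up vector standing in for ecom$referrer
referrer <- factor(c("bing", "google", "google", "yahoo"))
result   <- freq_df(referrer)
str(result)
```

Because the columns stay numeric, the result can be filtered or sorted directly, for example `result[result$percent > 20, ]`.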

Redesign tabulation

Redesign and rename

  • ds_oway_tables() to ds_auto_tabulate()
  • ds_tway_tables() to ds_auto_cross_table()

Both functions should allow users to specify a subset of the columns to be used.
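
One way the column subset could work is sketched below in base R. The function `auto_tabulate()` and its `cols` argument are assumptions made for illustration; the issue does not specify the signature of the proposed ds_auto_tabulate().

```r
# Hypothetical stand-in for the proposed function; the signature is an assumption.
auto_tabulate <- function(data, cols = NULL) {
  is_fct      <- vapply(data, is.factor, logical(1))
  factor_cols <- names(data)[is_fct]
  if (!is.null(cols)) {
    # Reject requested columns that are missing or are not factors
    unknown <- setdiff(cols, factor_cols)
    if (length(unknown) > 0) {
      stop("Not factor columns: ", paste(unknown, collapse = ", "))
    }
    factor_cols <- cols
  }
  lapply(data[factor_cols], table)   # one frequency table per selected column
}

# Made-up data for the example
df <- data.frame(
  referrer = factor(c("bing", "google", "google")),
  device   = factor(c("mobile", "laptop", "mobile")),
  visits   = c(3, 1, 7)
)

all_tables <- auto_tabulate(df)                  # every factor column
one_table  <- auto_tabulate(df, cols = "device") # user-selected subset
```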

Documentation

Add the following:

  • Contributing Guide
  • Issue Template

ds_summary_stats does not show missing values

In the presence of missing values, ds_summary_stats() does not report the correct number of observations and missing values. In the example below, one mpg value is set to NA, yet the output shows Missing as 0.00.

library(descriptr)
mt <- mtcarz
mt[2, 1] <- NA
ds_summary_stats(mt, mpg)
#>                         Univariate Analysis                          
#> 
#>  N                       31.00      Variance                37.51 
#>  Missing                  0.00      Std Deviation            6.12 
#>  Mean                    20.06      Range                   23.50 
#>  Median                  19.20      Interquartile Range      7.45 
#>  Mode                    10.40      Uncorrected SS       13601.31 
#>  Trimmed Mean            19.92      Corrected SS          1125.19 
#>  Skewness                 0.68      Coeff Variation         30.53 
#>  Kurtosis                -0.10      Std Error Mean           1.10 
#> 
#>                               Quantiles                               
#> 
#>               Quantile                            Value                
#> 
#>              Max                                  33.90                
#>              99%                                  33.45                
#>              95%                                  31.40                
#>              90%                                  30.40                
#>              Q3                                   22.80                
#>              Median                               19.20                
#>              Q1                                   15.35                
#>              10%                                  14.30                
#>              5%                                   11.85                
#>              1%                                   10.40                
#>              Min                                  10.40                
#> 
#>                             Extreme Values                            
#> 
#>                 Low                                High                
#> 
#>   Obs                        Value       Obs                        Value 
#>   14                         10.4        19                         33.9  
#>   15                         10.4        17                         32.4  
#>   23                         13.3        18                         30.4  
#>    6                         14.3        27                         30.4  
#>   16                         14.7        25                         27.3
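
For comparison, the expected counts can be computed in base R. This sketch uses `mtcars` from the datasets package as a stand-in for descriptr's `mtcarz`, which is an assumption; both have 32 rows.

```r
# mtcars stands in for mtcarz here (assumption); the modification mirrors the report.
mt <- mtcars
mt[2, 1] <- NA                       # one mpg value set to missing

n_total   <- length(mt$mpg)          # all rows
n_missing <- sum(is.na(mt$mpg))      # rows where mpg is NA
n_valid   <- n_total - n_missing     # rows that enter the statistics

c(N = n_valid, Missing = n_missing)
```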

Plots

plot.ds_data_summary() should create plots for the following data types:

  • numeric/integer
  • factors

Multiple two way tables returns error

tway_tables() returns the following error:

> ecom <- readr::read_csv('https://raw.githubusercontent.com/rsquaredacademy/datasets/master/web.csv', 
+   col_types = list(col_integer(), 
+   col_factor(levels = c('bing', 'direct', 'google', 'social', 'yahoo')), 
+   col_factor(levels = c('tablet', 'laptop', 'mobile')), 
+   col_factor(levels = c('true', 'false')), col_integer(), col_double(), 
+   col_double(), col_character(), col_factor(levels = c('true', 'false')), 
+   col_double(), col_double())            
+ )
> tway_tables(ecom)
    Cell Contents
 |---------------|
 |     Frequency |
 |       Percent |
 |       Row Pct |
 |       Col Pct |
 |---------------|

 Total Observations:  1000 

Error in sort.list(y) : 'x' must be atomic for 'sort.list'
Have you called 'sort' on a list?
Called from: sort.list(y)

Session Info

> sessionInfo()
R version 3.4.1 (2017-06-30)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=English_India.1252  LC_CTYPE=English_India.1252   
[3] LC_MONETARY=English_India.1252 LC_NUMERIC=C                  
[5] LC_TIME=English_India.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] descriptr_0.1.1 readr_1.1.1     dplyr_0.7.2     bindrcpp_0.2   

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.11         learnr_0.9           compiler_3.4.1      
 [4] git2r_0.15.0         bindr_0.1            tools_3.4.1         
 [7] digest_0.6.10        jsonlite_1.5         evaluate_0.10       
[10] memoise_1.0.0.9001   tibble_1.3.4         pkgconfig_2.0.1     
[13] rlang_0.1.2          rstudioapi_0.6       shiny_1.0.4         
[16] curl_2.2             yaml_2.1.14          withr_1.0.2         
[19] stringr_1.2.0        httr_1.3.1           knitr_1.17          
[22] hms_0.2              htmlwidgets_0.9      devtools_1.13.3     
[25] tidyselect_0.1.1     rprojroot_1.2-10     glue_1.1.1          
[28] R6_2.2.2             rmarkdown_1.6        tidyr_0.7.0         
[31] purrr_0.2.3          skimr_0.9000         magrittr_1.5        
[34] backports_1.0.4      htmltools_0.3.6      colformat_0.0.0.9000
[37] rsconnect_0.8.5      assertthat_0.2.0     mime_0.5            
[40] xtable_1.8-0         httpuv_1.3.5         stringi_1.1.2       
[43] crayon_1.3.2.9000    markdown_0.8

0.5.0 checklist

Prepare for release:

  • devtools::check_win_devel()
  • rhub::check_for_cran()
  • Polish NEWS

Perform release:

  • Bump version (in DESCRIPTION and NEWS)
  • devtools::check_win_devel() (again!)
  • devtools::submit_cran()
  • pkgdown::build_site()
  • Approve email

Wait for CRAN...

  • Tag release
  • Bump dev version

Template from r-lib/usethis#338

Multiple plot options

Users should be able to select plotting library from the following:

  • ggplot2 (default)
  • plotly
  • rbokeh

Multiple variable statistics throws error in the presence of NA

ds_multi_stats() throws an error in the presence of missing values.

library(descriptr)
mt <- mtcarz
mt[c(3, 6, 9, 12), c(2, 3, 5, 6, 8, 10)] <- NA
ds_multi_stats(mt, disp, hp, mpg)
#> Error in summarise_impl(.data, dots): Evaluation error: missing values and NaN's not allowed if 'na.rm' is FALSE.
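
The error text matches the one that `quantile()` raises when it receives NA values and `na.rm` is FALSE. The base R sketch below reproduces that message and shows the `na.rm = TRUE` alternative. It uses `mtcars` as a stand-in for `mtcarz` (an assumption) and does not reflect how ds_multi_stats() is implemented.

```r
# mtcars stands in for mtcarz (assumption); same NA pattern as in the report.
mt <- mtcars
mt[c(3, 6, 9, 12), c(2, 3, 5, 6, 8, 10)] <- NA

# quantile() stops on NA input unless na.rm = TRUE
failed <- inherits(try(quantile(mt$disp, 0.5), silent = TRUE), "try-error")

# The same statistics with missing values removed per variable
vars  <- c("disp", "hp", "mpg")
multi <- data.frame(
  variable = vars,
  n        = sapply(vars, function(v) sum(!is.na(mt[[v]]))),
  median   = sapply(vars, function(v) quantile(mt[[v]], 0.5, na.rm = TRUE, names = FALSE)),
  mean     = sapply(vars, function(v) mean(mt[[v]], na.rm = TRUE)),
  row.names = NULL
)
multi
```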

`descriptr:::print_screen` fails if a variable has more than one class

Example

example_ds <- data.frame(
    col_integer    = c(2L, 2L, 2L, 5L, 5L),
    col_numeric    = c(1.9, NA, 2.9, 9.1, 9.6),
    col_ordinal    = ordered(c("S", "V", "V", "S", NA)),
    col_factor     = factor(c("R", "G", "G", "B", "B")),
    col_logical    = c(FALSE, TRUE, TRUE, FALSE, TRUE),
    col_character  = c("-", "-", "some text", "-", "-"),
    col_date       = Sys.Date(),
    col_time       = Sys.time(),
    
    stringsAsFactors = FALSE
)

descriptr::ds_screener(example_ds)

This example fails with the error:

Error in max(sapply(lengths[[i]], nchar)) : 
  invalid 'type' (list) of argument
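
A plausible explanation, offered as an assumption rather than a confirmed diagnosis: `class()` returns two values for ordered factors and date-times, so collecting classes with `sapply()` yields a list instead of a character vector, and `nchar()` then fails on it. The sketch below shows that behaviour in base R and one way to collapse each column's classes into a single string.

```r
ordinal <- ordered(c("S", "V", "V", "S", NA))
stamp   <- Sys.time()

class(ordinal)   # "ordered" "factor"
class(stamp)     # "POSIXct" "POSIXt"

example_ds <- data.frame(
  col_integer = c(2L, 2L, 2L, 5L, 5L),
  col_ordinal = ordinal,
  col_time    = stamp
)

# Results of unequal length cannot be simplified, so sapply() returns a list
raw_classes <- sapply(example_ds, class)
is.list(raw_classes)

# Collapsing the classes gives exactly one string per column
flat_classes <- vapply(
  example_ds,
  function(col) paste(class(col), collapse = ", "),
  character(1)
)
flat_classes
```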

kable friendly output

Create kable() friendly output for the following:

  • ds_summary_stats()
  • ds_screener()
  • ds_cross_table()
  • ds_freq_table()
  • ds_freq_cont()
  • ds_group_summary()
  • ds_oway_tables()
  • ds_tway_tables()

Redesign summary stats

The ds_summary_stats() function should be redesigned to:

  • accept multiple column names
  • identify all continuous variables in the data set
  • return multiple data.frames instead of a big list
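
A rough base R sketch of two of these points: detecting the numeric columns and returning a data.frame. The function `summarise_numeric()` is hypothetical, and treating every numeric column (including integers) as continuous is a simplifying assumption.

```r
# Hypothetical helper (not a descriptr function).
summarise_numeric <- function(data) {
  # Assumption: every numeric column counts as continuous
  numeric_cols <- names(data)[vapply(data, is.numeric, logical(1))]
  data.frame(
    variable = numeric_cols,
    mean     = vapply(data[numeric_cols], mean, numeric(1), na.rm = TRUE),
    sd       = vapply(data[numeric_cols], stats::sd, numeric(1), na.rm = TRUE),
    row.names = NULL
  )
}

# iris has four numeric columns and one factor column
stats_df <- summarise_numeric(iris)
stats_df
```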

Multiple summary statistics

Redesign ds_multi_stats() to generate summary statistics for all continuous variables in the data set.

Auto summary

ds_auto_summary() should identify and generate appropriate summary for the following data types:

  • numeric/integer
  • factor

Forthcoming release of ggplot2 and descriptr

We are contacting you because you are the maintainer of descriptr, which imports ggplot2 and uses vdiffr to manage visual test cases. The upcoming release of ggplot2 includes several improvements to plot rendering, including the ability to specify lineend and linejoin in geom_rect() and geom_tile(), and improved rendering of text. These improvements will result in subtle changes to your vdiffr doppelgangers when the new version is released.

Because vdiffr test cases do not run on CRAN by default, your CRAN checks will still pass. However, we suggest updating your visual test cases with the new version of ggplot2 as soon as possible to avoid confusion. You can install the development version of ggplot2 using remotes::install_github("tidyverse/ggplot2").

If you have any questions, let me know!

Accept multiple arguments

The following functions should accept multiple arguments, or detect all continuous variables in the data set, and return metrics for all of them.

  • ds_measures_location()
  • ds_measures_variation()
  • ds_measures_symmetry()
  • ds_percentiles()

Automated report

Integrate the descriptive statistics report template from reportr.

Multiple one way table returns an error

oway_tables() returns the following error:

> ecom <- readr::read_csv('https://raw.githubusercontent.com/rsquaredacademy/datasets/master/web.csv', 
+   col_types = list(col_integer(), 
+   col_factor(levels = c('bing', 'direct', 'google', 'social', 'yahoo')), 
+   col_factor(levels = c('tablet', 'laptop', 'mobile')), 
+   col_factor(levels = c('true', 'false')), col_integer(), col_double(), 
+   col_double(), col_character(), col_factor(levels = c('true', 'false')), 
+   col_double(), col_double())            
+ )
> oway_tables(ecom)
Error in freq_table2.default(factors.df[, i], nam[i]) : 
  (list) object cannot be coerced to type 'double'
Called from: freq_table2.default(factors.df[, i], nam[i])

Session Info

> sessionInfo()
R version 3.4.1 (2017-06-30)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=English_India.1252  LC_CTYPE=English_India.1252   
[3] LC_MONETARY=English_India.1252 LC_NUMERIC=C                  
[5] LC_TIME=English_India.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] descriptr_0.1.1 readr_1.1.1     dplyr_0.7.2     bindrcpp_0.2   

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.11         learnr_0.9           compiler_3.4.1      
 [4] git2r_0.15.0         bindr_0.1            tools_3.4.1         
 [7] digest_0.6.10        jsonlite_1.5         evaluate_0.10       
[10] memoise_1.0.0.9001   tibble_1.3.4         pkgconfig_2.0.1     
[13] rlang_0.1.2          rstudioapi_0.6       shiny_1.0.4         
[16] curl_2.2             yaml_2.1.14          withr_1.0.2         
[19] stringr_1.2.0        httr_1.3.1           knitr_1.17          
[22] hms_0.2              htmlwidgets_0.9      devtools_1.13.3     
[25] tidyselect_0.1.1     rprojroot_1.2-10     glue_1.1.1          
[28] R6_2.2.2             rmarkdown_1.6        tidyr_0.7.0         
[31] purrr_0.2.3          skimr_0.9000         magrittr_1.5        
[34] backports_1.0.4      htmltools_0.3.6      colformat_0.0.0.9000
[37] rsconnect_0.8.5      assertthat_0.2.0     mime_0.5            
[40] xtable_1.8-0         httpuv_1.3.5         stringi_1.1.2       
[43] crayon_1.3.2.9000    markdown_0.8 
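
A likely cause, stated as an assumption: `readr::read_csv()` returns a tibble, and `[, i]` on a tibble always returns a one-column tibble (a list) rather than a vector, which would also fit the `sort.list` error reported for tway_tables(). The base R sketch below imitates that behaviour with `drop = FALSE` and shows that `[[i]]` returns the underlying vector.

```r
# Made-up data; a plain data.frame with drop = FALSE imitates tibble subsetting.
df <- data.frame(
  referrer = factor(c("bing", "google", "google")),
  device   = factor(c("mobile", "laptop", "mobile"))
)

as_frame <- df[, 1, drop = FALSE]   # still a data frame, i.e. a list
is.list(as_frame)
is.atomic(as_frame)

as_vector <- df[[1]]                # the factor itself, for tibbles and data frames alike
is.atomic(as_vector)
table(as_vector)
```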

README template

Use the standard template for README:

  • Overview
  • Installation
  • Shiny App
  • Usage
  • Articles
  • Features
  • Getting Help
  • Code of Conduct

Check suggested packages

Continue to launch shiny app from xplorerr package. Check whether suggested packages are available and offer to install missing packages.
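
The availability check could look roughly like the sketch below. The helpers `missing_packages()` and `check_suggests()` are hypothetical, and the package names passed to them are placeholders; only `requireNamespace()`, `utils::menu()` and `utils::install.packages()` are existing functions.

```r
# Hypothetical helper: return the packages from pkgs that are not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Hypothetical helper: offer installation, but only in interactive sessions.
check_suggests <- function(pkgs) {
  absent <- missing_packages(pkgs)
  if (length(absent) > 0 && interactive()) {
    answer <- utils::menu(
      c("Yes", "No"),
      title = paste("Install missing packages:", paste(absent, collapse = ", "))
    )
    if (answer == 1) utils::install.packages(absent)
  }
  invisible(absent)
}

# "stats" ships with R; the second name is a deliberately nonexistent placeholder
absent <- check_suggests(c("stats", "aPackageThatDoesNotExist123"))
```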

Error in cross table in presence of NA

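ds_cross_table() displays an incorrect grand total in the presence of NAs. In the example below, the row and column totals each sum to 28, but the grand total and Total Observations both show 32.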

library(descriptr)
mt <- mtcarz
mt[c(3, 6, 9, 12), c(2, 3, 5, 6, 8, 10)] <- NA
ds_cross_table(mt, gear, cyl)
#>     Cell Contents
#>  |---------------|
#>  |     Frequency |
#>  |       Percent |
#>  |       Row Pct |
#>  |       Col Pct |
#>  |---------------|
#> 
#>  Total Observations:  32 
#> 
#> ----------------------------------------------------------------------------
#> |              |                            cyl                            |
#> ----------------------------------------------------------------------------
#> |         gear |            4 |            6 |            8 |    Row Total |
#> ----------------------------------------------------------------------------
#> |            3 |            1 |            1 |           11 |           13 |
#> |              |        0.031 |        0.031 |        0.344 |              |
#> |              |         0.08 |         0.08 |         0.85 |         0.41 |
#> |              |         0.11 |         0.17 |         0.85 |              |
#> ----------------------------------------------------------------------------
#> |            4 |            6 |            4 |            0 |           10 |
#> |              |        0.188 |        0.125 |            0 |              |
#> |              |          0.6 |          0.4 |            0 |         0.31 |
#> |              |         0.67 |         0.67 |            0 |              |
#> ----------------------------------------------------------------------------
#> |            5 |            2 |            1 |            2 |            5 |
#> |              |        0.062 |        0.031 |        0.062 |              |
#> |              |          0.4 |          0.2 |          0.4 |         0.16 |
#> |              |         0.22 |         0.17 |         0.15 |              |
#> ----------------------------------------------------------------------------
#> | Column Total |            9 |            6 |           13 |           32 |
#> |              |        0.281 |        0.187 |        0.406 |              |
#> ----------------------------------------------------------------------------
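
The expected grand total can be checked in base R. This sketch uses `mtcars` as a stand-in for `mtcarz` (an assumption) and relies on `table()` dropping rows with NA by default.

```r
# mtcars stands in for mtcarz (assumption); same NA pattern as in the report.
mt <- mtcars
mt[c(3, 6, 9, 12), c(2, 3, 5, 6, 8, 10)] <- NA

counts <- table(gear = mt$gear, cyl = mt$cyl)   # NA rows are excluded
addmargins(counts)                              # adds row and column totals

grand_total <- sum(counts)   # observations that appear in the table
n_rows      <- nrow(mt)      # all rows, including those with NA
c(grand_total = grand_total, n_rows = n_rows)
```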

Return plot objects

Return plot objects instead of printing. Use the argument print_plot with the default value TRUE.
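
A minimal sketch of this pattern with ggplot2, which descriptr imports. The function `plot_histogram()` is hypothetical; only the `print_plot` argument and its default come from the issue.

```r
library(ggplot2)

# Hypothetical plotting function illustrating the print_plot argument.
plot_histogram <- function(data, column, print_plot = TRUE) {
  p <- ggplot(data, aes(x = .data[[column]])) +
    geom_histogram(bins = 10) +
    xlab(column)
  if (print_plot) {
    print(p)        # draw the plot, as before
  }
  invisible(p)      # always return the plot object to the caller
}

# With print_plot = FALSE nothing is drawn; the object can be modified later
p  <- plot_histogram(mtcars, "mpg", print_plot = FALSE)
p2 <- p + ggtitle("Distribution of mpg")
```

Returning the object invisibly keeps the default call quiet at the console while still allowing assignment.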
