rsquaredacademy / descriptr
Generate descriptive statistics
Home Page: https://descriptr.rsquaredacademy.com/
License: Other
The ftable returned by freq_table() must be a data.frame or tibble instead of a matrix.
> ecom <- readr::read_csv('https://raw.githubusercontent.com/rsquaredacademy/datasets/master/web.csv',
+ col_types = list(col_integer(),
+ col_factor(levels = c('bing', 'direct', 'google', 'social', 'yahoo')),
+ col_factor(levels = c('tablet', 'laptop', 'mobile')),
+ col_factor(levels = c('true', 'false')), col_integer(), col_double(),
+ col_double(), col_character(), col_factor(levels = c('true', 'false')),
+ col_double(), col_double())
+ )
> freq_table(ecom$referrer)
$ftable
Levels Frequency Cum Frequency Percent Cum Percent
[1,] "bing" "194" "194" "19.4" "19.4"
[2,] "direct" "191" "385" "19.1" "38.5"
[3,] "google" "208" "593" "20.8" "59.3"
[4,] "social" "200" "793" "20" "79.3"
[5,] "yahoo" "207" "1000" "20.7" "100"
$varname
[1] "referrer"
attr(,"class")
[1] "freq_table"
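One way to get the requested shape is sketched below in base R. The helper name freq_table_df() and its column names are hypothetical and not part of descriptr; the point is only that a data.frame keeps counts and percentages numeric, whereas the character matrix above turns every value into a string.

```r
# Hypothetical helper (not part of descriptr): build a frequency table
# as a data.frame so that counts and percentages keep their numeric type.
freq_table_df <- function(x) {
  counts <- table(x)
  freq   <- as.integer(counts)
  data.frame(
    level       = names(counts),
    frequency   = freq,
    cum_freq    = cumsum(freq),
    percent     = round(100 * freq / sum(freq), 2),
    cum_percent = round(100 * cumsum(freq) / sum(freq), 2),
    stringsAsFactors = FALSE
  )
}

# Small illustrative input instead of the ecom data set
referrer <- factor(c("bing", "google", "google", "yahoo"))
result <- freq_table_df(referrer)
print(result)
```

Because the columns are numeric, the result can be filtered, sorted, or passed to other functions without converting strings back to numbers.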
Redesign and rename:
ds_oway_tables() to ds_auto_tabulate()
ds_tway_tables() to ds_auto_cross_table()
Both functions should allow users to specify a subset of columns to be used.
Add the following:
In the presence of missing values, ds_summary_stats() does not show the correct number of observations and the corresponding number of missing values.
library(descriptr)
mt <- mtcarz
mt[2, 1] <- NA
ds_summary_stats(mt, mpg)
#> Univariate Analysis
#>
#> N 31.00 Variance 37.51
#> Missing 0.00 Std Deviation 6.12
#> Mean 20.06 Range 23.50
#> Median 19.20 Interquartile Range 7.45
#> Mode 10.40 Uncorrected SS 13601.31
#> Trimmed Mean 19.92 Corrected SS 1125.19
#> Skewness 0.68 Coeff Variation 30.53
#> Kurtosis -0.10 Std Error Mean 1.10
#>
#> Quantiles
#>
#> Quantile Value
#>
#> Max 33.90
#> 99% 33.45
#> 95% 31.40
#> 90% 30.40
#> Q3 22.80
#> Median 19.20
#> Q1 15.35
#> 10% 14.30
#> 5% 11.85
#> 1% 10.40
#> Min 10.40
#>
#> Extreme Values
#>
#> Low High
#>
#> Obs Value Obs Value
#> 14 10.4 19 33.9
#> 15 10.4 17 32.4
#> 23 13.3 18 30.4
#> 6 14.3 27 30.4
#> 16 14.7 25 27.3
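The output above reports N as 31 but Missing as 0, although one value was set to NA. A minimal base R sketch of the counts one would expect is shown below. It uses mtcars, which ships with R, in place of descriptr's mtcarz, and it does not reproduce how ds_summary_stats() computes its values.

```r
# mtcars ships with base R; the issue itself uses descriptr's mtcarz.
mt <- mtcars
mt[2, 1] <- NA                     # introduce one missing value in mpg

n_obs     <- sum(!is.na(mt$mpg))   # non-missing observations
n_missing <- sum(is.na(mt$mpg))    # missing observations

print(c(N = n_obs, Missing = n_missing))
```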
Merge ds_multi_summary_stats() into ds_summary_stats().
plot.ds_data_summary() should create plots for the following data types:
numeric/integer
factors
tway_tables() returns the following error:
> ecom <- readr::read_csv('https://raw.githubusercontent.com/rsquaredacademy/datasets/master/web.csv',
+ col_types = list(col_integer(),
+ col_factor(levels = c('bing', 'direct', 'google', 'social', 'yahoo')),
+ col_factor(levels = c('tablet', 'laptop', 'mobile')),
+ col_factor(levels = c('true', 'false')), col_integer(), col_double(),
+ col_double(), col_character(), col_factor(levels = c('true', 'false')),
+ col_double(), col_double())
+ )
> tway_tables(ecom)
Cell Contents
|---------------|
| Frequency |
| Percent |
| Row Pct |
| Col Pct |
|---------------|
Total Observations: 1000
Error in sort.list(y) : 'x' must be atomic for 'sort.list'
Have you called 'sort' on a list?
Called from: sort.list(y)
> sessionInfo()
R version 3.4.1 (2017-06-30)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
Matrix products: default
locale:
[1] LC_COLLATE=English_India.1252 LC_CTYPE=English_India.1252
[3] LC_MONETARY=English_India.1252 LC_NUMERIC=C
[5] LC_TIME=English_India.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] descriptr_0.1.1 readr_1.1.1 dplyr_0.7.2 bindrcpp_0.2
loaded via a namespace (and not attached):
[1] Rcpp_0.12.11 learnr_0.9 compiler_3.4.1
[4] git2r_0.15.0 bindr_0.1 tools_3.4.1
[7] digest_0.6.10 jsonlite_1.5 evaluate_0.10
[10] memoise_1.0.0.9001 tibble_1.3.4 pkgconfig_2.0.1
[13] rlang_0.1.2 rstudioapi_0.6 shiny_1.0.4
[16] curl_2.2 yaml_2.1.14 withr_1.0.2
[19] stringr_1.2.0 httr_1.3.1 knitr_1.17
[22] hms_0.2 htmlwidgets_0.9 devtools_1.13.3
[25] tidyselect_0.1.1 rprojroot_1.2-10 glue_1.1.1
[28] R6_2.2.2 rmarkdown_1.6 tidyr_0.7.0
[31] purrr_0.2.3 skimr_0.9000 magrittr_1.5
[34] backports_1.0.4 htmltools_0.3.6 colformat_0.0.0.9000
[37] rsconnect_0.8.5 assertthat_0.2.0 mime_0.5
[40] xtable_1.8-0 httpuv_1.3.5 stringi_1.1.2
[43] crayon_1.3.2.9000 markdown_0.8
Rename ds_multi_stats() to ds_tidy_stats().
Soft deprecate all dist_* functions.
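A deprecation wrapper could look roughly like the sketch below. Both function names are hypothetical stand-ins, and base R's .Deprecated() warns on every call, which is stricter than a true soft deprecation; the package may well use a different mechanism.

```r
# Hypothetical names: vis_norm_plot() stands in for a replacement function
# and dist_norm_plot_old() for a dist_* function being phased out.
vis_norm_plot <- function(mean = 0, sd = 1) {
  c(mean = mean, sd = sd)   # placeholder body for illustration
}

dist_norm_plot_old <- function(...) {
  .Deprecated("vis_norm_plot")   # emits a deprecation warning
  vis_norm_plot(...)             # then delegates to the replacement
}

# The old name still works, but warns the caller
result <- suppressWarnings(dist_norm_plot_old(mean = 2, sd = 3))
print(result)
```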
Generate descriptive statistics for a set of continuous/numeric variables.
Generate automated report for descriptive statistics. Check the report package.
Prepare for release:
devtools::check_win_devel()
rhub::check_for_cran()
Perform release:
devtools::check_win_devel() (again!)
devtools::submit_cran()
pkgdown::build_site()
Wait for CRAN...
Template from r-lib/usethis#338
Remove shiny application from the inst folder.
Rename ds_auto_summary() to ds_auto_summary_stats().
Remove all functions deprecated in 0.4.1
This is a request for an enhancement to ds_launch_shiny_app():
Users should be able to select the plotting library from the following:
ds_multi_stats() throws an error in the presence of missing values.
library(descriptr)
mt <- mtcarz
mt[c(3, 6, 9, 12), c(2, 3, 5, 6, 8, 10)] <- NA
ds_multi_stats(mt, disp, hp, mpg)
#> Error in summarise_impl(.data, dots): Evaluation error: missing values and NaN's not allowed if 'na.rm' is FALSE.
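The message in the error above is the one base R's quantile() raises when it meets NA values without na.rm = TRUE. The sketch below shows, in base R and with mtcars standing in for mtcarz, how passing na.rm = TRUE to each summary function avoids it. It is an illustration of the idea, not the package's implementation.

```r
# mtcars stands in for descriptr's mtcarz; column 3 (disp) receives NAs.
mt <- mtcars
mt[c(3, 6, 9, 12), c(2, 3, 5, 6, 8, 10)] <- NA

vars <- c("disp", "hp", "mpg")

# Every summary function is told to drop missing values explicitly
stats <- data.frame(
  variable = vars,
  n        = vapply(vars, function(v) sum(!is.na(mt[[v]])), integer(1)),
  mean     = vapply(vars, function(v) mean(mt[[v]], na.rm = TRUE), numeric(1)),
  q3       = vapply(vars, function(v) {
    quantile(mt[[v]], 0.75, na.rm = TRUE, names = FALSE)
  }, numeric(1)),
  row.names = NULL,
  stringsAsFactors = FALSE
)
print(stats)
```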
Example
example_ds <- data.frame(
col_integer = c(2L, 2L, 2L, 5L, 5L),
col_numeric = c(1.9, NA, 2.9, 9.1, 9.6),
col_ordinal = ordered(c("S", "V", "V", "S", NA)),
col_factor = factor(c("R", "G", "G", "B", "B")),
col_logical = c(FALSE, TRUE, TRUE, FALSE, TRUE),
col_character = c("-", "-", "some text", "-", "-"),
col_date = Sys.Date(),
col_time = Sys.time(),
stringsAsFactors = FALSE
)
descriptr::ds_screener(example_ds)
This example fails with the error:
Error in max(sapply(lengths[[i]], nchar)) :
invalid 'type' (list) of argument
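A plausible cause, offered here as an assumption rather than a confirmed diagnosis, is that some columns carry more than one class (ordered factors and date-times do), so sapply() over the columns returns a list instead of a character vector, and nchar() then fails on it. The sketch below shows the difference and one way to collapse each column's classes into a single string.

```r
# Reduced version of the example: one single-class and two multi-class columns
example_ds <- data.frame(
  col_integer = c(2L, 5L),
  col_ordinal = ordered(c("S", "V")),
  col_time    = Sys.time(),
  stringsAsFactors = FALSE
)

# sapply() cannot simplify results of different lengths, so this is a list
classes_raw <- sapply(example_ds, class)

# Collapsing each class vector gives exactly one string per column
classes_chr <- vapply(
  example_ds,
  function(col) paste(class(col), collapse = ", "),
  character(1)
)

print(is.list(classes_raw))
print(classes_chr)
print(max(nchar(classes_chr)))   # safe: the input is a character vector
```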
Update app with the new api
One common function for frequency tables. Merge ds_freq_cont() into ds_freq_table().
Create kable() friendly output for the following:
ds_summary_stats()
ds_screener()
ds_cross_table()
ds_freq_table()
ds_freq_cont()
ds_group_summary()
ds_oway_tables()
ds_tway_tables()
The ds_summary_stats() function should be redesigned to:
Move shiny app to xplorerr package.
Redesign ds_multi_stats() to generate summary statistics for all continuous variables in the data set.
Improve code coverage by adding tests for plots.
Rename ds_auto_tabulation() to ds_auto_freq_table().
ds_auto_summary() should identify and generate an appropriate summary for the following data types:
numeric/integer
factor
Remove redundant comments and commented out code from the following:
In ds_freq_table(), add a row at the end of the frequency table to display the frequency of NA values.
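In base R, table() can already count missing values when asked to, which gives a sense of what the extra row would contain. The sketch below is an illustration with made-up data, not the ds_freq_table() implementation.

```r
x <- factor(c("a", "b", NA, "a", NA))

# useNA = "always" appends an NA entry after the regular levels
counts <- table(x, useNA = "always")

freq <- data.frame(
  level     = ifelse(is.na(names(counts)), "NA", names(counts)),
  frequency = as.integer(counts),
  stringsAsFactors = FALSE
)
print(freq)
```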
We are contacting you because you are the maintainer of descriptr, which imports ggplot2 and uses vdiffr to manage visual test cases. The upcoming release of ggplot2 includes several improvements to plot rendering, including the ability to specify lineend and linejoin in geom_rect() and geom_tile(), and improved rendering of text. These improvements will result in subtle changes to your vdiffr doppelgangers when the new version is released.
Because vdiffr test cases do not run on CRAN by default, your CRAN checks will still pass. However, we suggest updating your visual test cases with the new version of ggplot2 as soon as possible to avoid confusion. You can install the development version of ggplot2 using remotes::install_github("tidyverse/ggplot2").
If you have any questions, let me know!
The following should accept multiple arguments or detect all continuous variables in the data set and return metrics for all of them.
ds_measures_location()
ds_measures_variation()
ds_measures_symmetry()
ds_percentiles()
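Detecting the continuous variables automatically could be sketched as follows in base R. The function name measures_location() and the choice of statistics are illustrative assumptions; the ds_measures_*() functions may select and report columns differently.

```r
# Hypothetical helper: summarise every numeric column of a data frame
measures_location <- function(data) {
  num_cols <- names(data)[vapply(data, is.numeric, logical(1))]
  data.frame(
    variable = num_cols,
    mean     = vapply(num_cols, function(v) mean(data[[v]], na.rm = TRUE), numeric(1)),
    median   = vapply(num_cols, function(v) median(data[[v]], na.rm = TRUE), numeric(1)),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

# iris has four numeric columns and one factor (Species), which is skipped
result <- measures_location(iris)
print(result)
```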
Use rlang equivalents for errors, warnings and messages.
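The rlang counterparts of stop(), warning() and message() are abort(), warn() and inform(). A small sketch of how a check might use them follows; check_numeric() is a hypothetical function written for this illustration, and the rlang package is assumed to be installed.

```r
library(rlang)

# Hypothetical validation helper using rlang's condition functions
check_numeric <- function(x) {
  if (!is.numeric(x)) {
    abort("`x` must be numeric.")                              # replaces stop()
  }
  if (anyNA(x)) {
    warn("`x` contains missing values; they will be ignored.") # replaces warning()
  }
  inform("Input checked.")                                     # replaces message()
  invisible(x)
}

# Capture the error message instead of stopping the script
msg <- tryCatch(check_numeric("a"), error = function(e) conditionMessage(e))
print(msg)
```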
User friendly error messages (check here).
The normal distribution plot does not change when the mean and standard deviation are changed.
Generate all plots using ggplot2.
Integrate the descriptive statistics report template from reportr.
Modify all ds_* functions to handle missing values.
oway_tables() returns the following error:
> ecom <- readr::read_csv('https://raw.githubusercontent.com/rsquaredacademy/datasets/master/web.csv',
+ col_types = list(col_integer(),
+ col_factor(levels = c('bing', 'direct', 'google', 'social', 'yahoo')),
+ col_factor(levels = c('tablet', 'laptop', 'mobile')),
+ col_factor(levels = c('true', 'false')), col_integer(), col_double(),
+ col_double(), col_character(), col_factor(levels = c('true', 'false')),
+ col_double(), col_double())
+ )
> oway_tables(ecom)
Error in freq_table2.default(factors.df[, i], nam[i]) :
(list) object cannot be coerced to type 'double'
Called from: freq_table2.default(factors.df[, i], nam[i])
> sessionInfo()
R version 3.4.1 (2017-06-30)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
Matrix products: default
locale:
[1] LC_COLLATE=English_India.1252 LC_CTYPE=English_India.1252
[3] LC_MONETARY=English_India.1252 LC_NUMERIC=C
[5] LC_TIME=English_India.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] descriptr_0.1.1 readr_1.1.1 dplyr_0.7.2 bindrcpp_0.2
loaded via a namespace (and not attached):
[1] Rcpp_0.12.11 learnr_0.9 compiler_3.4.1
[4] git2r_0.15.0 bindr_0.1 tools_3.4.1
[7] digest_0.6.10 jsonlite_1.5 evaluate_0.10
[10] memoise_1.0.0.9001 tibble_1.3.4 pkgconfig_2.0.1
[13] rlang_0.1.2 rstudioapi_0.6 shiny_1.0.4
[16] curl_2.2 yaml_2.1.14 withr_1.0.2
[19] stringr_1.2.0 httr_1.3.1 knitr_1.17
[22] hms_0.2 htmlwidgets_0.9 devtools_1.13.3
[25] tidyselect_0.1.1 rprojroot_1.2-10 glue_1.1.1
[28] R6_2.2.2 rmarkdown_1.6 tidyr_0.7.0
[31] purrr_0.2.3 skimr_0.9000 magrittr_1.5
[34] backports_1.0.4 htmltools_0.3.6 colformat_0.0.0.9000
[37] rsconnect_0.8.5 assertthat_0.2.0 mime_0.5
[40] xtable_1.8-0 httpuv_1.3.5 stringi_1.1.2
[43] crayon_1.3.2.9000 markdown_0.8
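Both this error and the tway_tables() error appear when the input comes from readr::read_csv(), which returns a tibble. A likely cause, stated here as an assumption, is single-bracket column extraction: on a tibble, `[, i]` returns a one-column tibble (a list) rather than an atomic vector. The sketch below emulates that with drop = FALSE on a plain data frame so that it runs without tibble installed.

```r
df <- data.frame(referrer = factor(c("bing", "google", "bing")))

# What `[, i]` yields on a tibble: still a data frame, which is a list
as_frame  <- df[, 1, drop = FALSE]

# `[[` returns the column itself for data frames and tibbles alike
as_vector <- df[[1]]

print(is.atomic(as_frame))    # FALSE
print(is.atomic(as_vector))   # TRUE
print(table(as_vector))       # works on the atomic column
```

Extracting columns with `[[` is one way to make the same code behave identically for data frames and tibbles.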
ds_freq_cont() should return a tibble for further usage.
Use the standard template for README:
Continue to launch shiny app from xplorerr package. Check whether suggested packages are available and offer to install missing packages.
Add an interactive tutorial using learnr.
Add a function to return the current version of the package on CRAN and GitHub.
Modify all ds_* functions to handle missing values.
Error in the 90th and 95th percentiles in ds_summary_stats.
Error in the grand total displayed in ds_cross_table() in the presence of NA's.
library(descriptr)
mt <- mtcarz
mt[c(3, 6, 9, 12), c(2, 3, 5, 6, 8, 10)] <- NA
ds_cross_table(mt, gear, cyl)
#> Cell Contents
#> |---------------|
#> | Frequency |
#> | Percent |
#> | Row Pct |
#> | Col Pct |
#> |---------------|
#>
#> Total Observations: 32
#>
#> ----------------------------------------------------------------------------
#> | | cyl |
#> ----------------------------------------------------------------------------
#> | gear | 4 | 6 | 8 | Row Total |
#> ----------------------------------------------------------------------------
#> | 3 | 1 | 1 | 11 | 13 |
#> | | 0.031 | 0.031 | 0.344 | |
#> | | 0.08 | 0.08 | 0.85 | 0.41 |
#> | | 0.11 | 0.17 | 0.85 | |
#> ----------------------------------------------------------------------------
#> | 4 | 6 | 4 | 0 | 10 |
#> | | 0.188 | 0.125 | 0 | |
#> | | 0.6 | 0.4 | 0 | 0.31 |
#> | | 0.67 | 0.67 | 0 | |
#> ----------------------------------------------------------------------------
#> | 5 | 2 | 1 | 2 | 5 |
#> | | 0.062 | 0.031 | 0.062 | |
#> | | 0.4 | 0.2 | 0.4 | 0.16 |
#> | | 0.22 | 0.17 | 0.15 | |
#> ----------------------------------------------------------------------------
#> | Column Total | 9 | 6 | 13 | 32 |
#> | | 0.281 | 0.187 | 0.406 | |
#> ----------------------------------------------------------------------------
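In the table above, the row totals (13, 10, 5) and the column totals (9, 6, 13) each add up to 28, while the grand total shows 32. A base R cross-tabulation of the same data, using mtcars in place of mtcarz, shows the total one would expect once rows with missing values are excluded. This is a sketch for comparison, not the ds_cross_table() code.

```r
# mtcars stands in for descriptr's mtcarz; cyl (col 2) and gear (col 10) get NAs
mt <- mtcars
mt[c(3, 6, 9, 12), c(2, 3, 5, 6, 8, 10)] <- NA

# table() drops NA by default (useNA = "no")
tab <- table(gear = mt$gear, cyl = mt$cyl)

with_totals <- addmargins(tab)   # adds "Sum" row and column
grand_total <- sum(tab)

print(with_totals)
print(grand_total)
```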
Return plot objects instead of printing. Use the argument print_plot with the default value TRUE.
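The pattern might look like the sketch below, assuming ggplot2 is installed. The function plot_histogram() is a hypothetical example, and returning the plot invisibly in both cases is a design choice of this sketch, not something the issue specifies.

```r
library(ggplot2)

# Hypothetical plotting helper: always returns the plot object,
# and prints it only when print_plot is TRUE (the default).
plot_histogram <- function(data, column, print_plot = TRUE) {
  p <- ggplot(data, aes(x = .data[[column]])) +
    geom_histogram(bins = 10) +
    labs(x = column)
  if (print_plot) {
    print(p)
  }
  invisible(p)
}

# With print_plot = FALSE nothing is drawn, but the object can be reused
p <- plot_histogram(mtcars, "mpg", print_plot = FALSE)
p_titled <- p + ggtitle("Distribution of mpg")
```

Returning the object lets callers add layers, save the plot with ggsave(), or test it, which is not possible when the function only prints.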
Generate summary statistics for combination of levels of two or more categorical variables.
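As a rough illustration of the requested output, base R's aggregate() can summarise a numeric variable for every observed combination of grouping variables. The sketch below treats cyl and gear in mtcars as categorical; the column names mean_mpg and n are chosen for this example only.

```r
# Mean of mpg for each observed combination of cyl and gear
means <- aggregate(mpg ~ cyl + gear, data = mtcars, FUN = mean)
names(means)[3] <- "mean_mpg"

# Number of observations in each combination
counts <- aggregate(mpg ~ cyl + gear, data = mtcars, FUN = length)
names(counts)[3] <- "n"

result <- merge(means, counts, by = c("cyl", "gear"))
print(result)
```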