rubenarslan / codebook
Cook rmarkdown codebooks from metadata on R data frames
Home Page: https://rubenarslan.github.io/codebook/
License: Other
Hi,
first of all: I LOVE the codebook package you created, thank you for this.
Secondly, a minor improvement request pertaining to conditional headings: it would help to make the printing of the headings conditional. With
detailed_variables = FALSE
the knitted HTML codebook currently prints the heading "Variables" with no content beneath it. Alternatively, the possibility to add the content back would work.
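A minimal sketch of the call in question (assuming the `detailed_variables` argument behaves as documented; the toy data frame is mine):

```r
library(codebook)

# With detailed_variables = FALSE the "Variables" heading is still
# printed even though no per-variable sections follow it.
df <- data.frame(x = 1:3)
codebook(df,
         detailed_variables = FALSE,  # suppresses per-variable sections
         metadata_table = FALSE)
```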
References might be included, with DOIs/links? That would be especially useful for metadata. On the other hand, most items don't have a unique source link and are pretty much fully described by their item text.
Love this idea - sorely needed to have good metadata, and I will play with this and try it out. I tried to create a codebook for a large, rather messy dataset, namely the GSS data (I checked with Release 2 and 3), which can be downloaded here: GSS_data
This is a rather large data file. I ran the code below and received an error (pasted below the code). I don't find it unreasonable that codebook does not work on this data file by default, but I wanted to let you know in case the error is something worthwhile to fix.
library(haven)
library(codebook)
results <- haven::read_dta("GSS7216_R2.DTA", encoding = "windows-1252")
codebook(results)
The error I get when generating a html markdown file is:
Quitting from lines 58-71 (codebook.Rmd)
Error in stringr::str_match(names(stats::na.omit(choices)), "\[?([0-9-]+)(\]|:)")[, :
subscript out of bounds
Calls: ... withCallingHandlers -> withVisible -> eval -> eval -> plot_labelled
Thank you for the package. I usually use codebook_table(); this error was generated using codebook_browser(), but I receive the above skimr error either way. Scrolling through the issues, there was mention of this, but I wasn't certain of the conclusion, if any. Thank you again; helpful package.
---
title: "Codebook: Donations to JustGiving Fundraiser pages "
author: "David Reinstein"
output:
html_document:
toc: true
code_folding: 'hide'
self_contained:
pdf_document:
toc: yes
toc_depth: 4
latex_engine: xelatex
---
library(tidyverse)
library(codebook)
df <- tibble(x = 1:2, y = c("hello, i", "john"))
metadata(df)$name <- "donation data"
codebook(df, survey_repetition = "single", metadata_table = FALSE)
Knitting the above code (an Rmd file) in RStudio throws this error:
Quitting from lines 25-76 (codebook_reprex.Rmd)
Quitting from lines 41-46 (codebook_reprex.Rmd)
Error in value[[3L]](cond) :
Could not summarise item y. Error in as.environment(where): using 'as.environment(NULL)' is defunct
Calls: <Anonymous> ... eval -> value -> value.Future -> resignalConditions
Execution halted
However, removing the comma from the first element of the y character vector in the tibble ...
df <- tibble(x = 1:2, y = c("hello i", "john"))
... does not throw this error.
So that it might be easier to find the URL without scrolling down 😉
In codebook 0.5.8, using the dev version of mice, I get:
library(codebook)
data("bfi")
codebook_missingness(bfi)
Error in `rownames<-`(`*tmp*`, value = table(pat)) :
attempt to set 'rownames' on an object with no dimensions
Could you check?
e.g. in cognit.dta
better plotting of csv values
Maybe this exists already. Find out:
@format A data frame with NNNN rows and NN variables:
\describe{
\item{subject}{Anonymized Mechanical Turk Worker ID}
\item{trial}{Trial number, from 1..NNN}
}
Hi,
knitting the Rmd file below (copied from the tutorial) to HTML yields an empty HTML file without errors. The empty HTML occurs only with the codebook argument
metadata_table = TRUE
and not otherwise.
Specs:
Windows 10
codebook_0.8.2 as well as the github version
R version 3.6.3 (2020-02-29)
Sublime Text (not RStudio)
The R Markdown file that I render:
---
title: "Test"
author: "JBJ"
date: "yyyy-mm-dd"
output:
html_document:
toc: true
toc_depth: 4
toc_float: true
code_folding: 'hide'
---
```{r setup, include=FALSE}
library(codebook)
knitr::opts_chunk$set(warning = FALSE, message = TRUE, error = FALSE, echo = FALSE)
```
## Demonstrating is.prime
```{r test-this, echo = FALSE}
old_base_dir <- knitr::opts_knit$get("base.dir")
knitr::opts_knit$set(base.dir = tempdir())
on.exit(knitr::opts_knit$set(base.dir = old_base_dir))
data("bfi")
bfi <- bfi[, c("BFIK_open_1", "BFIK_open_1")]
```
```{r codebook}
codebook(bfi,
survey_repetition = "single",
metadata_table = TRUE # <---- causes the empty HTML
)
```
This is the console log. Nothing seems wrong here, as far as I can see:
output file: test.knit.md
"DIR:/Users/your_user_name/AppData/Local/Pandoc/pandoc" +RTS -K512m -RTS test.utf8.md --to html4 --from markdown+autolink_bare_uris+tex_math_single_backslash+smart --output test.html --email-obfuscation none --self-contained --standalone --section-divs --table-of-contents --toc-depth 4 --variable toc_float=1 --variable toc_selectors=h1,h2,h3,h4 --variable toc_collapsed=1 --variable toc_smooth_scroll=1 --variable toc_print=1 --template
"DIR:\Users\your_user_name\Documents\R\win-library\3.6\rmarkdown\rmd\h\default.html" --no-highlight --variable highlightjs=1 --variable "theme:bootstrap" --include-in-header
"DIR:\Users\your_user_name\AppData\Local\Temp\RtmpIjYheO\rmarkdown-str64e045c43f64.html" --mathjax --variable "mathjax-url:https://mathjax.rstudio.com/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML" --lua-filter
"DIR:/Users/your_user_name/Documents/R/win-library/3.6/rmarkdown/rmd/lua/pagebreak.lua" --lua-filter
"DIR:/Users/your_user_name/Documents/R/win-library/3.6/rmarkdown/rmd/lua/latex-div.lua" --variable code_folding=hide --variable code_menu=1
Output created: test.html
>
>
[Finished in 4.3s]
Maybe basically skimr merged with a data frame taken from the attributes? It's not clear how to deal with nested attributes in this case, but whatever?
My current recommendation is to set
opts_chunk$set(error = TRUE)
in the knitr chunk preceding the codebook call, to find out which variables the error happens with and make it easier to generate a reproducible example.
Unfortunately, if people don't set this, the error message they get will be highly unspecific and hard to trace except by divide-and-conquer. Plan: try to find out how to put the current variable name into the trace.
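In an Rmd, that recommendation amounts to something like the following (a minimal sketch; bfi is the example dataset shipped with codebook):

```r
# In a chunk that runs *before* the codebook chunk:
knitr::opts_chunk$set(error = TRUE)

# With error = TRUE, knitting continues past per-variable failures,
# so the rendered codebook shows near which variable the error occurred.
library(codebook)
data("bfi")
codebook(bfi)
```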
I was trying to use detect_missing to clean the missing data in a dataset of which a few columns are integers. detect_missing cannot correctly label the missing values.
Consider the following dataset rd1:
rd1 <- tibble(
x1 = haven::labelled(x = c(32L, 996L, 40L),
labels = c("Refused to answer" = 996), label = "x1 variable (integer)"),
x2 = haven::labelled(x = c(32, 996, 40),
labels = c("Refused to answer" = 996), label = "x1 variable (double)")
)
# Here is the output of `rd1`:
# A tibble: 3 x 2
# x1 x2
# <int+lbl> <dbl+lbl>
# 32 32
# 996 [Refused to answer] 996 [Refused to answer]
# 40 40
The only difference between x1 and x2 is that x1 has only integers. Applying detect_missing will only affect x2; 996 in x1 remains unchanged.
detect_missing(rd1, missing = c(996))
# # A tibble: 3 x 2
# x1 x2
# <int+lbl> <dbl+lbl>
# 32 32
# 996 [Refused to answer] NA(a) [[996] Refused to answer]
# 40 40
# Warning message:
# In detect_missing(rd1, missing = c(996)) :
# Cannot label missings for integers in variable x1
I looked into the code of detect_missing and found that the problem is that the function haven::tagged_na does not work with integer vectors, which is why you include the condition is.double in a few if statements.
If these lines (below) are modified by removing the check for is.double, detect_missing will work for integer columns by converting them to columns of double. I understand that converting integer to double could cause problems later, but it might not be a bad idea to add an option letting users allow the conversion, so that missing values can be labelled correctly for integer columns.
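As a workaround until such an option exists, the integer columns can be converted to double before calling detect_missing, so that haven::tagged_na() can tag them (a sketch; the int_to_dbl helper is mine, not part of codebook):

```r
library(codebook)
library(tibble)

rd1 <- tibble(
  x1 = haven::labelled(c(32L, 996L, 40L),
                       labels = c("Refused to answer" = 996),
                       label = "x1 variable (integer)")
)

# Helper (mine): switch an integer column's storage to double while
# restoring its labelled attributes (as.double alone would drop them)
int_to_dbl <- function(x) {
  if (!is.integer(x)) return(x)
  attrs <- attributes(x)
  x <- as.double(x)
  attributes(x) <- attrs
  x
}

rd1[] <- lapply(rd1, int_to_dbl)
detect_missing(rd1, missing = c(996))  # 996 in x1 can now be tagged
```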
The detect_missing2 function below is a simple modification of the current detect_missing, adding an extra option force_integer = TRUE or FALSE. (The changes are highlighted.) When force_integer = TRUE, the integer columns will be converted to double and missing values will be labelled.
detect_missing2 <- function (data, only_labelled = TRUE, negative_values_are_missing = TRUE,
ninety_nine_problems = TRUE, learn_from_labels = TRUE, missing = c(),
non_missing = c(), vars = names(data), use_labelled_spss = FALSE, force_integer = FALSE)
{
for (i in seq_along(vars)) {
var <- vars[i]
if (is.numeric(data[[var]]) && any(!is.na(data[[var]]))) {
potential_missing_values <- c()
if (negative_values_are_missing) {
potential_missing_values <- unique(data[[var]][data[[var]] <
0])
}
labels <- attributes(data[[var]])$labels
if (learn_from_labels && length(labels)) {
numeric_representations <- as.numeric(stringr::str_match(names(labels),
"\\[([0-9-]+)\\]")[, 2])
potentially_untagged <- numeric_representations[is.na(labels)]
potential_tags <- labels[is.na(labels)]
if (is.double(data[[var]]) && !all(is.na(haven::na_tag(data[[var]]))) &&
length(intersect(potentially_untagged, data[[var]]))) {
# For integer vectors, their missing values cannot be tagged,
# so we don't need to modify the above if condition for
# integer vectors.
warning("Missing values were already tagged in ",
var, ". Although", "there were further potential missing values as indicated by",
"missing labels, this was not changed.")
} else {
for (e in seq_along(potentially_untagged)) {
pot <- potentially_untagged[e]
data[[var]][data[[var]] == pot] <- potential_tags[e]
}
}
}
if (ninety_nine_problems) {
if (any(!is.na(data[[var]])) && (stats::median(data[[var]],
na.rm = TRUE) + stats::mad(data[[var]], na.rm = TRUE) *
5) < 99) {
potential_missing_values <- c(potential_missing_values,
99)
}
if (any(!is.na(data[[var]])) && (stats::median(data[[var]],
na.rm = TRUE) + stats::mad(data[[var]], na.rm = TRUE) *
5) < 999) {
potential_missing_values <- c(potential_missing_values,
999)
}
}
potential_missing_values <- union(setdiff(potential_missing_values,
non_missing), missing)
if ((!only_labelled || haven::is.labelled(data[[var]])) &&
length(potential_missing_values) > 0) {
if (only_labelled) {
potential_missing_values <- potential_missing_values[potential_missing_values %in%
labels]
potential_missing_values <- union(potential_missing_values,
setdiff(labels[is.na(labels)], data[[var]]))
}
potential_missing_values <- sort(potential_missing_values)
with_tagged_na <- data[[var]]
if (is.double(data[[var]])) {
free_na_tags <- setdiff(letters, haven::na_tag(with_tagged_na))
} else {
free_na_tags <- letters
}
for (i in seq_along(potential_missing_values)) {
miss <- potential_missing_values[i]
if (!use_labelled_spss && !all(potential_missing_values %in%
free_na_tags)) {
new_miss <- free_na_tags[i]
} else {
new_miss <- potential_missing_values[i]
}
that_label <- which(labels == miss)
################################################################################
# I replaced `is.double(data[[var]])` with `(force_integer |
# is.double(data[[var]]))` below
if (length(which(with_tagged_na == miss)) &&
(force_integer | is.double(data[[var]])) && !use_labelled_spss) {
with_tagged_na[which(with_tagged_na == miss)] <- haven::tagged_na(new_miss)
} else if (!force_integer & is.integer(data[[var]])) {
warning("Cannot label missings for integers in variable ",
var, "; set force_integer = TRUE if you want to label missings for integers.")
}
if ((force_integer | is.double(data[[var]])) &&
length(that_label) && !use_labelled_spss) {
labels[that_label] <- haven::tagged_na(new_miss)
names(labels)[that_label] <- paste0("[",
potential_missing_values[i], "] ", names(labels)[that_label])
}
################################################################################
}
if (use_labelled_spss) {
labels <- attributes(data[[var]])$labels
if (is.null(labels)) {
labels <- potential_missing_values
names(labels) <- "autodetected unlabelled missing"
}
data[[var]] <- haven::labelled_spss(data[[var]],
label = attr(data[[var]], "label", TRUE),
labels = labels, na_values = potential_missing_values,
na_range = attr(data[[var]], "na_range",
TRUE))
} else if (haven::is.labelled(data[[var]])) {
data[[var]] <- haven::labelled(with_tagged_na,
label = attr(data[[var]], "label", TRUE),
labels = labels)
} else {
data[[var]] <- with_tagged_na
}
}
}
}
data
}
to decrease number of dependencies
Hi, it's a great package, but I have the following problem installing it. I have used both Windows and Mac computers, get the message below, and it doesn't install.
MacOS High Sierra and Windows 10, during installation via remotes:
Error in utils::download.file(url, path, method = method, quiet = quiet, :
cannot open URL 'https://api.github.com/repos/rubenarslan/codebook/tarball/master'
I really need to install this package.
My session info:
R version 3.5.3 (2019-03-11)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.6
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
locale:
[1] tr_TR.UTF-8/tr_TR.UTF-8/tr_TR.UTF-8/C/tr_TR.UTF-8/tr_TR.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] remotes_2.1.0 latex2exp_0.4.0 forcats_0.4.0 stringr_1.4.0 dplyr_0.8.5 purrr_0.3.3
[7] readr_1.3.1 tidyr_1.0.0 tibble_2.1.3 ggplot2_3.3.0 tidyverse_1.3.0 gt_0.2.0.5
loaded via a namespace (and not attached):
[1] tinytex_0.18 kispaddins_0.1.0 tidyselect_1.0.0 xfun_0.12 haven_2.2.0
[6] lattice_0.20-38 colorspace_1.4-1 generics_0.0.2 vctrs_0.2.4 htmltools_0.4.0
[11] yaml_2.2.1 rlang_0.4.5 pillar_1.4.3 withr_2.1.2 glue_1.3.2
[16] DBI_1.1.0 dbplyr_1.4.2 modelr_0.1.5 readxl_1.3.1 lifecycle_0.2.0
[21] munsell_0.5.0 gtable_0.3.0 cellranger_1.1.0 rvest_0.3.5 evaluate_0.14
[26] knitr_1.28 curl_4.3 fansi_0.4.1 broom_0.5.3 Rcpp_1.0.3
[31] checkmate_2.0.0 scales_1.1.0 backports_1.1.5 jsonlite_1.6 fs_1.4.1
[36] hms_0.5.3 packrat_0.5.0 digest_0.6.25 stringi_1.4.6 bookdown_0.17.2
[41] grid_3.5.3 cli_2.0.2 tools_3.5.3 magrittr_1.5 crayon_1.3.4
[46] pkgconfig_2.0.3 xml2_1.2.2 reprex_0.3.0 lubridate_1.7.4 assertthat_0.2.1
[51] rmarkdown_2.0 httr_1.4.1 rstudioapi_0.11 R6_2.4.1 nlme_3.1-143
[56] compiler_3.5.3
The JSON-LD generated for the vignette lists several terms as being in the context of pending.schema.org that do not actually appear to be part of pending. Perhaps you meant to define a custom context outside of schema.org to extend these terms? e.g.
"type": "http://pending.schema.org/propertyValue",
"http://pending.schema.org/data_summary": {
"type": "http://formr.org/codebook/SummaryStatistics",
"http://pending.schema.org/complete": "28",
"http://pending.schema.org/missing": "0",
"http://pending.schema.org/n": "28",
"http://pending.schema.org/n_unique": "4",
"http://pending.schema.org/ordered": "FALSE",
"http://pending.schema.org/top_counts": "4: 15, 5: 10, 3: 2, 2: 1"
},
(Also note some logicals and integers being typed as characters).
See on the playground
When I knit a codebook with the codebook() function I get the following error message (shown in the HTML output):
This seems to be an issue with the new version of rlang (I have 0.4.0 installed).
Can you automatically add alt-text to the distribution plots to meet WCAG accessibility standards? An alt-text of "Distribution of var" should be sufficient.
FYI, The other violations per WAVE are missing form labels in the codebook table, document language missing, and an empty table header in the missingness table.
library(codebook)
packageVersion("haven")
packageVersion("codebook")
data("bfi")
bfi <- bfi[,c("BFIK_open", paste0("BFIK_open_", 1:4))]
codebook_component_scale(bfi[,1], "BFIK_open", bfi[,-1],
reliabilities = list(BFIK_open = psych::alpha(bfi[,-1])))
#> Error: C stack usage 7969280 is too close to the limit
I'm not sure exactly what is causing this, but it's probably related to the changes to the labelled class.
Could you please take a look? I'm planning on submitting haven 2.0 to CRAN on November 7
Small issue: it would be nice to change the R documentation for survey_overview to make it explicit that all the specified variables need to be present in the data for a summary to be printed. (For example, I usually have created and ended in my data, but session only in multi-session surveys, and I erroneously expected the summary to be printed.)
Line 23 in 0cf7c55
#' @param survey_overview whether to print an overview of survey entries and durations (only printed if the data contains five variables named session, created, modified, ended, expired)
Alternatively it would be nice to condition the summary on the columns present?
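The suggested conditioning could be sketched like this (variable names taken from the documentation line quoted above; the print_survey_overview wrapper is hypothetical):

```r
overview_vars <- c("session", "created", "modified", "ended", "expired")

# Only print the overview of survey entries and durations when all
# five expected formr columns are actually present in the data
if (all(overview_vars %in% names(results))) {
  print_survey_overview(results)  # hypothetical helper
}
```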
Hey, great package! Thank you for all your work!
I would like to select the columns of the table generated by the function codebook_table, e.g. in this order:
order <- c("name", "label", "type", "type_options", "data_type", "ordered",
"value_labels", "optional", "showif",
"scale_item_names",
"value", "item_order", "block_order", "class",
"missing", "complete", "n", "empty", "n_unique",
"top_counts", "count", "median", "min", "max",
"mean", "sd", "p0", "p25", "p50", "p75", "p100", "hist")
Is there already a way to do it?
Thanks again!
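Until codebook_table gains such an argument, one workaround is to post-process its return value, which is a data frame (a sketch; dplyr::any_of() silently skips columns that are absent for a given dataset):

```r
library(codebook)
library(dplyr)

data("bfi")
ct <- codebook_table(bfi)

# Select and reorder only the columns of interest
ct_ordered <- select(ct, any_of(c("name", "label", "data_type",
                                  "missing", "complete", "n_unique",
                                  "mean", "sd", "hist")))
```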
We discussed this issue on Twitter. That thread is here.
I am trying to compile a codebook for a relatively large data file (511 variables, 23,000+ observations). There is a lot of missing data due to the design of the study.
I am trying to follow the instructions from the github page. I receive the following error.
dput() for that variable gives me a screen full of NA and the following attributes associated with the variable.
I hope this helps. Thanks!
doesn't seem to work as expected
Is DDI on your radar? https://www.ddialliance.org
Over 10K databases on ICPSR alone. https://www.icpsr.umich.edu/icpsrweb/
Maybe just generate a whole document if it's not a knit child right now.
Otherwise, we can't use this for fig/cache paths if e.g. bfi %>% select(1:3) is the df name.
Hello @rubenarslan
Hitherto the function codebook worked well for me, but now I'm getting an error. I suppose it must be linked to a package dependency, since I recently updated my project. The error message is as follows:
Error: No common type for `..1$by_variable$numeric.min` <labelled> and `..2$by_variable$numeric.min` <labelled>.
I'm not sure, but it looks like it's having difficulty dealing with variables of type numeric.
Here's the traceback:
33. stop(cnd)
32. abort(message, .subclass = c(.subclass, "vctrs_error"), ...)
31. stop_vctrs(message, .subclass = c(.subclass, "vctrs_error_incompatible"), x = x, y = y, details = details, ...)
30. stop_incompatible(x, y, x_arg = x_arg, y_arg = y_arg, details = details, ..., message = message, .subclass = c(.subclass, "vctrs_error_incompatible_type"))
29. stop_incompatible_type(x, y, x_arg = x_arg, y_arg = y_arg)
28. vec_ptype2.default(x = x, y = y, x_arg = x_arg, y_arg = y_arg)
27. vec_type2_dispatch(x = x, y = y, x_arg = x_arg, y_arg = y_arg)
26. vec_rbind(!!!x, .ptype = ptype)
25. unchop(data, !!cols, keep_empty = keep_empty, ptype = ptype)
24. unnest.data.frame(out, .data$by_variable)
23. tidyr::unnest(out, .data$by_variable)
22. build_results(skimmed, variable_names, NULL)
21. skim_by_type.data.frame(.x[[1L]], .y[[1L]], ...)
20. .f(.x[[1L]], .y[[1L]], ...)
19. purrr::map2(.data$skimmers, .data$skim_variable, skim_by_type, data)
18. summarise_impl(.data, dots, environment(), caller_env())
17. summarise.tbl_df(grouped, skimmed = purrr::map2(.data$skimmers, .data$skim_variable, skim_by_type, data))
16. dplyr::summarize(grouped, skimmed = purrr::map2(.data$skimmers, .data$skim_variable, skim_by_type, data))
15. skim_codebook(x)
14. "skim_type" %in% names(object)
13. has_type_column(object)
12. stopifnot(has_type_column(object), has_variable_column(object), has_skimr_attributes(object), nrow(object) > 0)
11. assert_is_skim_df(data)
10. skimr::partition(skim_codebook(x))
9. exists("POSIXct", df)
8. coerce_skimmed_summary_to_character(skimr::partition(skim_codebook(x)))
7. dots_values(...)
6. flatten_bindable(dots_values(...))
5. dplyr::bind_rows(coerce_skimmed_summary_to_character(skimr::partition(skim_codebook(x))), .id = "data_type")
4. skim_to_wide_labelled(results)
3. codebook_table(results)
2. codebook_items(results, indent = indent)
1. codebook(codebook_data)
Thanks in advance!
Hi Ruben,
when I use codebook() on my formr data I get the following error:
Error in .f(.x[[i]], ...) :
Names missing from the following functions: top_counts
This seems to be a problem with factors with no label (which should perhaps be read as strings instead?), but I can't figure out why, which variables are affected, or how to approach this issue.
Skimr v2 is going to be released very soon. You use skim_to_wide(), but with the new API the object defaults to wide. However, there are other changes that might break things for codebook. Please take a look and let us know if there are problems you can't solve.
aggregate_and_document_scale allows users to aggregate scales but also mark them up in a codebook-friendly way.
@rubenarslan - hey dude - I know you have a lot going on right now, but just a note that the web interface is giving me this error:
Also, I can't currently generate a codebook at all - I think it's the skimr issue mentioned in #40.
> meta_data = codebook(mtcars)
No missing values.
Error: 'skim_with_defaults' is not an exported object from 'namespace:skimr'
In addition: Warning message:
'skimr::skim_to_wide' is deprecated.
Use 'skim()' instead.
See help("Deprecated")
Allow the user to define different colors for different values within the variable (gradients, solid colors, palettes, etc.).
Currently using placeholder contexts for the data summary and item schemas.
Get rid of my own auto-detection of file formats, for a consistent format. Only snag: rio has many dependencies...
https://github.com/leeper/rio
Could md_pattern do more with tagged missings, i.e. patterns for skipped vs. structural missings?
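haven::na_tag() recovers the tag letter from a tagged missing, so a pattern table per tag is at least possible (a sketch; the mapping of tags to "skipped" vs. "structural" is an assumption about how the data were coded):

```r
library(haven)

x <- c(1, tagged_na("a"), tagged_na("b"), NA)

# na_tag() returns "a"/"b" for tagged NAs and NA otherwise; tabulating it
# separates, say, skipped ("a") from structural ("b") missings
table(na_tag(x), useNA = "ifany")
```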