Issues from larmarange/labelled

Strange behavior when creating labelled variables in data frames

I want to create a variable named foo, but a column named x is created instead (with value labels):

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
data.frame(
  foo = labelled::labelled(1:5, c(a=1, b=2))
) %>%
  str()
#> 'data.frame':    5 obs. of  1 variable:
#>  $ x:Class 'labelled'  atomic [1:5] 1 2 3 4 5
#>   .. ..- attr(*, "labels")= Named num [1:2] 1 2
#>   .. .. ..- attr(*, "names")= chr [1:2] "a" "b"

But for tibbles it is OK:

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
tibble(
  foo = labelled::labelled(1:5, c(a=1, b=2))
) %>%
  str()
#> Classes 'tbl_df', 'tbl' and 'data.frame':    5 obs. of  1 variable:
#>  $ foo:Class 'labelled'  atomic [1:5] 1 2 3 4 5
#>   .. ..- attr(*, "labels")= Named num [1:2] 1 2
#>   .. .. ..- attr(*, "names")= chr [1:2] "a" "b"

I'm not yet sure why it happens.

lookfor details = TRUE not working with haven_labelled

The if check only looks for "labelled" class and misses "haven_labelled".
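The class check described above can be illustrated in base R. This is only a minimal sketch, not the package's actual code: the object is built by hand with structure() and stands in for a vector created by a recent version of haven.

```r
# Hand-built stand-in for a vector carrying haven's newer class name
x <- structure(1:3, labels = c(a = 1, b = 2), class = "haven_labelled")

# Checking only the old class name misses the object
old_check <- inherits(x, "labelled")

# Accepting either class name catches both old and new objects
new_check <- inherits(x, c("labelled", "haven_labelled"))
```

Passing a character vector to inherits() returns TRUE as soon as one of the class names matches, so a single condition covers both cases.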

Corrected in this fork: NoahMarconi@4001267

I also needed a JSON format. If you're open to a commit like that I can edit to make the JSON optional (e.g. prefixed or JSON) and submit a pull request.

remove_attributes() converts character columns to factors

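library(dplyr)

# Reprex: a character column carrying a custom attribute
r <- tibble( ch = structure(letters[1:2], some_attribute=TRUE) ) %>%
  labelled::remove_attributes("some_attribute")
is.factor(r$ch) # the column should still be character, not a factor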

I believe

labelled/R/remove_attributes.R

Line 31 in 650e920

x <- as.data.frame(

should have the infamous stringsAsFactors=FALSE.
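The effect of that argument can be seen in isolation with base R. This is a small sketch with made-up data, not the code from remove_attributes.R. Note that since R 4.0.0 the default is already stringsAsFactors = FALSE, so the conversion to factors only happens by default on older R versions.

```r
# A list with one character column, as it might be passed to as.data.frame()
cols <- list(ch = letters[1:2])

# Explicitly requesting factors reproduces the behaviour reported above
with_factors <- as.data.frame(cols, stringsAsFactors = TRUE)

# Explicitly disabling the conversion keeps the column as character
without_factors <- as.data.frame(cols, stringsAsFactors = FALSE)
```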

dplyr::recode with labelled data

tidyverse/haven#400

Update labelled for taking into account missing values

Import tagged_na functions from haven

https://haven.tidyverse.org/reference/tagged_na.html

Release labelled 2.0.1

Prepare for release:

devtools::check_win_devel()
rhub::check_for_cran()
Polish NEWS

Perform release:

Wait for CRAN...

Tag release
Bump dev version

Template from r-lib/usethis#338

Release labelled 2.1.0

Prepare for release:

Submit to CRAN:

Wait for CRAN...

Accepted 🎉
usethis::use_github_release()
Remove file CRAN-RELEASE
usethis::use_dev_version()

Adding a to_character function

To convert labelled data to character

Add a strict argument to to_factor.data.frame

When converting a data.frame, a strict argument (checking if all values in the vector have a label) could be relevant to convert only those factors.
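As a rough illustration of what such a strict check could look like, here is a base R sketch. The helper name has_label_for_all_values is hypothetical and the vectors are built by hand; this is not how to_factor implements it.

```r
# Hypothetical helper: TRUE if every non-missing value has a value label
has_label_for_all_values <- function(x) {
  labels <- attr(x, "labels", exact = TRUE)
  values <- unique(unclass(x))
  values <- values[!is.na(values)]
  all(values %in% labels)
}

# Hand-built labelled-like vectors (assumption: labels live in "labels")
fully_labelled <- structure(c(1, 2, 1), labels = c(a = 1, b = 2))
partly_labelled <- structure(1:5, labels = c(a = 1, b = 2))

full_ok <- has_label_for_all_values(fully_labelled)
partial_ok <- has_label_for_all_values(partly_labelled)
```

With such a test, only columns for which it returns TRUE would be converted to factors.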

bind_rows(), list columns and var_labels

I need to bind_rows() two tibbles that contain labelled data and list columns. dplyr drops the labels with a warning about "Vectorizing labelled data". To circumvent this, I am trying to extract the lists of variable labels and value labels and re-apply them to the bound tibble. Unfortunately, this does not work for variable labels and list columns (see the reprex below).

What do you think about:

var_label() does not actually need to check whether x is atomic, does it? The test could be dropped IMHO.
A more general solution for binding rows of labelled data. The question is, of course, what to do if the value labels for the same variable differ between the two data frames. Perhaps this could be handled similarly to factor levels. This would be a dplyr issue anyway...

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(haven)
library(labelled)

d <- data_frame(
  x = labelled(1:5, c(a=1, b=5)),
  lc = as.list(1:5)
)
var_label(d$x) <- "This is x"

# Can't apply variable label to a list column
var_label(d$lc) <- "This is lc" # Why not actually?
#> Error: `x` should be atomic



# Extract value labels
vl <- val_labels(d)

# Bind rows and re-apply value labels
dd <- bind_rows(d, d, .id="copy") 
#> Warning in bind_rows_(x, .id): Vectorizing 'labelled' elements may not
#> preserve their attributes

#> Warning in bind_rows_(x, .id): Vectorizing 'labelled' elements may not
#> preserve their attributes
val_labels(dd) <- vl
dd$x # OK!
#> <Labelled integer>
#>  [1] 1 2 3 4 5 1 2 3 4 5
#> 
#> Labels:
#>  value label
#>      1     a
#>      5     b


# Can't extract variable labels
# because d$lc is not atomic
var_label(d)
#> Error: `x` should be atomic
# This can be done "manually" along the following lines
varlabs <- lapply(d, attr, "label")
var_label(dd) <- varlabs[1] # skip the list column
lapply(dd, attr, "label")
#> $copy
#> NULL
#> 
#> $x
#> [1] "This is x"
#> 
#> $lc
#> NULL

Functions to generate SPSS / Stata syntax files

update foreign_to_labelled to handle missing values

mean method for labelled vector

cf. strengejacke/sjmisc#13

Test if labelled works properly with haven 2.2.0

Release labelled 2.0.2

Prepare for release:

devtools::check_win_devel()
rhub::check_for_cran()
revdepcheck::revdep_check(num_workers = 4)
Polish NEWS

Perform release:

Wait for CRAN...

Tag release
Remove file CRAN-RELEASE
Bump dev version

Template from r-lib/usethis#338

Release labelled 2.3.0

Prepare for release:

Submit to CRAN:

Wait for CRAN...

Accepted 🎉
usethis::use_github_release()
Remove file CRAN-RELEASE
usethis::use_dev_version()

labels with no label

Check tidyverse/haven#108

Release labelled 2.0.0

Prepare for release:

haven 2.0.0 released on CRAN (required for the different tests)
devtools::check_win_devel()
rhub::check_for_cran()
revdepcheck::revdep_check(num_workers = 4)
Polish NEWS
If new failures, update email.yml then revdepcheck::revdep_email_maintainers()

Perform release:

Wait for CRAN...

Tag release
Bump dev version

Template from r-lib/usethis#338

as_factor applied to an integer vector

e.g.

as_factor(1:4, "prefixed")
[1]
Levels: prefixed

Trimming "format.*" attributes

Would it be interesting to have a function (or an option) to trim out the "format.*" (e.g. format.stata, etc...) attributes of the variables?
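A base R sketch of what such a helper might do is shown below. The name trim_format_attributes is hypothetical, and the example vector is built by hand rather than imported with haven.

```r
# Hypothetical helper: drop every attribute whose name starts with "format."
trim_format_attributes <- function(x) {
  attr_names <- names(attributes(x))
  for (a in grep("^format\\.", attr_names, value = TRUE)) {
    attr(x, a) <- NULL
  }
  x
}

# Hand-built vector carrying a variable label and a Stata display format
v <- structure(1:3, label = "Age", format.stata = "%8.0g")
trimmed <- trim_format_attributes(v)
```

For a whole data frame, the same helper could be applied column by column, for example with data[] <- lapply(data, trim_format_attributes).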

to_factor for data.frame doesn't recognise haven's new haven_labelled class

Seems to be caused by this line checking for the old class name.

labelled/R/to_factor.R

Line 144 in 358b2d9

if (inherits(x, "labelled"))

Maybe it should be a call to is.labelled() instead.

Here's a minimal example illustrating the unexpected behaviour:

> library(labelled)
[...]
> df <- data.frame(x=labelled(1:3, labels=c(a=1, b=2, c=3)))
> str(df)
'data.frame':	3 obs. of  1 variable:
 $ x: 'haven_labelled' int  1 2 3
  ..- attr(*, "labels")= Named num  1 2 3
  .. ..- attr(*, "names")= chr  "a" "b" "c"
> to_factor(df)  # Unexpected: makes no change to labelled column `x`
  x
1 1
2 2
3 3
> to_factor(df$x)  # Expected: changes levels to factor
[1] a b c
Levels: a b c
> as.data.frame(lapply(df, to_factor))  # Expected behaviour of calling to_factor() on a data.frame
  x
1 a
2 b
3 c

The following snippet shows that changing the line to call is.labelled() instead fixes the behaviour.

> # Patch suspect function with call to `is.labelled()` instead
> utils::assignInNamespace(
+   '.to_factor_col_data_frame',
+   function (x, levels = c("labels", "values", "prefixed"), ordered = FALSE, 
+     nolabel_to_na = FALSE, sort_levels = c("auto", "none", "labels", 
+       "values"), decreasing = FALSE, labelled_only = TRUE, 
+     drop_unused_labels = FALSE, strict = FALSE, ...) 
+   {
+     if (is.labelled(x)) # <-- Change is here
+       x <- to_factor(x, levels = levels, ordered = ordered, 
+         nolabel_to_na = nolabel_to_na, sort_levels = sort_levels, 
+         decreasing = decreasing, drop_unused_labels = drop_unused_labels, 
+         strict = strict, ...)
+     else if (!labelled_only) 
+       x <- to_factor(x)
+     x
+   },
+   'labelled'
+ )
> to_factor(df)  # Now follows expected behaviour
  x
1 a
2 b
3 c

Tested with package labelled version 2.0.1

How does `val_labels(v) <- value` work?

labels not mandatory for labelled_spss()

cf. tidyverse/haven#219

Other forms of attributes

The labelled and sjlabelled packages are especially useful when automating the production of tables, graphs, and other results from real data. However, I have a few ideas for in-house (possibly worthy of sharing with others) functions that are analogous to the labels, but instead specify whether the columns in a data set are dependent, mediator, or independent variables; and whether they are ordinal or nominal (if categorical/labelled). With some easy-to-use functions that keep these attributes when performing e.g. tidyverse-operations, it would be very easy to produce large amounts of graphs where some functions down the pipeline "understand" what should go on the x-axis, y-axis, caption, etc.
Sure, it is easy to add regular attributes with attr(df$my_var, "type_of_variable") <- "independent" , but the ecosystem of functions in this/these packages seem convenient for the same purpose. Though, I am not sure whether there should be a fixed attribute that is called "vartype", or just some generic functions for the user to define one's own attributes.

Bug with update_labelled

When applied to a data.frame, non-labelled variables are dropped!

What does `val_label(v, 2)` exactly do?

See the examples in val_label. Were you planning a feature that was changed afterwards?

Release labelled version 2.2.2

Prepare for release:

Submit to CRAN:

Wait for CRAN...

Accepted 🎉
usethis::use_github_release()
Remove file CRAN-RELEASE
usethis::use_dev_version()

plans for extending support of formats

Hi,

I recently started to use labelled more frequently and find it a bit difficult to switch between representation as labelled vector and factors. Internally, many functions require factors, but factors are not as flexible as labelled vectors. So it would be nice to define a new format class for converting between the two. I.e., have something like

x <- labelled(c(1,2,2), labels = c(x = 1, y = 2))
fmt <- format(x)
x_fct <- as_factor(x)
xx <- as_labelled(x_fct, fmt)

where xx == x holds. This would allow keeping the data as labelled vectors, with values as specified in the database, and switching back and forth between factors and labelled vectors as needed. Are there any plans in this direction, or would you accept a pull request?

Best,

Kevin

labelled CheatSheet

Develop a cheatsheet for labelled

cf. https://www.rstudio.com/resources/cheatsheets/how-to-contribute-a-cheatsheet/

Add an `unclass` option to `to_factor` to be used when `strict = TRUE`

In that case, if the labelled vector is not converted to a factor, it will be converted to a character or a numeric vector, not kept as a labelled vector

Non unique value labels?

Below d is a tibble read from an SPSS file with haven::read_spss(). I am getting:

print(d)
Error in `levels<-`(`*tmp*`, value = as.character(levels)) :
  factor level [9] is duplicated
> traceback()
19: factor(x, labs, ordered = ordered)
18: as_factor.haven_labelled(x, "labels")
17: as_factor(x, "labels")
16: lbl_pillar_info(x)
15: pillar_shaft.haven_labelled(X[[i]], ...)
14: FUN(X[[i]], ...)
13: lapply(.x, .f, ...)
12: map(x[pillar_shown], pillar_shaft)
11: colonnade_get_width(x, width, rowid_width)
10: pillar::squeeze(x$mcf, width = width)
9: format.trunc_mat(mat)
8: format(mat)
7: format.tbl(x, ..., n = n, width = width, n_extra = n_extra)
6: format(x, ..., n = n, width = width, n_extra = n_extra)
5: paste0(..., "\n")
4: cat(paste0(..., "\n"), sep = "")
3: cat_line(format(x, ..., n = n, width = width, n_extra = n_extra))
2: print.tbl(x)
1: (function (x, ...)
   UseMethod("print"))(x)

I suppose the print method turns the labelled variable into a factor for printing, assuming that value labels are unique. There is no such restriction in, say, SPSS, and people sometimes take advantage of that.

Value Label in Machine Readable Format (lookfor details = TRUE)

Currently levels, value_labels, na_values, and na_range are converted to a string e.g.: https://github.com/larmarange/labelled/blob/master/R/lookfor.R#L96

The current functionality is useful for viewing but less useful when the labels are needed for further processing (e.g. to display labels in a chart or graphic).

Could we add the option to use a machine-readable format like JSON, or to preserve the original vectors by storing them in a column of type <list>?

jsonlite::toJSON can be imported lazily using Suggests in the Description file, or no additional dependencies are needed if a flag is added to preserve the original vectors.
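To make the difference concrete, here is a base R sketch comparing a collapsed string column with a list column. The data frame and its column names are made up for illustration; this is not the structure returned by lookfor.

```r
# Made-up value labels for two variables
labels_x <- c(low = 1, high = 2)
labels_y <- c(no = 0, yes = 1)

overview <- data.frame(variable = c("x", "y"), stringsAsFactors = FALSE)

# List column: the original named vectors are preserved as they are
overview$value_labels <- list(labels_x, labels_y)

# String column: convenient for viewing, but needs parsing to be reused
overview$value_labels_chr <- vapply(
  overview$value_labels,
  function(l) paste0("[", l, "] ", names(l), collapse = "; "),
  character(1)
)
```

With the list column, overview$value_labels[[2]] returns the named vector for y directly, ready to be used in a chart.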

variable labels are not preserved with dplyr functions

If I set a variable label with set_variable_labels and later apply dplyr::filter, the variable label is removed. Here's a small example:

library(dplyr)
library(labelled)

df <- tibble(id = 1:2, can = factor(c('yes', 'no'))) %>% 
  set_variable_labels(can = 'Cannabis use')

#variable label is there
df$can

#variable label is not there
filter(df, id == 1)$can

I'm not sure whether this is a bug in dplyr or in the labelled package. It seems to have been introduced with dplyr version 0.8.
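Until the cause is settled, one possible workaround is to copy the variable labels back from the original data frame after subsetting. The base R sketch below assumes the labels are stored in the "label" attribute of each column; the helper name copy_variable_labels is hypothetical.

```r
# Hypothetical helper: copy "label" attributes between matching columns
copy_variable_labels <- function(from, to) {
  for (nm in intersect(names(from), names(to))) {
    attr(to[[nm]], "label") <- attr(from[[nm]], "label", exact = TRUE)
  }
  to
}

df <- data.frame(id = 1:2, can = factor(c("yes", "no")))
attr(df$can, "label") <- "Cannabis use"

# Base R row subsetting stands in here for dplyr::filter()
subset_df <- df[df$id == 1, ]
restored <- copy_variable_labels(df, subset_df)
```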

Different classes for lbl+num and lbl+chr?

@elinw suggested here ropensci/skimr#296 that it might be beneficial for different labelled classes to exist, for the different underlying types. This seems to make a lot of sense to me, because this would make it easier (possible?) to write appropriate summary, skim, print methods etc.
Do you agree or is it actually possible now too?

Release labelled 2.2.1

Prepare for release:

Submit to CRAN:

Wait for CRAN...

Accepted 🎉
usethis::use_github_release()
Remove file CRAN-RELEASE
usethis::use_dev_version()

as.data.frame method for labelled var

functions compatible with %>%

Following tidyverse/haven#185, some functions for variable and value labels that could be used with %>% operator.

Inverted names of contributors

Hey Joseph,

I think the first and family names are inverted here:

labelled/DESCRIPTION

Lines 10 to 11 in 5ae5354

 person("Bojanowski", "Michal", role = "ctb"), 

 person("Briatte", "François", role = "ctb")

Don't ask me why I'm noticing that now and here!

Hope you're good :)

Include lookfor in labelled

Hey @juba and @larmarange,

I have been working with labelled survey data lately. Every time I do so, I find Stata far superior to R when it comes to doing some of the most basic things that we need to do when exploring that kind of data…

Problem

Take variable labels, which are essential to get a grip of any new survey dataset. How is the user supposed to list them all? Variable labels being in the attributes, users might want to do this:

for each column:
  list variable name and label

Unless I am mistaken, this is not easily doable. The user might try this, which won't work:

attr(labelled_data, "label")
apply(labelled_data, 2, attr, "label")

What the user actually needs is:

vapply(labelled_data, attr, character(1), "label", exact = TRUE) # or...
sapply(labelled_data, attr, "label") # ... but non-strict and risky: might return partial matches

So, to get all variable labels in an easily searchable format like a data frame, the user needs, at the very least (and these examples do not even preclude partial matching):

data.frame(vapply(labelled_data, attr, character(1), "label"))
tibble::enframe(vapply(labelled_data, attr, character(1), "label"))

In all cases above, the user needs to be fairly familiar with R to get the labels. Furthermore, a single missing variable label will kill the function with a cryptic message:

Error in vapply(labelled_data, attr, character(1), "label") : 
  values must be length 1,
 but FUN(X[[4]]) result is length 0

Here, [[4]] is the column (variable) that has no variable label (NULL).

Solution

I wrote a short function to list and search variable labels.

It is named var_labels in the spirit of the labelled package by Joseph, from which I took some code to write the show_values argument, and it is similar to the lookfor function that I wrote for questionr many years ago (thanks for improving it, Julien!):

#' @param data a labelled data frame
#' @param show_values add a column showing labelled values
#' @param ... character string(s) to match in the variable names or labels
#' @param ignore.case whether to ignore case when matching variable names or labels
#' @return a tibble
var_labels <- function(data, show_values = FALSE, ..., ignore.case = TRUE) {
  
  require(magrittr) # can easily be removed if need be
  require(tibble)   # preferable in my view to returning a data.frame
  
  # variable labels -> tibble
  vars <- names(data)
  lbls <- tibble::tibble(variable = vars) %>% 
    tibble::add_column(
      label = vapply(vars, function(x) {
        # similar to labelled:::var_label.default
        x <- attr(data[[ x ]], "label", exact = TRUE)
        # handle missing variable labels
        ifelse(is.null(x), NA_character_, x)
      }, character(1))
    )
  
  # add labelled values
  if (show_values) {
    # similar to labelled:::val_labels.haven_labelled
    lbls <- tibble::add_column(
      lbls,
      values = vapply(vars, function(x) {
        x <- attr(data[[ x ]], "labels", exact = TRUE)
        # handle missing value labels
        if (is.null(x)) {
          NA_character_
        } else {
          x <- paste0("[", x, "] ", names(x))
          paste(x, collapse = " ")
        }
      }, character(1))
    )
  }
  
  # subset to matching rows (a more complex option would be to use `tidyselect`)
  find <- c(...)
  if (length(find)) {
    find <- paste(find, collapse = "|")
    find <- grepl(find, lbls$variable, ignore.case = ignore.case) |
      grepl(find, lbls$label, ignore.case = ignore.case)
    lbls[ find, ]
  } else {
    lbls
  }
  
}

(The vapply part cannot be written more efficiently due to the possibility of missing values. Using purrr::attr_getter does not solve the issue, as attr_getter simply wraps around attr.)

Example, using some labelled data included in questionr:

library(questionr)

data(fertility)
women$unlabelled_test_variable <- 1L

var_labels(women)
var_labels(women, show_values = TRUE)
var_labels(women, "weight", "child") # Stata equivalent: lookfor weight child
var_labels(women, "hiv", show_values = TRUE)

Now, I do not know where to submit that function: are any of you interested in including it in questionr or labelled?

I also submitted a simpler function to haven, and opened another issue to discuss its search support.

Add a set_variable_labels_all function to create labels from column names using a specified function

It would be very useful to have a function that automatically sets all data.frame labels as transformed versions of the column names. Similar to the janitor package's clean_names() function that creates and sets snakecase column names, I would like to be able to set all column labels to a readable version of the column names from within a pipe. (Usually I am transforming from snakecase to title case and replacing "_" with spaces).

I could see two approaches to this:

The more straightforward but less flexible approach would be to offer the user a limited set of pre-defined transformation options (e.g. title case, all caps, replace "_" with " ").
Allow the user to supply any function for the transformation. I'm not sure of the best way to do this, but perhaps it could employ some of the tools underlying rename_all() in dplyr: (https://github.com/tidyverse/dplyr/blob/master/R/funs.R)
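The second approach could look roughly like the base R sketch below. Both label_from_names and to_title are hypothetical names used for illustration, and the labels are stored in the "label" attribute as elsewhere in this package.

```r
# Hypothetical helper: label every column with a transformed column name
label_from_names <- function(data, .f = identity) {
  for (nm in names(data)) {
    attr(data[[nm]], "label") <- .f(nm)
  }
  data
}

# Example transformation: snake_case to Title Case
to_title <- function(nm) {
  words <- strsplit(gsub("_", " ", nm), " ")[[1]]
  paste0(toupper(substring(words, 1, 1)), substring(words, 2), collapse = " ")
}

df <- data.frame(birth_year = 1990L, home_town = "Lyon")
df <- label_from_names(df, to_title)
```

Because the helper takes the data frame as its first argument and returns it, it can be used inside a pipe.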

Use setattr for data.table

Support for the `Date` class

I'd like to include tagged missings (NAs) in the Date variable. But when I do the following

library(haven)
library(labelled)

x <- rep(c(1,2),5)
x[[4]] <- tagged_na('a')
y <- as.Date(x, origin = '1992-01-01')
class(y)
#[1] "Date"
labelled(y, c("NA"=tagged_na('a')))
#Error: 'y' must be a numeric or a character vector

Is this behavior by design, or is this a valid feature request? :-)

first, last and nth (dplyr) method for labelled vector

Release labelled 2.2.0

Prepare for release:

Submit to CRAN:

Wait for CRAN...

Accepted 🎉
usethis::use_github_release()
Remove file CRAN-RELEASE
usethis::use_dev_version()

Sort labels

Function sort_val_labels to sort labels according to value or according to label.
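The two orderings can be sketched in base R on a plain named vector of value labels. This only illustrates the idea and is not the implementation of sort_val_labels.

```r
# Value labels as a named vector: names are labels, elements are values
labels <- c(high = 3, low = 1, medium = 2)

# Sort according to the values
by_value <- labels[order(labels)]

# Sort according to the label text
by_label <- labels[order(names(labels))]
```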

Working with haven_labelled_spss - data loses “label” after setting NA values

Sorry to bother you. May I point you to an unanswered question on Stack Overflow? I am not sure whether I am making a mistake or whether this is unwanted behavior in the package, but haven_labelled_spss data seems to lose the label attribute after using any form of na_values. Many thanks!

https://stackoverflow.com/questions/56235144/working-with-haven-labelled-spss-data-looses-label-after-setting-na-values

set_variable_labels list problem

Hi!

If I provide a named list of values to set_variable_labels(), it does not work because the function wraps the list in another list. This makes it impossible to follow the efficient workflow of...

get variable labels via var_label(data)
manipulate data, lose the variable labels in the process
reapply the variable labels via set_variable_labels

The problem is the first line in set_variable_labels: values <- list(...)
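The wrapping can be reproduced with a tiny stand-in function. In this base R sketch, collect_dots is a hypothetical function that only mimics the values <- list(...) line; do.call() is shown as one way to splice a saved list into the arguments.

```r
# Stand-in for the first line of set_variable_labels
collect_dots <- function(...) list(...)

# Labels saved earlier, e.g. from var_label(data)
saved <- list(id = "Identifier", can = "Cannabis use")

# Passing the list directly nests it: a list of length 1 holding the list
nested <- collect_dots(saved)

# do.call() passes each element as its own named argument instead
spliced <- do.call(collect_dots, saved)
```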

Check matching type

This is a little annoying, especially because most haven functions do not complain. Maybe you could check the type in val_labels and cast it correctly if possible or throw an error if not?

x = 1L:5L
labelled::val_labels(x) <-  c("low" = 1)
haven::na_tag(x)

Error: x must be a double vector

Working with old data labels saved with labelled class

I was using remove_val_label to remove labels from some data saved some months ago under the labelled class, but since the val_label.labelled method was deleted, it no longer works. I do not know whether it would be a good option to include this method again, or whether there is another way to remove the labels.
