
bespoke skim no longer working (skimr issue, open, 16 comments)

blueja5 commented on June 2, 2024
bespoke skim no longer working

from skimr.

Comments (16)

elinw commented on June 2, 2024

Okay, the other recent issue also seems to involve purrr.
Can you please let me know which versions of purrr and dplyr you are on?
It would be great if you could reproduce the problem in a simple example, using a data set like iris, in which the pipe stops at the step where the error occurs.

Is there any chance that you have haven-labelled data?

Also, can you please confirm that plain skim() works with your data?

@michaelquinn32

blueja5 commented on June 2, 2024

elinw commented on June 2, 2024

I have a feeling that we were on dangerous ground by relying on internal dplyr functions such as:
dplyr:::expand_across(dot)

across() has had an API change (passing extra arguments through ... was deprecated in dplyr 1.1.0) that now requires an anonymous function instead, and I've seen other people online complaining about it.

! .fns must be a function, a formula, or a list of functions/formulas.

\(x) mean(x, na.rm = TRUE)
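For comparison, a minimal sketch of the before/after pattern on iris, assuming dplyr 1.1.0 or later and R 4.1.0 or later (for both \(x) and |>). It illustrates the general across() change only, not skimr's internal code:

```r
library(dplyr)

# Deprecated in dplyr 1.1.0: extra arguments passed through across()'s `...`
# old <- iris |> summarise(across(where(is.numeric), mean, na.rm = TRUE))

# Preferred: put the extra arguments inside an anonymous function
new <- iris |>
  summarise(across(where(is.numeric), \(x) mean(x, na.rm = TRUE)))

# A purrr-style formula also works; `.x` belongs to the formula form,
# whereas \(x) names its own argument
formula_style <- iris |>
  summarise(across(where(is.numeric), ~ mean(.x, na.rm = TRUE)))

print(new)
```

Both forms give one row with one mean per numeric column.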

elinw commented on June 2, 2024

Hmm, your example works for me, and I am on R 4.2.2. The native shortcut for anonymous functions, \(x), was introduced in R 4.1.0, so you should have it. But I wonder if there was a bug fix of some kind since 4.1.3. I'm going to look at the changelog.

blueja5 commented on June 2, 2024

elinw commented on June 2, 2024

Okay, comparing your two bespoke skims, which both run without error for me, I notice a few issues.

First, in the one that doesn't work you have range = NULL, but range returns two values (the minimum and the maximum), so it can't be used in skimr without modifying it to pick one or the other. That is why it is not in the default list of numeric skimmers. What happens if you take out just range = NULL?

Second, I notice that in the custom skim that doesn't work you have
base = sfl(complete = n_complete), numeric = sfl( ...
but in the one that does work you have
numeric = sfl(complete = n_complete ...

Could you try changing those one at a time and see if one or both of them fixes the issue?

elinw commented on June 2, 2024

Yes, I can confirm that, for me, including range causes the error. I think we should come up with a more graceful error message.
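A minimal sketch of reducing range to a single value so that it can live in sfl(), assuming a current skimr that accepts lambda formulas in sfl(). The statistic name range_width is hypothetical; note that the default p0 and p100 skimmers already report the minimum and maximum:

```r
library(skimr)

x <- iris$Sepal.Length
length(range(x))  # 2: range() returns the minimum and the maximum

# Hypothetical statistic: collapse range() to one value (max - min)
my_skim <- skim_with(
  numeric = sfl(range_width = ~ diff(range(., na.rm = TRUE)))
)
res <- my_skim(iris)
print(res)
```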

elinw commented on June 2, 2024

See this post: https://www.tidyverse.org/blog/2023/02/dplyr-1-1-0-pick-reframe-arrange/

There was 1 warning in dplyr::summarize().
ℹ In argument: skimmed = purrr::map2(...).
ℹ In group 2: skim_type = "numeric".
Caused by warning:
! Returning more (or less) than 1 row per summarise() group was deprecated in dplyr 1.1.0.
ℹ Please use reframe() instead.
ℹ When switching from summarise() to reframe(), remember that reframe() always returns an ungrouped data frame and adjust accordingly.

So we would need to find a way to identify functions that return multiple values.
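A minimal sketch of one way to do that, assuming it is enough to call each candidate statistic on a sample numeric column and check the length of the result. returns_scalar() is a hypothetical helper, not part of skimr or dplyr, and a single sample column cannot rule out functions whose output length depends on the input:

```r
# Hypothetical helper (not part of skimr or dplyr): report which candidate
# statistics return exactly one value for a sample column.
returns_scalar <- function(fns, x) {
  vapply(fns, function(f) length(f(x)) == 1L, logical(1))
}

checks <- returns_scalar(
  list(mean = mean, sd = sd, mad = mad, range = range),
  iris$Sepal.Length
)
print(checks)  # mean, sd and mad are TRUE; range is FALSE
```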

elinw commented on June 2, 2024

@michaelquinn32 Since this is actually working as documented (only functions that return a value of length 1 are allowed), what about catching the error and raising our own error message?
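A minimal sketch of that idea, assuming a wrapper around the skimmer is acceptable. skim_with_friendly_errors() and its message text are hypothetical, not skimr's API. It only intercepts errors, so the dplyr 1.1.0 deprecation warning quoted above would still pass through unchanged:

```r
# Hypothetical wrapper, not part of skimr: run a skimmer and, if it fails,
# re-raise the error with a hint about the length-1 requirement.
skim_with_friendly_errors <- function(skimmer, data) {
  tryCatch(
    skimmer(data),
    error = function(e) {
      stop(
        "Custom skimmer failed. Every statistic in sfl() must return a single ",
        "value; for example, use p0 and p100 instead of range.\n",
        "Original error: ", conditionMessage(e),
        call. = FALSE
      )
    }
  )
}

# Usage with skimr (assumed installed):
# my_skim <- skimr::skim_with(numeric = skimr::sfl(mad = mad))
# skim_with_friendly_errors(my_skim, iris)
```

Keeping the original message in the new one means the underlying dplyr or purrr error is not lost.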

blueja5 commented on June 2, 2024

elinw commented on June 2, 2024

Okay, so I still think there is the issue that the tidyverse made an important change to how statistics that return multiple values are handled. For me, range "works" in the sense that I get multiple nearly identical rows that differ only in having two different values of range. But I'm not sure why it doesn't error given the tidyverse change (I did get the error once but can't reproduce it).

So overall, it really looks like something is going wrong with the NULLs. Can you try not setting NULL for statistics that are not part of the default set (mad, empty and range are not among the default numeric skimmers)?

For reference, these are the default numeric skimmers:

mean sd p0 p25 p50 p75 p100 hist

I don't think you should be setting anything to NULL besides these.
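A minimal sketch of checking a planned skimmer against that list before building it. default_numeric is hard-coded from the list above (it may differ across skimr versions), and requested_nulls is a hypothetical example:

```r
# Default numeric skimmers as listed above (hard-coded; may vary by skimr version)
default_numeric <- c("mean", "sd", "p0", "p25", "p50", "p75", "p100", "hist")

# Hypothetical set of statistics you plan to set to NULL in sfl()
requested_nulls <- c("hist", "mad", "empty", "range")

# Statistics that are not defaults, so setting them to NULL removes nothing
not_default <- setdiff(requested_nulls, default_numeric)
print(not_default)  # "mad" "empty" "range"
```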
I am also concerned about changing much of anything in the base sfl, since the columns it defines are used for duck typing skim objects.

I'm assuming that using skim() unmodified works, correct? And also skim_without_charts()?

What I think would help identify why you are getting this error is to start from scratch: create a skimmer by making one modification at a time and see whether a specific one throws the error.

If none of them causes it by itself, the next question is which combination is the trigger.
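A minimal sketch of that one-at-a-time, then pairwise, search on iris, assuming skimr is installed. The entries in candidates are hypothetical examples to replace with your own modifications; each spec is turned into an sfl() call with do.call():

```r
library(skimr)

# Hypothetical candidate modifications, each written as a list of sfl() arguments
candidates <- list(
  no_hist = list(hist = NULL),
  no_sd   = list(sd = NULL),
  add_mad = list(mad = mad),
  add_iqr = list(iqr = IQR)
)

# Build a numeric skimmer from one set of sfl() arguments, run it on iris,
# and return "ok" or the error message (warnings are not intercepted)
try_spec <- function(spec) {
  tryCatch({
    skimmer <- skim_with(numeric = do.call(sfl, spec))
    skimmer(iris)
    "ok"
  }, error = conditionMessage)
}

# Step 1: one modification at a time
single <- vapply(candidates, try_spec, character(1))
print(single)

# Step 2: if nothing fails alone, try every pair of modifications;
# c() on lists keeps NULL entries such as hist = NULL
pairs <- combn(names(candidates), 2, simplify = FALSE)
paired <- vapply(
  pairs,
  function(p) try_spec(do.call(c, unname(candidates[p]))),
  character(1)
)
names(paired) <- vapply(pairs, paste, character(1), collapse = " + ")
print(paired)
```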

blueja5 commented on June 2, 2024

blueja5 commented on June 2, 2024

elinw commented on June 2, 2024

What would be great is if you could make a minimal reproducible example, meaning the smallest, simplest code possible: no functions from other packages and no purrr partials. That will really help us isolate the problem.
So, starting with NULL and numeric:

my_skim <- skim_with(numeric = sfl(mean = NULL))
my_skim(iris)

Keep changing which statistic you set to NULL until you have been through them all or you get an error.

Then, if you don't trigger the error, start adding some base functions such as mad(), but only ones that return a single value (not range).

my_skim <- skim_with(numeric = sfl(mad = mad))
my_skim(iris)

blueja5 commented on June 2, 2024

elinw commented on June 2, 2024

Okay, it's strange, but I'll keep trying to reproduce it.
