GithubHelp home page GithubHelp logo

Comments (11)

pdeffebach avatar pdeffebach commented on September 26, 2024 1

Okay I'm back.

Here's whats going on. The transformation

([:b, :a]) => ((bat, falcon)->(falcon, bat))

creates a tuple of vectors. Because a tuple of vectors is functionally a scalar in DataFrames.jl's transformation language (i.e. not a table or a vector), it gets "spread" across rows, the same way

julia> df = DataFrame(a = [1, 2]);

julia> transform(df, :a => (t -> 1) => :b)
2×2 DataFrame
 Row │ a      b     
     │ Int64  Int64 
─────┼──────────────
   1 │     1      1
   2 │     2      1

works.

So it's creating the data frame

3×1 DataFrame
 Row │ ##257
     │ Tuple…
─────┼────────────────────────
   1 │ ([3, 1, 2], [5, 4, 6])
   2 │ ([3, 1, 2], [5, 4, 6])
   3 │ ([3, 1, 2], [5, 4, 6])

and then sorting by the ##257 column. Which means the order doesn't change (or is undefined ?) after the sort happens.

I agree that it's very confusing that @orderby df :a, :b looks so similar to @orderby(df, :a, :b).

One potential solution is to disallow expressions that look like Expr(:tuple, ...) to avoid this problem... but I'm worried that could have unforseen side effects.

Now that you understand what DataFramesMeta.jl is doing behind the scenes, do you have more thoughts on how to approach this issue?

from dataframesmeta.jl.

bkamins avatar bkamins commented on September 26, 2024 1

So in essence:

@orderby df :a, :b

is the same as

@orderby df tuple(:a, :b)

due to how Julia parser works.

Right?

from dataframesmeta.jl.

pdeffebach avatar pdeffebach commented on September 26, 2024 1

Yes, exactly.

from dataframesmeta.jl.

jkrumbiegel avatar jkrumbiegel commented on September 26, 2024 1

I think the way forward is to disallow @rtransform :a, :b so users dont get confused.

I'm not sure that helps, as you can of course disallow a tuple, but you would also need to disallow macros that work on tuples. This is because @rtransform @byrow something, other_thing would resolve to one macro expression as an argument, and I'm not sure that you don't get into trouble if you also disallow tuples within a macro as well.

One solution would be to disallow multiple arguments except listed in a begin end block, but that feels overly restrictive.

The premise of this whole issue is that there is a silent "error", but of course it's not that, it's just confusing syntax. But given that Julia's macro parsing rules are fixed and you can't discriminate between a call with or without parentheses inside the macro, I think this is just not avoidable.

from dataframesmeta.jl.

pdeffebach avatar pdeffebach commented on September 26, 2024

Weird, thanks for the issue report. I would have expected the last one to sort the tuples lexicographically, but it does not seem to do that.

from dataframesmeta.jl.

pdeffebach avatar pdeffebach commented on September 26, 2024

oh wait, it is sorting the tuple lexicographically. But within this it's sorting the arrays lexicographically. This is "right" in the sense that it's behaving as Julia behaves, but it is unintuitive behavior.

MacroTools.@macroexpand(@orderby dd :a, :b) |> MacroTools.prettify
:((DataFramesMeta).orderby(dd, DataFramesMeta.make_source_concrete([:b, :a]) => (((bat, falcon)->(falcon, bat)) => Symbol("##257"))))

with

function orderby(x::SubDataFrame, @nospecialize(args...))
    t = DataFrames.select(x, args...)
    x[sortperm(t), :]
end

so we see

julia> select(dd, DataFramesMeta.make_source_concrete([:b, :a]) => (((bat, falcon)->(falcon, bat)) => Symbol("##257")))
3×1 DataFrame
 Row │ ##257
     │ Tuple…
─────┼────────────────────────
   1 │ ([3, 1, 2], [5, 4, 6])
   2 │ ([3, 1, 2], [5, 4, 6])
   3 │ ([3, 1, 2], [5, 4, 6])

Sorry, i'll finish writing this up tomorrow

from dataframesmeta.jl.

vjd avatar vjd commented on September 26, 2024

I'm not sure what the correct solution here would be. Maybe we document the use of tuple in such cases, which is essentially equivalent to writing the syntax with the parenthesis. So maybe just Advocate the use of one common syntax rather than two different ways of writing? Given a choice, to be consistent with dataframes.jl, it is probably easier to write the macros with parentheses and start deprecating the one without. It will be consistent also with the syntax that is used in R where most of the users are coming from

from dataframesmeta.jl.

bkamins avatar bkamins commented on September 26, 2024

it is probably easier to write the macros with parentheses and start deprecating the one without

This is what I do (i.e. I always use a function call style). But we cannot "deprecate it" - this is an intentional feature of Julia not of DataFramesMeta.jl.
Probably we should highlight to users that there are two styles allowed by Julia and explained them better.

from dataframesmeta.jl.

pdeffebach avatar pdeffebach commented on September 26, 2024

So maybe just Advocate the use of one common syntax rather than two different ways of writing?

Yes, this doesn't fully solve the problem, since @mymacro a, b is always going to be parsed as a tuple by default. Even if the recommended way is @mymacro(a, b).

it is probably easier to write the macros with parentheses and start deprecating the one without.

The reason we support the non-parentheses version is because of @passmissing and other macro flags. Ironically, for the same problems that we are writing out here.

@rorderby(df, @passmissing :y, :b)

will get parsed as

@rorderby(df, @passmissing(tuple(:y, :b))

the solution to this problem was to allow multiple lines so the macro-flags could get properly parsed

@rorderby df begin 
    @passmissing :y 
    :b 
end

It will be consistent also with the syntax that is used in R where most of the users are coming from

Yes but @rtransform df :a or inside a chain block is closer to Stata, which is also a market-share to appeal to. Plus, I dislike the requirement of parentheses in dplyr, because I think they discourage complicated transformations. I was actually hoping to get rid of all the parentheses in documentation to get people to use the begin ... end version.

I think the way forward is to disallow @rtransform :a, :b so users dont get confused.

from dataframesmeta.jl.

bkamins avatar bkamins commented on September 26, 2024

But given that Julia's macro parsing rules are fixed and you can't discriminate between a call with or without parentheses inside the macro

This was my point above. I think it is best to very clearly explain the options and their consequences in the manual. @pdeffebach - if you prefer I can submit a PR.

from dataframesmeta.jl.

pdeffebach avatar pdeffebach commented on September 26, 2024

Maybe we need a "Gotcha's" section of the docs? Could be a good opportunity to beef up the docs before 1.0.

from dataframesmeta.jl.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.