Comments (11)
Okay I'm back.
Here's what's going on. The transformation
([:b, :a]) => ((bat, falcon)->(falcon, bat))
creates a tuple of vectors. Because a tuple of vectors is functionally a scalar in DataFrames.jl's transformation language (i.e. not a table or a vector), it gets "spread" across rows, the same way
julia> df = DataFrame(a = [1, 2]);
julia> transform(df, :a => (t -> 1) => :b)
2×2 DataFrame
 Row │ a      b
     │ Int64  Int64
─────┼──────────────
   1 │     1      1
   2 │     2      1
works.
So it's creating the data frame
3×1 DataFrame
 Row │ ##257
     │ Tuple…
─────┼────────────────────────
   1 │ ([3, 1, 2], [5, 4, 6])
   2 │ ([3, 1, 2], [5, 4, 6])
   3 │ ([3, 1, 2], [5, 4, 6])
and then sorting by the ##257
column. Every row holds the same value, which means the order doesn't change (or is undefined?) after the sort happens.
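To make that concrete, here is a minimal sketch using plain Base vectors. It assumes the comparison semantics carry over; DataFramesMeta.jl calls sortperm on a data frame, which this does not reproduce. The names key, rows and perm are made up for illustration.

```julia
# Every row holds the same tuple of vectors, mirroring the ##257 column
key = ([3, 1, 2], [5, 4, 6])
rows = fill(key, 3)

# Tuples compare lexicographically, and so do the vectors inside them
lexicographic = isless(([1, 2], [9]), ([1, 3], [0]))
@show lexicographic

# All keys are equal, so Base's sortperm keeps the indices in ascending order
perm = sortperm(rows)
@show perm
```

With identical keys there is nothing to reorder, so the rows come back in the order they went in.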
I agree that it's very confusing that @orderby df :a, :b
looks so similar to @orderby(df, :a, :b).
One potential solution is to disallow expressions that look like Expr(:tuple, ...)
to avoid this problem... but I'm worried that could have unforeseen side effects.
Now that you understand what DataFramesMeta.jl is doing behind the scenes, do you have more thoughts on how to approach this issue?
from dataframesmeta.jl.
So in essence:
@orderby df :a, :b
is the same as
@orderby df tuple(:a, :b)
due to how the Julia parser works.
Right?
from dataframesmeta.jl.
Yes, exactly.
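For anyone who wants to check this themselves, a small sketch with Meta.parse shows the difference without loading DataFramesMeta.jl at all (parsing does not need @orderby to be defined). The helper macro_args is a made-up name for illustration. Note that the parser produces a tuple expression, Expr(:tuple, ...), rather than a literal call to tuple, but the effect is the same.

```julia
# Strip the LineNumberNode the parser inserts and drop the macro name,
# leaving only the arguments the macro would receive
macro_args(ex::Expr) = filter(a -> !(a isa LineNumberNode), ex.args)[2:end]

# Parsing only; @orderby does not need to exist for Meta.parse to work
space_args = macro_args(Meta.parse("@orderby df :a, :b"))
paren_args = macro_args(Meta.parse("@orderby(df, :a, :b)"))

@show length(space_args)   # argument count in the space-separated form
@show space_args[2]        # the `:a, :b` part arrives as one expression
@show length(paren_args)   # argument count in the parenthesized form
```

The space-separated form hands the macro df plus one tuple expression, while the parenthesized form hands it df, :a and :b separately.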
from dataframesmeta.jl.
I think the way forward is to disallow @rtransform :a, :b so users don't get confused.
I'm not sure that helps, as you can of course disallow a tuple, but you would also need to disallow macros that work on tuples. This is because @rtransform @byrow something, other_thing
would resolve to one macro expression as an argument, and I'm not sure that you don't get into trouble if you also disallow tuples within a macro as well.
One solution would be to disallow multiple arguments except when listed in a begin ... end
block, but that feels overly restrictive.
The premise of this whole issue is that there is a silent "error", but of course it's not that, it's just confusing syntax. But given that Julia's macro parsing rules are fixed and you can't discriminate between a call with or without parentheses inside the macro, I think this is just not avoidable.
from dataframesmeta.jl.
Weird, thanks for the issue report. I would have expected the last one to sort the tuples lexicographically, but it does not seem to do that.
from dataframesmeta.jl.
Oh wait, it is sorting the tuple lexicographically. But within this it's sorting the arrays lexicographically. This is "right" in the sense that it's behaving as Julia behaves, but it is unintuitive behavior.
MacroTools.@macroexpand(@orderby dd :a, :b) |> MacroTools.prettify
:((DataFramesMeta).orderby(dd, DataFramesMeta.make_source_concrete([:b, :a]) => (((bat, falcon)->(falcon, bat)) => Symbol("##257"))))
with
function orderby(x::SubDataFrame, @nospecialize(args...))
    t = DataFrames.select(x, args...)
    x[sortperm(t), :]
end
so we see
julia> select(dd, DataFramesMeta.make_source_concrete([:b, :a]) => (((bat, falcon)->(falcon, bat)) => Symbol("##257")))
3×1 DataFrame
 Row │ ##257
     │ Tuple…
─────┼────────────────────────
   1 │ ([3, 1, 2], [5, 4, 6])
   2 │ ([3, 1, 2], [5, 4, 6])
   3 │ ([3, 1, 2], [5, 4, 6])
Sorry, I'll finish writing this up tomorrow.
from dataframesmeta.jl.
I'm not sure what the correct solution here would be. Maybe we document the use of tuple in such cases, which is essentially equivalent to writing the syntax with the parentheses. So maybe just advocate the use of one common syntax rather than two different ways of writing? Given a choice, to be consistent with DataFrames.jl, it is probably easier to write the macros with parentheses and start deprecating the one without. It will be consistent also with the syntax that is used in R where most of the users are coming from.
from dataframesmeta.jl.
it is probably easier to write the macros with parentheses and start deprecating the one without
This is what I do (i.e. I always use a function call style). But we cannot "deprecate it" - this is an intentional feature of Julia not of DataFramesMeta.jl.
Probably we should highlight to users that there are two styles allowed by Julia and explain them better.
from dataframesmeta.jl.
So maybe just advocate the use of one common syntax rather than two different ways of writing?
Yes, this doesn't fully solve the problem, since @mymacro a, b
is always going to be parsed as a tuple by default, even if the recommended way is @mymacro(a, b).
it is probably easier to write the macros with parentheses and start deprecating the one without.
The reason we support the non-parentheses version is because of @passmissing
and other macro flags. Ironically, that was to work around the same parsing problem we are discussing here.
@rorderby(df, @passmissing :y, :b)
will get parsed as
@rorderby(df, @passmissing(tuple(:y, :b)))
The solution to this problem was to allow multiple lines so the macro flags could get properly parsed:
@rorderby df begin
    @passmissing :y
    :b
end
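The two forms can be compared at the parser level with a rough sketch like the one below. It only inspects what Meta.parse returns and does not call DataFramesMeta.jl, so neither macro needs to be defined; strip_lines is a made-up helper name.

```julia
# Drop LineNumberNodes so only the meaningful sub-expressions remain
strip_lines(ex::Expr) = filter(a -> !(a isa LineNumberNode), ex.args)

# One-line form: inspect what the inner @passmissing call receives
one_line = Meta.parse("@rorderby(df, @passmissing :y, :b)")
inner = strip_lines(one_line)[end]        # the @passmissing macro call
inner_args = strip_lines(inner)[2:end]    # its arguments, without the macro name

# Block form: each line inside begin ... end is parsed as its own statement
block_form = Meta.parse("@rorderby df begin\n    @passmissing :y\n    :b\nend")
statements = strip_lines(strip_lines(block_form)[end])

@show inner_args
@show statements
```

In the one-line form @passmissing ends up holding both columns as one tuple, whereas in the block form it only wraps :y and :b stands on its own.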
It will be consistent also with the syntax that is used in R where most of the users are coming from
Yes, but @rtransform df :a (on its own or inside a chain block) is closer to Stata, whose users are also an audience worth appealing to. Plus, I dislike the requirement of parentheses in dplyr, because I think they discourage complicated transformations. I was actually hoping to get rid of all the parentheses in the documentation to get people to use the begin ... end version.
I think the way forward is to disallow @rtransform :a, :b
so users don't get confused.
from dataframesmeta.jl.
But given that Julia's macro parsing rules are fixed and you can't discriminate between a call with or without parentheses inside the macro
This was my point above. I think it is best to very clearly explain the options and their consequences in the manual. @pdeffebach - if you prefer I can submit a PR.
from dataframesmeta.jl.
Maybe we need a "Gotchas" section of the docs? Could be a good opportunity to beef up the docs before 1.0.
from dataframesmeta.jl.