Comments (6)

tshort commented on May 13, 2024

You can do the following:

@byrow! df begin
    :now_works = mean(Array(_I_(vars)[row,:]))
end

That's clumsy, not obvious, and it's likely to be inefficient. So, there are opportunities to make escaping work better here.

from dataframesmeta.jl.

pdeffebach commented on May 13, 2024

Looking at the implementation of @byrow!, I would have expected it to be a wrapper for eachrow(df). Instead, it's vectorized replace operations. Is there a reason for this? I think I read somewhere that DataFrameRows aren't that performant.


tshort commented on May 13, 2024

@byrow! produces a for loop over the rows of the DataFrame. You can see this using macroexpand as follows (I stripped the comments in the output):

julia> macroexpand(:(@byrow! df begin
           :works = mean([:x1, :x2, :x3])
       end))

quote  
    _N = length(df[1]) 
    _DF = begin 
            (DataFramesMeta.transform)(df)
        end 
    begin  
        function ##666(##669, ##668, ##667, ##670) 
            for row = 1:_N
                begin  
                    ##667[row] = mean([##668[row], ##669[row], ##670[row]])
                end
            end
        end 
        ##666(_DF[:x2], _DF[:x1], _DF[:works], _DF[:x3])
    end 
    _DF
end

I don't think it's possible to make an efficient mean operation on columns specified by a variable vars. With the current DataFrames/DataFramesMeta design, you can only have type-stable operations if you specify the columns ahead of time.
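The difference between columns fixed ahead of time and columns chosen through a variable can be sketched without DataFrames at all. The snippet below is only an illustration under stated assumptions: `cols` is a hypothetical stand-in for a DataFrame, and `rowmean_fixed!` and `rowmean_dynamic!` are made-up names, not part of DataFramesMeta. The first function mirrors the shape of the expanded macro above, where the columns are passed as arguments to an inner function.

```julia
using Statistics  # provides mean

# Hypothetical stand-in for a DataFrame: column names mapped to vectors.
# The value type is abstract, so the compiler cannot know the element type
# of a column that is looked up by name.
cols = Dict{Symbol,AbstractVector}(
    :x1 => [1.0, 2.0, 3.0],
    :x2 => [4.0, 5.0, 6.0],
    :x3 => [7.0, 8.0, 9.0],
)

# Columns fixed ahead of time: the kernel receives concretely typed vectors
# as arguments, so the loop body can be specialized on their types.
function rowmean_fixed!(out, a, b, c)
    for row in eachindex(out)
        out[row] = mean((a[row], b[row], c[row]))
    end
    return out
end

# Columns chosen at run time: every cols[v][row] is a dynamically typed
# lookup, and a temporary vector is allocated for each row.
function rowmean_dynamic!(out, cols, vars)
    for row in eachindex(out)
        out[row] = mean([cols[v][row] for v in vars])
    end
    return out
end

fixed = rowmean_fixed!(zeros(3), cols[:x1], cols[:x2], cols[:x3])
dynamic = rowmean_dynamic!(zeros(3), cols, [:x1, :x2, :x3])
```

Both calls return the same row means; only the first gives the compiler concrete column types to work with inside the loop.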


pdeffebach commented on May 13, 2024

I hadn't used macroexpand! That is a very elegant solution.

> I don't think it's possible to make an efficient mean operation on columns specified by a variable vars.

Is this just for row-wise operations? Or are you referring more generally to the use of _I_(vars) ^[foot] in any @with macro?

I am kind of playing with a DataFrameRow to behave more like a vector, and can see that type stability is an issue for that as well.

Generic row-wise operations on vectors are something I use a lot in Stata, and they are an important part of my transition away from Stata, so I would be interested in putting more effort into getting this working.

foot: Which we should rename to cols(), imo, because it's prettier.


nalimilan commented on May 13, 2024

I think we could get good performance by making eachrow return a special iterator type parameterized on the types of the columns. It would be guaranteed to yield TypedDataFrameRow{C} rows. That would be interesting to experiment with. One issue is that it wouldn't make things faster if you just call eachrow inside a function, so I'm not sure it makes sense to change the behavior of eachrow in DataFrames. Maybe it would if the compiler became smart enough to specialize loops automatically.
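As a rough illustration of the idea of an iterator parameterized on the column types, here is a minimal sketch in plain Julia. `TypedRows` is a hypothetical name invented for this example and is not a DataFrames type; the columns are assumed to be held in a NamedTuple of vectors so that their types are part of the iterator's type.

```julia
# Hypothetical type: C is the concrete NamedTuple type of the columns,
# so the element type of every column is known from the iterator's type.
struct TypedRows{C<:NamedTuple}
    columns::C
end

# Number of rows, taken from the first column.
Base.length(r::TypedRows) = length(first(r.columns))

# Each iteration yields a NamedTuple row with one field per column.
function Base.iterate(r::TypedRows, i::Int = 1)
    i > length(r) && return nothing
    row = map(col -> col[i], r.columns)
    return row, i + 1
end

rows = TypedRows((x1 = [1.0, 2.0], x2 = [3, 4]))
total = sum(row.x1 + row.x2 for row in rows)
```

Building such an iterator from a container whose column types are not known at compile time would still be a dynamic step, so the loop would presumably have to sit behind a function call that takes the iterator as an argument, which is the limitation mentioned above.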

Regarding calling mean on DataFrameRow, JuliaData/DataFrames.jl#1449 deprecates the current behavior which prevents this, so that we can implement it later.


pdeffebach commented on May 13, 2024

Closed in favor of #229
