Comments (13)

tshort commented on May 11, 2024

I like the SubDataFrame idea. We could use nafilter and nareplace for that. Or maybe those should generate a new df, while nafilter_sub and nareplace_sub return SubDataFrames. complete_cases(df) could return the row indices of the complete cases. Maybe that's all we need. Then the user could do sub(df, complete_cases(df)) or df[complete_cases(df),:].
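The `complete_cases` idea can be sketched in present-day base Julia without any package. This assumes a table is a `NamedTuple` of equal-length column vectors, with `missing` standing in for NA. `complete_rows` and `select_rows` are hypothetical helper names made up for the example, not DataFrames.jl API. The mask-then-index pattern mirrors `df[complete_cases(df),:]`.

```julia
# Hypothetical helpers: a "table" here is a NamedTuple of equal-length column
# vectors, with `missing` standing in for NA. Not part of any package.

# Return a Bool mask that is true for rows with no missing value in any column.
function complete_rows(t::NamedTuple)
    n = length(first(t))
    mask = trues(n)
    for col in t
        mask .&= .!ismissing.(col)   # knock out rows where this column is missing
    end
    return mask
end

# Index every column with the same row selector, producing a new (copied) table.
select_rows(t::NamedTuple, rows) = map(col -> col[rows], t)

t = (a = [1, missing, 3, 4], b = ["x", "y", missing, "z"])
mask = complete_rows(t)          # true for rows 1 and 4 only
clean = select_rows(t, mask)     # a new table holding just those rows
```

Swapping `col[rows]` for `view(col, rows)` inside `select_rows` would return views instead of copies, which is closer to the SubDataFrame variant.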

from dataframes.jl.

ViralBShah commented on May 11, 2024

I often find myself wanting filter for DataFrames. I basically want to create a new DataFrame (but it could even be a SubDataFrame) by filtering rows of an existing DataFrame. Currently, I roll my own code, but a filter interface would be really handy.

johnmyleswhite commented on May 11, 2024

This is basically what subset does. We could rename it to filter.

-- John

On Jun 28, 2013, at 9:41 AM, "Viral B. Shah" wrote:

I often find myself wanting filter for DataFrames. I basically want to create a new DataFrame (but it could even be a SubDataFrame) by filtering rows of an existing DataFrame. Currently, I roll my own code, but a filter interface would be really handy.

ViralBShah commented on May 11, 2024

I wonder how I missed subset. I see now that it is not in the function reference. When I come across such things, should I just go ahead and add them to the existing function reference documentation? I am hoping that we will be able to convert it to the helpdb format and make it accessible through help soon.

It would be nice to rename it to filter, with the slight caveat that its behaviour differs from filter on matrices. However, it still seems like the right name for this operation.

johnmyleswhite commented on May 11, 2024

We need to have a big conversation about documentation formats next week.

tshort commented on May 11, 2024

I like the name subset better than filter.

Also, just plain row indexing gives you a copy of a subset of a DataFrame.

On Fri, Jun 28, 2013 at 2:46 PM, John Myles White wrote:

We need to have a big conversation about documentation formats next week.
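The point that plain row indexing gives a copy, while a SubDataFrame would be a view, has a direct analogue for ordinary arrays in current base Julia: indexing copies, and `view` returns a `SubArray` that writes through to its parent. This is an array-level sketch of the distinction, not DataFrames.jl code.

```julia
# A single column stands in for a table; the same applies column by column.
col = [10, 20, 30, 40]
rows = [2, 4]

copied = col[rows]        # indexing allocates a new array (a copy)
viewed = view(col, rows)  # a SubArray that refers back to `col`

copied[1] = -1            # changes only the copy; `col` is untouched
viewed[2] = -2            # writes through to col[4]
```

After these assignments `col` is `[10, 20, 30, -2]`: the view's write reached the parent, while the copy's did not.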
ViralBShah commented on May 11, 2024

Now that I know about subset, I am ok with that. Perhaps just mark this as a doc issue?

johnmyleswhite commented on May 11, 2024

We could fix the docs.

I do kind of like only having filter: one of the things I like Julia is the possibility that multiple dispatch can shrink the language's vocabulary to a very small number of basic abstractions that apply in all domains.
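As a small illustration of that point, this sketch adds a method to `Base.filter` for a hypothetical `SimpleTable` type (a `NamedTuple` of columns, invented here for the example). Once that method exists, the same verb works on tables and on plain vectors. It is written against current base Julia, not the DataFrames.jl API of the time.

```julia
# Hypothetical column-oriented table type, used only to illustrate dispatch.
struct SimpleTable
    cols::NamedTuple
end

nrows(t::SimpleTable) = length(first(t.cols))

# Row i as a NamedTuple, e.g. (a = 1, b = "w").
row(t::SimpleTable, i::Integer) = map(col -> col[i], t.cols)

# Extend Base.filter so the familiar verb also applies to tables:
# keep the rows for which the predicate returns true.
function Base.filter(pred, t::SimpleTable)
    keep = [pred(row(t, i)) for i in 1:nrows(t)]
    return SimpleTable(map(col -> col[keep], t.cols))
end

t = SimpleTable((a = [1, 2, 3, 4], b = ["w", "x", "y", "z"]))
evens = filter(r -> iseven(r.a), t)    # rows 2 and 4 of the table
odds = filter(isodd, [1, 2, 3, 4])     # the same verb on a plain Vector
```

No new name is introduced for users. Dispatch on the second argument selects the table method or Base's array method.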

StefanKarpinski commented on May 11, 2024

one of the things I like Julia is the possibility that multiple dispatch can shrink the language's vocabulary to a very small number of basic abstractions that apply in all domains.

THIS. We should work that into our manual / philosophy somewhere.

johnmyleswhite commented on May 11, 2024

Hopefully we can fix the typo before we do.

StefanKarpinski commented on May 11, 2024

I can't even spot the typo after reading it multiple times...

johnmyleswhite commented on May 11, 2024

"one of the things I like Julia" -> "one of the things I like ABOUT Julia"

sbromberger commented on May 11, 2024

I think this is related, but I can't figure out how to use filter (or sub) on a DataFrame/DataArray:

filter on a regular DataArray fails (though I don't know why this method doesn't work for it: filter(f::Function,As::AbstractArray{T,N}) at array.jl:1209):

julia> df[:net]
5-element DataArray{IPv4net,1}:
 IPv4net(ip"1.2.3.0",ip"255.255.255.0")
 IPv4net(ip"4.5.6.7",ip"255.255.255.0")
 IPv4net(ip"1.2.3.0",ip"255.255.0.0")
 IPv4net(ip"4.5.6.7",ip"255.255.0.0")
 IPv4net(ip"1.2.3.0",ip"255.0.0.0")

julia> filter(x->IPnetwork.contains(x,a), df[:net])
ERROR: type: typeassert: expected AbstractArray{Bool,N}, got DataArray{Any,1}
 in filter at array.jl:1209

Changing the DataArray to an Array works:

julia> z = [x for x in df[:net]]
5-element Array{Any,1}:
 IPv4net(ip"1.2.3.0",ip"255.255.255.0")
 IPv4net(ip"4.5.6.7",ip"255.255.255.0")
 IPv4net(ip"1.2.3.0",ip"255.255.0.0")
 IPv4net(ip"4.5.6.7",ip"255.255.0.0")
 IPv4net(ip"1.2.3.0",ip"255.0.0.0")

julia> filter(x->IPnetwork.contains(x,a), z)
3-element Array{Any,1}:
 IPv4net(ip"1.2.3.0",ip"255.255.255.0")
 IPv4net(ip"1.2.3.0",ip"255.255.0.0")
 IPv4net(ip"1.2.3.0",ip"255.0.0.0")

sub apparently doesn't accept functions:

julia> sub(x->IPnetwork.contains(x,a), df[:net])
ERROR: `sub` has no method matching sub(::Function, ::DataArray{IPv4net,1})

My recommendation would be to add a method to filter that accepts DataArrays.
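For comparison, in current base Julia `filter(f, A)` accepts any `AbstractArray`, so the practical snag is usually missing values rather than the container type. A minimal sketch of three workarounds, where integers and the test `x > 6` stand in for the IP networks and `IPnetwork.contains` (no package is needed to show the pattern):

```julia
# A column with a missing entry, standing in for a DataArray with an NA.
v = Union{Int, Missing}[5, missing, 12, 7, 20]

# Option 1: filter directly, treating missing as "do not keep".
# A predicate that returned `missing` would make filter raise an error,
# so check for it first.
big1 = filter(x -> !ismissing(x) && x > 6, v)

# Option 2: build a Bool mask explicitly, then index with it.
mask = [!ismissing(x) && x > 6 for x in v]
big2 = v[mask]

# Option 3: drop missing values first (like converting to a plain Array above),
# then filter the resulting Vector{Int}.
big3 = filter(>(6), collect(skipmissing(v)))
```

Options 1 and 2 keep the `Union{Int, Missing}` element type. Option 3 yields a plain `Vector{Int}` because the missing entries are removed before filtering.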
