Comments (13)
I like the SubDataFrame idea. We could use nafilter and nareplace for that. Or, maybe those should generate a new df, and nafilter_sub and nareplace_sub could return SubDataFrames. complete_cases(df) could return the row index of complete cases. Maybe that's all we need. Then, the user could do sub(df, complete_cases(df)) or df[complete_cases(df),:].
from dataframes.jl.
I often find myself wanting `filter` for DataFrames. I basically want to create a new DataFrame (but it could even be a SubDataFrame) by filtering rows of an existing DataFrame. Currently, I roll out my own code, but a `filter` interface would be really handy.
from dataframes.jl.
This is basically what subset does. We could rename it to filter.
-- John
On Jun 28, 2013, at 9:41 AM, "Viral B. Shah" wrote:
> I often find myself wanting filter for DataFrames. I basically want to create a new DataFrame (but it could even be a SubDataFrame) by filtering rows of an existing DataFrame. Currently, I roll out my own code, but a filter interface would be really handy.
from dataframes.jl.
I wonder how I missed `subset`. I now see that it is not in the function reference. When I come across such things, should I just go ahead and add to the existing function reference documentation? I am hoping that we will be able to convert it to the helpdb format and have it accessible with `help` soon.
It would be nice to rename it to `filter`, with the slight caveat that the behaviour is different from matrices. However, it still seems like it is the right name for this operation.
from dataframes.jl.
We need to have a big conversation about documentation formats next week.
from dataframes.jl.
I like the name subset better than filter.
Also, just plain row indexing gives you a copy of a subset of a DataFrame.
On Fri, Jun 28, 2013 at 2:46 PM, John Myles White wrote:
> We need to have a big conversation about documentation formats next week.
from dataframes.jl.
Now that I know about `subset`, I am ok with that. Perhaps just mark this as a doc issue?
from dataframes.jl.
We could fix the docs.
I do kind of like only having `filter`: one of the things I like Julia is the possibility that multiple dispatch can shrink the language's vocabulary to a very small number of basic abstractions that apply in all domains.
from dataframes.jl.
> one of the things I like Julia is the possibility that multiple dispatch can shrink the language's vocabulary to a very small number of basic abstractions that apply in all domains.
THIS. We should work that into our manual / philosophy somewhere.
from dataframes.jl.
Hopefully we can fix the typo before we do.
from dataframes.jl.
I can't even spot the typo after reading it multiple times...
from dataframes.jl.
"one of the things I like Julia" -> "one of the things I like ABOUT Julia"
from dataframes.jl.
I think this is related, but I can't figure out how to use `filter` (or `sub`) on a `DataFrame`/`DataArray`:
`filter` with a regular `DataArray` fails (though I don't know why this method doesn't work: `filter(f::Function,As::AbstractArray{T,N}) at array.jl:1209`)
julia> df[:net]
5-element DataArray{IPv4net,1}:
IPv4net(ip"1.2.3.0",ip"255.255.255.0")
IPv4net(ip"4.5.6.7",ip"255.255.255.0")
IPv4net(ip"1.2.3.0",ip"255.255.0.0")
IPv4net(ip"4.5.6.7",ip"255.255.0.0")
IPv4net(ip"1.2.3.0",ip"255.0.0.0")
julia> filter(x->IPnetwork.contains(x,a), df[:net])
ERROR: type: typeassert: expected AbstractArray{Bool,N}, got DataArray{Any,1}
in filter at array.jl:1209
Changing the `DataArray` to an `Array` works:
julia> z = [x for x in df[:net]]
5-element Array{Any,1}:
IPv4net(ip"1.2.3.0",ip"255.255.255.0")
IPv4net(ip"4.5.6.7",ip"255.255.255.0")
IPv4net(ip"1.2.3.0",ip"255.255.0.0")
IPv4net(ip"4.5.6.7",ip"255.255.0.0")
IPv4net(ip"1.2.3.0",ip"255.0.0.0")
julia> filter(x->IPnetwork.contains(x,a), z)
3-element Array{Any,1}:
IPv4net(ip"1.2.3.0",ip"255.255.255.0")
IPv4net(ip"1.2.3.0",ip"255.255.0.0")
IPv4net(ip"1.2.3.0",ip"255.0.0.0")
`sub` apparently doesn't like functions:
julia> sub(x->IPnetwork.contains(x,a), df[:net])
ERROR: `sub` has no method matching sub(::Function, ::DataArray{IPv4net,1})
My recommendation would be to add a method to `filter` that accepts `DataArray`s.
from dataframes.jl.