Comments (4)
Do you happen to know what version this worked on?
Going backwards, I get a different error on 0.20.21:
TypeError: invalid literal value: 'State.VIC'
But it fails on every version I've tried.
from polars.
I'm not even sure if this is actually a bug.
We do not automatically convert from python Enum types to polars Enum Series; you can put in a feature request for this. Note that the dtype of your dataframe is simply a string:
>>> data
shape: (6, 1)
┌─────────────────┐
│ state           │
│ ---             │
│ str             │
╞═════════════════╡
│ victoria        │
│ victoria        │
│ victoria        │
│ new south wales │
│ new south wales │
│ new south wales │
└─────────────────┘
type(State.VIC) is <enum 'State'>, so polars is trying to filter a string column based on an object, and doesn't like it. State.VIC.value is a string, and so the filter works.
So tl;dr you are not actually performing enum filtering, but trying to filter a string column based on a python Enum object, which polars does not recognize.
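To make the point above concrete, here is a minimal sketch in plain Python (no polars involved). It assumes a plain `Enum` without a `str` mix-in, which is what the explanation above takes the class to be. The member itself is not a string and does not compare equal to one, while its `.value` does.

```python
from enum import Enum


class State(Enum):  # assumption: plain Enum, no str mix-in
    VIC = "victoria"
    NSW = "new south wales"


member = State.VIC

# The member is an Enum object, not a string.
member_is_str = isinstance(member, str)             # False
member_equals_literal = member == "victoria"        # False

# The underlying value is an ordinary string.
value_is_str = isinstance(member.value, str)        # True
value_equals_literal = member.value == "victoria"   # True
```

This is why comparing a str column against `State.VIC.value` behaves like comparing against the literal "victoria".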
Sorry guys, I just re-tested on previous versions and found I missed something key. The enum class was a string enum, not just an enum.
from enum import Enum
import polars as pl
class State(str, Enum):  # NOTE: missed the `str` before the enum
    VIC = "victoria"
    NSW = "new south wales"

data = pl.DataFrame({
    'state': [State.VIC] * 3 + [State.NSW] * 3
})
print(data)
This prints (note that the enum is correctly converted to a string):
shape: (6, 1)
┌─────────────────┐
│ state           │
│ ---             │
│ str             │
╞═════════════════╡
│ victoria        │
│ victoria        │
│ victoria        │
│ new south wales │
│ new south wales │
│ new south wales │
└─────────────────┘
Filtering on 0.20.31:
>>> print(data.filter(pl.col('state') == State.VIC)) # Does not filter
shape: (0, 1)
┌───────┐
│ state │
│ --- │
│ str │
╞═══════╡
└───────┘
>>> print(data.filter(pl.col('state') == State.VIC.value)) # Adding the .value, filters correctly
shape: (3, 1)
┌──────────┐
│ state    │
│ ---      │
│ str      │
╞══════════╡
│ victoria │
│ victoria │
│ victoria │
└──────────┘
However, back in 0.20.25:
>>> print(data.filter(pl.col('state') == State.VIC)) # With or without the .value, filters correctly
shape: (3, 1)
┌──────────┐
│ state    │
│ ---      │
│ str      │
╞══════════╡
│ victoria │
│ victoria │
│ victoria │
└──────────┘
This string enum also works in other libraries, so there's precedent for it to also work in polars:
assert State.VIC == "victoria"
my_dict = {}
my_dict[State.VIC] = 'foo'
print(my_dict['victoria']) # prints out 'foo'
# Or in pandas
import pandas as pd
data_pd = pd.DataFrame({'state': [State.VIC] * 3 + [State.NSW] * 3})
print(data_pd.loc[lambda df: df['state'] == State.VIC])
# Prints:
# state
# 0 State.VIC
# 1 State.VIC
# 2 State.VIC
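Until the behaviour is settled, one possible workaround is to unwrap enum members to their plain values before comparing, which is the same thing the `.value` version of the filter does. The sketch below is plain Python only; `unwrap_enum` is a hypothetical helper name, not part of polars or the standard library.

```python
from enum import Enum
from typing import Any


class State(str, Enum):
    VIC = "victoria"
    NSW = "new south wales"


def unwrap_enum(value: Any) -> Any:
    """Return the underlying value of an Enum member; pass anything else through.

    Hypothetical helper for illustration only.
    """
    return value.value if isinstance(value, Enum) else value


rows = [State.VIC] * 3 + [State.NSW] * 3

# Unwrap both the data and the value being compared against.
plain_rows = [unwrap_enum(r) for r in rows]
needle = unwrap_enum(State.VIC)

# needle is now an ordinary str, so the comparison is str against str.
matches = [r for r in plain_rows if r == needle]
```

The unwrapped value can then be used in the filter in the same way as `State.VIC.value` in the 0.20.31 example above.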