Enum filtering has broken. (polars issue, open)

Andre-Medina commented on June 27, 2024
Enum filtering has broken.

Comments (4)

cmdlineluser commented on June 27, 2024

Do you happen to know what version this worked on?

Going backwards, I get a different error on 0.20.21:

TypeError: invalid literal value: 'State.VIC'

But it fails on every version I've tried.

orlp commented on June 27, 2024

I'm not even sure if this is actually a bug.

mcrumiller commented on June 27, 2024

We do not automatically convert Python Enum types to polars Enum Series; you can file a feature request for this. Note that the dtype of the column in your dataframe is simply a string:

>>> data
shape: (6, 1)
┌─────────────────┐
│ state           │
│ ---             │
│ str             │
╞═════════════════╡
│ victoria        │
│ victoria        │
│ victoria        │
│ new south wales │
│ new south wales │
│ new south wales │
└─────────────────┘

type(State.VIC) is <enum 'State'>, so polars is being asked to compare a string column against an arbitrary Python object, which it does not accept. State.VIC.value is a string, so the filter works when you pass that instead.

So tl;dr you are not actually performing enum filtering, but trying to filter a string column based on a python Enum object, which polars does not recognize.
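
To make the distinction concrete, here is a minimal stdlib-only sketch. It assumes a plain Enum with no str mixin, which matches the <enum 'State'> type mentioned above; the class body is reconstructed from the values shown in the dataframe and is an assumption, not necessarily the reporter's exact code.

```python
from enum import Enum


class State(Enum):  # assumed definition: plain Enum, no `str` mixin
    VIC = "victoria"
    NSW = "new south wales"


# The member itself is an Enum object, not a string.
member_is_str = isinstance(State.VIC, str)

# Comparing the member to a string falls back to identity, so it is False.
member_equals_str = State.VIC == "victoria"

# The underlying value is an ordinary string and compares as expected.
value_equals_str = State.VIC.value == "victoria"

print(member_is_str, member_equals_str, value_equals_str)
# prints: False False True
```

This is why passing State.VIC.value gives polars a plain string it can compare against a str column.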

Andre-Medina commented on June 27, 2024

Sorry guys, I just re-tested on previous versions and found I missed something key. The enum class was a string enum, not just an enum.

from enum import Enum
import polars as pl

class State(str, Enum):  # NOTE: missed the `str` before the enum
    
    VIC = "victoria"
    NSW = "new south wales"

data = pl.DataFrame({
    'state': [State.VIC] * 3 + [State.NSW] * 3  
})

print(data)

This prints (note that the enum members are correctly converted to strings):

shape: (6, 1)
┌─────────────────┐
│ state           │
│ ---             │
│ str             │
╞═════════════════╡
│ victoria        │
│ victoria        │
│ victoria        │
│ new south wales │
│ new south wales │
│ new south wales │
└─────────────────┘

Filtering on 0.20.31:

>>> print(data.filter(pl.col('state') == State.VIC)) # Does not filter
shape: (0, 1)
┌───────┐
│ state │
│ ---   │
│ str   │
╞═══════╡
└───────┘
>>> print(data.filter(pl.col('state') == State.VIC.value)) # Adding the .value, filters correctly
shape: (3, 1)
┌──────────┐
│ state    │
│ ---      │
│ str      │
╞══════════╡
│ victoria │
│ victoria │
│ victoria │
└──────────┘

However, back in 0.20.25:

>>> print(data.filter(pl.col('state') == State.VIC)) # With or without the .value, filters correctly 
shape: (3, 1)
┌──────────┐
│ state    │
│ ---      │
│ str      │
╞══════════╡
│ victoria │
│ victoria │
│ victoria │
└──────────┘

This string enum also behaves like a string in plain Python and in other libraries, so there's precedent for it to work in polars too:

assert State.VIC == "victoria"
my_dict = {}
my_dict[State.VIC] = 'foo'
print(my_dict['victoria']) # prints out 'foo'

# Or in pandas
import pandas as pd
data_pd = pd.DataFrame({'state': [State.VIC] * 3 + [State.NSW] * 3})
print(data_pd.loc[lambda df: df['state'] == State.VIC])
# Prints:
#        state
# 0  State.VIC
# 1  State.VIC
# 2  State.VIC
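
In the meantime, one possible workaround that does not depend on how a given polars version treats string enums is to unwrap members before comparing. This is only a sketch: `enum_value` is a hypothetical helper (not part of polars), and the list comprehension stands in for the dataframe so the snippet runs with the standard library alone. The polars call in the final comment is the `.value` form that the 0.20.31 output above shows working.

```python
from enum import Enum


class State(str, Enum):
    VIC = "victoria"
    NSW = "new south wales"


def enum_value(x):
    """Hypothetical helper: return x.value for Enum members, x unchanged otherwise."""
    return x.value if isinstance(x, Enum) else x


# Stand-in for the 'state' column: plain strings, as polars stores them.
states = ["victoria"] * 3 + ["new south wales"] * 3

# Compare against the unwrapped value rather than the Enum member.
matches = [s for s in states if s == enum_value(State.VIC)]
print(matches)

# With polars, the equivalent filter would be:
#   data.filter(pl.col('state') == enum_value(State.VIC))
```

Plain strings pass through the helper unchanged, so it can wrap any literal used in a filter without checking its type first.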
