
simonjayhawkins commented on April 26, 2024

FYI this is not only related to QList.

Thanks @bcmyguest1 for this example.

It does raise a warning, "FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison", so I guess this issue will resolve itself in a future version of pandas.

However, this example did work in pandas 1.3.5, so labelling as regression pending further investigation.

import pandas as pd

# case 1
df = pd.DataFrame({"d": [pd.NA]})
print(df.replace("", pd.NA))
# output in pandas 1.3.5:
#       d
# 0  <NA>

Note: the FutureWarning was also given in pandas 1.3.5, but I think that it should probably have been suppressed.

simonjayhawkins commented on April 26, 2024

Thanks @rainnnnny for the report.

Labelling as a constructor issue, as I believe the pd.Series constructor should convert/materialize array-likes passed as data into standard NumPy arrays for in-memory storage, rather than storing the QList object directly.

>>> res._mgr.array
QList([ True,  True, False, False])

As a workaround:

import numpy as np
import pandas as pd
from qpython.qcollection import qlist
pd.Series(
    data=np.asarray(qlist([True, True, False, False], qtype=1, adjust_dtype=False))
).replace(False, np.nan)

gives

0    True
1    True
2     NaN
3     NaN
dtype: object

If there is a reason you would want to retain the QList object in memory, you should investigate creating an ExtensionArray: https://pandas.pydata.org/pandas-docs/dev/development/extending.html#extensionarray

bcmyguest1 commented on April 26, 2024
import pandas as pd

# case 1
df = pd.DataFrame({"d": [pd.NA]})
df.replace("", pd.NA)  # throws the same error

# case 2
df = pd.DataFrame({"d": [None]})
df.replace("", pd.NA)  # works fine

FYI this is not only related to QList.

new_mask ends up being [[False]] in case 2, but False in case 1.

simonjayhawkins commented on April 26, 2024

However, this example did work in pandas 1.3.5, so labelling as regression pending further investigation.

first bad commit: [b2d54d9] BUG: IntegerArray/FloatingArray ufunc with 'out' kwd (#45122)

cc @jbrockmendel

jbrockmendel commented on April 26, 2024
  1. Agreed with Simon that a qlist shouldn't be held directly; it should either be converted or be made an EA (ExtensionArray).
  2. If it is made an EA, arr == x should return an array-like.
  3. We can change the check for ndarray to be specific to EA/BooleanArray.
    3b) This goes in the pile of bugs created indirectly by pd.NA workarounds.
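Point 3 could look roughly like the following. This is a minimal sketch, not the pandas implementation: normalize_mask is a hypothetical helper name, and it assumes that a scalar result from arr == x should simply be broadcast to the array's shape.

```python
import numpy as np
import pandas as pd


def normalize_mask(new_mask, shape):
    """Hypothetical helper: turn the result of `arr == x` into a plain bool ndarray."""
    if isinstance(new_mask, pd.api.extensions.ExtensionArray):
        # e.g. a BooleanArray from a masked comparison; treat <NA> as "no match"
        return new_mask.to_numpy(dtype=bool, na_value=False)
    if isinstance(new_mask, np.ndarray):
        # Already elementwise; drop any ndarray subclass and force bool dtype
        return np.asarray(new_mask, dtype=bool)
    # Scalar result (e.g. a Python bool from a non-elementwise __eq__):
    # broadcast it instead of calling array methods on it
    return np.full(shape, bool(new_mask), dtype=bool)


ints = pd.array([1, None, 3], dtype="Int64")
print(normalize_mask(ints == 1, ints.shape))        # [ True False False]
print(normalize_mask(np.array([1, 2]) == 2, (2,)))  # [False  True]
print(normalize_mask(False, (3,)))                  # [False False False]
```

Checking for ExtensionArray first keeps the pd.NA handling in one place, while the final branch covers the case where the comparison collapsed to a single bool.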

simonjayhawkins commented on April 26, 2024

moving to 1.4.4

CloseChoice commented on April 26, 2024

This goes wrong because qlist inherits from np.ndarray, so it passes all of our ndarray checks and is stored directly. We could perhaps adjust sanitize_array to handle qlists correctly, but we may not be able to avoid an explicit cast to np.ndarray.
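The subclass behaviour can be reproduced with plain NumPy. SubArray below is a hypothetical stand-in for QList (no qpython needed): it passes isinstance(..., np.ndarray) checks, np.asanyarray keeps the subclass, and only np.asarray performs the explicit cast to a base np.ndarray.

```python
import numpy as np


class SubArray(np.ndarray):
    """Hypothetical stand-in for an ndarray subclass such as qpython's QList."""


sub = np.array([True, True, False, False]).view(SubArray)

# The subclass satisfies ndarray isinstance checks, so it slips through them
print(isinstance(sub, np.ndarray))          # True

# asanyarray passes subclasses through unchanged...
print(type(np.asanyarray(sub)).__name__)    # SubArray

# ...while asarray makes the explicit cast to a base-class ndarray
plain = np.asarray(sub)
print(type(plain).__name__)                 # ndarray
print(plain.tolist())                       # [True, True, False, False]
```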

simonjayhawkins commented on April 26, 2024

3. we can change the check for ndarray to be specific to EA/BooleanArray
3b) this goes in the pile of bugs created indirectly by pd.NA-workarounds

@jbrockmendel I've opened PR #48313, but rather than making the check specific to EA/BooleanArray, it just adds a check for bool as well (it also suppresses the NumPy warnings while making changes here; see #47101 (comment)).

simonjayhawkins commented on April 26, 2024

removing milestone, xref #47485 (comment)

ancri commented on April 26, 2024

As an ugly workaround until this gets patched, this seemed to work for me, at pandas/core/missing.py:112

[image]

ChrisMLikesMath commented on April 26, 2024

I've run into this on pandas 2.0/2.1 as well, at line 118 of missing.py:

new_mask = arr == x
...
new_mask = new_mask.to_numpy(dtype=bool, na_value=False)

It definitely feels like a bug here. This is what appears to be happening for me:

  1. x is an integer.
  2. arr is a list of integers (in my case a QList).
  3. The comparison arr == x checks whether the whole list (or QList) equals the integer, which it does not.
  4. We expect an array but get a single bool, since arr was never an array in the first place.

Normally, np.ndarray overrides the == operator to compare elementwise, which is what is expected here.
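The difference between the two behaviours is easy to see with a plain Python list, which compares as a whole, versus a NumPy array, which compares element by element:

```python
import numpy as np

values = [1, 2, 3]

# A list compares as a whole: is the entire list equal to 2? -> one bool
print(values == 2)            # False

# An ndarray broadcasts == and compares element by element -> bool array
print(np.array(values) == 2)  # [False  True False]
```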

I think the hack above won't quite handle that elementwise comparison, although it at least won't throw an exception.

It would probably be best to check whether arr is in fact iterable and then either convert it to a NumPy array or run the comparison elementwise at the point where we call arr == x, since the fundamental assumption here is that arr compares against x elementwise and returns an array.
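One way to read that suggestion, as a sketch: check that arr is iterable, then convert it to a plain np.ndarray before comparing, so the result is elementwise regardless of how the original container implements ==. elementwise_eq is a hypothetical helper, not a pandas function; it assumes arr is an array-like of scalars.

```python
import numpy as np


def elementwise_eq(arr, x):
    """Hypothetical helper: compare `arr` to `x` element by element."""
    if not np.iterable(arr):
        raise TypeError(f"expected an array-like, got {type(arr).__name__}")
    # np.asarray turns lists, tuples and ndarray subclasses into a base
    # ndarray, so the comparison below uses ndarray's elementwise ==
    plain = np.asarray(arr)
    return plain == x


print(elementwise_eq([1, 2, 3], 2))                      # [False  True False]
print(elementwise_eq((True, True, False, False), False)) # [False False  True  True]
```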

ChrisMLikesMath commented on April 26, 2024

I think replacing arr == x with np.equal(arr, x) might do the trick with relative ease.

I think np.equal will check if either argument is an array/iterable and do that comparison elementwise.
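A quick way to check this: np.equal is a ufunc, so it converts list inputs to arrays and compares the underlying data instead of calling the container's Python-level __eq__. ScalarEqArray below is hypothetical; it mimics the described QList behaviour by returning a single bool from ==, and it is not the real qpython class.

```python
import numpy as np


class ScalarEqArray(np.ndarray):
    """Hypothetical ndarray subclass whose == compares the container as a whole."""

    def __eq__(self, other):
        # Return one bool ("is the whole array equal to other?") instead of a mask
        return bool(np.array_equal(np.asarray(self), other))


arr = np.array([1, 2, 3]).view(ScalarEqArray)

print(arr == 2)                         # False: a single bool, not a mask
print(np.equal(arr, 2).tolist())        # [False, True, False]
print(np.equal([1, 2, 3], 2).tolist())  # [False, True, False]
```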

Fundamentally, in my case, I feel the issue may also partly lie with qpython.qcollection.QList, which claims to be an np.ndarray but doesn't implement the comparison elementwise.
