Comments (12)
> FYI this is not only related to QList.
Thanks @bcmyguest1 for this example.
It does raise a warning, "FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison", so I guess this issue will resolve itself in a future pandas.
However, this example did work in pandas 1.3.5, so labelling as regression pending further investigation.
import pandas as pd
# case 1
df = pd.DataFrame({"d": [pd.NA]})
print(df.replace("", pd.NA))
      d
0  <NA>
Note: the FutureWarning was also given in pandas 1.3.5, but I think that it should probably have been suppressed.
from pandas.
Thanks @rainnnnny for the report.
Labelling as a constructor issue, as I believe that the pd.Series constructor should be converting/materializing array-likes passed as data into standard numpy arrays for the in-memory storage and not storing the QList object directly.
>>> res._mgr.array
QList([ True, True, False, False])
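As a workaround, casting to a plain numpy array first

import numpy as np
import pandas as pd
from qpython.qcollection import qlist

# np.asarray drops the QList subclass, so pandas stores a plain ndarray
pd.Series(
    data=np.asarray(qlist([True, True, False, False], qtype=1, adjust_dtype=False))
).replace(False, np.nan)

gives

0    True
1    True
2     NaN
3     NaN
dtype: object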
If there is a reason that you would want to retain the QList object in memory, you should investigate creating an ExtensionArray: https://pandas.pydata.org/pandas-docs/dev/development/extending.html#extensionarray
from pandas.
import pandas as pd
# case 1
df = pd.DataFrame({"d":[pd.NA]})
df.replace('',pd.NA) # throws the same error
# case 2
df = pd.DataFrame({"d":[None]})
df.replace('',pd.NA) # works fine
FYI this is not only related to QList.
new_mask ends up being [[False]] in case 2, but False in case 1.
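One possible reading of why the two cases differ, sketched under the assumption that the mask is built from a plain == comparison on each element: comparing pd.NA with a string propagates pd.NA, which cannot be coerced to a bool, whereas comparing None with a string simply gives False. The variable names below are illustrative only and are not taken from pandas internals.

```python
import pandas as pd

# Case 1: pd.NA propagates through comparisons instead of answering True/False.
na_result = pd.NA == ""

# Case 2: None compares like any other Python object.
none_result = None == ""

# A propagated pd.NA cannot be turned into a bool, which is plausibly why
# an elementwise comparison over an object array holding pd.NA fails.
try:
    bool(na_result)
    na_is_boolable = True
except TypeError:
    na_is_boolable = False

print(na_result, none_result, na_is_boolable)
```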
from pandas.
> However, this example did work in pandas 1.3.5, so labelling as regression pending further investigation.
first bad commit: [b2d54d9] BUG: IntegerArray/FloatingArray ufunc with 'out' kwd (#45122)
from pandas.
1. agreed with Simon that qlist shouldn't be held directly, or should be made an EA
2. if it is made an EA, arr == x should be array-like
3. we can change the check for ndarray to be specific to EA/BooleanArray
3b) this goes in the pile of bugs created indirectly by pd.NA-workarounds
from pandas.
moving to 1.4.4
from pandas.
The reason why this goes wrong is that qlist inherits from np.ndarray, therefore all our checks for this are passed and the qlist is stored directly. Maybe we could adjust sanitize_array a bit to handle qlists correctly, but maybe we can't get around an explicit cast to np.ndarray.
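To illustrate the point about inheritance, here is a minimal sketch using a hypothetical stand-in class (QListLike is made up for this example and is not the real qpython class). It assumes the subclass defines == as whole-object equality, which is the behaviour described in this thread. The subclass passes an isinstance check for np.ndarray, while np.asarray returns a base-class array with ordinary elementwise comparison.

```python
import numpy as np


class QListLike(np.ndarray):
    """Hypothetical stand-in for an ndarray subclass such as QList."""

    def __eq__(self, other):
        # Whole-object equality: returns a single bool, not a mask.
        return isinstance(other, QListLike) and np.array_equal(
            np.asarray(self), np.asarray(other)
        )

    __hash__ = None  # defining __eq__ makes instances unhashable anyway


data = np.array([True, True, False, False]).view(QListLike)

# The subclass satisfies any "is this an ndarray?" check...
passes_ndarray_check = isinstance(data, np.ndarray)

# ...but == gives one bool instead of an elementwise mask.
scalar_result = data == False  # noqa: E712

# An explicit cast drops the subclass and restores elementwise ==.
plain = np.asarray(data)
mask = plain == False  # noqa: E712

print(passes_ndarray_check, scalar_result, type(plain).__name__, mask)
```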
from pandas.
> 3. we can change the check for ndarray to be specific to EA/BooleanArray
> 3b) this goes in the pile of bugs created indirectly by pd.NA-workarounds
@jbrockmendel I've opened a PR #48313 but just checked for bool also instead (and also suppressed the numpy warnings while making changes here, see #47101 (comment))
from pandas.
removing milestone, xref #47485 (comment)
from pandas.
As an ugly workaround until this gets patched, this seemed to work for me, at pandas/core/missing.py:112
from pandas.
I've run into this on pandas 2.0/2.1 as well, on line 118 of missing.py:
new_mask = arr == x
...
new_mask = new_mask.to_numpy(dtype=bool, na_value=False)
It definitely feels like a bug here. This is what appears to be happening for me:
- x is an integer
- arr is a list of integers (in my case a QList)
- The comparison arr == x checks whether the list (or QList) as a whole equals the integer, which it does not.
- We expect an array but get a single bool since arr was never an array in the first place.
Generally, np.ndarray overrides the == operator to do this elementwise, which is what is expected here.
I think the hack above won't quite handle that enumeration, although it'll at least not throw an exception.
Probably it'd be best to check whether arr is in fact iterable and then either convert it to a numpy array or run the comparison elementwise at the point where we call arr == x, since the fundamental assumption here is that arr performs that check on x elementwise and returns an array.
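The suggestion above could be sketched roughly as follows. elementwise_mask is a hypothetical helper written for this illustration, not a function in pandas: it tries arr == x first and, if that yields a single bool, retries on a plain numpy array.

```python
import numpy as np


def elementwise_mask(arr, x):
    """Hypothetical helper: always return a boolean mask for arr == x."""
    mask = arr == x
    if isinstance(mask, (bool, np.bool_)):
        # The comparison was not elementwise (e.g. a list, or an ndarray
        # subclass with whole-object __eq__), so retry on a plain ndarray.
        mask = np.asarray(arr) == x
    return np.asarray(mask, dtype=bool)


# A plain list compares as a whole object, so the fallback path is taken.
list_mask = elementwise_mask([1, 2, 3], 2)

# A regular ndarray already compares elementwise.
array_mask = elementwise_mask(np.array([1, 2, 3]), 2)

print(list_mask, array_mask)
```

This only covers the simple cases discussed here; it makes no attempt to handle pd.NA or extension arrays.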
from pandas.
I think replacing arr == x with np.equal(arr, x) might do the trick with relative ease.
I think np.equal will check if either argument is an array/iterable and do that comparison elementwise.
Fundamentally, in my case, I feel like the issue might also partially fall on qpython.qcollection.QList, which claims to be an np.ndarray but doesn't implement the comparison elementwise.
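A small sketch of the difference between == and np.equal, again using a made-up stand-in class (WholeObjectEq is hypothetical, not the real QList) whose == returns a single bool. np.equal is a ufunc, so it converts array-likes and compares elementwise without calling the overridden __eq__. Whether this is the right change for pandas itself is not settled in this thread.

```python
import numpy as np


class WholeObjectEq(np.ndarray):
    """Hypothetical ndarray subclass whose == is not elementwise."""

    def __eq__(self, other):
        return False  # always a single bool, for illustration only

    __hash__ = None


arr = np.array([1, 2, 3]).view(WholeObjectEq)

# The operator goes through the overridden __eq__ and yields one bool.
operator_result = arr == 2

# The ufunc bypasses __eq__ and compares element by element.
ufunc_result = np.asarray(np.equal(arr, 2))

# np.equal also accepts plain Python lists and converts them first.
list_result = np.equal([1, 2, 3], 2)

print(operator_result, ufunc_result, list_result)
```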
from pandas.