Comments (9)
I like family. I also like flavor, no more or less. Some more alternatives I don't like (but others might): system, set, base, strain.
from pandas.
I've seen some user confusion [citation needed] stemming from the term "backend" in the "dtype_backend" parameter. It gives the incorrect impression that behaviors are the same across backends, just with different implementations or performance characteristics.
Is there evidence that users would not be confused if it was called e.g. dtype_family?
I feel like this is something that would happen eventually as long as the numpy/arrow dtypes shared names (e.g. "int64" vs "int64[pyarrow]").
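
To make the "same names, different behavior" concern concrete, here is a minimal sketch. It assumes pandas 2.0 or later (where `read_csv` accepts `dtype_backend`) and uses `"numpy_nullable"` rather than `"pyarrow"` only so that it runs without pyarrow installed; the CSV text and variable names are made up for illustration.

```python
import io

import pandas as pd

# Illustrative data: column "a" has a missing value in the second row.
csv_text = "a,b\n1,x\n,y\n3,z\n"

# Default NumPy-backed dtypes versus an explicitly chosen dtype_backend.
default_df = pd.read_csv(io.StringIO(csv_text))
nullable_df = pd.read_csv(io.StringIO(csv_text), dtype_backend="numpy_nullable")

# The missing value changes the resulting dtype of the same column.
print(default_df["a"].dtype)   # float64 (missing value stored as NaN)
print(nullable_df["a"].dtype)  # Int64 (missing value stored as pd.NA)

# The same comparison then yields different result dtypes and different
# treatment of the missing row.
default_cmp = default_df["a"] == 1
nullable_cmp = nullable_df["a"] == 1
print(default_cmp.tolist())   # [True, False, False]
print(nullable_cmp.tolist())  # [True, <NA>, False]
```

The keyword only selects which dtypes are produced; downstream semantics (here, missing-value handling in comparisons) follow from those dtypes and are not identical.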
from pandas.
Is there evidence that users would not be confused if it was called e.g. dtype_family?
I don't understand the question. We haven't used any other terms... "backend" has connotations of swappability and an invariant frontend that wouldn't apply to other terms.
from pandas.
I'm asking since renaming a parameter causes a lot of code churn.
For me, personally, it is not clear what a dtype family or flavor is, while dtype backend gives me the understanding that the underlying arrays backing my Series/DataFrame are arrow/numpy/whatever. So, IMO, dtype_backend is clearer than the other terms.
I've seen some user confusion [citation needed] stemming from the term "backend" in the "dtype_backend" parameter. It gives the incorrect impression that behaviors are the same across backends, just with different implementations or performance characteristics.
I guess the [citation needed] part was what I was asking for in my previous question. If you could dig that up, that'd be really helpful.
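
The "arrays backing my Series/DataFrame" reading of "backend" can be sketched roughly as follows. This is an illustration only, not part of the original discussion; it uses the nullable `"Int64"` dtype instead of a pyarrow dtype so that it runs without pyarrow, and the class name printed for the NumPy-backed array depends on the pandas version.

```python
import pandas as pd

# Two Series holding the same integers, backed by different array types.
numpy_backed = pd.Series([1, 2, 3])                    # default NumPy int64
nullable_backed = pd.Series([1, 2, 3], dtype="Int64")  # masked extension dtype

# The backing array class differs (exact names vary across pandas versions).
print(type(numpy_backed.array).__name__)
print(type(nullable_backed.array).__name__)

# Introducing a missing label shows that behavior differs as well,
# not only the storage underneath.
numpy_with_gap = numpy_backed.reindex([0, 1, 5])
nullable_with_gap = nullable_backed.reindex([0, 1, 5])
print(numpy_with_gap.dtype)     # float64
print(nullable_with_gap.dtype)  # Int64
```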
from pandas.
I'm asking since renaming a parameter causes a lot of code churn.
Totally reasonable concern. My thought is that ATM this is used relatively little, so is easier to change than it would be after #58141 and related.
I guess the [citation needed] part was what I was asking for in my previous question. If you could dig that up, that'd be really helpful.
Also fair. I think a lot of confusion about what "backend" means surfaced in https://www.reddit.com/r/Python/comments/11fio85/we_are_the_developers_behind_pandas_currently/. I remember other things on Hacker News that I'm not inclined to dig up. Searching our issues for "backend", I see #53154 has a user expecting identical behavior. I'll update this as I find more of these, as I think "incorrectly expecting identical behavior" is a common complaint.
from pandas.
Is there evidence that users would not be confused if it was called e.g. dtype_family?
I also initially agree with @lithomas1's question here. I'm not fully convinced (yet) that renaming a keyword argument would be able to convey "pick a dtype implementation that is not fully equivalent to the other options". I am open to there being a better term though.
from pandas.
#58307 is another case of incorrectly expecting identical behavior.
from pandas.
It gives the incorrect impression that behaviors are the same across backends, just with different implementations or performance characteristics.
Personally, I think this is actually the correct impression. It's how I think most users should think about the backends (so in that sense I don't have a problem with the current naming).
I know that in practice this is of course not correct in all cases right now, but it could be what we want it to be eventually. And so whenever we get a report about different behaviours, it might be something we should fix.
It's something that we should discuss and spell out, though: what we generally think the expectations should be about those different backends (maybe as part of the PDEP discussion in #58455).
from pandas.
Reading the room, I'm going to learn to live with users continuing to be confused by this name. Closing.
from pandas.