Comments (9)
We will rename `fill_null` to `fill_nulls`. This aligns with `has_nulls` / `drop_nulls`.
Also for `fill_nan` -> `fill_nans`.
from polars.
> We will rename `fill_null` to `fill_nulls`. This aligns with `has_nulls` / `drop_nulls`. Also for `fill_nan` -> `fill_nans`.

Different opinion here! IMO we should drop the "s" in all cases.

Reasons:
- `fill_nan` / `fill_null` are ROW-level operations. For each row you fill a single `nan` / `null`. This is also in line with column identifiers, which should be singular instead of plural ("name" instead of "names", "age" instead of "ages", "value" instead of "values", ...).
- `has_nulls` is, strictly speaking, semantically incorrect. `nulls` as a plural implies that it checks for multiple `null` values in a column, which is not the case! It checks for a single `null` and returns whether the column contains a single `null` (potentially more) -> should be `has_null`.
- `drop_nulls`: strong opinion for a rename to `drop_null_rows`, which would make it much more descriptive! This was probably "copied" from pandas `fillna`, which is much different because it can also remove columns. Otherwise I would also lean towards `drop_null`.
> `fill_nan` / `fill_null` are ROW-level operations. For each row you fill a single `nan` / `null`. This is also in line with column identifiers, which should be singular instead of plural ("name" instead of "names", "age" instead of "ages", "value" instead of "values", ...).

Not sure. `fill_nulls` says "fill all the nulls in this column with value X". What makes you think this is a row-level operation?

> `has_nulls` is, strictly speaking, semantically incorrect. `nulls` as a plural implies that it checks for multiple `null` values in a column, which is not the case! It checks for a single `null` and returns whether the column contains a single `null` (potentially more) -> should be `has_null`.

I have considered this, but you conveniently gloss over the fact that `has_null` isn't quite correct either, because it also returns true with multiple nulls. Neither `has_null` nor `has_nulls` is completely correct, but `contains_at_least_one_null` is too long, so we have to choose.

> `drop_nulls`: strong opinion for a rename to `drop_null_rows`, which would make it much more descriptive! This was probably "copied" from pandas `fillna`, which is much different because it can also remove columns. Otherwise I would also lean towards `drop_null`.

An expression doesn't have any rows, so `drop_null_rows` makes no sense.

For what it's worth, personally I feel like `fill_null`, `drop_null`, `has_null` feel better, but I cannot really make a good argument for it.
Anyway, I'll sleep on this one before merging it, but I'm not convinced at all by your arguments here.
I personally prefer the singular form as well. These functions drop/fill or check the existence of any null value, so it makes sense to remove the "s".
Moreover, I think `fill_null` is used much more than `drop_nulls` and `has_nulls`, so it will break less code.
> I have considered this, but you conveniently gloss over the fact that `has_null` isn't quite correct either because it also returns true with multiple nulls. Neither `has_null` or `has_nulls` is completely correct, but `contains_at_least_one_null` is too long, so we have to choose.

`has_null` is completely correct imo. It answers the question of whether the column has/contains `null`. The quantity is irrelevant here 🤓.
Thinking about this a bit more, the best solution imo would actually be to completely remove `has_null(s)` and introduce `Expr.contains` (`contains(None)` with fast-path) as a super-function. Everyone is familiar with this concept. (Also, `contains` isn't called `contains_at_least_one` because that is implied.)
Thoughts? 💭
I think that would improve the API (there is already 5x `contains` but not yet on `Expr`) 🤓😎
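
To illustrate what "`contains(None)` with fast-path" could mean, here is a minimal pure-Python sketch. It is not polars code: the `Column` class, its `null_count` attribute and its `contains` method are hypothetical stand-ins for the proposed `Expr.contains`, and the assumption is that the null count is already known, so a null lookup needs no scan.

```python
from typing import Any, List


class Column:
    """Toy column (hypothetical) that records its null count once, at construction."""

    def __init__(self, values: List[Any]) -> None:
        self.values = values
        self.null_count = sum(v is None for v in values)

    def contains(self, item: Any) -> bool:
        if item is None:
            # Fast path: answered from the stored count, no scan of the values.
            return self.null_count > 0
        # General path: linear scan that stops at the first match.
        return any(v is not None and v == item for v in self.values)


col = Column([1, None, 3, None])
print(col.contains(None))  # True, whether there is one null or several
print(col.contains(3))     # True
print(col.contains(7))     # False
```

As with any `contains`, the name says nothing about quantity: one match or many gives the same answer.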
Also prefer singular
> Moreover, I think the `fill_null` is much more used than `drop_nulls` and `has_nulls`, so it will break less code.

Not sure about `fill_null` vs `drop_nulls` frequency, but `has_nulls` was added very recently, so renaming it will be very low impact.
After some thought I agree with @JulianCologne. I was thinking about this yesterday, and `is_null` is an elementwise question. The same can be said for `fill_null`.
I think we should ask if it is a single row/elementwise question and if so go for singular.
`drop_nulls` isn't an elementwise operation, so it is fine to be plural here, as the longer version would be `drop_null_rows`.
I want to put this on hold as I don't think this merits a change at all.
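
The elementwise distinction above can be made concrete with a minimal pure-Python sketch. It models a column as a plain list with `None` as the null; the helper functions only borrow the names under discussion and are assumptions for illustration, not the polars implementation.

```python
from typing import List, Optional

Column = List[Optional[int]]  # a column modeled as a list; None stands in for null


def is_null(col: Column) -> List[bool]:
    # Elementwise: one answer per element, output length equals input length.
    return [v is None for v in col]


def fill_null(col: Column, value: int) -> List[int]:
    # Elementwise: each element is replaced (or kept) independently of the others.
    return [value if v is None else v for v in col]


def drop_nulls(col: Column) -> List[int]:
    # Not elementwise: the output can be shorter than the input.
    return [v for v in col if v is not None]


def has_nulls(col: Column) -> bool:
    # Aggregation: a single answer for the whole column, true for one or more nulls.
    return any(v is None for v in col)


col: Column = [1, None, 3, None]
print(is_null(col))       # [False, True, False, True]
print(fill_null(col, 0))  # [1, 0, 3, 0]
print(drop_nulls(col))    # [1, 3]
print(has_nulls(col))     # True
```

In this model `is_null` and `fill_null` preserve the column length, `drop_nulls` changes it, and `has_nulls` collapses the column to one value, which is the split the singular/plural rule relies on.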
> I want to put this on hold as I don't think this merits a change at all.

Agree. Status quo is fine, I think. I'll close this for now.