Standardize references to "null" in methods (polars, CLOSED)

Mateuscvieira commented on August 16, 2024

Standardize references to "null" in methods

from polars.

Comments (9)

stinodego commented on August 16, 2024

We will rename fill_null to fill_nulls. This aligns with has_nulls / drop_nulls.

Also for fill_nan -> fill_nans.
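For readers less familiar with the three methods being compared, here is a minimal plain-Python sketch of what each one does to a column. The helper functions below are illustrative stand-ins that reuse the method names from the thread; they operate on ordinary lists (with `None` playing the role of null) and are not Polars code.

```python
from typing import Any, List, Optional

# Toy stand-in for a column; None plays the role of null.
Column = List[Optional[Any]]


def fill_null(col: Column, value: Any) -> Column:
    """Replace every null with `value`; the length is unchanged."""
    return [value if x is None else x for x in col]


def drop_nulls(col: Column) -> Column:
    """Remove the nulls; the result may be shorter than the input."""
    return [x for x in col if x is not None]


def has_nulls(col: Column) -> bool:
    """Report whether the column holds one or more nulls."""
    return any(x is None for x in col)


col = [1, None, 3, None]
print(fill_null(col, 0))  # [1, 0, 3, 0]
print(drop_nulls(col))    # [1, 3]
print(has_nulls(col))     # True
```

Only `fill_null` keeps the column length and maps each value independently; `drop_nulls` changes the length and `has_nulls` collapses the column to a single boolean.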

Julian-J-S commented on August 16, 2024

> We will rename fill_null to fill_nulls. This aligns with has_nulls / drop_nulls.
>
> Also for fill_nan -> fill_nans.

Different opinion here! IMO we should drop the "s" in all cases.

Reasons:

- fill_nan / fill_null are ROW-level operations: for each row you fill a single nan / null. This is also in line with column identifiers, which should be singular instead of plural ("name" instead of "names", "age" instead of "ages", "value" instead of "values", ...).
- has_nulls is, strictly speaking, semantically incorrect. The plural "nulls" implies that it checks for multiple null values in a column, which is not the case! It returns true if the column contains a single null (potentially more) -> should be has_null.
- drop_nulls: strong opinion for a rename to drop_null_rows, which would be much more descriptive! The name was probably "copied" from pandas dropna, which is quite different because it can also remove columns. Otherwise I would also lean towards drop_null.

stinodego commented on August 16, 2024

> fill_nan / fill_null are ROW-level operations: for each row you fill a single nan / null. This is also in line with column identifiers, which should be singular instead of plural ("name" instead of "names", "age" instead of "ages", "value" instead of "values", ...).

Not sure. fill_nulls says "fill all the nulls in this column with value X". What makes you think this is a row-level operation?

> has_nulls is, strictly speaking, semantically incorrect. The plural "nulls" implies that it checks for multiple null values in a column, which is not the case! It returns true if the column contains a single null (potentially more) -> should be has_null.

I have considered this, but you conveniently gloss over the fact that has_null isn't quite correct either, because it also returns true with multiple nulls. Neither has_null nor has_nulls is completely correct, but contains_at_least_one_null is too long, so we have to choose.

> drop_nulls: strong opinion for a rename to drop_null_rows, which would be much more descriptive! The name was probably "copied" from pandas dropna, which is quite different because it can also remove columns. Otherwise I would also lean towards drop_null.

An expression doesn't have any rows, so drop_null_rows makes no sense.

For what it's worth, personally I feel like fill_null, drop_null, has_null feel better, but I cannot really make a good argument for it.

Anyway, I'll sleep on this one before merging it, but I'm not convinced at all by your arguments here.

gab23r commented on August 16, 2024

I personally prefer the singular form as well. These functions drop/fill or check the existence of any null value so it makes sense to remove the s.

Moreover, I think fill_null is used much more than drop_nulls and has_nulls, so going singular will break less code.

Julian-J-S commented on August 16, 2024

> I have considered this, but you conveniently gloss over the fact that has_null isn't quite correct either, because it also returns true with multiple nulls. Neither has_null nor has_nulls is completely correct, but contains_at_least_one_null is too long, so we have to choose.

has_null is completely correct IMO. It answers the question of whether the column has/contains null. The quantity is irrelevant here 🤓.

Thinking about this a bit more, the best solution IMO would actually be to completely remove has_null(s) and introduce Expr.contains (contains(None) with a fast path) as a super-function. Everyone is familiar with this concept. (Also, contains isn't called contains_at_least_one because that is implied.)

Thoughts? 💭
I think that would improve the API (there are already 5x contains, but not yet on Expr) 🤓😎
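The `contains(None)` idea with a fast path could look roughly like the sketch below. This is a plain-Python toy, not Polars code: the `Column` class, its stored `null_count` attribute, and the `contains` method are all hypothetical names chosen for illustration, and the thread does not say how such a method would actually be implemented.

```python
from typing import Any, Iterable


class Column:
    """Toy column that records its null count up front (hypothetical)."""

    def __init__(self, values: Iterable[Any]) -> None:
        self.values = list(values)
        # Assumption: the null count is known without rescanning the data.
        self.null_count = sum(1 for v in self.values if v is None)

    def contains(self, item: Any) -> bool:
        """Return True if `item` occurs at least once in the column."""
        if item is None:
            # Fast path: answer from the stored null count, no scan needed.
            return self.null_count > 0
        # General path: scan the non-null values for a match.
        return any(v == item for v in self.values if v is not None)


col = Column([1, None, 3])
print(col.contains(None))  # True
print(col.contains(3))     # True
print(col.contains(4))     # False
```

As the comment notes, "at least one" is implied by the name `contains`, so the singular/plural question does not come up for it.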

lyngc commented on August 16, 2024

Also prefer singular

stinodego commented on August 16, 2024

> Moreover, I think fill_null is used much more than drop_nulls and has_nulls, so going singular will break less code.

Not sure about fill_null vs drop_nulls frequency, but has_nulls was added very recently so renaming it will be very low impact.
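To make the "breaking code" concern concrete: a rename is usually softened by keeping the old name as a deprecated alias for a while. The sketch below shows that general pattern in plain Python using the standard `warnings` module. The `Column` class and the `has_nulls` -> `has_null` direction are purely hypothetical here; no such rename was decided in this thread, and this is not how Polars itself handles deprecations.

```python
import warnings
from typing import Any, Iterable


class Column:
    """Toy column used to illustrate a deprecated alias (hypothetical)."""

    def __init__(self, values: Iterable[Any]) -> None:
        self.values = list(values)

    def has_null(self) -> bool:
        """Hypothetical new name: True if at least one value is None."""
        return any(v is None for v in self.values)

    def has_nulls(self) -> bool:
        """Old name kept as an alias that warns, then delegates."""
        warnings.warn(
            "`has_nulls` is deprecated; use `has_null` instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.has_null()


# Existing callers keep working, but a DeprecationWarning is recorded.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = Column([1, None, 3]).has_nulls()

print(result)       # True
print(len(caught))  # 1
```

The more widely a method is used, the more callers see such warnings, which is why the relative usage of each method matters for the cost of a rename.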

ritchie46 commented on August 16, 2024

After some thought I agree with @JulianCologne. I was thinking about this yesterday, and is_null is an elementwise question. The same can be said for fill_null.

I think we should ask whether it is a single-row/elementwise question and, if so, go for singular.

drop_nulls isn't an elementwise operation, so it is fine for it to be plural here, as the longer version would be drop_null_rows.

I want to put this on hold, as I don't think this merits a change at all.
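The elementwise test proposed above can be illustrated with a small plain-Python check: an operation looks elementwise if applying it to each value on its own yields exactly one output per input and the same overall result as applying it to the whole column. The helper `looks_elementwise` and the toy list-based operations are hypothetical names for illustration only, not Polars code, and the check is a heuristic on one sample rather than a proof.

```python
from typing import Any, Callable, List, Optional

Column = List[Optional[Any]]  # toy column; None plays the role of null


def is_null(col: Column) -> List[bool]:
    return [x is None for x in col]


def fill_null_with_zero(col: Column) -> Column:
    return [0 if x is None else x for x in col]


def drop_nulls(col: Column) -> Column:
    return [x for x in col if x is not None]


def looks_elementwise(op: Callable[[Column], list], col: Column) -> bool:
    """Heuristic: does `op` act on each value independently for this sample?"""
    pieces = [op([x]) for x in col]
    # An elementwise operation yields exactly one output per input value.
    if any(len(p) != 1 for p in pieces):
        return False
    # ...and stitching the single outputs together matches the whole-column result.
    return op(col) == [p[0] for p in pieces]


col = [1, None, 3]
results = {
    op.__name__: looks_elementwise(op, col)
    for op in (is_null, fill_null_with_zero, drop_nulls)
}
print(results)
# {'is_null': True, 'fill_null_with_zero': True, 'drop_nulls': False}
```

The sample column needs to contain a null for the check to be meaningful: on a column without nulls, `drop_nulls` would return its input unchanged and pass.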

stinodego commented on August 16, 2024

> I want to put this on hold, as I don't think this merits a change at all.

Agree. Status quo is fine, I think. I'll close this for now.
