TypeError when fitting GridSearchCV or RandomizedSearchCV with OrdinalEncoder and OneHotEncoder in parameters grid (scikit-learn issue, closed)

BriceChivu commented on September 28, 2024

from scikit-learn.

Comments (13)

MarcoGorelli commented on September 28, 2024

thanks for the ping - this seems to be the issue:

(Pdb) p param_list
[OneHotEncoder(sparse_output=False), OrdinalEncoder()]
(Pdb) p np.result_type(*param_list)
dtype('float64')
(Pdb) p np.array(param_list).dtype
dtype('O')

I find it a bit surprising that np.result_type gives 'float64' here
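The discrepancy can be reproduced without scikit-learn. The sketch below uses a hypothetical stand-in class, `FakeEncoder`, that copies only the relevant detail of `OrdinalEncoder` and `OneHotEncoder`: a `dtype` attribute. As an assumption, it stores a dtype instance rather than the scalar type `np.float64`, since newer NumPy versions may emit a DeprecationWarning when `.dtype` is not a dtype instance.

```python
import numpy as np


class FakeEncoder:
    """Hypothetical stand-in for an encoder that exposes a `dtype` attribute."""

    def __init__(self, dtype=np.dtype("float64")):
        self.dtype = dtype


param_list = [FakeEncoder(), FakeEncoder()]

# np.result_type treats non-array arguments as dtype-like and reads their
# `.dtype` attributes, so the two objects "promote" to float64.
print(np.result_type(*param_list))  # float64

# np.array cannot treat the objects as numbers or sequences, so it builds
# a 1-D object array instead.
print(np.array(param_list).dtype)  # object
```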

MarcoGorelli commented on September 28, 2024

wait wut

In [6]: OrdinalEncoder().dtype
Out[6]: numpy.float64

adrinjalali commented on September 28, 2024

Could you please provide a minimal reproducer?

- Remove the extra bits from the code which do not contribute to the error.
- Use a dataset from sklearn.datasets.
- The code should run without requiring extra datasets, by simply copy-pasting it.

lesteve commented on September 28, 2024

It's good to have a fix in scikit-learn, but I think the numpy behaviour is unexpected so I opened numpy/numpy#26612.

BriceChivu commented on September 28, 2024

> Could you please provide a minimal reproducer?
>
> - remove the extra bits from the code which do not contribute to the error
> - use a dataset from sklearn.datasets
> - the code should run without requiring extra datasets by simply copy pasting the code.

Thanks for your comment. I modified the issue's description accordingly.

adrinjalali commented on September 28, 2024

This seems to be another one related to dtypes of the result in grid search. @lesteve @MarcoGorelli WDYT?

lesteve commented on September 28, 2024

I can confirm this still happens in main. I have modified the snippet to not use force_int_remainder_cols (a new ColumnTransformer parameter in 1.5), and the snippet runs on 1.4, so this does seem to be a regression.

It is possible that this is caused by the dtype tweak in grid-search .cv_results_ (#28352). I did the previous bug fix, so I am happy to let @MarcoGorelli take this one 😉.

lesteve commented on September 28, 2024

Oh dear, OrdinalEncoder has a dtype parameter and hence a .dtype attribute. np.result_type probably relies on the .dtype attribute? Edit: same thing for OneHotEncoder.
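NumPy documents that an object with a `dtype` attribute can be converted to a data type by reading that attribute, and `np.result_type` accepts dtype-like arguments, which would explain the behaviour. A minimal sketch with two hypothetical classes, `WithDtype` and `WithoutDtype`, checks that the attribute alone decides the outcome:

```python
import numpy as np


class WithDtype:
    """Hypothetical object that happens to carry a `dtype` attribute."""

    dtype = np.dtype("float64")


class WithoutDtype:
    """Hypothetical object with no `dtype` attribute."""


# The `.dtype` attribute is read and used as the data type.
print(np.dtype(WithDtype()))  # float64
print(np.result_type(WithDtype(), WithDtype()))  # float64

# Without the attribute, NumPy cannot interpret the object as a data type.
try:
    np.result_type(WithoutDtype(), WithoutDtype())
except TypeError as exc:
    print("TypeError:", exc)
```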

adrinjalali commented on September 28, 2024

In a sense, it does make sense that result_type is float64, since result_type implies the result of an operation on those values. But we just want to create an array here, so maybe we should get the dtype of a created array instead?
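A sketch of that idea, using a hypothetical helper `infer_param_dtype` and a hypothetical `FakeEncoder` stand-in: build the array and use its dtype, but fall back to object whenever NumPy does not produce a plain 1-D array. The fallbacks cover a known pitfall of `np.array`: same-length sequences such as `[(10,), (20,)]` become a 2-D array, and ragged ones raise `ValueError` on recent NumPy versions.

```python
import numpy as np


class FakeEncoder:
    """Hypothetical stand-in for an estimator with a `dtype` attribute."""

    dtype = np.dtype("float64")


def infer_param_dtype(param_list):
    """Hypothetical helper: infer a column dtype by actually building an array."""
    try:
        arr = np.array(param_list)
    except ValueError:
        # Ragged sequences, e.g. [[1], [2, 3]], cannot form a regular array.
        return np.dtype(object)
    if arr.ndim != 1:
        # Same-length sequences become a 2-D array; keep one object per value.
        return np.dtype(object)
    return arr.dtype


print(infer_param_dtype([0.1, 1.0]))  # float64
print(infer_param_dtype([FakeEncoder(), FakeEncoder()]))  # object
print(infer_param_dtype([(10,), (20,)]))  # object
```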

MarcoGorelli commented on September 28, 2024

I think that creates other issues #28352 (comment) which @thomasjpfan wanted to avoid

It might be simplest to just check if any object in param_list is an instance of BaseEstimator, and if so, set arr_dtype to object?

Got a call coming up, but I can submit a PR later.
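The isinstance proposal could look roughly like this. `BaseEstimator` here is a hypothetical stand-in for `sklearn.base.BaseEstimator` so that the sketch runs without scikit-learn, and `param_dtype_isinstance` is likewise a made-up name.

```python
import numpy as np


class BaseEstimator:
    """Hypothetical stand-in for sklearn.base.BaseEstimator."""


class FakeEncoder(BaseEstimator):
    """Hypothetical estimator that exposes a `dtype` attribute."""

    dtype = np.dtype("float64")


def param_dtype_isinstance(param_list):
    """Use object dtype if any value is an estimator, else np.result_type."""
    if any(isinstance(value, BaseEstimator) for value in param_list):
        return np.dtype(object)
    return np.result_type(*param_list)


print(param_dtype_isinstance([FakeEncoder(), FakeEncoder()]))  # object
print(param_dtype_isinstance([0.1, 1.0]))  # float64
```

An estimator that does not subclass `BaseEstimator` but still has a `dtype` attribute would skip the check and come back as float64.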

adrinjalali commented on September 28, 2024

Not everything is a BaseEstimator. A third party estimator might not be inheriting from BaseEstimator and that breaks this then.

We could maybe check whether anything is not a scalar or a simple object? Not sure.
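One reading of that suggestion, as a sketch: only let NumPy compute a promoted dtype when every value is a plain number, and use object otherwise. `param_dtype_scalar_check` and `FakeEncoder` are hypothetical names, and sending strings to object is an assumption of this sketch, made because `np.result_type` would try to parse a string such as `'rbf'` as a dtype name.

```python
import numbers

import numpy as np


class FakeEncoder:
    """Hypothetical stand-in for an estimator with a `dtype` attribute."""

    dtype = np.dtype("float64")


def param_dtype_scalar_check(param_list):
    """Promote with np.result_type only if every value is a plain number."""
    # bool is a subclass of int, and NumPy registers its integer and
    # floating scalar types with the numbers ABCs, so both count here.
    if all(isinstance(value, numbers.Number) for value in param_list):
        return np.result_type(*param_list)
    return np.dtype(object)


print(param_dtype_scalar_check([0.1, 1.0]))  # float64
print(param_dtype_scalar_check(["rbf", "linear"]))  # object
print(param_dtype_scalar_check([FakeEncoder(), FakeEncoder()]))  # object
```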

MarcoGorelli commented on September 28, 2024

Ah thanks

A third-party estimator should still implement fit and predict/transform though? Maybe just check for those attributes?
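A duck-typed variant of the check, again only a sketch: `looks_like_estimator`, `param_dtype_duck` and `ThirdPartyEncoder` are hypothetical names, and treating "has a callable `fit` plus `predict` or `transform`" as the definition of an estimator is an assumption.

```python
import numpy as np


class ThirdPartyEncoder:
    """Hypothetical estimator that does not inherit from any sklearn class."""

    dtype = np.dtype("float64")

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X


def looks_like_estimator(value):
    """Duck-typing check: callable `fit` plus `predict` or `transform`."""
    has_fit = callable(getattr(value, "fit", None))
    has_output = any(
        callable(getattr(value, name, None)) for name in ("predict", "transform")
    )
    return has_fit and has_output


def param_dtype_duck(param_list):
    """Use object dtype if any value looks like an estimator."""
    if any(looks_like_estimator(value) for value in param_list):
        return np.dtype(object)
    return np.result_type(*param_list)


print(param_dtype_duck([ThirdPartyEncoder(), ThirdPartyEncoder()]))  # object
print(param_dtype_duck([0.1, 1.0]))  # float64
```

Any non-estimator object that happens to carry a `dtype` attribute would still fall through to `np.result_type` and be promoted to a numeric dtype.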

As an aside, I expect that the dtype property might create other problems going forwards? It looks like it's not documented https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html#onehotencoder , so that may make the case for renaming it?

adrinjalali commented on September 28, 2024

dtype is documented as a constructor argument, which becomes an attribute with the same name, so we can't easily rename it.

Checking for fit and predict (or any other Protocol) would also not be okay. I think we might end up in odd situations where some odd attribute / constructor argument is a random object.