TypeError when fitting GridSearchCV or RandomizedSearchCV with OrdinalEncoder and OneHotEncoder in parameters grid (scikit-learn issue, closed)

BriceChivu commented on September 28, 2024
TypeError when fitting GridSearchCV or RandomizedSearchCV with OrdinalEncoder and OneHotEncoder in parameters grid

from scikit-learn.

Comments (13)

MarcoGorelli commented on September 28, 2024

thanks for the ping - this seems to be the issue:

(Pdb) p param_list
[OneHotEncoder(sparse_output=False), OrdinalEncoder()]
(Pdb) p np.result_type(*param_list)
dtype('float64')
(Pdb) p np.array(param_list).dtype
dtype('O')

I find it a bit surprising that np.result_type gives 'float64' here

MarcoGorelli commented on September 28, 2024

wait wut

In [6]: OrdinalEncoder().dtype
Out[6]: numpy.float64

adrinjalali commented on September 28, 2024

Could you please provide a minimal reproducer?

  • remove the extra bits from the code which do not contribute to the error
  • use a dataset from sklearn.datasets
  • the code should run without requiring extra datasets by simply copy pasting the code.

lesteve commented on September 28, 2024

It's good to have a fix in scikit-learn, but I think the numpy behaviour is unexpected so I opened numpy/numpy#26612.

BriceChivu commented on September 28, 2024

Could you please provide a minimal reproducer?

  • remove the extra bits from the code which do not contribute to the error
  • use a dataset from sklearn.datasets
  • the code should run without requiring extra datasets by simply copy pasting the code.

Thanks for your comment. I modified the issue's description accordingly.

adrinjalali commented on September 28, 2024

This seems to be another one related to dtypes of the result in grid search. @lesteve @MarcoGorelli WDYT?

lesteve commented on September 28, 2024

I can confirm this still happens in main. I have modified the snippet to not use force_int_remainder_cols (new ColumnTransformer parameter in 1.5) and the snippet runs on 1.4 so this seems like a regression indeed.

It is possible that this was caused by the dtype tweak in grid-search .cv_results_ (#28352). I did the previous bug fix so I am happy to let @MarcoGorelli take this one 😉.

lesteve commented on September 28, 2024

Oh dear, OrdinalEncoder has a dtype parameter and hence a .dtype attribute. np.result_type probably relies on the .dtype attribute? Edit: same thing for OneHotEncoder.
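
A minimal sketch of the behaviour described here, assuming a hypothetical `FakeEncoder` class that stands in for an estimator with a `dtype` constructor argument (it is not part of scikit-learn). Creating an array from such objects gives an object dtype, whereas `np.result_type` may read the `.dtype` attribute. What `np.result_type` returns or raises depends on the installed NumPy version, so the sketch only prints it.

```python
import numpy as np


class FakeEncoder:
    """Hypothetical stand-in for an estimator that stores a `dtype` argument."""

    def __init__(self, dtype=np.float64):
        # As in scikit-learn, the constructor argument becomes an attribute.
        self.dtype = dtype


param_list = [FakeEncoder(), FakeEncoder()]

# Building an array from arbitrary Python objects yields an object array.
array_dtype = np.array(param_list).dtype
print(array_dtype)  # object

# np.result_type may pick up the `.dtype` attribute instead. The outcome is
# version-dependent, so it is printed rather than assumed.
try:
    print(np.result_type(*param_list))
except Exception as exc:
    print(f"np.result_type raised: {exc!r}")
```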

adrinjalali commented on September 28, 2024

In a sense, it does make sense that result_type is float64, since result_type implies result of an operation on those values. But we just want to create an array here, so maybe we should get the dtype of a created array instead?

MarcoGorelli commented on September 28, 2024

I think that creates other issues (see the comment on #28352), which @thomasjpfan wanted to avoid.

It might be simplest to just check if any object in param_list is an instance of BaseEstimator, and if so, set arr_dtype to object?

Got a call coming up but I can submit a pr later

adrinjalali commented on September 28, 2024

Not everything is a BaseEstimator. A third party estimator might not be inheriting from BaseEstimator and that breaks this then.
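
The concern can be illustrated with a rough sketch. `InheritingEncoder`, `DuckTypedEncoder` and `has_base_estimator` are hypothetical names made up for this example, and the fallback `BaseEstimator` class only exists so the snippet still runs when scikit-learn is not installed. An `isinstance` check catches the first class but misses the second, even though both expose a `dtype` attribute.

```python
import numpy as np

try:
    from sklearn.base import BaseEstimator
except ImportError:
    class BaseEstimator:
        """Fallback stand-in so the sketch runs without scikit-learn."""


class InheritingEncoder(BaseEstimator):
    """Hypothetical estimator that inherits from BaseEstimator."""

    def __init__(self, dtype=np.float64):
        self.dtype = dtype


class DuckTypedEncoder:
    """Hypothetical third-party estimator without the BaseEstimator base."""

    def __init__(self, dtype=np.float64):
        self.dtype = dtype

    def fit(self, X, y=None):
        return self


def has_base_estimator(param_list):
    """Return True if any candidate value is a BaseEstimator instance."""
    return any(isinstance(value, BaseEstimator) for value in param_list)


caught = has_base_estimator([InheritingEncoder(), InheritingEncoder()])
missed = has_base_estimator([DuckTypedEncoder(), DuckTypedEncoder()])
print(caught, missed)  # True False
```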

We could maybe check whether anything is not a scalar or a simple object? Not sure.
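
One way such a check could look, purely as a sketch: `choose_arr_dtype` and `FakeEncoder` are hypothetical names, and the set of types treated as "simple" is an assumption, not what scikit-learn does. NumPy infers the dtype only when every candidate is a plain scalar, and everything else falls back to object.

```python
import numbers

import numpy as np


class FakeEncoder:
    """Hypothetical stand-in for an estimator exposing a `dtype` attribute."""

    def __init__(self, dtype=np.float64):
        self.dtype = dtype


def choose_arr_dtype(param_list):
    """Return an array dtype for the candidates, falling back to object."""
    scalar_types = (numbers.Number, str, bytes, np.generic)
    if all(isinstance(value, scalar_types) for value in param_list):
        # Only plain scalars: let NumPy infer the dtype from the values.
        return np.array(param_list).dtype
    # Anything else (estimators, callables, None, ...) is stored as object.
    return np.dtype(object)


float_dtype = choose_arr_dtype([0.1, 1.0, 10.0])
encoder_dtype = choose_arr_dtype([FakeEncoder(), FakeEncoder()])
print(float_dtype, encoder_dtype)  # float64 object
```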

MarcoGorelli commented on September 28, 2024

Ah thanks

A third-party estimator should still implement fit and predict/transform though? Maybe just check for those attributes?
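
A duck-typing check along these lines might look like the sketch below. `looks_like_estimator` and `DuckTypedEncoder` are hypothetical names used only for illustration. The check requires a callable `fit` plus a callable `predict` or `transform`.

```python
class DuckTypedEncoder:
    """Hypothetical third-party transformer with no scikit-learn base class."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X


def looks_like_estimator(value):
    """Return True if `value` has a callable fit and predict/transform."""
    has_fit = callable(getattr(value, "fit", None))
    has_output = any(
        callable(getattr(value, name, None)) for name in ("predict", "transform")
    )
    return has_fit and has_output


print(looks_like_estimator(DuckTypedEncoder()))  # True
print(looks_like_estimator(0.5))  # False
```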


As an aside, I expect that the dtype property might create other problems going forward? It looks like it's not documented (https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html#onehotencoder), so that may make the case for renaming it?

adrinjalali commented on September 28, 2024

dtype is documented. As a constructor argument, which becomes an attribute with the same name. So we can't easily rename it.

Checking for fit and predict (or any other Protocol) would also not be okay. I think we might end up in odd situations where some odd attribute / constructor argument is a random object.
