Comments (13)
thanks for the ping - this seems to be the issue:
(Pdb) p param_list
[OneHotEncoder(sparse_output=False), OrdinalEncoder()]
(Pdb) p np.result_type(*param_list)
dtype('float64')
(Pdb) p np.array(param_list).dtype
dtype('O')
I find it a bit surprising that np.result_type gives 'float64' here.
from scikit-learn.
wait wut
In [6]: OrdinalEncoder().dtype
Out[6]: numpy.float64
from scikit-learn.
Could you please provide a minimal reproducer?
- remove the extra bits from the code which do not contribute to the error
- use a dataset from sklearn.datasets
- the code should run without requiring extra datasets by simply copy pasting the code.
from scikit-learn.
It's good to have a fix in scikit-learn, but I think the numpy behaviour is unexpected so I opened numpy/numpy#26612.
from scikit-learn.
Could you please provide a minimal reproducer?
- remove the extra bits from the code which do not contribute to the error
- use a dataset from sklearn.datasets
- the code should run without requiring extra datasets by simply copy pasting the code.
Thanks for your comment. I modified the issue's description accordingly.
from scikit-learn.
This seems to be another one related to dtypes of the result in grid search. @lesteve @MarcoGorelli WDYT?
from scikit-learn.
I can confirm this still happens in main. I have modified the snippet to not use force_int_remainder_cols (a new ColumnTransformer parameter in 1.5), and the snippet runs on 1.4, so this does seem to be a regression.
It is possible that the cause is the dtype tweak in grid-search .cv_results_ (#28352). I did the previous bug fix, so I am happy to let @MarcoGorelli take this one 😉.
from scikit-learn.
Oh dear, OrdinalEncoder has a dtype parameter and hence a .dtype attribute. np.result_type probably relies on the .dtype attribute? Edit: same thing for OneHotEncoder.
from scikit-learn.
In a sense, it does make sense that result_type is float64, since result_type implies the result of an operation on those values. But we just want to create an array here, so maybe we should get the dtype of a created array instead?
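A minimal sketch of the difference between the two approaches, assuming NumPy is installed. FakeEncoder is a hypothetical stand-in for an estimator that exposes a dtype attribute; it is not a scikit-learn class. What np.result_type does with such an object may differ between NumPy versions, so the sketch only prints that result and does not rely on it.

```python
import warnings

import numpy as np


class FakeEncoder:
    """Hypothetical stand-in for an estimator with a ``dtype`` parameter."""

    def __init__(self, dtype=np.float64):
        self.dtype = dtype


param_list = [FakeEncoder(), FakeEncoder()]

# Building the array inspects the objects themselves, not their attributes,
# so arbitrary Python objects end up in an object array.
array_dtype = np.array(param_list).dtype
print("np.array(...).dtype ->", array_dtype)

# np.result_type may instead consult the ``.dtype`` attribute. The outcome
# is version dependent, so warnings and errors are tolerated here.
try:
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        promoted = np.result_type(*param_list)
except Exception as exc:  # some versions may reject such objects
    promoted = None
    print("np.result_type raised:", type(exc).__name__)
else:
    print("np.result_type(...) ->", promoted)
```

If the two printed dtypes disagree, that is the mismatch reported at the top of this thread.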
from scikit-learn.
I think that creates other issues (see #28352 (comment)), which @thomasjpfan wanted to avoid.
It might be simplest to just check whether any object in param_list is an instance of BaseEstimator and, if so, set arr_dtype to object?
Got a call coming up, but I can submit a PR later.
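A rough sketch of that check, under stated assumptions: BaseEstimator and FakeEncoder below are hypothetical stand-ins so the snippet runs without scikit-learn, and choose_arr_dtype is an illustrative helper name, not a function in the scikit-learn code base. The actual grid-search code may be structured differently.

```python
import numpy as np


class BaseEstimator:
    """Hypothetical stand-in for sklearn.base.BaseEstimator."""


class FakeEncoder(BaseEstimator):
    """Hypothetical estimator exposing a ``dtype`` attribute."""

    def __init__(self, dtype=np.float64):
        self.dtype = dtype


def choose_arr_dtype(param_list):
    """Return the dtype to use for an array holding ``param_list``."""
    # Estimators must be stored as plain Python objects, whatever their
    # ``dtype`` attribute says.
    if any(isinstance(p, BaseEstimator) for p in param_list):
        return np.dtype(object)
    # Otherwise fall back to NumPy's promotion rules.
    return np.result_type(*param_list)


print(choose_arr_dtype([FakeEncoder(), FakeEncoder()]))  # object
print(choose_arr_dtype([1, 2.5]))  # float64
```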
from scikit-learn.
Not everything is a BaseEstimator. A third-party estimator might not inherit from BaseEstimator, and that would break this check.
We could check if anything is not a scalar or a simple object maybe? Not sure.
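To illustrate both points, here is a small sketch. Every name in it is hypothetical, and which types count as "simple" (SIMPLE_TYPES) is an assumption made only for this example.

```python
import numbers


class BaseEstimator:
    """Hypothetical stand-in for sklearn.base.BaseEstimator."""


class ThirdPartyEncoder:
    """Hypothetical estimator that does NOT inherit from BaseEstimator."""

    def __init__(self, dtype=float):
        self.dtype = dtype

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X


# Assumption: numbers, strings, bytes and None count as "simple" values.
SIMPLE_TYPES = (numbers.Number, str, bytes, type(None))


def needs_object_dtype(param_list):
    """True if any candidate value is something other than a simple value."""
    return any(not isinstance(p, SIMPLE_TYPES) for p in param_list)


# The isinstance check misses the third-party estimator ...
missed = not isinstance(ThirdPartyEncoder(), BaseEstimator)
print(missed)  # True

# ... while the "not a simple value" check still catches it.
print(needs_object_dtype([ThirdPartyEncoder(), ThirdPartyEncoder()]))  # True
print(needs_object_dtype([0.1, 1, 10]))  # False
```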
from scikit-learn.
Ah thanks
A third-party estimator should still implement fit and predict/transform though? Maybe just check for those attributes?
As an aside, I expect that the dtype property might create other problems going forward? It looks like it's not documented (https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html#onehotencoder), so that may make the case for renaming it?
from scikit-learn.
dtype is documented, as a constructor argument, which becomes an attribute with the same name. So we can't easily rename it.
Checking for fit and predict (or any other Protocol) would also not be okay. I think we might end up in odd situations where some odd attribute / constructor argument is a random object.
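One reading of that concern, as a sketch: a parameter value can be an arbitrary object that carries a dtype attribute without being an estimator at all, and a Protocol-based check would let it through. SupportsFitPredict, FakeClassifier and FakeKernel are hypothetical names used only for illustration.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class SupportsFitPredict(Protocol):
    """Hypothetical protocol: anything with ``fit`` and ``predict``."""

    def fit(self, X, y=None): ...

    def predict(self, X): ...


class FakeClassifier:
    """Hypothetical duck-typed estimator."""

    def fit(self, X, y=None):
        return self

    def predict(self, X):
        return [0 for _ in X]


class FakeKernel:
    """Hypothetical non-estimator parameter value that exposes ``dtype``."""

    def __init__(self, dtype=float):
        self.dtype = dtype


candidates = [FakeClassifier(), FakeKernel()]

# The protocol check flags the classifier but not the kernel, even though
# the kernel's ``dtype`` attribute could mislead dtype inference as well.
flags = [isinstance(c, SupportsFitPredict) for c in candidates]
print(flags)  # [True, False]
```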
from scikit-learn.