Comments (10)
Hi @davechallis ,
Thanks for the reproducer. The problem you found is caused by incorrect handling of missing values in SciPy CSR matrices.
As a quick fix, you can add the following to your code:
X_np = np.empty(X.shape)
X_np.fill(np.nan)
X_np[X.nonzero()] = X[X.nonzero()]
y_d4p = model_d4p.predict_proba(X_np)
Let me know if you find any other problems!
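For anyone who wants the quick fix above as a reusable function, here is a minimal sketch. The helper name `csr_to_dense_nan` is hypothetical and not part of daal4py or SciPy. It mirrors the snippet above: every entry that `X.nonzero()` would not report becomes NaN, so the model can treat it as missing rather than as a literal zero.

```python
import numpy as np
from scipy import sparse


def csr_to_dense_nan(X):
    """Return a dense copy of sparse X with non-reported entries set to NaN.

    Hypothetical helper; mirrors the quick fix from this thread.
    """
    coo = sparse.csr_matrix(X).tocoo()
    # Use a floating dtype so that NaN can be represented.
    dtype = np.result_type(coo.dtype, np.float32)
    dense = np.full(coo.shape, np.nan, dtype=dtype)
    # Same selection as X.nonzero(): stored entries whose value is not 0.
    mask = coo.data != 0
    dense[coo.row[mask], coo.col[mask]] = coo.data[mask]
    return dense


X_demo = sparse.csr_matrix(np.array([[0.0, 1.5], [2.0, 0.0]], dtype=np.float32))
X_demo_dense = csr_to_dense_nan(X_demo)
```

With the names from the thread, the call would then presumably look like `model_d4p.predict_proba(csr_to_dense_nan(X))`.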
from scikit-learn-intelex.
@razdoburdin Fantastic, that works perfectly! Thanks for your help, I've tested with my full example, and all works great.
from scikit-learn-intelex.
@davechallis ,
We don't have native support for sparse matrices in GBT inference yet.
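Until sparse input is supported, one possible way to keep memory under control is to densify and predict a few rows at a time. The sketch below is only an illustration under assumptions: `predict_in_batches` and `dense_with_nan` are hypothetical helpers, `predict_fn` stands for any callable that accepts a dense array (for example a model's `predict_proba`), and the batch size is an arbitrary default. It does not avoid densification; it only bounds the peak dense size to `batch_size * n_features` values.

```python
import numpy as np
from scipy import sparse


def dense_with_nan(X_batch):
    """Densify a sparse batch, marking non-reported entries as NaN (hypothetical helper)."""
    coo = X_batch.tocoo()
    out = np.full(coo.shape, np.nan, dtype=np.result_type(coo.dtype, np.float32))
    mask = coo.data != 0  # same selection as X.nonzero()
    out[coo.row[mask], coo.col[mask]] = coo.data[mask]
    return out


def predict_in_batches(predict_fn, X, batch_size=1024):
    """Apply predict_fn to row slices of sparse X and stack the results.

    Assumes X has at least one row and predict_fn returns one result row
    per input row.
    """
    X = sparse.csr_matrix(X)
    parts = []
    for start in range(0, X.shape[0], batch_size):
        batch = dense_with_nan(X[start:start + batch_size])
        parts.append(np.asarray(predict_fn(batch)))
    return np.concatenate(parts, axis=0)


# Stand-in for a real model: sums the non-missing values in each row.
def fake_predict(a):
    return np.nansum(a, axis=1, keepdims=True)


X_demo = sparse.random(10, 6, density=0.3, format="csr", random_state=0)
batched = predict_in_batches(fake_predict, X_demo, batch_size=3)
full = fake_predict(dense_with_nan(X_demo))
```

With the names from this thread, the call might look like `predict_in_batches(model_d4p.predict_proba, X, batch_size=1024)`; a smaller batch size trades speed for lower peak memory.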
from scikit-learn-intelex.
Hi,
thanks for using daal4py!
I didn't understand from your example why you used predict for xgboost and predict_proba for d4p. These two methods are not equivalent to each other.
from scikit-learn-intelex.
@razdoburdin Hi, thanks for taking a look! I'm using the xgboost Python API predict and not the scikit-learn wrapper, so the xgboost predict function returns probabilities rather than labels (unless I'm misunderstanding the API).
E.g. in the example above, if I print out the first 5 results in each y:
y_xgb = model_xgb.predict(xgb.DMatrix(X))
y_d4p = model_d4p.predict_proba(X)[:, 0]
print("xgb", y_xgb[:5])
print("d4p", y_d4p[:5])
this outputs:
xgb [0.00546392 0.00123668 0.00123668 0.00136322 0.00360578]
d4p [0.3883016 0.38513577 0.38620013 0.38620013 0.32219833]
from scikit-learn-intelex.
@razdoburdin Just to make the comparison clearer, I also did a quick check using the sklearn API and outputting the full results of predict_proba in both cases:
import pickle
import xgboost as xgb
import daal4py as d4p
model_xgb_sklearn = xgb.XGBClassifier()
model_xgb_sklearn.load_model("model.bin")
# Booster.load_model() returns None, so load the file via the constructor instead
model_d4p = d4p.convert_model(xgb.Booster(model_file="model.bin"))
with open("data.pkl", "rb") as fh:
    X = pickle.load(fh)
y_xgb = model_xgb_sklearn.predict_proba(X)
y_d4p = model_d4p.predict_proba(X)
print("xgb")
print(y_xgb[:5])
print()
print("d4p")
print(y_d4p[:5])
which outputs:
xgb
[[0.9945361 0.00546392]
[0.9987633 0.00123668]
[0.9987633 0.00123668]
[0.9986368 0.00136322]
[0.9963942 0.00360578]]
d4p
[[0.3883016 0.6116984 ]
[0.38513577 0.61486423]
[0.38620013 0.61379987]
[0.38620013 0.61379987]
[0.32219833 0.67780167]]
from scikit-learn-intelex.
Hi @davechallis ,
I wasn't able to reproduce the problem.
Could you please provide some launchable reproducer?
from scikit-learn-intelex.
@razdoburdin No problem, give me a while to sample some data, and I'll try to get something that works end to end uploaded here.
from scikit-learn-intelex.
@razdoburdin I've attached a zip file containing a Python script (similar to the one posted above) named demo.py, some data to classify in data.pkl, and a trained XGBoost classifier in model.bin.
If I run the script locally, I get:
X <class 'scipy.sparse._csr.csr_matrix'> float32 (50, 5300)
xgb
[[0.9945361 0.00546392]
[0.9987633 0.00123668]
[0.9987633 0.00123668]
[0.9986368 0.00136322]
[0.9963942 0.00360578]]
d4p
[[0.3883016 0.6116984 ]
[0.38513577 0.61486423]
[0.38620013 0.61379987]
[0.38620013 0.61379987]
[0.32219833 0.67780167]]
This is in a fresh python 3.12 environment, with the following packages installed:
- xgboost==2.0.3
- scipy==1.12.0
- daal4py==2024.3.0
Hopefully this helps a bit, but let me know if there's anything else I can provide to help.
from scikit-learn-intelex.
@razdoburdin One last thing I thought I'd ask/check - is there any other approach for this that doesn't involve converting to a dense matrix first? I've tested on a few hundred classifiers I've got, and some of them are extremely sparse, so I end up hitting memory issues when converting to a dense matrix.
Or if there's any documentation/source I can read to find out more, that'd also be great.
If not, then no problem, I can maybe check matrix density then swap between daal4py and native xgboost models depending on that.
from scikit-learn-intelex.