Comments (13)
I'm one of the authors of the HistGradientBoosting estimators. Feel free to ping me if you have any questions about them!
from hummingbird.
Glad to see you here, Nicolas :)
Yup, I think you got it right.
The array of nodes is initialized with all fields set to 0. If a node doesn't have a left child, it doesn't have a right child either, so its left/right fields stay 0 and its is_leaf field is True/1.
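A minimal sketch of that layout, assuming a simplified stand-in dtype (the real node dtype has more fields, and the field names used here are taken from this thread rather than checked against the scikit-learn source):

```python
import numpy as np

# Simplified stand-in for the predictor node dtype (assumption: the real
# dtype has more fields than the four shown here).
NODE_DTYPE = np.dtype([
    ("threshold", np.float64),
    ("left", np.uint32),
    ("right", np.uint32),
    ("is_leaf", np.uint8),
])

# Every field of every node starts at 0.
nodes = np.zeros(3, dtype=NODE_DTYPE)

# The root (index 0) splits into nodes 1 and 2.
nodes["threshold"][0] = 0.5
nodes["left"][0] = 1
nodes["right"][0] = 2

# Nodes 1 and 2 are leaves: only is_leaf is set, left/right stay 0.
nodes["is_leaf"][1:] = 1

leaf_mask = nodes["is_leaf"].astype(bool)
no_left_child = nodes["left"] == 0
```

In this toy array, the nodes flagged as leaves are exactly the ones whose left field is still 0.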
Why not convert the array "upstream" so that you can rely on the existing code for the non-hist estimators?
lefts = [tree_info.nodes[x]['left'] for x in range(len(tree_info.nodes))]
lefts = [idx if idx != 0 else -1 for idx in lefts]
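The same conversion can be written in vectorized form. This is only a sketch: zero_to_minus_one is a hypothetical helper name, and the cast to a signed integer type assumes the child-index fields may be stored as unsigned integers. Replacing 0 is unambiguous because the root sits at index 0 and is never anyone's child.

```python
import numpy as np

def zero_to_minus_one(child_ids):
    """Hypothetical helper: map the 'no child' marker 0 to -1."""
    # Cast to a signed type first so that -1 is representable.
    child_ids = np.asarray(child_ids, dtype=np.int64)
    return np.where(child_ids == 0, -1, child_ids)

# Example values taken from the hist-GBDT lefts quoted in this thread.
lefts = zero_to_minus_one([1, 2, 0, 0, 5, 0, 0])
```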
Hi Matteo,
Can you give more context on how I can approach the implementation of this feature?
Hi Ahmed,
Thanks for showing interest in Hummingbird! @ksaur has created a branch with some tests. If you pull the branch and run this test file, you should get something like the following:
hummingbird.ml.exceptions.MissingConverter: Unable to find converter for model type <class 'sklearn.ensemble._hist_gradient_boosting.gradient_boosting.HistGradientBoostingClassifier'>. It usually means the pipeline being converted contains a transformer or a predictor with no corresponding converter implemented. Please fill an issue at https://github.com/microsoft/hummingbird.
Next, to add a new operator in Hummingbird:
- Go into hummingbird.ml.supported and add the HistGradientBoostingClassifier class to the _build_sklearn_operator_list (and to the documentation at the beginning of the file, please!). This tells Hummingbird to recognize the new operator.
- Now we need to actually add the operator converter. You can add the converter to hummingbird.ml.operator_converters.gbdt.py (just copy and paste the last line in the file, and replace "SklearnGradientBoostingClassifier" with "SklearnHistGradientBoostingClassifier"). This tells Hummingbird that, to convert HistGradientBoostingClassifier, it can use the same function as GradientBoostingClassifier. This will probably not work :)
- The final step is to make the converter work. This will require some work on your side. Basically, you can copy what we have for the GradientBoostingClassifier in convert_sklearn_gbdt_classifier into a new convert_sklearn_hist_gbdt_classifier function and try to map the tree parameters into the format understood by convert_sklearn_gbdt_classifier. You don't need to do anything more than this. convert_sklearn_gbdt_classifier will already pick the PyTorch tree implementation for you, so there is no need to go deeper or implement anything in PyTorch.
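The mapping step in the last bullet could look roughly like the sketch below. It is not Hummingbird code: nodes_to_parallel_arrays is a hypothetical helper, the dtype is a simplified stand-in, and the field names (value, feature_idx, threshold, left, right, is_leaf) are assumptions about the hist-GBDT node layout. The idea is only to turn one array of per-node records into one array per property.

```python
import numpy as np

# Simplified stand-in for the hist-GBDT node dtype (field names are assumptions).
NODE_DTYPE = np.dtype([
    ("value", np.float64),
    ("feature_idx", np.uint32),
    ("threshold", np.float64),
    ("left", np.uint32),
    ("right", np.uint32),
    ("is_leaf", np.uint8),
])

def nodes_to_parallel_arrays(nodes):
    """Hypothetical helper: split a structured node array into per-property arrays."""
    is_leaf = nodes["is_leaf"].astype(bool)
    # Cast child indices to a signed type so that leaves can be marked with -1.
    left = nodes["left"].astype(np.int64)
    right = nodes["right"].astype(np.int64)
    return {
        "children_left": np.where(is_leaf, -1, left),
        "children_right": np.where(is_leaf, -1, right),
        "feature": nodes["feature_idx"].astype(np.int64),
        "threshold": nodes["threshold"].copy(),
        "value": nodes["value"].copy(),
    }

# Toy tree: a root that splits on feature 4 at 0.5, with two leaves.
nodes = np.zeros(3, dtype=NODE_DTYPE)
nodes["left"][0], nodes["right"][0] = 1, 2
nodes["feature_idx"][0], nodes["threshold"][0] = 4, 0.5
nodes["is_leaf"][1:] = 1
nodes["value"][1:] = [-0.2, 0.3]

arrays = nodes_to_parallel_arrays(nodes)
```

Whether the existing converter expects exactly these array names and leaf conventions would need to be checked against convert_sklearn_gbdt_classifier itself.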
Please share any doubts or questions you may have!
Thanks @interesaaat for the detailed introduction and @NicolasHug for offering help!
I've installed the dependencies and built the library using python setup.py install.
Everything is working fine. I ran the test_sklearn_histgbdt_converters.py test that you mentioned, and it indeed produces the error message you described.
Anyway, I started implementing the convert_sklearn_hist_gbdt_classifier() function, but there is a problem that I would like your thoughts on:
First, I believe that the equivalent of tree_infos = operator.raw_operator.estimators_ from the convert_sklearn_gbdt_classifier() function would be tree_infos = operator.raw_operator._predictors in the new convert_sklearn_hist_gbdt_classifier().
The problem is that estimators_ is an array of DecisionTreeRegressor objects, each of which has a tree_ property. This tree_ object has the following properties: children_left, children_right, feature, threshold and value.
On the other hand, _predictors is an array of TreePredictor objects, each of which has only one property, nodes (an array of nodes).
So my question is, how can we map the tree_infos in HistGradientBoostingClassifier to GradientBoostingClassifier?
So my question is, how can we map the tree_infos in HistGradientBoostingClassifier to GradientBoostingClassifier?
A tree object has many arrays of size n_nodes, i.e. it has one array per property, as you noticed (children_left, children_right, etc.).
However, the predictor object of the hist-GBDT is different: it's a single structured numpy array, i.e. it's an array whose elements have a specific dtype with multiple entries. It's basically an array of structs, if we were in C.
The dtype is specified here: https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/ensemble/_hist_gradient_boosting/common.pyx#L18
For example, the threshold property of the root can be accessed via nodes[0]['threshold']. Its left child is in nodes[nodes[0]['left']], etc.
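To illustrate that access pattern, here is a small sketch that walks from the root to a leaf. The dtype is again a simplified stand-in with assumed field names, predict_one is a hypothetical function, and sending samples left when the feature value is less than or equal to the threshold is an assumption made for the example (missing-value handling is ignored).

```python
import numpy as np

# Simplified stand-in for the node dtype (field names are assumptions).
NODE_DTYPE = np.dtype([
    ("value", np.float64),
    ("feature_idx", np.uint32),
    ("threshold", np.float64),
    ("left", np.uint32),
    ("right", np.uint32),
    ("is_leaf", np.uint8),
])

def predict_one(nodes, x):
    """Hypothetical traversal: follow left/right indices until a leaf is reached."""
    node = nodes[0]  # the root is the first record
    while not node["is_leaf"]:
        # Assumed convention: values <= threshold go to the left child.
        if x[node["feature_idx"]] <= node["threshold"]:
            node = nodes[node["left"]]
        else:
            node = nodes[node["right"]]
    return float(node["value"])

# Toy tree: the root splits on feature 0 at 0.5, with two leaves.
nodes = np.zeros(3, dtype=NODE_DTYPE)
nodes["left"][0], nodes["right"][0] = 1, 2
nodes["threshold"][0] = 0.5
nodes["is_leaf"][1:] = 1
nodes["value"][1:] = [-1.0, 1.0]

left_prediction = predict_one(nodes, [0.2])
right_prediction = predict_one(nodes, [0.9])
```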
Thanks @NicolasHug for the clarification!
IIUC, the equivalent of:
tree_info = operator.raw_operator.estimators_[0][0]
lefts = tree_info.tree_.children_left
should be:
tree_info = operator.raw_operator._predictors[0][0]
lefts = [tree_info.nodes[x]['left'] for x in range(len(tree_info.nodes))]
when using hist-GBDT.
If that is the case, it seems that nodes without a left child are represented with 0 instead of -1:
lefts for GBDT: [1, 2, -1, -1, 5, -1, -1]
lefts for hist-GBDT: [1, 2, 0, 0, 5, 0, 0]
I think that for left, right and threshold we should have -1 instead of 0 in Hummingbird because the implementation looks for -1 values. (I might be wrong, but I am on my phone and having a hard time checking the code.) Anyway, this is not hard :) Thanks Nicolas for the help!
OK great,
But wouldn't that require changing all the conditions with -1 to 0
in
and
@interesaaat would it be a good idea to compare against both -1 and 0, instead of -1 only, in _find_max_depth() and leave the -1 base case of the recursion as is in _find_depth(), or is there a better approach that you suggest?
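To make the role of the -1 base case concrete, here is a hypothetical recursion in the same spirit. It is not Hummingbird's _find_depth or _find_max_depth; count_levels is an illustrative name, and the rights array is made up to match the lefts array quoted earlier in the thread.

```python
def count_levels(node_id, lefts, rights):
    """Hypothetical recursion: number of levels in the subtree rooted at node_id."""
    if node_id == -1:  # base case: there is no such child
        return 0
    return 1 + max(
        count_levels(lefts[node_id], lefts, rights),
        count_levels(rights[node_id], lefts, rights),
    )

# lefts uses -1 for "no child"; rights is invented for this example.
lefts = [1, 2, -1, -1, 5, -1, -1]
rights = [4, 3, -1, -1, 6, -1, -1]
levels = count_levels(0, lefts, rights)
```

If 0 were left in place as the "no child" marker, a recursion like this one would step from a leaf back to the root at index 0 and never terminate, so either the marker or the base case has to change.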
Why not convert the array "upstream" so that you can rely on the existing code for the non-hist estimators?
lefts = [tree_info.nodes[x]['left'] for x in range(len(tree_info.nodes))]
lefts = [idx if idx != 0 else -1 for idx in lefts]
Yeah, that's better!