Comments (13)

NicolasHug commented on May 22, 2024 1

I'm one of the authors of the HistGradientBoosting estimators, feel free to ping me if you have any questions related to them!

from hummingbird.

interesaaat commented on May 22, 2024 1

Glad to see you here, Nicolas :)

NicolasHug commented on May 22, 2024 1

Yup I think you got it right

The array of nodes is initialized with all fields being 0. If a node doesn't have a left child, it doesn't have a right child either, so its left/right fields stay 0 and its is_leaf field is True/1.
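
To make the zero-initialization concrete, here is a minimal sketch. The dtype below is a simplified, hypothetical stand-in for the real node record (which has more fields); only left, right, threshold and is_leaf are taken from this thread.

```python
import numpy as np

# Simplified, hypothetical node dtype; the real scikit-learn record has more fields.
NODE_DTYPE = np.dtype([
    ("threshold", np.float64),
    ("left", np.uint32),
    ("right", np.uint32),
    ("is_leaf", np.uint8),
])

# Every field starts at 0, as described above.
nodes = np.zeros(3, dtype=NODE_DTYPE)

# Fill in a 3-node tree: a root (index 0) with two leaf children.
nodes["threshold"][0] = 0.5
nodes["left"][0] = 1
nodes["right"][0] = 2
nodes["is_leaf"][1:] = 1

# Leaves never had their child fields written, so both remain 0.
leaf_mask = nodes["is_leaf"].astype(bool)
print(nodes["left"][leaf_mask].tolist())   # [0, 0]
print(nodes["right"][leaf_mask].tolist())  # [0, 0]
```

Because the root always sits at index 0 and can never be anyone's child, a 0 in left/right is unambiguous as a "no child" marker.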

NicolasHug commented on May 22, 2024 1

Why not convert the array "upstream" so that you can rely on the existing code for the non-hist estimators?

lefts = [tree_info.nodes[x]['left'] for x in range(len(tree_info.nodes))]
lefts = [idx if idx != 0 else -1 for idx in lefts]
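
The same idea can probably be written in vectorized form with NumPy. This is only a sketch: zero_to_minus_one is a hypothetical helper name, and the input values are the example lefts quoted elsewhere in this thread. The cast to a signed integer type matters if the child indices are stored as unsigned integers, since those cannot hold -1.

```python
import numpy as np

def zero_to_minus_one(children):
    """Return a signed copy of `children` with the 0 'no child' marker replaced by -1."""
    # Cast to a signed type first so that -1 is representable.
    children = np.asarray(children, dtype=np.int64)
    return np.where(children == 0, -1, children)

hist_lefts = [1, 2, 0, 0, 5, 0, 0]
converted = zero_to_minus_one(hist_lefts)
print(converted.tolist())  # [1, 2, -1, -1, 5, -1, -1]
```

The same helper would apply unchanged to the right-child indices.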

ahmedkrmn commented on May 22, 2024

Hi Matteo,
Can you give more context on how I can approach the implementation of this feature?

interesaaat commented on May 22, 2024

Hi Ahmed,
thanks for showing interest in Hummingbird! @ksaur has created a branch with some tests. If you pull the branch and try to run this test file, you should get something like the following:

hummingbird.ml.exceptions.MissingConverter: Unable to find converter for model type <class 'sklearn.ensemble._hist_gradient_boosting.gradient_boosting.HistGradientBoostingClassifier'>. It usually means the pipeline being converted contains a transformer or a predictor with no corresponding converter implemented. Please fill an issue at https://github.com/microsoft/hummingbird.

Next, to add a new operator in Hummingbird:

  • Go into hummingbird.ml.supported and add the HistGradientBoostingClassifier class to the _build_sklearn_operator_list (and to the documentation at the beginning of the file, please!). This tells Hummingbird to recognize the new operator.
  • Now we need to actually add the operator converter. You can add the converter to hummingbird.ml.operator_converters.gbdt.py (just copy and paste the last line in the file, and replace "SklearnGradientBoostingClassifier" with "SklearnHistGradientBoostingClassifier"). This tells Hummingbird that it can convert HistGradientBoostingClassifier with the same function it uses for GradientBoostingClassifier. This will probably not work :)
  • The final step is to make the converter work. This will require some work on your side. Basically, you can copy what we have for the GradientBoostingClassifier in convert_sklearn_gbdt_classifier into a new convert_sklearn_hist_gbdt_classifier function and try to map the tree parameters into the format understood by convert_sklearn_gbdt_classifier. You don't need to do anything more than this: convert_sklearn_gbdt_classifier will already pick the PyTorch tree implementation for you, so there is no need to go deeper or implement anything in PyTorch.

Please share any doubts or questions you may have!

ahmedkrmn commented on May 22, 2024

Thanks @interesaaat for the detailed introduction and @NicolasHug for offering help!

I've installed the dependencies and built the library using python setup.py install.
Everything is working fine. I ran the test_sklearn_histgbdt_converters.py file that you mentioned, and it does produce the error message you described.

Anyways, I started implementing the convert_sklearn_hist_gbdt_classifier() function, but there is a problem that I would like to know your thoughts on:

First, I believe that the equivalent of tree_infos = operator.raw_operator.estimators_ from the convert_sklearn_gbdt_classifier() function would be tree_infos = operator.raw_operator._predictors in the new convert_sklearn_hist_gbdt_classifier().

The problem is that estimators_ is an array of DecisionTreeRegressor objects which have a tree_ property. This tree_ object has the following list of properties:
children_left, children_right, feature, threshold and value.

On the other hand, _predictors is an array of TreePredictor objects, each of which has only one property, nodes (an array of nodes).

So my question is, how can we map the tree_infos in HistGradientBoostingClassifier to GradientBoostingClassifier?

NicolasHug commented on May 22, 2024

So my question is, how can we map the tree_infos in HistGradientBoostingClassifier to GradientBoostingClassifier?

A tree object has many arrays of size n_nodes, i.e. it has one array per property, as you noticed (children_left, children_right, etc.).

However the predictor object of the hist-GBDT is different: it's a single structured numpy array, i.e. it's an array whose elements have a specific dtype with multiple entries. It's basically an array of structs, if we were in C.
The dtype is specified here: https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/ensemble/_hist_gradient_boosting/common.pyx#L18

For example the threshold property of the root can be accessed via nodes[0]['threshold']. Its left child is in nodes[nodes[0]['left']], etc.
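
As an illustration of this access pattern, here is a small self-contained sketch that walks such a structured array for one sample. The dtype is a simplified assumption (the feature_idx and value field names, and the "go left when x <= threshold" rule, are assumptions of this sketch rather than something stated in this thread), and predict_one is a hypothetical helper.

```python
import numpy as np

# Simplified, assumed node layout; the real dtype is defined in the file linked above.
NODE_DTYPE = np.dtype([
    ("value", np.float64),
    ("feature_idx", np.uint32),
    ("threshold", np.float64),
    ("left", np.uint32),
    ("right", np.uint32),
    ("is_leaf", np.uint8),
])

def predict_one(nodes, x):
    """Walk the tree from the root (index 0) down to a leaf and return its value."""
    node = nodes[0]
    while not node["is_leaf"]:
        # Assumed split rule: go left when the feature is <= the threshold.
        if x[node["feature_idx"]] <= node["threshold"]:
            node = nodes[node["left"]]
        else:
            node = nodes[node["right"]]
    return float(node["value"])

# A root splitting on feature 0 at 0.5, with two leaves.
nodes = np.zeros(3, dtype=NODE_DTYPE)
nodes["threshold"][0] = 0.5
nodes["left"][0] = 1
nodes["right"][0] = 2
nodes["is_leaf"][1:] = 1
nodes["value"][1] = -1.0
nodes["value"][2] = 1.0

print(predict_one(nodes, np.array([0.2])))  # -1.0
print(predict_one(nodes, np.array([0.9])))  # 1.0
```

Note that a whole column can also be read at once, e.g. nodes["left"], which gives one array per property much like the non-hist tree object.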

ahmedkrmn commented on May 22, 2024

Thanks @NicolasHug for the clarification!

IIUC, the equivalent of:

tree_info = operator.raw_operator.estimators_[0][0]
lefts = tree_info.tree_.children_left

should be:

tree_info = operator.raw_operator._predictors[0][0]
lefts = [tree_info.nodes[x]['left'] for x in range(len(tree_info.nodes))]

when using hist-GBDT.


If that is the case, it seems that nodes without a left child are represented with 0 instead of -1:
lefts for GBDT: [1, 2, -1, -1, 5, -1, -1]
lefts for hist-GBDT: [1, 2, 0, 0, 5, 0, 0]

interesaaat commented on May 22, 2024

I think that for left, right and threshold we should have -1 instead of 0 in Hummingbird, because the implementation looks for -1 values. (I might be wrong, but I am on my phone and having a hard time checking the code.) Anyway, this is not hard :) Thanks Nicolas for the help!

ahmedkrmn commented on May 22, 2024

OK, great.
But wouldn't that require changing all the conditions that check for -1 to check for 0 in

def _find_max_depth(tree_parameters):

and
def _find_depth(node, current_depth):

ahmedkrmn commented on May 22, 2024

@interesaaat would it be a good idea to compare against both -1 and 0 instead of -1 only in _find_max_depth() and leave the base case of recursion -1 as is for _find_depth(), or is there a better approach that you suggest?
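
For context, here is a rough sketch of why the sentinel matters for a recursive depth computation. This is not Hummingbird's actual _find_depth; max_depth is a hypothetical stand-in, and the rights array is made up to match the example lefts from this thread.

```python
def max_depth(lefts, rights, node=0):
    """Number of levels in the subtree rooted at `node`; -1 marks a missing child."""
    if node == -1:
        return 0
    return 1 + max(
        max_depth(lefts, rights, lefts[node]),
        max_depth(lefts, rights, rights[node]),
    )

# Example tree: 0 -> (1, 4), 1 -> (2, 3), 4 -> (5, 6); nodes 2, 3, 5 and 6 are leaves.
lefts = [1, 2, -1, -1, 5, -1, -1]
rights = [4, 3, -1, -1, 6, -1, -1]  # assumed values, for illustration only

depth = max_depth(lefts, rights)
print(depth)  # 3
```

With an unconverted 0 marker, a recursion like this one would step from a leaf back to the root (index 0) and never terminate, so either the checks or the input arrays have to change.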

ahmedkrmn commented on May 22, 2024

Why not convert the array "upstream" so that you can rely on the existing code for the non-hist estimators?

lefts = [tree_info.nodes[x]['left'] for x in range(len(tree_info.nodes))]
lefts = [idx if idx != 0 else -1 for idx in lefts]

Yeah, that's better!
