kirillseva / sklearn-compiledtrees

This project forked from ajtulloch/sklearn-compiledtrees

Compiled Decision Trees for scikit-learn

Home Page: tullo.ch/articles/decision-tree-evaluation/

License: MIT License

Makefile 2.73% Python 97.27%

Scikit-Learn Compiled Trees

Installation

Released under the MIT License.

pip install sklearn-compiledtrees

Rationale

In some use cases, model prediction sits on the hot path, so speeding up decision tree evaluation is very useful.

An effective way of speeding up evaluation of decision trees can be to generate code representing the evaluation of the tree, compile that to optimized object code, and dynamically load that file via dlopen/dlsym or equivalent.
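The generate, compile, and load steps can be sketched in a few lines of Python. This is a minimal illustration, not this package's actual implementation: the helper names `tree_to_c` and `compile_and_load` are hypothetical, the tree is passed as parallel lists laid out like scikit-learn's `tree_` arrays (`children_left`, `children_right`, `feature`, `threshold`, with `-1` marking a leaf), and a C compiler named `cc` or `gcc` is assumed to be on the PATH.

```python
import ctypes
import os
import shutil
import subprocess
import tempfile


def tree_to_c(children_left, children_right, feature, threshold, value,
              name="evaluate"):
    """Emit a C function that evaluates one regression tree as nested if/else."""
    def emit(node, depth):
        pad = "    " * depth
        if children_left[node] == -1:  # -1 marks a leaf, as in scikit-learn
            return f"{pad}return {float(value[node])!r};\n"
        return (
            f"{pad}if (f[{feature[node]}] <= {float(threshold[node])!r}) {{\n"
            + emit(children_left[node], depth + 1)
            + f"{pad}}} else {{\n"
            + emit(children_right[node], depth + 1)
            + f"{pad}}}\n"
        )
    return f"double {name}(const double *f) {{\n" + emit(0, 1) + "}\n"


def compile_and_load(source, name="evaluate"):
    """Compile the C source to a shared object and load it (dlopen via ctypes)."""
    cc = shutil.which("cc") or shutil.which("gcc")
    if cc is None:
        raise RuntimeError("no C compiler found on PATH")
    workdir = tempfile.mkdtemp()
    c_path = os.path.join(workdir, "tree.c")
    so_path = os.path.join(workdir, "tree.so")
    with open(c_path, "w") as fh:
        fh.write(source)
    subprocess.check_call([cc, "-O3", "-shared", "-fPIC", "-o", so_path, c_path])
    fn = getattr(ctypes.CDLL(so_path), name)
    fn.argtypes = [ctypes.POINTER(ctypes.c_double)]
    fn.restype = ctypes.c_double
    return fn
```

For a one-split tree, `tree_to_c([1, -1, -1], [2, -1, -1], [0, -2, -2], [0.5, -2.0, -2.0], [0.0, 1.0, 2.0])` produces a function whose body is a single `if (f[0] <= 0.5)` test. The loaded function can then be called with a ctypes array such as `(ctypes.c_double * 1)(0.2)`.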

See https://courses.cs.washington.edu/courses/cse501/10au/compile-machlearn.pdf for a detailed discussion, and http://tullo.ch/articles/decision-tree-evaluation/ for a more pedagogical explanation and more benchmarks in C++.

This package implements compiled decision tree evaluation for the simple case of a single-output regression tree or ensemble.
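The single-output case keeps the combination step simple: each tree returns one number, and the ensemble averages or sums those numbers. A rough sketch of that interpreted evaluation, the work that compiled code would replace, is shown below. It is not taken from this package; the function names are hypothetical, trees are plain dicts of parallel lists mirroring scikit-learn's `tree_` layout, and the `init` and `learning_rate` defaults are illustrative only.

```python
def evaluate_tree(tree, features):
    """Walk one tree stored as parallel lists (scikit-learn's tree_ layout)."""
    node = 0
    while tree["children_left"][node] != -1:  # -1 marks a leaf
        if features[tree["feature"][node]] <= tree["threshold"][node]:
            node = tree["children_left"][node]
        else:
            node = tree["children_right"][node]
    return tree["value"][node]


def evaluate_forest(trees, features):
    """Random-forest style: average the per-tree outputs."""
    return sum(evaluate_tree(t, features) for t in trees) / len(trees)


def evaluate_boosted(trees, features, init=0.0, learning_rate=0.1):
    """Gradient-boosting style: baseline plus scaled sum of per-tree outputs."""
    return init + learning_rate * sum(evaluate_tree(t, features) for t in trees)
```

Every prediction here pays for a Python-level loop and several list lookups per node, which is the overhead that generating straight-line code for each tree aims to remove.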

It has been tested on both OS X and Linux. Windows is not currently supported for compiled evaluation, although adding support should not require a significant amount of work.

Usage

import compiledtrees
from sklearn import ensemble

X_train, y_train, X_test, y_test = ...

# Fit an ordinary scikit-learn regressor first.
clf = ensemble.GradientBoostingRegressor()
clf.fit(X_train, y_train)

# Compile the fitted model, then predict as usual.
compiled_predictor = compiledtrees.CompiledRegressionPredictor(clf)
predictions = compiled_predictor.predict(X_test)

Benchmarks

For random forests, we see a 5x to 8x speedup in evaluation. For gradient boosted ensembles, the speedup is between 1.5x and 3x, because gradient boosted trees already have an optimized prediction implementation.

There is a benchmark script attached that allows us to examine the performance of evaluation across a range of ensemble configurations and datasets.
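A comparison of this kind can be approximated with a small timing helper. The sketch below is not the repository's benchmark script; `time_predict` and `speedup` are hypothetical names, and taking the best of several runs is just one reasonable way to reduce timing noise.

```python
import time


def time_predict(predict, X, iterations=10):
    """Return the best wall-clock time (seconds) of predict(X) over several runs."""
    best = float("inf")
    for _ in range(iterations):
        start = time.perf_counter()
        predict(X)
        best = min(best, time.perf_counter() - start)
    return best


def speedup(baseline_predict, candidate_predict, X, iterations=10):
    """Ratio above 1 means the candidate is faster than the baseline."""
    baseline = time_predict(baseline_predict, X, iterations)
    candidate = time_predict(candidate_predict, X, iterations)
    return baseline / candidate
```

With the objects from the Usage section, `speedup(clf.predict, compiled_predictor.predict, X_test)` would give the ratio for a single model and dataset.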

In the graphs attached, GB is Gradient Boosted, RF is Random Forest, D1 etc. correspond to setting max_depth=1, and B10 corresponds to setting max_leaf_nodes=10.

Graphs

for dataset in friedman1 friedman2 friedman3 uniform hastie; do
    python ../benchmarks/bench_compiled_tree.py \
        --iterations=10 \
        --num_examples=1000 \
        --num_features=50 \
        --dataset=$dataset \
        --max_estimators=300 \
        --num_estimator_values=6
done

[Figures: five benchmark timing plots (images not reproduced here).]
