This project forked from azure/fast_retraining

Show how to perform fast retraining with LightGBM in different business cases

License: MIT License

Jupyter Notebook 96.44% Python 3.50% Shell 0.06%

Fast Retraining

In this repo we compare two of the fastest boosted decision tree libraries: XGBoost and LightGBM. We evaluate them on datasets from several domains and of different sizes.

On July 25, 2017, we published a blog post, Lessons Learned From Benchmarking Fast Machine Learning Algorithms, that evaluates both libraries and discusses the benchmark results.

Installation and Setup

The installation instructions can be found here.

Project

The folder experiments contains the experiments of the project. We developed 6 experiments with the CPU and GPU versions of the libraries:

  • Airline
  • BCI
  • Football
  • Planet Kaggle
  • Fraud Detection
  • HIGGS

The folder experiments/libs contains the common code for the project.
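As a rough illustration of how the library variants compared in this repo might be configured, the sketch below builds parameter dictionaries for XGBoost and LightGBM. It assumes that "xgb" in the tables refers to XGBoost's exact tree builder and "xgb_hist" to its histogram-based one; the README does not spell this out. The helper names and the depth and learning-rate values are hypothetical and are not taken from the notebooks.

```python
# Hypothetical helpers: the function names and numeric values are
# illustrative assumptions, not the settings used in the experiments.

def xgb_params(histogram: bool) -> dict:
    """XGBoost parameters for the exact or the histogram-based tree builder."""
    return {
        "objective": "binary:logistic",
        # "exact" enumerates all split candidates; "hist" bins feature values.
        "tree_method": "hist" if histogram else "exact",
        "max_depth": 8,
        "eta": 0.1,
    }


def lgb_params(use_gpu: bool) -> dict:
    """LightGBM parameters for a CPU or GPU run."""
    return {
        "objective": "binary",
        "device_type": "gpu" if use_gpu else "cpu",
        # LightGBM grows trees leaf-wise, so tree size is set by num_leaves.
        "num_leaves": 2 ** 8,
        "learning_rate": 0.1,
    }


configs = {
    "xgb": xgb_params(histogram=False),
    "xgb_hist": xgb_params(histogram=True),
    "lgb": lgb_params(use_gpu=False),
}
```

Dictionaries like these would typically be passed as the params argument of xgboost.train or lightgbm.train. How the GPU builder is selected in XGBoost depends on the installed version, so it is left out of this sketch.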

Benchmark

The following table summarizes the time results (in seconds) and the time ratios of the benchmarks performed in the experiments:

| Dataset | Experiment | Data size | Features | xgb time (s): CPU (GPU) | xgb_hist time (s): CPU (GPU) | lgb time (s): CPU (GPU) | ratio xgb/lgb: CPU (GPU) | ratio xgb_hist/lgb: CPU (GPU) |
|---|---|---|---|---|---|---|---|---|
| Football | Link CPU, Link GPU | 19673 | 46 | 2.27 (7.09) | 2.47 (4.58) | 0.58 (0.97) | 3.90 (7.26) | 4.25 (4.69) |
| Fraud Detection | Link CPU, Link GPU | 284807 | 30 | 4.34 (5.80) | 2.01 (1.64) | 0.66 (0.29) | 6.58 (19.74) | 3.04 (5.58) |
| BCI | Link CPU, Link GPU | 20497 | 2048 | 11.51 (12.93) | 41.84 (42.69) | 7.31 (2.76) | 1.57 (4.67) | 5.72 (15.43) |
| Planet Kaggle | Link CPU, Link GPU | 40479 | 2048 | 313.89 (-) | 2115.28 (2028.43) | 194.57 (317.68) | 1.61 (-) | 10.87 (6.38) |
| HIGGS | Link CPU, Link GPU | 11000000 | 28 | 2996.16 (-) | 121.21 (114.88) | 119.34 (71.87) | 25.10 (-) | 1.01 (1.59) |
| Airline | Link CPU, Link GPU | 115069017 | 13 | - (-) | 1242.09 (1271.91) | 1056.20 (645.40) | - (-) | 1.17 (1.97) |
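The ratio columns divide an XGBoost time by the corresponding LightGBM time, so a value above 1 means LightGBM was faster. A minimal sketch of how such timings and ratios could be collected is shown below; the timer and speed_ratio helpers are hypothetical and are not the repo's own code, and the timed workload is a stand-in for a training call.

```python
import time
from contextlib import contextmanager


@contextmanager
def timer(results: dict, name: str):
    """Store the wall-clock duration of the enclosed block in results[name]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[name] = time.perf_counter() - start


def speed_ratio(library_seconds: float, lgb_seconds: float) -> float:
    """Time ratio relative to LightGBM, as in the ratio columns of the table."""
    return library_seconds / lgb_seconds


# Timing an arbitrary workload (a placeholder for a training call).
timings = {}
with timer(timings, "placeholder_workload"):
    sum(i * i for i in range(100_000))

# Recomputing the BCI CPU ratios from the rounded times in the table.
bci_xgb_ratio = round(speed_ratio(11.51, 7.31), 2)       # 1.57
bci_xgb_hist_ratio = round(speed_ratio(41.84, 7.31), 2)  # 5.72
```

Ratios recomputed from the rounded times do not always match the table to the last digit (for example, 7.09 / 0.97 gives about 7.31 for Football on GPU, where the table lists 7.26), presumably because the published ratios were computed from unrounded timings.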

In the next table we summarize the performance results using the F1-Score.

| Dataset | Experiment | Data size | Features | xgb F1: CPU (GPU) | xgb_hist F1: CPU (GPU) | lgb F1: CPU (GPU) |
|---|---|---|---|---|---|---|
| Football | Link, Link | 19673 | 46 | 0.458 (0.470) | 0.460 (0.472) | 0.459 (0.470) |
| Fraud Detection | Link, Link | 284807 | 30 | 0.824 (0.821) | 0.802 (0.814) | 0.813 (0.811) |
| BCI | Link, Link | 20497 | 2048 | 0.110 (0.093) | 0.142 (0.120) | 0.137 (0.138) |
| Planet Kaggle | Link, Link | 40479 | 2048 | 0.805 (-) | 0.822 (0.822) | 0.822 (0.821) |
| HIGGS | Link, Link | 11000000 | 28 | 0.763 (-) | 0.767 (0.767) | 0.768 (0.767) |
| Airline | Link, Link | 115069017 | 13 | - (-) | 0.741 (0.745) | 0.732 (0.745) |
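For reference, the F1-Score is the harmonic mean of precision and recall. The sketch below computes it for binary labels in plain Python; treating the task as binary is an assumption made for illustration, since the README does not state how the score is averaged for multi-class or multi-label datasets. The labels in the example are made up.

```python
def f1_score(y_true, y_pred) -> float:
    """Binary F1: harmonic mean of precision and recall for the positive class (1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        # No true positives: precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up labels: 3 true positives, 1 false positive, 1 false negative.
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 1, 1]
score = f1_score(y_true, y_pred)  # precision 0.75, recall 0.75, F1 0.75
```

In practice a library routine such as scikit-learn's f1_score would normally be used instead of a hand-written function.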

The experiments were run on an Azure NV24 VM with 24 cores, 224 GB of memory and 4 NVIDIA M60 GPUs. Both the CPU and the GPU runs used Ubuntu 16.04.

Contributing

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.
