This project forked from baidu/fast_rgf

FastRGF

Multi-core implementation of Regularized Greedy Forest [RGF]

Version 0.2 (August 2016) by Tong Zhang


1. Introduction

This software package provides a multi-core implementation of a simplified Regularized Greedy Forest (RGF) described in [RGF]. Please cite the paper if you find the software useful.

RGF is a machine learning method for building decision forests, and it has been used to win some Kaggle competitions. In our experience it works better than gradient boosting on many relatively large datasets.

The implementation employs the following concepts described in the [RGF] paper:

  • tree node regularization
  • fully-corrective update
  • greedy node expansion with trade-off between leaf node splitting for current tree and root splitting for new tree

However, various simplifications are made to speed up training. Therefore, unlike the original RGF program (see http://stat.rutgers.edu/home/tzhang/software/rgf/), this software does not reproduce the results in the paper.

The implementation of greedy tree node optimization employs a second-order Newton approximation for general loss functions. For logistic regression loss, which works especially well for many binary classification problems, this approach was considered in [PL]; for general loss functions, the second-order approximation was considered in [ZCS].
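To make the second-order idea concrete, here is a minimal sketch of one regularized Newton step for a single leaf weight under logistic loss. It is an illustration only, not code from FastRGF: the function names, the 0/1 label convention, the L2 penalty `l2`, and the assumption that the leaf weight starts at zero are all hypothetical choices made for this example.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Logistic sigmoid: maps a raw score to a probability in (0, 1).
inline double sigmoid(double score) {
    return 1.0 / (1.0 + std::exp(-score));
}

// One regularized Newton step for the weight of a single new leaf under
// logistic loss. `scores` holds the current model outputs for the examples
// that fall in the leaf, `labels` holds their 0/1 targets, and `l2` is a
// hypothetical L2 penalty on the leaf weight (not a FastRGF option name).
// Assumes the leaf weight starts at zero and that hess_sum + l2 > 0.
double newton_leaf_step(const std::vector<double>& scores,
                        const std::vector<int>& labels,
                        double l2) {
    assert(scores.size() == labels.size());
    double grad_sum = 0.0;  // sum of first derivatives of the loss
    double hess_sum = 0.0;  // sum of second derivatives of the loss
    for (std::size_t i = 0; i < scores.size(); ++i) {
        const double p = sigmoid(scores[i]);
        grad_sum += p - labels[i];
        hess_sum += p * (1.0 - p);
    }
    // Minimizer of the quadratic model
    //   grad_sum * w + 0.5 * (hess_sum + l2) * w^2.
    return -grad_sum / (hess_sum + l2);
}
```

For two positive examples with current score 0, the gradient sum is -1 and the Hessian sum is 0.5, so the unregularized step is 2; a penalty of `l2 = 0.5` shrinks it to 1.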

2. Installation

Please see the file CHANGES for version information. The software is written in C++11 and has been tested under Linux and macOS. It may require g++ version 4.8 or above and CMake version 2.8 or above.

If you use g++-4.8, running the examples may produce error messages similar to the following:

terminate called after throwing an instance of 'std::system_error'
what():  Enable multithreading to use std::thread: Operation not permitted

If this occurs, you need to add the -pthread flag in CMakeLists.txt to the variable CMAKE_CXX_FLAGS in order to enable multi-threading. This problem seems to be a bug in the g++ compiler. There may be variations of this problem specific to your system that require different fixes.
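The fix described above can be sketched as a single line in CMakeLists.txt. Where the line belongs depends on how that file is organized, which is not shown here; placing it before the targets are defined is an assumption.

```cmake
# Append -pthread to the existing compiler flags so std::thread can start threads.
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -pthread")
```

After changing the flags, rerun cmake and make from the build/ directory so the executables are rebuilt with the new setting.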

To install the binaries, unpack the software into a directory.

  • The source files are in the subdirectories include/ and src/.
  • The executables are under the subdirectory bin/.
  • The examples are under the subdirectory examples/.

To create the executables, do the following:

 cd build/
 cmake ..
 make 
 make install

The following executables will be installed under the subdirectory bin/.

  • forest_train: train rgf and save model
  • forest_predict: apply trained model on test data

You may use the option -h to show command-line options (options can also be provided in a configuration file).

3. Examples

Go to the subdirectory examples/ and follow the instructions in README.md (it also contains some tips for parameter tuning).

4. Contact

Tong Zhang

5. Copyright

The software is distributed under the MIT license. Please read the file LICENSE.

6. References

[RGF] Rie Johnson and Tong Zhang. Learning Nonlinear Functions Using Regularized Greedy Forest, IEEE Trans. on Pattern Analysis and Machine Intelligence, 36:942-954, 2014.

[PL] Ping Li. Robust LogitBoost and Adaptive Base Class (ABC) LogitBoost, UAI 2010.

[ZCS] Zhaohui Zheng, Hongyuan Zha, Tong Zhang, Olivier Chapelle, Keke Chen, Gordon Sun. A general boosting method and its application to learning ranking functions for web search, NIPS 2007.
