fast_rgf's Introduction


FastRGF

Multi-core implementation of Regularized Greedy Forest [RGF]

Version 0.6 (Feb 2018) by Tong Zhang

Active development of FastRGF now continues in the RGF-team repository.


1. Introduction

This software package provides a multi-core implementation of a simplified Regularized Greedy Forest (RGF) described in [RGF]. Please cite the paper if you find the software useful.

RGF is a machine learning method for building decision forests, and it has been used to win some Kaggle competitions. In our experience it works better than gradient boosting on many relatively large datasets.

The implementation employs the following concepts described in the [RGF] paper:

  • tree node regularization
  • fully-corrective update
  • greedy node expansion with trade-off between leaf node splitting for current tree and root splitting for new tree

However, various simplifications are made to speed up training. Therefore, unlike the original RGF program (see http://tongzhang-ml.org/software/rgf/index.html), this software does not reproduce the results in the paper.

The implementation of greedy tree node optimization employs a second-order Newton approximation for general loss functions. For logistic regression loss, which works especially well for many binary classification problems, this approach was considered in [PL]; for general loss functions, the second-order approximation was considered in [ZCS].
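
As a rough illustration of the second-order idea, here is the generic textbook form of a Newton step for one leaf. It is a sketch under stated assumptions, not necessarily the exact update used in FastRGF: the symbols g_i, h_i, \delta and the L2 penalty \lambda placed on the step are introduced here for illustration only.

```latex
% Second-order expansion of the penalized loss over the examples S in one leaf,
% when a constant step \delta is added to their current predictions f_i.
\[
\sum_{i \in S} L(y_i, f_i + \delta) + \frac{\lambda}{2}\delta^2
  \approx \sum_{i \in S} \Bigl[ L(y_i, f_i) + g_i \delta + \tfrac{1}{2} h_i \delta^2 \Bigr]
  + \frac{\lambda}{2}\delta^2,
\qquad
g_i = \frac{\partial L}{\partial f}(y_i, f_i), \quad
h_i = \frac{\partial^2 L}{\partial f^2}(y_i, f_i)
\]

% Minimizing the quadratic in \delta gives the Newton step.
\[
\delta^\ast = - \frac{\sum_{i \in S} g_i}{\sum_{i \in S} h_i + \lambda}
\]

% Logistic loss with labels y \in \{-1, +1\}.
\[
L(y, f) = \ln\bigl(1 + e^{-y f}\bigr), \qquad
g = \frac{-y}{1 + e^{y f}}, \qquad
h = \frac{e^{y f}}{\bigl(1 + e^{y f}\bigr)^2}
\]
```

For the logistic loss the second derivative h is always positive, so the denominator of the step stays positive for any \lambda \ge 0.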

2. Installation

Please see the file CHANGES for version information. The software is written in C++11 and has been tested under Linux and macOS. It may require g++ version 4.8 or above and CMake version 2.8 or above.
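
The version requirements above can be checked before building. The following is a minimal sketch, assuming a POSIX shell. The helper names `version_ge`, `first_version` and `report` are hypothetical and not part of FastRGF. On macOS, `g++` may be an alias for clang, in which case the number reported is not a GCC version.

```shell
#!/bin/sh
# version_ge A B: succeed when dotted numeric version A is >= B.
version_ge() {
    a=$1 b=$2
    while [ -n "$a" ] || [ -n "$b" ]; do
        # Take the leading field of each version; a missing field counts as 0.
        fa=${a%%.*}; fb=${b%%.*}
        [ -n "$fa" ] || fa=0
        [ -n "$fb" ] || fb=0
        [ "$fa" -gt "$fb" ] && return 0
        [ "$fa" -lt "$fb" ] && return 1
        # Drop the field that was just compared.
        case $a in *.*) a=${a#*.} ;; *) a= ;; esac
        case $b in *.*) b=${b#*.} ;; *) b= ;; esac
    done
    return 0
}

# first_version: print the first dotted number found on the first line of stdin.
first_version() {
    sed -n '1s/[^0-9]*\([0-9][0-9.]*\).*/\1/p'
}

# report NAME MIN FOUND: print whether FOUND satisfies the minimum MIN.
report() {
    name=$1 min=$2 found=$3
    if [ -z "$found" ]; then
        echo "$name: not found (need >= $min)"
    elif version_ge "$found" "$min"; then
        echo "$name $found: OK (need >= $min)"
    else
        echo "$name $found: too old (need >= $min)"
    fi
}

gxx_version=$(g++ -dumpversion 2>/dev/null | first_version)
cmake_version=$(cmake --version 2>/dev/null | first_version)
report g++ 4.8 "$gxx_version"
report cmake 2.8 "$cmake_version"
```

A "too old" line for g++ usually means the compiler does not accept `-std=c++11`, which is the error reported in the first issue on this page.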

To install the binaries, unpack the software into a directory.

  • The source files are in the subdirectories include/ and src/.
  • The executables are under the subdirectory bin/.
  • The examples are under the subdirectory examples/.

To create the executables, do the following:

 cd build/
 cmake ..
 make 
 make install

The following executables will be installed under the subdirectory bin/.

  • forest_train: train an RGF model and save it
  • forest_predict: apply a trained model to test data

You may use the option -h to show command-line options (options can also be provided in a configuration file).
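
Options are passed as key=value arguments. The sketch below only reuses the option names and values that appear in the run.sh trace quoted in the issues further down this page. The paths are relative to an example directory, and the `train_cmd` helper is hypothetical. The script prints the command as a dry run when the executable is not present.

```shell
#!/bin/sh
# Values copied from the run.sh trace; paths are relative to an example directory.
exe_train=../../bin/forest_train
trn=inputs/madelon.train
tst=inputs/madelon.test
model_rgf=outputs/model-rgf

# train_cmd: print the training command, one key=value option per argument.
train_cmd() {
    echo "$exe_train" \
        trn.x-file="$trn" trn.x-file_format=y.sparse trn.target=BINARY \
        tst.x-file="$tst" tst.x-file_format=y.sparse tst.target=BINARY \
        model.save="$model_rgf" \
        dtree.new_tree_gain_ratio=1.0 dtree.lamL2=5000 \
        forest.ntrees=1000 dtree.loss=LOGISTIC forest.save_frequency=200
}

if [ -x "$exe_train" ]; then
    # Word splitting is intended here: the paths above contain no spaces.
    $(train_cmd)
else
    echo "dry run: $(train_cmd)"
fi
```

Run `forest_train -h` for the authoritative list of options; the values above are specific to that example and are not tuning advice.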

3. Examples

Go to the subdirectory examples/ and follow the instructions in README.md. The file also contains some tips for parameter tuning.

4. Contact

Tong Zhang

5. Copyright

The software is distributed under the MIT license. Please read the file LICENSE.

6. References

[RGF] Rie Johnson and Tong Zhang. Learning Nonlinear Functions Using Regularized Greedy Forest, IEEE Trans. on Pattern Analysis and Machine Intelligence, 36:942-954, 2014.

[PL] Ping Li. Robust LogitBoost and Adaptive Base Class (ABC) LogitBoost, UAI 2010.

[ZCS] Zhaohui Zheng, Hongyuan Zha, Tong Zhang, Olivier Chapelle, Keke Chen, Gordon Sun. A general boosting method and its application to learning ranking functions for web search, NIPS 2007.

fast_rgf's People

Contributors

fukatani, strikerrus

fast_rgf's Issues

compiling error: cc1plus: error: unrecognized command line option "-std=c++11"

/usr/bin/make64 MAC=64
[ 8%] Building CXX object src/base/CMakeFiles/base.dir/utils.cpp.o
cc1plus: error: unrecognized command line option "-std=c++11"
cc1plus: error: unrecognized command line option "-std=c++11"
make64[2]: *** [src/base/CMakeFiles/base.dir/utils.cpp.o] Error 1
make64[1]: *** [src/base/CMakeFiles/base.dir/all] Error 2
make64: *** [all] Error 2

The code uses some C++11 features, so an old version of g++ throws errors like this. The fix is to upgrade gcc.
I solved it by following http://ask.xmodulo.com/upgrade-gcc-centos.html
It works! This may help others.

Why is testing very slow?

Training is very fast, so why is testing so slow? The whole testing procedure seems to be single-threaded, yet the console prints "using up to 12 threads" during testing. Why?

cmake error

download ->

cd build/
cmake ..

outputs:

[ 52%] Linking CXX executable forest_train
CMakeFiles/forest_train.dir/forest_train.cpp.o: In function `TestOutput<unsigned short, int, unsigned char>::print_outputs(rgf::DecisionForest<unsigned short, int, unsigned char>&, int, int)':
forest_train.cpp:(.text._ZN10TestOutputItihE13print_outputsERN3rgf14DecisionForestItihEEii[_ZN10TestOutputItihE13print_outputsERN3rgf14DecisionForestItihEEii]+0xadd): undefined reference to `pthread_create'
../forest/libforest.a(forest.cpp.o): In function `rgf::DecisionForest<float, int, float>::apply(rgf::DataPoint<float, int, float>&, unsigned int, int)':
forest.cpp:(.text._ZN3rgf14DecisionForestIfifE5applyERNS_9DataPointIfifEEji[_ZN3rgf14DecisionForestIfifE5applyERNS_9DataPointIfifEEji]+0x3fe): undefined reference to `pthread_create'
../forest/libforest.a(forest.cpp.o): In function `rgf::DecisionForest<int, int, int>::apply(rgf::DataPoint<int, int, int>&, unsigned int, int)':
forest.cpp:(.text._ZN3rgf14DecisionForestIiiiE5applyERNS_9DataPointIiiiEEji[_ZN3rgf14DecisionForestIiiiE5applyERNS_9DataPointIiiiEEji]+0x3fe): undefined reference to `pthread_create'
../forest/libforest.a(forest.cpp.o): In function `rgf::DecisionForest<unsigned short, int, unsigned char>::apply(rgf::DataPoint<unsigned short, int, unsigned char>&, unsigned int, int)':
forest.cpp:(.text._ZN3rgf14DecisionForestItihE5applyERNS_9DataPointItihEEji[_ZN3rgf14DecisionForestItihE5applyERNS_9DataPointItihEEji]+0x3fe): undefined reference to `pthread_create'
../forest/libforest.a(forest.cpp.o): In function `void rgf::MapReduceRunner::run_threads<rgf::DecisionForest<unsigned short, int, unsigned char>::train(rgf::DataSet<unsigned short, int, unsigned char>&, double*, rgf::DecisionTree<unsigned short, int, unsigned char>::TrainParam&, rgf::DecisionForest<unsigned short, int, unsigned char>::TrainParam&, rgf::DataSet<unsigned short, int, unsigned char>&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, rgf::DataDiscretization<int, int, int, int>*)::TrainEvalMR>(rgf::DecisionForest<unsigned short, int, unsigned char>::train(rgf::DataSet<unsigned short, int, unsigned char>&, double*, rgf::DecisionTree<unsigned short, int, unsigned char>::TrainParam&, rgf::DecisionForest<unsigned short, int, unsigned char>::TrainParam&, rgf::DataSet<unsigned short, int, unsigned char>&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, rgf::DataDiscretization<int, int, int, int>*)::TrainEvalMR&, int, int, bool)':
forest.cpp:(.text._ZN3rgf15MapReduceRunner11run_threadsIZNS_14DecisionForestItihE5trainERNS_7DataSetItihEEPdRNS_12DecisionTreeItihE10TrainParamERNS3_10TrainParamES6_NSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEPNS_18DataDiscretizationIiiiiEEE11TrainEvalMREEvRT_iib[_ZN3rgf15MapReduceRunner11run_threadsIZNS_14DecisionForestItihE5trainERNS_7DataSetItihEEPdRNS_12DecisionTreeItihE10TrainParamERNS3_10TrainParamES6_NSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEPNS_18DataDiscretizationIiiiiEEE11TrainEvalMREEvRT_iib]+0x11b): undefined reference to `pthread_create'
../forest/libforest.a(forest.cpp.o):forest.cpp:(.text._ZN3rgf15MapReduceRunner11run_threadsIZNS_14DecisionForestIiiiE5trainERNS_7DataSetIiiiEEPdRNS_12DecisionTreeIiiiE10TrainParamERNS3_10TrainParamES6_NSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEPNS_18DataDiscretizationIiiiiEEE11TrainEvalMREEvRT_iib[_ZN3rgf15MapReduceRunner11run_threadsIZNS_14DecisionForestIiiiE5trainERNS_7DataSetIiiiEEPdRNS_12DecisionTreeIiiiE10TrainParamERNS3_10TrainParamES6_NSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEPNS_18DataDiscretizationIiiiiEEE11TrainEvalMREEvRT_iib]+0x11b): more undefined references to `pthread_create' follow
collect2: error: ld returned 1 exit status
src/exe/CMakeFiles/forest_train.dir/build.make:96: recipe for target 'src/exe/forest_train' failed
make[2]: *** [src/exe/forest_train] Error 1
CMakeFiles/Makefile2:204: recipe for target 'src/exe/CMakeFiles/forest_train.dir/all' failed
make[1]: *** [src/exe/CMakeFiles/forest_train.dir/all] Error 2
Makefile:127: recipe for target 'all' failed
make: *** [all] Error 2

Any hints or advice?

Segmentation fault when run examples

In ex1 and ex2, running run.sh gives the following:

+ exe_train=../../bin/forest_train
+ exe_predict=../../bin/forest_predict
+ trn=inputs/madelon.train
+ tst=inputs/madelon.test
+ model_rgf=outputs/model-rgf
+ prediction=outputs/prediction
+ orig_format=y.sparse
+ save_freq=200
+ echo ------ training ------
------ training ------
+ ../../bin/forest_train trn.x-file=inputs/madelon.train trn.x-file_format=y.sparse trn.target=BINARY tst.x-file=inputs/madelon.test tst.x-file_format=y.sparse tst.target=BINARY model.save=outputs/model-rgf dtree.new_tree_gain_ratio=1.0 dtree.lamL2=5000 forest.ntrees=1000 dtree.loss=LOGISTIC forest.save_frequency=200

using up to 12 threads

loading training data ...

trn.target=BINARY

trn.x-file_format=y.sparse

trn.x-file=inputs/madelon.train

trn.y-file=

trn.w-file=

run.sh: line 19: 9835 Segmentation fault ${exe_train} trn.x-file=${trn} trn.x-file_format=${orig_format} trn.target=BINARY tst.x-file=${tst} tst.x-file_format=${orig_format} tst.target=BINARY model.save=${model_rgf} dtree.new_tree_gain_ratio=1.0 dtree.lamL2=5000 forest.ntrees=1000 dtree.loss=LOGISTIC forest.save_frequency=${save_freq}

Could this be caused by a faulty build, or is it a problem in the code itself?
How can I solve it?
