
mondrianforest's Introduction

This folder contains the scripts used in the following papers:

Mondrian Forests: Efficient Online Random Forests

Balaji Lakshminarayanan, Daniel M. Roy, Yee Whye Teh

Advances in Neural Information Processing Systems (NIPS), 2014.

Link to PDF

Mondrian Forests for Large-Scale Regression when Uncertainty Matters

Balaji Lakshminarayanan, Daniel M. Roy, Yee Whye Teh

Proceedings of AISTATS, 2016.

Link to PDF

Please cite the appropriate paper if you use this code.

I ran my experiments using the Enthought Python distribution (which includes all the necessary Python packages). If you are using a different Python distribution (e.g. Anaconda), you will need the following Python packages (and possibly others) to run the scripts:

  • numpy
  • scipy
  • matplotlib (for plotting Mondrian partitions)
  • pydot and graphviz (for printing Mondrian trees)
  • sklearn (for reading libsvm format files)

Some of the packages (e.g. pydot, matplotlib) are needed only for the '--draw_mondrian 1' option. If you just want to run experiments without plotting the Mondrians, you may not need them.

Paul Heideman has created requirements.txt, which makes it easy to install the packages using 'pip install -r requirements.txt'. Dan Stowell pointed out that the dvipng package is required on Ubuntu to draw the Mondrians.
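A quick way to see which of the dependencies listed above are missing is sketched below. It assumes a Python 3 interpreter (the original scripts may target Python 2) and that the Graphviz executable is named dot; the helper names missing_packages and missing_executables are made up for this illustration and are not part of the repository.

```python
import importlib.util
import shutil

# Import names of the packages listed above ("sklearn" is scikit-learn).
CORE_PACKAGES = ["numpy", "scipy", "sklearn"]
# Needed only for the '--draw_mondrian 1' option.
DRAWING_PACKAGES = ["matplotlib", "pydot"]
# External programs used when drawing: Graphviz's "dot" (assumed name) and dvipng.
DRAWING_EXECUTABLES = ["dot", "dvipng"]


def missing_packages(names):
    """Return the names that cannot be found as importable top-level modules."""
    return [n for n in names if importlib.util.find_spec(n) is None]


def missing_executables(names):
    """Return the program names that are not found on PATH."""
    return [n for n in names if shutil.which(n) is None]


if __name__ == "__main__":
    print("missing core packages:", missing_packages(CORE_PACKAGES))
    print("missing drawing packages:", missing_packages(DRAWING_PACKAGES))
    print("missing drawing programs:", missing_executables(DRAWING_EXECUTABLES))
```

An empty list in each line suggests the corresponding group is installed; it does not check package versions.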

The datasets are not included here; you need to download them from the UCI repository, although you can run experiments on the toy data without them. Run commands.sh in the process_data folder to download and process the datasets automatically. I have tested these scripts only on Ubuntu, but it should be straightforward to process the datasets on other platforms.

If you have any questions/comments/suggestions, please contact me at [email protected].

Code released under MIT license (see COPYING for more info).

Copyright © 2014 Balaji Lakshminarayanan


List of scripts in the src folder:

  • mondrianforest.py
  • mondrianforest_utils.py
  • mondrianforest_demo.py
  • utils.py

I have added mondrianforest_demo.py, which supports the fit and partial_fit methods.

Help on usage can be obtained by typing the following command in the terminal:

./mondrianforest.py -h

Example usage:

./mondrianforest_demo.py --dataset toy-mf --n_mondrians 100 --budget -1 --normalize_features 1 --optype class

Examples that draw the Mondrian partition and Mondrian tree:

./mondrianforest_demo.py --draw_mondrian 1 --save 1 --n_mondrians 10 --dataset toy-mf --store_every 1 --n_mini 6 --tag demo --optype class

./mondrianforest_demo.py --draw_mondrian 1 --save 1 --n_mondrians 1 --dataset toy-mf --store_every 1 --n_mini 6 --tag demo --optype class

Example on a real-world dataset (assuming you have successfully run commands.sh in the process_data folder):

./mondrianforest_demo.py --dataset satimage --n_mondrians 100 --budget -1 --normalize_features 1 --save 1 --data_path ../process_data/ --n_minibatches 10 --store_every 1 --optype class


I generated commands for parameter sweeps using the 'build_cmds' script by Jan Gasthaus, publicly available at https://github.com/jgasthaus/Gitsby/tree/master/pbs/python.

An example of a parameter sweep:

./build_cmds ./mondrianforest_demo.py "--op_dir={results}" "--init_id=1:1:6" "--dataset={letter,satimage,usps,dna,dna-61-120}" "--n_mondrians={100}" "--save={1}" "--discount_factor={10.0}" "--budget={-1}" "--n_minibatches={100}" "--bagging={0}" "--store_every={1}" "--normalize_features={1}" "--data_path={../process_data/}" >> run

Note that the results (predictions, accuracy, log predictive probability on training/test data, runtimes) are stored in pickle files. You need to write additional scripts to aggregate the results from these files and generate the plots.
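A minimal sketch of such an aggregation script follows. It assumes that each pickle file holds a dictionary and that the metric of interest is a scalar stored under a known key; the file pattern "*.p" and the key "acc_test" used in the demo are hypothetical placeholders, so inspect one of your own result files to find the real names.

```python
import glob
import os
import pickle
import tempfile


def load_results(results_dir, pattern):
    """Load every pickle file in `results_dir` whose name matches `pattern`."""
    results = []
    for path in sorted(glob.glob(os.path.join(results_dir, pattern))):
        with open(path, "rb") as f:
            results.append(pickle.load(f))
    return results


def mean_metric(results, key):
    """Average a scalar entry named `key` over the runs that contain it."""
    values = [float(r[key]) for r in results if key in r]
    return sum(values) / len(values) if values else None


if __name__ == "__main__":
    # Self-contained demo: made-up numbers written to a temporary folder.
    with tempfile.TemporaryDirectory() as tmp:
        for i, acc in enumerate([0.8, 0.9]):
            with open(os.path.join(tmp, "run-%d.p" % i), "wb") as f:
                pickle.dump({"acc_test": acc}, f)
        print(mean_metric(load_results(tmp, "*.p"), "acc_test"))
```

Pickle files written under Python 2 may need `pickle.load(f, encoding="latin1")` when read under Python 3, and pickle files should only be loaded from sources you trust.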


mondrianforest's Issues

Key error

Hey there

There seems to be a bug in the example code mondrianforest_demo.py with PLOT=True:

Traceback (most recent call last):
  File "mondrianforest_demo.py", line 71, in <module>
    pred_mean = pred_forest_test['pred_mean']
KeyError: 'pred_mean'

pred_forest_test only has one key 'pred_prob'. The code tries to access both pred_mean and pred_var.

Manuel
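One defensive way to read the prediction dictionary is to branch on the keys that are actually present. The sketch below assumes, based only on the report above, that classification runs return 'pred_prob' and that 'pred_mean' and 'pred_var' belong to regression runs; split_predictions is a hypothetical helper, not part of the repository.

```python
def split_predictions(pred_forest):
    """Return (task, values) based on which keys the dictionary contains."""
    if "pred_prob" in pred_forest:
        # Assumed classification output: per-class probabilities.
        return "class", {"pred_prob": pred_forest["pred_prob"]}
    if "pred_mean" in pred_forest and "pred_var" in pred_forest:
        # Assumed regression output: predictive mean and variance.
        return "regress", {
            "pred_mean": pred_forest["pred_mean"],
            "pred_var": pred_forest["pred_var"],
        }
    raise KeyError("unexpected prediction keys: %s" % sorted(pred_forest))
```

Plotting code can then choose what to draw from the returned task label instead of indexing 'pred_mean' unconditionally.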

"dvipng" is needed

Hi, nice to see the code published! I tried running the demo command ./mondrianforest.py --draw_mondrian 1 --save 1 --n_mondrians 10 --dataset toy-mf --store_every 1 --n_mini 6 --tag demo on my ubuntu machine and got:

       sh: 1: dvipng: not found

That's the only dependency missing from your readme, for an ubuntu machine, I think. If I install the dvipng package then it works.

Demo code fails due to missing "results" folder

I ran the demo code ./mondrianforest.py --draw_mondrian 1 --save 1 --n_mondrians 10 --dataset toy-mf --store_every 1 --n_mini 6 --tag demo and it terminated on:

          IOError: [Errno 2] No such file or directory: 'results/toy-mf-mf-budg--1_nmon-10_mini-6_discount-10-param-0-init_id-1-bag-0-tag-demo-mondrians_minibatch-0.pdf'

A mkdir fixed that. Maybe add the folder to the repository? (With a dummy file in, since git doesn't understand empty folders.) Or create the folder from python.
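Creating the folder from Python, as suggested above, could look like the following sketch. It assumes Python 3.2 or later (for the exist_ok argument); ensure_dir is an illustrative name, not a function in the repository.

```python
import os
import tempfile


def ensure_dir(path):
    """Create `path` (and any missing parents) if it does not already exist."""
    os.makedirs(path, exist_ok=True)
    return path


if __name__ == "__main__":
    # Demo inside a temporary directory so nothing is left behind.
    with tempfile.TemporaryDirectory() as tmp:
        target = ensure_dir(os.path.join(tmp, "results"))
        print(os.path.isdir(target))
```

Calling something like ensure_dir("results") once before the first file is saved would avoid the IOError without adding a dummy file to the repository.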

Bagging

Is the bagging capability functional? I've run the code with settings.bagging=1 and settings.bagging=0 and I get the same answer both times. So I was wondering if there is a problem with the bagging?

discount = 0 for leaf nodes leads to non-smooth predictive distributions

For leaf nodes, max_split_costs is large, which often leads to discount = 0 if discount_parameter is large (>10). Due to this, predictive distributions at the leaves tend not to be smooth.

For example, the following are the predictive distributions at the leaves when testing with the provided toy dataset and default parameters (for a single tree):

pred_prob:
[[ 1.  0.  0.]
 [ 1.  0.  0.]
 [ 0.  1.  0.]
 [ 0.  1.  0.]
 [ 0.  0.  1.]
 [ 0.  0.  1.]]

Is this the intended behaviour?
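The effect described above can be reproduced numerically. The sketch below assumes the discount is computed as exp(-discount_param * max_split_cost) and uses the interpolated Kneser-Ney style smoothing described in the NIPS 2014 paper, with at most one "table" per observed class; the function names are illustrative and do not mirror the repository code.

```python
import math


def discount(discount_param, max_split_cost):
    """Assumed form of the discount: exp(-discount_param * max_split_cost)."""
    return math.exp(-discount_param * max_split_cost)


def smoothed_pred(counts, parent_pred, d):
    """Smooth the class counts at a node toward the parent's prediction.

    Each observed class contributes one "table" (min(count, 1)); a fraction d
    of every table is removed and redistributed according to `parent_pred`.
    """
    tables = [min(c, 1) for c in counts]
    total = float(sum(counts))
    n_tables = sum(tables)
    return [
        (c - d * t + d * n_tables * p) / total
        for c, t, p in zip(counts, tables, parent_pred)
    ]


if __name__ == "__main__":
    uniform = [1.0 / 3] * 3
    d_leaf = discount(10.0, 100.0)  # underflows to 0.0 in double precision
    print(d_leaf, smoothed_pred([3, 0, 0], uniform, d_leaf))
    print(smoothed_pred([3, 0, 0], uniform, 0.5))
```

Under these assumptions a discount of exactly 0 removes the parent's contribution, so a leaf that has seen a single class predicts that class with probability 1, while any positive discount leaves some mass on the other classes.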

Storing training samples from previous iterations?

Great project, have been looking at possibilities to modify this code to support streaming learning on the fly. One question I have has to do with what happens to training examples from previous iterations:

It seems that after mf.fit() is run and mf.partial_fit() is called, the nodes are still holding references (indices) to previous training samples:

self.train_ids[node_id] = np.append(self.train_ids[node_id], train_ids_new)

in the method add_training_points_to_node

When I try to replace (rather than concatenate) data["x_train"] with new training samples, I get an error in get_data_min_max(data, train_ids) which is caused by train_ids being empty.
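Because the nodes store row indices into data["x_train"], one way to keep those indices valid is to append each new minibatch and pass only the new row indices to the update call. The sketch below assumes data is a dictionary with NumPy arrays under "x_train" and "y_train", as the snippet above suggests; append_minibatch is a hypothetical helper and the repository may track further fields that would also need updating.

```python
import numpy as np


def append_minibatch(data, x_new, y_new):
    """Append new samples and return their row indices in the grown arrays."""
    x_new = np.asarray(x_new)
    y_new = np.asarray(y_new)
    n_old = data["x_train"].shape[0]
    # Old rows keep their positions, so indices stored in the nodes stay valid.
    data["x_train"] = np.vstack([data["x_train"], x_new])
    data["y_train"] = np.append(data["y_train"], y_new)
    return np.arange(n_old, n_old + x_new.shape[0])


if __name__ == "__main__":
    data = {"x_train": np.zeros((2, 3)), "y_train": np.array([0, 1])}
    new_ids = append_minibatch(data, np.ones((4, 3)), [1, 1, 0, 0])
    print(new_ids, data["x_train"].shape)
```

This keeps all past samples in memory, so it does not by itself give a bounded-memory streaming learner; it only avoids invalidating the indices the nodes already hold.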

scikit-learn compatibility

Could you provide a class that implements fit and predict methods, like in scikit-learn?

We met at MLSS Kyoto two years ago ;-)
