thvasilo / uncertain-trees-reproducible

Online random forests with prediction uncertainty

License: GNU General Public License v3.0

Languages: Shell 88.43%, Python 11.57%
Topics: machine-learning, online-learning, random-forest, uncertainty, quantile-regression, conformal-prediction

Reproducibility repository for Online Regression Forests

Using this repository you should be able to reproduce all the experiments we performed for our JMLR paper on online regression forests with uncertainty.

Follow the instructions below to prepare your environment and data. The file reproduce-output.sh contains the commands to re-create the most important tables and figures in the paper.

Instructions

The repository uses submodules to keep track of the different repositories needed to run the algorithms, so ensure you clone using the --recursive option, i.e. git clone --recursive https://github.com/thvasilo/uncertain-trees-reproducible.git

Installing the dependencies

uncertain-trees-experiments and scikit-garden

There are a few Python libraries needed to run the project, so we recommend creating a virtual environment to avoid messing up your default environment. We have used the Anaconda Python distribution to make things easier.

We've made some small modifications to the original scikit-garden library, so we need to install it from the included submodule rather than the PyPI repository.

conda env create -f rf-pred.yml  # Installs the base dependencies as a new virtual env
source activate rf-pred
pip install -e ./scikit-garden  # Install the customized scikit-garden repo.

MOA

We recommend using the pre-built binaries under binaries. The only requirement is Java 8. We've tested with the Oracle JDK; OpenJDK seems to cause issues with the results.

Alternatively you can build the MOA distribution using Maven by running mvn package -DskipTests in moa/moa.
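Since the pre-built binaries expect Java 8, it can help to check which Java version is on the PATH before running anything. The following is a minimal sketch of such a check; java_major is a hypothetical helper that is not part of this repository, and it inspects only the version number, not whether the JDK comes from Oracle or OpenJDK.

```shell
# Sketch: parse the major Java version from the banner printed by `java -version`.
# java_major is a hypothetical helper, not part of this repository.
java_major() {
    banner="$1"
    # The version is the first double-quoted token, e.g. "1.8.0_201" or "11.0.2".
    version=$(printf '%s\n' "$banner" | sed -n 's/^[^"]*"\([^"]*\)".*$/\1/p' | head -n 1)
    case "$version" in
        1.*) printf '%s\n' "$version" | cut -d. -f2 ;;  # legacy scheme: 1.8.x is Java 8
        *)   printf '%s\n' "$version" | cut -d. -f1 ;;  # newer scheme: 11.0.2 is Java 11
    esac
}

# `java -version` writes its banner to stderr, hence the redirection.
if command -v java >/dev/null 2>&1; then
    major=$(java_major "$(java -version 2>&1)")
    if [ "$major" = "8" ]; then
        echo "Java 8 detected"
    else
        echo "Warning: found Java '${major}', but the pre-built MOA binaries expect Java 8" >&2
    fi
else
    echo "Warning: no java executable found on PATH" >&2
fi
```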

Obtaining the data

The stationary data are included with the repository under the data/small-mid directory. The large airlines data are compressed under data/airlines. To decompress them, cd into data/airlines and run:

for FILE in *.tar.gz; do tar -zxf "${FILE}"; done

To re-create the Friedman data, run the generate_friedman_data.sh script.

Re-creating the files

It's also possible to re-create the files using the scripts we've included in the data/airlines directory.

Run the following two scripts in succession:

./get_data.sh
./create_splits.sh

These two scripts will pull the original data, transform it to CSV, apply the pre-processing steps, and create the 700k, 2M and 5M splits in ARFF format using Weka.
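Because the second script depends on the output of the first, it is worth stopping as soon as one step fails. A minimal sketch of that pattern follows; run_in_order is a hypothetical helper that is not part of this repository.

```shell
# Sketch: run the given commands in order and stop at the first failure.
# run_in_order is a hypothetical helper, not part of this repository.
run_in_order() {
    for script in "$@"; do
        echo "Running ${script}"
        if ! "$script"; then
            echo "${script} failed; skipping the remaining steps" >&2
            return 1
        fi
    done
}

# From inside data/airlines one would call:
#   run_in_order ./get_data.sh ./create_splits.sh
```

For just two scripts, chaining them with && on a single command line has the same effect.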

Running the experiments

After you've prepared the environment and data, you can re-run the experiments from the paper using the example commands in reproduce-output.sh. We recommend running the experiments selectively rather than simply running the whole script, because the runtime for the airlines experiments is very long. The experiments on the small-scale data, however, should not take very long.
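One way to pick experiments selectively is to list the commands in the script first and then copy the ones you want into a terminal. The sketch below assumes nothing about reproduce-output.sh beyond it being a shell script; list_commands is a hypothetical helper that is not part of this repository.

```shell
# Sketch: list the non-blank, non-comment lines of a shell script with their
# line numbers, so individual commands can be copied and run one at a time.
# list_commands is a hypothetical helper, not part of this repository.
list_commands() {
    grep -n -v -E '^[[:space:]]*(#|$)' "$1"
}

# Example (run from the repository root):
#   list_commands reproduce-output.sh
```

Commands that span several lines with backslash continuations will appear as several numbered lines, so copy them as a group.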

NOTE: Due to the random nature of the algorithms, the exact results will differ slightly from those reported in the paper. Unfortunately, we didn't keep track of all the random seeds used in our experiments. The overall performance of the algorithms should not change significantly, however.

Troubleshooting

Ensure you cloned the repository with git clone --recursive https://github.com/thvasilo/uncertain-trees-reproducible.git. Please file an issue if you run into any problems.
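A clone made without --recursive leaves the submodule directories empty, and git submodule update --init --recursive fetches them after the fact. The sketch below checks for that symptom; check_submodules is a hypothetical helper that is not part of this repository, and the directory names in the example are assumptions based on the paths mentioned in this README.

```shell
# Sketch: report which of the given directories are missing or empty,
# which is the usual symptom of a clone made without --recursive.
# check_submodules is a hypothetical helper, not part of this repository.
check_submodules() {
    missing=0
    for dir in "$@"; do
        if [ -z "$(ls -A "$dir" 2>/dev/null)" ]; then
            echo "Submodule directory '${dir}' is missing or empty" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example (run from the repository root; directory names are assumptions):
#   check_submodules scikit-garden moa || git submodule update --init --recursive
```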

Citing

If you use this work, please cite our JMLR paper:

@article{JMLR:v20:19-006,
  author  = {Theodore Vasiloudis and Gianmarco De Francisci Morales and Henrik Bostr{{\"o}}m},
  title   = {Quantifying Uncertainty in Online Regression Forests},
  journal = {Journal of Machine Learning Research},
  year    = {2019},
  volume  = {20},
  number  = {155},
  pages   = {1-35},
  url     = {http://jmlr.org/papers/v20/19-006.html}
}
