microsoft / ibac-sni

Code to reproduce the NeurIPS 2019 paper "Generalization in Reinforcement Learning with Selective Noise Injection and Information Bottleneck" by Maximilian Igl, Kamil Ciosek, Yingzhen Li, Sebastian Tschiatschek, Cheng Zhang, Sam Devlin and Katja Hofmann.

Home Page: https://arxiv.org/abs/1910.12911

License: Other


Introduction

This is the codebase for the NeurIPS 2019 paper "Generalization in Reinforcement Learning with Selective Noise Injection and Information Bottleneck" which was work done by Maximilian Igl, Kamil Ciosek, Yingzhen Li, Sebastian Tschiatschek, Cheng Zhang, Sam Devlin and Katja Hofmann during Maximilian's internship at Microsoft Research Cambridge (UK).

It comprises several sub-projects:

  1. toy-classification contains code for the classification experiment in the paper
  2. gym-minigrid contains the grid-world environment and is adapted from https://github.com/maximecb/gym-minigrid (BSD 3-Clause license)
  3. torch_rl contains the agent and training code to run on the gym-minigrid environment and is adapted from https://github.com/lcswillems/rl-starter-files (MIT license)
  4. coinrun contains the code for the main results on the coinrun domain and is adapted from https://github.com/openai/coinrun (MIT license)

Toy Classification Task

To run the experiment, use

python experiment.py --iterate_fpc

or

python experiment.py --iterate_dps

Adding --cuda enables GPU usage. Hyperparameters for dropout, VIB and L2W (= weight decay) can be set using --dropout_rate, --vib_beta and --l2w_beta. Other hyperparameters, such as learning rate, model size and number of epochs, are hardcoded as variables in the Python file.
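For example, to run the pattern sweep on GPU with a specific dropout rate (the value 0.5 here is only illustrative):

python experiment.py --iterate_fpc --cuda --dropout_rate 0.5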

For --iterate_fpc, the script iterates over the number of different patterns the encoding function f uses, with values [2, 4, 8, 16, 32, 64]. For --iterate_dps, it iterates over the number of training data points in [64, 128, 256, 512, 1024, 2048]. Both iterate over all four regularization techniques [VIB, L2W, Dropout, None]. Results are saved in results/, with one *.npy file per combination; the hyperparameters used are indicated in the filename.
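To inspect a saved result in Python (the filename below is a placeholder; actual filenames encode the hyperparameters used, and the array layout may differ):

import numpy as np

# Placeholder filename; look in results/ for the files actually produced.
results = np.load("results/your-result-file.npy")
print(results.shape, results.dtype)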

Plotting

To plot the results use

python plots.py --plot_fc

or

python plots.py --plot_dbs

and it will look through the results/ folder for previously generated *.npy files. You can use --fixed_vib_param, --fixed_dropout_param, or --fixed_l2w_param to plot only the line for one value of the respective regularization technique instead of all available values.

Grid-world environment

First, install gym-minigrid with

cd gym-minigrid
pip install -e .

The original gym-minigrid is modified by adding the Multiroom environments MiniGrid-MultiRoom-N2r-v0 and MiniGrid-MultiRoom-N3r-v0, as well as MiniGrid-Choice-9x9-v0 (which was not used in the paper).

Then also install torch_rl:

cd ../torch_rl/torch_rl
pip install -e .

Furthermore, you'll need to set the "TORCH_RL_STORAGE" environment variable, which determines where the results will be stored, e.g. by including export TORCH_RL_STORAGE=~/results in your ~/.bashrc.

The results from the paper can then be reproduced by running from the (outer!) torch_rl directory:

python -m scripts.train --frames 100000000 --algo ppo --env MiniGrid-MultiRoom-N3r-v0 --model N3r-vib1e6 --save-interval 100 --tb --fullObs --model_type default2 --use_bottleneck --beta 0.000001
python -m scripts.train --frames 100000000 --algo ppo --env MiniGrid-MultiRoom-N3r-v0 --model N3r-vibS1e6 --save-interval 100 --tb --fullObs --model_type default2 --use_bottleneck --beta 0.000001 --sni_type vib
python -m scripts.train --frames 100000000 --algo ppo --env MiniGrid-MultiRoom-N3r-v0 --model N3r-Plain --save-interval 100 --tb --fullObs --model_type default2
python -m scripts.train --frames 100000000 --algo ppo --env MiniGrid-MultiRoom-N3r-v0 --model N3r-l2w1e4 --save-interval 100 --tb --fullObs --model_type default2 --use_l2w --beta 0.0001
python -m scripts.train --frames 100000000 --algo ppo --env MiniGrid-MultiRoom-N3r-v0 --model N3r-dout0.2 --save-interval 100 --tb --fullObs --model_type default2 --use_dropout 0.2
python -m scripts.train --frames 100000000 --algo ppo --env MiniGrid-MultiRoom-N3r-v0 --model N3r-doutS0.2 --save-interval 100 --tb --fullObs --model_type default2 --use_dropout 0.2 --sni_type dropout

When launching multiple runs, make sure to change the --model name, which determines the folder name for the results, and to specify different random seeds using --seed <nr>.
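For example, a minimal launcher sketch for several seeds of the VIB-SNI configuration (the run names are examples; all flags are taken from the commands above):

import subprocess

# Launch three seeds, each writing to its own results folder via --model.
for seed in (1, 2, 3):
    subprocess.run([
        "python", "-m", "scripts.train",
        "--frames", "100000000", "--algo", "ppo",
        "--env", "MiniGrid-MultiRoom-N3r-v0",
        "--model", f"N3r-vibS1e6-seed{seed}",  # unique folder per run
        "--seed", str(seed),
        "--save-interval", "100", "--tb", "--fullObs",
        "--model_type", "default2",
        "--use_bottleneck", "--beta", "0.000001",
        "--sni_type", "vib",
    ], check=True)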

Plotting

To plot the results, modify plots.py by changing the path variable, as well as the experiments dictionary, to specify which subfolders of path you would like to plot.
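As a rough sketch of the kind of edit involved (the exact variable structure is defined in the script itself; folder names follow the --model flags above):

# Hypothetical edit to torch_rl's plots.py; check the script for the real structure.
path = "~/results"  # wherever TORCH_RL_STORAGE points
experiments = {
    "IBAC": ["N3r-vib1e6"],       # label -> list of subfolders of path
    "IBAC-SNI": ["N3r-vibS1e6"],
}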

Coinrun

Please follow the installation instructions below, taken from the original repo, to install the requirements:

# Linux
apt-get install mpich build-essential qt5-default pkg-config
# Mac
brew install qt open-mpi pkg-config

cd coinrun
pip install tensorflow==1.12.0  # or tensorflow-gpu
pip install -r requirements.txt
pip install -e .

Also, set the self.WORKDIR and self.TB_DIR variables in coinrun/coinrun/config.py.
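For instance, inside the config class's __init__ (both paths below are placeholders; pick locations with enough disk space):

self.WORKDIR = '/path/to/coinrun/workdir'     # placeholder path
self.TB_DIR = '/path/to/coinrun/tensorboard'  # placeholder path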

Reproducing Results

To reproduce the results, run the following on an NC24 VM with 4 GPUs (three will be used for training, one for testing):

RCALL_NUM_GPU=4 mpiexec -n 4 python3 -m coinrun.train_agent --run-id baseline --num-levels 500 --test --long --l2 0.0001 -uda 1
RCALL_NUM_GPU=4 mpiexec -n 4 python3 -m coinrun.train_agent --run-id ibac-sni-lambda0.5 --num-levels 500 --test --long --l2 0.0001 -uda 1 --beta 0.0001 --nr-samples 12 --sni
RCALL_NUM_GPU=4 mpiexec -n 4 python3 -m coinrun.train_agent --run-id ibac-sni-lambda1.0 --num-levels 500 --test --l2 0.0001 -uda 1 --beta-l2a 0.0001 --long
RCALL_NUM_GPU=4 mpiexec -n 4 python3 -m coinrun.train_agent --run-id ibac --num-levels 500 --test --long --l2 0.0001 -uda 1 --beta 0.0001 --nr-samples 12
RCALL_NUM_GPU=4 mpiexec -n 4 python3 -m coinrun.train_agent --run-id dropout0.2-sni-lambda0.5 --num-levels 500 --test --long --l2 0.0001 -uda 1 --dropout 0.2 --sni2
RCALL_NUM_GPU=4 mpiexec -n 4 python3 -m coinrun.train_agent --run-id dropout0.2-sni-lambda1.0 --num-levels 500 --test --long --l2 0.0001 -uda 1 --dropout 0.2 --openai
RCALL_NUM_GPU=4 mpiexec -n 4 python3 -m coinrun.train_agent --run-id dropout0.2 --num-levels 500 --test --long --l2 0.0001 -uda 1 --dropout 0.2
RCALL_NUM_GPU=4 mpiexec -n 4 python3 -m coinrun.train_agent --run-id batchnorm --num-levels 500 --test --long --l2 0.0001 -uda 1 -norm 1

All runs include weight decay (--l2 0.0001) and data augmentation (-uda 1). Batchnorm is enabled with -norm 1, Dropout with --dropout 0.2, VIB with --beta 0.0001, and L2 on activations with --beta-l2a 0.0001, which corresponds to VIB-SNI with lambda=1. For dropout, SNI with lambda=0.5 is selected with --sni2, and lambda=1.0 with --openai.

The experiments, especially with the --long flag, take a while. If run on the VMs, training will likely crash at some point (around 6pm is particularly likely), probably because the servers are preemptible. If a run crashes, you can restart it with the additional arguments --restore-id <run-id> and --restore-step <step>, where the step can be read from the TensorBoard plot.
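For example, resuming the baseline run might look like this (the step value is a placeholder read from TensorBoard):

RCALL_NUM_GPU=4 mpiexec -n 4 python3 -m coinrun.train_agent --run-id baseline --num-levels 500 --test --long --l2 0.0001 -uda 1 --restore-id baseline --restore-step <step>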

Plotting

A note on the TensorBoard plots: for each run you will see 4 different folders 'name_0', 'name_1', etc. The 'name_0' version is the performance on the training set; the 'name_1' version is the performance on the test set. Furthermore, to compare with the paper you'll need to multiply the number of frames by 3, as TensorBoard reports the frames per worker, whereas the paper reports the total number of frames used for training.
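If you post-process the logs programmatically, the rescaling might look like the following sketch (the scalar tag name is an assumption; inspect your own logs for the actual tags):

from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

ea = EventAccumulator("path/to/run-id_1")    # '_1' folders hold the test-set curves
ea.Reload()
events = ea.Scalars("eprewmean")             # hypothetical tag name; list tags via ea.Tags()
total_frames = [3 * e.step for e in events]  # 3 training workers -> total frames
returns = [e.value for e in events]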

Using plots.py, fill in the path variable, as well as plotname, plotname_kl and the experiments dictionary, where each entry corresponds to one line that will be the average over all run-ids listed in the corresponding list (see the script for examples).
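As with the gridworld plotting, a rough sketch of the kind of edit involved (labels and values assumed from the run commands above; check the script for the real structure):

path = "/path/to/coinrun/tensorboard"  # same location as self.TB_DIR
plotname = "coinrun-return"            # placeholder output name
plotname_kl = "coinrun-kl"             # placeholder output name
experiments = {
    "Baseline": ["baseline"],          # label -> list of --run-id values averaged into one line
    "IBAC-SNI lambda=0.5": ["ibac-sni-lambda0.5"],
}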


ibac-sni's Issues

Diverging results from those in the paper

Hi,
I ran the commands for IBAC, IBAC-SNI and NoReg from the Readme twice, for 100 million steps each, and I got the results below.
Do you have an idea why
(a) my return for IBAC is so much higher than in the paper and
(b) the return for IBAC-SNI is so low?

I know that two training runs aren't much, but at least for the two IBAC variants the variance was almost non-existent.
Sincerely, Marko

Not really an issue, more a question

Hi,
I want to reuse your MiniGrid experiment as a benchmark in my paper on RL generalisation. It fits nicely, but I am not clear on how to replicate the experiment that generates the orange line in your paper. Can you provide some insight?
Are you running the training on 2,000,000 environments to generate the chart?
Thanks a lot in advance.

How to use Bottleneck layer?

Hello, thanks for sharing this impressive work. I would be more than glad if you could explain how to use the proposed Bottleneck layer, as it is a bit difficult to understand from the current code/paper, e.g. where should the Bottleneck layer be placed in a deep learning architecture? It would be great to include a small toy example that demonstrates the usage. Sincerely, Kamer
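As a rough sketch of the general recipe described in the paper (encoder features mapped to a Gaussian latent z, with a KL penalty scaled by beta); this is illustrative only, not the authors' exact implementation:

import torch
import torch.nn as nn

class Bottleneck(nn.Module):
    """VIB-style stochastic layer: h -> z ~ N(mu(h), sigma(h)^2)."""
    def __init__(self, in_dim, z_dim):
        super().__init__()
        self.mu = nn.Linear(in_dim, z_dim)
        self.log_var = nn.Linear(in_dim, z_dim)

    def forward(self, h):
        mu, log_var = self.mu(h), self.log_var(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * log_var)  # reparameterization trick
        # KL(N(mu, sigma^2) || N(0, I)); add beta * kl to the training loss
        kl = 0.5 * (mu.pow(2) + log_var.exp() - log_var - 1).sum(dim=1).mean()
        return z, kl

# Typical placement: between the (convolutional) encoder and the policy/value
# heads, so the heads consume the noisy z instead of the deterministic features.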

The agent learns nothing in the Grid-world environment

Hi,
Thanks for your great work.
I'm facing a problem with the visualized performance of the agent in the Grid-world environment.
I trained the agent successfully.
However, using visualize.py I found that the agent has learned nothing in the Grid-world environment.
The agent chooses its actions randomly. It never succeeds!

  1. I ran python -m scripts.train --frames 100000000 --algo ppo --env MiniGrid-MultiRoom-N3r-v0 --model N3r-vib1e6 --save-interval 100 --tb --fullObs --model_type default2 --use_bottleneck --beta 0.000001
    and got the model parameters in the folder /storage/N3r-vib1e6
  2. I ran python -m scripts.train --frames 100000000 --algo ppo --env MiniGrid-MultiRoom-N3r-v0 --model N3r-vibS1e6 --save-interval 100 --tb --fullObs --model_type default2 --use_bottleneck --beta 0.000001 --sni_type vib
    and got the model parameters in the folder /storage/N3r-vibS1e6
  3. I ran plots.py
    and got the picture below:
    Figure_1
  4. We can see that the return in the plot from step 3 increases monotonically.
    However, when I run visualize.py for N3r-vib1e6 and N3r-vibS1e6 respectively, the agent appears to have learned nothing in the Grid-world environment: it chooses its actions randomly and never succeeds.

Can you give some suggestions?

Thanks a lot!

Evaluate and visualize scripts don't seem to work

Hi there,
I've run into a problem running the evaluate.py and visualize.py scripts (using models trained on the gridworld).

Maybe it is a misunderstanding on my side about how these scripts are meant to be used, but they don't seem to be compatible with the kind of ACModel (IBAC-SNI/torch_rl/model.py) saved after training, as it doesn't contain the required attributes (e.g. recurrent) or methods (e.g. forward(...)).

Steps to reproduce:
1: Train a gridworld model as stated in the readme:

python -m scripts.train --frames 100000000 --algo ppo --env MiniGrid-MultiRoom-N3r-v0 --model N3r-vib1e6 --save-interval 100 --tb --fullObs --model_type default2 --use_bottleneck --beta 0.000001

2: Run the evaluation script on the result folder

python -m scripts.evaluate --env MiniGrid-MultiRoom-N2r-v0 --model N3r-vib1e6

Error message:

Traceback (most recent call last):
  File "/home/andreas/miniconda2/envs/ibac/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/andreas/miniconda2/envs/ibac/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/andreas/research/others/IBAC-SNI/torch_rl/scripts/evaluate.py", line 51, in <module>
    agent = utils.Agent(args.env, env.observation_space, model_dir, args.argmax, args.procs)
  File "/home/andreas/research/others/IBAC-SNI/torch_rl/utils/agent.py", line 17, in __init__
    if self.acmodel.recurrent:
  File "/home/andreas/miniconda2/envs/ibac/lib/python3.7/site-packages/torch/nn/modules/module.py", line 576, in __getattr__
    type(self).__name__, name))
AttributeError: 'ACModel' object has no attribute 'recurrent'

Thanks for your help!

How to draw figure for coinrun

Hello,

I'm wondering how to draw a figure like Fig. 3 in the paper with your code (without rewriting the enjoy.py script). Or does the original code already support this and I somehow missed it? Could you please point it out?

Thanks
