
3D Control and Reasoning without a Supercomputer

A collection of scenarios and efficient benchmarks for the ViZDoom RL environment. For further details, refer to the paper: Deep Reinforcement Learning on a Budget: 3D Control and Reasoning Without a Supercomputer (arXiv:1904.01806).

This repository includes:

  • Source code for generation of custom scenarios for the ViZDoom simulator
  • Source code for training new agents with the GPU-batched A2C algorithm
  • Detailed instructions on how to evaluate pretrained agents and train new ones
  • Example videos of agent rollouts

Contents

Installation

Requirements

  • Ubuntu 16.04+ (there is no reason this should not work on macOS or Windows, but it has not been tested)
  • Python 3.5+
  • PyTorch 0.4.0+
  • ViZDoom 1.1.4 (if evaluating a pretrained model; otherwise the latest version should be fine)

Instructions

  1. ViZDoom has many dependencies, which are described on their site; make sure to install the ZDoom dependencies.
  2. Clone this repo.
  3. Assuming you are using a venv, activate it and install the packages listed in requirements.txt.
  4. Test the installation with the following command, which should train an agent for 100,000 frames in the basic health gathering scenario:
    python 3dcdrl/train_agent.py --num_frames 100000
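
End to end, the steps look roughly like this (a sketch: the repository URL, the venv name, and the exact ZDoom dependency list are assumptions; see the ViZDoom documentation for the authoritative list):

# Install ZDoom build dependencies (Ubuntu)
sudo apt-get install build-essential zlib1g-dev libsdl2-dev libjpeg-dev \
     nasm tar libbz2-dev libgtk2.0-dev cmake git libfluidsynth-dev \
     libgme-dev libopenal-dev timidity libwildmidi-dev unzip

# Clone the repo and set up a virtual environment
git clone https://github.com/edbeeching/3d_control_deep_rl.git
cd 3d_control_deep_rl
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

# Smoke test: 100,000 frames on basic health gathering
python 3dcdrl/train_agent.py --num_frames 100000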

Note: if you want to train this agent to convergence, it takes between 5M and 10M frames.

Custom scenario generation

As detailed in the paper, there are a number of scenarios. We include a script, generate_scenarios.sh, in the repo that will generate the following scenarios:

  • Labyrinth: Sizes 5, 7, 9, 11, 13
  • Find and return: Sizes 5, 7, 9, 11, 13
  • K-Item: 2, 4, 6, 8 items
  • Two color correlation: 10%, 30%, 50%, and 70% of walls retained

Generation takes around 10 minutes, so grab a coffee. If you wish to generate scenarios for only one type, take a look at the script; it should be clear what you need to change. A minimal invocation is sketched below.
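
Assuming the script is run from the repository root (the expected working directory is an assumption; check the paths inside the script):

# Generates labyrinth, find-and-return, k-item and two-color scenarios
# under 3dcdrl/scenarios/custom_scenarios/
bash generate_scenarios.sh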

Evaluating pretrained agents and training new ones on the scenarios

We include pretrained models in the repo that you can test out, or you can train your own agents from scratch. The evaluation code will output example rollouts for all 64 test scenarios.

Labyrinth

Evaluation:

SIZE=9
python 3dcdrl/create_rollout_videos.py --limit_actions \
       --scenario_dir 3dcdrl/scenarios/custom_scenarios/labyrinth/$SIZE/test/ \
       --scenario custom_scenario{:003}.cfg --model_checkpoint \
       3dcdrl/saved_models/labyrinth_${SIZE}_checkpoint_0198658048.pth.tar \
       --multimaze --num_mazes_test 64
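
To render rollouts for every maze size in one go, the same command can be wrapped in a shell loop (a sketch; it assumes checkpoints for all five sizes are present under 3dcdrl/saved_models/ with the naming pattern above):

for SIZE in 5 7 9 11 13; do
    python 3dcdrl/create_rollout_videos.py --limit_actions \
           --scenario_dir 3dcdrl/scenarios/custom_scenarios/labyrinth/$SIZE/test/ \
           --scenario custom_scenario{:003}.cfg --model_checkpoint \
           3dcdrl/saved_models/labyrinth_${SIZE}_checkpoint_0198658048.pth.tar \
           --multimaze --num_mazes_test 64
done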

Training:

SIZE=9
python 3dcdrl/train_agent.py --scenario custom_scenario{:003}.cfg \
        --limit_actions \
        --scenario_dir 3dcdrl/scenarios/custom_scenarios/labyrinth/$SIZE/train/ \
        --test_scenario_dir 3dcdrl/scenarios/custom_scenarios/labyrinth/$SIZE/test/ \
        --multimaze --num_mazes_train 256 --num_mazes_test 64 --fixed_scenario

Find and return

Evaluation:

SIZE=9
python 3dcdrl/create_rollout_videos.py --limit_actions \
       --scenario_dir 3dcdrl/scenarios/custom_scenarios/find_return/$SIZE/test/ \
       --scenario custom_scenario{:003}.cfg --model_checkpoint \
       3dcdrl/saved_models/find_return_${SIZE}_checkpoint_0198658048.pth.tar \
       --multimaze --num_mazes_test 64

Training:

SIZE=9
python 3dcdrl/train_agent.py --scenario custom_scenario{:003}.cfg \
        --limit_actions \
        --scenario_dir 3dcdrl/scenarios/custom_scenarios/find_return/$SIZE/train/ \
        --test_scenario_dir 3dcdrl/scenarios/custom_scenarios/find_return/$SIZE/test/ \
        --multimaze --num_mazes_train 256 --num_mazes_test 64 --fixed_scenario

K-item

Evaluation:

NUM_ITEMS=4
python 3dcdrl/create_rollout_videos.py --limit_actions \
       --scenario_dir 3dcdrl/scenarios/custom_scenarios/kitem/$NUM_ITEMS/test/ \
       --scenario custom_scenario{:003}.cfg --model_checkpoint \
       3dcdrl/saved_models/${NUM_ITEMS}item_checkpoint_0198658048.pth.tar \
       --multimaze --num_mazes_test 64

Training:

NUM_ITEMS=4
python 3dcdrl/train_agent.py --scenario custom_scenario{:003}.cfg \
        --limit_actions \
        --scenario_dir 3dcdrl/scenarios/custom_scenarios/kitem/$NUM_ITEMS/train/ \
        --test_scenario_dir 3dcdrl/scenarios/custom_scenarios/kitem/$NUM_ITEMS/test/ \
        --multimaze --num_mazes_train 256 --num_mazes_test 64 --fixed_scenario

Two color correlation

Evaluation:

DIFFICULTY=3
python 3dcdrl/create_rollout_videos.py --limit_actions \
       --scenario_dir 3dcdrl/scenarios/custom_scenarios/two_color/$DIFFICULTY/test/ \
       --scenario custom_scenario{:003}.cfg --model_checkpoint \
       3dcdrl/saved_models/two_col_p${DIFFICULTY}_checkpoint_0198658048.pth.tar \
       --multimaze --num_mazes_test 64

Training:

DIFFICULTY=3
python 3dcdrl/train_agent.py --scenario custom_scenario{:003}.cfg \
        --limit_actions \
        --scenario_dir 3dcdrl/scenarios/custom_scenarios/two_color/$DIFFICULTY/train/ \
        --test_scenario_dir 3dcdrl/scenarios/custom_scenarios/two_color/$DIFFICULTY/test/ \
        --multimaze --num_mazes_train 256 --num_mazes_test 64 --fixed_scenario

FAQ

Why is my FPS 4x lower than in your paper?

In the paper, frames per second is reported in terms of environment frames rather than agent observations: the agents are trained with a frame skip of 4, meaning the same action is repeated for 4 consecutive frames after each observation. Every agent interaction therefore corresponds to 4 environment frames, so a training rate of, say, 1,000 observations per second matches 4,000 FPS in the paper's terms.

I have limited memory, is there anything I can do?

Yes. We trade increased memory usage for higher throughput; you can reduce the memory footprint by omitting --fixed_scenario from the command-line arguments. You will see roughly a 10% drop in efficiency.
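
For example, the Labyrinth training command from above, unchanged except that the flag is dropped:

SIZE=9
python 3dcdrl/train_agent.py --scenario custom_scenario{:003}.cfg \
        --limit_actions \
        --scenario_dir 3dcdrl/scenarios/custom_scenarios/labyrinth/$SIZE/train/ \
        --test_scenario_dir 3dcdrl/scenarios/custom_scenarios/labyrinth/$SIZE/test/ \
        --multimaze --num_mazes_train 256 --num_mazes_test 64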

Citation

If you find this useful, consider citing the following:

@article{DBLP:journals/corr/abs-1904-01806,
  author    = {Edward Beeching and
               Christian Wolf and
               Jilles Dibangoye and
               Olivier Simonin},
  title     = {Deep Reinforcement Learning on a Budget: 3D Control and Reasoning
               Without a Supercomputer},
  journal   = {CoRR},
  volume    = {abs/1904.01806},
  year      = {2019},
  url       = {http://arxiv.org/abs/1904.01806},
  archivePrefix = {arXiv},
  eprint    = {1904.01806},
  timestamp = {Sat, 27 Apr 2019 15:09:06 +0200},
  biburl    = {https://dblp.org/rec/bib/journals/corr/abs-1904-01806},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}
