korymath / atari

This project forked from kaixhin/atari

Persistent advantage learning dueling double DQN for the Arcade Learning Environment.

License: MIT License

atari's Introduction

Work In Progress: Crossed out items have been partially implemented.

Prioritised experience replay [1] persistent advantage learning [2] bootstrapped [3] dueling [4] double [5] deep recurrent [6] Q-network [7] for the Arcade Learning Environment [8] (and custom environments). Or PERPALB(triple-D)RQN for short...

Additional asynchronous agents [9]:

  • One-step Sarsa
  • One-step Q-learning
  • N-step Q-learning
  • Advantage actor-critic

Run th main.lua to run headless, or qlua main.lua to display the game. The main options are -game to choose the ROM (see the ROM directory for more details) and -mode as either train or eval. Saliency maps [10] can be visualised, optionally using guided [11] or "deconvnet" [12] backpropagation. Saliency map modes are applied at runtime, so they can also be applied retrospectively to saved models.

To run experiments based on hyperparameters specified in the individual papers, use ./run.sh <paper> <game> <args>. <args> can be used to overwrite arguments specified earlier (in the script); for more details see the script itself. By default the code trains on a demo environment called Catch - use ./run.sh demo to run the demo with good default parameters. Note that this code uses CUDA by default if available, but the Catch network is small enough that it runs faster on CPU.

In training mode, quitting with Ctrl+C is caught and you will be asked whether you would like to save the agent. Note that for non-asynchronous agents the saved agent includes the experience replay memory, totalling ~7GB. The main script also automatically saves the last weights (last.weights.t7) and the weights of the best-performing DQN according to the average validation score (best.weights.t7).

In evaluation mode you can create recordings with -record true (requires FFmpeg); this does not require using qlua. Recordings will be stored in the videos directory.

Requirements

Requires Torch7, and uses CUDA if available. Also requires the following extra luarocks packages:

  • luaposix
  • moses
  • logroll
  • classic
  • torchx
  • rnn
  • dpnn
  • nninit
  • tds
  • xitari
  • alewrap
  • rlenvs

xitari, alewrap and rlenvs can be installed using the following commands:

luarocks install https://raw.githubusercontent.com/lake4790k/xitari/master/xitari-0-0.rockspec
luarocks install https://raw.githubusercontent.com/Kaixhin/alewrap/master/alewrap-0-0.rockspec
luarocks install https://raw.githubusercontent.com/Kaixhin/rlenvs/master/rocks/rlenvs-scm-1.rockspec

Custom

You can use a custom environment (as the path to a Lua file/rlenvs-namespaced environment) using -env, as long as the class returned respects the rlenvs API. One restriction is that the state must be represented as a single tensor (with arbitrary dimensionality), and only a single discrete action must be returned. To prevent massive memory consumption for agents that use experience replay memory, states are discretised to integers ∈ [0, 255], assuming the state is comprised of reals ∈ [0, 1] - this can be disabled with -discretiseMem false. Visual environments can make use of explicit -height, -width and -colorSpace options to perform preprocessing for the network.
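As a rough illustration of the discretisation described above, the sketch below maps reals in [0, 1] to integers in [0, 255] and back. It is not the project's implementation: plain Lua tables stand in for Torch tensors, the names LEVELS, discretise, restore and discretiseState are hypothetical, and rounding to the nearest level is an assumption.

```lua
-- Hypothetical helpers; plain Lua tables stand in for Torch tensors.
local LEVELS = 255 -- highest integer level, giving the range [0, 255]

-- Map a real in [0, 1] to an integer level in [0, 255].
-- Rounding to the nearest level is an assumption of this sketch.
local function discretise(x)
  assert(x >= 0 and x <= 1, 'state values are assumed to lie in [0, 1]')
  return math.floor(x * LEVELS + 0.5)
end

-- Map a stored integer level back to an approximate real in [0, 1].
local function restore(level)
  return level / LEVELS
end

-- Discretise every element of a flat state.
local function discretiseState(state)
  local out = {}
  for i, x in ipairs(state) do
    out[i] = discretise(x)
  end
  return out
end

local stored = discretiseState({0, 0.25, 0.5, 1})
print(table.concat(stored, ' ')) -- prints: 0 64 128 255
```

Each stored value then fits in a single byte, at the cost of a reconstruction error of at most half a level (about 0.002). A state whose values fall outside [0, 1] would not survive this mapping, which is the case where -discretiseMem false is useful.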

If the environment behaves differently during training and testing, it should also implement training and evaluate methods; otherwise these will be added as empty methods at runtime. The environment can also implement a getDisplay method (with a mandatory getDisplaySpec method for determining screen size), which will be used for displaying the screen and computing saliency maps. getDisplay must return an RGB (3D) tensor; it can be used even if the state is not an image, although saliency can only be computed for states that are images. getDisplay must be implemented to get a visual display or saliency maps. The -zoom factor can be used to increase the size of small displays.
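A minimal sketch of what such an environment file might look like is given below. It assumes Torch7 and the classic package from the requirements list. LineWalk is a made-up toy task, and the size option and the observe helper are hypothetical. The method names start, step, getStateSpec, getActionSpec and getRewardSpec, and the {type, dimensions, range} spec format, reflect one reading of the rlenvs API and may differ between rlenvs versions, so check them against the rlenvs documentation before relying on them.

```lua
local classic = require 'classic'
local torch = require 'torch'

-- 'LineWalk' is a made-up toy task: walk right along a line to reach the goal.
local LineWalk = classic.class('LineWalk')

function LineWalk:_init(opts)
  opts = opts or {}
  self.size = opts.size or 8 -- hypothetical option: length of the line
  self.position = 1
end

-- State: a single real-valued tensor of size 1 x size, with values in [0, 1]
function LineWalk:getStateSpec()
  return {'real', {1, self.size}, {0, 1}}
end

-- Action: a single discrete action, 0 (left) or 1 (right)
function LineWalk:getActionSpec()
  return {'int', 1, {0, 1}}
end

-- Minimum and maximum reward
function LineWalk:getRewardSpec()
  return 0, 1
end

-- Hypothetical helper: one-hot encoding of the current position
function LineWalk:observe()
  local state = torch.FloatTensor(1, self.size):zero()
  state[1][self.position] = 1
  return state
end

-- Resets the episode and returns the first observation
function LineWalk:start()
  self.position = 1
  return self:observe()
end

-- Applies an action; returns reward, next observation and terminal flag
function LineWalk:step(action)
  local delta = (action == 1) and 1 or -1
  self.position = math.max(1, math.min(self.size, self.position + delta))
  local terminal = self.position == self.size
  local reward = terminal and 1 or 0
  return reward, self:observe(), terminal
end

-- Screen size for display: RGB, 1 pixel high, size pixels wide
function LineWalk:getDisplaySpec()
  return {'real', {3, 1, self.size}, {0, 1}}
end

-- RGB (3D) tensor built by repeating the state across three channels
function LineWalk:getDisplay()
  return torch.repeatTensor(self:observe(), 3, 1, 1)
end

return LineWalk
```

The sketch leaves out training and evaluate because this toy task behaves the same in both modes. With a display only 1 x 8 pixels in size, the -zoom option would be needed to make it visible.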

You can also use a custom model (body) with -modelBody, which replaces the usual DQN convolutional layers with a custom Torch neural network (as the path to a Lua file). The model will receive a stack of the previous states (as determined by -histLen), and must reshape them manually if needed. The DQN "heads" will then be constructed as normal, with -hiddenSize used to change the size of the fully connected layer if needed.

For an example on a GridWorld environment, run ./run.sh demo-grid - the demo also works with qlua and experience replay agents. The custom environment and network can be found in the examples folder.

Acknowledgements

  • @GeorgOstrovski for confirmation on network usage in advantage operators + note on interaction with Double DQN.
  • @schaul for clarifications on prioritised experience replay + dueling DQN hyperparameters.

References

[1] Prioritized Experience Replay
[2] Increasing the Action Gap: New Operators for Reinforcement Learning
[3] Deep Exploration via Bootstrapped DQN
[4] Dueling Network Architectures for Deep Reinforcement Learning
[5] Deep Reinforcement Learning with Double Q-learning
[6] Deep Recurrent Q-Learning for Partially Observable MDPs
[7] Playing Atari with Deep Reinforcement Learning
[8] The Arcade Learning Environment: An Evaluation Platform for General Agents
[9] Asynchronous Methods for Deep Reinforcement Learning
[10] Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps
[11] Striving for Simplicity: The All Convolutional Net
[12] Visualizing and Understanding Convolutional Networks
