moabitcoin / cherry-pytorch
Reinforcement Learning Tutorials & other bedtime stories in PyTorch
License: MIT License
As default pre-processing
Adapt agents to the following API proposal
Currently we don't have push rights to moabitcoin/cherry-torch-<cpu/gpu>. It's recommended to build from source. We want to support Docker pull/run.
Currently we use scikit-video to write env states to video for visualising the results. We want to move to gym.wrappers.Monitor to do the same.
Sample code from here
Atari
We want to port learnings from Atari + DQN into Agent of Doom
Currently all agents train with logging.INFO enabled. We'd like to expose a user-configured logging level from the CLI:
import argparse

parser = argparse.ArgumentParser('Train an RL Agent to'
                                 ' play Atari Game (DQN)')
parser.add_argument('-x', dest='config_file', type=str,
                    help='Config for the Atari env/agent', required=True)
parser.add_argument('-d', dest='device', choices=['gpu', 'cpu'],
                    help='Device to run the train/test', default='gpu')
parser.add_argument('-v', dest='verbosity',
                    choices=['info', 'debug', 'warning', 'error'],
                    help='Logging verbosity level', default='info')
Using instructions here, modify DoomEnvironment to mimic the OpenAI gym wrapper API.
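A sketch of the gym-like shape DoomEnvironment could expose (reset() returning a state, step() returning the usual 4-tuple). The wrapper class name is hypothetical; the `game` methods mirror ViZDoom's DoomGame interface, and a tiny fake game stands in so the sketch runs without ViZDoom installed:

```python
class GymStyleDoomEnvironment:
    """gym-like API: reset() -> state, step(a) -> (state, reward, done, info).

    `game` is any object exposing new_episode(), get_state(), make_action(a)
    and is_episode_finished() -- the shape of ViZDoom's DoomGame.
    """

    def __init__(self, game):
        self.game = game

    def reset(self):
        self.game.new_episode()
        return self.game.get_state()

    def step(self, action):
        reward = self.game.make_action(action)
        done = self.game.is_episode_finished()
        state = None if done else self.game.get_state()
        return state, reward, done, {}


class FakeGame:
    """Stand-in for DoomGame: 2-step episodes, reward 1.0 per step."""
    def __init__(self):
        self.t = 0
    def new_episode(self):
        self.t = 0
    def get_state(self):
        return self.t
    def make_action(self, action):
        self.t += 1
        return 1.0
    def is_episode_finished(self):
        return self.t >= 2

env = GymStyleDoomEnvironment(FakeGame())
state = env.reset()                       # state == 0
state, reward, done, info = env.step(0)   # (1, 1.0, False, {})
```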
Implement Policy Gradients & Actor-Critic learning for ALE (Arcade Learning Environment).
Currently we don't track the code version alongside the model version, which can make results impossible to reproduce. We'd like to:
- use the git python package to get the commit tag
- use the git commit tag as a prefix when saving model files
- write the git commit tag to logs
- write git commit tags to the yaml config files at exp_dir
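One way to sketch the commit-tag prefix (using a subprocess call to the git CLI rather than the git python package mentioned above; the filename pattern and fallback value are illustrative):

```python
import subprocess

def git_commit_tag(default='nogit'):
    """Return the short git commit hash, or `default` outside a repo."""
    try:
        out = subprocess.run(['git', 'rev-parse', '--short', 'HEAD'],
                             capture_output=True, text=True, check=True)
        return out.stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        return default

def model_filename(name, step):
    """Prefix saved model files with the commit tag for reproducibility."""
    return f'{git_commit_tag()}-{name}-{step:06d}.pt'
```

The same tag string can be written to logs and dumped into the yaml config at exp_dir when an experiment starts.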
Currently DoomEnvironment gets its default settings from the *.cfg/*.wad files. The default rewards from DoomGame do not account for any change to the reward based on changes to a GameVariable.

We would like DoomEnvironment to be configurable via a yaml config file, to account for reward changes based on the available GameVariables. For example, include GameVariable.AMMO in basic.cfg to budget weapon fire, and GameVariable.HEALTH in health_gathering.cfg to incentivise gathering medkits.
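The yaml-driven shaping could boil down to a weighted delta per tracked GameVariable, added on top of the env reward. A minimal sketch as a pure function (the weight values are illustrative; in ViZDoom the raw values would come from game.get_game_variable(...)):

```python
# Hypothetical per-GameVariable shaping weights, as they might appear in yaml.
REWARD_WEIGHTS = {'HEALTH': 0.1, 'AMMO2': -0.05}  # illustrative values

def shaped_reward(base_reward, prev_vars, curr_vars, weights=REWARD_WEIGHTS):
    """Add a weighted delta of each tracked GameVariable to the env reward."""
    bonus = sum(w * (curr_vars[k] - prev_vars[k])
                for k, w in weights.items())
    return base_reward + bonus

# A medkit pickup (+25 health) nudges the reward up by 0.1 * 25 = 2.5
r = shaped_reward(1.0, {'HEALTH': 50, 'AMMO2': 10},
                  {'HEALTH': 75, 'AMMO2': 10})   # 3.5
```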
Currently we only support using cherry within a conda environment. We'd like to add support for building/running a Docker image. The following functionalities are handy
cherry
Sample setup from here
We need an explanation of the design principles and documentation of agents/model architectures/environments.
We want to port the following changes from DQN into DDQN: wrapper_deepmind for Atari games. Bring in learnings from Atari into Doom for faster learning.
We currently train Atari models with DQN + DDQN. VPG (Vanilla Policy Gradient) has been shown to be a better-structured agent (as tested on Control problems / Health gathering in Doom).
Generate a model + hyper-params which can solve the Breakout Atari retro game.
We want to support Trust Region Policy Optimisation (TRPO) and Proximal Policy Optimisation (PPO).
Currently at playtime the gameplay + actions are written to a video file. We'd like an option to view the current gameplay rendered in real time for debugging.
Use the PyTorch DataLoader class to read from the buffer/replay memory rather than iterating sequentially over the batch.
Implement ideas from here
Currently we keep 2 * 4 frame states in the buffer; this can be reduced to 5, with the first 4 frames forming the current state and the last 4 the next state.
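The overlap trick above can be sketched with a single 5-slot queue; the two 4-frame stacks are just offset views of it (function and constant names are illustrative):

```python
from collections import deque
from itertools import islice

FRAME_HISTORY = 4

# One queue of 5 frames is enough: frames [0:4] form the current state,
# frames [1:5] the next state, instead of storing two 4-frame stacks.
frames = deque(maxlen=FRAME_HISTORY + 1)

def push(frame):
    frames.append(frame)

def current_and_next_state():
    assert len(frames) == FRAME_HISTORY + 1
    current = list(islice(frames, 0, FRAME_HISTORY))
    nxt = list(islice(frames, 1, FRAME_HISTORY + 1))
    return current, nxt

for f in range(5):       # integers 0..4 stand in for preprocessed screens
    push(f)
cur, nxt = current_and_next_state()  # cur == [0,1,2,3], nxt == [1,2,3,4]
```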
Use TensorboardX for monitoring training / validation session
Adapt DoomEnvironment and the yaml config files to solve the health_gathering scenario.
Currently the codebase is organised as follows
├── ddqn
│   ├── atari
│   └── doom
├── dqn
│   ├── atari
│   └── doom
├── policy_gradients
│   ├── atari
│   ├── classic_control
│   └── doom
├── q-learning
└── utils
Rather than running via entry scripts as we do currently, we'd like a cli: bin/cherry. We'd like to re-organise as follows
├── agents
│   ├── dqn
│   ├── ddqn
│   ├── q_learning
│   └── policy_gradient
├── envs
│   ├── atari
│   ├── doom
│   └── classic_control
├── configs
│   ├── doom.yaml
│   └── atari.yaml
├── utils
└── bin/cherry
And run as follows
bin/cherry train -cfg configs/doom.yaml
bin/cherry play -cfg configs/doom.yaml
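The train/play invocations above suggest argparse sub-commands; a minimal sketch of what bin/cherry could look like (the `-cfg` flag name follows the usage above, the rest is an assumption):

```python
import argparse

def build_cli():
    """Sketch of the proposed bin/cherry CLI with train/play sub-commands."""
    parser = argparse.ArgumentParser(prog='cherry')
    sub = parser.add_subparsers(dest='command', required=True)
    for command in ('train', 'play'):
        cmd = sub.add_parser(command)
        cmd.add_argument('-cfg', dest='config_file', required=True,
                         help='Path to the env/agent yaml config')
    return parser

args = build_cli().parse_args(['train', '-cfg', 'configs/doom.yaml'])
```

Each sub-command would then dispatch to the matching agent entry point.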
Provide convenient download link for pre-trained models with performance/evaluation guarantees.
As default Frame Queue
Currently we include null_state to infer the terminal or game over state. This can be factored out by using (1 - done) * gamma * Q.
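The factoring works because multiplying the bootstrap term by (1 - done) zeroes it at episode end, so no sentinel null_state needs to live in the buffer. A minimal sketch of the TD target (names are illustrative):

```python
GAMMA = 0.99  # illustrative discount factor

def td_target(reward, max_next_q, done):
    """(1 - done) drops the bootstrap term on terminal transitions,
    removing the need for a sentinel null_state in the replay buffer."""
    return reward + (1 - done) * GAMMA * max_next_q

mid = td_target(1.0, 2.0, done=0)   # 1.0 + 0.99 * 2.0 = 2.98
end = td_target(1.0, 2.0, done=1)   # bootstrap dropped: 1.0
```

In batched PyTorch code the same expression applies element-wise to reward, done and max-Q tensors.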
DQN & DDQN don't perform well for large action-space problems like deadly_corridor & health_gathering. For these we want to explore