rl_games's Introduction

Basic RL Algorithms Implementations

Starcraft 2 Multiple Agents Results with PPO (https://github.com/oxwhirl/smac)

  • Every agent was controlled independently and had restricted information
  • All the environments were trained with a default difficulty level 7
  • No curriculum, just baseline PPO
  • Full state information wasn't used for the critic; the actor and the critic received the same agent observations
  • Most results have a significantly higher win rate, and were trained on a single PC much faster, than QMIX (https://arxiv.org/pdf/1902.04043.pdf), MAVEN (https://arxiv.org/pdf/1910.07483.pdf) or QTRAN
  • No hyperparameter search
  • 4 frames + conv1d actor-critic network
  • The number of mini-epochs was set to 1; higher values didn't work
  • Simple MLP networks did not work well on hard envs

Watch the video

How to run configs:

PyTorch

  • python runner.py --train --file rl_games/configs/smac/3m_torch.yaml
  • python runner.py --play --file rl_games/configs/smac/3m_torch.yaml --checkpoint 'nn/3m_cnn'

TensorFlow

  • python runner.py --tf --train --file rl_games/configs/smac/3m_torch.yaml
  • python runner.py --tf --play --file rl_games/configs/smac/3m_torch.yaml --checkpoint 'nn/3m_cnn'
  • tensorboard --logdir runs

Results on some environments:

  • 2m_vs_1z took about 2 minutes to achieve a 100% win rate (WR)
  • corridor took about 2 hours to reach 95+% WR
  • MMM2 took 4 hours to reach 90+% WR
  • 6h_vs_8z got 82% WR after 8 hours of training
  • 5m_vs_6m got 72% WR after 8 hours of training

Plots:

FPS in these plots is calculated on a per-environment basis, except for MMM2, where it was scaled by the number of agents (10). To get the win rate per number of environment steps, as used in the plots of the QMIX, MAVEN, QTRAN or Deep Coordination Graphs (https://arxiv.org/pdf/1910.00091.pdf) papers, the FPS numbers under the horizontal axis should be divided by the number of agents in the player's team.
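
The conversion described above is a single division. A small sketch, assuming the function name `to_env_steps` (hypothetical) and an agent count of 5 for 5m_vs_6m, inferred from the map name rather than stated in this README:

```python
def to_env_steps(axis_value: float, num_agents: int) -> float:
    """Convert a horizontal-axis value from the plots into environment steps.

    `axis_value` is the number read under the horizontal axis and
    `num_agents` is the size of the player's team on that map.
    """
    if num_agents <= 0:
        raise ValueError("num_agents must be positive")
    return axis_value / num_agents


# Illustrative numbers only: 10 million on the axis, 5 agents (assumed for 5m_vs_6m).
env_steps = to_env_steps(10_000_000, 5)
print(env_steps)  # 2000000.0
```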

  • 2m_vs_1z
  • 3s5z_vs_3s6z
  • 3s_vs_5z
  • corridor
  • 5m_vs_6m
  • MMM2

Link to the continuous results

Currently Implemented:

  • DQN
  • Double DQN
  • Dueling DQN
  • Noisy DQN
  • N-Step DQN
  • Categorical
  • Rainbow DQN
  • A2C
  • PPO
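
To illustrate the N-Step variant in the list above: an n-step target replaces the single-step reward with a discounted sum of the next n rewards plus a bootstrapped value estimate. The following is a minimal sketch under assumed names (`n_step_target`, `bootstrap_value`); it is not the code used in this repository.

```python
from typing import Sequence


def n_step_target(
    rewards: Sequence[float],
    bootstrap_value: float,
    gamma: float = 0.99,
    done: bool = False,
) -> float:
    """Discounted sum of n rewards plus a bootstrapped tail value.

    `rewards` holds r_t ... r_{t+n-1}; `bootstrap_value` is the value
    estimate for the state reached after those n steps.
    """
    target = 0.0
    for k, reward in enumerate(rewards):
        target += (gamma ** k) * reward
    # Only bootstrap if the episode did not end within the n steps.
    if not done:
        target += (gamma ** len(rewards)) * bootstrap_value
    return target


# Three rewards of 1.0, tail value 10.0, gamma 0.5:
# 1 + 0.5 + 0.25 + 0.125 * 10 = 3.0
print(n_step_target([1.0, 1.0, 1.0], 10.0, gamma=0.5))  # 3.0
```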

TensorFlow implementations of DQN for Atari.

  • Double dueling DQN vs DQN with the same parameters

About 90 minutes to learn with this setup.

  • Tests of different DQN configurations

Light grey is the noisy 1-step dddqn (double dueling DQN). The noisy 3-step dddqn was even faster. The best network (configuration 5) needs about 20 minutes to learn on an NVIDIA 1080. Currently the best setup for Pong is a noisy 3-step double dueling network. The different experiments can be found in pong_runs.py. It takes fewer than 200k frames to reach a score > 18. DQN has more optimistic Q value estimations.
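
The remark about DQN's more optimistic Q values matches how the two targets are usually defined. Plain DQN takes the maximum of the target network's Q values. Double DQN picks the action with the online network and evaluates it with the target network. A minimal sketch with plain Python lists; the function names and the example numbers are illustrative assumptions, not taken from this repository.

```python
from typing import Sequence


def dqn_target(
    reward: float,
    next_q_target: Sequence[float],
    gamma: float = 0.99,
    done: bool = False,
) -> float:
    """Plain DQN: select and evaluate the next action with the target network."""
    if done:
        return reward
    return reward + gamma * max(next_q_target)


def double_dqn_target(
    reward: float,
    next_q_online: Sequence[float],
    next_q_target: Sequence[float],
    gamma: float = 0.99,
    done: bool = False,
) -> float:
    """Double DQN: select with the online network, evaluate with the target network."""
    if done:
        return reward
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best_action]


# Illustrative Q values for three actions in the next state.
online_q = [1.0, 2.0, 0.5]
target_q = [3.0, 1.5, 0.0]
print(dqn_target(0.0, target_q, gamma=0.5))                   # 1.5
print(double_dqn_target(0.0, online_q, target_q, gamma=0.5))  # 0.75
```

For the same inputs the Double DQN target can never exceed the plain DQN target, because the value of the selected action is at most the maximum over all actions.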

Other Games Results

These results are not stable; they are just the best games. For good average results you need to train the network for more than 10 million steps. Some games need 50 million steps.

  • 5 million frames, two-step noisy double dueling DQN:

Watch the video

  • Random lucky game in Space Invaders after less than one hour of learning:

Watch the video

A2C and PPO Results

  • More than 2 hours for Pong to reach a score of 20 with one actor playing.
  • 8 hours for Super Mario level 1

Watch the video

  • PPO with LSTM layers

Watch the video

rl_games's People

Contributors

densumy, denys88, viktorm
