rl-experiments's Introduction

RLlib Reference Results

Benchmarks of RLlib algorithms against published results. These benchmarks are a work in progress. For other results to compare against, see yarlp and more plots from OpenAI.

Ape-X Distributed Prioritized Experience Replay

rllib train -f atari-apex/atari-apex.yaml

Comparison of RLlib Ape-X to Async DQN after 10M time-steps (40M frames). Results compared to learning curves from Mnih et al, 2016 extracted at 10M time-steps from Figure 3.

| env | RLlib Ape-X 8-workers | Mnih et al Async DQN 16-workers | Mnih et al DQN 1-worker |
| --- | --- | --- | --- |
| BeamRider | 6134 | ~6000 | ~3000 |
| Breakout | 123 | ~50 | ~10 |
| QBert | 15302 | ~1200 | ~500 |
| SpaceInvaders | 686 | ~600 | ~500 |
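
The time-step and frame counts quoted for these benchmarks differ by a factor of four. A minimal sketch of the conversion, assuming each agent time-step spans 4 emulator frames (a frame skip inferred from the 10M/40M ratio rather than stated here); the helper name is hypothetical:

```python
# Assumption: one agent time-step covers 4 emulator frames (frame skip of 4),
# inferred from the "10M time-steps (40M frames)" figure in this README.
FRAME_SKIP = 4


def timesteps_to_frames(timesteps: int, frame_skip: int = FRAME_SKIP) -> int:
    """Convert a count of agent time-steps into emulator frames."""
    return timesteps * frame_skip


print(timesteps_to_frames(10_000_000))
```

The same conversion gives the 100M frames quoted for 25M time-steps in the PPO results.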

Here we use only eight workers per environment in order to run all experiments concurrently on a single g3.16xl machine. Further speedups may be obtained by using more workers. Comparing wall-time performance after 1 hour of training:

| env | RLlib Ape-X 8-workers | Mnih et al Async DQN 16-workers | Mnih et al DQN 1-worker |
| --- | --- | --- | --- |
| BeamRider | 4873 | ~1000 | ~300 |
| Breakout | 77 | ~10 | ~1 |
| QBert | 4083 | ~500 | ~150 |
| SpaceInvaders | 646 | ~300 | ~160 |
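
To put the one-hour numbers in perspective, the score ratio between RLlib Ape-X and Async DQN can be computed directly from the table. This is only a rough sketch: the Mnih et al entries are approximate readings from published curves, so the ratios are approximate too. The dictionary keys are illustrative names.

```python
# Scores after 1 hour of training, copied from the table above.
# The Async DQN values are approximate (marked "~" in the table).
one_hour_scores = {
    "BeamRider": {"apex": 4873, "async_dqn": 1000},
    "Breakout": {"apex": 77, "async_dqn": 10},
    "QBert": {"apex": 4083, "async_dqn": 500},
    "SpaceInvaders": {"apex": 646, "async_dqn": 300},
}


def score_ratio(scores: dict) -> float:
    """Ratio of the RLlib Ape-X score to the Async DQN score."""
    return scores["apex"] / scores["async_dqn"]


ratios = {env: round(score_ratio(s), 1) for env, s in one_hour_scores.items()}
for env, ratio in ratios.items():
    print(f"{env}: ~{ratio}x")
```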

Ape-X plots: apex

IMPALA and A2C

rllib train -f atari-impala/atari-impala.yaml

rllib train -f atari-a2c/atari-a2c.yaml

RLlib IMPALA and A2C on 10M time-steps (40M frames). Results compared to learning curves from Mnih et al, 2016 extracted at 10M time-steps from Figure 3.

| env | RLlib IMPALA 32-workers | RLlib A2C 5-workers | Mnih et al A3C 16-workers |
| --- | --- | --- | --- |
| BeamRider | 2071 | 1401 | ~3000 |
| Breakout | 385 | 374 | ~150 |
| QBert | 4068 | 3620 | ~1000 |
| SpaceInvaders | 719 | 692 | ~600 |

IMPALA and A2C vs A3C after 1 hour of training:

| env | RLlib IMPALA 32-workers | RLlib A2C 5-workers | Mnih et al A3C 16-workers |
| --- | --- | --- | --- |
| BeamRider | 3181 | 874 | ~1000 |
| Breakout | 538 | 268 | ~10 |
| QBert | 10850 | 1212 | ~500 |
| SpaceInvaders | 843 | 518 | ~300 |

IMPALA plots: tensorboard

A2C plots: tensorboard

Pong in 3 minutes

With a bit of tuning, RLlib IMPALA can solve Pong in ~3 minutes:

rllib train -f pong-speedrun/pong-impala-fast.yaml

tensorboard

DQN / Rainbow

rllib train -f atari-dqn/basic-dqn.yaml

rllib train -f atari-dqn/duel-ddqn.yaml

rllib train -f atari-dqn/dist-dqn.yaml

RLlib DQN after 10M time-steps (40M frames). Note that RLlib evaluation scores include the 1% random actions of epsilon-greedy exploration. You can expect slightly higher rewards when rolling out the policies without any exploration at all.

| env | RLlib Basic DQN | RLlib Dueling DDQN | RLlib Distributional DQN | Hessel et al. DQN | Hessel et al. Rainbow |
| --- | --- | --- | --- | --- | --- |
| BeamRider | 2869 | 1910 | 4447 | ~2000 | ~13000 |
| Breakout | 287 | 312 | 410 | ~150 | ~300 |
| QBert | 3921 | 7968 | 15780 | ~4000 | ~20000 |
| SpaceInvaders | 650 | 1001 | 1025 | ~500 | ~2000 |
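
The note about 1% random actions refers to epsilon-greedy exploration. A minimal sketch of that action-selection rule, assuming epsilon = 0.01; the function name and signature are illustrative and not RLlib's API:

```python
import random
from typing import Optional, Sequence


def epsilon_greedy_action(
    q_values: Sequence[float],
    epsilon: float = 0.01,
    rng: Optional[random.Random] = None,
) -> int:
    """Pick a uniformly random action with probability epsilon, else the greedy one."""
    rng = rng or random  # fall back to the module-level generator
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    # Greedy choice: index of the largest Q-value.
    return max(range(len(q_values)), key=lambda a: q_values[a])


# With epsilon=0.0 the policy is purely greedy, i.e. a rollout without exploration.
print(epsilon_greedy_action([0.1, 0.9, 0.3], epsilon=0.0))
```

Setting `epsilon=0.0` at evaluation time corresponds to rolling out the policy without exploration, which is where the slightly higher rewards would be expected.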

Basic DQN plots: tensorboard

Dueling DDQN plots: tensorboard

Distributional DQN plots: tensorboard

Proximal Policy Optimization

rllib train -f atari-ppo/atari-ppo.yaml

rllib train -f halfcheetah-ppo/halfcheetah-ppo.yaml

RLlib PPO with 10 workers after 10M and 25M time-steps (40M/100M frames). Note that RLlib does not use clip parameter annealing.

| env | RLlib PPO @10M | RLlib PPO @25M | Baselines PPO @10M |
| --- | --- | --- | --- |
| BeamRider | 2807 | 4480 | ~1800 |
| Breakout | 104 | 201 | ~250 |
| QBert | 11085 | 14247 | ~14000 |
| SpaceInvaders | 671 | 944 | ~800 |
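
For context on the clip parameter note: some PPO implementations shrink the clipping range as training progresses, whereas a fixed clip parameter stays constant throughout. A minimal sketch of a linear schedule and the clipped surrogate term it would feed into; the initial value 0.1 and both function names are illustrative assumptions, not taken from the RLlib or Baselines configurations.

```python
def annealed_clip_param(initial_clip: float, timesteps_done: int, total_timesteps: int) -> float:
    """Linearly decay the clip parameter from initial_clip towards 0."""
    remaining = max(0.0, 1.0 - timesteps_done / total_timesteps)
    return initial_clip * remaining


def clipped_surrogate(ratio: float, advantage: float, clip_param: float) -> float:
    """PPO clipped surrogate for one sample: min(r * A, clip(r, 1 - eps, 1 + eps) * A)."""
    clipped_ratio = min(max(ratio, 1.0 - clip_param), 1.0 + clip_param)
    return min(ratio * advantage, clipped_ratio * advantage)


# Hypothetical values: halfway through a 10M-step run with an initial clip of 0.1.
clip_now = annealed_clip_param(0.1, 5_000_000, 10_000_000)
print(clip_now, clipped_surrogate(1.5, 1.0, clip_now))
```

Without annealing, `clip_param` is simply passed to `clipped_surrogate` unchanged for the whole run.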

tensorboard

RLlib PPO wall-time performance vs other implementations using a single Titan XP and the same number of CPUs. Results compared to learning curves from Fan et al, 2018 extracted at 1 hour of training from Figure 7. Here we get optimal results with a vectorization of 32 environment instances per worker:

| env | RLlib PPO 16-workers | Fan et al PPO 16-workers | TF BatchPPO 16-workers |
| --- | --- | --- | --- |
| HalfCheetah | 9664 | ~7700 | ~3200 |

tensorboard

Soft Actor Critic

rllib train -f halfcheetah-sac/halfcheetah-sac.yaml

RLlib SAC after 3M time-steps.

RLlib SAC versus the SoftLearning implementation (Haarnoja et al, 2018), benchmarked at 500k and 3M time-steps.

| env | RLlib SAC @500K | Haarnoja et al SAC @500K | RLlib SAC @3M | Haarnoja et al SAC @3M |
| --- | --- | --- | --- | --- |
| HalfCheetah | 9000 | ~9000 | 13000 | ~15000 |

tensorboard

Contributors

ericl, michaelzhiluo, pcmoritz
