Open RL Benchmark is a comprehensive collection of tracked experiments for RL. It aims to make it easier for RL practitioners to pull and compare all kinds of metrics from reputable RL libraries like Stable-baselines3, Tianshou, CleanRL, and others.
- 💾 GitHub Repo: source code and more docs.
- 📜 Design docs: our motivation and vision.
- 📊 Open RL Benchmark reports: W&B reports with tracked Atari and MuJoCo experiments from SB3, CleanRL, and others.
> **Warning**: This is the `0.1.1a5` alpha release. There has been a major refactoring. If you are looking for the pre-alpha API, check out 43fc8e2. Expect breaking changes in the future.
You can install it via pip or the dev setup.
```shell
pip install openrlbenchmark
```
Prerequisites:
- Python >=3.7.1,<3.10 (3.10 is not yet supported)
- Poetry 1.2.1+
```shell
git clone https://github.com/openrlbenchmark/openrlbenchmark.git
cd openrlbenchmark
poetry install
```
Open RL Benchmark provides an RLops CLI to pull and compare metrics from Weights and Biases. The following example compares the performance of SB3's a2c, ddpg, ppo, ppo_lstm, sac, td3, and trpo with CleanRL's sac on HalfCheetahBulletEnv-v0.
```shell
python -m openrlbenchmark.rlops \
    --filters '?we=openrlbenchmark&wpn=sb3&ceik=env&cen=algo&metric=rollout/ep_rew_mean' \
        'a2c' \
        'ddpg' \
        'ppo_lstm' \
        'sac' \
        'td3' \
        'ppo' \
        'trpo' \
    --filters '?we=openrlbenchmark&wpn=cleanrl&ceik=env_id&cen=exp_name&metric=charts/episodic_return' \
        'sac_continuous_action?tag=rlops-pilot' \
    --env-ids HalfCheetahBulletEnv-v0 \
    --ncols 1 \
    --ncols-legend 2 \
    --output-filename compare.png \
    --report
```
Here, we created multiple filters. The first string in the first filter is `'?we=openrlbenchmark&wpn=sb3&ceik=env&cen=algo&metric=rollout/ep_rew_mean'`, a query string that specifies the following:

- `we`: the W&B entity name
- `wpn`: the W&B project name
- `ceik`: the custom key for the environment id
- `cen`: the custom key for the experiment name
- `metric`: the metric we are interested in

So we are fetching metrics from https://wandb.ai/openrlbenchmark/sb3. The environment id is stored in the `env` key, the experiment name is stored in the `algo` key, and the metric we are interested in is `rollout/ep_rew_mean`.
Similarly, we are fetching metrics from https://wandb.ai/openrlbenchmark/cleanrl. The environment id is stored in the `env_id` key, the experiment name is stored in the `exp_name` key, and the metric we are interested in is `charts/episodic_return`.
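These filter strings use standard URL query syntax, so they can be sanity-checked with Python's built-in `urllib.parse`. This is just an inspection aid to show how the pieces decompose, not part of the openrlbenchmark API:

```python
from urllib.parse import parse_qs, urlparse

# Example filter string from the command above
filter_str = "?we=openrlbenchmark&wpn=sb3&ceik=env&cen=algo&metric=rollout/ep_rew_mean"

# parse_qs maps each query key to a list of values
params = parse_qs(urlparse(filter_str).query)

print(params["we"])      # W&B entity name
print(params["wpn"])     # W&B project name
print(params["ceik"])    # custom key for the environment id
print(params["cen"])     # custom key for the experiment name
print(params["metric"])  # metric of interest
```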
The command above generates the following plot:
The `--report` flag also generates a W&B report.

The command also generates a `compare.png`, a `compare.md`, and a `compare.csv` in the current directory.
Learning curves: `compare.png` shows the learning curves, which subsample 10000 data points and interpolate. The curves are smoothed by a rolling average with a window size of 100, and their shaded regions represent the standard deviation.
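The rolling-average smoothing described above can be sketched as a simple convolution. This is an illustrative example, not openrlbenchmark's exact implementation:

```python
import numpy as np

def smooth(returns, window=100):
    """Rolling average with the given window size (illustrative sketch)."""
    returns = np.asarray(returns, dtype=float)
    if len(returns) < window:
        return returns
    kernel = np.ones(window) / window
    return np.convolve(returns, kernel, mode="valid")

# Toy noisy learning curve: a rising trend plus Gaussian noise
rng = np.random.default_rng(0)
curve = np.linspace(0, 100, 1000) + rng.normal(0, 10, size=1000)
smoothed = smooth(curve)
print(smoothed.shape)  # 1000 - 100 + 1 points remain after a "valid" convolution
```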
Result table: `compare.md` and `compare.csv` show, for each random seed, the average episodic return of the last 100 episodes.
> **Warning**: You may get slightly different curves every time you run the commands. This is because we sample 500 data points from the tracked experiments to save bandwidth. You can, however, get all of the data points by running the command with `--scan-history`, which will always give you the same curves but will take longer to run. For generating an accurate `compare.md`, you should always use `--scan-history`.
- CleanRL
  - `ceik`: `env_id`
  - `cen`: `exp_name` (e.g., `sac_continuous_action`, `ppo_continuous_action`, `ppo_atari`)
  - `metric`: `charts/episodic_return`
- Stable-baselines3
  - `ceik`: `env`
  - `cen`: `algo` (e.g., `sac`, `ppo`, `a2c`)
  - `metric`: `rollout/ep_rew_mean` or `eval/mean_reward`
- ikostrikov/jaxrl
  - `ceik`: `env_name`
  - `cen`: `algo` (e.g., `sac`)
  - `metric`: `training/return` or `evaluation/average_returns`
- baselines
  - `ceik`: `env`
  - `cen`: `alg` (e.g., `ppo2`)
  - `metric`: `charts/episodic_return` or `eprewmean`
- sbx
  - `ceik`: `env`
  - `cen`: `alg` (e.g., `sac`, `tqc`)
  - `metric`: `rollout/ep_rew_mean` or `eval/mean_reward`
- Tianshou
  - `ceik`: `task`
  - `cen`: `algo_name` (e.g., `ppo`, `iqn`)
  - `metric`: `test/reward`
The following libraries have some recorded experiments:
- openai/phasic-policy-gradient (has some metrics)
  - `ceik`: `env_name`
  - `cen`: `arch` (`shared`)
  - `metric`: `charts/episodic_return`
- sfujim/TD3 (has some metrics)
  - `ceik`: `env`
  - `cen`: `policy` (e.g., `TD3`)
  - `metric`: `charts/episodic_return`
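Given the keys listed above, a filter string for any supported library can be assembled with Python's standard `urlencode`. This is a hedged convenience sketch (the per-library dictionary below simply mirrors the list above; the CLI itself only consumes the final string):

```python
from urllib.parse import urlencode

# Per-library filter settings, copied from the list above
libraries = {
    "sb3": {"ceik": "env", "cen": "algo", "metric": "rollout/ep_rew_mean"},
    "cleanrl": {"ceik": "env_id", "cen": "exp_name", "metric": "charts/episodic_return"},
    "tianshou": {"ceik": "task", "cen": "algo_name", "metric": "test/reward"},
}

def filter_string(entity, project, keys):
    # safe="/" keeps slashes in metric names unescaped, matching the README style
    return "?" + urlencode({"we": entity, "wpn": project, **keys}, safe="/")

print(filter_string("openrlbenchmark", "sb3", libraries["sb3"]))
```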
```shell
python -m openrlbenchmark.rlops \
    --filters '?we=openrlbenchmark&wpn=baselines&ceik=env&cen=exp_name&metric=charts/episodic_return' 'baselines-ppo2-cnn' \
    --filters '?we=openrlbenchmark&wpn=envpool-atari&ceik=env_id&cen=exp_name&metric=charts/avg_episodic_return' 'ppo_atari_envpool_xla_jax_truncation' \
    --env-ids Alien-v5 Amidar-v5 Assault-v5 Asterix-v5 Asteroids-v5 Atlantis-v5 BankHeist-v5 BattleZone-v5 BeamRider-v5 Berzerk-v5 Bowling-v5 Boxing-v5 Breakout-v5 Centipede-v5 ChopperCommand-v5 CrazyClimber-v5 Defender-v5 DemonAttack-v5 DoubleDunk-v5 Enduro-v5 FishingDerby-v5 Freeway-v5 Frostbite-v5 Gopher-v5 Gravitar-v5 Hero-v5 IceHockey-v5 PrivateEye-v5 Qbert-v5 Riverraid-v5 RoadRunner-v5 Robotank-v5 Seaquest-v5 Skiing-v5 Solaris-v5 SpaceInvaders-v5 StarGunner-v5 Surround-v5 Tennis-v5 TimePilot-v5 Tutankham-v5 UpNDown-v5 Venture-v5 VideoPinball-v5 WizardOfWor-v5 YarsRevenge-v5 Zaxxon-v5 Jamesbond-v5 Kangaroo-v5 Krull-v5 KungFuMaster-v5 MontezumaRevenge-v5 MsPacman-v5 NameThisGame-v5 Phoenix-v5 Pitfall-v5 Pong-v5 \
    --check-empty-runs False \
    --ncols 5 \
    --ncols-legend 2 \
    --output-filename static/cleanrl_vs_baselines \
    --scan-history
```
```shell
python -m openrlbenchmark.rlops \
    --filters '?we=openrlbenchmark&wpn=baselines&ceik=env&cen=exp_name&metric=charts/episodic_return' 'baselines-ppo2-mlp' \
    --filters '?we=openrlbenchmark&wpn=cleanrl&ceik=env_id&cen=exp_name&metric=charts/episodic_return' 'ppo_continuous_action?tag=v1.0.0-27-gde3f410' \
    --filters '?we=openrlbenchmark&wpn=jaxrl&ceik=env_name&cen=algo&metric=training/return' 'sac' \
    --env-ids HalfCheetah-v2 Walker2d-v2 Hopper-v2 InvertedPendulum-v2 Humanoid-v2 Pusher-v2 \
    --check-empty-runs False \
    --ncols 3 \
    --ncols-legend 3 \
    --output-filename static/baselines_vs_cleanrl_vs_jaxrl \
    --scan-history
```
```shell
python -m openrlbenchmark.rlops \
    --filters '?we=openrlbenchmark&wpn=baselines&ceik=env&cen=exp_name&metric=charts/episodic_return' 'baselines-ppo2-cnn' \
    --filters '?we=openrlbenchmark&wpn=envpool-atari&ceik=env_id&cen=exp_name&metric=charts/avg_episodic_return' 'ppo_atari_envpool_xla_jax_truncation' \
    --filters '?we=openrlbenchmark&wpn=cleanrl&ceik=env_id&cen=exp_name&metric=charts/avg_episodic_return' 'ppo_atari_envpool_xla_jax_scan?tag=pr-328' \
    --env-ids BeamRider-v5 Breakout-v5 Pong-v5 \
    --check-empty-runs False \
    --ncols 3 \
    --ncols-legend 2 \
    --output-filename compare \
    --scan-history
```
```shell
python -m openrlbenchmark.rlops \
    --filters '?we=tianshou&wpn=atari.benchmark&ceik=task&cen=algo_name&metric=test/reward' 'iqn' 'ppo' 'rainbow' 'fqf' 'c51' 'dqn' 'qrdqn' \
    --filters '?we=openrlbenchmark&wpn=baselines&ceik=env&cen=exp_name&metric=charts/episodic_return' 'baselines-ppo2-cnn' \
    --env-ids BreakoutNoFrameskip-v4 SpaceInvadersNoFrameskip-v4 SeaquestNoFrameskip-v4 MsPacmanNoFrameskip-v4 EnduroNoFrameskip-v4 PongNoFrameskip-v4 QbertNoFrameskip-v4 \
    --check-empty-runs False \
    --ncols 4 \
    --ncols-legend 4 \
    --output-filename static/baselines_vs_tianshou \
    --scan-history
```
```shell
python -m openrlbenchmark.rlops \
    --filters '?we=openrlbenchmark&wpn=phasic-policy-gradient&ceik=env_name&cen=arch&metric=charts/episodic_return' 'shared' \
    --filters '?we=openrlbenchmark&wpn=cleanrl&ceik=env_id&cen=exp_name&metric=charts/episodic_return' 'ppo_procgen?tag=v1.0.0b1-4-g4ea73d9' 'ppg_procgen?tag=v1.0.0b1-4-g4ea73d9' \
    --env-ids starpilot bossfight bigfish \
    --check-empty-runs False \
    --ncols 3 \
    --ncols-legend 3 \
    --output-filename static/ppg_vs_cleanrl \
    --scan-history
```
```shell
python -m openrlbenchmark.rlops \
    --filters '?we=openrlbenchmark&wpn=sfujim-TD3&ceik=env&cen=policy&metric=charts/episodic_return' 'TD3' \
    --filters '?we=openrlbenchmark&wpn=cleanrl&ceik=env_id&cen=exp_name&metric=charts/episodic_return' 'td3_continuous_action_jax?tag=pr-285' 'ddpg_continuous_action_jax?tag=pr-298' \
    --env-ids HalfCheetah-v2 Walker2d-v2 Hopper-v2 \
    --check-empty-runs False \
    --ncols 3 \
    --ncols-legend 3 \
    --output-filename static/td3_vs_cleanrl \
    --scan-history
```
Compare CleanRL's PPO + JAX + EnvPool's XLA using `openai/baselines`' Atari wrappers vs. Machado et al.'s recommendations for Atari (e.g., sticky actions with probability 0.25). The Machado-style experiments ran for 10M steps, which corresponds to 40M frames (frame skip of 4).
```shell
python -m openrlbenchmark.rlops \
    --filters '?we=openrlbenchmark&wpn=envpool-atari&ceik=env_id&cen=exp_name&metric=charts/avg_episodic_return' 'ppo_atari_envpool_xla_jax_truncation_machado_10M' 'ppo_atari_envpool_xla_jax_truncation' \
    --env-ids Alien-v5 Amidar-v5 Assault-v5 Asterix-v5 Asteroids-v5 Atlantis-v5 BankHeist-v5 BattleZone-v5 BeamRider-v5 Berzerk-v5 Bowling-v5 Boxing-v5 Breakout-v5 Centipede-v5 ChopperCommand-v5 CrazyClimber-v5 Defender-v5 DemonAttack-v5 DoubleDunk-v5 Enduro-v5 FishingDerby-v5 Freeway-v5 Frostbite-v5 Gopher-v5 Gravitar-v5 Hero-v5 IceHockey-v5 PrivateEye-v5 Qbert-v5 Riverraid-v5 RoadRunner-v5 Robotank-v5 Seaquest-v5 Skiing-v5 Solaris-v5 SpaceInvaders-v5 StarGunner-v5 Surround-v5 Tennis-v5 TimePilot-v5 Tutankham-v5 UpNDown-v5 Venture-v5 VideoPinball-v5 WizardOfWor-v5 YarsRevenge-v5 Zaxxon-v5 Jamesbond-v5 Kangaroo-v5 Krull-v5 KungFuMaster-v5 MontezumaRevenge-v5 MsPacman-v5 NameThisGame-v5 Phoenix-v5 Pitfall-v5 Pong-v5 \
    --check-empty-runs False \
    --ncols 5 \
    --ncols-legend 2 \
    --output-filename static/machado_10M \
    --scan-history
```
Compare CleanRL's PPO + JAX + EnvPool's XLA using `openai/baselines`' Atari wrappers vs. Machado et al.'s recommendations for Atari (e.g., sticky actions with probability 0.25). The Machado-style experiments ran for 50M steps, which corresponds to 200M frames (frame skip of 4).
```shell
python -m openrlbenchmark.rlops \
    --filters '?we=openrlbenchmark&wpn=envpool-atari&ceik=env_id&cen=exp_name&metric=charts/avg_episodic_return' 'ppo_atari_envpool_xla_jax_truncation_machado' 'ppo_atari_envpool_xla_jax_truncation' \
    --env-ids Alien-v5 Amidar-v5 Assault-v5 Asterix-v5 Asteroids-v5 Atlantis-v5 BankHeist-v5 BattleZone-v5 BeamRider-v5 Berzerk-v5 Bowling-v5 Boxing-v5 Breakout-v5 Centipede-v5 ChopperCommand-v5 CrazyClimber-v5 Defender-v5 DemonAttack-v5 DoubleDunk-v5 Enduro-v5 FishingDerby-v5 Freeway-v5 Frostbite-v5 Gopher-v5 Gravitar-v5 Hero-v5 IceHockey-v5 PrivateEye-v5 Qbert-v5 Riverraid-v5 RoadRunner-v5 Robotank-v5 Seaquest-v5 Skiing-v5 Solaris-v5 SpaceInvaders-v5 StarGunner-v5 Surround-v5 Tennis-v5 TimePilot-v5 Tutankham-v5 UpNDown-v5 Venture-v5 VideoPinball-v5 WizardOfWor-v5 YarsRevenge-v5 Zaxxon-v5 Jamesbond-v5 Kangaroo-v5 Krull-v5 KungFuMaster-v5 MontezumaRevenge-v5 MsPacman-v5 NameThisGame-v5 Phoenix-v5 Pitfall-v5 Pong-v5 \
    --check-empty-runs False \
    --ncols 5 \
    --ncols-legend 2 \
    --output-filename static/machado_50M \
    --scan-history
```
```shell
python -m openrlbenchmark.hns --files static/cleanrl_vs_baselines.csv static/machado_10M.csv static/machado_50M.csv
```

```
baselines-ppo2-cnn ({})
┣━━ median hns: 0.7959851540635047
┣━━ mean hns: 4.54588939893709
ppo_atari_envpool_xla_jax_truncation ({})
┣━━ median hns: 0.9783505154639175
┣━━ mean hns: 6.841083973256849
ppo_atari_envpool_xla_jax_truncation_machado_10M ({})
┣━━ median hns: 0.7347972972972973
┣━━ mean hns: 2.919095857954249
ppo_atari_envpool_xla_jax_truncation ({'metric': ['charts/avg_episodic_return']})
┣━━ median hns: 0.9783505154639175
┣━━ mean hns: 6.841083973256849
ppo_atari_envpool_xla_jax_truncation_machado ({'metric': ['charts/avg_episodic_return']})
┣━━ median hns: 1.5679929625118418
┣━━ mean hns: 8.352308370550299
```
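The human-normalized score (HNS) reported above is conventionally computed as (agent - random) / (human - random) per game, then aggregated by median and mean across games. Here is an illustrative sketch; the reference scores and per-game values below are hypothetical placeholders, not the actual numbers used by `openrlbenchmark.hns`:

```python
import statistics

def human_normalized_score(agent, random_score, human_score):
    """HNS: 0.0 matches random play, 1.0 matches the human reference."""
    return (agent - random_score) / (human_score - random_score)

# e.g., an agent scoring 350 on a game where random play scores 10
# and the human reference scores 300 (hypothetical values)
print(human_normalized_score(350, 10, 300))  # > 1.0 means better than human

# Aggregation across games, mirroring the report format above
scores = {"Pong": 1.1, "Breakout": 3.5, "MontezumaRevenge": 0.0}  # hypothetical
print("median hns:", statistics.median(scores.values()))
print("mean hns:", statistics.mean(scores.values()))
```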
This is a project we are slowly working on; there is no specific timeline or roadmap. If you want to get involved, feel free to reach out or open an issue. We are looking for volunteers to help us with the following:
- Add experiments from other libraries
- Run more experiments from currently supported libraries
- Documentation and designing standards
- Download the tensorboard metrics from the tracked experiments and load them locally to save time
```shell
pip install --upgrade openrlbenchmark
```

```shell
python -m openrlbenchmark.rlops \
    --filters '?we=openrlbenchmark&wpn=cleanrl&ceik=env_id&cen=exp_name&metric=charts/episodic_return' 'dqn_atari?tag=latest' 'c51_atari?tag=latest' \
    --filters '?we=openrlbenchmark&wpn=baselines&ceik=env&cen=exp_name&metric=charts/episodic_return' 'baselines-ppo2-cnn' \
    --filters '?we=openrlbenchmark&wpn=envpool-atari&ceik=env_id&cen=exp_name&metric=charts/avg_episodic_return' 'ppo_atari_envpool_xla_jax_truncation?user=costa-huang' \
    --env-ids BeamRider-v5 Breakout-v5 Pong-v5 \
    --ncols 3 \
    --ncols-legend 2 \
    --output-filename compare \
    --scan-history \
    --report \
    --check-empty-runs False
```