dlr-rm / rl-baselines3-zoo

A training framework for Stable Baselines3 reinforcement learning agents, with hyperparameter optimization and pre-trained agents included.

Home Page: https://rl-baselines3-zoo.readthedocs.io

License: MIT License

Makefile 0.64% Dockerfile 0.27% Shell 0.72% Python 98.37%
rl reinforcement-learning stable-baselines openai gym pybullet hyperparameter-optimization hyperparameter-tuning hyperparameter-search optimization

rl-baselines3-zoo's Introduction


RL Baselines3 Zoo: A Training Framework for Stable Baselines3 Reinforcement Learning Agents

RL Baselines3 Zoo is a training framework for Reinforcement Learning (RL), using Stable Baselines3.

It provides scripts for training, evaluating agents, tuning hyperparameters, plotting results and recording videos.

In addition, it includes a collection of tuned hyperparameters for common environments and RL algorithms, and agents trained with those settings.

We are looking for contributors to complete the collection!

Goals of this repository:

  1. Provide a simple interface to train and enjoy RL agents
  2. Benchmark the different Reinforcement Learning algorithms
  3. Provide tuned hyperparameters for each environment and RL algorithm
  4. Have fun with the trained agents!

This is the SB3 version of the original SB2 rl-zoo.

Documentation

Documentation is available online: https://rl-baselines3-zoo.readthedocs.io/

Installation

Minimal installation

From source:

pip install -e .

As a python package:

pip install rl_zoo3

Note: you can run python -m rl_zoo3.train from any folder, and you have access to the rl_zoo3 command line interface: for instance, rl_zoo3 train is equivalent to python train.py

Full installation (with extra envs and test dependencies)

apt-get install swig cmake ffmpeg
pip install -r requirements.txt
pip install -e .[plots,tests]

Please see the Stable Baselines3 documentation for alternative ways to install Stable Baselines3.

Train an Agent

The hyperparameters for each environment are defined in hyperparameters/algo_name.yml.

If the environment exists in this file, then you can train an agent using:

python train.py --algo algo_name --env env_id

Evaluate the agent every 10000 steps using 10 episodes for evaluation (using only one evaluation env):

python train.py --algo sac --env HalfCheetahBulletEnv-v0 --eval-freq 10000 --eval-episodes 10 --n-eval-envs 1

More examples are available in the documentation.

Integrations

The RL Zoo has some integration with other libraries/services like Weights & Biases for experiment tracking or Hugging Face for storing/sharing trained models. You can find out more in the dedicated section of the documentation.

Plot Scripts

Please see the dedicated section of the documentation.

Enjoy a Trained Agent

Note: to download the repo with the trained agents, you must use git clone --recursive https://github.com/DLR-RM/rl-baselines3-zoo in order to clone the submodule too.

If the trained agent exists, then you can see it in action using:

python enjoy.py --algo algo_name --env env_id

For example, enjoy A2C on Breakout for 5000 timesteps:

python enjoy.py --algo a2c --env BreakoutNoFrameskip-v4 --folder rl-trained-agents/ -n 5000
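
For reference, a minimal Python sketch (folder layout assumed from the rl-trained-agents submodule) of what enjoy.py does at its core:

import gym
from stable_baselines3 import A2C

env = gym.make("CartPole-v1")
model = A2C.load("rl-trained-agents/a2c/CartPole-v1_1/CartPole-v1.zip")

obs = env.reset()
for _ in range(1000):
    # deterministic=True selects the greedy action
    action, _states = model.predict(obs, deterministic=True)
    obs, reward, done, info = env.step(action)
    if done:
        obs = env.reset()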

Hyperparameters Tuning

Please see the dedicated section of the documentation.

Custom Configuration

Please see the dedicated section of the documentation.

Current Collection: 200+ Trained Agents!

Final performance of the trained agents can be found in benchmark.md. To compute them, simply run python -m rl_zoo3.benchmark.

List and videos of trained agents can be found on our Hugging Face page: https://huggingface.co/sb3

NOTE: this is not a quantitative benchmark, as it corresponds to only one run (cf. issue #38). This benchmark is meant to check algorithms' (maximal) performance, find potential bugs, and give users access to pretrained agents.

Atari Games

7 Atari games from the OpenAI benchmark (NoFrameskip-v4 versions).

RL Algo BeamRider Breakout Enduro Pong Qbert Seaquest SpaceInvaders
A2C ✔️ ✔️ ✔️ ✔️ ✔️ ✔️ ✔️
PPO ✔️ ✔️ ✔️ ✔️ ✔️ ✔️ ✔️
DQN ✔️ ✔️ ✔️ ✔️ ✔️ ✔️ ✔️
QR-DQN ✔️ ✔️ ✔️ ✔️ ✔️ ✔️ ✔️

Additional Atari Games (to be completed):

RL Algo MsPacman Asteroids RoadRunner
A2C ✔️ ✔️ ✔️
PPO ✔️ ✔️ ✔️
DQN ✔️ ✔️ ✔️
QR-DQN ✔️ ✔️ ✔️

Classic Control Environments

RL Algo CartPole-v1 MountainCar-v0 Acrobot-v1 Pendulum-v1 MountainCarContinuous-v0
ARS ✔️ ✔️ ✔️ ✔️ ✔️
A2C ✔️ ✔️ ✔️ ✔️ ✔️
PPO ✔️ ✔️ ✔️ ✔️ ✔️
DQN ✔️ ✔️ ✔️ N/A N/A
QR-DQN ✔️ ✔️ ✔️ N/A N/A
DDPG N/A N/A N/A ✔️ ✔️
SAC N/A N/A N/A ✔️ ✔️
TD3 N/A N/A N/A ✔️ ✔️
TQC N/A N/A N/A ✔️ ✔️
TRPO ✔️ ✔️ ✔️ ✔️ ✔️

Box2D Environments

RL Algo BipedalWalker-v3 LunarLander-v2 LunarLanderContinuous-v2 BipedalWalkerHardcore-v3 CarRacing-v0
ARS ✔️ ✔️
A2C ✔️ ✔️ ✔️ ✔️
PPO ✔️ ✔️ ✔️ ✔️
DQN N/A ✔️ N/A N/A N/A
QR-DQN N/A ✔️ N/A N/A N/A
DDPG ✔️ N/A ✔️
SAC ✔️ N/A ✔️ ✔️
TD3 ✔️ N/A ✔️ ✔️
TQC ✔️ N/A ✔️ ✔️
TRPO ✔️ ✔️

PyBullet Environments

See https://github.com/bulletphysics/bullet3/tree/master/examples/pybullet/gym/pybullet_envs. Similar to the MuJoCo environments, but with a free and easy-to-install simulator: PyBullet (note that MuJoCo 2.1.0+ is now free as well). We are using the BulletEnv-v0 versions.

Note: those environments are derived from Roboschool and are harder than the MuJoCo versions (see the PyBullet issue)

RL Algo Walker2D HalfCheetah Ant Reacher Hopper Humanoid
ARS
A2C ✔️ ✔️ ✔️ ✔️ ✔️
PPO ✔️ ✔️ ✔️ ✔️ ✔️
DDPG ✔️ ✔️ ✔️ ✔️ ✔️
SAC ✔️ ✔️ ✔️ ✔️ ✔️
TD3 ✔️ ✔️ ✔️ ✔️ ✔️
TQC ✔️ ✔️ ✔️ ✔️ ✔️
TRPO ✔️ ✔️ ✔️ ✔️ ✔️

PyBullet Envs (Continued)

RL Algo Minitaur MinitaurDuck InvertedDoublePendulum InvertedPendulumSwingup
A2C
PPO
DDPG
SAC
TD3
TQC

MuJoCo Environments

RL Algo Walker2d HalfCheetah Ant Swimmer Hopper Humanoid
ARS ✔️ ✔️ ✔️ ✔️ ✔️
A2C ✔️ ✔️ ✔️ ✔️ ✔️ ✔️
PPO ✔️ ✔️ ✔️ ✔️ ✔️
DDPG
SAC ✔️ ✔️ ✔️ ✔️ ✔️ ✔️
TD3 ✔️ ✔️ ✔️ ✔️ ✔️ ✔️
TQC ✔️ ✔️ ✔️ ✔️ ✔️ ✔️
TRPO ✔️ ✔️ ✔️ ✔️ ✔️

Robotics Environments

See https://gym.openai.com/envs/#robotics and #71

MuJoCo version: 1.50.1.0, Gym version: 0.18.0

We used the v1 environments.

RL Algo FetchReach FetchPickAndPlace FetchPush FetchSlide
HER+TQC ✔️ ✔️ ✔️ ✔️

Panda Robot Environments

See https://github.com/qgallouedec/panda-gym/.

Similar to MuJoCo Robotics Envs but with a free easy to install simulator: pybullet.

We used the v1 environments.

RL Algo PandaReach PandaPickAndPlace PandaPush PandaSlide PandaStack
HER+TQC ✔️ ✔️ ✔️ ✔️ ✔️

MiniGrid Envs

See https://github.com/Farama-Foundation/Minigrid. A simple, lightweight and fast Gym implementation of the famous gridworld environments.

RL Algo Empty-Random-5x5 FourRooms DoorKey-5x5 MultiRoom-N4-S5 Fetch-5x5-N2 GoToDoor-5x5 PutNear-6x6-N2 RedBlueDoors-6x6 LockedRoom KeyCorridorS3R1 Unlock ObstructedMaze-2Dlh
A2C
PPO ✔️ ✔️ ✔️ ✔️ ✔️ ✔️ ✔️ ✔️ ✔️ ✔️ ✔️ ✔️
DQN
QR-DQN
TRPO

There are 22 environment groups (variations for each) in total.

Colab Notebook: Try it Online!

You can train agents online using the Colab notebook.

Passing arguments in an interactive session

The zoo is not meant to be executed from an interactive session (e.g. Jupyter Notebooks, IPython); however, it can be done by modifying sys.argv and adding the desired arguments.

Example

import sys
from rl_zoo3.train import train

sys.argv = ["python", "--algo", "ppo", "--env", "MountainCar-v0"]

train()
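
Note that argparse only reads sys.argv[1:], so the first element ("python" above) is just a placeholder for the program name.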

Tests

To run tests, first install pytest, then:

make pytest

Same for type checking with pytype:

make type

Citing the Project

To cite this repository in publications:

@misc{rl-zoo3,
  author = {Raffin, Antonin},
  title = {RL Baselines3 Zoo},
  year = {2020},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/DLR-RM/rl-baselines3-zoo}},
}

Contributing

If you trained an agent that is not present in the RL Zoo, please submit a Pull Request (containing the hyperparameters and the score too).

Contributors

We would like to thank our contributors: @iandanforth, @tatsubori, @Shade5, @mcres, @ernestum, @qgallouedec

rl-baselines3-zoo's People

Contributors

alperenunlu, araffin, blurlake, cboettig, cyprienc, eric-y-chen, ernestum, gregwar, guspan-tanadi, jkterry1, johannesul, kant, manifoldfr, mbertheau, mcres, nikhilrayaprolu, pchalasani, qgallouedec, rick-v-e, salmannotkhan, sammyramone, scottemmons, sgillen, simoninithomas, sonsang, technocrat13, tobirohrer, toshikwa, vinayhajare, vwxyzjn


rl-baselines3-zoo's Issues

[question] Cannot enjoy the trained agents.

After cloning rl-baselines3-zoo, I was trying to train my own agent with:
python train.py --algo algo_name --env env_id
After that, I used:
python enjoy.py --algo td3 --env AntBulletEnv-v0 -f logs/
However, I got the following error:

Loading latest experiment, id=2
pybullet build time: Jun 2 2020 06:49:02
/home/dell/gym/gym/logger.py:30: UserWarning:
WARN: Box bound precision lowered by casting to float32
  warnings.warn(colorize('%s: %s'%('WARN', msg % args), 'yellow'))
Process ForkServerProcess-1:
Traceback (most recent call last):
  File "/home/dell/anaconda3/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/home/dell/anaconda3/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/dell/stable-baselines3/stable_baselines3/common/vec_env/subproc_vec_env.py", line 42, in _worker
    raise NotImplementedError
NotImplementedError

Traceback (most recent call last):
  File "enjoy.py", line 201, in <module>
    main()
  File "enjoy.py", line 116, in main
    model = ALGOS[algo].load(model_path, env=env)
  File "/home/dell/stable-baselines3/stable_baselines3/common/base_class.py", line 362, in load
    model._setup_model()
  File "/home/dell/stable-baselines3/stable_baselines3/td3/td3.py", line 95, in _setup_model
    super(TD3, self)._setup_model()
  File "/home/dell/stable-baselines3/stable_baselines3/common/base_class.py", line 729, in _setup_model
    self.set_random_seed(self.seed)
  File "/home/dell/stable-baselines3/stable_baselines3/common/base_class.py", line 461, in set_random_seed
    self.env.seed(seed)
  File "/home/dell/stable-baselines3/stable_baselines3/common/vec_env/subproc_vec_env.py", line 112, in seed
    return [remote.recv() for remote in self.remotes]
  File "/home/dell/stable-baselines3/stable_baselines3/common/vec_env/subproc_vec_env.py", line 112, in <listcomp>
    return [remote.recv() for remote in self.remotes]
  File "/home/dell/anaconda3/lib/python3.8/multiprocessing/connection.py", line 250, in recv
    buf = self._recv_bytes()
  File "/home/dell/anaconda3/lib/python3.8/multiprocessing/connection.py", line 414, in _recv_bytes
    buf = self._recv(4)
  File "/home/dell/anaconda3/lib/python3.8/multiprocessing/connection.py", line 383, in _recv
    raise EOFError
EOFError

System Info

  • Stable Baselines3 was installed by pip
  • Ubuntu 18.04
  • Python 3.8.3
  • PyTorch 1.5.0
  • Gym 0.17.2
  • Pybullet 2.8.1
  • 2x NVIDIA RTX 2080 Ti, CUDA 10.2

I appreciate your help. Thank you in advance.

Colab rl-baselines-zoo Error while finding module specification for 'train.py'

Describe the bug
/usr/bin/python3: Error while finding module specification for 'train.py' (ModuleNotFoundError: __path__ attribute not found on 'train' while trying to find 'train.py')

Code example
Run the cell !python -m train.py --algo ppo --env MountainCar-v0 -n 50000 -optimize --n-trials 1000 --n-jobs 2 --sampler tpe --pruner median
in https://colab.research.google.com/github/Stable-Baselines-Team/rl-colab-notebooks/blob/sb3/rl-baselines-zoo.ipynb
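
(For context: python -m expects a module name rather than a file name, so the .py suffix is what triggers this error; !python train.py ... would be the expected invocation.)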

[Question] Tuned hyperparameters not reproducing well

I've been having problems where I find hyperparameters that result in optimal policies on a custom environment during tuning, but those hyperparameters only ever achieve fairly mediocre and rather unstable performance when I train with them again. Is there any way to prevent this, or anything obvious that I might be doing wrong? My working plan for finding better hyperparameters is to train on the environment ~5 times for each hyperparameter set and pick the median score from that (see the sketch below).
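
For what it's worth, a minimal sketch of that plan (train_and_evaluate is a hypothetical callable standing in for one full training run, not part of the zoo):

import numpy as np

def median_score(train_and_evaluate, hyperparams, n_runs=5):
    # One full training run per seed; the median is robust to lucky or unlucky seeds.
    scores = [train_and_evaluate(hyperparams, seed=s) for s in range(n_runs)]
    return float(np.median(scores))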

Hyperparameter optimization of activation function

Describe the bug
When using hyperparameter optimization with PPO (but it should be the same with other algorithms), Optuna throws the following warning:
optuna/distributions.py:331: UserWarning: Choices for a categorical distribution should be a tuple of None, bool, int, float and str for persistent storage but contains <class 'torch.nn.modules.activation.Tanh'> which is of type type.
From the Optuna documentation (https://readthedocs.org/projects/optuna/downloads/pdf/latest/):

Not all types are guaranteed to be compatible with all storages. It is recommended to restrict the types of the choices to None,bool,int,float and str

Code example
The bug can be reproduced by running train.py with optimization activated.
The problem comes from this line:

activation_fn = trial.suggest_categorical('activation_fn', [nn.Tanh, nn.ReLU])

I would propose to fix it similarly to how the network structure is handled: first use plain strings, then map to the corresponding class afterwards:

net_arch = trial.suggest_categorical('net_arch', ['small', 'medium'])

net_arch = {
    'small': [dict(pi=[64, 64], vf=[64, 64])],
    'medium': [dict(pi=[256, 256], vf=[256, 256])],
}[net_arch]
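
Applied to the activation function, the same pattern would look like this (a sketch, assuming it runs inside the Optuna objective where trial is available):

from torch import nn

# Sample plain strings (storable by Optuna), then map them to the torch classes.
activation_fn = trial.suggest_categorical('activation_fn', ['tanh', 'relu'])
activation_fn = {'tanh': nn.Tanh, 'relu': nn.ReLU}[activation_fn]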

System Info
Python 3.7
Optuna installed via pip on version 1.4.0

How can one train pixel-based agents using the CnnPolicy? [question]

First: This is a wonderful and very instructive repo - thank you very much for creating it!

I would like to train a pixel-based policy for the LunarLander environment. How could this be done? I tried to specify the hyperparameters for the ppo algorithm in the file hyperparameters/ppo.yml as follows:

LunarLander-v2:
  env_wrapper:
    - gym.wrappers.resize_observation.ResizeObservation:
        shape: 64
  frame_stack: 4
  n_envs: 1
  n_timesteps: !!float 1e6
  policy: 'CnnPolicy'
  n_steps: 1024
  batch_size: 64
  gae_lambda: 0.98
  gamma: 0.999
  n_epochs: 4
  ent_coef: 0.01

When trying to start training with python train.py --algo ppo --env LunarLander-v2, I receive an assertion error:

AssertionError: You should use NatureCNN only with images not with Box(64, 256) (you are probably using `CnnPolicy` instead of `MlpPolicy`)

Can somebody kindly illustrate how to use a CNN as feature extractor in this case? I checked python train.py --help but didn't find any indication of how to render the environment and use the resulting images as the state.
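
As a quick sanity check (a sketch, not from the issue), inspecting the observation space shows why the resize wrapper alone cannot work here: LunarLander-v2 returns a state vector, not pixels, so there is no image for ResizeObservation to resize.

import gym

env = gym.make("LunarLander-v2")
# Box(8,): an 8-dimensional state vector, not an (H, W, C) image, so a
# pixel-observation wrapper (one that renders frames) would be needed before
# gym.wrappers.ResizeObservation and CnnPolicy can apply.
print(env.observation_space)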

[Question] --n-jobs > 1 workaround

Hey, I'm working on creating a fork of this to create various tuned baselines for PettingZoo environments, and I've been running some initial experiments. Everything has worked as expected except one thing. Here's the command I've been testing with:

python3 train.py --algo ppo --env LunarLanderContinuous-v2 -n 2000000 -optimize --n-trials 1000 --n-jobs 2 --sampler tpe --pruner median --study-name lunar_lander_1 --storage mysql://[redacted]

I'm running this on a machine with 2 GPUs. However, only one job starts (despite --n-jobs 2). Do you just specify each job with CUDA_VISIBLE_DEVICES=1 or whatever and start a job for every single GPU, or is there something smarter that I can't find on Google?

(Also sorry for accidentally posting my last issue on the old repo)

[colab] cloudpickle error when evaluating pretrained agents

Describe the bug
When executing the colab demo I cannot enjoy (or create a video of) a pretrained agent.

Code example
!python enjoy.py --algo a2c --env CartPole-v1 --no-render --n-timesteps 5000
Results in the following output and error message:

Loading latest experiment, id=1
Traceback (most recent call last):
  File "enjoy.py", line 225, in <module>
    main()
  File "enjoy.py", line 155, in main
    model = ALGOS[algo].load(model_path, env=env, custom_objects=custom_objects, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/stable_baselines3/common/base_class.py", line 608, in load
    data, params, pytorch_variables = load_from_zip_file(path, device=device, custom_objects=custom_objects)
  File "/usr/local/lib/python3.7/dist-packages/stable_baselines3/common/save_util.py", line 402, in load_from_zip_file
    data = json_to_data(json_data, custom_objects=custom_objects)
  File "/usr/local/lib/python3.7/dist-packages/stable_baselines3/common/save_util.py", line 164, in json_to_data
    deserialized_object = cloudpickle.loads(base64_object)
  File "/usr/local/lib/python3.7/dist-packages/cloudpickle/cloudpickle_fast.py", line 395, in <module>
    class CloudPickler(Pickler):
  File "/usr/local/lib/python3.7/dist-packages/cloudpickle/cloudpickle_fast.py", line 418, in CloudPickler
    dispatch[types.CellType] = _cell_reduce
AttributeError: module 'types' has no attribute 'CellType'

A similar error message is given when trying to create a video of a pretrained agent.
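
For what it's worth, the usual workaround for such pickle/version mismatches is to replace the objects that fail to deserialize via custom_objects when loading; a minimal sketch (placeholder path, dummy schedules that are unused at inference time):

from stable_baselines3 import A2C

# Dummy replacements for the serialized schedule objects that fail to unpickle;
# they are not used when only running the trained policy.
custom_objects = {
    "learning_rate": 0.0,
    "lr_schedule": lambda _: 0.0,
    "clip_range": lambda _: 0.0,
}
model = A2C.load("path/to/model.zip", custom_objects=custom_objects)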

System Info
Describe the characteristic of your environment:

  • Describe how Stable Baselines3 was installed (pip, docker, source, ...): pip (inside the colab notebook)
  • GPU models and configuration: no GPU
  • Python version: 3.7.10 (obtained by evaluating !python --version in the colab notebook)
  • PyTorch version: 1.8.1 (see below output)
  • Versions of other relevant libraries (obtained by evaluating !python -m pip list in the colab notebook):
    cloudpickle 1.3.0
    gym 0.17.3
    pybullet 3.1.7
    optuna 2.7.0
    sb3-contrib 1.0
    stable-baselines3 1.0
    torch 1.8.1+cu101

SB3 v1.1 Breaking changes #116

I recently upgraded to an SB3 1.1 prerelease (what's currently master), and the breaking changes have minor implications for the RL Zoo. I don't know if you want PRs for this now, whether I should wait until 1.1 is properly released, or if you want to do the upgrade yourself. So far I've found one problem I've had to fix on my fork:

Traceback (most recent call last):
  File "train.py", line 15, in <module>
    from utils.exp_manager import ExperimentManager
  File "/home/justin_terry/rl-baselines3-zoo/utils/exp_manager.py", line 26, in <module>
    from stable_baselines3.common.vec_env.obs_dict_wrapper import ObsDictWrapper
ModuleNotFoundError: No module named 'stable_baselines3.common.vec_env.obs_dict_wrapper'
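
For reference, ObsDictWrapper was removed in SB3 v1.1 (dict observation spaces are now handled natively), so a version-tolerant import along these lines is one possible fix (a sketch, not necessarily the upstream patch):

try:
    # SB3 < 1.1
    from stable_baselines3.common.vec_env.obs_dict_wrapper import ObsDictWrapper
except ImportError:
    # SB3 >= 1.1: dict observation spaces are supported natively
    ObsDictWrapper = None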

[Feature request] Hyperparameter optimization from pretrained agent

Enabling the possibility to run --optimize with the --trained-agent flag would be great!
In my case, I pre-trained an agent on a simplified task and want to continue training it on the real task (which involves a modified reward, more obstacles, etc.).
It would be great to be able to run a hyperparameter search for this second phase of training. (Even though some hyperparameters, such as the network architecture, can't be tuned here.)
For now, when I run both flags together, it just continues training (weirdly outputting less info than without the --optimize flag, by the way).
Thanks for the awesome training framework!

[Minor Bug] Unnecessary and mildly confusing print statement during hyperparameter optimization

Starting a hyperparameter run with train.py prints out something like this:

========== LunarLanderContinuous-v2 ==========
Seed: 3139539977
OrderedDict([('batch_size', 64),
             ('ent_coef', 0.01),
             ('gae_lambda', 0.98),
             ('gamma', 0.999),
             ('n_envs', 16),
             ('n_epochs', 4),
             ('n_steps', 1024),
             ('n_timesteps', 1000000.0),
             ('policy', 'MlpPolicy')])
Using 4 environments
Overwriting n_timesteps with n=4000000

The problem is that the OrderedDict that gets printed contains the parameters for the environment from the relevant .yml file. It has no impact on hyperparameter tuning and isn't even used; it can make it look like you're doing something wrong (e.g. by default you're tuning hyperparameters that aren't included in that list), and its origin takes a shockingly long time to hunt down just to confirm you aren't somehow doing something wrong.

Understanding BipedalWalkerHardcore-v3 results

  1. Are the hyperparameters for BipedalWalkerHardcore-v3 and BipedalWalker-v3 tuned for PPO? You provide a benchmark for both, but the YAML file does not have a # tuned tag. From my understanding, I would get similar results across different seeds when using tuned hyperparameters, but the performance varies between runs for the hardcore environment.

  2. What does

NOTE: this is not a quantitative benchmark as it corresponds to only one run (cf issue #38). This benchmark is meant to check algorithm (maximal) performance, find potential bugs and also allow users to have access to pretrained agents.

mean for the BipedalWalker with PPO? Did you use the best model you got from all runs, or is it reproducible using the hyperparameters?

Kind regards (:

Several trials with the same value [question]

Good day. I'm trying zoo now on a custom environment, and I'm getting a couple of questions.

  • There are many trials that finished with the exact same value, and there's more than one instance of that happening, too. Why could that be? I can't make sense of it.

  • It's been over 12 hours and I've got just 47 trials, which means it's going to take 11 days non-stop to finish the 1000 trials. Is this common? The max number of steps for my env is 1200.

  • What should I set my 'net_arch' to if the suggestion is "big" or "medium"? (see the sketch below)

  • What does it mean for a suggestion to be "episodic"? There's no flag like that in the TD3 class or the .learn() method.

Thanks a lot in advance.
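
Regarding the 'net_arch' question above, a hedged sketch of the mapping the zoo's TD3 sampler uses (layer sizes assumed from the sampler code, not verified against your version):

# Hypothetical reproduction of the sampler's mapping, not a verbatim copy:
net_arch = {
    'small': [64, 64],
    'medium': [256, 256],
    'big': [400, 300],
}['medium']  # the suggested label picks one of these layer layouts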

(py3_6_12) andres@andres-mint:~/Documents/pyt_projs/git_clones/rl-baselines3-zoo$ python train.py --algo td3 --env BalanceBallPlus-v0 -n 50000 -optimize --n-trials 1000 --n-jobs 4 --sampler tpe --pruner median
========== BalanceBallPlus-v0 ==========
Seed: 688703236
OrderedDict([('batch_size', 100),
             ('buffer_size', 1000000),
             ('env_wrapper', 'sb3_contrib.common.wrappers.TimeFeatureWrapper'),
             ('gamma', 0.99),
             ('gradient_steps', 1000),
             ('learning_rate', 0.001),
             ('learning_starts', 10000),
             ('n_timesteps', 1000000.0),
             ('noise_std', 0.1),
             ('noise_type', 'normal'),
             ('policy', 'MlpPolicy'),
             ('policy_kwargs', 'dict(net_arch=[400, 300])'),
             ('train_freq', 1000)])
Using 1 environments
Overwriting n_timesteps with n=50000
pybullet build time: Nov 26 2020 23:07:47
/home/andres/anaconda3/envs/py3_6_12/lib/python3.6/site-packages/gym/logger.py:30: UserWarning: WARN: Box bound precision lowered by casting to float32
  warnings.warn(colorize('%s: %s'%('WARN', msg % args), 'yellow'))
Applying normal noise with std 0.1
Optimizing hyperparameters
Sampler: tpe - Pruner: median
[I 2020-12-26 06:44:05,953] A new study created in memory with name: no-name-ce67f306-96d9-4778-85a6-f7f50e6461a5
[I 2020-12-26 07:09:10,061] Trial 3 finished with value: -2.966192 and parameters: {'gamma': 0.995, 'lr': 2.3763065873412468e-05, 'batch_size': 128, 'buffer_size': 1000000, 'episodic': False, 'train_freq': 1000, 'noise_type': None, 'noise_std': 0.17512030997245787, 'net_arch': 'small'}. Best is trial 3 with value: -2.966192.
[I 2020-12-26 07:22:59,041] Trial 2 finished with value: -167.043263 and parameters: {'gamma': 0.999, 'lr': 0.0014386451147686773, 'batch_size': 32, 'buffer_size': 1000000, 'episodic': True, 'noise_type': None, 'noise_std': 0.5005792278567412, 'net_arch': 'medium'}. Best is trial 3 with value: -2.966192.
[I 2020-12-26 07:40:32,356] Trial 0 finished with value: -53.806006000000004 and parameters: {'gamma': 0.98, 'lr': 9.035743858880986e-05, 'batch_size': 256, 'buffer_size': 100000, 'episodic': True, 'noise_type': 'ornstein-uhlenbeck', 'noise_std': 0.9237972716285621, 'net_arch': 'big'}. Best is trial 3 with value: -2.966192.
[I 2020-12-26 07:43:16,663] Trial 5 finished with value: -107.140596 and parameters: {'gamma': 0.9, 'lr': 0.0003670649552870335, 'batch_size': 64, 'buffer_size': 1000000, 'episodic': True, 'noise_type': 'normal', 'noise_std': 0.10014564939749304, 'net_arch': 'small'}. Best is trial 3 with value: -2.966192.
[I 2020-12-26 08:05:35,558] Trial 7 finished with value: -46.69298 and parameters: {'gamma': 0.9, 'lr': 0.006234401879219351, 'batch_size': 256, 'buffer_size': 10000, 'episodic': True, 'noise_type': None, 'noise_std': 0.8369587884996299, 'net_arch': 'small'}. Best is trial 3 with value: -2.966192.
[I 2020-12-26 08:24:38,394] Trial 6 finished with value: -167.043263 and parameters: {'gamma': 0.95, 'lr': 0.32797048731022754, 'batch_size': 128, 'buffer_size': 10000, 'episodic': True, 'noise_type': 'normal', 'noise_std': 0.6534519680711605, 'net_arch': 'medium'}. Best is trial 3 with value: -2.966192.
[I 2020-12-26 08:57:49,229] Trial 1 finished with value: -167.043263 and parameters: {'gamma': 0.99, 'lr': 0.016250740365838234, 'batch_size': 2048, 'buffer_size': 10000, 'episodic': True, 'noise_type': 'normal', 'noise_std': 0.8837561835079127, 'net_arch': 'medium'}. Best is trial 3 with value: -2.966192.
[I 2020-12-26 09:02:04,719] Trial 9 finished with value: 149.950291 and parameters: {'gamma': 0.98, 'lr': 0.09222338177011687, 'batch_size': 16, 'buffer_size': 1000000, 'episodic': False, 'train_freq': 256, 'noise_type': 'normal', 'noise_std': 0.48004918808417185, 'net_arch': 'big'}. Best is trial 9 with value: 149.950291.
[I 2020-12-26 09:18:46,665] Trial 10 finished with value: -86.020416 and parameters: {'gamma': 0.9999, 'lr': 0.010826699760461068, 'batch_size': 128, 'buffer_size': 100000, 'episodic': False, 'train_freq': 16, 'noise_type': None, 'noise_std': 0.9218371772935142, 'net_arch': 'small'}. Best is trial 9 with value: 149.950291.
[I 2020-12-26 09:49:57,960] Trial 11 finished with value: -27.108461 and parameters: {'gamma': 0.99, 'lr': 0.0002607336884492247, 'batch_size': 2048, 'buffer_size': 100000, 'episodic': True, 'noise_type': 'normal', 'noise_std': 0.4876766153300758, 'net_arch': 'small'}. Best is trial 9 with value: 149.950291.
[I 2020-12-26 10:10:09,944] Trial 4 finished with value: 149.950291 and parameters: {'gamma': 0.99, 'lr': 0.09596935214110522, 'batch_size': 2048, 'buffer_size': 10000, 'episodic': False, 'train_freq': 1000, 'noise_type': None, 'noise_std': 0.7610310201815246, 'net_arch': 'big'}. Best is trial 9 with value: 149.950291.
[I 2020-12-26 10:18:13,678] Trial 12 finished with value: -179.039573 and parameters: {'gamma': 0.999, 'lr': 0.009408532288202605, 'batch_size': 256, 'buffer_size': 10000, 'episodic': True, 'noise_type': 'ornstein-uhlenbeck', 'noise_std': 0.6316383873470157, 'net_arch': 'big'}. Best is trial 9 with value: 149.950291.
[I 2020-12-26 10:27:21,354] Trial 13 finished with value: -167.043263 and parameters: {'gamma': 0.98, 'lr': 0.4498888137624217, 'batch_size': 16, 'buffer_size': 1000000, 'episodic': False, 'train_freq': 256, 'noise_type': 'ornstein-uhlenbeck', 'noise_std': 0.29382174552012386, 'net_arch': 'big'}. Best is trial 9 with value: 149.950291.
[I 2020-12-26 10:35:46,187] Trial 16 pruned. 
[I 2020-12-26 10:46:24,987] Trial 14 finished with value: -167.043263 and parameters: {'gamma': 0.98, 'lr': 0.48126050460164926, 'batch_size': 16, 'buffer_size': 1000000, 'episodic': False, 'train_freq': 256, 'noise_type': 'ornstein-uhlenbeck', 'noise_std': 0.35394714914984504, 'net_arch': 'big'}. Best is trial 9 with value: 149.950291.
[I 2020-12-26 10:55:16,323] Trial 8 finished with value: -167.043263 and parameters: {'gamma': 0.98, 'lr': 0.14056740011818536, 'batch_size': 2048, 'buffer_size': 1000000, 'episodic': True, 'noise_type': 'normal', 'noise_std': 0.24861868995949032, 'net_arch': 'big'}. Best is trial 9 with value: 149.950291.
[I 2020-12-26 10:55:30,237] Trial 15 finished with value: -52.049041 and parameters: {'gamma': 0.98, 'lr': 0.34282425928675386, 'batch_size': 16, 'buffer_size': 1000000, 'episodic': False, 'train_freq': 256, 'noise_type': 'normal', 'noise_std': 0.30874667494551744, 'net_arch': 'big'}. Best is trial 9 with value: 149.950291.
[I 2020-12-26 11:18:48,623] Trial 17 finished with value: -52.62094 and parameters: {'gamma': 0.98, 'lr': 0.07741577403078599, 'batch_size': 100, 'buffer_size': 1000000, 'episodic': False, 'train_freq': 1000, 'noise_type': 'normal', 'noise_std': 0.38034604318077103, 'net_arch': 'big'}. Best is trial 9 with value: 149.950291.
[I 2020-12-26 11:29:24,542] Trial 18 finished with value: -179.039573 and parameters: {'gamma': 0.98, 'lr': 0.07000544909228827, 'batch_size': 100, 'buffer_size': 10000, 'episodic': False, 'train_freq': 2000, 'noise_type': 'normal', 'noise_std': 0.4917063481756351, 'net_arch': 'big'}. Best is trial 9 with value: 149.950291.
[I 2020-12-26 12:42:58,148] Trial 21 finished with value: 149.950291 and parameters: {'gamma': 0.99, 'lr': 0.03485792895505385, 'batch_size': 512, 'buffer_size': 10000, 'episodic': False, 'train_freq': 2000, 'noise_type': None, 'noise_std': 0.7748378071423547, 'net_arch': 'big'}. Best is trial 9 with value: 149.950291.
[I 2020-12-26 12:53:57,673] Trial 22 finished with value: -167.043263 and parameters: {'gamma': 0.9999, 'lr': 0.03017840665601139, 'batch_size': 512, 'buffer_size': 10000, 'episodic': False, 'train_freq': 128, 'noise_type': None, 'noise_std': 0.7895056964816064, 'net_arch': 'big'}. Best is trial 9 with value: 149.950291.
[I 2020-12-26 12:58:05,962] Trial 19 finished with value: -167.043263 and parameters: {'gamma': 0.99, 'lr': 0.04564261693344617, 'batch_size': 1024, 'buffer_size': 10000, 'episodic': False, 'train_freq': 128, 'noise_type': 'normal', 'noise_std': 0.465268061541669, 'net_arch': 'big'}. Best is trial 9 with value: 149.950291.
[I 2020-12-26 12:58:27,350] Trial 20 finished with value: -52.049041 and parameters: {'gamma': 0.995, 'lr': 0.06252329780433087, 'batch_size': 1024, 'buffer_size': 10000, 'episodic': False, 'train_freq': 2000, 'noise_type': None, 'noise_std': 0.4942103437027389, 'net_arch': 'big'}. Best is trial 9 with value: 149.950291.
[I 2020-12-26 13:32:28,457] Trial 26 pruned. 
[I 2020-12-26 14:16:07,999] Trial 24 finished with value: -65.89648 and parameters: {'gamma': 0.99, 'lr': 0.001809149984011838, 'batch_size': 512, 'buffer_size': 10000, 'episodic': False, 'train_freq': 2000, 'noise_type': None, 'noise_std': 0.6014320878186472, 'net_arch': 'big'}. Best is trial 9 with value: 149.950291.
[I 2020-12-26 14:20:37,143] Trial 25 finished with value: 149.950291 and parameters: {'gamma': 0.99, 'lr': 0.003360498924126242, 'batch_size': 512, 'buffer_size': 10000, 'episodic': False, 'train_freq': 2000, 'noise_type': None, 'noise_std': 0.7435862410168138, 'net_arch': 'big'}. Best is trial 9 with value: 149.950291.
[I 2020-12-26 14:44:27,204] Trial 23 finished with value: -59.411483 and parameters: {'gamma': 0.995, 'lr': 0.0018973546110706463, 'batch_size': 1024, 'buffer_size': 10000, 'episodic': False, 'train_freq': 128, 'noise_type': None, 'noise_std': 0.5797212846678705, 'net_arch': 'big'}. Best is trial 9 with value: 149.950291.
[I 2020-12-26 14:51:46,423] Trial 27 finished with value: 149.950291 and parameters: {'gamma': 0.95, 'lr': 0.9081390217934642, 'batch_size': 512, 'buffer_size': 10000, 'episodic': False, 'train_freq': 2000, 'noise_type': None, 'noise_std': 0.9939435496543059, 'net_arch': 'big'}. Best is trial 9 with value: 149.950291.
[I 2020-12-26 15:17:35,901] Trial 29 finished with value: -52.049041 and parameters: {'gamma': 0.99, 'lr': 0.02087475325558366, 'batch_size': 512, 'buffer_size': 10000, 'episodic': False, 'train_freq': 2000, 'noise_type': None, 'noise_std': 0.9858779085972529, 'net_arch': 'medium'}. Best is trial 9 with value: 149.950291.
[I 2020-12-26 15:29:38,490] Trial 28 finished with value: -179.039573 and parameters: {'gamma': 0.95, 'lr': 0.20026135651189286, 'batch_size': 512, 'buffer_size': 100000, 'episodic': False, 'train_freq': 1000, 'noise_type': None, 'noise_std': 0.5828407402028809, 'net_arch': 'big'}. Best is trial 9 with value: 149.950291.
[I 2020-12-26 15:40:23,231] Trial 30 finished with value: -179.039573 and parameters: {'gamma': 0.99, 'lr': 0.005146795122267626, 'batch_size': 512, 'buffer_size': 10000, 'episodic': False, 'train_freq': 2000, 'noise_type': None, 'noise_std': 0.9922450882442746, 'net_arch': 'medium'}. Best is trial 9 with value: 149.950291.
[I 2020-12-26 15:47:47,567] Trial 31 finished with value: -167.043263 and parameters: {'gamma': 0.95, 'lr': 0.004842010015157103, 'batch_size': 512, 'buffer_size': 10000, 'episodic': False, 'train_freq': 2000, 'noise_type': None, 'noise_std': 0.9707259160241756, 'net_arch': 'medium'}. Best is trial 9 with value: 149.950291.
[I 2020-12-26 16:32:43,979] Trial 32 finished with value: -167.043263 and parameters: {'gamma': 0.99, 'lr': 0.005043618255920021, 'batch_size': 512, 'buffer_size': 100000, 'episodic': False, 'train_freq': 2000, 'noise_type': None, 'noise_std': 0.6993714984181254, 'net_arch': 'big'}. Best is trial 9 with value: 149.950291.
[I 2020-12-26 16:43:11,417] Trial 33 finished with value: -81.090492 and parameters: {'gamma': 0.95, 'lr': 1.142953243401593e-05, 'batch_size': 512, 'buffer_size': 10000, 'episodic': False, 'train_freq': 2000, 'noise_type': None, 'noise_std': 0.9954450975421397, 'net_arch': 'big'}. Best is trial 9 with value: 149.950291.
[I 2020-12-26 16:53:18,618] Trial 34 finished with value: -96.260865 and parameters: {'gamma': 0.95, 'lr': 0.0006517069341591469, 'batch_size': 512, 'buffer_size': 10000, 'episodic': False, 'train_freq': 2000, 'noise_type': None, 'noise_std': 0.834699645830437, 'net_arch': 'big'}. Best is trial 9 with value: 149.950291.
[I 2020-12-26 17:01:20,391] Trial 35 finished with value: 149.950291 and parameters: {'gamma': 0.95, 'lr': 0.87227910330435, 'batch_size': 512, 'buffer_size': 10000, 'episodic': False, 'train_freq': 2000, 'noise_type': None, 'noise_std': 0.8533854521550969, 'net_arch': 'big'}. Best is trial 9 with value: 149.950291.
[I 2020-12-26 17:14:28,346] Trial 36 finished with value: 17.055749 and parameters: {'gamma': 0.95, 'lr': 0.0007090520764364181, 'batch_size': 32, 'buffer_size': 10000, 'episodic': False, 'train_freq': 1, 'noise_type': 'ornstein-uhlenbeck', 'noise_std': 0.8750327667779954, 'net_arch': 'big'}. Best is trial 9 with value: 149.950291.
[I 2020-12-26 17:25:24,484] Trial 37 finished with value: -77.045215 and parameters: {'gamma': 0.95, 'lr': 0.8531124566426208, 'batch_size': 32, 'buffer_size': 1000000, 'episodic': False, 'train_freq': 1, 'noise_type': 'ornstein-uhlenbeck', 'noise_std': 0.8677264705868984, 'net_arch': 'big'}. Best is trial 9 with value: 149.950291.
[I 2020-12-26 17:27:24,652] Trial 38 finished with value: -52.049041 and parameters: {'gamma': 0.99, 'lr': 0.02620810746577697, 'batch_size': 64, 'buffer_size': 10000, 'episodic': False, 'train_freq': 16, 'noise_type': None, 'noise_std': 0.8928606266895032, 'net_arch': 'big'}. Best is trial 9 with value: 149.950291.
[I 2020-12-26 17:34:03,031] Trial 42 pruned. 
[I 2020-12-26 17:42:51,929] Trial 39 finished with value: 149.950291 and parameters: {'gamma': 0.95, 'lr': 0.9494030805368031, 'batch_size': 32, 'buffer_size': 10000, 'episodic': False, 'train_freq': 1, 'noise_type': None, 'noise_std': 0.8809092713572011, 'net_arch': 'big'}. Best is trial 9 with value: 149.950291.
[I 2020-12-26 17:49:35,371] Trial 40 finished with value: -52.049041 and parameters: {'gamma': 0.99, 'lr': 0.9189626266960231, 'batch_size': 64, 'buffer_size': 10000, 'episodic': False, 'train_freq': 16, 'noise_type': None, 'noise_std': 0.7858857639710304, 'net_arch': 'big'}. Best is trial 9 with value: 149.950291.
[I 2020-12-26 18:01:22,750] Trial 41 finished with value: -168.040922 and parameters: {'gamma': 0.999, 'lr': 0.9488439713863275, 'batch_size': 64, 'buffer_size': 10000, 'episodic': False, 'train_freq': 1000, 'noise_type': None, 'noise_std': 0.9371933101519725, 'net_arch': 'big'}. Best is trial 9 with value: 149.950291.
[I 2020-12-26 18:12:18,427] Trial 43 finished with value: -179.039573 and parameters: {'gamma': 0.999, 'lr': 0.22322268887467173, 'batch_size': 64, 'buffer_size': 10000, 'episodic': False, 'train_freq': 1000, 'noise_type': None, 'noise_std': 0.8049306042778772, 'net_arch': 'big'}. Best is trial 9 with value: 149.950291.
[I 2020-12-26 18:27:16,658] Trial 45 finished with value: -167.043263 and parameters: {'gamma': 0.999, 'lr': 0.2068361241733535, 'batch_size': 32, 'buffer_size': 1000000, 'episodic': False, 'train_freq': 256, 'noise_type': 'normal', 'noise_std': 0.007584062943803038, 'net_arch': 'big'}. Best is trial 9 with value: 149.950291.
[I 2020-12-26 19:21:14,016] Trial 46 finished with value: -167.043263 and parameters: {'gamma': 0.9999, 'lr': 0.2290603740718307, 'batch_size': 512, 'buffer_size': 1000000, 'episodic': False, 'train_freq': 2000, 'noise_type': 'normal', 'noise_std': 0.4188127541502431, 'net_arch': 'big'}. Best is trial 9 with value: 149.950291.
[I 2020-12-26 19:32:54,938] Trial 47 finished with value: -179.039573 and parameters: {'gamma': 0.9999, 'lr': 0.014488144853174149, 'batch_size': 512, 'buffer_size': 1000000, 'episodic': False, 'train_freq': 256, 'noise_type': 'normal', 'noise_std': 0.7270447502646388, 'net_arch': 'big'}. Best is trial 9 with value: 149.950291.

System Info
No GPU
stable-baselines3 0.11.0a4
Conda 4.9.2
Python 3.6.12
Pytorch 1.7.0
Pybullet 3.0.7
Gym 0.18.0
Numpy 1.19.4
Optuna 2.3.0

Linux Mint 20 Cinnamon - 4.6.7 - 5.4.0-58-generic
Intel® Core™ i5-3570 CPU @ 3.40GHz × 4

Training from scratch takes too much time (in Atari envs for 10M steps) [question]

Describe the bug
When I tried training DQN from scratch on Atari envs for 10M steps, it only used about 1000 MiB on my GPU and was really slow (I have to wait more than 2 days for it to complete). Other PyTorch DQN implementations take much less time, but use much more GPU memory (~10000 MiB). However, I really like the structure here and the trained agents provided in this repo, so is there any advice on how to speed it up? Thanks so much!
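
A quick sanity check worth running first (a generic SB3 snippet, not zoo-specific) is to confirm that the model really lives on the GPU; with Atari-sized CNNs, low memory usage alone does not mean the GPU is misconfigured:

import torch
from stable_baselines3 import DQN

print(torch.cuda.is_available())  # should print True on a working CUDA setup
# tiny buffer_size just to keep this check cheap; requires the Atari ROMs
model = DQN("CnnPolicy", "BreakoutNoFrameskip-v4", buffer_size=100)
print(model.device)  # SB3 defaults to "auto", i.e. cuda when available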

System Info
Describe the characteristic of your environment:

  • Describe how Stable Baselines3 was installed: pip
  • GPU models and configuration: NVIDIA GeForce RTX-2080Ti, CUDA 10.1
  • Python version: 3.7.10
  • PyTorch version: 1.6.0+cu101

Running enjoy.py on existing trained agent throws exception

Describe the bug
I am trying to directly run one of the trained agents using the colab notebook linked in the repo, but I get the exception printed below.
Link to notebook: https://colab.research.google.com/github/Stable-Baselines-Team/rl-colab-notebooks/blob/sb3/rl-baselines-zoo.ipynb

Code example
These are the steps I ran until I hit the error, on colab with GPU:

!apt-get install swig cmake ffmpeg freeglut3-dev xvfb
!git clone --recursive https://github.com/DLR-RM/rl-baselines3-zoo
%cd /content/rl-baselines3-zoo/
!pip install -r requirements.txt
!python enjoy.py --algo a2c --env Pendulum-v0 --folder rl-trained-agents/ -n 5000

Here's the error stack trace:

Loading latest experiment, id=1
Loading running average
with params: {'norm_obs': True, 'norm_reward': False}
Traceback (most recent call last):
  File "enjoy.py", line 225, in <module>
    main()
  File "enjoy.py", line 155, in main
    model = ALGOS[algo].load(model_path, env=env, custom_objects=custom_objects, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/stable_baselines3/common/base_class.py", line 623, in load
    data, params, pytorch_variables = load_from_zip_file(path, device=device, custom_objects=custom_objects)
  File "/usr/local/lib/python3.7/dist-packages/stable_baselines3/common/save_util.py", line 402, in load_from_zip_file
    data = json_to_data(json_data, custom_objects=custom_objects)
  File "/usr/local/lib/python3.7/dist-packages/stable_baselines3/common/save_util.py", line 164, in json_to_data
    deserialized_object = cloudpickle.loads(base64_object)
  File "/usr/local/lib/python3.7/dist-packages/cloudpickle/cloudpickle_fast.py", line 395, in <module>
    class CloudPickler(Pickler):
  File "/usr/local/lib/python3.7/dist-packages/cloudpickle/cloudpickle_fast.py", line 418, in CloudPickler
    dispatch[types.CellType] = _cell_reduce
AttributeError: module 'types' has no attribute 'CellType'

System Info
Describe the characteristic of your environment:

  • Describe how Stable Baselines3 was installed (pip, docker, source, ...)
  • GPU models and configuration
  • Python version
  • PyTorch version
  • Versions of any other relevant libraries (Pybullet, ...)
    Library versions that get installed:
    cloudpickle 1.3.0
    stable_baselines3 1.1.0a7

Additional context
This may be similar to the problem listed here: a mismatch between the cloudpickle version used to save the model and the one used to load it?

The error AttributeError: module 'types' has no attribute 'CellType' is the result of different minor versions of cloudpickle between the environment where a flow runs and the environment where it was registered.
PrefectHQ/prefect#3148
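
If so, a plausible workaround (untested here, and the exact version to pin is an assumption) would be to upgrade cloudpickle in the notebook before loading, then restart the runtime:

!pip install -U cloudpickle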

[question] Hyperparameter tuning - single process

Hi everyone.

I am trying to do hyperparameter tuning for a custom environment of mine.
The problem is that I cannot run multiple processes of the same environment, since it loads a DLL with ctypes that only lets me run one instance at a time.
Does anybody see any workaround for that?

Thanks in advance

[question] Why doesn't the hyperparameter search use the `--eval-freq` argument?

When we train an agent, we use the --eval-freq argument to configure the evaluation process.

But when we do a hyperparameter search, --eval-freq is ignored and --n-evaluations is used instead, so the evaluation frequency is calculated indirectly (n_timesteps / n_evaluations; e.g. 1,000,000 timesteps with 20 evaluations gives one evaluation every 50,000 steps).

I think it would be simpler if we could use only the --eval-freq argument. Is there any special reason to keep both arguments?

Record a training experiment [feature request]

I think it would be useful to create a video of a specific training experiment, as long as the training saved one or more checkpoints.
The final result would be a video where each checkpoint is loaded and executed, and the format could be specified, e.g. mp4 or gif. The process would be like this:

  • Record a video of each checkpoint and the final model (and perhaps also the best model, if available) into a temporary directory, using utils/record_video.py.
  • Use ffmpeg or another tool to join the videos together and convert them into other formats (see the sketch after this list). Wait some time between the different checkpoints, so the final video isn't too fast and confusing.
  • It would be nice if the checkpoint number appeared as text in the video and changed accordingly with the model being recorded.
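
For the joining step, a sketch using ffmpeg's concat demuxer (the file names are hypothetical; -c copy avoids re-encoding but requires identical codecs, and adding pauses or checkpoint-number overlays would require re-encoding, e.g. with the drawtext filter):

checkpoints.txt:
file 'checkpoint_10000.mp4'
file 'checkpoint_20000.mp4'
file 'final_model.mp4'

ffmpeg -f concat -safe 0 -i checkpoints.txt -c copy training_progress.mp4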

The problems I can foresee in implementing this:

  • In order to keep the code clean, I'd suggest implementing this feature in utils/record_training.py. However, I'm not sure if it's a good idea to call into the module utils/record_video.py from this new file.
  • For me, it would make more sense to specify the video length by the number of episodes rather than the number of steps, so that when x episodes are done, the video stops. If that's the case, VecVideoRecorder would need to be modified.

[Other] Optimized Hyperparameters for LunarLanderContinuous-v2 with PPO

I needed to do a run with PPO on a Gym environment on my cluster to make sure everything was working right before moving on to tuning PettingZoo environments, so I did a combination that no one had done before:

python3 train.py --algo ppo --env LunarLanderContinuous-v2 -n 2000000 -optimize --n-trials 1000 --n-jobs 2 --sampler tpe --pruner median --study-name lunar_lander_1 --storage mysql://[redacted]

I ran that on 10 GPUs for a little less than 24 hours and got through about 170 trials before performance improvements stopped being meaningful and everything went to pruning. These were the best parameters (mind you, a score of 200 counts as "solved" here):

[I 2021-04-15 15:13:02,318] Trial 134 finished with value: 307.1917026 and parameters: {'batch_size': 256, 'n_steps': 256, 'gamma': 0.995, 'lr': 0.000803803946053569, 'ent_coef': 3.2165680942085065e-07, 'clip_range': 0.2, 'n_epochs': 5, 'gae_lambda': 0.99, 'max_grad_norm': 2, 'vf_coef': 0.8682145978405473, 'net_arch': 'small', 'activation_fn': 'relu'}. Best is trial 134 with value: 307.192.
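
For reference, a sketch (not an official zoo config) of feeding these best-trial parameters straight back into SB3; the net_arch/activation mapping is an assumption based on the zoo's sampler, where 'small' means two 64-unit layers and 'relu' means torch.nn.ReLU:

import torch.nn as nn
from stable_baselines3 import PPO

model = PPO(
    "MlpPolicy",
    "LunarLanderContinuous-v2",
    batch_size=256,
    n_steps=256,
    gamma=0.995,
    learning_rate=0.000803803946053569,
    ent_coef=3.2165680942085065e-07,
    clip_range=0.2,
    n_epochs=5,
    gae_lambda=0.99,
    max_grad_norm=2,
    vf_coef=0.8682145978405473,
    policy_kwargs=dict(activation_fn=nn.ReLU,
                       net_arch=[dict(pi=[64, 64], vf=[64, 64])]),
)
model.learn(total_timesteps=2_000_000)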

[Question] Every few days all rl-zoo jobs on a node stop without any errors and I can't figure out why

In my trials tuning both PettingZoo and Gym environments, all optuna jobs on one node have simply stopped, twice now. There have been no system errors, nothing is printed to stdout, the processes have all stopped (no zombie processes, etc.), and the jobs have been very far from finishing; they can simply be restarted by connecting them to the SQL server again, seemingly without issue. They just sort of magically stop and I have no idea why. Have you seen anything like this before?

record_video.py not working on PyBullet envs [bug]

Describe the bug

After training a PyBullet environment, I try to record a video of it:

python -m utils.record_video --algo tqc --env ReacherBulletEnv-v0 -n 1000

And I get the following output:

pybullet build time: Mar  8 2021 17:24:12
Loading latest experiment, id=1
pybullet build time: Mar  8 2021 17:24:12
Loading latest experiment, id=1
Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/forkserver.py", line 196, in main
    _serve_one(s, listener, alive_r, old_handlers)
  File "/usr/lib/python3.6/multiprocessing/forkserver.py", line 231, in _serve_one
    code = spawn._main(child_r)
  File "/usr/lib/python3.6/multiprocessing/spawn.py", line 114, in _main
    prepare(preparation_data)
  File "/usr/lib/python3.6/multiprocessing/spawn.py", line 223, in prepare
    _fixup_main_from_name(data['init_main_from_name'])
  File "/usr/lib/python3.6/multiprocessing/spawn.py", line 249, in _fixup_main_from_name
    alter_sys=True)
  File "/usr/lib/python3.6/runpy.py", line 205, in run_module
    return _run_module_code(code, init_globals, run_name, mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/mcres/git/rl-baselines3-zoo/utils/record_video.py", line 74, in <module>
    hyperparams=hyperparams,
  File "/home/mcres/git/rl-baselines3-zoo/utils/utils.py", line 219, in create_test_env
    vec_env_kwargs=vec_env_kwargs,
  File "/home/mcres/git/stable-baselines3/stable_baselines3/common/env_util.py", line 102, in make_vec_env
    return vec_env_cls([make_env(i + start_index) for i in range(n_envs)], **vec_env_kwargs)
  File "/home/mcres/git/stable-baselines3/stable_baselines3/common/vec_env/subproc_vec_env.py", line 106, in __init__
    process.start()
  File "/usr/lib/python3.6/multiprocessing/process.py", line 105, in start
    self._popen = self._Popen(self)
  File "/usr/lib/python3.6/multiprocessing/context.py", line 291, in _Popen
    return Popen(process_obj)
  File "/usr/lib/python3.6/multiprocessing/popen_forkserver.py", line 35, in __init__
    super().__init__(process_obj)
  File "/usr/lib/python3.6/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/usr/lib/python3.6/multiprocessing/popen_forkserver.py", line 42, in _launch
    prep_data = spawn.get_preparation_data(process_obj._name)
  File "/usr/lib/python3.6/multiprocessing/spawn.py", line 143, in get_preparation_data
    _check_not_importing_main()
  File "/usr/lib/python3.6/multiprocessing/spawn.py", line 136, in _check_not_importing_main
    is not going to be frozen to produce an executable.''')
RuntimeError: 
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.
Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/mcres/git/rl-baselines3-zoo/utils/record_video.py", line 74, in <module>
    hyperparams=hyperparams,
  File "/home/mcres/git/rl-baselines3-zoo/utils/utils.py", line 219, in create_test_env
    vec_env_kwargs=vec_env_kwargs,
  File "/home/mcres/git/stable-baselines3/stable_baselines3/common/env_util.py", line 102, in make_vec_env
    return vec_env_cls([make_env(i + start_index) for i in range(n_envs)], **vec_env_kwargs)
  File "/home/mcres/git/stable-baselines3/stable_baselines3/common/vec_env/subproc_vec_env.py", line 111, in __init__
    observation_space, action_space = self.remotes[0].recv()
  File "/usr/lib/python3.6/multiprocessing/connection.py", line 250, in recv
    buf = self._recv_bytes()
  File "/usr/lib/python3.6/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/usr/lib/python3.6/multiprocessing/connection.py", line 379, in _recv
    chunk = read(handle, remaining)
ConnectionResetError: [Errno 104] Connection reset by peer

The same thing happens if I change the algorithm and environment, e.g. to td3 and HalfCheetahBulletEnv-v0, respectively.
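
For reference, a minimal, self-contained illustration (not the zoo's actual fix) of the idiom the RuntimeError asks for: create the SubprocVecEnv only under a __main__ guard, so that forkserver/spawn children can safely re-import the module:

import gym
from stable_baselines3.common.vec_env import SubprocVecEnv

def make_env():
    return gym.make("Pendulum-v0")

def main():
    # worker processes are spawned here; without the guard below, each worker
    # would re-import this module and try to spawn workers of its own
    env = SubprocVecEnv([make_env for _ in range(2)])
    env.reset()
    env.close()

if __name__ == "__main__":
    main()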

System Info
Describe the characteristic of your environment:

  • Describe how Stable Baselines3 was installed: pip install -e
  • Python version: 3.6.9
  • PyTorch version: 1.8.0
  • Versions of any other relevant libraries:
gym==0.18.0
pybullet==3.1.0

Additional context
Recording video on normal gym environments is working.
Executing the PyBullet envs with enjoy.py is also working.

Tensorboard log broken during hyperparameter optimization

Describe the bug
Hyperparameter optimization breaks tensorboard logging: when logging is active and multiple optimization jobs are running, all datapoints are logged to the last job's tensorboard.
While this does not break training itself, it makes the tensorboard log unreadable.

Code example
Running the following command will show two logs in tensorboard, but all values are written to the second one.

python train.py --algo ppo --env MountainCar-v0 -n 50000 -optimize --n-trials 1000 --n-jobs 2 \
  --sampler tpe --pruner median -tb TB_LOG_DIR

[tensorboard screenshot]

System Info
Stable Baselines 0.9.0a1
Python 3.7.5
Tensorboard 2.3.0
torch 1.6.0

Additional context
I think the issue comes from the logger being a singleton. During normal training with multiple envs, it is good that all of them write to the same tensorboard; during optimization, however, each job calls configure_logger, which overwrites the current tensorboard logger, so at the end there is just one logger carrying the name of the last job.
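
A toy illustration of that singleton behaviour (deliberately simplified, not SB3's actual logger code):

# a module-level singleton shared by every job in the process
_writer = None

def configure_logger(log_dir):
    global _writer
    _writer = log_dir

def record(value):
    print(f"{value} -> {_writer}")

configure_logger("tb/job_1")
configure_logger("tb/job_2")   # job 2 starts while job 1 is still training
record("job_1 datapoint")      # ends up in tb/job_2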

OpenCV error when enjoying Bullet env using Docker

Hi, I'm not sure if I should put this here or in the SB3 main repo.

Steps to reproduce

git clone --recursive https://github.com/DLR-RM/rl-baselines3-zoo
cd rl-baselines3-zoo
make docker-cpu
./scripts/run_docker_cpu.sh python train.py --algo sac --env HumanoidBulletEnv-v0 --save-freq 1000
./scripts/run_docker_cpu.sh python enjoy.py --algo sac --env HumanoidBulletEnv-v0 -f logs/ --exp-id 1

Output (of last command)

Executing in the docker (cpu image):
python enjoy.py --algo sac --env HumanoidBulletEnv-v0 -f logs/ --exp-id 1
+ export DISPLAY=:1
+ DISPLAY=:1
+ display=1
+ file=/tmp/.X11-unix/X1
+ sleep 1
+ Xvfb :1 -screen 0 1024x768x24
++ seq 1 10
+ for i in '$(seq 1 10)'
+ '[' -e /tmp/.X11-unix/X1 ']'
+ break
+ '[' -e /tmp/.X11-unix/X1 ']'
+ exec bash -c 'cd /root/code/rl_zoo/ && python enjoy.py --algo sac --env HumanoidBulletEnv-v0 -f logs/ --exp-id 1'
pybullet build time: Jun 19 2020 04:01:58
/opt/conda/lib/python3.6/site-packages/gym/logger.py:30: UserWarning: WARN: Box bound precision lowered by casting to float32
  warnings.warn(colorize('%s: %s'%('WARN', msg % args), 'yellow'))
Loading running average
with params: {'norm_obs': True, 'norm_reward': False}
Traceback (most recent call last):
  File "enjoy.py", line 201, in <module>
    main()
  File "enjoy.py", line 138, in main
    env.render('human')
  File "/opt/conda/lib/python3.6/site-packages/stable_baselines3/common/vec_env/base_vec_env.py", line 272, in render
    return self.venv.render(mode=mode)
  File "/opt/conda/lib/python3.6/site-packages/stable_baselines3/common/vec_env/base_vec_env.py", line 272, in render
    return self.venv.render(mode=mode)
  File "/opt/conda/lib/python3.6/site-packages/stable_baselines3/common/vec_env/base_vec_env.py", line 187, in render
    cv2.imshow('vecenv', bigimg[:, :, ::-1])
cv2.error: OpenCV(4.2.0) /io/opencv/modules/highgui/src/window.cpp:651: error: (-2:Unspecified error) The function is not implemented. Rebuild the library with Windows, GTK+ 2.x or Cocoa support. If you are on Ubuntu or Debian, install libgtk2.0-dev and pkg-config, then re-run cmake or configure script in function 'cvShowImage'
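
Since the container has no GUI-capable OpenCV build (hence the failing cvShowImage), one plausible workaround, assuming this version of enjoy.py exposes the flag, is to skip rendering entirely:

./scripts/run_docker_cpu.sh python enjoy.py --algo sac --env HumanoidBulletEnv-v0 -f logs/ --exp-id 1 --no-render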

System Info
Describe the characteristic of your environment:

  • Docker version 19.03.12
  • Host OS: Linux 5.3.0-59-generic #53~18.04.1-Ubuntu x86_64 GNU/Linux
  • CPU only

[Feature Request] Documentation for n_evaluations flag should be improved

Right now, all that's said is:

parser.add_argument("--n-evaluations", help="Number of evaluations for hyperparameter optimization", type=int, default=20)

The problem is that this doesn't tell you what exact number is being referred to without diving into the code: it could plausibly be the number of times a hyperparameter set is evaluated and tested, the number of evaluation points along a training curve, or a few other things.

Hyperparameter optimization with a customised environment

Hi, my first question is: is it possible to run hyperparameter optimization using a customised environment? If so, what should the file structure be for train.py to recognize the environment? I have tried the following structure:

.
├── train.py
├── setup.py
└── gym_environment/
    ├── envs/
    └── __init__.py

where __init__.py registers the environment:

from gym.envs.registration import register

register(
    id='FullFilterEnv-v0',
    entry_point='gym_environment.envs:FullFilterEnv',
    max_episode_steps=10,
)

Then I run:

python train.py --algo td3 --env FullFilterEnv-v0 -n 50000 -optimize --n-trials 1000 --n-jobs 2 --sampler random --pruner median

But the following error pops up:

ValueError: FullFilterEnv-v0 not found in gym registry, you maybe meant AntBulletEnv-v0?

Should I put the env into the yml file in hyperparams/? Perhaps a good example in the documentation would be helpful.
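
For reference, recent zoo versions appear to expose a --gym-packages flag that imports extra modules before the env is created, so that their register() calls run; assuming your version has it, something like the following should work (the env still needs its own entry in hyperparams/td3.yml, since train.py looks up its base configuration there):

python train.py --algo td3 --env FullFilterEnv-v0 --gym-packages gym_environment -n 50000 -optimize --n-trials 1000 --n-jobs 2 --sampler random --pruner median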

Thanks a lot!

System Info

  • stable-baselines3-0.8.0a0

Is CarRacing tuned for PPO? [question]

I tried to train CarRacing-v0 with the PPO algorithm and the custom hyperparameters, but the final rewards seem far from good performance.

At about 1e5 and 1e6 steps, the rollout mean reward seems similar, but the training losses fell during that time. I read that solving the environment is defined as receiving an average reward of 900 over 100 consecutive episodes. For reference, I copied the two stdout printouts at 1e5 and 1e6 steps below.

-----------------------------------------
| eval/                   |             |
|    mean_ep_length       | 1e+03       |
|    mean_reward          | -82.3       |
| rollout/                |             |
|    ep_len_mean          | 962         |
|    ep_rew_mean          | -44.3       |
| time/                   |             |
|    fps                  | 101         |
|    iterations           | 25          |
|    time_elapsed         | 1004        |
|    total_timesteps      | 102400      |
| train/                  |             |
|    approx_kl            | 0.015621316 |
|    clip_fraction        | 0.0234      |
|    clip_range           | 0.4         |
|    entropy_loss         | -4.67       |
|    explained_variance   | -36.2       |
|    learning_rate        | 3e-05       |
|    loss                 | 1.81        |
|    n_updates            | 480         |
|    policy_gradient_loss | -0.00595    |
|    std                  | 0.127       |
|    value_loss           | 2.82        |
-----------------------------------------

and

-----------------------------------------
| eval/                   |             |
|    mean_ep_length       | 1e+03       |
|    mean_reward          | -82.4       |
| rollout/                |             |
|    ep_len_mean          | 1e+03       |
|    ep_rew_mean          | -75.6       |
| time/                   |             |
|    fps                  | 95          |
|    iterations           | 245         |
|    time_elapsed         | 10460       |
|    total_timesteps      | 1003520     |
| train/                  |             |
|    approx_kl            | 0.060493056 |
|    clip_fraction        | 0.148       |
|    clip_range           | 0.4         |
|    entropy_loss         | -4.06       |
|    explained_variance   | -6.8        |
|    learning_rate        | 3e-05       |
|    loss                 | 0.2         |
|    n_updates            | 4880        |
|    policy_gradient_loss | -0.015      |
|    std                  | 0.106       |
|    value_loss           | 0.465       |
-----------------------------------------

I trained the RL agent with the default settings by typing python3 train.py --algo ppo --env CarRacing-v0.

Does anybody have an idea what I could do to train the agent better? What information could I look at? I am grateful for any tips or suggestions.

What does the hyperparameter "normalize" refer to in PPO?

PPO hyperparameter configurations often refer to normalize as a boolean, e.g.

normalize: true

It's not clear to me what configuration this particular hyperparameter refers to (if anything?). I see A2C tunes the normalize_advantage parameter, but that's not a hyperparameter for PPO; PPO has a boolean normalize_images, but I don't think that's it either. Is this controlling whether or not the env gets wrapped in VecNormalize?

(For context: I've found the zoo scripts particularly handy for tuning even my custom environments, but I am struggling to reproduce some of the tuned results by passing the best hyperparameters directly to fresh initializations of the RL algorithms. Thanks for the amazing work you've done in developing stable-baselines and the zoo!)
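
For reference, a minimal sketch of what normalize: true plausibly corresponds to in the zoo, namely wrapping the vectorized env in VecNormalize; this would match the "Loading running average ... {'norm_obs': True, 'norm_reward': False}" messages seen in other issues above:

import gym
from stable_baselines3.common.vec_env import DummyVecEnv, VecNormalize

env = DummyVecEnv([lambda: gym.make("Pendulum-v0")])
# running statistics normalize observations (and rewards during training)
env = VecNormalize(env, norm_obs=True, norm_reward=True)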

Multiple concerns surrounding eval_freq

  1. The eval_freq passed to train.py as a command line arg is completely thrown out during optimization in favor of the one generated from n_evaluations. This is either a large omission in the command line argument description or a bug, and I am not certain which.
  2. If that is not a bug, exp_manager.py essentially has two different variables floating around named eval_freq, which makes tracking down what's going on very confusing. One of them should presumably be renamed.
  3. This is a minor issue that I wouldn't have created an issue for on its own, but the trailing underscore in the variable name here
    eval_freq_ = max(eval_freq // model.get_env().num_envs, 1)
    is not best practice, and I've stared at that line for a pretty long time now.

I'm also going to create a related issue in SB3.

Error when running hyperparameter optimization

Hi, I was trying the following command from the README:

python train.py --algo ppo --env MountainCar-v0 -n 50000 -optimize --n-trials 1000 --n-jobs 2 --sampler random --pruner median

but I got the following error. Could you tell me where the problem might be? Thanks!

Error message

Seed: 1990554247
OrderedDict([('ent_coef', 0.0),
             ('gae_lambda', 0.98),
             ('gamma', 0.99),
             ('n_envs', 16),
             ('n_epochs', 4),
             ('n_steps', 16),
             ('n_timesteps', 1000000.0),
             ('nminibatches', 1),
             ('normalize', True),
             ('policy', 'MlpPolicy')])
Using 16 environments
Overwriting n_timesteps with n=50000
Normalizing input and reward
Optimizing hyperparameters
Sampler: tpe - Pruner: median
Normalizing input and reward
[W 2020-06-25 11:09:18,251] Setting status of trial#0 as TrialState.FAIL because of the following error: TypeError("__init__() got an unexpected keyword argument 'nminibatches'")
Traceback (most recent call last):
  File "/Users/eejiiew/Projects/filter-zoo/venv/lib/python3.7/site-packages/optuna/study.py", line 734, in _run_trial
    result = func(trial)
  File "/Users/eejiiew/Projects/filter-zoo/rl-baselines3-zoo/utils/hyperparams_opt.py", line 84, in objective
    model = model_fn(**kwargs)
  File "train.py", line 385, in create_model
    verbose=0, **kwargs)
TypeError: __init__() got an unexpected keyword argument 'nminibatches'
Normalizing input and reward
[W 2020-06-25 11:09:18,346] Setting status of trial#1 as TrialState.FAIL because of the following error: TypeError("__init__() got an unexpected keyword argument 'nminibatches'")
Traceback (most recent call last):
  File "/Users/eejiiew/Projects/filter-zoo/venv/lib/python3.7/site-packages/optuna/study.py", line 734, in _run_trial
    result = func(trial)
  File "/Users/eejiiew/Projects/filter-zoo/rl-baselines3-zoo/utils/hyperparams_opt.py", line 84, in objective
    model = model_fn(**kwargs)
  File "train.py", line 385, in create_model
    verbose=0, **kwargs)
TypeError: __init__() got an unexpected keyword argument 'nminibatches'
Traceback (most recent call last):
  File "train.py", line 392, in <module>
    storage=args.storage, study_name=args.study_name, verbose=args.verbose)
  File "/Users/eejiiew/Projects/filter-zoo/rl-baselines3-zoo/utils/hyperparams_opt.py", line 119, in hyperparam_optimization
    study.optimize(objective, n_trials=n_trials, n_jobs=n_jobs)
  File "/Users/eejiiew/Projects/filter-zoo/venv/lib/python3.7/site-packages/optuna/study.py", line 382, in optimize
    for _ in _iter
  File "/usr/local/Cellar/python/3.7.5/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/joblib/parallel.py", line 934, in __call__
    self.retrieve()
  File "/usr/local/Cellar/python/3.7.5/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/joblib/parallel.py", line 833, in retrieve
    self._output.extend(job.get(timeout=self.timeout))
  File "/usr/local/Cellar/python/3.7.5/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/pool.py", line 657, in get
    raise self._value
  File "/usr/local/Cellar/python/3.7.5/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/usr/local/Cellar/python/3.7.5/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 567, in __call__
    return self.func(*args, **kwargs)
  File "/usr/local/Cellar/python/3.7.5/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/joblib/parallel.py", line 225, in __call__
    for func, args, kwargs in self.items]
  File "/usr/local/Cellar/python/3.7.5/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/joblib/parallel.py", line 225, in <listcomp>
    for func, args, kwargs in self.items]
  File "/Users/eejiiew/Projects/filter-zoo/venv/lib/python3.7/site-packages/optuna/study.py", line 648, in _reseed_and_optimize_sequential
    func, n_trials, timeout, catch, callbacks, gc_after_trial, time_start
  File "/Users/eejiiew/Projects/filter-zoo/venv/lib/python3.7/site-packages/optuna/study.py", line 682, in _optimize_sequential
    self._run_trial_and_callbacks(func, catch, callbacks, gc_after_trial)
  File "/Users/eejiiew/Projects/filter-zoo/venv/lib/python3.7/site-packages/optuna/study.py", line 713, in _run_trial_and_callbacks
    trial = self._run_trial(func, catch, gc_after_trial)
  File "/Users/eejiiew/Projects/filter-zoo/venv/lib/python3.7/site-packages/optuna/study.py", line 734, in _run_trial
    result = func(trial)
  File "/Users/eejiiew/Projects/filter-zoo/rl-baselines3-zoo/utils/hyperparams_opt.py", line 84, in objective
    model = model_fn(**kwargs)
  File "train.py", line 385, in create_model
    verbose=0, **kwargs)
TypeError: __init__() got an unexpected keyword argument 'nminibatches'
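
One generic way to confirm the diagnosis (a check, not a fix): nminibatches was an SB2-era PPO2 argument, and SB3's PPO exposes batch_size instead, which is exactly what the TypeError is complaining about:

import inspect
from stable_baselines3 import PPO

params = inspect.signature(PPO.__init__).parameters
print("nminibatches" in params)  # False -> the TypeError above
print("batch_size" in params)    # True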

System Info

  • stable-baselines3-0.8.0a0

Blurry view of the Reacher PyBullet env

I trained ReacherBulletEnv-v0 using this repository, but when I run the enjoy.py file, I get a very blurry environment. How can I correct this?

[Screenshot from 2020-12-07 16-56-01]

The configuration and package versions in my environment are given below:

Package                        Version           
------------------------------ -------------------
absl-py                        0.11.0
alembic                        1.4.3
astroid                        2.4.2
astunparse                     1.6.3
atari-py                       0.2.6
attrs                          20.3.0
azure                          1.0.3
azure-common                   1.1.26
azure-mgmt                     0.20.2
azure-mgmt-common              0.20.0
azure-mgmt-compute             0.20.1
azure-mgmt-network             0.20.1
azure-mgmt-nspkg               3.0.2
azure-mgmt-resource            0.20.1
azure-mgmt-storage             0.20.0
azure-nspkg                    3.0.2
azure-servicebus               0.20.1
azure-servicemanagement-legacy 0.20.2
azure-storage                  0.20.3
backcall                       0.2.0
baselines                      0.1.4            
cachetools                     4.1.1
certifi                        2020.6.20
cffi                           1.14.3
chardet                        3.0.4
click                          7.1.2
cliff                          3.5.0
cloudpickle                    1.6.0
cmaes                          0.7.0
cmd2                           1.4.0
colorama                       0.4.4
colorlog                       4.6.2
cycler                         0.10.0
Cython                         0.29.21
dataclasses                    0.6
decorator                      4.4.2
dill                           0.3.3
future                         0.18.2
gast                           0.3.3
glfw                           2.0.0
google-auth                    1.23.0
google-auth-oauthlib           0.4.2
google-pasta                   0.2.0
grpcio                         1.33.2
gym                            0.17.3
h5py                           2.10.0
idna                           2.10
imageio                        2.9.0
iniconfig                      1.1.1
ipdb                           0.13.4
ipython                        7.18.1
ipython-genutils               0.2.0
isort                          5.5.4
jedi                           0.17.2
joblib                         0.17.0
Keras-Preprocessing            1.1.2
kiwisolver                     1.2.0
lazy-object-proxy              1.4.3
llvmlite                       0.34.0
lockfile                       0.12.2
Mako                           1.1.3
Markdown                       3.3.3
MarkupSafe                     1.1.1
matplotlib                     3.3.2
mccabe                         0.6.1
more-itertools                 8.5.0
mpi4py                         3.0.3
numba                          0.51.2
numpy                          1.19.2
oauthlib                       3.1.0
opencv-python                  4.4.0.44
opt-einsum                     3.3.0
optuna                         2.3.0
packaging                      20.7
pandas                         1.1.2
parso                          0.7.1
pbr                            5.5.1
pexpect                        4.8.0
pickleshare                    0.7.5
Pillow                         7.2.0
pip                            20.2.2
pluggy                         0.13.1
prettytable                    0.7.2
progressbar2                   3.53.1
prompt-toolkit                 3.0.8
protobuf                       3.13.0
psutil                         5.7.3
ptyprocess                     0.6.0
py                             1.9.0
py-dateutil                    2.2
pyarrow                        2.0.0
pyasn1                         0.4.8
pyasn1-modules                 0.2.8
pybullet                       3.0.4
pybullet-robot-envs            0.0.1              
pycparser                      2.20
pyglet                         1.5.0
Pygments                       2.7.2
pylint                         2.6.0
pyparsing                      2.4.7
pyperclip                      1.8.1
pytest                         6.1.2
python-dateutil                2.8.1
python-editor                  1.0.4
python-utils                   2.4.0
pytz                           2020.1
PyYAML                         5.3.1
pyzmq                          20.0.0
requests                       2.25.0
requests-oauthlib              1.3.0
rsa                            4.6
scipy                          1.5.2
seaborn                        0.11.0
setuptools                     49.6.0.post20200925
six                            1.15.0
SQLAlchemy                     1.3.20
stable-baselines               2.10.1
stable-baselines3              0.10.0
stevedore                      3.3.0
tensorboard                    2.4.0
tensorboard-plugin-wit         1.7.0
tensorboardX                   2.1
tensorflow                     2.3.1
tensorflow-estimator           2.3.0
termcolor                      1.1.0
toml                           0.10.1
torch                          1.7.0
tqdm                           4.51.0
traitlets                      5.0.5
typing-extensions              3.7.4.3
urllib3                        1.26.2
wcwidth                        0.2.5
Werkzeug                       1.0.1
wheel                          0.35.1
wrapt                          1.12.1
zmq                            0.0.0

[feature request] Hyperparameter optimization improvements

I played around with the Optuna hyperparameter optimization a bit and a few things came to mind.
I would like to know if you are interested in having any of this upstream. If so, I will create PRs for it.

1. Distributed optimization

Optuna makes it possible to distribute a study across different machines by using an SQL database (link). I tried it out on my fork and it seemed to work quite well. This would add another optional parameter for the storage location, but does not require a lot of additional code.

2. Optimization direction

Optuna allows the return value of the objective function to be either minimized or maximized. Currently the reward is returned like this:

cost = -1 * eval_callback.last_mean_reward

At first I found it very unintuitive that the trial results were negative numbers (as my environment only returns positive rewards).
I would propose setting the optimization direction to 'maximize' and returning the reward as it is. This requires only a two-line code change.

3. Save all hyperparameters

Currently, only the hyperparameters that are chosen by Optuna are saved to the trial. It can happen that you want to run multiple optimizations with different fixed sets of parameters; in this case you have to keep track of them yourself.
It is possible to add further information to a study (or to a single trial) by using set_user_attr(key, value) (link).
I think it would be nice to save all fixed hyperparameters, so that they are easier to look up afterwards. I haven't implemented it yet, but I don't think it should require much code.
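
Taken together, the three suggestions might look like this minimal Optuna sketch (the storage URL, study name, and attribute values are placeholders):

import optuna

# 1. distributed: every worker pointing at the same storage shares the study
study = optuna.create_study(
    study_name="zoo_study",                   # placeholder name
    storage="mysql://user:pass@host/optuna",  # placeholder URL
    load_if_exists=True,
    direction="maximize",                     # 2. maximize the raw reward
)
# 3. record the fixed (non-sampled) hyperparameters on the study itself
study.set_user_attr("fixed_hyperparams", {"policy": "MlpPolicy", "n_envs": 16})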

By the way, thanks again for stable-baselines, it is really helpful and well documented :)

Releasing parts of this repo as a PyPI package for hyperparameter tuning [Enhancement]

Hi all,

First, I'd like to say thank you to all the contributors on this repo as well as on stable baselines. I have used both extensively in my own research and appreciate the effort that you all have made here.

I have been using this repo to do hyperparameter tuning on my own environments with stable baselines 3. I primarily just copy train.py and utils/ to my own directory and go from there. But since you all make continual improvements to this library, I was wondering if it would be possible to wrap up some of the files here as a PyPI package, so that users who want to do hyperparameter tuning with this repo could easily install the appropriate files, use them, and keep them up to date. I imagine this would be very helpful for all regular users of stable baselines; perhaps it is something that would be better off integrated into the next stable baselines 3 release.
