deepmind / dqn_zoo Goto Github PK

DQN Zoo is a collection of reference implementations of reinforcement learning agents developed at DeepMind based on the Deep Q-Network (DQN) agent.

License: Apache License 2.0

Dockerfile 0.56% Python 94.11% Shell 0.70% Jupyter Notebook 4.63%

dqn_zoo's Introduction

DQN Zoo

DQN Zoo is a collection of reference implementations of reinforcement learning agents developed at DeepMind based on the Deep Q-Network (DQN) agent.

It aims to be research-friendly, self-contained and readable. Each agent is implemented using JAX, Haiku and RLax, and is a best-effort replication of the corresponding paper implementation. Each agent reproduces results on the standard set of 57 Atari games, on average.

Directory	Paper
`dqn`	Human Level Control Through Deep Reinforcement Learning
`double_q`	Deep Reinforcement Learning with Double Q-learning
`prioritized`	Prioritized Experience Replay
`c51`	A Distributional Perspective on Reinforcement Learning
`qrdqn`	Distributional Reinforcement Learning with Quantile Regression
`rainbow`	Rainbow: Combining Improvements in Deep Reinforcement Learning
`iqn`	Implicit Quantile Networks for Distributional Reinforcement Learning

Plot of median human-normalized score over all 57 Atari games for each agent:

Quick start

NOTE: Only Python 3.9 and above and Linux is supported.

Follow these steps to quickly clone the DQN Zoo repository, install all required dependencies and start running DQN. Prerequisites for these steps are a NVIDIA GPU with recent CUDA drivers.

Install Docker version 19.03 or later (for the --gpus flag).
Install NVIDIA Container Toolkit.
Enable sudoless docker.
Verify the previous steps were successful e.g. by running:
docker run --gpus all --rm nvidia/cuda:11.1.1-base nvidia-smi
Download the script run.sh. This automatically downloads the Atari ROMs from http://www.atarimania.com. The ROMs are available here for free but make sure the respective license covers your particular use case.

Running this script will:

1.  Clone the DQN Zoo repository.
1.  Build a Docker image with all necessary dependencies and run unit tests.
1.  Start a short run of DQN on Pong in a GPU-accelerated container.

NOTE: run.sh, Dockerfile and docker_requirements.txt together provide a self-contained example of the dependencies and commands needed to run an agent in DQN Zoo. Using Docker is not a requirement and if Dockerfile is not used then the list of dependencies to install may have to be adapted depending on your environment. Also it is not a hard requirement to run on the GPU. Agents can be run on the CPU by specifying the flag --jax_platform_name=cpu.

Goals

Serve as a collection of reference implementations of DQN-based agents developed at DeepMind.
Reproduce results reported in papers, on average.
Implement agents purely in Python, using JAX, Haiku and RLax.
Have minimal dependencies.
Be easy to read.
Be easy to modify and customize after forking.

Non-goals

Be a library or framework (these agents are intended to be forked for research).
Be flexible, general and support multiple use cases (at odds with understandability).
Support many environments (users can easily add new ones).
Include every DQN variant that exists.
Incorporate many cool libraries (harder to read, easy for the user to do this after forking, different users prefer different libraries, less self-contained).
Optimize speed and efficiency at the cost of readability or matching algorithmic details in the papers (no C++, keep to a single stream of experience).

Code structure

Each directory contains a published DQN variant configured to run on Atari.
agent.py in each agent directory contains an agent class that includes reset(), step(), get_state(), set_state() methods.
parts.py contains functions and classes used by many of the agents including classes for accumulating statistics and the main training and evaluation loop run_loop().
replay.py contains functions and classes relating to experience replay.
networks.py contains Haiku networks used by the agents.
processors.py contains components for standard Atari preprocessing.

Implementation notes

Generally we went with a flatter approach for easier code comprehension. Excessive nesting, indirection and generalization have been avoided, but not to the extreme of having a single file per agent. This has resulted in some degree of code duplication, but this is less of a maintenance issue as the code base is intended to be relatively static.

Some implementation details:

The main training and evaluation loop parts.run_loop() is implemented as a generator to decouple it from other concerns like logging statistics and checkpointing.
We adopted the pattern of returning a new JAX PRNG key from jitted functions. This allows for splitting keys inside jitted functions which is currently more efficient than splitting outside and passing a key in.
Agent functions to be jitted are defined inline in the agent class __init__() instead of as decorated class methods. This emphasizes such functions should be free of side-effects; class methods are generally not pure as they often alter the class instance.
parts.NullCheckpoint is a placeholder for users to optionally plug in a checkpointing library appropriate for the file system they are using. This would allow resuming an interrupted training run.
The preprocessing and action repeat logic lives inside each agent. Doing this instead of taking the common approach of environment wrappers allows the run loop to see the "true" timesteps. This makes things like recording performance statistics and videos easier since the unmodified rewards and observations are readily available. It also allows us to express all relevant flag values in terms of environment frames, instead of a more confusing mix of environment frames and learning steps.

Learning curves

Learning curve data is included in results.tar.gz. The archive contains a CSV file for each agent, with statistics logged during training runs. These training runs span the standard set of 57 Atari games, 5 seeds each, using default agent settings. Note Gym was used instead of Xitari.

These CSV files can be theoretically equivalently generated by the following pseudocode:

for agent in "${AGENTS[@]}"; do
  for game in "${ATARI_GAMES[@]}"; do
    for seed in {1..5}; do
      python -m "dqn_zoo.${agent}.run_atari" \
          --environment_name="${game}" \
          --seed="${seed}" \
          --results_csv_path="/tmp/dqn_zoo/${agent}/${game}/${seed}/results.csv"
    done
  done
done

Each agent CSV file in results.tar.gz is then a concatenation of all associated results.csv files, with additional environment_name and seed fields. Note the learning curve data is missing state_value since logging for this quantity was added after the data was generated.

Plots show the average score at periodic evaluation phases during training. Each episode during evaluation starts with up to 30 random no-op actions and lasts a maximum of 30 minutes. To make the plots more readable, scores have been smoothed using a moving average with window size 10.

Plot of average score on each individual Atari game for each agent:

FAQ

Q: Do these agents replicate results from their respective papers?

We aim to replicate the mean and median human normalized score over all 57 Atari games and to implement the algorithm described in each paper as closely as possible.

However there are potential sources of differences at the level of an individual game. These include:

Differences between Gym + Arcade Learning Environment (ALE) and Xitari.
Changes in underlying libraries such as the exact image resizing algorithm used in the observation preprocessing.
Atari ROM version.

Q: Is the execution of these agents deterministic?

We try to allow for it on CPU. However it is easily broken and note that convolutions on GPU are not deterministic. To allow for determinism we:

Build a new environment at the start of every iteration.
Include in the training state:
- Random number generator state.
- Target network parameters (in addition to online network parameters).
- Evaluation agent.

Q: Why is DQN-based agent X not included?

There was a bias towards implementing the variants the authors are most familiar with. Also one or more of the following reasons may apply:

Did not get round to implementing X.
Have yet to replicate the algorithmic details and learning performance of X.
It is easy to create X from components in DQN Zoo.

Q: Why not incorporate library / environment X?

X is probably very useful, but every additional library or feature is another thing new users need to read and understand. Also everyone differs in the auxiliary libraries they like to use. So the recommendation is to fork the agent you want and incorporate the features you wish in the copy. This also gives us the usual benefits of keeping dependencies to a minimum.

Q: Can I generalize X, then I can do Y with minimal modifications?

Code generalization often makes code harder to read. This is not intended to be a library in the sense that you import an agent and inject customized components to do research. Instead it is designed to be easy to customize after forking. So rather than be everything for everyone, we aimed to keep things minimal. Then users can fork and generalize in the directions they specifically care about.

Q: Why Gym instead of Xitari?

Most DeepMind papers with experiments on Atari published results on Xitari, a fork of the Arcade Learning Environment (ALE). The learning performance of agents in DQN Zoo were also verified on Xitari. However since Gym and the ALE are more widely used we have chosen to open source DQN Zoo using Gym. This does introduce another source of differences, though the settings for the Gym Atari environments have been chosen so they behave as similar as possible to Xitari.

Contributing

Note we are currently not accepting contributions. See CONTRIBUTING.md for details.

Citing DQN Zoo

If you use DQN Zoo in your research, please cite the papers corresponding to the agents used and this repository:

@software{dqnzoo2020github,
  title = {{DQN} {Zoo}: Reference implementations of {DQN}-based agents},
  author = {John Quan and Georg Ostrovski},
  url = {http://github.com/deepmind/dqn_zoo},
  version = {1.2.0},
  year = {2020},
}

dqn_zoo's People

Contributors

Stargazers

Watchers

dqn_zoo's Issues

DQN epsilon-greedy strategy

Hello,

I notice a small difference between the implementation and the original paper description about the epsilon-greedy in DQN. Maybe I have a wrong understanding.

In particular, the original paper claims that the epsilon-greedy starts after 50K frames rather than (stacked frames) and stops to be 0.1 after 1M frames. However, it seems that the current code implements to start after 200K frames and stops after 4M frames.

My observation is based on lines 130-136 of the following file:

https://github.com/deepmind/dqn_zoo/blob/master/dqn_zoo/dqn/run_atari.py

Could you help explain this difference?

Thanks
Ziniu

Eating too much host memory

When I run the your code, specifically Rainbow Agent on Atari 100k, it consumes too much memory. One running need about 50GB memory per hour running. The below picture shows the memory overview when running two scripts.

Do you have any idea why it need such a big memory? Thanks for any advice.

How did you pick hyperparameter for the final run?

Do you choose the hyperparameters based of the different seeds from the ones you use for evaluation? I just want to understand how the hyperparameters were chosen and after choosing the hyperparameter if the algorithms were rerun.

Atari Result Summary Figure

Hi,

First, thanks for sharing the results.tar.gz.

I am interested in the summary figure provided. However, I cannot exactly reproduce the curves with the given CSV files. My implementation is as follows: for each algorithm (e.g., DQN)，

Compute the mean of normalized return over 5 random seeds for each environment.
Plot the solid line with the median and the shaded region with maximal and minimal values over 57 environments using the results from Step 1.

I can basically reproduce the solid lines but the shaded region is not expected.

Your clarification would be very helpful!

Thanks
Ziniu

Instructions for Reproducing Results in results.tar.gz?

The README helpfully contains instructions to launch a single agent (DQN) on a single Atari game (Pong). If I wanted to reproduce the results in results.tar.gz with all agents across all environments, what would I do? Could these instructions be added to the README?

Value 'sm_80' is not defined for option 'gpu-name'

I'm running your code with the docker built by run.sh and DockerFile. The GPU I use is Tesla A100, which has compute capability sm_80.

When I run the training code, I have the following error.

I0123 06:16:37.701375 139677848946496 run_atari.py:97] Rainbow on Atari on gpu.
2022-01-23 06:16:37.706684: W external/org_tensorflow/tensorflow/compiler/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:97] Unknown compute capability (8, 0) .Defaulting to telling LLVM that we're compiling for sm_75
2022-01-23 06:16:37.736490: F external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:419] ptxas returned an error during compilation of ptx to sass: 'Internal: ptxas exited with non-zero error code 65280, output: ptxas fatal   : Value 'sm_80' is not defined for option 'gpu-name'
'  If the error message indicates that a file could not be written, please verify that sufficient filesystem space is provided.
Fatal Python error: Aborted

Current thread 0x00007f094891c740 (most recent call first):
  File "/usr/local/lib/python3.6/dist-packages/jax/interpreters/xla.py", line 268 in xla_primitive_callable
  File "/usr/local/lib/python3.6/dist-packages/jax/interpreters/xla.py", line 228 in apply_primitive
  File "/usr/local/lib/python3.6/dist-packages/jax/core.py", line 273 in bind
  File "/usr/local/lib/python3.6/dist-packages/jax/lax/lax.py", line 342 in shift_right_logical
  File "/usr/local/lib/python3.6/dist-packages/jax/random.py", line 87 in PRNGKey
  File "/global_fs/dqn_zoo/dqn_zoo/rainbow/run_atari.py", line 100 in main
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 250 in _run_main
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 299 in run
  File "/global_fs/dqn_zoo/dqn_zoo/rainbow/run_atari.py", line 280 in <module>
  File "/usr/lib/python3.6/runpy.py", line 85 in _run_code
  File "/usr/lib/python3.6/runpy.py", line 193 in _run_module_as_main
Aborted

I guess the problem is raised by the imcompatibility bwtween A100 sm_80 and CUDA10.1. But I am only familiar with pytorch and completely new to jax and tensorflow. Can you tell me which package version in Dockerfile and docker_requirements.txt should be changed if I want to run your code on A100?

Thanks!

CUDA operation failed: device kernel image is invalid when running with gpu

System: Ubuntu 18.04
GPU: Geforce GTX 960M
NVIDIA drivers: 450.80.02

I get the following error when trying to run the run.sh script with gpu enabled:

./run.sh 

Successfully built b7ab61042bfc
Successfully tagged dqn_zoo:latest
Run DQN on GPU in a container named dqn_zoo_dqn
I1130 15:33:05.441911 139654470924096 run_atari.py:80] DQN on Atari on gpu.
I1130 15:33:07.016694 139654470924096 run_atari.py:103] Environment: pong
I1130 15:33:07.017373 139654470924096 run_atari.py:104] Action spec: DiscreteArray(shape=(), dtype=int32, name=action, minimum=0, maximum=5, num_values=6)
I1130 15:33:07.018801 139654470924096 run_atari.py:105] Observation spec: (Array(shape=(210, 160, 3), dtype=dtype('uint8'), name='rgb'), Array(shape=(), dtype=dtype('int32'), name='lives'))
Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/workspace/dqn_zoo/dqn/run_atari.py", line 255, in <module>
    app.run(main)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "/workspace/dqn_zoo/dqn/run_atari.py", line 172, in main
    train_rng_key, eval_rng_key = jax.random.split(rng_key)
  File "/usr/local/lib/python3.6/dist-packages/jax/random.py", line 267, in split
    return _split(key, num)
  File "/usr/local/lib/python3.6/dist-packages/jax/api.py", line 170, in f_jitted
    name=flat_fun.__name__, donated_invars=donated_invars)
  File "/usr/local/lib/python3.6/dist-packages/jax/core.py", line 1100, in call_bind
    outs = primitive.impl(fun, *args, **params)
  File "/usr/local/lib/python3.6/dist-packages/jax/interpreters/xla.py", line 544, in _xla_call_impl
    return compiled_fun(*args)
  File "/usr/local/lib/python3.6/dist-packages/jax/interpreters/xla.py", line 775, in _execute_compiled
    out_bufs = compiled.execute(input_bufs)
RuntimeError: CUDA operation failed: device kernel image is invalid
Removing /tmp/dqn_zoo_20201130_173250_8Zxwwh

Things I've tried:

confirmed that the preinstallation steps are working

docker run --gpus all --rm nvidia/cuda:10.1-base nvidia-smi

Mon Nov 30 15:37:27 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.80.02    Driver Version: 450.80.02    CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GTX 960M    Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   65C    P0    N/A /  N/A |    690MiB /  4043MiB |     13%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

attempted running the run script and also the RAINBOW agent without the docker container (to do this, I installed CUDA 10.1 and CuDNN 7.0 additionally to the other requirements)
and getting the same error

python3 dqn_zoo/rainbow/run_atari.py

I1130 16:52:19.897150 140224573081408 run_atari.py:97] Rainbow on Atari on gpu.
I1130 16:52:20.733672 140224573081408 run_atari.py:120] Environment: pong
I1130 16:52:20.734061 140224573081408 run_atari.py:121] Action spec: DiscreteArray(shape=(), dtype=int32, name=action, minimum=0, maximum=5, num_values=6)
I1130 16:52:20.734654 140224573081408 run_atari.py:122] Observation spec: (Array(shape=(210, 160, 3), dtype=dtype('uint8'), name='rgb'), Array(shape=(), dtype=dtype('int32'), name='lives'))
Traceback (most recent call last):
  File "dqn_zoo/rainbow/run_atari.py", line 277, in <module>
    app.run(main)
  File "/home/user/.local/lib/python3.6/site-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/home/user/.local/lib/python3.6/site-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "dqn_zoo/rainbow/run_atari.py", line 193, in main
    train_rng_key, eval_rng_key = jax.random.split(rng_key)
  File "/home/user/.local/lib/python3.6/site-packages/jax/random.py", line 267, in split
    return _split(key, num)
  File "/home/user/.local/lib/python3.6/site-packages/jax/api.py", line 170, in f_jitted
    name=flat_fun.__name__, donated_invars=donated_invars)
  File "/home/user/.local/lib/python3.6/site-packages/jax/core.py", line 1100, in call_bind
    outs = primitive.impl(fun, *args, **params)
  File "/home/user/.local/lib/python3.6/site-packages/jax/interpreters/xla.py", line 544, in _xla_call_impl
    return compiled_fun(*args)
  File "/home/user/.local/lib/python3.6/site-packages/jax/interpreters/xla.py", line 775, in _execute_compiled
    out_bufs = compiled.execute(input_bufs)
RuntimeError: CUDA operation failed: device kernel image is invalid

confirmed my NVIDIA drivers and CUDA version compatibility
ran successfully a code sample from CUDA's toolkit (deviceQuery utility sample) to confirm that the problem is not from there

Do you have any ideas of why this is happening or what I could try? Running on CPU works without a problem.

Is the AtariEnvironmentWrapper not being used?

Hi,
Is this function being used anywhere?

Docker: `RUN apt-get install -y python3-pip=20.0.2-5ubuntu1.6` fails

Currently, the Docker build of the script fails, because of Version '20.0.2-5ubuntu1.6' for 'python3-pip' was not found.

uild image with tag 'dqn_zoo:latest' and run tests
[+] Building 710.5s (9/33)                                                                                                                   docker:default
 => [internal] load .dockerignore                                                                                                                      0.0s
 => => transferring context: 2B                                                                                                                        0.0s
 => [internal] load build definition from Dockerfile                                                                                                   0.0s
 => => transferring dockerfile: 2.02kB                                                                                                                 0.0s
 => [internal] load metadata for docker.io/nvidia/cuda:11.1.1-cudnn8-devel-ubuntu20.04                                                                 1.5s
 => [auth] nvidia/cuda:pull token for registry-1.docker.io                                                                                             0.0s
 => [internal] load build context                                                                                                                      0.0s
 => => transferring context: 332.48kB                                                                                                                  0.0s
 => [ 1/28] FROM docker.io/nvidia/cuda:11.1.1-cudnn8-devel-ubuntu20.04@sha256:5b751d1720c635534bac0ffd4d77ddf21d5f1f814d57b72ff361f2700c94f712       669.4s
 => => resolve docker.io/nvidia/cuda:11.1.1-cudnn8-devel-ubuntu20.04@sha256:5b751d1720c635534bac0ffd4d77ddf21d5f1f814d57b72ff361f2700c94f712           0.0s
 => => sha256:5b751d1720c635534bac0ffd4d77ddf21d5f1f814d57b72ff361f2700c94f712 2.84kB / 2.84kB                                                         0.0s
 => => sha256:d694acee51c41c5404c7d37836b557b19efca2d05479e00ddd011865931da736 10.79MB / 10.79MB                                                       2.9s
 => => sha256:56e0351b98767487b3c411034be95479ed1710bb6be860db6df0be3a98653027 27.51MB / 27.51MB                                                       6.3s
 => => sha256:eceae4e0b416410cea1ffb811ddebd89c1884faaad412af5ec1788dbd5a19b16 7.94MB / 7.94MB                                                         3.6s
 => => sha256:be08d39904693fc307667f0726e1fd6a0b395b7d6944e14108843e01d5a89399 17.16kB / 17.16kB                                                       0.0s
 => => sha256:dd9eef69d4c16798a3fda6b9b4f9b8c22711d1e23dc7703c718a54511a628842 186B / 186B                                                             3.1s
 => => sha256:d2351b6b3529e61817fe5ccf29f1f072a9c5777c26541ef431f0d35bf662fadc 6.88kB / 6.88kB                                                         3.3s
 => => sha256:148309786e019674f606169af93cc6b1decc502616d79603df421ca35c6a8441 1.51GB / 1.51GB                                                       626.4s
 => => sha256:30431a42f9c525d046c48fced782d57744ee19f5b2a000e66220562fef90d27a 61.24kB / 61.24kB                                                       3.8s
 => => sha256:52512d7fe6e9910eabefc828412521131a955ed3b7414ffd3a1a866ce78a93eb 1.68kB / 1.68kB                                                         4.0s
 => => sha256:c562fd0c9fc50dd01a438b46afb955cb128e0a4de45b3c23ef3a8e8394018247 1.52kB / 1.52kB                                                         4.2s
 => => sha256:15b589b8315d98dc66061c4a23144a0e86d0b0d50e177459f42b5fbe31aaa7a1 1.60GB / 1.60GB                                                       606.8s
 => => extracting sha256:56e0351b98767487b3c411034be95479ed1710bb6be860db6df0be3a98653027                                                              0.4s
 => => sha256:4b439651229987f5111ebca8fae3aa7c3b7c7d0925e997c9aad87fa0f1bc7c44 84.08kB / 84.08kB                                                       6.7s
 => => sha256:40b57d883c72754db14f6ffcc6aee524808fa30cb36f4cc64519796eb1cddb6c 1.57GB / 1.57GB                                                       297.3s
 => => extracting sha256:eceae4e0b416410cea1ffb811ddebd89c1884faaad412af5ec1788dbd5a19b16                                                              0.1s
 => => extracting sha256:d694acee51c41c5404c7d37836b557b19efca2d05479e00ddd011865931da736                                                              0.1s
 => => extracting sha256:dd9eef69d4c16798a3fda6b9b4f9b8c22711d1e23dc7703c718a54511a628842                                                              0.0s
 => => extracting sha256:d2351b6b3529e61817fe5ccf29f1f072a9c5777c26541ef431f0d35bf662fadc                                                              0.0s
 => => extracting sha256:148309786e019674f606169af93cc6b1decc502616d79603df421ca35c6a8441                                                              8.1s
 => => extracting sha256:30431a42f9c525d046c48fced782d57744ee19f5b2a000e66220562fef90d27a                                                              0.0s
 => => extracting sha256:52512d7fe6e9910eabefc828412521131a955ed3b7414ffd3a1a866ce78a93eb                                                              0.0s
 => => extracting sha256:c562fd0c9fc50dd01a438b46afb955cb128e0a4de45b3c23ef3a8e8394018247                                                              0.0s
 => => extracting sha256:15b589b8315d98dc66061c4a23144a0e86d0b0d50e177459f42b5fbe31aaa7a1                                                             10.0s
 => => extracting sha256:4b439651229987f5111ebca8fae3aa7c3b7c7d0925e997c9aad87fa0f1bc7c44                                                              0.0s
 => => extracting sha256:40b57d883c72754db14f6ffcc6aee524808fa30cb36f4cc64519796eb1cddb6c                                                             15.2s
 => [ 2/28] RUN apt-get update                                                                                                                        32.1s
 => [ 3/28] RUN apt-get install -y python3.9=3.9.5-3ubuntu0~20.04.1                                                                                    6.1s 
 => ERROR [ 4/28] RUN apt-get install -y python3-pip=20.0.2-5ubuntu1.6                                                                                 1.2s 
------                                                                                                                                                      
 > [ 4/28] RUN apt-get install -y python3-pip=20.0.2-5ubuntu1.6:                                                                                            
0.459 Reading package lists...                                                                                                                              
0.990 Building dependency tree...                                                                                                                           
1.102 Reading state information...                                                                                                                          
1.114 E: Version '20.0.2-5ubuntu1.6' for 'python3-pip' was not found
------
Dockerfile:13
--------------------
  11 |     
  12 |     # Install pip.
  13 | >>> RUN apt-get install -y python3-pip=20.0.2-5ubuntu1.6
  14 |     RUN python3.9 -m pip install --upgrade pip==22.1.2
  15 |     
--------------------
ERROR: failed to solve: process "/bin/sh -c apt-get install -y python3-pip=20.0.2-5ubuntu1.6" did not complete successfully: exit code: 100

This "fixes" the error (at least for build the image, I did not test it, yet):

diff --git a/Dockerfile b/Dockerfile
index aff620f..deeb3b7 100644
--- a/Dockerfile
+++ b/Dockerfile
@@ -10,7 +10,7 @@ ARG DEBIAN_FRONTEND=noninteractive
 RUN apt-get install -y python3.9=3.9.5-3ubuntu0~20.04.1
 
 # Install pip.
-RUN apt-get install -y python3-pip=20.0.2-5ubuntu1.6
+RUN apt-get install -y python3-pip
 RUN python3.9 -m pip install --upgrade pip==22.1.2
 
 # Install wget and unrar to download and extract ROMs.

Expectile Regression Implementation?

The readme doesn't mention an implementation of expectile regression (Statistics and Samples in Distributional Reinforcement Learning - http://proceedings.mlr.press/v97/rowland19a). Is one in the works or will be soon?

Erroneous Flag in Suggested Code

The psuedocode


      python -m "dqn_zoo.${agent}.run_atari" \
          --environment_name="${game}" \
          --seed="${seed}" \
          --results="/tmp/dqn_zoo/${agent}/${game}/${seed}/results.csv"

uses a flag results which doesn't exist. I think results_csv_path was meant.

Specifying CSV Write Path Creates FileNotFoundError

When I specify the --results_csv_path="/tmp/dqn_zoo/c51/pong/1/results.csv", I receive an error:

[Errno 2] No such file or directory: '/tmp/dqn_zoo/c51/pong/1/results.csv'

My suspicion is that nothing creates the necessary directories /tmp/, /tmp/dqn_zoo/, etc.

Memory issues when running with Docker

Hello,

I tried to run QR QDN on Atari keeping your initial setup (1 million frame per iteration and 200 iterations). The memory usage increases linearly until the system kills the process.

I used the run.sh that launches a docker image.

Would you please help with that?

Thanks

Atari no frame skip, no sticky actions?

I'm a bit confused. Aren't these procedures usually incorporated in Atari? Looking at gym_atari briefly, it seems like they're not incorporated here, correct me if I'm wrong. Also, I don't see where the image frame is down-sampled. In other libraries, like rlpyt, they reduce the size of the image in half (in addition to the other steps).

Issues with the human scores in atari

Hi,

Thank you for this script which provides the random and human scores in a helpful format: https://github.com/google-deepmind/dqn_zoo/blob/master/dqn_zoo/atari_data.py

But, comparing the score of alien for example with that in the DQN nature paper's Extended Data Table 2 (https://www.nature.com/articles/nature14236) there seem to be discrepancies: the script says human score for alien is 7127.7 and the nature paper says 6875.

Have the scores been updated since the original paper? Could you please point me to where I can find these scores?

Thanks a lot :)

results.csv on disk is blank

When you run an agent-env-seed combination, either using the Docker image or straight Python, no results are written to disk in the results.csv output file. I let the code run overnight on my cluster and nothing was flushed to the CSV. This was the output to stdout:

I1007 15:59:09.217102 47498856853312 run_atari.py:85] C51 on Atari on gpu.
I1007 15:59:10.231613 47498856853312 run_atari.py:111] Environment: pong
I1007 15:59:10.231938 47498856853312 run_atari.py:112] Action spec: DiscreteArray(shape=(), dtype=int32, name=action, minimum=0, maximum=5, num_values=6)
I1007 15:59:10.232475 47498856853312 run_atari.py:113] Observation spec: (Array(shape=(210, 160, 3), dtype=dtype('uint8'), name='rgb'), Array(shape=(), dtype=dtype('int32'), name='lives'))
I1007 15:59:18.271111 47498856853312 run_atari.py:220] Training iteration 0.
I1007 15:59:18.274485 47498856853312 run_atari.py:226] Evaluation iteration 0.
I1007 16:04:18.767457 47498856853312 run_atari.py:251] iteration:   0, frame:     0, eval_episode_return: -21.00, train_episode_return:  nan, eval_num_episodes: 164, train_num_episodes:   0, eval_frame_rate: 1664, train_frame_rate:  nan, train_exploration_epsilon: 1.000, normalized_return: -0.008, capped_normalized_return: -0.008, human_gap: 1.008
I1007 16:04:19.123273 47498856853312 run_atari.py:220] Training iteration 1.
I1007 16:06:28.051372 47498856853312 agent.py:163] Begin learning
I1007 16:19:37.069391 47498856853312 run_atari.py:226] Evaluation iteration 1.
I1007 16:24:43.913471 47498856853312 run_atari.py:251] iteration:   1, frame: 1000000, eval_episode_return: -21.00, train_episode_return: -20.14, eval_num_episodes: 164, train_num_episodes: 266, eval_frame_rate: 1630, train_frame_rate: 1089, train_exploration_epsilon: 0.802, normalized_return: -0.008, capped_normalized_return: -0.008, human_gap: 1.008
I1007 16:24:44.240913 47498856853312 run_atari.py:220] Training iteration 2.
I1007 16:41:06.477482 47498856853312 run_atari.py:226] Evaluation iteration 2.
I1007 16:46:11.618932 47498856853312 run_atari.py:251] iteration:   2, frame: 2000000, eval_episode_return: -20.99, train_episode_return: -20.27, eval_num_episodes: 164, train_num_episodes: 274, eval_frame_rate: 1639, train_frame_rate: 1018, train_exploration_epsilon: 0.555, normalized_return: -0.008, capped_normalized_return: -0.008, human_gap: 1.008
I1007 16:46:11.946672 47498856853312 run_atari.py:220] Training iteration 3.
I1007 17:02:34.446135 47498856853312 run_atari.py:226] Evaluation iteration 3.
I1007 17:07:38.262575 47498856853312 run_atari.py:251] iteration:   3, frame: 3000000, eval_episode_return: -21.00, train_episode_return: -20.43, eval_num_episodes: 164, train_num_episodes: 282, eval_frame_rate: 1646, train_frame_rate: 1018, train_exploration_epsilon: 0.307, normalized_return: -0.008, capped_normalized_return: -0.008, human_gap: 1.008
I1007 17:07:38.589899 47498856853312 run_atari.py:220] Training iteration 4.
I1007 17:24:00.612326 47498856853312 run_atari.py:226] Evaluation iteration 4.
I1007 17:29:06.500515 47498856853312 run_atari.py:251] iteration:   4, frame: 4000000, eval_episode_return: -21.00, train_episode_return: -20.68, eval_num_episodes: 164, train_num_episodes: 296, eval_frame_rate: 1635, train_frame_rate: 1018, train_exploration_epsilon: 0.060, normalized_return: -0.008, capped_normalized_return: -0.008, human_gap: 1.008
I1007 17:29:06.830286 47498856853312 run_atari.py:220] Training iteration 5.
I1007 17:45:27.496435 47498856853312 run_atari.py:226] Evaluation iteration 5.
I1007 17:50:33.692603 47498856853312 run_atari.py:251] iteration:   5, frame: 5000000, eval_episode_return: -20.45, train_episode_return: -20.86, eval_num_episodes:  94, train_num_episodes: 307, eval_frame_rate: 1633, train_frame_rate: 1020, train_exploration_epsilon: 0.010, normalized_return: 0.007, capped_normalized_return: 0.007, human_gap: 0.993
I1007 17:50:34.021842 47498856853312 run_atari.py:220] Training iteration 6.
I1007 18:06:57.161453 47498856853312 run_atari.py:226] Evaluation iteration 6.

and so on. But the results.csv file was created and remains empty.

Hard coded git version no longer available

Running the Dockerfile as is (with the git version 1:2.17.1-1ubuntu0.7 hard-coded) led to an error saying the git version could not be found. Removing the version in my fork resolved this.

Make Docker Container Publicly Available?

I hate to ask this, but can you make the Docker container publicly available? I spent a few hours today trying to build it myself and I kept running into issues.

Running on CPU May Require GPU?

Continuing my comments from Issue 2, it appears that the requirements

jax==0.1.72
jaxlib @ https://storage.googleapis.com/jax-releases/cuda101/jaxlib-0.1.49-cp36-none-linux_x86_64.whl

means that a GPU is required to run the code even with the flag --jax_platform_name=cpu.

pip install -r requirements.txt 
Requirement 'jaxlib @ https://storage.googleapis.com/jax-releases/cuda101/jaxlib-0.1.49-cp36-none-linux_x86_64.whl' looks like a filename, but the file does not exist
Collecting asn1crypto==0.24.0 (from -r requirements.txt (line 2))
  Downloading https://files.pythonhosted.org/packages/ea/cd/35485615f45f30a510576f1a56d1e0a7ad7bd8ab5ed7cdc600ef7cd06222/asn1crypto-0.24.0-py2.py3-none-any.whl (101kB)
    100% |████████████████████████████████| 102kB 2.9MB/s 
Collecting cloudpickle==1.2.2 (from -r requirements.txt (line 3))
  Cache entry deserialization failed, entry ignored
  Downloading https://files.pythonhosted.org/packages/c1/49/334e279caa3231255725c8e860fa93e72083567625573421db8875846c14/cloudpickle-1.2.2-py2.py3-none-any.whl
Collecting cryptography==2.1.4 (from -r requirements.txt (line 4))
  Downloading https://files.pythonhosted.org/packages/4e/e0/4959b48f04c879414972048fe2bedc96825e39c5413ae241c230fba58783/cryptography-2.1.4-cp36-cp36m-manylinux1_x86_64.whl (2.2MB)
    100% |████████████████████████████████| 2.2MB 721kB/s 
Collecting future==0.18.2 (from -r requirements.txt (line 5))
  Cache entry deserialization failed, entry ignored
  Cache entry deserialization failed, entry ignored
  Downloading https://files.pythonhosted.org/packages/45/0b/38b06fd9b92dc2b68d58b75f900e97884c45bedd2ff83203d933cf5851c9/future-0.18.2.tar.gz (829kB)
    100% |████████████████████████████████| 829kB 1.6MB/s 
Collecting idna==2.6 (from -r requirements.txt (line 6))
  Cache entry deserialization failed, entry ignored
  Downloading https://files.pythonhosted.org/packages/27/cc/6dd9a3869f15c2edfab863b992838277279ce92663d334df9ecf5106f5c6/idna-2.6-py2.py3-none-any.whl (56kB)
    100% |████████████████████████████████| 61kB 5.9MB/s 
Collecting keyring==10.6.0 (from -r requirements.txt (line 7))
  Downloading https://files.pythonhosted.org/packages/b1/4a/89ab7aa2cf501a5e715c7bbb0df11af0c0b2b1d918cdfabca74984dd2c34/keyring-10.6.0-py2.py3-none-any.whl
Collecting keyrings.alt==3.0 (from -r requirements.txt (line 8))
  Downloading https://files.pythonhosted.org/packages/33/96/a2bc650d259ad510689656c50c06ae2de1299df59d369f21ec5ac0f20c8f/keyrings.alt-3.0-py2.py3-none-any.whl
Collecting opencv-python==4.2.0.34 (from -r requirements.txt (line 9))
  Cache entry deserialization failed, entry ignored
  Using cached https://files.pythonhosted.org/packages/72/c2/e9cf54ae5b1102020ef895866a67cb2e1aef72f16dd1fde5b5fb1495ad9c/opencv_python-4.2.0.34-cp36-cp36m-manylinux1_x86_64.whl
Collecting opt-einsum==3.2.1 (from -r requirements.txt (line 10))
  Cache entry deserialization failed, entry ignored
  Cache entry deserialization failed, entry ignored
  Downloading https://files.pythonhosted.org/packages/63/a5/e6c07b08b934831ccb8c98ee335e66b7761c5754ee3cabfe4c11d0b1af28/opt_einsum-3.2.1-py3-none-any.whl (63kB)
    100% |████████████████████████████████| 71kB 5.8MB/s 
Collecting pycrypto==2.6.1 (from -r requirements.txt (line 11))
  Downloading https://files.pythonhosted.org/packages/60/db/645aa9af249f059cc3a368b118de33889219e0362141e75d4eaf6f80f163/pycrypto-2.6.1.tar.gz (446kB)
    100% |████████████████████████████████| 450kB 2.5MB/s 
Collecting pyglet==1.3.2 (from -r requirements.txt (line 12))
  Cache entry deserialization failed, entry ignored
  Cache entry deserialization failed, entry ignored
  Downloading https://files.pythonhosted.org/packages/1c/fc/dad5eaaab68f0c21e2f906a94ddb98175662cc5a654eee404d59554ce0fa/pyglet-1.3.2-py2.py3-none-any.whl (1.0MB)
    100% |████████████████████████████████| 1.0MB 1.4MB/s 
Collecting pygobject==3.26.1 (from -r requirements.txt (line 13))
  Could not find a version that satisfies the requirement pygobject==3.26.1 (from -r requirements.txt (line 13)) (from versions: 3.27.0, 3.27.1, 3.27.2, 3.27.3, 3.27.4, 3.27.5, 3.28.0, 3.28.1, 3.28.2, 3.28.3, 3.29.1.dev0, 3.29.2.dev0, 3.29.3.dev0, 3.30.0, 3.30.1, 3.30.2, 3.30.3, 3.30.4, 3.30.5, 3.31.1.dev0, 3.31.2.dev0, 3.31.3.dev0, 3.31.4.dev0, 3.32.0, 3.32.1, 3.32.2, 3.33.1.dev0, 3.34.0, 3.36.0, 3.36.1, 3.38.0)
No matching distribution found for pygobject==3.26.1 (from -r requirements.txt (line 13))