Comments (6)
Thanks Paul! I'm not sure I understand your question, but let me clarify a bit what we mean by the training and evaluation loops and what the numbers you're seeing mean.
In the training loop, the agent reads data batches from the RWRL dataset and performs D4PG learning steps on them. Every step corresponds to a batch of data.
In the evaluation loop, the agent is kept fixed and interacts with the RWRL environment for a few episodes. We report the episode return to estimate the agent's performance.
What you typically want to do in offline RL is to interleave training and evaluation, so you keep learning more and more from the data and evaluate periodically to estimate learning progress.
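The interleaved loop described above can be sketched as follows. This is a minimal sketch, not the notebook's actual API: `learner_step` and `evaluate` are hypothetical stand-ins for the D4PG learner update and the RWRL environment rollout.

```python
def learner_step(state, batch):
    # One learning step on a batch of offline data (stubbed as a counter).
    return state + 1

def evaluate(state, num_episodes=5):
    # Interact with the environment for a few episodes with the agent kept
    # fixed, and report the (stubbed) average episode return.
    return float(state)

def train(num_steps=100, eval_every=25):
    state = 0
    eval_returns = []
    for step in range(1, num_steps + 1):
        batch = None  # in the notebook, a batch read from the RWRL dataset
        state = learner_step(state, batch)
        if step % eval_every == 0:  # evaluate periodically
            eval_returns.append(evaluate(state))
    return eval_returns
```

The key point is the shape of the loop: every step consumes a batch from the fixed dataset, and evaluation runs only every `eval_every` steps so you can track learning progress without interacting with the environment during training.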
I hope that helps!
from deepmind-research.
Thanks Paul! I'm not sure I understand your question,
My claim is: what is checked in under https://github.com/deepmind/deepmind-research/blob/master/rl_unplugged/rwrl_d4pg.ipynb has cell output which is quite different than what is displayed when the notebook is downloaded and executed.
Oh, that is expected. The weights in the neural network are initialized randomly and the data is also randomly shuffled, so the loss values will be different every time you run this unless you fix the seed for the TF random number generator. Since weights are different, the actions during evaluation will also be different and episode returns will change too.
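In TensorFlow the seed is fixed with `tf.random.set_seed(...)` (plus a seed wherever the dataset is shuffled). Here is a stdlib-only sketch of the principle, where `init_weights` is a hypothetical stand-in for the network initializer:

```python
import random

def init_weights(seed=None):
    # Draw a small weight vector; with a fixed seed the draw is reproducible.
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(3)]

a = init_weights(seed=42)
b = init_weights(seed=42)  # same seed -> identical weights
c = init_weights()         # no seed -> different weights on every run
```

Unseeded runs start from different weights, so losses and episode returns diverge from the published cell output; seeded runs reproduce each other exactly.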
Can you set and publish the seed(s) in the notebook?
I'm having trouble getting any episode return close to what is published in the notebook (was it a "lucky run"?).
Unfortunately, releasing the seed for this specific notebook result may require more effort than it's worth. As Sergio pointed out, it is entirely possible to have a good or a bad run depending on the random seed, and the notebook result may indeed be a lucky run. In our experiments we observed that D4PG runs tend to have high variance, making the method less robust. The purpose of this colab is to be a starting point and to show D4PG as a baseline -- neither a ~70 episode return nor a ~136 episode return is a high bar, and both should be easy to beat.
In the paper we average the results of three different runs, and I would suggest following a similar protocol to avoid "lucky runs" affecting your experiment results.
First, let me emphasize my appreciation for providing a benchmark to the community. The great thing about the notebook is that, once you reduce any policy (however obtained) to an acme.FeedForwardActor(), evaluation is straightforward. So I'm not actually blocked per se, and I have multiple tasks and difficulty levels to play with. However, I'm cautious because I can't reproduce the baselines in the publication (indeed, I have to read them off a figure in the reference paper; as far as I know there is no detailed data I could use to merge my results into a new plot).
Furthermore:
I would suggest following a similar protocol to avoid "lucky runs" affecting your experiment results.
The point of a benchmark is reliable replication for the purpose of scientific comparison. So I would suggest that an even better benchmark would 1) actually reduce such proposed procedures to code, and 2) provide a notebook, coupled with the reference paper, that produces the baseline results and prescribes the comparison procedure.
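As a minimal sketch of what reducing that protocol to code could look like (with `run_experiment` a hypothetical stand-in for "train on the dataset with this seed, then evaluate", and the toy returns below invented for illustration):

```python
import statistics

def benchmark(run_experiment, seeds=(0, 1, 2)):
    # Run the full train-then-evaluate pipeline once per seed and report
    # the mean and standard deviation of the final episode returns.
    returns = [run_experiment(seed) for seed in seeds]
    return statistics.mean(returns), statistics.stdev(returns)

# Toy stand-in: pretend each seed yields a slightly different return.
mean_return, std_return = benchmark(lambda seed: 100.0 + seed)
```

Reporting mean and standard deviation across seeds, rather than a single run, is exactly what guards a comparison against lucky or unlucky seeds.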
In any event, thanks again, I'll close the issue now.