
Comments (6)

sergomezcol avatar sergomezcol commented on July 28, 2024

Thanks Paul! I'm not sure I understand your question, but let me clarify what we mean by the training and evaluation loops and what the numbers you're seeing mean.

In the training loop, the agent reads data batches from the RWRL dataset and performs D4PG learning steps on them. Every step corresponds to a batch of data.

In the evaluation loop, the agent is kept fixed and interacts with the RWRL environment for a few episodes. We report the episode return to estimate the agent's performance.

What you typically want to do in offline RL is to interleave training and evaluation: you keep learning from the data and periodically evaluate to estimate learning progress.
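In pseudocode, the interleaving looks something like this (a minimal sketch; `learner` and `evaluate_policy` are hypothetical stand-ins for the notebook's D4PG learner and evaluation loop, not its actual API):

```python
# Sketch of interleaved offline training and periodic evaluation.
num_learner_steps = 100_000  # hypothetical training budget
eval_every = 1_000

for step in range(1, num_learner_steps + 1):
    learner.step()  # one D4PG learning step on one batch of offline data
    if step % eval_every == 0:
        # Agent held fixed; interact with the environment for a few episodes.
        returns = evaluate_policy(learner, num_episodes=10)
        mean_return = sum(returns) / len(returns)
        print(f'step {step}: mean episode return = {mean_return:.1f}')
```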

I hope that helps!


pmineiro avatar pmineiro commented on July 28, 2024

Thanks Paul! I'm not sure I understand your question,

My claim is: the notebook checked in at https://github.com/deepmind/deepmind-research/blob/master/rl_unplugged/rwrl_d4pg.ipynb has cell output that is quite different from what is produced when the notebook is downloaded and executed.


sergomezcol avatar sergomezcol commented on July 28, 2024

Oh, that is expected. The weights of the neural network are initialized randomly and the data is also randomly shuffled, so the loss values will be different every time you run this unless you fix the seed for the TF random number generator. Since the weights differ, the actions during evaluation will also differ, and the episode returns will change too.
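For reference, fixing the seeds in TF2 looks something like the sketch below. Note that this alone may not make runs bit-for-bit identical (e.g. dataset shuffling takes its own seed, and some GPU ops are nondeterministic):

```python
import random
import numpy as np
import tensorflow as tf

SEED = 42
random.seed(SEED)         # Python's built-in RNG
np.random.seed(SEED)      # NumPy RNG
tf.random.set_seed(SEED)  # TF global RNG (TF2 API)

# tf.data shuffling has its own seed argument; without it the order of
# training batches still varies between runs:
# dataset = dataset.shuffle(buffer_size=10_000, seed=SEED)
```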


pmineiro avatar pmineiro commented on July 28, 2024

Can you set and publish the seed(s) in the notebook?

I'm having trouble getting any episode return close to what is published in the notebook (was it a "lucky run"?).


jerryli27 avatar jerryli27 commented on July 28, 2024

Unfortunately, recovering and releasing the seed for this specific notebook result would likely require more effort than it's worth. As Sergio pointed out, it is entirely possible to have a good or a bad run depending on the random seed, and the notebook result may indeed have been a lucky run. In our experiments we observed that D4PG runs tend to have high variance, which makes the method less robust. The purpose of this colab is to be a starting point and to show D4PG as a baseline -- neither a ~70 nor a ~136 episode return is a high bar, and both should be easy to beat.

In the paper we average the results of three different runs, and I would suggest following a similar protocol to keep "lucky runs" from skewing your experimental results.
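A sketch of that protocol, where `run_training_and_eval` is a hypothetical wrapper around the notebook's full training and evaluation pipeline:

```python
import numpy as np

def run_training_and_eval(seed: int) -> float:
    """Hypothetical wrapper: train D4PG from scratch with `seed` and
    return the final mean episode return from the evaluation loop."""
    raise NotImplementedError('plug in the notebook training/eval code here')

seeds = [0, 1, 2]
returns = [run_training_and_eval(s) for s in seeds]
print(f'episode return over {len(seeds)} runs: '
      f'{np.mean(returns):.1f} +/- {np.std(returns):.1f}')
```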


pmineiro avatar pmineiro commented on July 28, 2024

First, let me emphasize my appreciation for providing a benchmark to the community. The great thing about the notebook is that, once you reduce any policy (however obtained) to an acme.FeedForwardActor(), evaluation is straightforward. So I'm not actually blocked per se, and I have multiple tasks and difficulty levels to play with. However, I'm cautious because I can't reproduce the baselines in the publication (indeed, I have to read them off a figure in the reference paper; as far as I know there is no detailed data I could use to make a new plot with my results merged in).
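For concreteness, that evaluation path looks roughly like this (a sketch; the module paths are from the Acme version I'm using and may differ in yours, and `environment` and `policy_network` stand in for the notebook's environment and whatever policy you trained):

```python
import acme
from acme.agents.tf import actors

# `environment` is the RWRL environment built in the notebook;
# `policy_network` is any Sonnet module mapping observations to actions,
# however it was obtained.
actor = actors.FeedForwardActor(policy_network)
loop = acme.EnvironmentLoop(environment, actor)
loop.run(num_episodes=10)  # logs the return of each episode
```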

Furthermore:

I would suggest to follow similar protocols to avoid "lucky runs" affecting your experiment results.

The point of a benchmark is reliable replication for the purpose of scientific comparison. So I would suggest that an even better benchmark would 1) actually reduce such proposed procedures to code, and 2) provide a notebook, coupled with the reference paper, that produces the baseline results and prescribes the comparison procedure.

In any event, thanks again, I'll close the issue now.

