Comments (6)
Thanks Paul! I'm not sure I understand your question, but let me clarify a bit what we mean by the training and evaluation loops and what the numbers you're seeing mean.
In the training loop, the agent reads data batches from the RWRL dataset and performs D4PG learning steps on them. Every step corresponds to a batch of data.
In the evaluation loop, the agent is kept fixed and interacts with the RWRL environment for a few episodes. We report the episode return to estimate the agent's performance.
What you typically want to do in offline RL is to interleave training and evaluation, so you keep learning more and more from the data and evaluate periodically to estimate learning progress.
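The interleaved loop described above can be sketched as follows. This is a minimal sketch, not the notebook's actual API: `learner_step` and `evaluate` are hypothetical stand-ins for the D4PG learner update and the RWRL environment rollout.

```python
def learner_step(state, batch):
    # One learning step on a batch of offline data (stubbed as a counter).
    return state + 1

def evaluate(state, num_episodes=5):
    # Interact with the environment for a few episodes with the agent kept
    # fixed, and report the (stubbed) average episode return.
    return float(state)

def train(num_steps=100, eval_every=25):
    state = 0
    eval_returns = []
    for step in range(1, num_steps + 1):
        batch = None  # in the notebook, a batch read from the RWRL dataset
        state = learner_step(state, batch)
        if step % eval_every == 0:  # evaluate periodically
            eval_returns.append(evaluate(state))
    return eval_returns
```

The key point is the shape of the loop: every step consumes a batch from the fixed dataset, and evaluation runs only every `eval_every` steps so you can track learning progress without interacting with the environment during training.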
I hope that helps!
from deepmind-research.
Thanks Paul! I'm not sure I understand your question,
My claim is: what is checked in under https://github.com/deepmind/deepmind-research/blob/master/rl_unplugged/rwrl_d4pg.ipynb has cell output which is quite different than what is displayed when the notebook is downloaded and executed.
Oh, that is expected. The weights in the neural network are initialized randomly and the data is also randomly shuffled, so the loss values will be different every time you run this unless you fix the seed for the TF random number generator. Since weights are different, the actions during evaluation will also be different and episode returns will change too.
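In TensorFlow the seed is fixed with `tf.random.set_seed(...)` (plus a seed wherever the dataset is shuffled). Here is a stdlib-only sketch of the principle, where `init_weights` is a hypothetical stand-in for the network initializer:

```python
import random

def init_weights(seed=None):
    # Draw a small weight vector; with a fixed seed the draw is reproducible.
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(3)]

a = init_weights(seed=42)
b = init_weights(seed=42)  # same seed -> identical weights
c = init_weights()         # no seed -> different weights on every run
```

Unseeded runs start from different weights, so losses and episode returns diverge from the published cell output; seeded runs reproduce each other exactly.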
Can you set and publish the seed(s) in the notebook?
I'm having trouble getting any episode return close to what is published in the notebook (was it a "lucky run"?).
Unfortunately, releasing the seed for this specific notebook result may require more effort than it's worth. As Sergio pointed out, it is entirely possible to have a good or a bad run depending on the random seed, and the notebook result may indeed be a lucky run. In our experiments we observed that D4PG runs tend to have high variance, making the method less robust. The purpose of this colab is to be a starting point and to show D4PG as a baseline -- neither a ~70 episode return nor a ~136 episode return is a high bar, and both should be easy to beat.
In the paper we average the results of three different runs, and I would suggest following a similar protocol to avoid "lucky runs" affecting your experiment results.
First, let me emphasize my appreciation for providing a benchmark to the community. The great thing about the notebook is that, once you reduce any policy (however obtained) to an acme.FeedForwardActor(), evaluation is straightforward. So I'm not actually blocked per se, and I have multiple tasks and difficulty levels to play with. However, I'm cautious because I can't reproduce the baselines in the publication (indeed, I have to read them off a figure in the reference paper; as far as I know there is no detailed data I could use to merge my results into a new plot).
Furthermore:
I would suggest following a similar protocol to avoid "lucky runs" affecting your experiment results.
The point of a benchmark is reliable replication for the purpose of scientific comparison. So I would suggest that an even better benchmark would 1) actually reduce such proposed procedures to code, and 2) provide a notebook, coupled with the reference paper, that produces the baseline results and prescribes the comparison procedure.
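As a minimal sketch of what reducing that protocol to code could look like (with `run_experiment` a hypothetical stand-in for "train on the dataset with this seed, then evaluate", and the toy returns below invented for illustration):

```python
import statistics

def benchmark(run_experiment, seeds=(0, 1, 2)):
    # Run the full train-then-evaluate pipeline once per seed and report
    # the mean and standard deviation of the final episode returns.
    returns = [run_experiment(seed) for seed in seeds]
    return statistics.mean(returns), statistics.stdev(returns)

# Toy stand-in: pretend each seed yields a slightly different return.
mean_return, std_return = benchmark(lambda seed: 100.0 + seed)
```

Reporting mean and standard deviation across seeds, rather than a single run, is exactly what guards a comparison against lucky or unlucky seeds.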
In any event, thanks again, I'll close the issue now.