Comments (3)
Thanks for your questions. You're right that learning a policy in imagination always comes with bias from using the model rather than the actual environment. Reinforce is unbiased for the imagination MDP, not the actual environment. The return estimate (e.g. TD-lambda, Monte Carlo returns) is independent of the gradient estimator (e.g. Reinforce, straight-through). DreamerV2 uses TD-lambda returns and optimizes them via Reinforce. I'm sorry but I don't have time to discuss open research questions or hypotheses in detail here.
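To make the separation between the return estimate and the gradient estimator concrete, here is a minimal sketch of a TD-lambda return computed as a plain target. The function name `lambda_returns`, the argument layout, and the default `discount` and `lam` values are assumptions for illustration, not code taken from the repository. Either gradient estimator could then be applied to the resulting targets.

```python
def lambda_returns(rewards, next_values, discount=0.99, lam=0.95):
    """Compute TD-lambda returns for one imagined trajectory.

    rewards[t] is the reward at step t, and next_values[t] is the value
    estimate of the state that follows step t (hypothetical layout).
    """
    returns = [0.0] * len(rewards)
    # Bootstrap from the value estimate of the final state.
    next_return = next_values[-1]
    for t in reversed(range(len(rewards))):
        # Blend the one-step value estimate with the longer lambda-return.
        blended = (1.0 - lam) * next_values[t] + lam * next_return
        returns[t] = rewards[t] + discount * blended
        next_return = returns[t]
    return returns
```

With `lam=0` this reduces to one-step TD targets, and with `lam=1` it becomes a Monte Carlo return that bootstraps only from the last value estimate.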
Regarding your second question, I'm not very confident that this is actually what's going on, but the idea is that the chain rule goes `grad_params objective = grad_sample objective * grad_suffstats sample * grad_params suffstats`. For Gaussian latents, the `grad_suffstats sample` term would be `grad_{mean,stddev} (stddev * epsilon + mean)`. For categorical variables, we can't compute that term, which is why reparameterization isn't possible. Straight-through gradients just ignore the term. Because the term would be either smaller or larger than 1, it would scale the gradient and potentially contribute to vanishing or exploding gradients.
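The middle term of that chain rule can be written out by hand in a small sketch. The toy objectives (`sample**2` for the Gaussian case, a weighted sum of the one-hot sample for the categorical case) and both function names are assumptions chosen for illustration only. The straight-through function simply passes the gradient with respect to the sample on to the probabilities, which is the "ignore the term" behavior described above.

```python
import random

def gaussian_grads(mean, stddev, eps):
    """Chain rule for objective = sample**2 with sample = stddev * eps + mean."""
    sample = stddev * eps + mean
    grad_sample = 2.0 * sample          # grad_sample objective
    grad_mean = grad_sample * 1.0       # d sample / d mean = 1
    grad_stddev = grad_sample * eps     # d sample / d stddev = eps
    return sample, grad_mean, grad_stddev

def straight_through_grads(probs, weights, rng):
    """Toy objective = sum(weights[i] * onehot[i]) for a categorical sample."""
    index = rng.choices(range(len(probs)), weights=probs)[0]
    onehot = [1.0 if i == index else 0.0 for i in range(len(probs))]
    grad_sample = list(weights)         # grad_sample objective
    # The sampling step has no usable derivative, so the middle term is
    # treated as the identity and the gradient is passed through unchanged.
    grad_probs = list(grad_sample)
    return onehot, grad_probs
```

In the Gaussian case the `eps` factor rescales the gradient for `stddev`, while the straight-through case applies no such scaling.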
from dreamerv2.
From the paper:
Reinforce gradients and straight-through gradients, which backpropagate directly through the learned dynamics. Intuitively, the low-variance but biased dynamics backpropagation could learn faster initially and the unbiased but high-variance could converge to a better solution.
Is the bias (with respect to the imagined MDP) a property of straight-through gradients? So would reparameterization such as in DreamerV1 also be unbiased (with respect to the imagined MDP)?
Best regards,
Tim
Yes, that's right. But reparameterization only works for continuous latent variables and actions, not for categoricals.
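For a continuous variable, the claim that both estimators are unbiased for the same objective can be checked numerically. The sketch below assumes a toy objective `f(x) = x**2` with `x ~ Normal(mean, stddev)`, for which the true gradient with respect to `mean` is `2 * mean`. The function name and default arguments are hypothetical and unrelated to the DreamerV2 code.

```python
import random

def estimate_mean_gradients(mean=1.0, stddev=0.5, num_samples=200_000, seed=0):
    """Monte Carlo estimates of d/d mean E[x**2] using two estimators."""
    rng = random.Random(seed)
    sums = {"reparam": 0.0, "reinforce": 0.0}
    squares = {"reparam": 0.0, "reinforce": 0.0}
    for _ in range(num_samples):
        eps = rng.gauss(0.0, 1.0)
        x = mean + stddev * eps
        estimates = {
            # Reparameterization: df/dx * dx/d mean, with dx/d mean = 1.
            "reparam": 2.0 * x,
            # Reinforce: f(x) * d log p(x) / d mean.
            "reinforce": (x ** 2) * (x - mean) / stddev ** 2,
        }
        for name, value in estimates.items():
            sums[name] += value
            squares[name] += value ** 2
    result = {}
    for name in sums:
        avg = sums[name] / num_samples
        # Empirical variance of the single-sample estimator.
        result[name] = (avg, squares[name] / num_samples - avg ** 2)
    return result
```

Calling `estimate_mean_gradients()` returns a `(mean estimate, variance)` pair per estimator. Both averages should land close to `2 * mean`, with the Reinforce estimator showing the larger per-sample variance.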