Comments (5)
Thanks for pointing that out. It should be:
n_return[:, -1] = r[:, -1] * mask[:, -1]
for transition_idx in range(max_episode_len - 2, -1, -1):
    n_return[:, transition_idx] = (r[:, transition_idx] + self.args.gamma * n_return[:, transition_idx + 1] * terminated[:, transition_idx]) * mask[:, transition_idx]
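I'll fix this error in the next update.
As for terminated and padding: as you said, they differ by exactly one step, because the last step needs special handling. You could implement this with terminated alone. I store padding directly so that it does not have to be recomputed from terminated on every training run; episode lengths vary, so it cannot be obtained directly with a matrix operation.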
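To make the one-step offset between terminated and padding concrete, here is a minimal sketch of building both flags once, when an episode is stored. The helper `build_flags` and its arguments are hypothetical and not part of the repository, and it assumes terminated is filled with 1 on the filler steps.

```python
import numpy as np

def build_flags(episode_len, max_episode_len):
    """Hypothetical helper: per-step flags for one stored episode."""
    steps = np.arange(max_episode_len)
    # Assumption: terminated is 1 at the last real step and on all filler steps.
    terminated = (steps >= episode_len - 1).astype(np.float32)
    # padded is 1 only on the filler steps that follow the last real step.
    padded = (steps >= episode_len).astype(np.float32)
    return terminated, padded

# An episode of 3 real steps stored in a buffer of length 5.
terminated, padded = build_flags(episode_len=3, max_episode_len=5)
mask = 1.0 - padded
```

With these inputs, padded is terminated delayed by one step, which is the offset discussed above.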
from marl-algorithms.
Thanks for the reply.
Thinking about it further, though, I suspect the loop doesn't actually need the * terminated[:, transition_idx] factor either:
n_return[:, -1] = r[:, -1] * mask[:, -1]
for transition_idx in range(max_episode_len - 2, -1, -1):
    n_return[:, transition_idx] = (r[:, transition_idx] + self.args.gamma * n_return[:, transition_idx + 1]) * mask[:, transition_idx]
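That is enough: because of mask, n_return[:, transition_idx + 1] after the boundary (terminated) is already 0, so the value of the true last step (the last non-padded step) comes only from r[:, transition_idx].
Of course, keeping * terminated[:, transition_idx] is not wrong either.
Is this understanding correct?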
from marl-algorithms.
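You can drop mask, but terminated is required. When transition_idx is the last step of the episode, terminated=0 and mask=1; the next step's return has to be cancelled at that point, and only terminated can do that. mask is just a second safeguard; terminated alone is sufficient.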
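A minimal NumPy sketch of the terminated-only variant described here. It assumes arrays of shape (batch, max_episode_len), padded rewards equal to 0, and a `not_terminated` array that is already inverted (0 at the last real step and afterwards). The function name and the sample values are illustrative, not taken from the repository.

```python
import numpy as np

def returns_terminated_only(r, not_terminated, gamma):
    """Discounted returns computed backwards, without any padding mask."""
    n_return = np.zeros_like(r)
    n_return[:, -1] = r[:, -1]
    for t in range(r.shape[1] - 2, -1, -1):
        # not_terminated[:, t] == 0 cuts off everything after the last real step.
        n_return[:, t] = r[:, t] + gamma * n_return[:, t + 1] * not_terminated[:, t]
    return n_return

# One episode with 3 real steps in a buffer of length 5 (illustrative values).
r = np.array([[1.0, 1.0, 1.0, 0.0, 0.0]])
not_terminated = np.array([[1.0, 1.0, 0.0, 0.0, 0.0]])
returns = returns_terminated_only(r, not_terminated, gamma=0.9)
```

In this sketch the return at the last real step (index 2) equals its own reward, because the factor not_terminated[:, 2] = 0 removes the bootstrapped term.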
from marl-algorithms.
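When transition_idx is the last step of the episode, terminated=0 and mask=1.
That's true.
But it's a bit tricky here.
Say max length = 5 and the episode ends at step 3.
Then terminated: 00111
padded: 00011
mask = 1 - padded: 11100
inverted terminated: 11000
So at the last step, terminated=0 and mask=1. My understanding is that even though mask is 1 here, the next step's return n_return[:, transition_idx + 1] belongs to a padded step (the padded r is 0 and the next step's mask is 0), so the loop has already set it to 0 before it reaches the last step.
In other words, the return of the step after the last step is already 0, so it doesn't matter whether it is explicitly cancelled?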
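The claim can be checked numerically. The sketch below runs the loop with and without the terminated factor on a batch of two episodes, one ending at step 3 and one filling all 5 steps. The function names, reward values, and gamma are made up for illustration, and padded rewards are assumed to be 0.

```python
import numpy as np

def returns_with_terminated(r, not_terminated, mask, gamma):
    n_return = np.zeros_like(r)
    n_return[:, -1] = r[:, -1] * mask[:, -1]
    for t in range(r.shape[1] - 2, -1, -1):
        n_return[:, t] = (r[:, t] + gamma * n_return[:, t + 1] * not_terminated[:, t]) * mask[:, t]
    return n_return

def returns_mask_only(r, mask, gamma):
    n_return = np.zeros_like(r)
    n_return[:, -1] = r[:, -1] * mask[:, -1]
    for t in range(r.shape[1] - 2, -1, -1):
        # Padded steps were already zeroed by mask on earlier loop iterations.
        n_return[:, t] = (r[:, t] + gamma * n_return[:, t + 1]) * mask[:, t]
    return n_return

# Row 0: episode ends at step 3. Row 1: episode uses all 5 steps.
terminated = np.array([[0, 0, 1, 1, 1], [0, 0, 0, 0, 1]], dtype=float)
padded = np.array([[0, 0, 0, 1, 1], [0, 0, 0, 0, 0]], dtype=float)
r = np.array([[1, 1, 1, 0, 0], [1, 1, 1, 1, 1]], dtype=float)
mask = 1.0 - padded
not_terminated = 1.0 - terminated

both = returns_with_terminated(r, not_terminated, mask, gamma=0.9)
mask_only = returns_mask_only(r, mask, gamma=0.9)
same = np.allclose(both, mask_only)
```

For this data the two results agree, since the return of every padded step is already 0 when the loop reaches the last real step.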
from marl-algorithms.
Yes, that's a fair way to look at it.
from marl-algorithms.