Why are the average rewards reported in the paper much higher than those from the code? it's ~6 af


question about reward about maac HOT 10 CLOSED

shariqiqbal2810 commented on June 29, 2024

question about reward

from maac.

Comments (10)

shariqiqbal2810 commented on June 29, 2024

The code averages rewards over timesteps (25 steps in multi_speaker_listener), and the paper does not. So you need to multiply the rewards in the code by the number of timesteps to compare them with the results in the paper (i.e. 6 * 25 = 150; conversely, an episode reward of 125 corresponds to a per-step average of 5).
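As a rough illustration of that conversion, here is a minimal sketch. The function names and the `EPISODE_LENGTH` constant are hypothetical helpers written for this example and are not part of the maac codebase; the only value taken from the thread is the 25-step episode length.

```python
# Assumption: 25 timesteps per episode, as stated for multi_speaker_listener.
EPISODE_LENGTH = 25


def per_step_to_episode(mean_step_reward: float,
                        episode_length: int = EPISODE_LENGTH) -> float:
    """Convert a reward averaged over timesteps into a per-episode total."""
    return mean_step_reward * episode_length


def episode_to_per_step(episode_reward: float,
                        episode_length: int = EPISODE_LENGTH) -> float:
    """Convert a per-episode total back into a per-timestep average."""
    return episode_reward / episode_length


if __name__ == "__main__":
    print(per_step_to_episode(6.0))    # 150.0
    print(episode_to_per_step(125.0))  # 5.0
```

Converting both numbers to the same scale before comparing them avoids mistaking a reporting difference for a training problem.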

If your runs are reaching that level of rewards (around 6), then the listeners should be consistently reaching their targets. How are you checking this?


ShuangLI59 commented on June 29, 2024

Thanks for answering. Yes, I visualize the rendered images after training. Could the versions of PyTorch, OpenAI Baselines, or OpenAI Gym affect the performance?


shariqiqbal2810 commented on June 29, 2024

The fact that your runs are achieving that level of rewards indicates that they are training properly. Without more information I can't be sure what's wrong. Are you loading the parameters of the trained model before visualization? Can you share some examples of what the rendered images look like? Also, it would be useful to see the code you're using to visualize the policies.


ShuangLI59 commented on June 29, 2024

The vis code is similar to https://github.com/shariqiqbal2810/maddpg-pytorch/blob/master/evaluate.py. The generated results are.


shariqiqbal2810 commented on June 29, 2024

You should check whether the rollouts in evaluate.py yield the same rewards that you see at the end of training. That is clearly not the case here, which suggests a problem with how you are loading the parameters, or something else along those lines.
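One way to run that check is sketched below, under stated assumptions. `run_episode` is a hypothetical callable that performs one evaluation rollout and returns the list of per-step rewards; it stands in for whatever your evaluation script does and is not a function from evaluate.py or maac. The 20% tolerance is an arbitrary illustrative choice.

```python
from statistics import mean
from typing import Callable, Sequence


def mean_step_reward(run_episode: Callable[[], Sequence[float]],
                     n_episodes: int = 10) -> float:
    """Average per-step reward over several evaluation rollouts."""
    episode_means = [mean(run_episode()) for _ in range(n_episodes)]
    return mean(episode_means)


def matches_training(eval_reward: float, train_reward: float,
                     rel_tol: float = 0.2) -> bool:
    """True if the evaluation reward is within rel_tol of the training reward."""
    return abs(eval_reward - train_reward) <= rel_tol * abs(train_reward)


if __name__ == "__main__":
    # Stand-in rollout: 25 steps with a constant reward of 6.0 (illustrative only).
    def fake_episode():
        return [6.0] * 25

    eval_reward = mean_step_reward(fake_episode, n_episodes=5)
    print(eval_reward, matches_training(eval_reward, train_reward=6.0))
```

If this comparison fails while training logs look healthy, the mismatch most likely lies in the evaluation path, for example in how the saved parameters are loaded.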


ShuangLI59 commented on June 29, 2024

test.zip
This is the code I used to visualize.


shariqiqbal2810 commented on June 29, 2024

Sorry, I don't see anything that stands out as problematic in that code. Since you are getting good results during training, I would recommend trying to match the code within the training procedure as closely as possible and figuring out where the difference is.


ShuangLI59 commented on June 29, 2024

I see, so this is different from your testing results, right? Maybe there are some bugs in my code.


shariqiqbal2810 commented on June 29, 2024

Yes, I was able to visualize successful trials where the listeners reach their targets, so I'm not exactly sure what's going wrong here. Good luck! I will close this issue for now, but feel free to comment if you have any other questions.


ShuangLI59 commented on June 29, 2024

Thanks a lot for your help!

