Comments (10)
The code averages rewards over timesteps (25 steps per episode in multi_speaker_listener), whereas the paper reports the sum over the episode. So you need to multiply the rewards from the code by the number of timesteps to compare them with the results in the paper (e.g. a per-step average of 5 corresponds to 5 * 25 = 125).
If your runs are reaching that level of rewards (around 6), then the listeners should be consistently reaching their targets. How are you checking this?
from maac.
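To make the conversion described above concrete, here is a minimal sketch. The function names are hypothetical and not part of the repository; the only facts taken from the thread are the 25-step episode length and that the code reports a per-timestep average. The value 5.0 is purely illustrative.

```python
EPISODE_LENGTH = 25  # timesteps per episode in multi_speaker_listener (from the thread)


def episode_return_from_mean(mean_step_reward, episode_length=EPISODE_LENGTH):
    """Convert a per-timestep average reward into a summed episode return."""
    return mean_step_reward * episode_length


def mean_from_episode_return(episode_return, episode_length=EPISODE_LENGTH):
    """Convert a summed episode return back into a per-timestep average."""
    return episode_return / episode_length


if __name__ == "__main__":
    # Illustrative value only: a logged average of 5.0 per step.
    print(episode_return_from_mean(5.0))    # 125.0
    print(mean_from_episode_return(125.0))  # 5.0
```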
Thanks for answering. Yes, I visualize the rendered image after training. Does the PyTorch/OpenAI baselines/OpenAI Gym version influence the performance?
The fact that your runs are achieving that level of rewards indicates that they are training properly. Without more information I can't be sure what's wrong. Are you loading the parameters of the trained model before visualization? Can you share some examples of what the rendered images look like? Also, it would be useful to see the code you're using to visualize the policies.
The vis code is similar to https://github.com/shariqiqbal2810/maddpg-pytorch/blob/master/evaluate.py. The generated results are.
You should check whether the rollouts in evaluate.py are leading to the same amount of rewards that you see at the end of training. It's pretty clear that this is not the case here, which indicates there is a problem with how you are loading the parameters or something else along those lines.
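One way to run the check suggested above is to measure the per-timestep average reward of the evaluation rollouts and compare it with the value logged at the end of training. The sketch below is not the repository's evaluate.py: `toy_env_step`, `loaded_policy`, and `unloaded_policy` are hypothetical stand-ins for the real environment and policies, and the 20% tolerance is an arbitrary assumption.

```python
from statistics import mean


def mean_step_reward(env_step, policy, n_episodes=10, episode_length=25):
    """Average reward per timestep over several evaluation rollouts."""
    rewards = []
    for _ in range(n_episodes):
        for t in range(episode_length):
            action = policy(t)
            rewards.append(env_step(action))
    return mean(rewards)


def matches_training(eval_reward, train_reward, rel_tol=0.2):
    """True if the evaluation reward is within rel_tol of the training reward."""
    return abs(eval_reward - train_reward) <= rel_tol * abs(train_reward)


# Hypothetical stand-ins, for illustration only.
def toy_env_step(action):
    # Pretend that action 1 is the behaviour learned during training.
    return 6.0 if action == 1 else 0.0


def loaded_policy(t):
    return 1  # behaves like the trained model


def unloaded_policy(t):
    return 0  # behaves like a model whose parameters were never loaded


if __name__ == "__main__":
    train_reward = 6.0  # per-step average seen at the end of training
    for name, policy in [("loaded", loaded_policy), ("unloaded", unloaded_policy)]:
        eval_reward = mean_step_reward(toy_env_step, policy)
        print(name, eval_reward, matches_training(eval_reward, train_reward))
```

If the evaluation average falls far below the training average, as it does for the second stand-in policy, the parameters used during visualization are probably not the trained ones.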
test.zip
This is the code I used to visualize.
Sorry, I don't see anything that stands out as problematic in that code. Since you are getting good results during training, I would recommend trying to match the code within the training procedure as closely as possible and figuring out where the difference is.
I see, so this is different from your testing results, right? Maybe there are some bugs in my code.
Yes, I was able to visualize successful trials where the listeners reach their targets, so I'm not exactly sure what's going wrong here. Good luck! I will close this issue for now, but feel free to comment if you have any other questions.
Thanks a lot for your help!