coac / commnet-bicnet


CommNet and BiCNet implementation in TensorFlow

Python 100.00%

multi-agent-reinforcement-learning reinforcement-learning tensorflow


commnet-bicnet's Issues

Commnet on waterworld

I'd like to use CommNet on Waterworld, since it seems like a good fit for agents that need to communicate to earn a reward. Why did it fail to converge in the end, and why does it perform worse than MADDPG?

variable sharability among critic and actor

Thanks for the reply. I have been busy with another project for the last few days and only recently found some spare time.
I have noticed that in comm_net, the variables of the communication part (and maybe the encoder part as well) are not shared between the critic and the actor.
Is this how it should be in algorithms like comm_net that are trained with DDPG?

In the file named 'comm_net.py'

At line 104, the code is 'h = tf.slice(H, [j, 0], [1, HIDDEN_VECTOR_LEN])'.
Shouldn't it be 'h = tf.slice(H, [0, j, 0], [-1, 1, HIDDEN_VECTOR_LEN])'?
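
Which of the two calls is appropriate depends on the rank of H, which the issue does not state. A minimal sketch of the difference, using NumPy and a hypothetical helper slice_like_tf that mimics the begin/size convention of tf.slice (a size of -1 means "all remaining elements in that dimension"). The shapes, n_agents, batch, and the value of HIDDEN_VECTOR_LEN are assumptions for illustration, not taken from the repository:

```python
import numpy as np

HIDDEN_VECTOR_LEN = 4  # hypothetical value, only for this illustration


def slice_like_tf(x, begin, size):
    """Hypothetical helper mimicking tf.slice(x, begin, size) on a NumPy array.

    A size of -1 selects everything from `begin` to the end of that dimension.
    """
    index = tuple(
        slice(b, None if s == -1 else b + s) for b, s in zip(begin, size)
    )
    return x[index]


n_agents, batch, j = 3, 5, 1  # assumed sizes and agent index

# Case 1: H has shape (n_agents, hidden), with no batch dimension.
H_2d = np.arange(n_agents * HIDDEN_VECTOR_LEN).reshape(n_agents, HIDDEN_VECTOR_LEN)
h_2d = slice_like_tf(H_2d, [j, 0], [1, HIDDEN_VECTOR_LEN])

# Case 2: H has shape (batch, n_agents, hidden), with a leading batch dimension.
H_3d = np.arange(batch * n_agents * HIDDEN_VECTOR_LEN).reshape(
    batch, n_agents, HIDDEN_VECTOR_LEN
)
h_3d = slice_like_tf(H_3d, [0, j, 0], [-1, 1, HIDDEN_VECTOR_LEN])

print(h_2d.shape)  # (1, 4): hidden vector of agent j
print(h_3d.shape)  # (5, 1, 4): hidden vector of agent j for every batch element
```

In TensorFlow itself, begin and size must have one entry per dimension of the input, so the two-element form only applies if H is a rank-2 tensor, and the three-element form only if H is rank 3.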

Actor loss function

Hi Coac, I really like your BiCNet implementation! My goal is to run it on an environment where every agent receives a reward of -1 for each time step it needs to finish the episode. But I see a problem with your actor loss implementation: since the actor loss is defined as the negated prediction of the critic, doesn't the reward need to converge to zero when the agents perform perfectly?

loss_actor = -self.critic(state_batches, clear_action_batches).mean()

Can you explain why you implemented it this way? Also, is it possible that the reward does not converge to 0 even when the agents perform well (as in the environment I mentioned above)?
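
For context on the question above: in DDPG-style training, the actor is usually updated by minimizing the negated critic estimate, so the loss value is only a surrogate whose gradient matters, and it is not expected to reach zero. A minimal sketch in plain Python, assuming an idealized critic that returns the true discounted return and a hypothetical discount factor GAMMA; the episode lengths are made up for illustration:

```python
def discounted_return(rewards, gamma):
    """Sum of rewards discounted by gamma, computed from the last step backwards."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


GAMMA = 0.9  # hypothetical discount factor

# Reward of -1 per step until the episode ends.
q_fast = discounted_return([-1.0] * 3, GAMMA)   # policy that finishes in 3 steps
q_slow = discounted_return([-1.0] * 10, GAMMA)  # policy that finishes in 10 steps

# Actor loss in the style of: loss_actor = -critic(state, action).mean()
loss_fast = -q_fast
loss_slow = -q_slow

print(loss_fast, loss_slow)
```

Under these assumptions both losses are positive and neither is zero, but the faster policy has the lower loss, so minimizing the loss still pushes the actor toward finishing sooner.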
