
coac / commnet-bicnet


CommNet and BiCnet implementation in TensorFlow


commnet-bicnet's Introduction

CommNet-BiCnet


Training

Train CommNet using the DDPG algorithm:

python train_comm_net.py

Hypersearch

To find the optimal hyperparameters such as actor_lr or critic_lr, a simple grid search has been implemented. It launches multiple instances of the trainer in parallel based on the number of CPU cores.

python hypersearch.py
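As a rough illustration of the idea, a parallel grid search can be sketched as follows. This is not the contents of hypersearch.py: build_grid, fake_train, grid_search, the candidate values, and the scoring rule are all hypothetical stand-ins, and only the names actor_lr and critic_lr come from the text above.

```python
import itertools
import os
from concurrent.futures import ProcessPoolExecutor


def build_grid(space):
    """Expand {"name": [values, ...]} into one dict per combination."""
    names = sorted(space)
    combos = itertools.product(*(space[name] for name in names))
    return [dict(zip(names, combo)) for combo in combos]


def fake_train(params):
    """Hypothetical stand-in for one training run; returns (params, score)."""
    # Placeholder score so the sketch runs without the real trainer.
    score = -abs(params["actor_lr"] - 1e-3) - abs(params["critic_lr"] - 1e-2)
    return params, score


def grid_search(train_fn, space, workers=None):
    """Run train_fn on every combination in a pool sized to the CPU count."""
    workers = workers or os.cpu_count() or 1
    with ProcessPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(train_fn, build_grid(space)))
    # Keep the combination with the highest score.
    return max(results, key=lambda item: item[1])


if __name__ == "__main__":
    # Assumed candidate values, chosen only for illustration.
    space = {"actor_lr": [1e-4, 1e-3], "critic_lr": [1e-3, 1e-2]}
    best_params, best_score = grid_search(fake_train, space)
    print(best_params, best_score)
```

In a real run, fake_train would be replaced by a function that trains one model with the given parameters and returns its final reward.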

Guessing sum environment

It is a simple game described in the BiCnet paper for testing whether communication works. The environment implements the core methods of the OpenAI Gym interface.

Each agent receives a scalar sampled between [−10, 10] under a truncated Gaussian. Each agent needs to output the sum of all inputs received among the agents. An agent gets a normalized reward between [0, 1] based on the absolute difference between the sum and its output.
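The rules above can be sketched as a small Gym-style environment. This is a minimal sketch, not the repository's environment: the class name GuessingSumEnv, the standard deviation sigma, the one-step episode, and the exact reward normalization (dividing by 20 times the number of agents and clamping at 0) are assumptions made for illustration.

```python
import random


class GuessingSumEnv:
    """Hypothetical sketch of the guessing sum game with reset() and step()."""

    LOW, HIGH = -10.0, 10.0

    def __init__(self, n_agents=2, sigma=5.0, seed=None):
        self.n_agents = n_agents
        self.sigma = sigma  # assumed spread of the Gaussian
        self.rng = random.Random(seed)
        self.observations = []

    def _sample(self):
        # Truncated Gaussian by rejection: redraw until the value is in range.
        while True:
            x = self.rng.gauss(0.0, self.sigma)
            if self.LOW <= x <= self.HIGH:
                return x

    def reset(self):
        self.observations = [self._sample() for _ in range(self.n_agents)]
        return list(self.observations)

    def step(self, actions):
        target = sum(self.observations)
        # Assumed normalizer: the largest gap when guesses stay in the
        # range of possible sums, [-10 * n_agents, 10 * n_agents].
        max_error = 2 * self.n_agents * self.HIGH
        rewards = [max(0.0, 1.0 - abs(target - a) / max_error) for a in actions]
        done = True  # assumed: each episode is a single guess
        return list(self.observations), rewards, done, {}


env = GuessingSumEnv(n_agents=2, seed=0)
obs = env.reset()
_, rewards, done, _ = env.step([sum(obs), sum(obs)])
print(rewards)  # both agents guessed the exact sum
```

Because each agent only sees its own scalar, it can reach the maximum reward only if the other agents' inputs reach it through the communication channel, which is what makes the game a useful test.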

Results

Training CommNet in the Guessing sum env with 2 agents

[Figure: 2_agents_commnet_training_reward]

Contributors

coac

commnet-bicnet's Issues

Variable sharing between critic and actor

Thanks for the reply. I was busy with another project over the last few days and only recently found some spare time.
I noticed that in comm_net the variables of the communication part (and possibly the encoder part) are not shared between the critic and the actor.
Is that how it is normally done for algorithms trained with DDPG, such as comm_net?

Actor loss function

Hi Coac, I really like your BiCnet implementation! My goal is to run it on an environment where every agent gets a reward of -1 for each time step it takes to finish the episode. But I see a problem with your actor loss: because the actor's loss is defined as the negated prediction of the critic, wouldn't the reward need to converge to zero when the agents perform perfectly?

loss_actor = -self.critic(state_batches, clear_action_batches).mean()

Can you explain why you implemented it this way? Also, is it possible that the reward does not converge to 0 when the agents perform well (as in the environment I mentioned above)?
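One way to look at this question: in DDPG the actor follows the gradient of the critic's value with respect to the action, so a constant offset in the value does not change the update. The toy sketch below illustrates that with a hand-written quadratic critic. It is not the repository's network; toy_critic, best_action, offset, and the learning rate are all assumptions chosen for illustration.

```python
def toy_critic(action, best_action=3.0, offset=-50.0):
    """Hypothetical critic: highest value at best_action, never above offset."""
    return offset - (action - best_action) ** 2


def actor_loss_grad(action, best_action=3.0):
    """Derivative of the actor loss -Q(a) with respect to the action."""
    return 2.0 * (action - best_action)


action = 0.0
learning_rate = 0.1
for _ in range(200):
    # Gradient descent on loss = -Q(a); the offset never enters the update.
    action -= learning_rate * actor_loss_grad(action)

final_loss = -toy_critic(action)
print(action, final_loss)
```

In this sketch the action settles at the best action while the loss settles at 50 rather than 0, which suggests that an environment with a -1 reward per step can still be optimized with a loss of this form.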

In the file named 'comm_net.py'

At row 104, the code is h = tf.slice(H, [j, 0], [1, HIDDEN_VECTOR_LEN])
Shouldn't it be h = tf.slice(H, [0, j, 0], [-1, 1, HIDDEN_VECTOR_LEN])?
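Which of the two calls applies depends on the rank of H, since tf.slice expects begin and size to have one entry per dimension of its input. The sketch below mimics those begin/size semantics on nested Python lists so the resulting shapes can be compared without TensorFlow. slice_like_tf and shape are hypothetical helpers, and the sizes (3 agents, hidden length 4, batch of 2) are assumed for illustration.

```python
def slice_like_tf(x, begin, size):
    """Mimic tf.slice(x, begin, size) on nested lists; size -1 means 'to the end'."""
    if not begin:
        return x
    start, count = begin[0], size[0]
    stop = len(x) if count == -1 else start + count
    return [slice_like_tf(item, begin[1:], size[1:]) for item in x[start:stop]]


def shape(x):
    """Return the dimensions of a rectangular nested list."""
    dims = []
    while isinstance(x, list):
        dims.append(len(x))
        x = x[0]
    return dims


HIDDEN_VECTOR_LEN = 4
N_AGENTS = 3
j = 1  # index of the agent whose hidden vector is selected

# Rank 2: H has shape (agents, hidden), so begin and size have two entries.
H2 = [[10 * i + k for k in range(HIDDEN_VECTOR_LEN)] for i in range(N_AGENTS)]
h2 = slice_like_tf(H2, [j, 0], [1, HIDDEN_VECTOR_LEN])

# Rank 3: H has shape (batch, agents, hidden), so three entries are needed.
H3 = [H2, H2]
h3 = slice_like_tf(H3, [0, j, 0], [-1, 1, HIDDEN_VECTOR_LEN])

print(shape(h2), shape(h3))
```

If H carries a leading batch dimension, the three-entry form keeps every batch element and selects agent j within each one; without a batch dimension, the two-entry form already selects agent j.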

CommNet on Waterworld

I'd like to use CommNet on Waterworld, because communication seems like a good way for the agents to earn reward. But why did it not converge in the end, and why does it perform worse than MADDPG?
