
chenglongchen / pytorch-drl


PyTorch implementations of various Deep Reinforcement Learning (DRL) algorithms for both single agent and multi-agent.

License: MIT License

Python 100.00%
Topics: pytorch, deep-reinforcement-learning, multi-agent, deep-q-network, actor-critic, advantage-actor-critic, a2c, proximal-policy-optimization, ppo, deep-deterministic-policy-gradient, ddpg, acktr, rl, drl, madrl, dqn, reinforcement-learning

pytorch-drl's Introduction

pytorch-madrl

This project includes PyTorch implementations of various Deep Reinforcement Learning algorithms for both single-agent and multi-agent settings.

  • A2C
  • ACKTR
  • DQN
  • DDPG
  • PPO

It is written in a modular way to allow code sharing between different algorithms. Specifically, each algorithm is represented as a learning agent with a unified interface comprising the following components (a sketch of this interface follows the list):

  • interact: interact with the environment to collect experience; taking one step and taking n steps per call are both supported (see _take_one_step and _take_n_steps, respectively)
  • train: train on a sampled batch
  • exploration_action: choose an action for a given state, with random noise added for exploration during training
  • action: choose an action for a given state during execution
  • value: evaluate the value of a state-action pair
  • evaluation: evaluate the learned agent
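
As an illustration, here is a minimal sketch of such a unified interface. The class name, the buffer attribute, and the method bodies are assumptions for illustration and are not taken from the repository; only the method names follow the list above.

# Minimal sketch of the unified agent interface described above. The class name,
# buffer API, and method bodies are illustrative assumptions, not the repository's code.
import torch


class Agent:
    def __init__(self, env, actor, critic, memory):
        self.env = env          # gym environment
        self.actor = actor      # policy network
        self.critic = critic    # value network
        self.memory = memory    # rollout / replay buffer (hypothetical push() API)
        self.state = env.reset()

    def interact(self, n_steps=1):
        """Collect experience by taking one or n environment steps."""
        for _ in range(n_steps):
            action = self.exploration_action(self.state)
            next_state, reward, done, _ = self.env.step(action)
            self.memory.push(self.state, action, reward, next_state, done)
            self.state = self.env.reset() if done else next_state

    def train(self, batch_size=64):
        """Update the networks on a sampled batch (algorithm-specific)."""
        raise NotImplementedError

    def exploration_action(self, state):
        """Choose an action with exploration noise for training (algorithm-specific)."""
        raise NotImplementedError

    def action(self, state):
        """Choose a greedy/deterministic action for execution (algorithm-specific)."""
        raise NotImplementedError

    def value(self, state, action):
        """Evaluate the value of a state-action pair."""
        with torch.no_grad():
            s = torch.as_tensor(state, dtype=torch.float32)
            a = torch.as_tensor(action, dtype=torch.float32)
            return self.critic(s, a)

    def evaluation(self, n_episodes=10):
        """Run the learned policy without exploration and return the mean episode return."""
        total = 0.0
        for _ in range(n_episodes):
            state, done = self.env.reset(), False
            while not done:
                state, reward, done, _ = self.env.step(self.action(state))
                total += reward
        return total / n_episodes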

Requirements

  • gym
  • python 3.6
  • pytorch

Usage

To train a model:

$ python run_a2c.py
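
A run script presumably alternates between collecting experience and training. A rough outline, using the interface sketched above (the factory function and all hyperparameter values are hypothetical):

# Hypothetical outline of a training loop; make_a2c_agent and the
# hyperparameter values are illustrative, not the repository's actual code.
import gym

env = gym.make("CartPole-v0")
agent = make_a2c_agent(env)          # hypothetical factory building an A2C agent

for episode in range(2000):
    agent.interact(n_steps=10)       # collect a short rollout
    agent.train(batch_size=64)       # one update on the collected experience
    if episode % 100 == 0:
        print(f"episode {episode}: mean return {agent.evaluation(n_episodes=5):.1f}")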

Results

Results for reinforcement learning algorithms are extremely difficult to reproduce. Due to differences in settings such as random seeds and hyperparameters, you may get results that differ from those shown below.
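
Pinning the relevant random seeds reduces, though does not eliminate, this run-to-run variance. A minimal sketch (the seed value and the old-style gym seeding call are assumptions):

# Fix random seeds to reduce run-to-run variance; results may still differ
# across hardware, drivers, and library versions.
import random
import numpy as np
import torch
import gym

SEED = 42
random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)

env = gym.make("CartPole-v0")
env.seed(SEED)   # pre-0.26 gym API, matching the gym versions of this project's era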

  • A2C: CartPole-v0
  • ACKTR: CartPole-v0
  • DDPG: Pendulum-v0
  • DQN: CartPole-v0
  • PPO: CartPole-v0

TODO

  • TRPO
  • LOLA
  • Parameter noise

Acknowledgments

This project draws inspiration from the following projects:

License

MIT


pytorch-drl's Issues

The Actor-Critic Structure in MAA2C

I'm a little confused about your implementation of MAA2C. I don't think the input of the actor network should simply be the "joint state" of the agents. According to [1], the critic's input should be the state of the environment (where the agents' joint state is not necessarily defined) plus the joint action of the agents, i.e., the critic here should be a Q-function over joint actions. The actor, in turn, should be something like a policy, and I don't quite understand why the actor network is implemented this way. An explanation would be appreciated.
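
For reference, the centralized critic described in [1] conditions on the environment state together with all agents' actions and outputs a single Q-value for the joint action. A sketch of such a network (illustrative only, not the repository's MAA2C implementation):

# Sketch of a centralized critic Q(s, a_1, ..., a_N) in the style of [1];
# layer sizes and naming are illustrative assumptions.
import torch
import torch.nn as nn


class CentralizedCritic(nn.Module):
    def __init__(self, state_dim, action_dim, n_agents, hidden=128):
        super().__init__()
        # input: global state concatenated with every agent's action
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim * n_agents, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),   # scalar Q-value of the joint action
        )

    def forward(self, state, joint_actions):
        # state: (batch, state_dim); joint_actions: (batch, n_agents * action_dim)
        return self.net(torch.cat([state, joint_actions], dim=-1))

Each actor, by contrast, would typically condition only on its own local observation.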

License?

Hi,

What is the license?

Hugh

About the computation of Advantage and State Value in PPO

In your implementation of the critic, you feed the network the observation and the action, and it outputs a 1-dimensional value. Can I infer that this is Q(s, a)?
But the advantage you compute is

values = self.critic_target(states_var, actions_var).detach()
advantages = rewards_var - values

which is the estimate of q_t minus Q(s_t, a). I think it should be Advantage = q_t - V(s_t).
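
For comparison, with a state-value critic V(s) the advantage the issue suggests would be estimated roughly as follows (function and variable names are illustrative; q_t is taken to be an empirical discounted return):

# Sketch of the suggested advantage estimate A_t = q_t - V(s_t); illustrative only.
import torch

def compute_advantages(critic_v, states, returns):
    # critic_v: network mapping states -> V(s), output shape (batch, 1)
    # states:   (batch, state_dim) tensor of states s_t
    # returns:  (batch,) tensor of discounted returns q_t
    with torch.no_grad():
        values = critic_v(states).squeeze(-1)   # V(s_t), detached from the graph
    return returns - values                     # A_t = q_t - V(s_t)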
