dongminlee94 / deep_rl


PyTorch implementation of deep reinforcement learning algorithms

License: MIT License

Python 100.00%
deep-reinforcement-learning model-free-rl pytorch dqn ddqn a2c vpg npg trpo ppo

deep_rl's Introduction

Hi 👋, I'm Dongmin Lee


deep_rl's Issues

pybullet support?

A very helpful repo for newcomers to RL. However, are there any future plans for PyBullet support?

(TRPO) Trivial KL divergence computation in hessian_vector_product.

Thanks for your great implementation of TRPO.

It is helpful for those (including me) who want to implement TRPO themselves, because there are only a few implementations available.

I found what looks like a small mistake in your implementation, specifically in the hessian_vector_product method of the TRPO agent (please see the code below).
I think the KL divergence computation is incorrect because it computes the KL divergence between two identical distributions: both old_policy and new_policy are set to self.policy.
When I printed the variable kl, it was always zero.

As far as I understand, the exact Fisher information matrix (FIM) should indeed be computed using only the current parameters.
However, the trick in TRPO is to approximate the FIM by the Hessian of the KL divergence between the old and the current policy. This approximation is reasonable when the old and current parameters are close enough.
This is described in Section 6, "Practical Algorithm", of the TRPO paper.
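The relation between the KL Hessian and the FIM mentioned above can be checked numerically on a toy case. The following is a minimal sketch, not taken from the repository: it assumes a scalar Gaussian whose only parameter is its mean, and the helper names (gaussian_kl, second_derivative_wrt_mean) are hypothetical. It shows that at identical parameters the KL value is zero, while its curvature with respect to the new mean matches the Fisher information 1 / std^2.

```python
import math


def gaussian_kl(mu_old, std_old, mu_new, std_new):
    """KL(N(mu_old, std_old^2) || N(mu_new, std_new^2)) for scalar Gaussians."""
    return (
        math.log(std_new / std_old)
        + (std_old ** 2 + (mu_old - mu_new) ** 2) / (2.0 * std_new ** 2)
        - 0.5
    )


def second_derivative_wrt_mean(mu_old, std, eps=1e-3):
    """Central finite-difference estimate of d^2 KL / d mu_new^2 at mu_new = mu_old."""
    def f(mu_new):
        return gaussian_kl(mu_old, std, mu_new, std)

    return (f(mu_old + eps) - 2.0 * f(mu_old) + f(mu_old - eps)) / eps ** 2


mu, std = 0.3, 0.5  # illustrative values only

kl_at_same_params = gaussian_kl(mu, std, mu, std)  # KL between identical distributions
curvature = second_derivative_wrt_mean(mu, std)    # Hessian of the KL at that point
fisher_for_mean = 1.0 / std ** 2                   # Fisher information of a Gaussian mean

print(kl_at_same_params, curvature, fisher_for_mean)
```

In this toy setting a zero KL value does not by itself imply a zero Hessian, so the printed value of kl alone may not be enough to decide whether the curvature is computed correctly.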

Can you take a look at this issue?

Best regards,
Dongjin Lee

def hessian_vector_product(self, obs, p, damping_coeff=0.1):
    # Note: Tensor.detach() is not in-place, so this result is discarded.
    p.detach()
    # Line questioned in this issue: both arguments are self.policy.
    kl = self.gaussian_kl(old_policy=self.policy, new_policy=self.policy, obs=obs)
    kl_grad = torch.autograd.grad(kl, self.policy.parameters(), create_graph=True)
    kl_grad = self.flat_grad(kl_grad)

    # Differentiating (grad KL . p) yields the Hessian-vector product H p.
    kl_grad_p = (kl_grad * p).sum()
    kl_hessian = torch.autograd.grad(kl_grad_p, self.policy.parameters())
    kl_hessian = self.flat_grad(kl_hessian, hessian=True)
    return kl_hessian + p * damping_coeff
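
To make the behaviour of the method above easier to inspect, here is a self-contained PyTorch sketch of the same double-backward pattern. It is not the repository's code. The policy is a hypothetical stand-in (one linear layer producing the mean of a unit-variance Gaussian), and the sketch assumes that the old policy's outputs are detached inside the KL computation. Whether the repository's gaussian_kl does this is not shown in the issue.

```python
import torch

torch.manual_seed(0)

# Hypothetical stand-in for the policy: a linear layer that outputs the mean
# of a Gaussian with fixed unit standard deviation.
policy = torch.nn.Linear(3, 2)
obs = torch.randn(16, 3)


def gaussian_kl(old_mu, new_mu):
    """Mean KL between unit-variance Gaussians with the given means."""
    return 0.5 * (new_mu - old_mu).pow(2).sum(dim=1).mean()


def flat_grad(grads):
    """Concatenate a tuple of gradient tensors into one flat vector."""
    return torch.cat([g.reshape(-1) for g in grads])


def hessian_vector_product(p, damping_coeff=0.1):
    p = p.detach()             # detach() returns a new tensor, so reassign it
    new_mu = policy(obs)
    old_mu = new_mu.detach()   # assumption: same parameters, treated as a constant
    kl = gaussian_kl(old_mu, new_mu)
    params = list(policy.parameters())
    # First backward pass keeps the graph so the gradient can be differentiated again.
    kl_grad = flat_grad(torch.autograd.grad(kl, params, create_graph=True))
    kl_grad_p = (kl_grad * p).sum()
    # Second backward pass: gradient of (grad KL . p) is the Hessian-vector product.
    hvp = flat_grad(torch.autograd.grad(kl_grad_p, params))
    return kl.item(), kl_grad.detach(), hvp + damping_coeff * p


n_params = sum(param.numel() for param in policy.parameters())
vector = torch.ones(n_params)
kl_value, kl_grad, hvp = hessian_vector_product(vector, damping_coeff=0.0)

print("kl value:", kl_value)
print("kl grad norm:", kl_grad.norm().item())
print("hvp norm:", hvp.norm().item())
```

Under these assumptions the KL value and its gradient are both zero, while the Hessian-vector product is not. If the old policy's outputs were not detached, the KL would be identically zero as a function of the parameters and the product would reduce to the damping term alone, so it may be worth checking which of the two cases applies in the repository.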
