dongminlee94 / deep_rl
PyTorch implementation of deep reinforcement learning algorithms
License: MIT License
Hi @dongminlee94, from a brief skim I am unable to find where the A2C and PPO implementations handle multiple actors. It would be great if you could point me there. Thanks!
A very helpful repo for newcomers to RL. However, is there any future plan for PyBullet support?
Thanks for your great implementation of TRPO.
This is helpful for those (including me) who want to implement TRPO themselves, because there are few implementations available.
I just found a small mistake in your implementation, specifically in the hessian_vector_product method of the TRPO agent (please see below).
I think the KL divergence computation is incorrect because it computes the KL divergence between two identical distributions.
When I printed out the variable kl, it was always zero.
As far as I understand, it is true that the exact Fisher information matrix (FIM) should be computed using only the current parameters.
However, the trick in TRPO is to approximate the FIM by computing the Hessian of the KL divergence between the old and current policy. This is reasonable when the old and current parameters are close enough.
This can be found in Section 6, "Practical Algorithm", of the TRPO paper.
Can you take a look at this issue?
Best regards,
Dongjin Lee
def hessian_vector_product(self, obs, p, damping_coeff=0.1):
    p.detach()
    kl = self.gaussian_kl(old_policy=self.policy, new_policy=self.policy, obs=obs)
    kl_grad = torch.autograd.grad(kl, self.policy.parameters(), create_graph=True)
    kl_grad = self.flat_grad(kl_grad)
    kl_grad_p = (kl_grad * p).sum()
    kl_hessian = torch.autograd.grad(kl_grad_p, self.policy.parameters())
    kl_hessian = self.flat_grad(kl_hessian, hessian=True)
    return kl_hessian + p * damping_coeff
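The point raised in this issue can be checked on a toy case. The sketch below is not the repository's code. It assumes a 1-D Gaussian policy with a fixed standard deviation, uses the closed-form KL, and swaps torch.autograd for central finite differences so it runs with the standard library only. The names `frozen` and `tied` are hypothetical. The excerpt above does not show whether `gaussian_kl` detaches the old policy internally, so this only illustrates the two possibilities: a KL value of zero is expected at the current parameters in both, and what differs is whether the old distribution is held constant when differentiating.

```python
import math


def gaussian_kl(mu_old, log_std_old, mu_new, log_std_new):
    """Closed-form KL( N(mu_old, std_old^2) || N(mu_new, std_new^2) ) in 1-D."""
    var_old = math.exp(2.0 * log_std_old)
    var_new = math.exp(2.0 * log_std_new)
    return (log_std_new - log_std_old
            + (var_old + (mu_old - mu_new) ** 2) / (2.0 * var_new)
            - 0.5)


def second_derivative(f, x, h=1e-3):
    """Central finite-difference estimate of f''(x)."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)


mu0 = 0.7                 # current mean (illustrative value)
log_std = math.log(0.5)   # std = 0.5, so the Fisher information for the mean is 1 / std^2 = 4


# Case 1: the old distribution is frozen at mu0; only the new mean varies.
def frozen(mu):
    return gaussian_kl(mu0, log_std, mu, log_std)


# Case 2: old and new share the same varying parameter, so the KL is identically zero.
def tied(mu):
    return gaussian_kl(mu, log_std, mu, log_std)


kl_value = frozen(mu0)                         # zero at the current parameters
hess_frozen = second_derivative(frozen, mu0)   # approximately 4, the Fisher information
hess_tied = second_derivative(tied, mu0)       # zero: no curvature information at all

print(kl_value, hess_frozen, hess_tied)
```

In an autograd setting, the frozen case corresponds to computing the old policy's distribution parameters without gradient tracking (for example under `torch.no_grad()` or via `.detach()`) before taking the KL, which is an assumption about how one might fix the method, not a description of the repository.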
Hey @dongminlee94!
Thanks for an amazing library. Could you also upload the TAC, ATAC, and NPG models as well? It would be really helpful.