
Comments (12)

Miffyli avatar Miffyli commented on May 22, 2024 1

This is a wild guess, but since you mentioned grad_kl: it seems this code uses a single-sample estimate of the KL (and then averages over the samples), which is known to sometimes return negative values (see the lengthy discussion and update on SB3 here). This is simply based on the "something is negative but shouldn't be" part :D .
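To illustrate the point, here is a minimal sketch comparing a sample-based KL estimate with the closed-form KL from torch.distributions. The two Gaussians and the sample count are made-up values for illustration, not taken from the code under discussion.

```python
import torch
from torch.distributions import Normal, kl_divergence

torch.manual_seed(0)

# Two nearby Gaussian "policies" (hypothetical values, for illustration only).
old_dist = Normal(loc=torch.tensor(0.0), scale=torch.tensor(1.0))
new_dist = Normal(loc=torch.tensor(0.1), scale=torch.tensor(1.0))

# Actions sampled from the old policy, as they would be in a rollout buffer.
actions = old_dist.sample((16,))

# Per-sample estimate: log p_old(a) - log p_new(a).
# Individual terms can be negative, and with few samples so can their mean.
log_ratio = old_dist.log_prob(actions) - new_dist.log_prob(actions)
sample_kl = log_ratio.mean()

# Closed-form KL between the two distributions: never negative.
analytic_kl = kl_divergence(old_dist, new_dist)

print("per-sample terms below zero:", int((log_ratio < 0).sum()))
print("sample estimate:", sample_kl.item())
print("analytic KL:", analytic_kl.item())
```

The analytic value only depends on the distribution parameters, so it does not inherit the sampling noise of the estimate.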

The only other tip I can give is to look at other implementations of TRPO and see what they did, e.g. Spinning Up (alas, they too only have a TF1 version of TRPO).

from stable-baselines3-contrib.

cyprienc avatar cyprienc commented on May 22, 2024 1

Hi,

I've added an assert slightly earlier, inside the conjugate gradient algorithm.
It points in the same direction, though: the matrix defined in Hpv is not positive-definite.
image
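For readers following along, a check of this kind could look roughly like the sketch below. It is a generic conjugate gradient solver, not the code from this issue; the function name, the 2x2 test matrix, and the tolerance are assumptions chosen for illustration.

```python
import torch


def conjugate_gradient(matvec, b, max_iter=10, tol=1e-10):
    """Solve A x = b given only the product v -> A v (A assumed symmetric positive-definite)."""
    x = torch.zeros_like(b)
    r = b.clone()  # residual b - A x, with x = 0 initially
    p = b.clone()  # current search direction
    rs_old = r.dot(r)
    for _ in range(max_iter):
        Ap = matvec(p)
        curvature = p.dot(Ap)
        # For a positive-definite A this is always > 0. If the assert fires,
        # the matrix-vector product is not behaving like a PD matrix.
        assert curvature > 0, f"non-positive curvature: {curvature.item()}"
        alpha = rs_old / curvature
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = r.dot(r)
        if rs_new < tol:
            break
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x


# Small positive-definite system as a sanity check (hypothetical values).
A = torch.tensor([[4.0, 1.0], [1.0, 3.0]])
b = torch.tensor([1.0, 2.0])
x = conjugate_gradient(lambda v: A @ v, b)
```

In TRPO the `matvec` argument would be the Hessian-vector (Fisher-vector) product of the KL, so a failing assert points at how that product is computed rather than at the solver itself.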

@Miffyli Thanks for the approximation trick - neat one. I'll have a look at it (and its gradients :) ). Other implementations usually use a distribution object (custom or from one of the major frameworks) which computes the KL directly. I also wanted to do that but wasn't sure where I could find a distribution object for the policy passed - but let me have a better look at it.

Thanks,

Cyprien


cyprienc avatar cyprienc commented on May 22, 2024 1

> probably a better idea would be to create a new method get_distribution()

Done

> a deepcopy should probably solve that issue, no?

Used a shallow copy, but I am wondering whether it makes more sense to avoid any kind of copy and do the necessary refactoring work to avoid the side effect. Probably something for the future.


Using the PyTorch distribution did the trick. I also refined a few things to avoid numerical instabilities stemming from the CG method.

How would you like to proceed @araffin ?


araffin avatar araffin commented on May 22, 2024 1

As mentioned in the contrib contributing guide, the next step is to match published results. I would start with PyBullet envs (I had some results in the SB2 zoo).

Regarding the benchmark, once you have created a fork of the rl zoo (cf. guide), I could help you to run it on a larger scale (I have access to a cluster).


cyprienc avatar cyprienc commented on May 22, 2024

@araffin What about ActorCriticPolicy.evaluate_actions returning the Distribution object directly instead of the entropy as the last output? This would give the training loop access to the Distribution.distribution object, so it could compute the analytical KL divergence instead of the sample estimate.

Also, the side effect in Distribution.proba_distribution, called in ActorCriticPolicy._get_action_dist_from_latent, means it's not possible to compute the detached old distribution using ActorCriticPolicy.evaluate_actions in a no_grad block, because doing so overwrites the parameters of the new distribution (in the image below, the parameters of distribution are replaced with the ones from old_distribution because of the side effect).
image
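Since the image is not reproduced here, the sketch below mimics the aliasing problem with a toy class. `ToyDistribution` is a hypothetical stand-in, not SB3's actual Distribution class; it only assumes that the wrapper stores a torch distribution on itself and returns itself.

```python
import copy

import torch
from torch.distributions import Normal


class ToyDistribution:
    """Hypothetical wrapper that stores a torch distribution on itself."""

    def __init__(self):
        self.distribution = None

    def proba_distribution(self, mean, std):
        # Side effect: mutates this object and returns the very same object.
        self.distribution = Normal(mean, std)
        return self


wrapper = ToyDistribution()
new = wrapper.proba_distribution(torch.tensor(1.0), torch.tensor(1.0))
old = wrapper.proba_distribution(torch.tensor(0.0), torch.tensor(1.0))

# Both names refer to one object, so "new" now carries the "old" parameters.
aliased = new is old
print(aliased, new.distribution.loc.item())

# A shallow copy of the wrapper keeps a reference to the Normal that was
# current at copy time, so later calls no longer overwrite it.
wrapper2 = ToyDistribution()
new2 = copy.copy(wrapper2.proba_distribution(torch.tensor(1.0), torch.tensor(1.0)))
old2 = copy.copy(wrapper2.proba_distribution(torch.tensor(0.0), torch.tensor(1.0)))
print(new2.distribution.loc.item(), old2.distribution.loc.item())
```

The copy only works here because each call creates a fresh Normal; if the wrapper mutated the stored distribution in place, a shallow copy would not be enough.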

On a side note, PyTorch currently doesn't offer an easy way to "detach" a distribution, but maybe it could be implemented in SB3's Distribution class.
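As a rough idea of what such a helper could look like for the Gaussian case only, here is a sketch. `detach_normal` is a hypothetical name, not an existing PyTorch or SB3 function, and other distribution types would need their own handling.

```python
import torch
from torch.distributions import Normal, kl_divergence


def detach_normal(dist: Normal) -> Normal:
    """Return a new Normal with the same parameters, cut off from the autograd graph."""
    return Normal(dist.loc.detach(), dist.scale.detach())


# Hypothetical learnable parameters of a Gaussian policy head.
mean = torch.tensor(0.5, requires_grad=True)
log_std = torch.tensor(0.0, requires_grad=True)

dist = Normal(mean, log_std.exp())
old_dist = detach_normal(dist)

# Gradients of the KL flow only through `dist`, not through `old_dist`.
kl = kl_divergence(old_dist, dist)
kl.backward()
```

With this, the old distribution can be built from the same forward pass without a no_grad block.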

Cyprien


araffin avatar araffin commented on May 22, 2024

I will try to have a deeper look at it soon. In the meantime, I recommend reading part of John Schulman's thesis, notably the "Computing the Fisher-Vector Product" section ;)
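As a rough companion to that section, one common way to get a Fisher-vector product is double backpropagation through the analytical KL. The sketch below uses a toy linear Gaussian policy with fixed unit std; the network, batch, and damping value are all assumptions for illustration and not the contrib implementation.

```python
import torch
from torch.distributions import Normal, kl_divergence

torch.manual_seed(0)

obs = torch.randn(32, 3)         # hypothetical batch of observations
policy = torch.nn.Linear(3, 1)   # toy mean network, std fixed at 1
params = list(policy.parameters())


def mean_kl():
    """KL between the detached current policy and the current policy."""
    mean = policy(obs)
    std = torch.ones_like(mean)
    old_dist = Normal(mean.detach(), std)
    new_dist = Normal(mean, std)
    return kl_divergence(old_dist, new_dist).mean()


def fisher_vector_product(v, damping=0.1):
    """Compute (H + damping * I) v, with H the Hessian of the mean KL."""
    kl = mean_kl()
    # First backward pass, keeping the graph so it can be differentiated again.
    grads = torch.autograd.grad(kl, params, create_graph=True)
    flat_grad = torch.cat([g.reshape(-1) for g in grads])
    # Second backward pass on (grad . v) yields the Hessian-vector product.
    grad_v = flat_grad.dot(v)
    hvp = torch.autograd.grad(grad_v, params)
    flat_hvp = torch.cat([h.reshape(-1) for h in hvp])
    return flat_hvp + damping * v


v = torch.randn(4)  # 3 weights + 1 bias
Fv = fisher_vector_product(v)
print("v^T F v =", v.dot(Fv).item())
```

The KL value and its gradient are both zero at the current parameters, but the second derivative is not, which is why the graph has to be kept for the second pass.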

> What about ActorCriticPolicy.evaluate_actions returning the Distribution object directly instead of the entropy as the last output?

probably a better idea would be to create a new method get_distribution()

> in the image below, the parameters of distribution are replaced with the ones from old_distribution because of the side-effect

a deepcopy should probably solve that issue, no?

EDIT: you can also take a look at the Theano implementation and the Tianshou one


araffin avatar araffin commented on May 22, 2024

As mentioned in the contrib contributing guide, the next step is to match published results. I would start with PyBullet envs (I had some results in the SB2 zoo)
and at the same time open a draft PR ;)


cyprienc avatar cyprienc commented on May 22, 2024

Hi,

Sorry for the delay (holidays), I've pushed to a fork of rl zoo: https://github.com/cyprienc/rl-baselines3-zoo
I'll need some help running it on a larger scale since my local compute is not enough.
Let me know what I can do.
Thanks,

Cyprien


araffin avatar araffin commented on May 22, 2024

Could you also open a PR?
This will make it easier to review/use ;)


cyprienc avatar cyprienc commented on May 22, 2024

Sure: DLR-RM/rl-baselines3-zoo#163
I'll fill in the PR message later; I just opened the PR to avoid losing time.
Thanks,

Cyprien


araffin avatar araffin commented on May 22, 2024

I meant a PR to SB3 contrib...


cyprienc avatar cyprienc commented on May 22, 2024

Indeed... #40

