
When to Update (ppo-pytorch issue, closed)

nikhilbarhate99 commented on May 9, 2024
When to Update

from ppo-pytorch.

Comments (5)

nikhilbarhate99 commented on May 9, 2024

PPO Algorithm (paper):

for iteration = 1, 2, ... do
  for actor = 1, 2, ..., N do
    Run policy π_θold in environment for T timesteps
    Compute advantage estimates A
  end for
  Optimize surrogate L wrt θ, with K epochs and minibatch size M ≤ NT
  θ_old ← θ
end for

In this repo, N = 1 (one actor) and the minibatch size M = T, i.e. each update uses the entire batch as a single sample.

Since the algorithm's performance depends on the environment, I am not sure how this will affect its overall efficiency. It is a hyperparameter and needs to be tuned for each environment.

However, using parallel workers (N > 1) is generally more useful, since the expectations are approximated with experience generated under different random seeds.


xunzhang commented on May 9, 2024

In PPO.py, T = 300 (max_timesteps=300) and M = 2000 (update_timestep=2000), so why do you say M = T? I'm a little confused here. Are you simulating multiple actors (N) by setting M > T? In the PPO.py example that would give 300 (T) × 6.66 (N) ≈ 2000 (M). Correct me if I am wrong.


nikhilbarhate99 commented on May 9, 2024

Update Timestep (T) = 2000
Mini-Batch size (M) = 2000

max_timesteps is the maximum number of timesteps in ONE episode. One update may contain experience from multiple episodes.
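To make the distinction concrete, here is a toy sketch (not the repository's code) of a collection loop where episodes are capped at `max_timesteps` but an update fires every `update_timestep` steps. The function name `simulate_updates` and the random episode lengths are assumptions made purely for illustration; no real environment or policy is involved.

```python
import random


def simulate_updates(total_episodes, max_timesteps, update_timestep, seed=0):
    """Return one (batch_length, episodes_in_batch) pair per simulated update."""
    rng = random.Random(seed)
    time_step = 0               # global step counter across all episodes
    buffer_len = 0              # transitions collected since the last update
    episodes_in_batch = set()   # episodes that contributed to the current batch
    log = []
    for episode in range(total_episodes):
        # Hypothetical episode length: ends early or hits the per-episode cap.
        episode_len = rng.randint(1, max_timesteps)
        for _ in range(episode_len):
            time_step += 1
            buffer_len += 1
            episodes_in_batch.add(episode)
            if time_step % update_timestep == 0:
                # This is where a PPO update on the buffer would happen.
                log.append((buffer_len, len(episodes_in_batch)))
                buffer_len = 0
                episodes_in_batch = set()
    return log


if __name__ == "__main__":
    for batch_len, n_episodes in simulate_updates(100, 300, 2000):
        print(f"update on {batch_len} transitions from {n_episodes} episodes")
```

With a cap of 300 steps per episode and an update every 2000 steps, each batch in this sketch necessarily spans at least 7 episodes, which is why `max_timesteps` should not be read as T.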

for iteration = 1, 2, ... do
  for actor = 1, 2, ..., N do
    Run policy π_θold in environment for T timesteps
    Compute advantage estimates A
  end for
  Optimize surrogate L wrt θ, with K epochs and minibatch size M ≤ NT
  θ_old ← θ
end for

Using multiple actors (N) means running multiple actor instances in parallel (e.g. multithreaded), each collecting T timesteps of experience.
For the update, the minibatch size (M) can NOT be greater than the total batch size (NT).
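As a rough illustration of the M ≤ NT constraint, the sketch below shuffles the NT collected indices once per epoch and slices them into minibatches of size M. The helper name `iterate_minibatches` is hypothetical, and shuffled slicing is only one common way to form minibatches; with N = 1 and M = T it reduces to one full-batch pass per epoch, as in this repo.

```python
import random


def iterate_minibatches(n_actors, horizon, minibatch_size, epochs, seed=0):
    """Yield index lists of size M drawn from a batch of N*T transitions."""
    batch_size = n_actors * horizon  # N*T transitions collected per iteration
    if minibatch_size > batch_size:
        raise ValueError("minibatch size M must satisfy M <= N*T")
    rng = random.Random(seed)
    indices = list(range(batch_size))
    for _ in range(epochs):          # K epochs over the same batch
        rng.shuffle(indices)
        for start in range(0, batch_size, minibatch_size):
            yield indices[start:start + minibatch_size]


if __name__ == "__main__":
    # N = 1, T = M = 2000: every epoch uses the whole batch once.
    full = list(iterate_minibatches(1, 2000, 2000, epochs=4))
    print(len(full), "minibatches of size", len(full[0]))
```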


xunzhang commented on May 9, 2024

I see. I misread max_timesteps in your code as T in the paper. So update_timestep in your code is both M and T.

One more point of confusion about multiple actors: using parallel environments makes sense, but why can't I simulate them by running a single environment sequentially for N*T steps?


nikhilbarhate99 commented on May 9, 2024

All the instances will be running with different random seeds. This will lead to more varied experience, thus approximating the expectation better.
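A minimal sketch of that data layout, assuming each actor is just a seeded random number generator standing in for an environment and policy: N actors each produce T toy "transitions" from their own seed, and the results are concatenated into one batch of NT samples. The names `rollout` and `collect_batch` are hypothetical, and the sketch only shows how the per-seed rollouts are combined, not that the resulting estimate is better.

```python
import random


def rollout(seed, horizon):
    """One hypothetical actor: T toy transitions drawn from its own seeded RNG."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(horizon)]


def collect_batch(n_actors, horizon, base_seed=0):
    """Concatenate the rollouts of N differently seeded actors into N*T samples."""
    batch = []
    for actor_id in range(n_actors):
        # In a real setup these rollouts would run in parallel processes.
        batch.extend(rollout(base_seed + actor_id, horizon))
    return batch


if __name__ == "__main__":
    batch = collect_batch(n_actors=4, horizon=5)
    print(len(batch), "transitions from 4 actors")
```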

Source: skip to 54:19 of (https://www.youtube.com/watch?v=EKqxumCuAAY&list=PLkFD6_40KJIwhWJpGazJ9VSj9CFMkb79A&index=6)

