Comments (5)
PPO Algorithm (paper):
for iteration = 1, 2, ... do
    for actor = 1, 2, ..., N do
        Run policy π_θold in environment for T timesteps
        Compute advantage estimates Â_1, ..., Â_T
    end for
    Optimize surrogate L wrt θ, with K epochs and minibatch size M ≤ NT
    θ_old ← θ
end for
In this repo, N = 1 (one actor) and the minibatch size M = T, i.e. each update uses the entire batch.
Since the performance of the algorithm depends on the environment, I am not sure how this affects its overall efficiency. It is a hyperparameter and needs to be tuned for each environment.
However, using parallel workers (N > 1) is generally more useful, since the expectations are approximated with experience generated from different random seeds.
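As a rough illustration of how N, T and M relate, here is a minimal sketch. The helper names (total_batch_size, minibatches_per_epoch) are hypothetical and not part of this repo, and T = 2000 is the value quoted later in this thread:

```python
def total_batch_size(num_actors: int, horizon: int) -> int:
    """Transitions collected per iteration: N actors times T timesteps."""
    return num_actors * horizon


def minibatches_per_epoch(minibatch_size: int, num_actors: int, horizon: int) -> int:
    """Check the paper's constraint M <= N*T and count full minibatches per epoch."""
    nt = total_batch_size(num_actors, horizon)
    if not 1 <= minibatch_size <= nt:
        raise ValueError(f"minibatch size M={minibatch_size} must be in [1, {nt}]")
    return nt // minibatch_size


# Setting described in this thread: one actor, M equal to the whole batch.
N, T, M = 1, 2000, 2000
print(total_batch_size(N, T))          # size of the full batch, N*T
print(minibatches_per_epoch(M, N, T))  # one minibatch per epoch when M = N*T
```

With M = NT, every epoch performs a single gradient step on the full batch; a smaller M would give several steps per epoch.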
from ppo-pytorch.
In PPO.py, T = 300 (max_timesteps=300) and M = 2000 (update_timestep=2000), so why did you say M = T? I'm a little confused here. Do you want to simulate multiple actors (N) by setting M > T? So in the PPO.py example, 300 (T) * 6.66 (N) = 2000 (M). Correct me if I am wrong.
Update timestep (T) = 2000
Minibatch size (M) = 2000
max_timesteps is the maximum number of timesteps in ONE episode. One update may contain experience from multiple episodes.
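To make the difference between max_timesteps and update_timestep concrete, here is a toy sketch. It assumes a fake environment whose episodes end after a random number of steps; simulate_updates and its bookkeeping are hypothetical and not the actual PPO.py training loop:

```python
import random


def simulate_updates(max_episodes, max_timesteps, update_timestep, seed=0):
    """Return (batch_size, num_episodes_in_batch) for every simulated update."""
    rng = random.Random(seed)
    buffer = []    # (episode, step) pairs collected since the last update
    updates = []
    timestep = 0   # global step counter, NOT reset between episodes
    for episode in range(max_episodes):
        # Toy episode: terminates at a random length, capped at max_timesteps.
        episode_len = rng.randint(1, max_timesteps)
        for step in range(episode_len):
            timestep += 1
            buffer.append((episode, step))
            # The update is triggered by the global counter, not by episode end.
            if timestep % update_timestep == 0:
                episodes_in_batch = {ep for ep, _ in buffer}
                updates.append((len(buffer), len(episodes_in_batch)))
                buffer.clear()
    return updates


updates = simulate_updates(max_episodes=2000, max_timesteps=300, update_timestep=2000)
print(updates[:3])  # each entry: 2000 transitions, drawn from several episodes
```

Because one episode contributes at most 300 transitions, every batch of 2000 transitions spans at least 7 episodes in this sketch.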
Using multiple actors (N) means running multiple actor instances (in parallel / multithreaded), each collecting T timesteps of experience.
For updating, the minibatch size (M) can NOT be greater than the total batch size (NT).
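The update step with K epochs and minibatch size M ≤ NT can be sketched as follows. This is only an illustration of the index sampling, assuming shuffled, non-overlapping minibatches within each epoch; minibatch_indices is a hypothetical helper, and the gradient step itself is omitted:

```python
import random


def minibatch_indices(batch_size, minibatch_size, epochs, seed=0):
    """Yield lists of sample indices for K epochs over a batch of N*T transitions."""
    if minibatch_size > batch_size:
        raise ValueError("minibatch size M cannot exceed the total batch size N*T")
    rng = random.Random(seed)
    for _ in range(epochs):
        order = list(range(batch_size))
        rng.shuffle(order)  # new random order for every epoch
        for start in range(0, batch_size, minibatch_size):
            yield order[start:start + minibatch_size]


# Example: N = 4 actors, T = 5 timesteps each, M = 10, K = 3 epochs.
N, T, M, K = 4, 5, 10, 3
for indices in minibatch_indices(N * T, M, K):
    # A real implementation would compute the surrogate loss on these samples.
    print(len(indices), sorted(indices)[:3])
```

Setting M = N * T reduces this to one full-batch step per epoch, which is the configuration discussed in this thread.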
I see. I misread max_timesteps in your code as T in the paper. I think update_timestep in your code is both M and T.
One more point of confusion about multiple actors: it makes sense to use parallel environments, but why can't I use one sequential process of N*T timesteps to simulate parallel environments?
All the instances run with different random seeds. This leads to more varied experience and therefore a better approximation of the expectation.
Source: skip to 54:19 of https://www.youtube.com/watch?v=EKqxumCuAAY&list=PLkFD6_40KJIwhWJpGazJ9VSj9CFMkb79A&index=6
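A toy sketch of the idea, assuming a made-up random-walk environment and a uniformly random policy; collect_rollout and collect_batch are hypothetical names, and the actors are run one after another here only to keep the example short:

```python
import random


def collect_rollout(seed, horizon):
    """One actor: T steps of a toy random walk driven by its own seeded RNG."""
    rng = random.Random(seed)
    state = 0
    trajectory = []
    for _ in range(horizon):
        action = rng.choice((-1, 1))  # stand-in for sampling from the policy
        state += action
        trajectory.append((state, action))
    return trajectory


def collect_batch(seeds, horizon):
    """N actors, one seed each; the result holds N*T transitions."""
    batch = []
    for seed in seeds:
        batch.extend(collect_rollout(seed, horizon))
    return batch


batch = collect_batch(seeds=[0, 1, 2, 3], horizon=50)
print(len(batch))  # N*T transitions in total
# Actors sharing a seed would collect identical experience:
print(collect_rollout(0, 50) == collect_rollout(0, 50))
```

In this sketch, giving each actor its own seed is what produces distinct trajectories; reusing one seed for all actors would duplicate the same experience N times.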