Real recurrent policy supported (slm-lab, open issue)

kengz commented on August 25, 2024

Real recurrent policy supported

from slm-lab.

Comments (2)

kengz commented on August 25, 2024

Hi @yangysc, thanks for testing the RNN. The shared network from the spec ppo_rnn_shared_cartpole works slightly better because there are fewer hyperparameters to tune. Here is a sample of its training log:

[2019-07-14 22:53:07,321 PID:73583 INFO __init__.py log_summary] Trial 0 session 0 ppo_rnn_shared_cartpole_t0_s0 [train_df] epi: 169  t: 200  wall_t: 360  opt_step: 234560  frame: 23465  fps: 65.1806  total_reward: 200  total_reward_ma: 173.03  loss: 0.0292752  lr: 4.55652e-17  explore_var: nan  entropy_coef: 0.001  entropy: 0.112986  grad_norm: nan
[2019-07-14 22:53:10,775 PID:73583 INFO __init__.py log_summary] Trial 0 session 0 ppo_rnn_shared_cartpole_t0_s0 [train_df] epi: 170  t: 185  wall_t: 363  opt_step: 236480  frame: 23650  fps: 65.1515  total_reward: 185  total_reward_ma: 173.2  loss: 0.679745  lr: 4.55652e-17  explore_var: nan  entropy_coef: 0.001  entropy: 0.228988  grad_norm: nan
[2019-07-14 22:53:14,093 PID:73583 INFO __init__.py log_summary] Trial 0 session 0 ppo_rnn_shared_cartpole_t0_s0 [train_df] epi: 171  t: 200  wall_t: 367  opt_step: 238400  frame: 23850  fps: 64.9864  total_reward: 200  total_reward_ma: 173.35  loss: 0.624804  lr: 4.55652e-17  explore_var: nan  entropy_coef: 0.001  entropy: 0.315934  grad_norm: nan

We have not thoroughly tested RNNs yet, but your observation is correct: the RecurrentNet class is limited in that sense. The hidden state is discarded after each forward pass instead of being fed into the next one. We can implement this by storing the hidden state alongside the state in the agent's Memory and retrieving it during memory.sample().
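To make the idea concrete, here is a minimal sketch of a memory that keeps the hidden state next to each state and returns both from sample(). It is an illustration only, not SLM Lab's Memory API: the names HiddenStateMemory, Transition, and toy_rnn_step are hypothetical, and the toy step function stands in for a real recurrent forward pass.

```python
import random
from collections import deque, namedtuple

# Hypothetical transition record (not SLM Lab's schema). `hidden` is the
# hidden state that was fed into the net when `state` was observed.
Transition = namedtuple(
    "Transition", "state hidden action reward next_state done"
)


class HiddenStateMemory:
    """Toy replay memory that stores the hidden state next to each state."""

    def __init__(self, max_size=10000, seed=None):
        self.buffer = deque(maxlen=max_size)
        self.rng = random.Random(seed)

    def update(self, state, hidden, action, reward, next_state, done):
        self.buffer.append(
            Transition(state, hidden, action, reward, next_state, done)
        )

    def sample(self, batch_size):
        size = min(batch_size, len(self.buffer))
        batch = self.rng.sample(list(self.buffer), size)
        # Transpose the list of transitions into one list per field,
        # so the hidden states come back aligned with their states.
        return {
            field: [getattr(t, field) for t in batch]
            for field in Transition._fields
        }


def toy_rnn_step(state, hidden):
    """Stand-in for a recurrent forward pass; returns the next hidden state."""
    return [0.5 * h + 0.5 * s for h, s in zip(hidden, state)]


memory = HiddenStateMemory(seed=0)
hidden = [0.0, 0.0]  # reset at the start of the episode
for t in range(5):
    state = [float(t), float(t + 1)]
    next_state = [float(t + 1), float(t + 2)]
    next_hidden = toy_rnn_step(state, hidden)
    memory.update(state, hidden, 0, 1.0, next_state, t == 4)
    hidden = next_hidden  # carried into the next step instead of discarded

batch = memory.sample(3)
```

During training, each sampled hidden state could then be passed to the net together with its state, so the forward pass starts from the stored recurrent context rather than from zeros.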

This will take some time to implement, and we're currently busy with benchmarking tasks. I'm marking this issue as a feature request so we can get to it as soon as we have time.

lgraesser commented on August 25, 2024

In the meantime, you could try increasing the sequence length (seq_len) in the net component of the spec file. This will persist the hidden state for more steps.
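The toy example below sketches why a larger seq_len helps. It assumes the net receives the last seq_len states as one sequence and starts from a fresh hidden state on every forward pass. The helpers last_seq and toy_rnn_forward are hypothetical and only illustrate the windowing; they are not SLM Lab code.

```python
def last_seq(states, seq_len, state_dim):
    """Return the last `seq_len` (>= 1) states, left-padded with zero states."""
    n_pad = max(0, seq_len - len(states))
    pad = [[0.0] * state_dim for _ in range(n_pad)]
    return pad + states[-seq_len:]


def toy_rnn_forward(seq):
    """Stand-in recurrent pass: the hidden state starts at zero on every call."""
    hidden = 0.0
    for state in seq:
        hidden = 0.5 * hidden + sum(state)
    return hidden


episode = [[1.0], [2.0], [3.0], [4.0]]

# With seq_len=2 only the last two states influence the output.
short = toy_rnn_forward(last_seq(episode, seq_len=2, state_dim=1))  # 5.5

# With seq_len=4 the hidden state is carried across all four states.
long = toy_rnn_forward(last_seq(episode, seq_len=4, state_dim=1))  # 6.125
```

The hidden state is still dropped between forward passes, so this only lengthens the context inside each pass. It does not replace storing the hidden state in memory.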
