Real recurrent policy supported (slm-lab issue, open)

kengz commented on August 25, 2024
Real recurrent policy supported


Comments (2)

kengz commented on August 25, 2024

Hi @yangysc, thanks for testing the RNN. The shared network from the spec ppo_rnn_shared_cartpole works slightly better because it has fewer hyperparameters to tune. Here is a sample of its training log:

[2019-07-14 22:53:07,321 PID:73583 INFO __init__.py log_summary] Trial 0 session 0 ppo_rnn_shared_cartpole_t0_s0 [train_df] epi: 169  t: 200  wall_t: 360  opt_step: 234560  frame: 23465  fps: 65.1806  total_reward: 200  total_reward_ma: 173.03  loss: 0.0292752  lr: 4.55652e-17  explore_var: nan  entropy_coef: 0.001  entropy: 0.112986  grad_norm: nan
[2019-07-14 22:53:10,775 PID:73583 INFO __init__.py log_summary] Trial 0 session 0 ppo_rnn_shared_cartpole_t0_s0 [train_df] epi: 170  t: 185  wall_t: 363  opt_step: 236480  frame: 23650  fps: 65.1515  total_reward: 185  total_reward_ma: 173.2  loss: 0.679745  lr: 4.55652e-17  explore_var: nan  entropy_coef: 0.001  entropy: 0.228988  grad_norm: nan
[2019-07-14 22:53:14,093 PID:73583 INFO __init__.py log_summary] Trial 0 session 0 ppo_rnn_shared_cartpole_t0_s0 [train_df] epi: 171  t: 200  wall_t: 367  opt_step: 238400  frame: 23850  fps: 64.9864  total_reward: 200  total_reward_ma: 173.35  loss: 0.624804  lr: 4.55652e-17  explore_var: nan  entropy_coef: 0.001  entropy: 0.315934  grad_norm: nan

We have not thoroughly tested RNNs yet, but your observation is correct: the RecurrentNet class is limited in that sense. The hidden state is discarded after each forward pass and is not used as input to the next one. We can implement this by storing the hidden state alongside the state in the agent's Memory and retrieving it during memory.sample().
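To make the idea concrete, here is a minimal sketch of storing the hidden state next to each state and getting it back when sampling. It uses only the Python standard library and a toy scalar recurrent cell. ToyRecurrentCell, HiddenStateMemory, and rollout are hypothetical names for illustration, not SLM Lab's RecurrentNet or Memory classes, and the real change would likely look different.

```python
import math
import random
from collections import deque


class ToyRecurrentCell:
    """Toy scalar recurrent cell: h' = tanh(w_x * x + w_h * h). Illustrative only."""

    def __init__(self, w_x=0.5, w_h=0.9):
        self.w_x = w_x
        self.w_h = w_h

    def forward(self, x, h):
        return math.tanh(self.w_x * x + self.w_h * h)


class HiddenStateMemory:
    """Hypothetical memory that keeps the hidden state next to each state."""

    def __init__(self, max_size=1000):
        self.buffer = deque(maxlen=max_size)

    def add(self, state, hidden, action, reward, done):
        self.buffer.append((state, hidden, action, reward, done))

    def sample(self, batch_size, rng=random):
        # Each sampled transition carries the hidden state that was the
        # network input at that step, so training can resume from it.
        return rng.sample(list(self.buffer), min(batch_size, len(self.buffer)))


def rollout(cell, memory, states):
    """Feed states one at a time, carrying the hidden state forward."""
    hidden = 0.0  # reset at the start of each episode
    for i, state in enumerate(states):
        done = i == len(states) - 1
        # Store the hidden state that goes INTO this forward pass.
        memory.add(state, hidden, action=0, reward=1.0, done=done)
        hidden = cell.forward(state, hidden)
    return hidden
```

The point of the sketch is that the hidden state is both passed into the next forward pass and saved with the transition, instead of being thrown away after each step.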

This will take some time to implement, and we're currently busy with benchmarking tasks. I'm marking this issue as a feature request so we can get to it as soon as we have time.


lgraesser commented on August 25, 2024

In the meantime, you could try increasing the sequence length (seq_len) in the net component of the spec file. This will persist the hidden state for more steps.
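As a rough sketch, the change could be applied to a loaded spec like this. The dict layout (a list under "agent", each entry with a "net" block) is an assumption about the spec file. The helper with_seq_len is hypothetical, and the values 4 and 16 are placeholders, not recommended settings.

```python
import copy


def with_seq_len(spec, seq_len):
    """Return a copy of a spec dict with net.seq_len replaced for every agent."""
    new_spec = copy.deepcopy(spec)
    for agent in new_spec["agent"]:
        agent["net"]["seq_len"] = seq_len
    return new_spec


# Hypothetical, heavily trimmed spec; a real spec file has many more fields.
spec = {"agent": [{"name": "PPO", "net": {"type": "RecurrentNet", "seq_len": 4}}]}
longer = with_seq_len(spec, 16)
```

Editing seq_len directly in the spec file does the same thing. The helper only makes it easy to try several values without touching the original.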

