GithubHelp home page GithubHelp logo

Comments (4)

araffin avatar araffin commented on August 20, 2024 1

Thanks for your reply =).

they added an additional term which is with respect to the position of the state in the trajectory (this is al). It makes intuitive sense, since in continuous control tasks the same state may appear at the end or the start of the trajectory (think cart pole states), and depending on that their returns may vary wildly.

ok, that makes sense (I was only thinking in term of intermediate reward, not return), especially for the fixed length environments like HalfCheetah. But I don't think this should be limited to continuous action tasks. Also, I still have some trouble understanding how larger power of "al" (al**2, al**3) can help more.

So, I think we need to ask @dementrock (rllab) and @joschu (openai baselines) to have a final answer of what the "al" variable mean.

(pinging @hill-a because I think he is also interested in the answer)

from mjrl.

sashank-tirumala avatar sashank-tirumala commented on August 20, 2024

This code is similar to the linear_feature_baseline code in rllab:
https://github.com/rll/rllab/blob/master/rllab/baselines/linear_feature_baseline.py

They released a paper with that code but they didn't explain their feature selection. In my opinion, they seem to be using a polynomial basis as explained in Sutton. However, they added an additional term which is with respect to the position of the state in the trajectory (this is al). It makes intuitive sense, since in continuous control tasks the same state may appear at the end or the start of the trajectory (think cart pole states), and depending on that their returns may vary wildly. I think we can get much better features if we design them for a specific problem. (See Sutton and Barto)

from mjrl.

sashank-tirumala avatar sashank-tirumala commented on August 20, 2024

from mjrl.

araffin avatar araffin commented on August 20, 2024

I contacted @dementrock directly and got the final response:

Thank you for your interest. Unfortunately I do not recall the original motivation, except that it might simply be lazy naming and picking “a”range of “l” as the variable name.

You are right that it’s encoding information about time, which is important in finite-horizon problems.

from mjrl.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.