Hello, looking at the code in that repo, I came across this variable


[Question] Meaning of the "al" variable? about mjrl HOT 4 CLOSED

aravindr93 commented on August 20, 2024

[Question] Meaning of the "al" variable?

from mjrl.

Comments (4)

araffin commented on August 20, 2024

Thanks for your reply =).

they added an additional term which is with respect to the position of the state in the trajectory (this is al). It makes intuitive sense, since in continuous control tasks the same state may appear at the end or the start of the trajectory (think cart pole states), and depending on that their returns may vary wildly.

OK, that makes sense (I was only thinking in terms of the immediate reward, not the return), especially for fixed-length environments like HalfCheetah. But I don't think this should be limited to continuous action tasks. Also, I still have some trouble understanding how higher powers of "al" (al**2, al**3) can help more.

So, I think we need to ask @dementrock (rllab) and @joschu (openai baselines) to get a final answer on what the "al" variable means.

(pinging @hill-a because I think he is also interested in the answer)


sashank-tirumala commented on August 20, 2024

This code is similar to the linear_feature_baseline code in rllab:
https://github.com/rll/rllab/blob/master/rllab/baselines/linear_feature_baseline.py

They released a paper with that code, but they didn't explain their feature selection. In my opinion, they seem to be using a polynomial basis, as explained in Sutton and Barto. However, they added an additional term which is with respect to the position of the state in the trajectory (this is al). It makes intuitive sense, since in continuous control tasks the same state may appear at the end or the start of the trajectory (think cart pole states), and depending on that their returns may vary wildly. I think we can get much better features if we design them for a specific problem.
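For illustration, here is a minimal sketch of the kind of feature construction described above, modeled on the linked rllab baseline. The function name `time_aware_features`, the clipping range, and the `time_scale` of 100 are assumptions for the example and may not match the actual mjrl or rllab code exactly.

```python
import numpy as np


def time_aware_features(observations, time_scale=100.0):
    """Build per-step features for a linear value baseline (illustrative only)."""
    # Clip observations so the squared terms stay bounded (assumed range).
    obs = np.clip(np.asarray(observations, dtype=float), -10.0, 10.0)
    n_steps = obs.shape[0]
    # "al": a scaled arange over the trajectory length, i.e. the time index.
    al = np.arange(n_steps).reshape(-1, 1) / time_scale
    # Polynomial basis in the observation, plus powers of the time index and a bias.
    return np.concatenate(
        [obs, obs ** 2, al, al ** 2, al ** 3, np.ones((n_steps, 1))], axis=1
    )


# A dummy trajectory with 5 steps and 3 observation dimensions.
features = time_aware_features(np.zeros((5, 3)))
```

With this layout, two identical observations at different time steps still get different feature vectors, because the `al` columns differ.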


sashank-tirumala commented on August 20, 2024

Yeah, true. The issue isn't closed yet. We should ask them.



araffin commented on August 20, 2024

I contacted @dementrock directly and got the final response:

Thank you for your interest. Unfortunately I do not recall the original motivation, except that it might simply be lazy naming and picking “a”range of “l” as the variable name.

You are right that it’s encoding information about time, which is important in finite-horizon problems.
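To make the finite-horizon point concrete, here is a small sketch (not taken from mjrl or rllab) that may also hint at why higher powers of "al" can help. It assumes a hypothetical episode with a constant reward of 1 per step, a horizon of 200, and a discount of 0.99, then fits the return-to-go with polynomials in the time index only.

```python
import numpy as np

horizon, gamma = 200, 0.99
rewards = np.ones(horizon)  # hypothetical: constant reward of 1 at every step

# Discounted return-to-go, accumulated backwards from the end of the episode.
returns = np.zeros(horizon)
running = 0.0
for t in reversed(range(horizon)):
    running = rewards[t] + gamma * running
    returns[t] = running

# Scaled time index, playing the role of "al".
al = np.arange(horizon) / 100.0


def fit_error(max_power):
    """Mean squared error of a least-squares fit using al**0 .. al**max_power."""
    design = np.stack([al ** p for p in range(max_power + 1)], axis=1)
    coef, *_ = np.linalg.lstsq(design, returns, rcond=None)
    return float(np.mean((design @ coef - returns) ** 2))


linear_error = fit_error(1)  # bias + al
cubic_error = fit_error(3)   # bias + al + al**2 + al**3
```

Here the return depends only on the time step, and the dependence is curved rather than linear, so in this toy setting the cubic fit has a lower error than the linear one. Whether this was the original motivation for the extra powers is not confirmed in the thread.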

