Comments (4)
Thanks for your reply =).
they added an additional term which is with respect to the position of the state in the trajectory (this is al). It makes intuitive sense, since in continuous control tasks the same state may appear at the end or the start of the trajectory (think cart pole states), and depending on that their returns may vary wildly.
OK, that makes sense (I was only thinking in terms of intermediate reward, not return), especially for fixed-length environments like HalfCheetah. But I don't think this should be limited to continuous action tasks. Also, I still have some trouble understanding how higher powers of "al" (al**2, al**3) can help more.
So, I think we need to ask @dementrock (rllab) and @joschu (openai baselines) to get a final answer on what the "al" variable means.
(pinging @hill-a because I think he is also interested in the answer)
from mjrl.
This code is similar to the linear_feature_baseline code in rllab:
https://github.com/rll/rllab/blob/master/rllab/baselines/linear_feature_baseline.py
They released a paper with that code but did not explain their feature selection. In my opinion, they seem to be using a polynomial basis, as explained in Sutton and Barto. However, they added an additional term which is with respect to the position of the state in the trajectory (this is al). It makes intuitive sense, since in continuous control tasks the same state may appear at the end or the start of the trajectory (think cart pole states), and depending on that their returns may vary wildly. I think we can get much better features if we design them for a specific problem.
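To make the description above concrete, here is a minimal NumPy sketch of features of this kind: a polynomial basis over the observation plus powers of a time-index column. The function name `time_aware_features`, the `time_scale` parameter, and the clipping range are assumptions made for illustration; this is not a copy of the rllab or mjrl code.

```python
import numpy as np

def time_aware_features(observations, time_scale=100.0):
    """Polynomial observation features plus powers of a scaled time index.

    Hypothetical helper for illustration; the name, `time_scale`, and the
    clipping range are assumptions of this sketch.
    """
    obs = np.clip(np.asarray(observations, dtype=float), -10.0, 10.0)
    path_len = obs.shape[0]
    # "al": an arange over the path length, as a column vector, scaled down
    # so that its square and cube stay in a reasonable numeric range.
    al = np.arange(path_len, dtype=float).reshape(-1, 1) / time_scale
    ones = np.ones((path_len, 1))  # bias column
    return np.concatenate([obs, obs ** 2, al, al ** 2, al ** 3, ones], axis=1)

# A path of 5 steps with a 3-dimensional observation.
obs = np.zeros((5, 3))
feats = time_aware_features(obs)
print(feats.shape)
```

With identical observations at every step, the only columns that differ between rows are the three time columns, which is what lets a linear baseline assign different values to the same state at different positions in the trajectory.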
I contacted @dementrock directly and got the final response:
Thank you for your interest. Unfortunately I do not recall the original motivation, except that it might simply be lazy naming and picking “a”range of “l” as the variable name.
You are right that it’s encoding information about time, which is important in finite-horizon problems.
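As a rough illustration of why time matters in finite-horizon problems, and why higher powers of the time index can help, the sketch below uses a toy setup that is purely an assumption: a constant reward of 1 per step, horizon 200, and discount 0.99. The state carries no information here, so any change in the return-to-go is due to the time step alone. It fits the returns by least squares with a constant only, with a linear time term, and with time terms up to the cube.

```python
import numpy as np

# Toy setup (assumed for illustration): constant reward, fixed horizon.
T, gamma = 200, 0.99
rewards = np.ones(T)

# Discounted return-to-go, computed backwards from the end of the path.
returns = np.zeros(T)
running = 0.0
for t in reversed(range(T)):
    running = rewards[t] + gamma * running
    returns[t] = running

al = np.arange(T) / 100.0  # scaled time index
ones = np.ones(T)

def fit_rmse(features):
    """Least-squares fit of the returns; reports the root-mean-square error."""
    coef, *_ = np.linalg.lstsq(features, returns, rcond=None)
    return float(np.sqrt(np.mean((features @ coef - returns) ** 2)))

rmse_state_only = fit_rmse(np.column_stack([ones]))  # no time information
rmse_linear = fit_rmse(np.column_stack([ones, al]))
rmse_cubic = fit_rmse(np.column_stack([ones, al, al ** 2, al ** 3]))

print(rmse_state_only, rmse_linear, rmse_cubic)
```

Because the return-to-go is a curved function of the time step in this setup, each added power of the time index gives the linear model more room to follow that curve, so the fit error drops from the constant-only model to the linear one and again to the cubic one.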