Hi! I just noticed that there is a difference in the reward function between the paper

Reward function mismatch between paper and code (walk-these-ways issue, closed)

improbable-ai commented on June 12, 2024

Reward function mismatch between paper and code

from walk-these-ways.

Comments (5)

gmargo11 commented on June 12, 2024

Hi @anahrendra, thanks for this note. You're right that the implementation and the paper differ slightly, and I didn't catch this before! The code is correct; the paper contains a mistake here.

I think the effect of this should be a constant scaling of the reward function, since these two reward terms are added together at every timestep and their weights sum to one: $[(1.0-C_\text{foot}^\text{cmd}) + (C_\text{foot}^\text{cmd})] = 1.0$. So I would expect the impact of this change on the final policy to be minor. But do let me know if you see otherwise (there are a lot of operations being composed on this term).
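As a rough numerical check of the argument above, here is a minimal sketch. It assumes the two variants differ only in that one uses a weighted exponential, `w * exp(-x)`, and the other a weighted penalty, `-w * (1 - exp(-x))`. Those two forms, the function names, and the random inputs are illustrative assumptions, not taken from the paper or the repo. Under that assumption the per-foot gap between the variants is exactly the sum of the two weights, which is 1.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-foot quantities for one timestep (4 feet).
c_cmd = rng.uniform(0.0, 1.0, size=4)      # desired contact state C_foot^cmd in [0, 1]
exp_force = rng.uniform(0.0, 1.0, size=4)  # stands in for exp(-|f|^2 / sigma)
exp_vel = rng.uniform(0.0, 1.0, size=4)    # stands in for exp(-|v|^2 / sigma)


def reward_variant_a(c, ef, ev):
    """Assumed form A: weighted exponentials (swing term + stance term)."""
    return (1.0 - c) * ef + c * ev


def reward_variant_b(c, ef, ev):
    """Assumed form B: negative weighted (1 - exponential) penalties."""
    return -(1.0 - c) * (1.0 - ef) - c * (1.0 - ev)


# The gap does not depend on the forces or velocities, only on the weights,
# and the weights (1 - c) and c always sum to 1.
gap = reward_variant_a(c_cmd, exp_force, exp_vel) - reward_variant_b(c_cmd, exp_force, exp_vel)
print(gap)
```

If the summed term is later passed through an exponential, as the mention of composed operations suggests may be the case, a constant offset of this kind would turn into a constant multiplicative factor on the result.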

-Gabe

anahrendra commented on June 12, 2024

Hi! Thanks for your implementation. That makes sense.

However, I ran a training with your code (with nothing changed at all), but I could not reproduce the behavior of the pretrained weights. A brief summary:

4000 iterations: still able to walk, but cannot track the commanded gaits well.
45000 iterations: completely fails to even hold its position.

Could you by any chance train directly from the code on GitHub to check whether something is missing? And how many iterations are required to produce a good result?

Thanks in advance for your help!

gmargo11 commented on June 12, 2024

@anahrendra, I did some local testing and I suspect this is because of the larger gravity range in this repo's default config: https://github.com/Improbable-AI/walk-these-ways/blob/master/scripts/train.py#L49

In the paper, we used a max gravity randomization of 1.0, but I seem to have provided the config here with a more challenging max gravity randomization of 2.0. Training with this range might be unstable. Sorry about that!

Please try changing line 49 of train.py to:

Cfg.domain_rand.gravity_range = [-1.0, 1.0]

And let me know if that fixes the issue. On my machine, it converges after around 10k iterations.

By the way, the friction range in this codebase is also a bit wider than in the paper, and the max footswing height is higher. See Table 6 in the appendix (https://arxiv.org/pdf/2212.03238.pdf) for the values used in the paper. I may just update the repo in a bit to have all the original parameters.

P.S. To debug with somewhat faster convergence and verify that everything else is working properly, you can try turning off gravity randomization entirely (Cfg.domain_rand.gravity_range = [-0.0, 0.0]).
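The three gravity settings mentioned in this comment can be collected in one place. The following is only a sketch: `Cfg` is replaced by a stand-in object so the snippet runs on its own, the preset names and the `set_gravity_range` helper are hypothetical, and the symmetric `[-2.0, 2.0]` range is an assumption inferred from the "max gravity randomization of 2.0" remark.

```python
from types import SimpleNamespace

# Stand-in for the repo's Cfg object so this sketch is self-contained;
# in scripts/train.py the real Cfg would be used instead.
Cfg = SimpleNamespace(domain_rand=SimpleNamespace(gravity_range=[-2.0, 2.0]))

# Hypothetical preset names; the values are the ranges discussed in this thread.
GRAVITY_PRESETS = {
    "wide": [-2.0, 2.0],   # assumed default at the time; training may be unstable
    "paper": [-1.0, 1.0],  # range reported as used in the paper
    "debug": [0.0, 0.0],   # gravity randomization turned off
}


def set_gravity_range(cfg, preset):
    """Overwrite cfg.domain_rand.gravity_range with a copy of the chosen preset."""
    cfg.domain_rand.gravity_range = list(GRAVITY_PRESETS[preset])
    return cfg


set_gravity_range(Cfg, "paper")
print(Cfg.domain_rand.gravity_range)  # [-1.0, 1.0]
```

Starting with the "debug" preset and then moving to "paper" follows the order suggested above: first confirm the rest of the pipeline trains, then reintroduce randomization.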

anahrendra commented on June 12, 2024

Hi!

Thanks a lot for your detailed support. I will try your fix as soon as possible and let you know about the results.

Have a nice weekend!

gmargo11 commented on June 12, 2024

I have updated the default parameters in train.py (728058d).
