
How are episode rewards calculated? (legged_gym, closed, 2 comments)

zita-ch commented on August 16, 2024

How are episode rewards calculated?


Comments (2)

EricVoll commented on August 16, 2024

@zita-ch
Each reward scale is multiplied by env.dt (default 0.02). The scaled rewards are then summed in the env.episode_sums dictionary and printed here.
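A minimal pure-Python sketch of the bookkeeping described in this reply, assuming one list entry per env in place of legged_gym's tensors. The function names and the `tracking`/`unused` term names are hypothetical and not legged_gym's API; only the "scale times dt, then accumulate into episode_sums" behavior comes from the comment.

```python
# Hypothetical, simplified sketch of per-step reward bookkeeping.
# Plain Python lists stand in for per-env tensors; names are illustrative only.

DT = 0.02  # default env.dt, as stated in the reply


def prepare_scales(reward_scales, dt=DT):
    """Drop zero-scaled terms and pre-multiply the remaining scales by dt."""
    return {name: scale * dt for name, scale in reward_scales.items() if scale != 0}


def accumulate_step(raw_rewards, scales, episode_sums, num_envs):
    """Add one control step of scaled rewards to the running episode sums.

    raw_rewards:  {term: [unscaled reward per env]} for this step
    scales:       output of prepare_scales (already multiplied by dt)
    episode_sums: {term: [running sum per env]}, updated in place
    Returns the total scaled reward per env for this step.
    """
    total = [0.0] * num_envs
    for name, scale in scales.items():
        for env, raw in enumerate(raw_rewards[name]):
            scaled = raw * scale
            total[env] += scaled
            episode_sums[name][env] += scaled
    return total


# Example: one active term with scale 1.0, one disabled term, two envs.
scales = prepare_scales({"tracking": 1.0, "unused": 0.0})
sums = {"tracking": [0.0, 0.0]}
step_reward = accumulate_step({"tracking": [1.0, 0.5]}, scales, sums, num_envs=2)
print(scales, step_reward, sums)
```

Because dt is folded into the scale once up front, every step's reward is already "reward per second times seconds elapsed", so an episode sum reads like a time integral of the reward.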


zita-ch commented on August 16, 2024

I got it. Thanks for your detailed explanation.

So first the episode reward is averaged over the envs being reset, and that mean is then divided by max_episode_length (in seconds, default 20). However, at the very beginning of training most of the reset envs do not survive to the last step of the horizon, so the episode rewards logged during the first few dozen iterations are very small, e.g., reward * 0.02/20.
This makes sense, but it can be a little misleading. Typically we do not normalize the episode reward by the horizon length, especially when there are early resets, although this does not affect how learning progress is displayed. Sometimes we simply want to see the value averaged over the actual episode length (which can be much shorter than max_episode_length). I'd suggest that a future update explain this in the tensorboard tab or in code comments, or add a separate tensorboard tab for it.
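The logged value described here can be sketched as follows, assuming one list entry per env. `log_on_reset` and the `rew_` key prefix are illustrative names for this sketch, not a claim about legged_gym's exact code; the arithmetic (mean over reset envs, then divide by the max episode length in seconds) follows the comment.

```python
# Hypothetical sketch: mean episode sum over the envs being reset,
# divided by the maximum episode length in seconds. Names are illustrative.


def log_on_reset(episode_sums, env_ids, max_episode_length_s=20.0):
    """Return {"rew_<term>": value} for the resetting envs and zero their sums."""
    extras = {}
    for name, sums in episode_sums.items():
        mean_sum = sum(sums[env] for env in env_ids) / len(env_ids)
        extras["rew_" + name] = mean_sum / max_episode_length_s
        for env in env_ids:
            sums[env] = 0.0  # the next episode starts from zero
    return extras


# An env that falls after a single 0.02 s step with unscaled reward 1.0 has an
# episode sum of 1.0 * 0.02; one that survives all 1000 steps (20 s) has 20.0.
early = log_on_reset({"tracking": [0.02]}, env_ids=[0])
full = log_on_reset({"tracking": [20.0]}, env_ids=[0])
print(early, full)
```

The early-termination case logs roughly 0.001 while the full-length case logs 1.0, which is the "reward * 0.02/20" effect described in the comment.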
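One way the suggested extra metric could look: normalize each resetting env's episode sum by its actual duration rather than by max_episode_length. This is a hypothetical sketch; `episode_lengths` (in control steps), the function name, and the `_per_s` key suffix are assumptions, not existing legged_gym fields.

```python
# Hypothetical alternative metric: episode sum divided by the *actual* episode
# duration. episode_lengths (control steps) and the "_per_s" suffix are illustrative.


def log_per_actual_second(episode_sums, episode_lengths, env_ids, dt=0.02):
    """Mean over resetting envs of (episode sum / actual episode duration in s)."""
    extras = {}
    for name, sums in episode_sums.items():
        rates = [
            sums[env] / (max(episode_lengths[env], 1) * dt)  # guard zero-length episodes
            for env in env_ids
        ]
        extras["rew_" + name + "_per_s"] = sum(rates) / len(rates)
    return extras


# Same early-termination case: a sum of 0.02 after 1 step now logs 1.0 rather than 0.001.
result = log_per_actual_second({"tracking": [0.02]}, episode_lengths=[1], env_ids=[0])
print(result)
```

Averaging per-env rates weights short and long episodes equally; dividing the total of the sums by the total duration would instead weight each env by how long it survived.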

Your work is quite inspiring, and it has made me stick with your DRL framework.
Best wishes ;)
