Trajectory optimization not stable (gps, open)

cbfinn commented on July 27, 2024

Trajectory optimization not stable

from gps.

Comments (1)

yongxf commented on July 27, 2024

The instability of iLQR comes from the eta update.
eta is the penalty weight on one of the KL divergence terms in iLQR, and it is tuned by comparing kl_div with kl_step.
The problem is:

when mc cost increases ==> new_mult < 1 ==> step decreases (since the actual improvement becomes much smaller than the predicted improvement, the algorithm tries to reduce the step size)
step decreases ==> con > 0 (since kl_step = step * kl_base, the theoretical bound becomes stricter. You referred to kl_step in the code as epsilon, which is not correct, since epsilon controls the other KL divergence term)
con > 0 ==> eta increases (since the stricter constraint means the current KL divergence now violates it, more penalty is added, i.e. eta increases)

In summary:
when actual cost increases ==> penalization of KL divergence increases.
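
A minimal sketch of this chain, to make the feedback loop concrete. The variable names (new_mult, step, kl_base, kl_step, kl_div, con, eta) follow the description above; the exact new_mult formula, the clamping bounds, and the doubling/halving eta rule are simplified assumptions for illustration, not the actual gps implementation.

```python
def update_step(step, predicted_impr, actual_impr,
                min_step=0.01, max_step=10.0):
    """Shrink or grow the step multiplier from predicted vs. actual improvement.

    Assumed heuristic: new_mult falls below 1 when the actual improvement
    is much smaller than the predicted one.
    """
    gap = max(1e-4, predicted_impr - actual_impr)
    new_mult = predicted_impr / (2.0 * gap)
    new_mult = max(0.1, min(5.0, new_mult))  # assumed clamping bounds
    new_step = max(min(new_mult * step, max_step), min_step)
    return new_step, new_mult


def constraint_violation(kl_div, step, kl_base):
    """con = kl_div - kl_step, with kl_step = step * kl_base."""
    kl_step = step * kl_base
    return kl_div - kl_step


def update_eta(eta, con, factor=2.0):
    """Simplified stand-in for the eta search: raise eta if the KL bound
    is violated (con > 0), lower it otherwise."""
    return eta * factor if con > 0 else eta / factor


if __name__ == "__main__":
    step, eta, kl_base, kl_div = 1.0, 1.0, 1.0, 0.9

    # Before the bad iteration the constraint is satisfied: con < 0.
    print("con before:", constraint_violation(kl_div, step, kl_base))

    # The cost went up: predicted improvement 1.0, actual improvement -1.0.
    step, new_mult = update_step(step, predicted_impr=1.0, actual_impr=-1.0)
    con = constraint_violation(kl_div, step, kl_base)
    eta = update_eta(eta, con)
    print("new_mult:", new_mult, "step:", step, "con:", con, "eta:", eta)
```

With these assumed numbers, the same kl_div that satisfied the bound before the step shrank now violates it, so eta goes up even though the policy itself did not move further away.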

This is not reasonable, since putting more weight on the KL divergence term makes the loss term even larger. After several iterations, the robot was waving around wildly.

The first several iterations are normal, though. I guess the scaling of the improvement in the new_mult calculation matters.

Any comment on this?
