Regarding the first step in the peg_method of the paper, we need to sample the goals,

How to sample goals when training from scratch

IDayday commented on May 29, 2024

how to sample goals when training from scratch

Comments (5)

edwhu commented on May 29, 2024

Hi, that is a good question. One way is to use a known distribution over the state dimensions. For example, if you know the robot's joint limits, you can sample from a uniform distribution defined over those limits. Another way is to sample goals from the replay buffer.

The first way works well if you have prior knowledge about the state space. The second way works without any prior knowledge. However, you need to sample enough goal states to have a diverse candidate pool for MPPI optimization. An interesting future extension here is figuring out how to efficiently sample diverse goals.
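The two options above can be sketched roughly as follows. This is a minimal NumPy illustration, not code from the PEG repository: the function names `sample_goals_uniform` and `sample_goals_from_buffer`, and the assumption that the replay buffer can be viewed as a 2D array of visited states, are hypothetical.

```python
import numpy as np


def sample_goals_uniform(low, high, num_goals, rng):
    """Draw goals uniformly inside known per-dimension bounds (e.g. joint limits)."""
    low = np.asarray(low, dtype=np.float64)
    high = np.asarray(high, dtype=np.float64)
    # low/high broadcast across the rows of the (num_goals, state_dim) output.
    return rng.uniform(low, high, size=(num_goals, low.shape[0]))


def sample_goals_from_buffer(buffer_states, num_goals, rng):
    """Draw goals from previously visited states; needs no prior knowledge."""
    buffer_states = np.asarray(buffer_states, dtype=np.float64)
    if len(buffer_states) == 0:
        raise ValueError("replay buffer is empty; collect some experience first")
    # Sample without replacement when the buffer is large enough,
    # which helps keep the candidate pool diverse.
    replace = num_goals > len(buffer_states)
    idx = rng.choice(len(buffer_states), size=num_goals, replace=replace)
    return buffer_states[idx]


rng = np.random.default_rng(0)
candidates = sample_goals_uniform([-1.0, 0.0], [1.0, 2.0], num_goals=8, rng=rng)
print(candidates.shape)
```

Either function returns a `(num_goals, state_dim)` array that could then be handed to whatever optimizer scores the candidates.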

IDayday commented on May 29, 2024

Hi, I agree with you, but I have more detailed questions about sampling the goals from the replay buffer.

As far as I know, before we can sample goals from the replay buffer, we still have to do some exploration, so where do the goals for that initial exploration come from?

Could you please explain how this is implemented in PEG? (Maybe there is still a prior distribution, or several key states are designated as goals?)

IDayday commented on May 29, 2024

In the general case, if we don't have any knowledge about the environment, the only way I can think of is to use a randomized policy (maybe $\pi(s)$ rather than $\pi(s,g)$).

edwhu commented on May 29, 2024

At the very start of training, there are no goals in the replay buffer. So how can we run any goal-directed exploration strategy?

One way is to just run a non-goal-conditioned policy, like the P2E exploration policy $\pi(s)$, to gather some initial trajectories and fill the replay buffer. After this, we can pick goals from the replay buffer for Go-Explore.

Another way is to just use an arbitrary goal, like all 0s, and run Go-Explore with this arbitrary goal.

For PEG, since we know the bounds of the state space, we do not sample the initial goals from the replay buffer. We just sample candidates from the known bounds of the state space.
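As a rough sketch, the cold-start choices above could be combined into one fallback routine. This is only an illustration under assumptions, not how the PEG code is organized: `propose_goal_candidates` and its arguments are hypothetical names, and the all-zeros branch stands in for the "arbitrary goal" option.

```python
import numpy as np


def propose_goal_candidates(num_goals, rng, bounds=None, buffer_states=None, goal_dim=None):
    """Return (candidates, source) using whichever information is available."""
    # 1) Known state-space bounds: sample candidates uniformly inside them.
    if bounds is not None:
        low, high = (np.asarray(b, dtype=np.float64) for b in bounds)
        return rng.uniform(low, high, size=(num_goals, low.shape[0])), "bounds"
    # 2) Non-empty replay buffer (e.g. filled by a non-goal-conditioned
    #    exploration policy): reuse visited states as goal candidates.
    if buffer_states is not None and len(buffer_states) > 0:
        states = np.asarray(buffer_states, dtype=np.float64)
        idx = rng.integers(0, len(states), size=num_goals)
        return states[idx], "buffer"
    # 3) Nothing known yet: fall back to an arbitrary all-zeros goal.
    if goal_dim is None:
        raise ValueError("goal_dim is required when no bounds or buffer are given")
    return np.zeros((num_goals, goal_dim)), "arbitrary"


rng = np.random.default_rng(0)
goals, source = propose_goal_candidates(4, rng, goal_dim=3)
print(source, goals.shape)
```

The ordering of the branches is a design choice for this sketch; a project without known bounds would simply never take the first branch.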

IDayday commented on May 29, 2024

Thanks for your patience. Good luck!
