
Comments (4)

lerrytang commented on August 16, 2024

Hi Jan, thank you for the question, again an insightful one :)

Maybe you are referring to re-shuffling the observations during a rollout?
If this is the case, you are partially right: when the observations are reshuffled, self.hx has already accumulated information about the previous observations, so the agent's performance is disrupted once the observations are no longer in the same order. However, in our experiments where we reshuffled the observations every t steps, we found that AttentionNeuron was able to "reset" its internal state and recover the performance (see the table below), though the recovery speed depends on the underlying task.
[Table: task performance when the observations are re-shuffled every t steps]
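For concreteness, here is a minimal sketch of what re-shuffling every t steps during a rollout could look like. The env/agent interface (env.reset, env.step, agent.get_action) is a generic Gym-style assumption, not the repo's actual evaluation code.

```python
import numpy as np

# Hypothetical evaluation loop (not the repo's actual code): the observation
# ordering is re-shuffled every `t_shuffle` steps, while the agent keeps
# whatever internal state it has (e.g. self.hx) across the shuffles.
def rollout_with_reshuffle(env, agent, t_shuffle=100, max_steps=1000):
    obs = env.reset()                      # assumes a Gym-style environment
    perm = np.arange(len(obs))             # start with the identity ordering
    total_reward = 0.0
    for step in range(max_steps):
        if step > 0 and step % t_shuffle == 0:
            perm = np.random.permutation(len(obs))  # new sensory ordering
        action = agent.get_action(obs[perm])        # agent only sees shuffled obs
        obs, reward, done, _ = env.step(action)
        total_reward += reward
        if done:
            break
    return total_reward
```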


jankrepl commented on August 16, 2024

Thank you for the quick reply! Yes, I am referring to the re-shuffling of the observations during a rollout.

I actually have 2 additional questions related to this.

  1. In your paper, you provide generic mathematical definitions of the AttentionNeuron layer and you state that the keys are computed as f_k(o_t[i], a_{t-1}). However, shouldn't it be f_k(o_t[i], a_{t-1}, h_{t-1}[i]), since there is an LSTM cell and its hidden states? (A sketch of what I mean follows this list.)

  2. Do you think it would be possible to have a setup where the agent only looks at the current observation and the previous action (IMO this would be a truly permutation-invariant agent w.r.t. the current observation)? In other words, an agent that would give exactly the same performance no matter how often you reshuffle during the rollout.
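To make question 1 concrete, here is a rough PyTorch-style sketch of what I mean, assuming one shared LSTMCell applied across the sensory neurons; the names (KeySketch, self.hx, proj) and shapes are illustrative, not the repo's actual code.

```python
import torch
import torch.nn as nn

# Illustrative sketch (not the actual AttentionNeuron module): the same
# LSTMCell is run over every sensory neuron i with input (o_t[i], a_{t-1}),
# and the key is a projection of the hidden state -- so the key also
# depends on h_{t-1}[i], not just on (o_t[i], a_{t-1}).
class KeySketch(nn.Module):
    def __init__(self, act_dim, hidden_dim, key_dim, num_neurons):
        super().__init__()
        self.num_neurons = num_neurons
        self.hidden_dim = hidden_dim
        self.lstm = nn.LSTMCell(input_size=1 + act_dim, hidden_size=hidden_dim)
        self.proj = nn.Linear(hidden_dim, key_dim)
        self.hx = None  # (h, c), each of shape (num_neurons, hidden_dim)

    def reset(self):
        # "Resetting the internal state" discussed above amounts to this.
        self.hx = None

    def forward(self, obs, prev_action):
        # obs: (num_neurons,), prev_action: (act_dim,)
        x = torch.cat(
            [obs.unsqueeze(-1), prev_action.expand(self.num_neurons, -1)], dim=-1
        )
        if self.hx is None:
            zeros = torch.zeros(self.num_neurons, self.hidden_dim)
            self.hx = (zeros, zeros.clone())
        self.hx = self.lstm(x, self.hx)   # batch dimension = sensory neurons
        return self.proj(self.hx[0])      # keys, shape (num_neurons, key_dim)
```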



lerrytang commented on August 16, 2024

  1. For non-vision tasks (Ant and CartPole) we used LSTM but for vision tasks (Pong and CarRacing) we used stacked frames and MLPs to avoid large computational graphs. Admittedly, we could have used separate and more precise notations for f_k and f_v, but we thought a generic formula (though less accurate) was better for understanding the general idea of the paper.
  2. For this question, I can only reason based on my experience. Learning a PI agent that depends only on (o_t, a_{t-1}) is hard. For example, if your control signal is a force/torque, it contributes to the dynamics at second order (i.e., force -> velocity -> position), so what o_t reflects may not be the result of applying a_{t-1}. You may wonder whether (o_t, a_{t-1}, a_{t-2}) would work; I think it depends on the task. For CartPole, I was able to train the agent by stacking k=4 observations (sketched below), and I guess k<4 may work as well (I didn't try, though). For Ant and other locomotion tasks, the contact/friction with the ground will likely require a larger k. That said, it may be possible for other tasks and is an exciting direction for future work.
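To illustrate the stacking idea in answer 2, here is a rough sketch, assuming each sensory channel keeps its own window of the last k values in place of an LSTM hidden state; the class and method names are illustrative, not the repo's code.

```python
from collections import deque
import numpy as np

# Illustrative per-channel observation stacking (not the repo's code): each
# sensory channel i carries its own short history, so the policy input is a
# window of the last k values per channel instead of a recurrent state.
class PerChannelStacker:
    def __init__(self, obs_dim, k=4):
        self.k = k
        self.buffers = [deque(maxlen=k) for _ in range(obs_dim)]

    def reset(self, obs):
        for i, buf in enumerate(self.buffers):
            buf.clear()
            buf.extend([obs[i]] * self.k)  # pad with the first observation
        return self.stacked()

    def push(self, obs):
        for i, buf in enumerate(self.buffers):
            buf.append(obs[i])
        return self.stacked()

    def stacked(self):
        # Shape (obs_dim, k): row i is the recent history of channel i, so
        # permuting the channels only permutes rows of this array.
        return np.array([list(buf) for buf in self.buffers])
```

With this kind of stacking, a re-shuffle only mixes old and new orderings inside each channel's k-step window; its effect fades after k steps instead of persisting in a hidden state.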

Let me know if the above answers your questions :)


jankrepl commented on August 16, 2024

Makes perfect sense!

Thank you again!!!

