
Comments (4)

lerrytang commented on August 16, 2024

Hi Jan, thank you for the question, again an insightful one :)

Maybe you are referring to re-shuffling the observations during a rollout?
If this is the case, you are partially right: when the observations are reshuffled, self.hx has already accumulated information about the previous observations, so the agent's performance is disrupted once the observations are no longer in the same order. However, in our experiments where we reshuffled the observations every t steps, we found that AttentionNeuron was able to "reset" its internal state and recover the performance (see the table below), though the recovery speed depends on the underlying task.
[Table: task performance when the observations are re-shuffled every t steps]
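For concreteness, here is a minimal sketch of what re-shuffling every t steps during a rollout could look like. The env/agent interface (env.reset, env.step, agent.get_action) is a generic Gym-style assumption, not the repo's actual evaluation code.

```python
import numpy as np

# Hypothetical evaluation loop (not the repo's actual code): the observation
# ordering is re-shuffled every `t_shuffle` steps, while the agent keeps
# whatever internal state it has (e.g. self.hx) across the shuffles.
def rollout_with_reshuffle(env, agent, t_shuffle=100, max_steps=1000):
    obs = env.reset()                      # assumes a Gym-style environment
    perm = np.arange(len(obs))             # start with the identity ordering
    total_reward = 0.0
    for step in range(max_steps):
        if step > 0 and step % t_shuffle == 0:
            perm = np.random.permutation(len(obs))  # new sensory ordering
        action = agent.get_action(obs[perm])        # agent only sees shuffled obs
        obs, reward, done, _ = env.step(action)
        total_reward += reward
        if done:
            break
    return total_reward
```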


jankrepl commented on August 16, 2024

Thank you for the quick reply! Yes, I am referring to the re-shuffling of the observations during a rollout.

I actually have 2 additional questions related to this.

  1. In your paper, you provide generic mathematical definitions of the AttentionNeuron layer and you state that the keys are computed as f_k(o_t[i], a_{t-1}). However, shouldn't it be f_k(o_t[i], a_{t-1}, h_{t-1}[i]), since there is an LSTM cell and its hidden states? (A sketch of what I mean follows this list.)

  2. Do you think it would be possible to have a setup where the agent only looks at the current observation and the previous action (IMO this would be a truly permutation-invariant agent w.r.t. the current observation)? In other words, an agent that would give exactly the same performance no matter how often you reshuffle during the rollout.
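To make question 1 concrete, here is a rough PyTorch-style sketch of what I mean, assuming one shared LSTMCell applied across the sensory neurons; the names (KeySketch, self.hx, proj) and shapes are illustrative, not the repo's actual code.

```python
import torch
import torch.nn as nn

# Illustrative sketch (not the actual AttentionNeuron module): the same
# LSTMCell is run over every sensory neuron i with input (o_t[i], a_{t-1}),
# and the key is a projection of the hidden state -- so the key also
# depends on h_{t-1}[i], not just on (o_t[i], a_{t-1}).
class KeySketch(nn.Module):
    def __init__(self, act_dim, hidden_dim, key_dim, num_neurons):
        super().__init__()
        self.num_neurons = num_neurons
        self.hidden_dim = hidden_dim
        self.lstm = nn.LSTMCell(input_size=1 + act_dim, hidden_size=hidden_dim)
        self.proj = nn.Linear(hidden_dim, key_dim)
        self.hx = None  # (h, c), each of shape (num_neurons, hidden_dim)

    def reset(self):
        # "Resetting the internal state" discussed above amounts to this.
        self.hx = None

    def forward(self, obs, prev_action):
        # obs: (num_neurons,), prev_action: (act_dim,)
        x = torch.cat(
            [obs.unsqueeze(-1), prev_action.expand(self.num_neurons, -1)], dim=-1
        )
        if self.hx is None:
            zeros = torch.zeros(self.num_neurons, self.hidden_dim)
            self.hx = (zeros, zeros.clone())
        self.hx = self.lstm(x, self.hx)   # batch dimension = sensory neurons
        return self.proj(self.hx[0])      # keys, shape (num_neurons, key_dim)
```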



lerrytang commented on August 16, 2024

  1. For non-vision tasks (Ant and CartPole) we used LSTM but for vision tasks (Pong and CarRacing) we used stacked frames and MLPs to avoid large computational graphs. Admittedly, we could have used separate and more precise notations for f_k and f_v, but we thought a generic formula (though less accurate) was better for understanding the general idea of the paper.
  2. For this question, I can only reason based on my experience. Learning a PI agent that depends only on (o_t, a_{t-1}) is hard. For example, if your control signal is a force/torque, it contributes to the dynamics at second order (i.e., force -> velocity -> position), so what o_t reflects may not be the result of applying a_{t-1}. You may wonder whether (o_t, a_{t-1}, a_{t-2}) would work; I think it depends on the task. For CartPole, I was able to train the agent by stacking k=4 observations (sketched below), and I guess k<4 may work as well (I didn't try, though). For Ant and other locomotion tasks, the contact/friction with the ground will likely require a larger k. That said, it may be possible for other tasks and is an exciting direction for future work.
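To illustrate the stacking idea in answer 2, here is a rough sketch, assuming each sensory channel keeps its own window of the last k values in place of an LSTM hidden state; the class and method names are illustrative, not the repo's code.

```python
from collections import deque
import numpy as np

# Illustrative per-channel observation stacking (not the repo's code): each
# sensory channel i carries its own short history, so the policy input is a
# window of the last k values per channel instead of a recurrent state.
class PerChannelStacker:
    def __init__(self, obs_dim, k=4):
        self.k = k
        self.buffers = [deque(maxlen=k) for _ in range(obs_dim)]

    def reset(self, obs):
        for i, buf in enumerate(self.buffers):
            buf.clear()
            buf.extend([obs[i]] * self.k)  # pad with the first observation
        return self.stacked()

    def push(self, obs):
        for i, buf in enumerate(self.buffers):
            buf.append(obs[i])
        return self.stacked()

    def stacked(self):
        # Shape (obs_dim, k): row i is the recent history of channel i, so
        # permuting the channels only permutes rows of this array.
        return np.array([list(buf) for buf in self.buffers])
```

With this kind of stacking, a re-shuffle only mixes old and new orderings inside each channel's k-step window; its effect fades after k steps instead of persisting in a hidden state.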

Let me know if the above answers your questions :)


jankrepl commented on August 16, 2024

Makes perfect sense!

Thank you again!!!

