
Comments (4)

kyr-pol commented on May 28, 2024

Hi @ManuelM95 ,

This is a good question. Supporting it will take some additions to the code base, which in my opinion would be good for the project in general.

A simple first approach would be to train multiple controllers, one for each distinct task or subtask, depending on how you want to structure it. This doesn't solve your case, but it might be a helpful step.

A similar feature in the original PILCO implementation allows for multiple starting states that induce different predicted trajectories, with a single control policy trained jointly for all cases.

To have a single policy for the distinct targets, you'd have to alter the training process in a similar way. The training is based on predicted trajectories, and the predictions are Gaussian. If you want to alter the targets freely, the extra dimension you want to introduce would need an arbitrarily large initial variance. The Gaussian estimate for the next state(s) would then also be very uncertain, and planning would be very hard. I think the best approach would be to train on a number of distinct trajectories, corresponding to different targets. If these are reasonably representative of the possible targets, the policy trained on all of them should be able to generalise to new targets too.
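To illustrate why a very uncertain target makes planning hard, here is a minimal 1-D sketch. It assumes an exponential (saturating) reward of the form exp(-d^2 / (2 width^2)) on the distance d between state and target, and uses the closed-form Gaussian expectation of that reward. The function names and the numbers are mine, chosen for illustration, and are not taken from the pilco code base.

```python
import math


def expected_exp_reward(mean_diff, var_diff, width=1.0):
    """E[exp(-d^2 / (2 * width^2))] for d ~ N(mean_diff, var_diff).

    d stands for (state - target); if state and target are independent
    Gaussians, var_diff is the sum of their variances.
    """
    total = width ** 2 + var_diff
    return math.sqrt(width ** 2 / total) * math.exp(-mean_diff ** 2 / (2.0 * total))


def reward_gap(var_diff):
    """Expected reward on target minus expected reward one unit away."""
    return expected_exp_reward(0.0, var_diff) - expected_exp_reward(1.0, var_diff)


# Hypothetical variances: a well-known target versus a freely varying one.
narrow_gap = reward_gap(0.01)
broad_gap = reward_gap(100.0)

print(f"gap with small variance: {narrow_gap:.4f}")
print(f"gap with large variance: {broad_gap:.6f}")
```

With a large variance the expected reward barely changes as the mean moves towards the target, so the optimiser gets almost no signal. Training on a handful of distinct, low-variance targets avoids this.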

To be more specific, one way to implement this, assuming the GP model from mgpr.py remains unchanged, would be:

  • change pilco._build_likelihood so that it combines (probably by adding) multiple predicted rewards, one for each predicted trajectory with its corresponding target
  • add different reward functions with different targets (either different instances of the rewards we currently have or a new reward class)
  • add a controller class that takes not just the state as input but also the target, as you suggested.

I am pretty sure that other smaller changes will be needed as you go along with the implementation.
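The three changes above can be sketched roughly as follows. This is a deterministic toy, not the pilco API: the real code propagates Gaussian state distributions through the GP model, while here `toy_dynamics` is a stand-in, and `TargetReward`, `TargetConditionedController` and `combined_objective` are hypothetical names made up for this example.

```python
import math


class TargetReward:
    """Hypothetical saturating reward centred on one target (1-D)."""

    def __init__(self, target, width=1.0):
        self.target = target
        self.width = width

    def __call__(self, state):
        return math.exp(-((state - self.target) ** 2) / (2.0 * self.width ** 2))


class TargetConditionedController:
    """Hypothetical linear controller that sees both state and target."""

    def __init__(self, w_state, w_target, bias=0.0):
        self.w_state = w_state
        self.w_target = w_target
        self.bias = bias

    def __call__(self, state, target):
        return self.w_state * state + self.w_target * target + self.bias


def toy_dynamics(state, action):
    """Stand-in for the learned GP model (an assumption for this sketch)."""
    return state + 0.5 * action


def rollout_reward(controller, reward, start, horizon):
    """Sum of rewards along one predicted trajectory for one target."""
    state, total = start, 0.0
    for _ in range(horizon):
        action = controller(state, reward.target)
        state = toy_dynamics(state, action)
        total += reward(state)
    return total


def combined_objective(controller, rewards, start=0.0, horizon=10):
    """Add the predicted rewards of all trajectories, one per target."""
    return sum(rollout_reward(controller, r, start, horizon) for r in rewards)


rewards = [TargetReward(t) for t in (-1.0, 0.5, 2.0)]
good = TargetConditionedController(w_state=-1.0, w_target=1.0)  # steers to target
bad = TargetConditionedController(w_state=0.0, w_target=0.0)    # never moves

good_score = combined_objective(good, rewards)
bad_score = combined_objective(bad, rewards)
print(good_score, bad_score)
```

A single set of controller parameters is scored against all targets at once, so optimising `combined_objective` pushes the policy to use its target input rather than memorise one goal.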

By the way, a good simple case study for this would be the OpenAI Gym Reacher-v2 environment, where a simple robotic arm has to reach a specific target with its end point, and the target varies from episode to episode. It should be a nice minimal example of the functionality you are looking for.

I am also interested in this and will probably try a few things in the next few weeks. Keep me posted if you make any progress, and I will mention this issue in any relevant commits. Good luck and have fun!

from pilco.

ManuelM95 commented on May 28, 2024

Hi @kyr-pol ,
many thanks for your detailed answer, I was getting nervous because of the lack of progress ;). I will discuss your input with my tutor and check with him how we plan to proceed. I will keep you posted.

Thanks, Manuel


ManuelM95 commented on May 28, 2024

Hey @kyr-pol ,
I spoke with my tutor, and since my deadline is in 2 months and I also need to write the semester thesis, I won't be able to implement those changes :(. Sorry for that, and good luck with the project.

Thanks for the help,
Manuel


kyr-pol commented on May 28, 2024

Ok, no problem, good luck with the thesis!

