Hey, I'm a student from TUM using your PILCO implementation. I want to optimize th

Extra control dimension for varying target values (pilco issue, closed)

ManuelM95 commented on May 28, 2024

Extra control dimension for varying target values

Comments (4)

kyr-pol commented on May 28, 2024

Hi @ManuelM95 ,

This is a good question. It will take some additions to the code base, which in my opinion would be good for the project in general.

A simple first approach would be to train multiple controllers, one for each distinct task or subtask, depending on how you want to structure it. This doesn't solve your case, but it might be a helpful step.

A similar functionality in the original PILCO implementation allows for multiple starting states, that induce different predicted trajectories, and a single control policy is trained jointly for all cases.

To have a single policy for the distinct targets, you'd have to alter the training process in a similar way. The training is based on predicted trajectories, and the predictions are Gaussian. If you want to vary the targets freely, the extra dimension you want to introduce would need an arbitrarily large initial variance. The Gaussian estimate for the next state(s) would then also be very uncertain, and planning would be very hard. I think the best approach would be to train on a number of distinct trajectories, corresponding to different targets. If these are reasonably representative of the possible targets, the policy trained on all of them should be able to generalise to new targets too.
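As a rough 1-D illustration of why a very uncertain target dimension hurts (a sketch only, not code from this repository): for an exponential reward and a Gaussian distance `d` between state and target, the expected reward has a closed form, and the target variance simply adds to the state variance. The function name `expected_exp_reward` and the numbers below are assumptions made for the example.

```python
import math

def expected_exp_reward(mean_diff, var_diff, width=1.0):
    """E[exp(-d^2 / (2 * width^2))] for d ~ N(mean_diff, var_diff).

    d is the (1-D) distance between state and target. If both are
    independent Gaussians, var_diff = state variance + target variance.
    """
    total = width ** 2 + var_diff
    return width / math.sqrt(total) * math.exp(-mean_diff ** 2 / (2.0 * total))

state_var = 0.01  # hypothetical predicted state variance

# Reward difference between "on target" and "one unit away" is the
# signal the policy optimiser can use.
for target_var in (0.0, 1.0, 100.0):
    v = state_var + target_var
    on_target = expected_exp_reward(0.0, v)
    off_target = expected_exp_reward(1.0, v)
    print(f"target_var={target_var:6.1f}  "
          f"on={on_target:.3f}  off={off_target:.3f}  "
          f"signal={on_target - off_target:.4f}")

sharp_signal = expected_exp_reward(0.0, state_var) - expected_exp_reward(1.0, state_var)
vague_signal = expected_exp_reward(0.0, 100.0) - expected_exp_reward(1.0, 100.0)
```

With a fixed target the optimiser sees a clear difference between being on and off target; with a very broad target distribution both cases give almost the same small expected reward, so there is little left to optimise.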

To be more specific, one way to implement this, assuming the GP model from mgpr.py remains unchanged, would be:

- change pilco._build_likelihood so that it combines (probably by adding) multiple predicted rewards, one for each predicted trajectory with its corresponding target
- add different reward functions with different targets (either different instances of the rewards we currently have or a new reward class)
- add a controller class that takes not just the state as input but also the target, as you suggested.

I am pretty sure that other smaller changes will be needed as you go along with the implementation.
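The three changes above can be sketched in a toy 1-D setting. This is only an illustration under loud assumptions: the known dynamics `x' = x + dt * u` stand in for the GP model, the rollout is deterministic rather than Gaussian, and the names `rollout_reward` and `multi_target_objective` are hypothetical and not part of the pilco code base.

```python
import math

def rollout_reward(gains, target, x0=0.0, horizon=20, dt=0.1):
    """Sum of rewards along one predicted trajectory for one target.

    Assumption: simple known dynamics replace the GP model, and the
    controller receives the target as an extra input next to the state.
    """
    k_x, k_t = gains
    x, total = x0, 0.0
    for _ in range(horizon):
        u = k_x * x + k_t * target                   # target-conditioned linear policy
        x = x + dt * u                               # stand-in dynamics
        total += math.exp(-0.5 * (x - target) ** 2)  # reward peaks at this target
    return total

def multi_target_objective(gains, targets):
    """Combine (here: add) the predicted rewards of all training targets."""
    return sum(rollout_reward(gains, t) for t in targets)

train_targets = [-1.0, 0.5, 2.0]

passive = multi_target_objective((0.0, 0.0), train_targets)    # controller does nothing
tracking = multi_target_objective((-5.0, 5.0), train_targets)  # u = 5 * (target - x)

# A target that was not in the training set
held_out = 1.2
held_out_passive = rollout_reward((0.0, 0.0), held_out)
held_out_tracking = rollout_reward((-5.0, 5.0), held_out)

print(f"training objective: passive={passive:.2f}, tracking={tracking:.2f}")
print(f"held-out target:    passive={held_out_passive:.2f}, tracking={held_out_tracking:.2f}")
```

In the real implementation the single set of controller parameters would be optimised against the summed objective, so one policy serves all targets.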

By the way, a good simple case study for this would be the OpenAI Gym Reacher-v2 environment, where a simple robotic arm has to reach a specific target with its end point, and the target varies from episode to episode. It should be a nice minimal example of the functionality you are looking for.

I am also interested in this and will probably try a few things in the next few weeks. Keep me posted if you make any progress, and I will mention this issue in any relevant commits. Good luck and have fun!

ManuelM95 commented on May 28, 2024

Hi @kyr-pol ,
Many thanks for your detailed answer. I was getting nervous because of the lack of progress ;). I will discuss your input with my tutor and check with him how we plan to proceed. I will keep you posted.

Thanks, Manuel

ManuelM95 commented on May 28, 2024

Hey @kyr-pol ,
I spoke with my tutor, and since my deadline is in 2 months and I also need to write the semester thesis, I won't be able to implement those changes :(. Sorry for that, and good luck with the project.

Thanks for the help,
Manuel

kyr-pol commented on May 28, 2024

Ok, no problem, good luck with the thesis!
