Comments (4)
Hi @ManuelM95,
This is a good question. It will take some extensions to the code base, which in my opinion would be good for the project in general.
A simple first approach would be to train multiple controllers, one for each distinct task or subtask, depending on how you want to structure it. This doesn't solve your case, but it might be a helpful step.
A similar feature in the original PILCO implementation allows for multiple starting states, which induce different predicted trajectories; a single control policy is then trained jointly for all cases.
To have a single policy for the distinct targets, you'd have to alter the training process in a similar way. The training is based on predicted trajectories, and the predictions are Gaussian. If you want to alter the targets freely, the extra dimension you want to introduce (the target) would need an arbitrarily large initial variance. The Gaussian estimate for the next state(s) would then also be very uncertain, and planning would be very hard. I think the best approach would be to train on a number of distinct trajectories, corresponding to different targets. If these are reasonably representative of the possible targets, the policy trained on all of them should be able to generalise to new targets too.
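To illustrate the variance argument, here is a minimal NumPy sketch. It is not PILCO code: the linear toy dynamics, the augmented state `[position, target]`, and the helper names are all assumptions made for illustration, and the Gaussian is propagated exactly through a linear map rather than through a GP with moment matching.

```python
import numpy as np

def propagate_linear(mean, cov, A, noise_cov):
    """Exact Gaussian propagation through x' = A x + w, with w ~ N(0, noise_cov)."""
    return A @ mean, A @ cov @ A.T + noise_cov

# Hypothetical augmented state: [position, target].
# The target stays constant; the position moves 20% of the way toward it per step.
A = np.array([[0.8, 0.2],
              [0.0, 1.0]])
noise_cov = np.diag([1e-4, 0.0])  # small process noise on the position only

def position_variance_after(target_var, steps=10):
    """Predicted variance of the position after `steps` steps."""
    mean = np.zeros(2)
    cov = np.diag([1e-2, target_var])  # initial uncertainty over [position, target]
    for _ in range(steps):
        mean, cov = propagate_linear(mean, cov, A, noise_cov)
    return cov[0, 0]

narrow = position_variance_after(target_var=1e-4)   # one (almost) fixed target
broad = position_variance_after(target_var=100.0)   # "any target is possible"
print(f"position variance, narrow target prior: {narrow:.2e}")
print(f"position variance, broad target prior:  {broad:.2e}")
```

With a broad prior over the target, the predicted position inherits most of that variance after a few steps, which is why training on several distinct, well-defined targets looks more practical than a single very uncertain one.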
To be more specific, one way to implement this, assuming the GP model from mgpr.py remains unchanged, would be:
- change pilco._build_likelihood so that it combines (probably by adding) multiple predicted rewards, one for each predicted trajectory with its corresponding target. You would need different reward functions with different targets (either different instances of the rewards we currently have or a new reward class)
- add a controller class that takes not just the state as input but also the target, as you suggested.
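The two changes above can be sketched roughly as follows. This is a standalone NumPy toy, not the package's API: `saturating_reward`, `target_conditioned_policy`, `rollout_reward` and `multi_target_objective` are hypothetical names, the dynamics `x' = x + 0.1 u` are made up, and the rollouts are deterministic instead of Gaussian predicted trajectories from the GP model.

```python
import numpy as np

def saturating_reward(state, target, width=1.0):
    """Reward in (0, 1] that peaks when the state reaches the target."""
    d = state - target
    return float(np.exp(-0.5 * (d @ d) / width**2))

def target_conditioned_policy(params, state, target):
    """Squashed linear controller whose input is the concatenated [state, target]."""
    W, b = params
    return np.tanh(W @ np.concatenate([state, target]) + b)

def rollout_reward(params, start, target, horizon=20):
    """Total reward along one simulated trajectory for one target (toy dynamics)."""
    state, total = start.copy(), 0.0
    for _ in range(horizon):
        action = target_conditioned_policy(params, state, target)
        state = state + 0.1 * action  # assumed toy dynamics, not a GP prediction
        total += saturating_reward(state, target)
    return total

def multi_target_objective(params, start, targets):
    """Combine (by adding) the rewards of one trajectory per target."""
    return sum(rollout_reward(params, start, t) for t in targets)

start = np.zeros(2)
targets = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([-1.0, -1.0])]

# A policy that ignores its input versus one that steers toward the target.
idle = (np.zeros((2, 4)), np.zeros(2))
seek = (2.0 * np.hstack([-np.eye(2), np.eye(2)]), np.zeros(2))

idle_score = multi_target_objective(idle, start, targets)
seek_score = multi_target_objective(seek, start, targets)
print(idle_score, seek_score)
```

A single set of policy parameters is scored on all targets at once, so an optimiser maximising `multi_target_objective` is pushed toward a controller that actually uses the target input.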
I am pretty sure that other smaller changes will be needed as you go along with the implementation.
By the way, a good simple case study for this would be the OpenAI Gym Reacher-v2 environment, where a simple robotic arm has to reach a specific target with its end point, and the target varies from episode to episode. It should be a nice minimal example of the functionality you are looking for.
I am also interested in this and will probably try a few things in the next few weeks. Keep me posted if you make any progress, and I will mention this issue in any relevant commits. Good luck and have fun!
from pilco.
Hi @kyr-pol,
Many thanks for your detailed answer, I was getting nervous because of the lack of progress ;). I will discuss your input with my tutor and check with him how we plan to proceed. I will keep you posted.
Thanks, Manuel
Hey @kyr-pol,
I spoke with my tutor, and since my deadline is in 2 months and I also need to write the semester thesis, I won't be able to implement those changes :(. Sorry for that, and good luck with the project.
Thanks for the help,
Manuel
Ok, no problem, good luck with the thesis!