Comments (6)
Hi Wonchul, I've worked with the Pendulum-v0 environment too.
First, let me note that on the more_envs branch we are working on several extra environments (we have good results on a version of Pendulum-v0, the inverted double pendulum and the Swimmer). We are going to merge some of that work into master soon, after cleaning it up a bit, but you can take a look now if you are looking for extra applications.
The problem with Pendulum-v0 is that, although the dynamics are pretty simple, it's hard for PILCO to predict a trajectory, because the initial angle of the pendulum can be anything (I think it is initialised uniformly). That makes planning with a normally distributed prediction for every time-step, as PILCO does, not that useful. What I did was change this initialisation, in the gym source code, to a starting position with the pendulum at the bottom and a reasonably small amount of starting uncertainty (~0.1). This is an easier task than the original gym one, but since the pendulum swing-up task is a standard control benchmark, we might still want to solve it this way (the version used in the original PILCO paper is like this too).
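A rough sketch of that kind of initialisation, assuming the internal state is (theta, theta_dot) with theta = 0 upright and the observation is [cos(theta), sin(theta), theta_dot], as in gym's Pendulum-v0. The function names and the standard deviations are hypothetical, chosen only for illustration; this is not the actual change made to reset().

```python
import math
import random

def sample_bottom_start(rng, theta_std=0.1, theta_dot_std=0.1):
    """Hypothetical replacement for the uniform start: pendulum near the bottom.

    theta = pi is hanging straight down; the standard deviations are assumptions.
    """
    theta = rng.gauss(math.pi, theta_std)
    theta_dot = rng.gauss(0.0, theta_dot_std)
    return theta, theta_dot

def to_observation(theta, theta_dot):
    """Map the internal state to the [cos, sin, theta_dot] observation."""
    return [math.cos(theta), math.sin(theta), theta_dot]

rng = random.Random(0)
theta, theta_dot = sample_bottom_start(rng)
obs = to_observation(theta, theta_dot)
mean_obs = to_observation(math.pi, 0.0)  # observation at the mean start state
print(obs, mean_obs)
```

The observation at the mean start state is (up to rounding) [-1, 0, 0], which is where a Gaussian initial state for PILCO would be centred.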
As for the memory issues, I have encountered them too. There are a few things you can do, and they are related to T, the number of time-steps in the planning horizon, and N, the number of runs and hence the number of data points you are working with.
- Reduce the time horizon, possibly by using subsampling. During planning, a number of matrices are created and held in memory simultaneously, proportional to the number of time-steps in the planning horizon. You might want to decrease that number. If you feel a longer time horizon is needed, you can use subsampling: repeat each action for m time-steps, and only show PILCO the state every m time-steps. That way you can plan ahead far enough without the memory problems. There is a simple way to implement that by changing the rollout function in inverted_pendulum.py.
- Use sparse GPs. By using the num_induced_points argument when you call the PILCO constructor, you can set the number of data points used for the GP inference.
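The subsampling idea can be sketched roughly as follows. This is an illustration, not the rollout function from inverted_pendulum.py: ToyEnv, the policy callable and the subs parameter are hypothetical stand-ins, and the environment is assumed to follow the old gym convention where step() returns (observation, reward, done, info).

```python
class ToyEnv:
    """Stand-in environment (hypothetical): the state counts the steps taken."""

    def reset(self):
        self.t = 0
        return float(self.t)

    def step(self, action):
        self.t += 1
        return float(self.t), 0.0, False, {}

def rollout(env, policy, timesteps, subs=1):
    """Collect (state, action) inputs and state-difference targets.

    Each action is repeated for `subs` environment steps, and only the
    state reached after those steps is recorded.
    """
    inputs, targets = [], []
    x = env.reset()
    for _ in range(timesteps):
        u = policy(x)
        x_new, done = x, False
        for _ in range(subs):
            x_new, _, done, _ = env.step(u)
            if done:
                break
        inputs.append((x, u))
        targets.append(x_new - x)
        x = x_new
        if done:
            break
    return inputs, targets

env = ToyEnv()
inputs, targets = rollout(env, policy=lambda x: 0.0, timesteps=40, subs=3)
print(len(inputs), env.t)
```

With 40 recorded time-steps and a subsampling rate of 3, the rollout covers 120 environment steps while the model only sees 40 transitions.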
Also beware that the default reward function and initial state of PILCO won't work for Pendulum-v0. Copying from inverted_pendulum.py in more_envs:
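import numpy as np

# NEEDS a different initialisation than the one in gym (change the reset() method),
# to (m_init, S_init)
SUBS = 3            # subsampling rate
bf = 30
maxiter = 50
max_action = 2.0
target = np.array([1.0, 0.0, 0.0])            # reward function target
weights = np.diag([2.0, 2.0, 0.3])            # reward function weights
m_init = np.reshape([-1.0, 0, 0.0], (1, 3))   # initial state mean
S_init = np.diag([0.01, 0.05, 0.01])          # initial state covariance
T = 40              # time-steps in the planning horizon
J = 4
N = 8               # number of runs
restarts = True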
SUBS is the subsampling rate; target and weights are the reward function parameters. With these parameters I've had consistently good performance.
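To see what target and weights do, here is a small sketch that assumes an exponential reward of the form exp(-0.5 (x - target)^T W (x - target)), evaluated at a single known state. That form is an assumption for illustration: PILCO computes the expected reward under a Gaussian state distribution, which this sketch ignores. The function name is hypothetical, and only the diagonal of the weights is used.

```python
import math

def exponential_reward(state, target, weight_diag):
    """Exponential reward at a point state, with diagonal weights (illustrative)."""
    quad = sum(w * (s - t) ** 2 for s, t, w in zip(state, target, weight_diag))
    return math.exp(-0.5 * quad)

target = [1.0, 0.0, 0.0]        # upright: cos(theta) = 1, sin(theta) = 0, theta_dot = 0
weight_diag = [2.0, 2.0, 0.3]   # diagonal of the weights matrix above

upright = exponential_reward([1.0, 0.0, 0.0], target, weight_diag)
bottom = exponential_reward([-1.0, 0.0, 0.0], target, weight_diag)
print(upright, bottom)
```

Under this assumed form the reward is 1 at the target and falls to exp(-4) at the bottom position, so the mean start state m_init sits in the low-reward region and the controller has to swing up to collect reward.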
from pilco.
Sure, it makes sense to want to use it without mujoco.
For the memory problem yes, you can do it the standard tensorflow way. What we use when running on a GPU is something like:
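import tensorflow as tf  # TensorFlow 1.x API

config = tf.ConfigProto()
# kwargs is assumed to be the keyword arguments of the enclosing function
gpu_id = kwargs.get('gpu_id', "1")
config.gpu_options.visible_device_list = gpu_id
# let this process use at most 80% of the GPU memory
config.gpu_options.per_process_gpu_memory_fraction = 0.80
sess = tf.Session(graph=tf.Graph(), config=config)
with sess:
    ...  # make the environment and run PILCO inside this block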
before making the environment.
I am not sure this is going to help: it restricts the memory tf takes, but if you are running out of memory, it probably means that tf used up all the available memory and it still wasn't enough. If you have many data points and/or long planning horizons, especially in higher dimensional problems, think about subsampling or sparse GPs.
from pilco.
Hey @wonchul-kim, you might want to check the pull request I added here. It should be clearer and easier to make sense of than the more_envs branch I mentioned above. It includes 3 extra environments, including Pendulum-v0.
from pilco.
I am closing this for now. If there are more questions, feel free to re-open it.
from pilco.