florensacc / rllab-curriculum
License: Other
Hi, could you please tell me how to render Arm3d?
Thank you!
I am trying to run your algorithm on ant-maze. However, GaussianMLPPolicy gives the following assertion error:
assert isinstance(env_spec.action_space, Box)
AssertionError
Any idea why this is a problem?
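For reference, the assertion can be reproduced with a minimal stand-in check; `Box` and `Discrete` below are dummy classes for illustration, not the actual rllab space classes:

```python
# Minimal stand-in sketch; Box and Discrete here are dummies, not
# rllab.spaces - they only illustrate the isinstance check that fails.
class Box:
    """Continuous action-space stand-in."""

class Discrete:
    """Discrete action-space stand-in."""

def gaussian_policy_compatible(action_space):
    # GaussianMLPPolicy samples continuous Gaussian actions, so it
    # asserts isinstance(action_space, Box); an env exposing any
    # non-Box action space trips that assert.
    return isinstance(action_space, Box)
```

So the question reduces to why the ant-maze env spec is not exposing a `Box` action space.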
Hi,
According to your paper, you directly use the original hyperparameters reported in Mao et al. (2017) where a = -1, b = 1,
and c = 0. But I am confused about the discriminator loss in this line:
rllab-curriculum/curriculum/gan/gan.py
Line 382 in f55b502
It seems that you use a = 1 in the third term of the discriminator loss: (D(G(z)) - a)^2.
Can you explain why the code here is tf.square(self._generator_output - 1)?
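To make the discrepancy concrete, here is an illustrative NumPy sketch of the Mao et al. (2017) least-squares discriminator loss with a = -1, b = 1; the variable names are mine, not the repo's:

```python
import numpy as np

# LSGAN discriminator loss, Mao et al. (2017), with a = -1, b = 1.
a, b = -1.0, 1.0

def lsgan_discriminator_loss(d_real, d_fake):
    # L_D = 1/2 E[(D(x) - b)^2] + 1/2 E[(D(G(z)) - a)^2]
    # With a = -1 the fake-sample term reads (D(G(z)) + 1)^2, which is
    # why tf.square(generator_output - 1) looks like a = +1 instead.
    return 0.5 * np.mean((d_real - b) ** 2) + 0.5 * np.mean((d_fake - a) ** 2)
```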
Thank you very much.
Hi,
When running the maze-ant-gan code, I get the error "libmujoco131.so: cannot open shared object file: No such file or directory" at the line self._handle = _dlopen(self._name, mode). Could you please tell me how I can solve this problem?
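In case it helps others hitting this: the usual fix is to put the MuJoCo 1.31 binaries on the loader path before launching. This assumes the default ~/.mujoco/mjpro131 install location; adjust if yours differs.

```shell
# Assumed default MuJoCo 1.31 install location; adjust if yours differs.
export LD_LIBRARY_PATH="$HOME/.mujoco/mjpro131/bin:$LD_LIBRARY_PATH"
```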
Thanks a lot
I was trying to run experiments with arm3d envs.
$python3 curriculum/experiments/starts/arm3d/arm3d_key/arm3d_key_brownian.py
However, I am getting the following error:
Traceback (most recent call last):
File "/home/arpit/RL3/rllab-curriculum/scripts/run_experiment_lite.py", line 139, in
run_experiment(sys.argv)
File "/home/arpit/RL3/rllab-curriculum/scripts/run_experiment_lite.py", line 123, in run_experiment
method_call(variant_data)
File "/home/arpit/RL3/rllab-curriculum/curriculum/experiments/starts/arm3d/arm3d_key/arm3d_key_brownian_algo.py", line 101, in run_task
open(osp.join(config.PROJECT_PATH, load_dir, 'all_feasible_states.pkl'), 'rb'))
FileNotFoundError: [Errno 2] No such file or directory: '/home/arpit/RL3/rllab-curriculum/data_upload/state_collections/all_feasible_states.pkl'
Could you kindly let me know if I have to run another command to generate the pkl file before running the experiment script?
Hi,
I understand that for the evaluation phase of AntMaze you just let the Ant start from init_pos:
https://github.com/florensacc/rllab-curriculum/blob/master/curriculum/experiments/starts/maze/maze_ant/maze_ant_brownian_algo.py#L140-L155
But for PointMass is it possible to have a list of init_pos too?
I tried to understand your code: apparently you are using the test_policy methods.
But I couldn't find the method update_init_selector that apparently modifies the initial states when resetting the env?
Thank you very much.
The first index of center corresponds to the x axis, but it is the other way around in the plotting code.
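To illustrate the mismatch with a hypothetical grid (this is not the repo's plotting code): a 2-D array is indexed [row, col], and plotting draws rows along the vertical axis, so a center stored as (x, y) has to be swapped when indexing.

```python
import numpy as np

# Hypothetical example: a 2-D array is indexed [row, col], and the
# row index is drawn along the vertical axis, so (x, y) data must be
# swapped to [y, x] when writing into the grid for plotting.
grid = np.zeros((5, 3))            # 5 rows (y direction), 3 cols (x)
center = (1, 4)                    # stored as (x, y)
grid[center[1], center[0]] = 1.0   # index as [y, x]
```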
The robot does not seem to work after training for 5000 iterations on the key_insertion task with the default command; when visualized with sim_policy.py, it seems to pull the key back out. Am I missing something?
The definition of 'done' in envs/arm3d/arm3d_disc_env.py is as follows:
done = False
if self.kill_outside and (distance_to_goal > self.kill_radius):
    print("******** OUT of region ********")
    done = True
Does that mean 'done' fires only when the disc is far from the goal?
And in envs/arm3d/arm3d_move_peg_env.py:
if reward > -0.02:
    done = True
means 'done' fires when the disc is close enough to the goal.
I can understand the second definition but not the first. Could you please answer my question?
Thanks for releasing the code. I was wondering if you have an example in the OpenAI Gym environment format, as it is more widely used. I have some environments in Gym's format, and it would be really helpful if you could share such an example, if any.
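For anyone attempting this in the meantime, a minimal sketch of adapting a Gym-style env to an rllab-style reset/step interface might look like the following; GymStyleEnv is a dummy stand-in, and the real wrapper (e.g. rllab's GymEnv) may differ in details:

```python
# Dummy Gym-style env: reset() -> obs, step(a) -> (obs, reward, done, info).
class GymStyleEnv:
    def reset(self):
        return [0.0]

    def step(self, action):
        return [float(action)], 1.0, True, {}

# Thin adapter forwarding the two methods the curriculum code calls;
# rllab actually bundles step results in a Step namedtuple, but a plain
# tuple keeps this sketch dependency-free.
class RllabStyleAdapter:
    def __init__(self, gym_env):
        self.env = gym_env

    def reset(self, **kwargs):
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        return obs, reward, done, info
```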
Hope you are doing well and appreciate your repository.
I want to visualize the training, and specifically the ant's actions in the MuJoCo environment, while it is training. Could you give me some guidance?
I run the following:
python3.6 curriculum/experiments/goals/maze_ant/maze_ant_gan.py
Hi,
Here is my screen; do you know where the rendering issue could come from?
Also, when I do env.reset() I get the (x, y) coordinates (-5.84, 3.77), and the goal is (4.4).
Is there any correlation between the MAZE_STRUCTURE and the (x, y) coordinates (is it a 1:1 scale)?
(I used maze_id=11 for the point-mass env; is that the right id?)
Thank you very much!
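To make the question concrete, here is a hedged sketch of a grid-cell to world-coordinate mapping, assuming an rllab-style maze scaling factor; the actual MAZE_SIZE_SCALING in this repo may differ, so treat the numbers as illustrative only:

```python
# Hypothetical grid-cell -> world (x, y) mapping; the scaling value is
# a placeholder, not a verified value from this repo.
def cell_to_xy(row, col, scaling=4.0):
    # column index maps to x, row index maps to y
    return col * scaling, row * scaling
```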
Hi,
I played with the env module and tried to set the start state arbitrarily close to the goal:
goal_coordinates = [0, 4, 0.55, 1, 0, 0, 0, 0, 1, 0, -1, 0, -1, 0, 1]
env.reset(init_state=goal_coordinates)
next_state, reward, done, d = env.step("some action")
x, y = env.wrapped_env.get_body_com("torso")[:2]
Even when I have x = 0 and y = 4, I get a reward of 0...
I noticed this line in your code
https://github.com/florensacc/rllab-curriculum/blob/master/curriculum/envs/maze/maze_env.py#L250
where self.coef_inner_rew=0...
I noticed that you modify the step method compared to the rllab one:
https://github.com/rll/rllab/blob/master/rllab/envs/mujoco/maze/maze_env.py#L297-L301
How does your step method work?
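My current guess, as a hedged sketch based on the coef_inner_rew pattern in maze_env.py (names are illustrative, not the repo's exact code):

```python
# Sketch of how the maze wrapper's step could combine rewards.
def maze_step_reward(inner_reward, reached_goal,
                     coef_inner_rew=0.0, goal_reward=1.0):
    # With coef_inner_rew = 0 the wrapped env's own reward is discarded,
    # so the agent only sees the sparse goal bonus - which would explain
    # a reward of 0 near (0, 4) if the goal test hasn't fired yet.
    return coef_inner_rew * inner_reward + (goal_reward if reached_goal else 0.0)
```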
Thank you,
from curriculum.envs.maze.maze_ant.ant_maze_start_env import AntMazeEnv gives 41-dimensional states, while from curriculum.envs.maze.maze_ant.ant_maze_env import AntMazeEnv gives 131-dimensional states. The paper mentions that the dimension is 41, but the default implementation here uses 131. It was a tricky one to notice!
Hi,
I believe the files to load to get the evaluation points for key insertion are listed here:
https://github.com/florensacc/rllab-curriculum/blob/master/data_upload/state_collections/read_me
Which file was used to get the results in the paper?
Thank you !
Hi,
According to your paper, you apply Brownian motion (normal with mean 0 and variance 1) to generate new seed states, but according to this line
https://github.com/florensacc/rllab-curriculum/blob/master/curriculum/envs/start_env.py#L233
it seems that you are applying a uniform random action with range env.action_space.bounds for the AntMaze environment.
Can you explain why the action is not N(0, I)?
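To spell out the two schemes I am contrasting (illustrative only; this is not the repo's code, and the bounds/sigma are placeholders):

```python
import numpy as np

rng = np.random.default_rng(0)

def brownian_state_step(state, sigma=1.0):
    # Paper-style Brownian step: perturb the state with N(0, sigma^2 I).
    state = np.asarray(state, dtype=float)
    return state + rng.normal(0.0, sigma, size=state.shape)

def uniform_action(lb, ub):
    # What the linked line appears to do: sample a uniform random action
    # within the env's action bounds and roll the dynamics forward.
    return rng.uniform(lb, ub)
```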
Thank you very much.
Hi,
when using the wrapped env I get nan values when I reset the env:
env.reset()
array([ 0.00000000e+000, 4.00000000e+000, 1.58101007e-322,
1.00000000e+000, 0.00000000e+000, 0.00000000e+000,
0.00000000e+000, 0.00000000e+000, 0.00000000e+000,
4.09214812e-316, 0.00000000e+000, nan,
0.00000000e+000, 0.00000000e+000, 0.00000000e+000,
0.00000000e+000, 0.00000000e+000, 0.00000000e+000,
0.00000000e+000, 0.00000000e+000, 0.00000000e+000,
0.00000000e+000, 0.00000000e+000, 0.00000000e+000,
0.00000000e+000, 0.00000000e+000, 0.00000000e+000,
0.00000000e+000, 0.00000000e+000, 1.00000000e+000,
0.00000000e+000, 0.00000000e+000, 0.00000000e+000,
1.00000000e+000, 0.00000000e+000, 0.00000000e+000,
0.00000000e+000, 1.00000000e+000, nan,
nan, nan, 0.00000000e+000,
4.00000000e+000])
and before that I hit a CUDA warning:
mod.cu(3025): warning: conversion from a string literal to "char *" is deprecated
Do you have any idea where this could come from?
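As a quick diagnostic (an illustrative helper, not repo code): values like 1.58e-322 alongside nan usually suggest uninitialized memory in qpos/qvel, and checking finiteness right after reset() narrows down where they first appear.

```python
import numpy as np

def observation_is_finite(obs):
    # True only if every entry is a normal finite float (no nan/inf).
    return bool(np.isfinite(np.asarray(obs, dtype=float)).all())
```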
Thank you!
Hi,
Wonderful work and thank you for sharing the code.
I am just curious: could Goal GAN solve the sparse-reward Atari game Montezuma's Revenge? Did you try this game?
Hi, I'm running the key-hole manipulation experiments with the command in the README, and it's been running for over an hour locally on my machine (a 2014 quad-core MBP). It seems a lot of reruns are happening. I was wondering whether some convergence criterion is not being met, or is the expected duration on the order of hours?
Hi,
I was wondering if you could tell us how long training takes for each environment with the same parameters as in the paper (or at least an estimate with your setup)? A few days?
Thank you very much
Hi,
in the paper you said that for the PointMass and Ant envs you used a batch size of 50,000, but reading your code it seems to be a batch size of 20,000 repeated 5 times:
https://github.com/florensacc/rllab-curriculum/blob/master/curriculum/experiments/starts/maze/maze_brownian.py#L94-L95
Can you clarify one point: in the performance curves reported in the paper, what does one learning iteration correspond to? (Figure 2a, page 7/14)
My understanding is: 5 inner iterations of TRPO, each with a batch size of 50,000?
Could you also make this explicit for Ant Maze?
https://github.com/florensacc/rllab-curriculum/blob/master/curriculum/experiments/starts/maze/maze_ant/maze_ant_brownian.py#L108-L109
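To make the two readings concrete (the figures below come from the linked config lines and the paper, not from verified repo defaults):

```python
# Arithmetic behind the question; all numbers are quoted, not verified.
pg_batch_size = 20_000   # batch size per TRPO update in the linked config
inner_iters = 5          # TRPO updates per outer iteration
samples_per_outer_iter = pg_batch_size * inner_iters  # 100,000

paper_batch_size = 50_000  # batch size reported in the paper
# Hence the question: does one x-axis "learning iteration" mean one
# outer iteration (100,000 samples) or something else matching 50,000?
```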
Thank you very much for your time !