florensacc / rllab-curriculum
License: Other
Hi, could you please tell me how to render Arm3d?
Thank you!
I am trying to run your algorithm on ant-maze. However, GaussianMLPPolicy gives the following assertion error:
assert isinstance(env_spec.action_space, Box)
AssertionError
Any idea why this is a problem?
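For reference, the assertion can be reproduced with a minimal stand-in check; `Box` and `Discrete` below are dummy classes for illustration, not the actual rllab space classes:

```python
# Minimal stand-in sketch; Box and Discrete here are dummies, not
# rllab.spaces - they only illustrate the isinstance check that fails.
class Box:
    """Continuous action-space stand-in."""

class Discrete:
    """Discrete action-space stand-in."""

def gaussian_policy_compatible(action_space):
    # GaussianMLPPolicy samples continuous Gaussian actions, so it
    # asserts isinstance(action_space, Box); an env exposing any
    # non-Box action space trips that assert.
    return isinstance(action_space, Box)
```

So the question reduces to why the ant-maze env spec is not exposing a `Box` action space.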
Hi,
According to your paper, you directly use the original hyperparameters reported in Mao et al. (2017) where a = -1, b = 1,
and c = 0. But I am confused about the discriminator loss in this line:
rllab-curriculum/curriculum/gan/gan.py
Line 382 in f55b502
It seems that you use a = 1 in the third term of the discriminator loss: (D(G(z)) - a)^2.
Can you explain why the code here is tf.square(self._generator_output - 1)?
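To make the discrepancy concrete, here is an illustrative NumPy sketch of the Mao et al. (2017) least-squares discriminator loss with a = -1, b = 1; the variable names are mine, not the repo's:

```python
import numpy as np

# LSGAN discriminator loss, Mao et al. (2017), with a = -1, b = 1.
a, b = -1.0, 1.0

def lsgan_discriminator_loss(d_real, d_fake):
    # L_D = 1/2 E[(D(x) - b)^2] + 1/2 E[(D(G(z)) - a)^2]
    # With a = -1 the fake-sample term reads (D(G(z)) + 1)^2, which is
    # why tf.square(generator_output - 1) looks like a = +1 instead.
    return 0.5 * np.mean((d_real - b) ** 2) + 0.5 * np.mean((d_fake - a) ** 2)
```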
Thank you very much.
Hi,
When running the maze-ant-gan code, I get the error "libmujoco131.so: cannot open shared object file: No such file or directory" at the line self._handle = _dlopen(self._name, mode). Could you please tell me how I can solve this problem?
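In case it helps others hitting this: the usual fix is to put the MuJoCo 1.31 binaries on the loader path before launching. This assumes the default ~/.mujoco/mjpro131 install location; adjust if yours differs.

```shell
# Assumed default MuJoCo 1.31 install location; adjust if yours differs.
export LD_LIBRARY_PATH="$HOME/.mujoco/mjpro131/bin:$LD_LIBRARY_PATH"
```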
Thanks a lot
I was trying to run experiments with arm3d envs.
$python3 curriculum/experiments/starts/arm3d/arm3d_key/arm3d_key_brownian.py
However, I am getting the following error:
Traceback (most recent call last):
File "/home/arpit/RL3/rllab-curriculum/scripts/run_experiment_lite.py", line 139, in
run_experiment(sys.argv)
File "/home/arpit/RL3/rllab-curriculum/scripts/run_experiment_lite.py", line 123, in run_experiment
method_call(variant_data)
File "/home/arpit/RL3/rllab-curriculum/curriculum/experiments/starts/arm3d/arm3d_key/arm3d_key_brownian_algo.py", line 101, in run_task
open(osp.join(config.PROJECT_PATH, load_dir, 'all_feasible_states.pkl'), 'rb'))
FileNotFoundError: [Errno 2] No such file or directory: '/home/arpit/RL3/rllab-curriculum/data_upload/state_collections/all_feasible_states.pkl'
Could you kindly let me know if I have to run another command to generate the pkl file before running the experiment script?
Hi,
I understand that for the evaluation phase of AntMaze you just let the Ant start from init_pos:
https://github.com/florensacc/rllab-curriculum/blob/master/curriculum/experiments/starts/maze/maze_ant/maze_ant_brownian_algo.py#L140-L155
But for PointMass is it possible to have a list of init_pos too?
I tried to understand your code: apparently you are using the test_policy methods.
But I couldn't find the method update_init_selector that apparently modifies the initial states when resetting the env?
Thank you very much.
The first index of center corresponds to the x axis, but it is the other way around in the plotting code.
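To illustrate the mismatch with a hypothetical grid (this is not the repo's plotting code): a 2-D array is indexed [row, col], and plotting draws rows along the vertical axis, so a center stored as (x, y) has to be swapped when indexing.

```python
import numpy as np

# Hypothetical example: a 2-D array is indexed [row, col], and the
# row index is drawn along the vertical axis, so (x, y) data must be
# swapped to [y, x] when writing into the grid for plotting.
grid = np.zeros((5, 3))            # 5 rows (y direction), 3 cols (x)
center = (1, 4)                    # stored as (x, y)
grid[center[1], center[0]] = 1.0   # index as [y, x]
```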
The robot does not seem to work after training for 5000 iterations on the key_insertion task with the default command; when visualized with sim_policy.py, it seems to pull the key back out. Am I missing something?
The definition of 'done' in envs/arm3d/arm3d_disc_env.py is as follows:
done = False
if self.kill_outside and (distance_to_goal > self.kill_radius):
    print("******** OUT of region ********")
    done = True
Does that mean 'done' fires only when the disc is far from the goal?
And in envs/arm3d/arm3d_move_peg_env.py:
if reward > -0.02:
    done = True
means 'done' fires when the disc is close enough to the goal.
I can understand the second definition but not the first. Could you please answer my question?
Thanks for releasing the code. I was wondering if you have an example in the OpenAI Gym environment format, as it is more widely used. I have some environments in Gym's format, and it would be really helpful if you could share such an example, if any.
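For anyone attempting this in the meantime, a minimal sketch of adapting a Gym-style env to an rllab-style reset/step interface might look like the following; GymStyleEnv is a dummy stand-in, and the real wrapper (e.g. rllab's GymEnv) may differ in details:

```python
# Dummy Gym-style env: reset() -> obs, step(a) -> (obs, reward, done, info).
class GymStyleEnv:
    def reset(self):
        return [0.0]

    def step(self, action):
        return [float(action)], 1.0, True, {}

# Thin adapter forwarding the two methods the curriculum code calls;
# rllab actually bundles step results in a Step namedtuple, but a plain
# tuple keeps this sketch dependency-free.
class RllabStyleAdapter:
    def __init__(self, gym_env):
        self.env = gym_env

    def reset(self, **kwargs):
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        return obs, reward, done, info
```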
Hope you are doing well and appreciate your repository.
I want to visualize the training, and specifically the ant's actions in the MuJoCo environment, while it is training. Could you give me some guidance?
I run the following:
python3.6 curriculum/experiments/goals/maze_ant/maze_ant_gan.py
Hi,
Here is my screen; do you know where the rendering issue could come from?
Also, when I do env.reset() I get the (x, y) coordinates (-5.84, 3.77), and the goal is (4.4).
Is there any correlation between the MAZE_STRUCTURE and the (x, y) coordinates (is it a 1:1 scale)?
(I used maze_id=11 for the point-mass env; is that the right id?)
Thank you very much!
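To make the question concrete, here is a hedged sketch of a grid-cell to world-coordinate mapping, assuming an rllab-style maze scaling factor; the actual MAZE_SIZE_SCALING in this repo may differ, so treat the numbers as illustrative only:

```python
# Hypothetical grid-cell -> world (x, y) mapping; the scaling value is
# a placeholder, not a verified value from this repo.
def cell_to_xy(row, col, scaling=4.0):
    # column index maps to x, row index maps to y
    return col * scaling, row * scaling
```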
Hi,
I played with the env module and tried to set the start state arbitrarily close to the goal:
goal_coordinates = [0, 4, 0.55, 1, 0, 0, 0, 0, 1, 0, -1, 0, -1, 0, 1]
env.reset(init_state=goal_coordinates)
next_state, reward, done, d = env.step("some action")
x, y = env.wrapped_env.get_body_com("torso")[:2]
Even when I have x = 0 and y = 4, I get a reward of 0...
I noticed this line in your code
https://github.com/florensacc/rllab-curriculum/blob/master/curriculum/envs/maze/maze_env.py#L250
where self.coef_inner_rew=0...
I noticed that you modify the step method compared to the rllab one:
https://github.com/rll/rllab/blob/master/rllab/envs/mujoco/maze/maze_env.py#L297-L301
How does your step method work?
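My current guess, as a hedged sketch based on the coef_inner_rew pattern in maze_env.py (names are illustrative, not the repo's exact code):

```python
# Sketch of how the maze wrapper's step could combine rewards.
def maze_step_reward(inner_reward, reached_goal,
                     coef_inner_rew=0.0, goal_reward=1.0):
    # With coef_inner_rew = 0 the wrapped env's own reward is discarded,
    # so the agent only sees the sparse goal bonus - which would explain
    # a reward of 0 near (0, 4) if the goal test hasn't fired yet.
    return coef_inner_rew * inner_reward + (goal_reward if reached_goal else 0.0)
```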
Thank you,
from curriculum.envs.maze.maze_ant.ant_maze_start_env import AntMazeEnv gives 41-dimensional states, while from curriculum.envs.maze.maze_ant.ant_maze_env import AntMazeEnv gives 131-dimensional states. The paper mentions that the dimension is 41, but the default implementation here uses 131. It was a tricky one to notice!
Hi,
I believe the files to load to get the evaluation points for key insertion are listed here:
https://github.com/florensacc/rllab-curriculum/blob/master/data_upload/state_collections/read_me
Which file was used to get the results in the paper?
Thank you !
Hi,
According to your paper, you apply Brownian motion (normal with mean 0 and variance 1) to generate new seed states, but according to this line
https://github.com/florensacc/rllab-curriculum/blob/master/curriculum/envs/start_env.py#L233
it seems that you are applying a uniform random action with range env.action_space.bounds for the AntMaze environment.
Can you explain why the action is not N(0, I)?
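To spell out the two schemes I am contrasting (illustrative only; this is not the repo's code, and the bounds/sigma are placeholders):

```python
import numpy as np

rng = np.random.default_rng(0)

def brownian_state_step(state, sigma=1.0):
    # Paper-style Brownian step: perturb the state with N(0, sigma^2 I).
    state = np.asarray(state, dtype=float)
    return state + rng.normal(0.0, sigma, size=state.shape)

def uniform_action(lb, ub):
    # What the linked line appears to do: sample a uniform random action
    # within the env's action bounds and roll the dynamics forward.
    return rng.uniform(lb, ub)
```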
Thank you very much.
Hi,
when using the wrapped env I get nan values when I reset the env:
env.reset()
array([ 0.00000000e+000, 4.00000000e+000, 1.58101007e-322,
1.00000000e+000, 0.00000000e+000, 0.00000000e+000,
0.00000000e+000, 0.00000000e+000, 0.00000000e+000,
4.09214812e-316, 0.00000000e+000, nan,
0.00000000e+000, 0.00000000e+000, 0.00000000e+000,
0.00000000e+000, 0.00000000e+000, 0.00000000e+000,
0.00000000e+000, 0.00000000e+000, 0.00000000e+000,
0.00000000e+000, 0.00000000e+000, 0.00000000e+000,
0.00000000e+000, 0.00000000e+000, 0.00000000e+000,
0.00000000e+000, 0.00000000e+000, 1.00000000e+000,
0.00000000e+000, 0.00000000e+000, 0.00000000e+000,
1.00000000e+000, 0.00000000e+000, 0.00000000e+000,
0.00000000e+000, 1.00000000e+000, nan,
nan, nan, 0.00000000e+000,
4.00000000e+000])
and before that I hit a CUDA warning:
mod.cu(3025): warning: conversion from a string literal to "char *" is deprecated
Do you have any idea where this could come from?
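As a quick diagnostic (an illustrative helper, not repo code): values like 1.58e-322 alongside nan usually suggest uninitialized memory in qpos/qvel, and checking finiteness right after reset() narrows down where they first appear.

```python
import numpy as np

def observation_is_finite(obs):
    # True only if every entry is a normal finite float (no nan/inf).
    return bool(np.isfinite(np.asarray(obs, dtype=float)).all())
```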
Thank you!
Hi,
Wonderful work and thank you for sharing the code.
I am just curious: could Goal GAN solve the sparse-reward Atari game Montezuma's Revenge? Did you try this game?
Hi, I'm running the key-hole manipulation experiments with the command in the README, and it's been running for over an hour locally on my machine (a 2014 quad-core MBP). It seems a lot of reruns are happening. I was wondering whether some convergence criterion is not being met, or is the expected duration on the order of hours?
Hi,
I was wondering if you could tell us how long training takes for each environment with the same parameters as in the paper (or at least an estimate with your setup)? A few days?
Thank you very much
Hi,
in the paper you said that for the PointMass and Ant envs you used a batch size of 50,000, but reading your code it seems to be a batch size of 20,000 repeated 5 times:
https://github.com/florensacc/rllab-curriculum/blob/master/curriculum/experiments/starts/maze/maze_brownian.py#L94-L95
Can you clarify one point: in the performance curves reported in the paper, what does one learning iteration correspond to? (Figure 2a, page 7/14)
My understanding is: 5 inner iterations of TRPO, each with a batch size of 50,000?
Could you also make this explicit for Ant Maze?
https://github.com/florensacc/rllab-curriculum/blob/master/curriculum/experiments/starts/maze/maze_ant/maze_ant_brownian.py#L108-L109
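To make the two readings concrete (the figures below come from the linked config lines and the paper, not from verified repo defaults):

```python
# Arithmetic behind the question; all numbers are quoted, not verified.
pg_batch_size = 20_000   # batch size per TRPO update in the linked config
inner_iters = 5          # TRPO updates per outer iteration
samples_per_outer_iter = pg_batch_size * inner_iters  # 100,000

paper_batch_size = 50_000  # batch size reported in the paper
# Hence the question: does one x-axis "learning iteration" mean one
# outer iteration (100,000 samples) or something else matching 50,000?
```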
Thank you very much for your time !