florensacc / rllab-curriculum

License: Other

Python 86.46% C 7.47% C++ 2.60% Makefile 0.01% Ruby 0.45% Mako 0.17% CSS 0.30% JavaScript 1.04% Shell 0.17% HTML 1.24% Dockerfile 0.09%

rllab-curriculum's People

Contributors

astooke, coorsbenjamin, davheld, dementrock, djfoote, florensacc, gwthomas, haarnoja, hrtang, iclavera, jonasschneider, joschu, markusrw, neocxi, openai-sys-okta-integration, pcmoritz, reinhouthooft, shhuang, young-geng


rllab-curriculum's Issues

the definition of 'done' in the step function of arm3d_disc_env

The definition of 'done' in envs/arm3d/arm3d_disc_env.py is as follows:

done = False
if self.kill_outside and (distance_to_goal > self.kill_radius):
    print("******** OUT of region ********")
    done = True

Does that mean 'done' is triggered only when the disc is far from the goal?

And in envs/arm3d/arm3d_move_peg_env.py:

if reward > -0.02:
    done = True

means 'done' is triggered when the disc is close enough to the goal.
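For reference, here is how I read the two termination rules side by side (a sketch with made-up constants, not the exact repo code):

# Sketch of my reading of the two termination rules (hypothetical
# constants; not the actual repo code).
KILL_RADIUS = 0.4          # stands in for self.kill_radius
SUCCESS_THRESHOLD = -0.02  # threshold used in arm3d_move_peg_env.py

def disc_env_done(distance_to_goal, kill_outside=True):
    # arm3d_disc_env.py: the episode ends when the disc leaves the region
    # around the goal, i.e. termination on failure, not on success.
    return kill_outside and distance_to_goal > KILL_RADIUS

def move_peg_env_done(reward):
    # arm3d_move_peg_env.py: the episode ends when the reward is high
    # enough, i.e. termination on success.
    return reward > SUCCESS_THRESHOLD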
I can understand the second definition but not the first. Could you please answer my question?

how to render?

Hi, could you please tell me how to render Arm3d?
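In case it helps to be concrete, this is the kind of rollout loop I was expecting to work (a sketch; the import path and class name are my guesses, and I have not verified the render() call against this repo):

# Hypothetical render loop (the usual reset/step/render interface is
# assumed; the import path below is a guess, not verified in the repo).
from curriculum.envs.arm3d.arm3d_disc_env import Arm3dDiscEnv

env = Arm3dDiscEnv()
obs = env.reset()
for _ in range(1000):
    action = env.action_space.sample()  # random actions, just to see motion
    obs, reward, done, info = env.step(action)
    env.render()
    if done:
        obs = env.reset()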
Thank you!

Clarification about the training parameters? (mainly TRPO)

Hi,

in the paper you said that for the PointMass and Ant envs you used a batch size of 50,000, but reading your code it seems to be a batch size of 20,000 repeated 5 times:
https://github.com/florensacc/rllab-curriculum/blob/master/curriculum/experiments/starts/maze/maze_brownian.py#L94-L95

Can you clarify one point:
in the performance curves reported in the paper, what does one learning iteration correspond to? (Figure 2a, page 7/14)
My understanding is: 5 inner_iters of TRPO, each with a batch size of 50,000?

Could you also make this explicit for Ant Maze?
https://github.com/florensacc/rllab-curriculum/blob/master/curriculum/experiments/starts/maze/maze_ant/maze_ant_brownian.py#L108-L109
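To make my reading concrete, here is the loop structure I infer from the linked lines (a sketch of my assumptions, not the repo's code; OUTER_ITERS is made up):

# My mental model of the training loop (sketch; only inner_iters=5 and
# pb=20000 come from maze_brownian.py, the rest is hypothetical).
OUTER_ITERS = 100   # hypothetical number of outer curriculum iterations
INNER_ITERS = 5     # inner_iters in maze_brownian.py
PB = 20000          # pb (batch size) in maze_brownian.py

for outer in range(OUTER_ITERS):
    # ... sample new start states with Brownian motion ...
    for inner in range(INNER_ITERS):
        pass  # one TRPO update on a batch of PB timesteps
    # Is one 'learning iteration' in Figure 2a one outer loop
    # (5 x 20,000 = 100,000 samples), or something else?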

Thank you very much for your time!

Question about the LSGAN loss (hyperparameters a, b, and c)

Hi,
According to your paper, you directly use the original hyperparameters reported in Mao et al. (2017), where a = -1, b = 1, and c = 0. But I am confused about the discriminator loss in this line:

tf.square(self._generator_output - 1)

It seems that you use a = 1 in the third term of the discriminator loss: (D(G(z)) - a)^2.
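For reference, here are the LSGAN objectives from Mao et al. (2017) that I am comparing against (a sketch with my own variable names, not the repo's code):

# LSGAN losses per Mao et al. (2017); variable names are mine.
import tensorflow as tf

a, b, c = -1.0, 1.0, 0.0  # the hyperparameters cited in the paper

def lsgan_losses(d_real, d_fake):
    # d_real = D(x) on real samples, d_fake = D(G(z)) on generated ones.
    d_loss = (0.5 * tf.reduce_mean(tf.square(d_real - b))
              + 0.5 * tf.reduce_mean(tf.square(d_fake - a)))
    g_loss = 0.5 * tf.reduce_mean(tf.square(d_fake - c))
    return d_loss, g_loss

# With a = -1 the fake term should be tf.square(d_fake + 1), which is why
# tf.square(self._generator_output - 1) looks like a = 1 to me.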

Can you explain why the code here is tf.square(self._generator_output - 1)?
Thank you very much.

Having "nan" value when using wrapped env "GoalStartExplorationEnv"

Hi,

when using the wrapped env, I get nan values when I reset it:
env.reset()

array([ 0.00000000e+000, 4.00000000e+000, 1.58101007e-322,
1.00000000e+000, 0.00000000e+000, 0.00000000e+000,
0.00000000e+000, 0.00000000e+000, 0.00000000e+000,
4.09214812e-316, 0.00000000e+000, nan,
0.00000000e+000, 0.00000000e+000, 0.00000000e+000,
0.00000000e+000, 0.00000000e+000, 0.00000000e+000,
0.00000000e+000, 0.00000000e+000, 0.00000000e+000,
0.00000000e+000, 0.00000000e+000, 0.00000000e+000,
0.00000000e+000, 0.00000000e+000, 0.00000000e+000,
0.00000000e+000, 0.00000000e+000, 1.00000000e+000,
0.00000000e+000, 0.00000000e+000, 0.00000000e+000,
1.00000000e+000, 0.00000000e+000, 0.00000000e+000,
0.00000000e+000, 1.00000000e+000, nan,
nan, nan, 0.00000000e+000,
4.00000000e+000])
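For clarity, this is how I am locating the NaNs (plain NumPy, nothing repo-specific; env is assumed already constructed):

# Locate NaN entries in the reset observation.
import numpy as np

obs = env.reset()
bad = np.isnan(obs)
if bad.any():
    print("NaN at observation indices:", np.flatnonzero(bad))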

And before that, I see a CUDA-related warning:

mod.cu(3025): warning: conversion from a string literal to "char *" is deprecated

Do you have any idea where it could come from?

Thank you!

running arm3d env experiments

I was trying to run experiments with the arm3d envs:
$ python3 curriculum/experiments/starts/arm3d/arm3d_key/arm3d_key_brownian.py
However, I am getting the following error:

Traceback (most recent call last):
File "/home/arpit/RL3/rllab-curriculum/scripts/run_experiment_lite.py", line 139, in
run_experiment(sys.argv)
File "/home/arpit/RL3/rllab-curriculum/scripts/run_experiment_lite.py", line 123, in run_experiment
method_call(variant_data)
File "/home/arpit/RL3/rllab-curriculum/curriculum/experiments/starts/arm3d/arm3d_key/arm3d_key_brownian_algo.py", line 101, in run_task
open(osp.join(config.PROJECT_PATH, load_dir, 'all_feasible_states.pkl'), 'rb'))
FileNotFoundError: [Errno 2] No such file or directory: '/home/arpit/RL3/rllab-curriculum/data_upload/state_collections/all_feasible_states.pkl'

Could you kindly let me know if I have to run another command to generate the pkl file before running the experiment script?

Problem rendering PointMazeEnv and reading coordinates

Hi,

Here is my screen; do you know where the rendering issue could come from?
Also, when I do env.reset() I get the (x, y) coordinates (-5.84, 3.77)... and the goal is (4, 4).

Is there any correlation between the MAZE_STRUCTURE and the (x, y) coordinates (is it a 1:1 scale)?
(I took maze_id=11 for the point-mass env; that's the right id, I guess?)

[screenshot: 20181022_135701]

Thank you very much!

Evaluation point for PointMass environment

Hi,

I understand that for the evaluation phase of AntMaze you just let the Ant start from init_pos:
https://github.com/florensacc/rllab-curriculum/blob/master/curriculum/experiments/starts/maze/maze_ant/maze_ant_brownian_algo.py#L140-L155

But for PointMass, is it possible to have a list of init_pos too?
I tried to understand your code: apparently you are using the test_and_plot_policy method:

test_and_plot_policy(policy, env, as_goals=False, max_reward=v['max_reward'], sampling_res=sampling_res,

But I couldn't find the update_init_selector method that apparently modifies the initial states when resetting the env:

train_env.update_init_selector(FixedStateGenerator(init_state))

https://github.com/florensacc/rllab-curriculum/search?q=update_init_selector&unscoped_q=update_init_selector
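To be explicit about what I was looking for, this is the evaluation loop I expected to find (a sketch; update_init_selector and FixedStateGenerator are copied from the snippet above and may not exist under those names, and the init states are made up):

# Hypothetical evaluation over a list of initial states (sketch only;
# the update_init_selector API is taken from the snippet above).
init_positions = [(-5.0, 3.0), (0.0, 0.0), (4.0, 4.0)]  # made-up states

for init_state in init_positions:
    train_env.update_init_selector(FixedStateGenerator(init_state))
    obs = train_env.reset()
    # ... roll out the policy from this start and record success ...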

Thank you very much.

Problem running MazeAntEnv

I am trying to run your algorithm on the ant maze. However, GaussianMLPPolicy raises the following assertion error:

assert isinstance(env_spec.action_space, Box)
AssertionError

Any idea why this is a problem?

using OpenAI gym env

Thanks for releasing the code. I was wondering if you have made an example in the OpenAI gym environment format, as it is more widely adopted. I have some environments in OpenAI gym's format. It would be really helpful if you could share such an example, if you have one.
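For context, this is roughly what I tried (a sketch assuming this fork keeps rllab's GymEnv and normalize wrappers; I have not verified the import paths in this repo):

# Wrapping a gym env for rllab-style code (import paths assumed from
# upstream rllab, not verified in this fork).
from rllab.envs.gym_env import GymEnv
from rllab.envs.normalized_env import normalize

env = normalize(GymEnv("Pendulum-v0"))
obs = env.reset()
obs, reward, done, info = env.step(env.action_space.sample())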

Doesn't work after training?

The robot does not seem to work after training for 5,000 iterations on the key-insertion task with the default command. It seems to pull the key out when run with sim_policy.py. Am I missing something?

Wrong environments in the default setup

from curriculum.envs.maze.maze_ant.ant_maze_start_env import AntMazeEnv has 41-dimensional states, while from curriculum.envs.maze.maze_ant.ant_maze_env import AntMazeEnv gives 131-dimensional states. The paper says the state dimension is 41, but the default implementation here uses 131. It was a tricky one to notice!
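A quick way to reproduce the discrepancy (a sketch; the imports come from the text above, and no-argument construction is an assumption on my part):

# Compare the observation dimensions of the two AntMazeEnv variants
# (no-argument construction is an assumption).
from curriculum.envs.maze.maze_ant.ant_maze_start_env import AntMazeEnv as StartEnv
from curriculum.envs.maze.maze_ant.ant_maze_env import AntMazeEnv as DefaultEnv

print(StartEnv().reset().shape)    # expected: (41,)
print(DefaultEnv().reset().shape)  # expected: (131,)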

Convergence criteria for arm3d?

Hi, I'm running the key-hole manipulation experiment with the command in the README, and it has been running for over an hour locally on my machine (2014 MBP, quad-core). A lot of reruns seem to be happening. Is it possible that some convergence criterion is not being met, or is the expected duration on the order of hours?

Visualize the training

Hope you are doing well; I appreciate your repository.

I want to visualize the training, specifically the ant's actions in the Mujoco environment while it is training. Could you give me some guidance?

I run the following:
python3.6 curriculum/experiments/goals/maze_ant/maze_ant_gan.py

Question about the reward

Hi,

I played with the env module and tried to set the start state arbitrarily close to the goal:

goal_coordinates = [0, 4, 0.55, 1, 0, 0, 0, 0, 1, 0, -1, 0, -1, 0, 1]
env.reset(init_state=goal_coordinates)

next_state, reward, done, d = env.step("some action")

x, y = env.wrapped_env.get_body_com("torso")[:2]

Even when I have x = 0 and y = 4, I get a reward of 0...
I noticed this line in your code
https://github.com/florensacc/rllab-curriculum/blob/master/curriculum/envs/maze/maze_env.py#L250
where self.coef_inner_rew=0...

I noticed that you modified the step method compared to the rllab one:
https://github.com/rll/rllab/blob/master/rllab/envs/mujoco/maze/maze_env.py#L297-L301
How does your step method work?
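For reference, this is how I currently read the reward composition (a sketch of my understanding with made-up names, not the actual code):

# My reading of the maze reward (names are hypothetical).
coef_inner_rew = 0.0  # as set around maze_env.py#L250

def maze_reward(inner_rew, reached_goal, goal_rew=1.0):
    # With coef_inner_rew = 0 the inner (locomotion) reward is zeroed out,
    # so the reward is 0 unless the goal-reached condition fires -- is
    # that why I see 0 even at x = 0, y = 4?
    return coef_inner_rew * inner_rew + (goal_rew if reached_goal else 0.0)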

Thank you,
