jonasrothfuss / promp

Implementation of Proximal Meta-Policy Search (ProMP) as well as related Meta-RL algorithms. Includes a useful experiment framework for Meta-RL.

Home Page: https://sites.google.com/view/pro-mp

License: MIT License

Language: Python 100.00%
Topics: machine-learning meta-learning reinforcement-learning

promp's People

Contributors

atomicvar, dennisl88, iclavera, jonasrothfuss

promp's Issues

It seems that MPI is not really needed because it is only used in the logger

Hey, the documentation says "Many components of the Meta-RL algorithm are parallelized using either MPI or Tensorflow in order to ensure efficient use of all CPU cores.", but MPI only appears in the logger, which means it is not actually used, since the code is not executed under MPI. Am I missing something?
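
For context, a minimal sketch (not the repo's logger, just the common pattern I am referring to) of MPI showing up only in logging: the rank check merely prevents duplicate log output when running under mpirun, and changes nothing in a single-process run.

    from mpi4py import MPI  # assumption: this is roughly how a logger would use MPI

    def log(msg):
        # Only the MPI rank-0 process writes output; without mpirun there is
        # exactly one process with rank 0, so this guard is a no-op.
        if MPI.COMM_WORLD.Get_rank() == 0:
            print(msg)

    log("only rank 0 prints this")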

Mujoco env versions

Thanks for sharing the repo!

I was wondering which version of the MuJoCo envs the repo uses: 131, 150, or 200?

Confusion about experiments in the full_code branch

Hey, could you please check the experiment code in the full_code branch? For example, in the variance comparison experiments I cannot see where your proposed algorithm is; I only find dice_maml, vpg_maml and vpd_dice_maml, which is confusing since these names do not seem to appear in the paper.
Besides, are you assuming that each dimension is independent when computing the variance of the gradient?
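
To make the second question concrete, here is a minimal sketch (my own, not taken from the repo) of what I mean by treating each parameter dimension as independent: only the diagonal of the empirical covariance of the gradient samples is kept.

    import numpy as np

    def gradient_variance(grad_samples):
        # grad_samples: array of shape (K, D), one flattened gradient estimate per sample
        grads = np.asarray(grad_samples)
        per_dim_var = grads.var(axis=0, ddof=1)  # diagonal of the empirical covariance, shape (D,)
        return per_dim_var.mean()                # scalar summary: mean variance across dimensions

    # Example: 10 gradient estimates of a 5-dimensional parameter vector
    print(gradient_variance(np.random.randn(10, 5)))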

Sawyer Experiments

I see that in envs you have some Sawyer push / sliding tasks. Do you have any experimental results for these? I couldn't find any in your paper.

Question about Figures in the paper and log files

Hi,

Thanks for sharing this repo.

I was wondering which column(s) of progress.csv were used to plot Figures 2, 3, and 7? Based on my understanding of the paper, I figure it is "Step_1-AverageReturn"; is that right?

In addition, were the log data for Figures 2, 3, and 7 collected by running the following?
python run_scripts/maml_run_mujoco.py
python run_scripts/pro-mp_run_mujoco.py

And finally, where are the 3 random seeds specified?

It would be great if you could let me know about these.

Thanks.
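
For reference, this is the kind of plot I have in mind (a minimal sketch; the log path is hypothetical and the column name is only my guess from above):

    import pandas as pd
    import matplotlib.pyplot as plt

    # Plot the post-adaptation average return over training iterations
    df = pd.read_csv("data/local/pro-mp/progress.csv")  # hypothetical log path
    plt.plot(df["Step_1-AverageReturn"])                # column name assumed, see question above
    plt.xlabel("Iteration")
    plt.ylabel("Average return after adaptation")
    plt.show()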

Why Pickle Environment

Hi, Jonas. Thanks for sharing this great project. I am a little confused about why we pickle the environment when creating the processes in the MetaParallelEnvExecutor:

    self.ps = [
        Process(target=worker, args=(work_remote, remote, pickle.dumps(env), envs_per_task, max_path_length, seed))
        for (work_remote, remote, seed) in zip(self.work_remotes, self.remotes, seeds)]  # Why pass work remotes?

and then unpickle it in the process:

envs = [pickle.loads(env_pickle) for _ in range(n_envs)]

Why don't we just initialize the environment in the worker process, which would eliminate the pickle and unpickle code?
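
For concreteness, here is a minimal sketch (simplified, not the repo's code) of the alternative I mean: passing a picklable constructor to the worker and building the environments there.

    from multiprocessing import Process, Pipe

    def make_env():
        # hypothetical factory; in ProMP this would construct the meta-RL environment
        return object()

    def worker(remote, env_factory, n_envs):
        envs = [env_factory() for _ in range(n_envs)]  # envs created inside the worker
        remote.send(len(envs))                         # placeholder for the real sampling loop
        remote.close()

    if __name__ == "__main__":
        parent, child = Pipe()
        p = Process(target=worker, args=(child, make_env, 2))
        p.start()
        print(parent.recv())  # -> 2
        p.join()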

How to use your model?

Thanks for your code. After running the programs in the run_scripts folder, we get "Training finished".
Could you tell me how to use the model that was created?
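
In case it helps to be concrete, this is roughly what I would like to do (a generic sketch; the snapshot path, dictionary keys and method names are my assumptions and are not confirmed by the repo):

    import joblib  # or pickle, depending on how the run script saves snapshots

    data = joblib.load("data/local/run_1/params.pkl")  # hypothetical snapshot path
    policy, env = data["policy"], data["env"]          # assumed snapshot keys

    obs = env.reset()
    for _ in range(200):
        action, _ = policy.get_action(obs)  # assumed rllab-style policy interface
        obs, reward, done, info = env.step(action)
        if done:
            obs = env.reset()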

Difference between maml and e-maml

Hi, thanks for opensourcing the code!

I am wondering what the difference between the e-maml and maml code is. A line-by-line comparison seems to suggest that they are identical.

Would appreciate your insights. Thanks!

Regarding the Walker2d-Randparams Environment

Thank you for your inspiring work and for open-sourcing the code. I have a question regarding the walker random-parameters environment. The walker cannot actually walk even when the reward is 800; this is the case for all algorithms that use this environment. When doing a comparative analysis, are we supposed to compare only the reward, or is the reward scaled for this environment?

KeyError: 'default'

python experiments/all_envs_eval/trpo_run_all.py

WARNING: The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:

Traceback (most recent call last):
File "experiments/all_envs_eval/trpo_run_all.py", line 138, in
run_sweep(run_experiment, sweep_params, EXP_NAME, INSTANCE_TYPE)
File "/home/ljy/文档/ProMP-full_code/experiment_utils/run_sweep.py", line 27, in run_sweep
local_output_dir=os.path.join(config.DATA_DIR, 'local', exp_name))
File "/home/ljy/文档/ProMP-full_code/doodad/doodad/easy_sweep/launcher.py", line 38, in init
self.mount_out_s3 = mount.MountS3(s3_path='exp_logs', mount_point=docker_output_dir, output=True)
File "/home/ljy/文档/ProMP-full_code/doodad/doodad/mount.py", line 99, in init
s3_bucket = AUTOCONFIG.s3_bucket()
File "/home/ljy/文档/ProMP-full_code/doodad/doodad/ec2/autoconfig.py", line 16, in s3_bucket
return self.config['default']['s3_bucket_name']
File "/home/ljy/anaconda3/envs/mujoco-py/lib/python3.5/configparser.py", line 956, in getitem
raise KeyError(key)
KeyError: 'default'

What might be the cause?
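
Judging from the traceback, doodad's autoconfig parses an INI-style config file with configparser and expects a [default] section containing s3_bucket_name, so the KeyError means that section is missing. A minimal sketch of the failing lookup (the config path below is only a placeholder; autoconfig reads its own file):

    import configparser

    # The parsed file is expected to contain something like:
    #   [default]
    #   s3_bucket_name = my-bucket
    config = configparser.ConfigParser()
    config.read("config.ini")  # placeholder path
    print(config["default"]["s3_bucket_name"])  # raises KeyError: 'default' if the section is absent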

Error of "Expired activation key"

Hi, I have set up ProMP on my server with Docker, but when I tried to run the pro-mp_run_point_mass.py script, it returned an "Expired activation key" error. Do you know why this happens? I'm also not sure where to put my MuJoCo license, but this problem does not seem to be caused by MuJoCo, as the point mass environment does not require it. Thanks.

Which name corresponds to ProMP in the full_code branch?

Hi. I am trying to reproduce the results on the gradient variance comparison. I know from previous issues that the code is in https://github.com/jonasrothfuss/ProMP/tree/full_code/experiments/gradient_variance. There seem to be three algorithms tested in run_sweep.py, named VPG, VPG_DICE and DICE, respectively.

But in your paper you show two curves, LVC and DICE, and I am confused about the mapping. Does the DICE curve correspond to the name 'DICE' in the script? Then which name does the LVC curve correspond to? And what does the third name correspond to?
