
jonasrothfuss / promp


Implementation of Proximal Meta-Policy Search (ProMP) as well as related Meta-RL algorithms. Includes a useful experiment framework for Meta-RL.

Home Page: https://sites.google.com/view/pro-mp

License: MIT License

Language: Python 100.00%

Topics: reinforcement-learning, meta-learning, machine-learning

promp's Introduction


ProMP: Proximal Meta-Policy Search

Implementations corresponding to ProMP (Rothfuss et al., 2018). Overall this repository consists of two branches:

  1. master: a lightweight branch that provides the necessary code to run Meta-RL algorithms such as ProMP, E-MAML, and MAML. This branch is meant to provide an easy start with Meta-RL and can be integrated into other projects and setups.
  2. full-code: a branch that provides the comprehensive code used to produce the experimental results in Rothfuss et al. (2018). This includes the experiment and plotting scripts needed to reproduce the experimental results in the paper.

The code is written in Python 3 and builds on Tensorflow. Many of the provided reinforcement learning environments require the Mujoco physics engine. The code was developed with modularity and computational efficiency in mind: many components of the Meta-RL algorithms are parallelized using either MPI or Tensorflow in order to ensure efficient use of all CPU cores.

Documentation

An API specification and explanation of the code components can be found here. The documentation can also be built locally by running the following commands:

# ensure that you are in the root folder of the project
cd docs
# install the sphinx documentation tool dependencies
pip install -r requirements.txt
# build the documentation
make clean && make html
# now the html documentation can be found under docs/build/html/index.html

Installation / Dependencies

The provided code can be run either A) in the docker container we provide or B) with Python directly on your local machine. The latter requires several installation steps to set up the dependencies.

A. Docker

If not installed yet, set up docker on your machine. Then pull our docker container jonasrothfuss/promp from Docker Hub:

docker pull jonasrothfuss/promp

All the necessary dependencies are already installed inside the docker container.
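To work inside the container, you can start an interactive shell. A minimal sketch, assuming mujoco-py inside the image looks for the license under /root/.mujoco (the mount point is an assumption; adjust it to wherever the image expects the key):

docker run --rm -it -v ~/.mujoco/mjkey.txt:/root/.mujoco/mjkey.txt jonasrothfuss/promp /bin/bash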

B. Anaconda or Virtualenv

B.1. Installing MPI

Ensure that you have a working MPI implementation (see here for more instructions).

For Ubuntu you can install MPI through the package manager:

sudo apt-get install libopenmpi-dev
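To sanity-check the MPI installation, you can run a trivial command across two processes (mpirun ships with OpenMPI):

mpirun -np 2 hostname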
B.2. Create either venv or conda environment and activate it
Virtualenv
pip install --upgrade virtualenv
virtualenv <venv-name>
source <venv-name>/bin/activate
Anaconda

If not done yet, install anaconda by following the instructions here. Then create an anaconda environment, activate it, and install the requirements from requirements.txt.

conda create -n <env-name> python=3.6
source activate <env-name>
B.3. Install the required python dependencies
pip install -r requirements.txt
B.4. Set up the Mujoco physics engine and mujoco-py

For running the majority of the provided Meta-RL environments, the Mujoco physics engine as well as a corresponding python wrapper are required. For setting up Mujoco and mujoco-py, please follow the instructions here.
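As a rough sketch of what that setup usually involves (assuming Mujoco 1.5 installed under ~/.mujoco with the license key at ~/.mujoco/mjkey.txt; check requirements.txt for the exact mujoco-py version this repo expects):

# unpack mjpro150 into ~/.mujoco and place your license key at ~/.mujoco/mjkey.txt
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HOME/.mujoco/mjpro150/bin
pip install mujoco-py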

Running ProMP

To run the ProMP algorithm on the point environment (no Mujoco needed) with default configurations, execute:

python run_scripts/pro-mp_run_point_mass.py 

To run the ProMP algorithm in a Mujoco environment with default configurations:

python run_scripts/pro-mp_run_mujoco.py 

The run configuration can be changed either directly in the run script or by providing a JSON configuration file with all the necessary hyperparameters. A JSON configuration file can be passed via the --config_file flag; additionally, the dump path can be specified through the --dump_path flag:

python run_scripts/pro-mp_run.py --config_file <config_file_path> --dump_path <dump_path>
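For illustration, such a configuration file is a flat JSON dictionary of hyperparameters. The keys below are hypothetical examples rather than the exact schema; the authoritative set of keys is the default config dict inside the run script:

{
    "env": "HalfCheetahRandDirecEnv",
    "meta_batch_size": 40,
    "rollouts_per_meta_task": 20,
    "max_path_length": 100,
    "num_inner_grad_steps": 1,
    "discount": 0.99,
    "n_itr": 1000
}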

Additionally, in order to run the gradient-based meta-learning methods MAML and E-MAML (Finn et al., 2017 and Stadie et al., 2018) in a Mujoco environment with the default configuration, execute, respectively:

python run_scripts/maml_run_mujoco.py 
python run_scripts/e-maml_run_mujoco.py 

Cite

To cite ProMP, please use:

@article{rothfuss2018promp,
  title={ProMP: Proximal Meta-Policy Search},
  author={Rothfuss, Jonas and Lee, Dennis and Clavera, Ignasi and Asfour, Tamim and Abbeel, Pieter},
  journal={arXiv preprint arXiv:1810.06784},
  year={2018}
}

Acknowledgements

This repository includes environments introduced in (Duan et al., 2016; Finn et al., 2017).

promp's People

Contributors

atomicvar, dennisl88, iclavera, jonasrothfuss


promp's Issues

Sawyer Experiments

I see that in envs, you have some Sawyer push / sliding tasks. Do you have any experimental results on these? I couldn't find them in your paper.

Mujoco env versions

Thanks for sharing the repo!

I was wondering which version of Mujoco the environments in this repo use: 131, 150, or 200?

Why Pickle Environment

Hi, Jonas. Thanks for sharing this great project. I am a little confused about why we pickle the environment when creating the processes in the MetaParallelEnvExecutor:

self.ps = [
    Process(target=worker, args=(work_remote, remote, pickle.dumps(env), envs_per_task, max_path_length, seed))
    for (work_remote, remote, seed) in zip(self.work_remotes, self.remotes, seeds)]  # Why pass work remotes?

and then unpickle it in the process:

envs = [pickle.loads(env_pickle) for _ in range(n_envs)]

Why don't we just initialize the environment in the process worker? That would eliminate the pickling and unpickling code.

Regarding the Walker2d-Randparams Environment

Thank you for your inspiring work and for open-sourcing the code. I have a question regarding the walker random-parameters environment. The walker cannot walk even when the reward is 800, and this is the case for all the algorithms that use this environment. When doing a comparative analysis, are we supposed to compare only the reward, or is the reward scaled for this environment?

Which name corresponds to ProMP in the full_code branch?

Hi. I am trying to reproduce the results of the gradient variance comparison. I know from previous issues that the code is in https://github.com/jonasrothfuss/ProMP/tree/full_code/experiments/gradient_variance. There seem to be three algorithms tested in run_sweep.py, with the names VPG, VPG_DICE, and DICE, respectively.

But in your paper you show two curves, LVC and DICE, and I am confused about that. Does the DICE curve correspond to the name 'DICE' in the script? Then which name does the LVC curve correspond to? And what does the third name correspond to?

Confusion about experiments in the full_code branch

Hey, could you please check the experiment code in the full_code branch? E.g., in the variance comparison experiments I cannot see where your proposed algorithm is; there are only dice_maml, vpg_maml, and vpg_dice_maml, which is confusing since these names do not seem to appear in the paper.
Besides, are you assuming that each dimension is independent when computing the variance of the gradient?

It seems that MPI is not really needed, since it is only used in the logger

Hey, the documentation says "Many components of the Meta-RL algorithm are parallelized using either MPI or Tensorflow in order to ensure efficient use of all CPU cores", but MPI only appears in the logger, which means it is not necessarily used, since the code won't be executed with MPI. Am I missing something?

KeyError: 'default'

python experiments/all_envs_eval/trpo_run_all.py

WARNING: The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:

Traceback (most recent call last):
  File "experiments/all_envs_eval/trpo_run_all.py", line 138, in <module>
    run_sweep(run_experiment, sweep_params, EXP_NAME, INSTANCE_TYPE)
  File "/home/ljy/文档/ProMP-full_code/experiment_utils/run_sweep.py", line 27, in run_sweep
    local_output_dir=os.path.join(config.DATA_DIR, 'local', exp_name))
  File "/home/ljy/文档/ProMP-full_code/doodad/doodad/easy_sweep/launcher.py", line 38, in __init__
    self.mount_out_s3 = mount.MountS3(s3_path='exp_logs', mount_point=docker_output_dir, output=True)
  File "/home/ljy/文档/ProMP-full_code/doodad/doodad/mount.py", line 99, in __init__
    s3_bucket = AUTOCONFIG.s3_bucket()
  File "/home/ljy/文档/ProMP-full_code/doodad/doodad/ec2/autoconfig.py", line 16, in s3_bucket
    return self.config['default']['s3_bucket_name']
  File "/home/ljy/anaconda3/envs/mujoco-py/lib/python3.5/configparser.py", line 956, in __getitem__
    raise KeyError(key)
KeyError: 'default'

What may be the cause?

Error of "Expired activation key"

Hi, I have set up ProMP on my server with docker. But when I tried to run the pro-mp_run_point_mass.py script, it returned an error of "Expired activation key". Do you know why this happens? I'm also not sure where to put my mujoco license. But this problem does not seem to be caused by mujoco, as the point mass environment does not require it. Thanks.

How to use your model?

Thanks for your code. After running the programs in the run_scripts folder, we get "Training finished".
Could you tell me how to use the model that was created?

Difference between maml and e-maml

Hi, thanks for opensourcing the code!

I am wondering: what is the difference between the e-maml and maml code? A line-by-line comparison seems to suggest that they are identical.

Would appreciate your insights. Thanks!

Question about Figures in the paper and log files

Hi,

Thanks for sharing this repo.

I was wondering which column(s) of progress.csv were used to plot Figures 2, 3, and 7? Based on my understanding of the paper, I figure it is "Step_1-AverageReturn"; is that right?

In addition, were the log data for Figures 2, 3, and 7 collected by running the following?
python run_scripts/maml_run_mujoco.py
python run_scripts/pro-mp_run_mujoco.py

And finally, where were the 3 random seeds set?

It would be great if you could let me know about these.

Thanks.
