google-research / dads

Code for 'Dynamics-Aware Unsupervised Discovery of Skills' (DADS). Enables skill discovery without supervision, which can be combined with model-based control.

License: Apache License 2.0

Python 100.00%
reinforcement-learning skill-discovery unsupervised-learning model-based-rl deep-learning

dads's Introduction

Dynamics-Aware Discovery of Skills (DADS)

This repository is the open-source implementation of Dynamics-Aware Unsupervised Discovery of Skills (project page, arXiv). We propose a skill-discovery method that can learn skills for different agents without any rewards, while simultaneously learning a dynamics model for the skills that can be leveraged for model-based control on downstream tasks. This work was published at the International Conference on Learning Representations (ICLR), 2020.

We have also included an improved off-policy version of DADS, coined off-DADS. The details have been released in Emergent Real-World Robotic Skills via Unsupervised Off-Policy Reinforcement Learning.

In case of problems, contact Archit Sharma.

Table of Contents

  • Setup
  • Usage
  • Citation
  • Disclaimer

Setup

(1) Setup MuJoCo

Download and set up MuJoCo in ~/.mujoco. Add the LD_LIBRARY_PATH export to your ~/.bashrc:

export LD_LIBRARY_PATH=$HOME/.mujoco/mjpro150/bin:$LD_LIBRARY_PATH

(2) Setup environment

Clone the repository and set up the conda environment to run the DADS code:

cd <path_to_dads>
conda env create -f env.yml
conda activate dads-env

Usage

We give a high-level explanation of how to use the code. More details on hyperparameters can be found in configs/template_config.txt, dads_off.py, and Appendix A of the paper.

Every training run requires an experiment logging directory and a configuration file, which can be created starting from configs/template_config.txt. There are two phases: (a) training, where new skills are learned along with their skill-dynamics models, and (b) evaluation, where the learned skills are evaluated on the task associated with the environment.

For training, ensure --run_train=1 is set in the configuration file. For on-policy optimization, set --clear_buffer_every_iter=1 and ensure the replay buffer size is bigger than the number of steps collected in every iteration. For off-policy optimization, set --clear_buffer_every_iter=0 (details in the off-DADS paper linked above). Set the environment name (ensure the environment is listed in get_environment() in dads_off.py). To change the observation for skill-dynamics (for example, to learn in x-y space), set --reduced_observation and correspondingly configure process_observation() in dads_off.py. The skill space can be configured to be discrete or continuous. The optimization parameters can be tweaked, and some basic values have been set in configs/template_config.txt (more details in the paper).
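
As an illustration, a minimal on-policy training flagfile could contain just the flags discussed above, one per line in absl's flagfile format (the values are placeholders and this is only a sketch; start from configs/template_config.txt, which also sets the many flags omitted here):

--run_train=1
--environment=<environment_name>
--clear_buffer_every_iter=1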

For evaluation, ensure --run_eval=1 is set and that the experiment directory points to the same directory in which training happened. Set --num_evals if you want to record videos of randomly sampled skills from the prior distribution. After that, the script will use the learned models to execute MPC in the latent space to optimize the task reward. By default, the code will call get_environment() to load FLAGS.environment + '_goal', and will go through the list of goal coordinates specified in the eval section of the script.
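
Correspondingly, a minimal evaluation flagfile sketch (again with placeholder values; the remaining flags are assumed to come from the same config used for training, and --logdir on the command line must point to the training directory):

--run_eval=1
--num_evals=<number_of_skill_videos>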

We have provided configuration files in configs/ to reproduce the experiments in the paper. Goal evaluation is currently only set up for the MuJoCo Ant environment. The goal distribution can be changed in the evaluation part of dads_off.py.

cd <path_to_dads>
python unsupervised_skill_learning/dads_off.py --logdir=<path_for_experiment_logs> --flagfile=configs/<config_name>.txt

The specified experimental log directory will contain the tensorboard files, the saved checkpoints and the skill-evaluation videos.

Citation

To cite Dynamics-Aware Unsupervised Discovery of Skills:

@article{sharma2019dynamics,
  title={Dynamics-aware unsupervised discovery of skills},
  author={Sharma, Archit and Gu, Shixiang and Levine, Sergey and Kumar, Vikash and Hausman, Karol},
  journal={arXiv preprint arXiv:1907.01657},
  year={2019}
}

To cite off-DADS and Emergent Real-World Robotic Skills via Unsupervised Off-Policy Reinforcement Learning:

@article{sharma2020emergent,
    title={Emergent Real-World Robotic Skills via Unsupervised Off-Policy Reinforcement Learning},
    author={Sharma, Archit and Ahn, Michael and Levine, Sergey and Kumar, Vikash and Hausman, Karol and Gu, Shixiang},
    journal={arXiv preprint arXiv:2004.12974},
    year={2020}
}

Disclaimer

This is not an officially supported Google product.

dads's People

Contributors

architsharma97

dads's Issues

Config for Humanoid environment.

Hi Archit,

Thanks for sharing the code of this great work.
I would like to ask if you can provide the config file of off-policy DADS for the Humanoid environment. It seems to me that the configuration for off-policy DADS would be a bit different from the numbers specified in the paper. It would be helpful if we could have access to it. Thanks!

Problem reproducing humanoid results

Hi there. I tried to reproduce the humanoid skills demonstrated on your site, but it seems that neither of these two humanoid config files can reproduce the results. It would be great if you could check this. The commands I use are:

python unsupervised_skill_learning/dads_off.py --logdir=./exp --flagfile=configs/humanoid_offpolicy.txt
and
python unsupervised_skill_learning/dads_off.py --logdir=./exp --flagfile=configs/humanoid_onpolicy.txt

DADS reward implementation

Thank you for sharing your great code :)

I think I found that the reward function is a little different from what was defined in the paper (ICLR 2020):

# final DADS reward
intrinsic_reward = np.log(num_reps + 1) - np.log(1 + np.exp(
    np.clip(logp_altz - logp.reshape(1, -1), -50, 50)).sum(axis=0))

As far as I understand, the first reward term defined in eq. 6 of the paper is log q(s'|s,z) - log(\sum_{i=1}^{L} q(s'|s,z_i)). But the reward in this repo is defined as \sum_{i=1}^{L} (log q(s'|s,z) - log q(s'|s,z_i)) with NumPy's broadcasting functionality. May I ask if I misunderstood, or if there is a practical technique I'm missing?
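
For reference, and assuming from the variable names that logp holds log q(s'|s,z) while logp_altz stacks the L alternative log-densities log q(s'|s,z_i) (an assumption, not something documented in the repo), the snippet above is a numerically stabilized log-sum-exp rather than a sum of log-ratios. Ignoring the clipping:

r = \log(L+1) - \log\Big(1 + \sum_{i=1}^{L} e^{\log q(s'|s,z_i) - \log q(s'|s,z)}\Big)
  = \log q(s'|s,z) - \log\Big(q(s'|s,z) + \sum_{i=1}^{L} q(s'|s,z_i)\Big) + \log(L+1),

i.e., the form in eq. 6 with the current skill's own density also included in the denominator sum.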

tf-agents and tensorflow versions incompatible

The current environment setup produces the error AttributeError: module 'tf_agents.policies.py_policy' has no attribute 'Base'. This seems to be caused by an incompatibility between tensorflow==2.2.0 and tf-agents==0.4.0.

My suggestion is to update tf-agents==0.4.0 to tf-agents==0.5.0 and add cloudpickle==1.4.1, which works on my end.
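
A minimal sketch of the corresponding pins in the pip section of env.yml (the surrounding structure is assumed here, not copied from the repo's file):

dependencies:
  - pip:
    - tf-agents==0.5.0
    - cloudpickle==1.4.1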

GMM explanations

Sorry if this is a little open ended. I am making a pytorch dads implementation and have never worked with GMMs before. I am curious about the behavior I should expect from a GMM. Currently I have a toy problem just to show what kind of data I'm playing with:

import torch

# toy data: start states, actions, and the resulting next states
hot_air_ballon_start_states = torch.rand((5, 1))
hot_air_ballon_start_actions = torch.rand((5, 1)) - 0.5
hot_air_ballon_next_states = hot_air_ballon_start_states + hot_air_ballon_start_actions

next_timesteps = hot_air_ballon_next_states - hot_air_ballon_start_states
x = torch.hstack([hot_air_ballon_start_states, hot_air_ballon_start_actions])
dist = gmm(x)  # `gmm` is defined in the full example linked below

Full example and code: here
Note: I do plan to use batch norm and feed a fully connected layer into the GMM like in the DADS implementation, but I am trying to get an MVP right now.
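
For context, here is a minimal sketch of one way the (undefined) gmm above could be implemented with a fully connected layer feeding torch.distributions; this is purely illustrative, not the DADS skill-dynamics model, and the layer sizes, component count, and fixed-variance choice are arbitrary:

import torch
from torch import nn
from torch.distributions import Categorical, Independent, MixtureSameFamily, Normal

class ToyGMMHead(nn.Module):
    """Maps (state, action) inputs to a Gaussian mixture over a 1-D next-state delta."""

    def __init__(self, in_dim=2, num_components=4, out_dim=1):
        super().__init__()
        self.num_components = num_components
        self.out_dim = out_dim
        self.logits = nn.Linear(in_dim, num_components)           # mixture weights
        self.means = nn.Linear(in_dim, num_components * out_dim)  # per-component means
        # one log-std per component; set requires_grad=False to fix the variance
        self.log_std = nn.Parameter(torch.zeros(num_components, out_dim))

    def forward(self, x):
        mix = Categorical(logits=self.logits(x))
        loc = self.means(x).view(-1, self.num_components, self.out_dim)
        scale = self.log_std.exp().expand_as(loc)
        comp = Independent(Normal(loc, scale), 1)
        return MixtureSameFamily(mix, comp)

# With the toy data above:
# dist = ToyGMMHead()(x)
# nll = -dist.log_prob(next_timesteps).mean()  # evaluate the density of observed deltas
# pred = dist.sample()                         # generate a next-state delta

Such a head does the double duty described below: log_prob evaluates data and sample generates it.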

From my understanding, the GMM can:

  • Be used to predict what the next state will literally look like.
  • Be used to evaluate the probability of the next state given the input.

So basically, GMMs can do double duty of generating and evaluating data, which is cool.

Some questions:

  • Can GMMs overfit? For debugging purposes I am wondering whether I should expect my GMM to be exact or whether I can only hope that it gets somewhere close. (Example image omitted.)

  • Is there any guide on how many components I should use for a GMM? At what point should the number of components increase? Is this mostly trial and error?

  • When should you fix_variance?

  • Is use_modal_mean only possible when the number of components > 1?

    • Does a GMM perform worse if components > 1 and use_modal_mean==False?
  • How do GMMs scale? If I flatten an image, does this become intractable?

Also let me know if there is a better forum to ask you all about this :)

Need to edit env.yml

The recent version of cloudpickle is not compatible with some libraries and packages like tf-agents and tensorflow.

So I recommend editing env.yml to include - cloudpickle==1.4.1, pinning the version of cloudpickle.

Thank you in advance.

Some problems when using this code in discrete action gym environment

First of all, thank you very much for sharing this code!

Now I'm applying this open-source code to discrete-action Gym environments, but I've come across the following problems.

(1) From the value of info['logp_altz'] in agent.train_loop(), I find that for the same (s, s'), different skills get very similar log-probabilities, which suggests that the skill (i.e., z) has very little influence on q(s'|s, z). Do you have any guidance for this problem?

(2) From the trend of info['logp_altz'], the log-probability always converges to a poor value in (-2, -1.8), so the probability always converges to a value in (0.13, 0.17). The exact numbers differ when I use different Gym environments, but all of them converge to a poor value.

(3) How can I record the loss value when training skill-dynamics? I'd like to see the trend of skill-dynamics training.

(4) Is the skill type related to the action type (discrete or continuous) of the environment? What I mean is: can I set skill_type to 'cont_uniform' in a discrete-action Gym environment? From reading the code, I think the skill_type has nothing to do with the action type.

Thank you for reading my questions. I would appreciate it if you could answer some of them!

env.yml

Hi, Archit, thanks for sharing your code.

I have the following issue: the package "tf-nightly==2.2.0.dev20200229" in env.yml could not be installed. I installed the latest version "2.5.0.dev20200629" instead, but there is a version issue: from tensorflow.python.autograph.core import naming raises ImportError: cannot import name 'naming'.

Thanks

Check failed: work_element_count > 0

Hello Archit, thanks for the code!!

I downloaded the dads repository and created a conda environment using the env.yml file.
When I run this command:
python unsupervised_skill_learning/dads_off.py --logdir=logs/ --flagfile=configs/ant_xy_onpolicy.txt
(after changing the Ant environment to Ant-v3)

I get this error:
2020-03-24 14:20:44.165003: F ./tensorflow/core/util/gpu_launch_config.h:129] Check failed: work_element_count > 0 (0 vs. 0)
Fatal Python error: Aborted

Thread 0x00007f67acff9700 (most recent call first):
File "/home/dpshah2/miniconda3/envs/dads-env/lib/python3.6/threading.py", line 295 in wait
File "/home/dpshah2/miniconda3/envs/dads-env/lib/python3.6/queue.py", line 164 in get
File "/home/dpshah2/.local/lib/python3.6/site-packages/tensorflow_core/python/summary/writer/event_file_writer.py", line 159 in run
File "/home/dpshah2/miniconda3/envs/dads-env/lib/python3.6/threading.py", line 916 in _bootstrap_inner
File "/home/dpshah2/miniconda3/envs/dads-env/lib/python3.6/threading.py", line 884 in _bootstrap

Thread 0x00007f71d3555740 (most recent call first):
File "/home/dpshah2/.local/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1445 in _call_tf_sessionrun
File "/home/dpshah2/.local/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1352 in _run_fn
File "/home/dpshah2/.local/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1367 in _do_call
File "/home/dpshah2/.local/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1361 in _do_run
File "/home/dpshah2/.local/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1183 in _run
File "/home/dpshah2/.local/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 960 in run
File "/data/dpshah2/SaurabhG/dads/unsupervised_skill_learning/skill_dynamics.py", line 399 in train
File "unsupervised_skill_learning/dads_off.py", line 1406 in main
File "/home/dpshah2/.local/lib/python3.6/site-packages/absl/app.py", line 250 in _run_main
File "/home/dpshah2/.local/lib/python3.6/site-packages/absl/app.py", line 299 in run
File "/home/dpshah2/.local/lib/python3.6/site-packages/tensorflow_core/python/platform/app.py", line 40 in run
File "unsupervised_skill_learning/dads_off.py", line 1711 in
Aborted (core dumped)

Could you help out here!! Thank you :)

Environment configuration error

Hi, Archit, thanks for sharing your code.

I had some trouble when configuring the environment:

(1) What version of MuJoCo did you use in your project? In README.md, the version of MuJoCo is 150, but env.yml requires "mujoco-py==2.0.2.5".

(2) The package "tf-nightly==1.15.0.dev20190819" in env.yml could not be installed. I checked https://pypi.org/project/tf-nightly/#history, but I can't find version 1.15.0.dev20190819.
