
This project is forked from nvlabs/diffrl.


[ICLR 2022] Accelerated Policy Learning with Parallel Differentiable Simulation

Home Page: https://short-horizon-actor-critic.github.io/

License: Other



SHAC

This repository contains the implementation for the paper Accelerated Policy Learning with Parallel Differentiable Simulation (ICLR 2022).

In this paper, we present a GPU-based differentiable simulator and propose SHAC, a policy learning method that leverages it. We also provide a comprehensive benchmark set for policy learning with differentiable simulation; it currently contains six robotic control problems, shown in the figure below.

(figure: the six benchmark robotic control environments)
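To give a feel for the approach, here is a toy, self-contained sketch of the core idea behind SHAC: backpropagating analytically through short simulation rollouts to obtain low-variance policy gradients. The 1-D linear "simulator", the scalar gain policy, and all names below are illustrative stand-ins, not the actual dflex or SHAC implementation (which, among other things, also learns a value function to truncate the horizon).

```python
# Toy sketch of short-horizon backprop-through-simulation (illustrative only).
# "Simulator": x_{t+1} = x_t + u_t, policy u_t = -k * x_t, cost = sum of x_t^2.

def rollout_grad(k, x0, horizon):
    """Run a short rollout and return (cost, dcost/dk) via hand-rolled
    reverse-mode differentiation through the simulation steps."""
    xs = [x0]
    for _ in range(horizon):
        xs.append(xs[-1] - k * xs[-1])   # x_{t+1} = (1 - k) * x_t
    cost = sum(x * x for x in xs[1:])    # quadratic state cost

    dk = 0.0
    dx = 0.0                             # adjoint: dcost/dx_{t+1}
    for t in reversed(range(horizon)):
        dx += 2.0 * xs[t + 1]            # cost term at x_{t+1}
        dk += dx * (-xs[t])              # d x_{t+1} / dk = -x_t
        dx *= (1.0 - k)                  # d x_{t+1} / d x_t = 1 - k
    return cost, dk

def train(k=0.2, x0=1.0, horizon=8, lr=0.05, steps=50):
    """Gradient descent on the short-horizon rollout cost."""
    for _ in range(steps):
        _, grad = rollout_grad(k, x0, horizon)
        k -= lr * grad
    return k

trained_k = train()                      # approaches k = 1 (state dies in one step)
```

Because the gradient flows through the simulator's dynamics rather than being estimated from sampled returns, a few dozen descent steps on a short horizon suffice here; this is the effect the paper exploits at scale with parallel GPU simulation.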

Installation

  • git clone https://github.com/NVlabs/DiffRL.git --recursive

  • The code has been tested on

    • Operating System: Ubuntu 16.04, 18.04, 20.04, 21.10, 22.04
    • Python Version: 3.7, 3.8
    • GPU: TITAN X, GTX 1080, RTX 2080, RTX 3080, RTX 3090, RTX 3090 Ti

Prerequisites

  • In the project folder, create and activate a conda environment:

    conda env create -f diffrl_conda.yml
    conda activate shac
    
  • Install dflex, the differentiable simulator:

    cd dflex
    pip install -e .
    
  • Install rl_games, our fork of rl-games (used for PPO and SAC training):

    cd externals/rl_games
    pip install -e .
    
  • Install an older version of protobuf required for TensorboardX:

    pip install protobuf==3.20.0
    

Test Examples

A test example can be found in the examples folder.

python test_env.py --env AntEnv

If the last line of console output is Finish Successfully, the installation has succeeded.
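The structure of such a smoke test is straightforward: build the environment, step it with random actions for a while, and report success if nothing crashes. The sketch below illustrates this with a stand-in environment; DummyAntEnv and its methods are hypothetical, not the actual dflex API used by examples/test_env.py.

```python
# Hypothetical sketch of the kind of smoke test examples/test_env.py performs.
# DummyAntEnv is a stand-in; the real script instantiates a dflex environment.
import random

class DummyAntEnv:
    num_obs, num_acts = 37, 8            # illustrative sizes

    def reset(self):
        return [0.0] * self.num_obs

    def step(self, action):
        obs = [random.gauss(0, 1) for _ in range(self.num_obs)]
        reward, done = 1.0, False
        return obs, reward, done

def smoke_test(env, num_steps=64):
    """Step the env with random actions; any exception fails the test."""
    obs = env.reset()
    total_reward = 0.0
    for _ in range(num_steps):
        action = [random.uniform(-1, 1) for _ in range(env.num_acts)]
        obs, reward, done = env.step(action)
        total_reward += reward
        if done:
            obs = env.reset()
    print("Finish Successfully")
    return total_reward

smoke_test(DummyAntEnv())
```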

Training

Run the following command in the examples folder to train Ant with SHAC:

python train_shac.py --cfg ./cfg/shac/ant.yaml --logdir ./logs/Ant/shac

We also provide a one-line script, examples/train_script.sh, to replicate the results reported in the paper for both our method and the baseline methods. The results may differ slightly from the paper due to CUDA nondeterminism and differences in operating system, GPU, and Python versions. The plots reported in the paper were produced with a TITAN X on Ubuntu 16.04.

SHAC (Our Method)

For example, run the following commands in the examples folder to train the Ant and SNU Humanoid (Humanoid MTU in the paper) environments with SHAC, each with 5 individual seeds:

python train_script.py --env Ant --algo shac --num-seeds 5
python train_script.py --env SNUHumanoid --algo shac --num-seeds 5

Baseline Algorithms

For example, run the following command in the examples folder to train the Ant environment with the PPO implementation from rl_games, with 5 individual seeds:

python train_script.py --env Ant --algo ppo --num-seeds 5
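Conceptually, a multi-seed launcher like the one invoked above just builds one training command per seed and runs them in sequence. The sketch below illustrates that pattern; the underlying script names, flags, and log layout are assumptions for illustration, not the verified CLI of examples/train_script.py.

```python
# Hypothetical sketch of a multi-seed launcher in the spirit of train_script.py.
# Script names and flags below are illustrative assumptions.
import subprocess

def build_commands(env, algo, num_seeds):
    """Return one command line per seed for the requested env/algorithm."""
    script = "train_shac.py" if algo == "shac" else "train_rl.py"
    cmds = []
    for seed in range(num_seeds):
        cmds.append([
            "python", script,
            "--cfg", f"./cfg/{algo}/{env.lower()}.yaml",
            "--logdir", f"./logs/{env}/{algo}/{seed}",
            "--seed", str(seed),
        ])
    return cmds

def launch(env, algo, num_seeds, dry_run=True):
    for cmd in build_commands(env, algo, num_seeds):
        if dry_run:
            print(" ".join(cmd))         # preview the command only
        else:
            subprocess.run(cmd, check=True)

launch("Ant", "shac", 5)
```

Keeping one log directory per seed makes it easy to aggregate the runs afterwards when plotting mean and variance across seeds.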

Testing

To test a trained policy, pass the policy checkpoint to the training script and add the --play flag. For example, the following command tests a trained policy (assuming the policy is located at logs/Ant/shac/policy.pt):

python train_shac.py --cfg ./cfg/shac/ant.yaml --checkpoint ./logs/Ant/shac/policy.pt --play [--render]

The optional --render flag exports a video of the task execution. The exported video is encoded in .usd format and stored in the examples/output folder; to visualize the exported .usd file, refer to USD at NVIDIA.

Citation

If you find our paper or code useful, please consider citing:

  @inproceedings{xu2021accelerated,
    title={Accelerated Policy Learning with Parallel Differentiable Simulation},
    author={Xu, Jie and Makoviychuk, Viktor and Narang, Yashraj and Ramos, Fabio and Matusik, Wojciech and Garg, Animesh and Macklin, Miles},
    booktitle={International Conference on Learning Representations},
    year={2021}
  }

Contributors

eanswer, sonsang, viktorm

