zjq0717 / safe-slac

This project forked from lava-lab/safe-slac

Safe SLAC, an algorithm for safe cost-constrained reinforcement learning in high-dimensional POMDPs.

License: MIT License

Safe Stochastic Latent Actor-Critic in PyTorch

This is a PyTorch implementation of Safe Stochastic Latent Actor-Critic, proposed in:

The implementation is based on Toshiki Watanabe's PyTorch implementation of SLAC [1].

Sample rollouts

Shown here are some videos of example rollouts after training Safe SLAC in the SafetyGym6 environments. The leftmost image shows the observation, the middle is the latent variable model reconstruction and the rightmost image shows an overview of the environment. The objective is for the red agent to avoid blue and purple hazards while reaching the green goal. The environments shown are PointGoal1, PointGoal2, DoggoGoal1, CarGoal1, PointButton1, PointPush1.

Setup

The dependencies can be installed using pip install -r requirements.txt. Please see mujoco-py for instructions on setting up MuJoCo. MuJoCo 2.0 requires a license file, which can be obtained free of charge together with the binaries at roboti.us.

Since not all dependencies are available on PyPI for the most recent versions of Python, the easiest way to get started is to install the dependencies inside a conda environment based on Python 3.8:

conda create -n "safe-slac" python==3.8
conda activate safe-slac
pip install -r requirements.txt

Depending on your machine, you may need to deviate from the PyTorch version specified in requirements.txt. In that case, please install PyTorch following the official PyTorch installation instructions.

Should you encounter NumPy compatibility issues, a possible fix is to install NumPy 1.22.4 instead of the version required by SafetyGym: pip install numpy==1.22.4.
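Before changing the NumPy version, it can help to see which one is actually installed in the active environment. The following is a minimal sketch using only the standard library; the helper names `installed_version` and `describe` are hypothetical and not part of this repository.

```python
from importlib import metadata
from typing import Optional

# Version suggested in the text above as a possible fix.
PINNED_NUMPY = "1.22.4"


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def describe(package: str, pinned: str) -> str:
    """Summarise how the installed version compares with the suggested one."""
    found = installed_version(package)
    if found is None:
        return f"{package} is not installed"
    if found == pinned:
        return f"{package} {found} matches the suggested version"
    return f"{package} {found} differs from the suggested {pinned}"


print(describe("numpy", PINNED_NUMPY))
```

If the reported version differs and you do see compatibility errors, the pip command above is the suggested remedy.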

You can train Safe SLAC as shown in the following example. Hyperparameters are constant across the various tasks.

python train_safe.py --domain_name Safexp --task_name PointGoal1 --seed 0 --cuda
> python train_safe.py --help
usage: train_safe.py [-h] [--num_steps NUM_STEPS] [--domain_name DOMAIN_NAME] [--task_name TASK_NAME] [--seed SEED]
                     [--cuda]

optional arguments:
  -h, --help            show this help message and exit
  --num_steps NUM_STEPS
                        Number of training steps
  --domain_name DOMAIN_NAME
                        Name of the domain
  --task_name TASK_NAME
                        Name of the task
  --seed SEED           Random seed
  --cuda                Train using GPU with CUDA
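Since the hyperparameters are the same across tasks, running several tasks and seeds only requires varying --task_name and --seed. The sketch below reconstructs the command-line interface from the help output above and generates one command per task and seed. It is an illustration, not the contents of train_safe.py: the argument types are assumptions, no defaults are set because the help text does not show them, and `build_parser` and `build_command` are hypothetical helper names.

```python
import argparse
import sys
from typing import List


def build_parser() -> argparse.ArgumentParser:
    """Parser mirroring the flags listed in the --help output (types are assumed)."""
    parser = argparse.ArgumentParser(prog="train_safe.py")
    parser.add_argument("--num_steps", type=int, help="Number of training steps")
    parser.add_argument("--domain_name", type=str, help="Name of the domain")
    parser.add_argument("--task_name", type=str, help="Name of the task")
    parser.add_argument("--seed", type=int, help="Random seed")
    parser.add_argument("--cuda", action="store_true", help="Train using GPU with CUDA")
    return parser


def build_command(task_name: str, seed: int, cuda: bool = True,
                  domain_name: str = "Safexp") -> List[str]:
    """Build the argument list for one training run; nothing is executed here."""
    cmd = [sys.executable, "train_safe.py",
           "--domain_name", domain_name,
           "--task_name", task_name,
           "--seed", str(seed)]
    if cuda:
        cmd.append("--cuda")
    return cmd


# Two of the tasks named in the rollout section, three seeds each.
tasks = ["PointGoal1", "PointGoal2"]
commands = [build_command(task, seed) for task in tasks for seed in range(3)]

for cmd in commands:
    # Check that each command parses under the reconstructed interface.
    args = build_parser().parse_args(cmd[2:])
    print(args.task_name, args.seed, " ".join(cmd[1:]))
```

Each entry in `commands` can be passed to `subprocess.run` from the repository root to launch the corresponding run.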

References

[1] Lee, Alex X., et al. "Stochastic latent actor-critic: Deep reinforcement learning with a latent variable model." arXiv preprint arXiv:1907.00953 (2019).

Contributors

tdsimao, yhogewind
