PyTorch implementation of reinforcement learning algorithms

A PyTorch implementation of RL algorithms, currently including TRPO, ClipPPO, A2C, GAIL, and ADCV (action-dependent control variates).

Important notes

  • To run the MuJoCo environments, first install mujoco-py and the suggested modified version of gym that supports MuJoCo 1.50.
  • Make sure the PyTorch version is at least 0.4.0.
  • If you have a GPU, it is recommended to set OMP_NUM_THREADS to 1, since PyTorch creates additional threads when performing computations, which can hurt multiprocessing performance. (The problem is most serious on Linux, where multiprocessing can be even slower than a single thread):
export OMP_NUM_THREADS=1
  • Code structure: Agent collects samples; Trainer facilitates learning and training; Evaluator tests trained models in new environments. All example scripts are placed under the config directory (see the sketch after this list for the intended division of labor).
  • After training several agents on one environment, you can plot their training curves in a single figure with
python utils/plot.py --env-name <ENVIRONMENT_NAME> --algo <ALGORITHM1,...,ALGORITHMn>  --x_len <ITERATION_NUM> --save_data
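
The following self-contained sketch illustrates the Agent / Trainer / Evaluator division of labor mentioned above. The classes below are illustrative stand-ins written for this note, not the repository's implementations: their constructors and method names are assumptions, and the real entry points are the scripts under config.

import gym


class Agent:
    """Collects samples by rolling out the current policy in an environment."""

    def __init__(self, env, policy):
        self.env, self.policy = env, policy

    def collect_samples(self, horizon=200):
        batch, state = [], self.env.reset()  # classic gym API (pre-0.26)
        for _ in range(horizon):
            action = self.policy(state)
            next_state, reward, done, _ = self.env.step(action)
            batch.append((state, action, reward))
            state = self.env.reset() if done else next_state
        return batch


class Trainer:
    """Facilitates learning: asks the agent for samples and updates the policy."""

    def __init__(self, agent):
        self.agent = agent

    def train(self, iterations):
        for _ in range(iterations):
            batch = self.agent.collect_samples()
            # ... a policy-gradient update (TRPO / ClipPPO / A2C) would consume `batch` here ...


class Evaluator:
    """Tests a trained policy in a freshly created environment."""

    def __init__(self, policy):
        self.policy = policy

    def evaluate(self, env_name, episodes=5):
        env, returns = gym.make(env_name), []
        for _ in range(episodes):
            state, done, total = env.reset(), False, 0.0
            while not done:
                state, reward, done, _ = env.step(self.policy(state))
                total += reward
            returns.append(total)
        return sum(returns) / len(returns)


if __name__ == "__main__":
    env = gym.make("Hopper-v2")

    def random_policy(state):
        return env.action_space.sample()  # placeholder policy for illustration

    Trainer(Agent(env, random_policy)).train(iterations=2)
    print("average return:", Evaluator(random_policy).evaluate("Hopper-v2"))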

Policy Gradient Methods

Example

python config/pg/ppo_gym.py --env-name Hopper-v2 --max-iter-num 1000 --gpu

Reference

Results

We test the policy gradient code on several MuJoCo environments with the default parameters.

Generative Adversarial Imitation Learning

To save trajectories

If you want to run GAIL but have no existing expert trajectories, TrajGiver will generate them. However, make sure an expert policy has already been trained and saved (i.e. train a TRPO or PPO agent on the same environment), so that TrajGiver can automatically find the expert directory, load the policy network and running states, and run the well-trained policy on the desired environment (a conceptual sketch of this process follows).
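
Conceptually, saving expert trajectories just means loading the saved expert policy and recording its rollouts. The snippet below is a hypothetical illustration of that idea only: the checkpoint path, output path, and the policy's call signature are assumptions rather than the repository's API, and the running-state normalization is omitted. TrajGiver performs the real version of this automatically.

# Hypothetical sketch only: load a trained expert policy and record its rollouts.
# Paths and the policy's call signature are assumptions, not the repository's API.

import pickle

import gym
import torch

env = gym.make("Hopper-v2")
policy = torch.load("assets/expert/Hopper-v2_policy.pth")  # hypothetical checkpoint path

trajectories, state = [], env.reset()  # classic gym API (pre-0.26)
for _ in range(10000):
    with torch.no_grad():
        action = policy(torch.as_tensor(state, dtype=torch.float32)).numpy()
    next_state, reward, done, _ = env.step(action)
    trajectories.append((state, action, reward))
    state = env.reset() if done else next_state

with open("assets/expert/Hopper-v2_trajs.p", "wb") as f:  # hypothetical output path
    pickle.dump(trajectories, f)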

To run imitation learning

python config/gail/gail_gym.py --env-name Hopper-v2 --max-iter-num 1000 --gpu

Action Dependent Control Variate

Example

python config/adcv/v_gym.py --env-name Walker2d-v2 --max-iter-num 1000 --variate mlp --opt minvar --gpu

Results
