1jsingh / rl_reacher

License: MIT License

Topics: unity-environment reacher-environment proximal-policy-optimization ppo ddpg pytorch

About

Train double-jointed arms to reach target locations using Proximal Policy Optimization (PPO) in PyTorch.

Table of Contents

  • Agent Output Demo
  • Reacher Environment
  • Setup
  • Instructions for getting started
  • Project Structure
  • Reward Curve
  • Distributed Training

Agent Output Demo

Single Agent

[GIF: trained single agent]

Multiple Agents

[GIF: trained multiple agents]

Reacher Environment

  • Set-up: Double-jointed arm which can move to target locations.
  • Goal: Each agent must move its hand to the goal location and keep it there.
  • Agents: The environment contains 10 agents linked to a single Brain.
  • Agent Reward Function (independent):
    • +0.1 for each step the agent's hand is in the goal location.
  • Brains: One Brain with the following observation/action space.
    • Vector Observation space: 26 variables corresponding to position, rotation, velocity, and angular velocities of the two arm Rigidbodies.
    • Vector Action space: (Continuous) Size of 4, corresponding to torque applicable to two joints.
    • Visual Observations: None.
  • Reset Parameters: Two, corresponding to goal size and goal movement speed.
  • Benchmark Mean Reward: 30
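
If you want to check this specification against the build you download later, the snippet below is a minimal sketch, assuming the Udacity-style unityagents wrapper that these Reacher builds are normally driven with and a Linux build path; adjust file_name to whatever you actually unzip into unity_envs.

from unityagents import UnityEnvironment   # Unity ML-Agents wrapper used by the Udacity Reacher builds

# NOTE: the path below is an assumption -- point it at the build you unzipped into unity_envs/
env = UnityEnvironment(file_name='unity_envs/Reacher_Linux/Reacher.x86_64')

brain_name = env.brain_names[0]             # a single Brain controls all arms
brain = env.brains[brain_name]

env_info = env.reset(train_mode=True)[brain_name]
print('Number of agents :', len(env_info.agents))
print('Observation size :', env_info.vector_observations.shape[1])
print('Action size      :', brain.vector_action_space_size)

env.close()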

Setup

System Configuration

The project was built with the following configuration:

  • Ubuntu 16.04
  • CUDA 10.0
  • CUDNN 7.4
  • Python 3.6 (the ml-agents Unity package does not currently work with Python 3.7)
  • PyTorch 1.0

Though not tested on other setups, the project can be expected to work out of the box on most configurations reasonably close to the one above.

Environment Setup

  • Create a separate virtual environment for the project using the provided environment.yml file:
conda env create -f environment.yml
conda activate reacher

Instructions for getting started!

  1. Clone the repository (if you haven't already!)
git clone https://github.com/1jsingh/rl_reacher.git
cd rl_reacher
  2. Download the environment from one of the links below. You need only select the environment that matches your operating system:

    (For AWS) If you'd like to train the agent on AWS (and have not enabled a virtual screen), then please use this link to obtain the "headless" version of the environment. You will not be able to watch the agent without enabling a virtual screen, but you will be able to train the agent. (To watch the agent, you should follow the instructions to enable a virtual screen, and then download the environment for the Linux operating system above.)

  3. Place the downloaded file in the unity_envs directory and unzip it.

mkdir unity_envs && cd unity_envs
unzip Reacher_Linux.zip
  4. Follow along with Reacher-ppo.ipynb or Reacher-ddpg.ipynb to train your own RL agent.
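
For example, to open the PPO notebook (assuming Jupyter is available inside the reacher conda environment):

jupyter notebook Reacher-ppo.ipynb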

Project Structure

  • model.py: code for the actor and critic network classes
  • ddpg.py: DDPG agent with experience replay and OU Noise
  • Reacher-ppo.ipynb: notebook for training PPO based RL agent
  • Reacher-ddpg.ipynb: notebook for training DDPG based RL agent
  • unity_envs: directory for Reacher unity environments
  • trained_models: directory for saving trained RL agent models
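
For reference, the update that the PPO notebook optimizes is the clipped surrogate objective from the PPO paper. The snippet below is a minimal PyTorch sketch of that loss; the function name, tensor names, and the 0.2 clip value are illustrative assumptions rather than the repository's exact code.

import torch

def ppo_clipped_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    # Probability ratio between the current policy and the policy that collected the data
    ratio = torch.exp(new_log_probs - old_log_probs)
    # Unclipped and clipped surrogate terms
    surr1 = ratio * advantages
    surr2 = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Pessimistic (element-wise minimum) surrogate, negated so it can be minimized with gradient descent
    return -torch.min(surr1, surr2).mean()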

Reward Curve

  • PPO agent: [reward_curve-ppo plot]
  • DDPG agent: [reward_curve-ddpg plot]

Note: DDPG shows higher sample efficiency than PPO on this task, which is expected since DDPG is off-policy and reuses stored transitions from its replay buffer, whereas PPO is on-policy and discards experience after each update.

Distributed Training

The environment consists of 20 parallel agents, which is useful for algorithms like PPO, A3C, and D4PG that use multiple (non-interacting, parallel) copies of the same agent to distribute the task of gathering experience.
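
As an illustration of how that parallelism looks in code, one environment step gathers a transition for every arm at once. This sketch reuses the env, brain_name, and brain objects from the inspection snippet above and substitutes a random placeholder policy.

import numpy as np

env_info = env.reset(train_mode=True)[brain_name]
num_agents = len(env_info.agents)
action_size = brain.vector_action_space_size

states = env_info.vector_observations                                 # shape: (num_agents, obs_size)
actions = np.clip(np.random.randn(num_agents, action_size), -1, 1)    # placeholder random policy
env_info = env.step(actions)[brain_name]                              # all agents step together

next_states = env_info.vector_observations
rewards = env_info.rewards                                            # one reward per agent
dones = env_info.local_done                                           # one done flag per agent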
