cxia30 / deeprl

This project forked from arnomoonens/yarll

Combining deep learning and reinforcement learning.

License: MIT License

Python 96.37% Jupyter Notebook 3.63%

Deep Reinforcement Learning

This code is part of my master's thesis at the VUB, Brussels.

Status

The following algorithms have been implemented so far:

Sarsa + function approximation

The following parts are combined to learn to act in the Mountain Car environment:

  • Sarsa
  • Eligibility traces
  • epsilon-greedy action selection policy
  • Function approximation using tile coding
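
The four parts listed above could fit together roughly as follows. This is a minimal sketch, not the repository's implementation: the function names (`tile_indices`, `epsilon_greedy`, `sarsa_lambda_step`), the choice of replacing traces, and all hyperparameter values are assumptions made for illustration.

```python
import random


def tile_indices(state, lows, highs, n_tilings=4, n_tiles=8):
    """Return one active feature index per tiling for a continuous state."""
    indices = []
    cells_per_dim = n_tiles + 1  # one extra cell to hold the shifted grids
    for t in range(n_tilings):
        offset = t / n_tilings  # each tiling is shifted by a fraction of a tile
        index = 0
        for x, lo, hi in zip(state, lows, highs):
            scaled = (x - lo) / (hi - lo) * n_tiles + offset
            cell = max(0, min(int(scaled), n_tiles))  # clip to the grid
            index = index * cells_per_dim + cell
        indices.append(t * cells_per_dim ** len(state) + index)
    return indices


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def sarsa_lambda_step(weights, traces, active, action, reward,
                      next_active, next_action, done,
                      alpha=0.1, gamma=1.0, lam=0.9):
    """Apply one Sarsa(lambda) update with linear function approximation.

    weights and traces are lists of per-action lists of floats; active and
    next_active are the tile indices of the current and next state.
    """
    q = sum(weights[action][i] for i in active)
    q_next = 0.0 if done else sum(weights[next_action][i] for i in next_active)
    td_error = reward + gamma * q_next - q
    for i in active:
        traces[action][i] = 1.0  # replacing traces (an assumption)
    for a in range(len(weights)):
        for i in range(len(weights[a])):
            weights[a][i] += alpha * td_error * traces[a][i]
            traces[a][i] *= gamma * lam  # decay every trace after the update
    return td_error
```

With binary tile features, the action value is simply the sum of the weights of the active tiles, which is why the update only needs the lists of active indices.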

Example of a run after training for 729 episodes of 200 steps each, using a fully greedy action selection policy: [figure: Example run]

Total reward per episode: [figure: Total reward per episode]

Note that, even after a few thousand episodes, the algorithm still isn't capable of consistently reaching the goal in fewer than 200 steps.

REINFORCE

An adapted version of this code, modified to work with TensorFlow. Total reward per episode when applying this algorithm to the CartPole-v0 environment: [figure: Total reward per episode using REINFORCE]
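
For context, REINFORCE weights the log-probability of each action by the return that followed it. The sketch below shows only that return computation; it is not taken from the repository, and the names `discounted_returns` and `normalize` as well as the default `gamma` are assumptions.

```python
def discounted_returns(rewards, gamma=0.99):
    """Return the discounted sum of future rewards for every time step."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step reuses the return of the step after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def normalize(values, eps=1e-8):
    """Shift to zero mean and scale to unit variance (a common variance-reduction step)."""
    mean = sum(values) / len(values)
    std = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    return [(v - mean) / (std + eps) for v in values]
```

In CartPole-v0 every step yields a reward of 1, so earlier steps of a long episode receive larger returns than later ones.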

Karpathy Policy Gradient

An adapted version of the code from this article by Andrej Karpathy. Total reward per episode when applying this algorithm to the CartPole-v0 environment: [figure: Total reward per episode using Karpathy]

However, how quickly the optimal reward is reached and maintained varies heavily because of randomness. Results of an earlier execution are also posted on the OpenAI Gym.

Advantage Actor Critic

Total reward per episode when applying this algorithm to the CartPole-v0 environment: [figure: Total reward per episode using A2C]

OpenAI Gym page

Asynchronous Advantage Actor Critic

Total reward per episode when applying this algorithm to the CartPole-v0 environment: [figure: Total reward per episode using A3C]

This only shows the results of one of the A3C threads. Results of another execution are also posted on the OpenAI Gym. Results of an execution using the Acrobot-v1 environment can also be found on OpenAI Gym.

How to run

First, install the requirements using pip:

pip install -r requirements.txt

Algorithms/experiments

You can run algorithms by passing an experiment specification (in JSON format) to main.py:

python main.py <experiment_description>

Example of an experiment specification

Statistics

Statistics can be plotted using:

python misc/plot_statistics.py <path_to_stats>

<path_to_stats> can be one of 2 things:

  • A JSON file generated using gym.wrappers.Monitor, in which case it plots the episode lengths and total reward per episode.
  • A directory containing TensorFlow scalar summaries for different tasks, in which case all of the scalars found are plotted.

Help about other arguments (e.g. for using smoothing) can be found by executing python misc/plot_statistics.py -h.
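
The source does not say which smoothing method the script uses. As an illustration of what smoothing a per-episode reward curve can look like, here is a trailing moving average; the function name `moving_average` and the `window` parameter are hypothetical and are not options of plot_statistics.py.

```python
def moving_average(values, window=10):
    """Smooth a sequence with a trailing moving average.

    The first window - 1 points are averaged over the values seen so far,
    so the output has the same length as the input.
    """
    smoothed = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]  # drop the value leaving the window
        smoothed.append(running / min(i + 1, window))
    return smoothed
```

A larger window gives a smoother curve but hides short-lived drops in reward, which matters when judging whether a policy stays at the optimum.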

Alternatively, you can use TensorBoard to show statistics in the browser by passing the directory with the scalar summaries as the logdir argument.
