arnomoonens / yarll

Combining deep learning and reinforcement learning.

License: MIT License

Python 98.00% Jupyter Notebook 2.00%
reinforcement-learning a3c openai-gym policy-gradient reinforcement-learning-algorithms deep-reinforcement-learning tensorflow sarsa python soft-actor-critic

yarll's Introduction

Yet Another Reinforcement Learning Library (YARLL)

Update 14/05/2021: Added PyTorch implementation of REINFORCE.
Update 11/05/2021: Added PyTorch implementation of SAC.
Update 13/04/2021: Converted DDPG to Tensorflow 2.

Status

Different algorithms have currently been implemented (in no particular order):

Asynchronous Advantage Actor Critic (A3C)

The code for this algorithm can be found here. Example run after training using 16 threads for a total of 5 million timesteps on the PongDeterministic-v4 environment:

Pong example run

How to run

First, install the library using pip (you can first remove OpenCV from the setup.py file if it is already installed):

pip install yarll

To use the library on a specific branch or to use it while changing the code, you can add the path to the library to your $PYTHONPATH (e.g., in your .bashrc or .zshrc file):

export PYTHONPATH=/path/to/yarll:$PYTHONPATH

Alternatively, you can add a symlink from your site-packages to the yarll directory.

Algorithms/experiments

You can run an algorithm by passing the path to an experiment specification (a file in JSON format) to main.py:

python yarll/main.py <path_to_experiment_specification>

You can see all the possible arguments by running python yarll/main.py -h.

Examples of experiment specifications can be found in the experiment_specs folder.
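
When several specifications need to be run, it can help to check each file before launching it. The snippet below is a minimal sketch, not part of yarll: the helper name build_command is hypothetical, and the only thing taken from the instructions above is the command form python yarll/main.py <path_to_experiment_specification>. It checks that the specification parses as JSON and returns the argument list for that command.

```python
import json
import sys
from pathlib import Path


def build_command(spec_path, main_script="yarll/main.py"):
    """Return the argv list for running one experiment specification.

    Hypothetical helper: it only mirrors the documented command
    `python yarll/main.py <path_to_experiment_specification>`.
    """
    spec_path = Path(spec_path)
    # Fail early if the file is missing or is not valid JSON.
    with spec_path.open() as spec_file:
        json.load(spec_file)
    return [sys.executable, main_script, str(spec_path)]
```

The returned list can be passed to subprocess.run(..., check=True) from the repository root, so that a malformed specification is reported before any training starts.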

Statistics

Statistics can be plotted using:

python -m yarll.misc.plot_statistics <path_to_stats>

<path_to_stats> can be one of two things:

  • A JSON file generated using gym.wrappers.Monitor, in which case the episode lengths and the total reward per episode are plotted.
  • A directory containing TensorFlow scalar summaries for different tasks, in which case all of the scalars found are plotted.

Help about other arguments (e.g. for using smoothing) can be found by executing python -m yarll.misc.plot_statistics -h.

Alternatively, you can use TensorBoard to show the statistics in the browser by passing the directory with the scalar summaries as the --logdir argument.
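
The README does not say which smoothing method plot_statistics applies. As an illustration of what smoothing a per-episode reward curve can look like, here is a minimal sketch of a trailing moving average. The function name moving_average and the sample rewards are assumptions made for this example; they are not taken from yarll.

```python
def moving_average(values, window):
    """Smooth a sequence with a trailing moving average.

    The first `window - 1` outputs average over the values seen so far,
    so the result has the same length as the input.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    smoothed = []
    running_sum = 0.0
    for i, value in enumerate(values):
        running_sum += value
        if i >= window:
            # Drop the value that just left the window.
            running_sum -= values[i - window]
        smoothed.append(running_sum / min(i + 1, window))
    return smoothed


# Made-up episode rewards, for illustration only.
episode_rewards = [10, 20, 30, 40]
print(moving_average(episode_rewards, window=2))  # [10.0, 15.0, 25.0, 35.0]
```

A larger window gives a smoother curve but reacts more slowly to real changes in performance.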

yarll's People

Contributors

arnomoonens, codacy-badger, dependabot-preview[bot], dependabot[bot], gitter-badger, hknozturk, plibin-vub

yarll's Issues

Closest String Problem Inquiry

Hello there,

I'm currently researching and working on the Closest String Problem and came across your code implementing an MMAS and ACS ant colony algorithm. Your code was super helpful! Would you be able to point me towards the exact paper that your code is implemented from?

Thanks so much!
Shirley

Dependabot can't evaluate your Python dependency files

Dependabot can't evaluate your Python dependency files.

As a result, Dependabot couldn't check whether any of your dependencies are out-of-date.

The error Dependabot encountered was:

InstallationError("Invalid requirement: 'gast=0.2.2' (from line 3 of /home/dependabot/dependabot-updater/dependabot_tmp_dir/requirements.txt)\nHint: = is not a valid operator. Did you mean == ?")
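
The hint in the error points at the fix: the pin needs == rather than a single =. As a rough sketch (the helper name fix_requirement and the regular expression are assumptions for this example, not part of Dependabot or pip), a line such as gast=0.2.2 could be detected and corrected like this:

```python
import re

# A package name followed by a single "=" that is not the start of "==".
# Operators such as ">=", "<=", "~=" and "!=" do not match, because the
# character before "=" must belong to the package name.
SINGLE_EQUALS = re.compile(r"^\s*([A-Za-z0-9_.\-]+)\s*=(?!=)\s*(\S+)")


def fix_requirement(line):
    """Rewrite 'name=version' as 'name==version'; leave other lines alone."""
    return SINGLE_EQUALS.sub(r"\1==\2", line)


print(fix_requirement("gast=0.2.2"))  # gast==0.2.2
```

Lines that already use a valid operator, such as gast==0.2.2 or gast>=0.2.2, are returned unchanged.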

Actor Critic for Mountain Car

Hi,
Thanks a lot for the wonderful work. The code is well written and modular, and it helped me a lot to play with RL in general. I would like to know whether you have an opinion on using Actor Critic methods on control problems such as Mountain Car. I added Mountain Car to the environments and ran A2C and A3C; neither converged to a solution. I changed the discount factor hyperparameter to 0.9 and got both the actor loss and the critic loss to go to zero, but the reward never improved beyond -200. I then tried adding an entropy term to improve exploration, but that led to exploding gradients. Looking deeper, it seems that Actor Critic methods are not good at exploring the space when the return is constant.

I looked at other implementations that solve Mountain Car. They use either function approximation (tile coding) or DPG; I found no one who had used Actor Critic.
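
One possible reading of the zero losses described above, offered as a sketch rather than a diagnosis: if every episode is a failed run of 200 steps with a reward of -1 per step (assumed here, consistent with the -200 total), the discounted return at each timestep is the same in every episode. A critic can then fit those returns exactly, the advantages become zero, and the policy gradient vanishes without the policy having improved. The code below computes this for a discount factor of 0.9; the "perfect critic" is a hypothetical stand-in, not yarll code.

```python
def discounted_returns(rewards, gamma):
    """Return G_t = r_t + gamma * G_{t+1} for every timestep t."""
    returns = []
    running = 0.0
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


# Assumed failed episode: reward -1 on each of 200 steps.
rewards = [-1.0] * 200
returns = discounted_returns(rewards, gamma=0.9)

# Hypothetical critic that has fitted the (constant) returns exactly.
values = list(returns)
advantages = [g - v for g, v in zip(returns, values)]

print(returns[0])   # approximately -1 / (1 - 0.9) = -10
print(returns[-1])  # -1.0, the last step only sees its own reward
print(max(abs(a) for a in advantages))  # 0.0
```

With all advantages at zero, both the critic loss and the advantage-weighted actor loss are zero even though the car never reaches the goal, which matches the behaviour reported in this issue.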

yarll.misc.exceptions.ClassNotRegisteredError

@arnomoonens
Hi,

I tried running the CartPole-v0-FittedQ-experiment, but I got the error below. Could you please take a look at it? Thank you!

I observed that the register_agent is not called from the main file.

yarll.misc.exceptions.ClassNotRegisteredError: The agent FittedQIteration for state dimensionality continuous, action space discrete and RNN=False is not registered.
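
For readers unfamiliar with this kind of error, the sketch below shows how a registry-based lookup can fail when registration code never runs. It is a simplified, hypothetical illustration: the names register_agent and ClassNotRegisteredError come from this issue, but the signatures, the dictionary key, and the placeholder FittedQIteration class are assumptions and do not reproduce yarll's actual implementation.

```python
class ClassNotRegisteredError(Exception):
    """Raised when no class is registered for the requested combination."""


# Hypothetical registry keyed by (name, state dimensions, action space, rnn).
_AGENTS = {}


def register_agent(name, state_dimensions, action_space, cls, rnn=False):
    """Store an agent class under its lookup key (assumed signature)."""
    _AGENTS[(name, state_dimensions, action_space, rnn)] = cls


def make_agent(name, state_dimensions, action_space, rnn=False, **kwargs):
    """Instantiate the registered agent class, or raise if there is none."""
    key = (name, state_dimensions, action_space, rnn)
    if key not in _AGENTS:
        raise ClassNotRegisteredError(
            f"The agent {name} for state dimensionality {state_dimensions}, "
            f"action space {action_space} and RNN={rnn} is not registered."
        )
    return _AGENTS[key](**kwargs)


class FittedQIteration:
    """Placeholder class used only for this illustration."""

    def __init__(self, **kwargs):
        self.kwargs = kwargs


# Looking the agent up before registration fails.
try:
    make_agent("FittedQIteration", "continuous", "discrete")
except ClassNotRegisteredError as err:
    print(err)

# After the registration call has run, the same lookup succeeds.
register_agent("FittedQIteration", "continuous", "discrete", FittedQIteration)
agent = make_agent("FittedQIteration", "continuous", "discrete")
```

In a design like this, the lookup only succeeds if the module containing the register_agent call has been imported or executed before the agent is requested.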
