cpnota / autonomous-learning-library

A PyTorch library for building deep reinforcement learning agents.

License: MIT License

Python 99.83% Makefile 0.17%
reinforcement-learning reinforcement-learning-algorithms deep-reinforcement-learning soft-actor-critic proximal-policy-optimization deep-q-learning advantage-actor-critic deep-deterministic-policy-gradient sac a2c

autonomous-learning-library's Introduction

The Autonomous Learning Library: A PyTorch Library for Building Reinforcement Learning Agents

The autonomous-learning-library is an object-oriented deep reinforcement learning (DRL) library for PyTorch. The goal of the library is to provide the necessary components for quickly building and evaluating novel reinforcement learning agents, as well as providing high-quality reference implementations of modern DRL algorithms. The full documentation can be found at the following URL: https://autonomous-learning-library.readthedocs.io.

Tools for Building New Agents

The primary goal of the autonomous-learning-library is to facilitate the rapid development of new reinforcement learning agents by providing common tools for building and evaluating agents, such as:

  • A flexible function Approximation API that integrates features such as target networks, gradient clipping, learning rate schedules, model checkpointing, multi-headed networks, loss scaling, logging, and more.
  • Various memory buffers, including prioritized experience replay (PER), generalized advantage estimation (GAE), and more.
  • A torch-based Environment interface that simplifies agent implementations by cutting out the numpy middleman.
  • Common wrappers and agent enhancements for replicating standard benchmarks.
  • Slurm integration for running large-scale experiments.
  • Plotting and logging utilities including tensorboard integration and utilities for generating common plots.

See the documentation guide for a full description of the functionality provided by the autonomous-learning-library. Additionally, we provide an example project which demonstrates the best practices for building new agents.
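
As a quick illustration, here is a minimal, hypothetical agent written against the all.agents.Agent interface (the act/eval signatures follow the interface quoted in the issues below; the exact signature may vary between versions, and RandomAgent and its action_space argument are purely illustrative):

from all.agents import Agent


class RandomAgent(Agent):
    """Hypothetical baseline: ignores the observed state and acts uniformly at random."""

    def __init__(self, action_space):
        # action_space is assumed to be a Gym-style space with a .sample() method
        self.action_space = action_space

    def act(self, state, reward):
        # called during training; a learning agent would also update its models here
        return self.action_space.sample()

    def eval(self, state, reward):
        # called during evaluation; no learning should occur
        return self.action_space.sample()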

High-Quality Reference Implementations

The autonomous-learning-library separates reinforcement learning agents into two modules: all.agents, which provides flexible, high-level implementations of many common algorithms that can be adapted to new problems and environments, and all.presets, which provides specific instantiations of these agents tuned for particular sets of environments, including Atari games, classic control tasks, and MuJoCo/PyBullet robotics simulations. Benchmark results on par with published results are shown below:

[Benchmark plots: Atari results (40M frames) and PyBullet results]

As of today, all contains implementations of the following deep RL algorithms:

  • Advantage Actor-Critic (A2C)
  • Categorical DQN (C51)
  • Deep Deterministic Policy Gradient (DDPG)
  • Deep Q-Learning (DQN) + extensions
  • Proximal Policy Optimization (PPO)
  • Rainbow
  • Soft Actor-Critic (SAC)

It also contains implementations of the following "vanilla" agents, which provide useful baselines and perform better than you may expect:

  • Vanilla Actor-Critic
  • Vanilla Policy Gradient
  • Vanilla Q-Learning
  • Vanilla Sarsa

Installation

First, you will need a recent version of PyTorch (>1.3), as well as TensorBoard. Then, you can install the core autonomous-learning-library from PyPI:

pip install autonomous-learning-library

You can also install all of the extras (such as Gym environments) using:

pip install autonomous-learning-library[all]

Finally, you can install directly from this repository including the dev dependencies using:

git clone https://github.com/cpnota/autonomous-learning-library.git
cd autonomous-learning-library
pip install -e .[dev]

Running the Presets

If you just want to test out some cool agents, the library includes several scripts for doing so:

all-atari Breakout a2c

You can watch the training progress using:

tensorboard --logdir runs

and opening your browser to http://localhost:6006. Once the model is fully trained, you can watch the trained model play using:

all-watch-atari Breakout "runs/a2c_[id]/preset.pt"

where id is the ID of your particular run. You should be able to find it using tab completion or by looking in the runs directory. The autonomous-learning-library also contains presets and scripts for classic control and PyBullet environments.

If you want to test out your own agents, you will need to define your own scripts. Some examples can be found in the examples folder. See the docs for information on building your own agents!
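
As a rough sketch of what such a script might look like (the preset and environment names below are assumptions based on the presets described above, and the exact preset construction may differ between library versions):

from all.environments import GymEnvironment
from all.experiments import run_experiment
from all.presets.classic_control import dqn

# Run a tuned DQN preset on CartPole for a fixed number of frames.
# The run_experiment(agents, envs, frames, ...) signature matches the one
# quoted in the issues below; how the preset is constructed is an assumption.
run_experiment(
    agents=[dqn()],
    envs=[GymEnvironment('CartPole-v0', device='cpu')],
    frames=100000,
)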

Note

This library was built in the Autonomous Learning Laboratory (ALL) at the University of Massachusetts, Amherst. It was written and is currently maintained by Chris Nota (@cpnota). The views expressed or implied in this repository do not necessarily reflect the views of the ALL.

Citing the Autonomous Learning Library

We recommend the following citation:

@misc{nota2020autonomous,
  author = {Nota, Chris},
  title = {The Autonomous Learning Library},
  year = {2020},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/cpnota/autonomous-learning-library}},
}

autonomous-learning-library's People

Contributors

andrewsmike, benblack769, cpnota, jkterry1, mctigger, michalgregor, ryannavillus, scottjordan, tipavlos


autonomous-learning-library's Issues

Refactor approximation module

There is a lot of repeated/inconsistent code. The Network classes in particular should probably have more shared structure.

GaussianPolicy uses tanh two times

Hey,
can you explain to me why tanh is used here:

means = self._squash(torch.tanh(outputs[:, 0:action_dim]))
if not self.training:
    return means
logvars = outputs[:, action_dim:] * self._scale
std = logvars.exp_()
return Independent(Normal(means, std), 1)

def _squash(self, x):
    return torch.tanh(x) * self._scale + self._center

and in _squash?

Thank you :)

make `layer` module an extension of `nn`

That way, we can use one import to construct our models. E.g.:

from all import nn

model = nn.Sequential(
    nn.Conv2d(frames, 32, 8, stride=4),
    nn.ReLU(),
    nn.Conv2d(32, 64, 4, stride=2),
    nn.ReLU(),
    nn.Conv2d(64, 64, 3, stride=1),
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(3456, 512),
    nn.ReLU(),
    nn.Linear0(512, env.action_space.n)
)

Where Flatten and Linear0 come from all and the rest come directly from torch.
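
A rough sketch of how such an all.nn module could be laid out (the file path and contents below are assumptions; the zero initialization of Linear0 follows the description in the nn.Linear0 issue further down):

# all/nn/__init__.py (hypothetical layout)
import torch
from torch.nn import *  # re-export the standard torch.nn layers


class Flatten(torch.nn.Module):
    """Flatten all dimensions except the batch dimension."""

    def forward(self, x):
        return x.view(x.size(0), -1)


class Linear0(torch.nn.Linear):
    """A Linear layer whose weights and biases are initialized to zero."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        torch.nn.init.zeros_(self.weight)
        if self.bias is not None:
            torch.nn.init.zeros_(self.bias)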

README

Add a README for the repository.

Documentation on StochasticPolicy

Hi, I am trying to use the PPO algorithm; however, it's not clear how to construct the stochastic policy. Should I use the Gaussian policy network?

Cool library by the way; I like the modularity!

Abstract Classes

We should make our abstract classes (interfaces) explicit. At least:

  • Agent
  • Policy
  • Approximation
  • Basis
  • Environment

The design choices regarding

Hi @cpnota, thanks for the great work, I found it to be a very high-quality reference implementation. There is something that caught my attention when I went through the documentation, though. Specifically, the design of the agents' interface is

class Agent(ABC):
    @abstractmethod
    def act(self, state, reward):
        pass

    @abstractmethod
    def eval(self, state, reward):
        pass

But why does the act method require a reward to be passed? My understanding is that the agent only receives a reward once the action has been committed and the env returns a reward. Would love to hear the reasons behind your design choices.

Unused variables in atari_wrappers.py

justinkterry@preacherMan environments % flake8 --ignore "E501,E731,E74,E402,F401,W503,E128"
./atari_wrappers.py:58:27: E221 multiple spaces before operator
./atari_wrappers.py:77:30: F821 undefined name 'kwargs'
./atari_wrappers.py:80:36: F821 undefined name 'kwargs'
./atari_wrappers.py:85:16: F821 undefined name 'lives'
./atari_wrappers.py:85:39: F821 undefined name 'lives'

vanilla series agents

The "non-deep" agents should be grouped as part of the "vanilla" (V) series of agents. That is:

  • vac (vanilla actor-critic)
  • vpg (vanilla policy gradient i.e. REINFORCE)
  • vqn (vanilla q-network)
  • vsarsa (vanilla sarsa)

Make writer level togglable

It is convenient for the writer to output the loss locally, but on big benchmark runs it is too much data. There should be an easy way to modify what is logged and what is not.
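
One possible shape for this, sketched with assumed names (not the library's actual writer API), is to gate per-step loss logging behind a flag:

class ToggleableWriter:
    """Hypothetical wrapper that suppresses per-step loss logging when disabled."""

    def __init__(self, writer, write_loss=True):
        self.writer = writer          # e.g. a tensorboard SummaryWriter
        self.write_loss = write_loss  # set to False on large benchmark runs

    def add_loss(self, name, value, step):
        if self.write_loss:
            self.writer.add_scalar('loss/' + name, value, step)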

Motivation behind `nn.Linear0`

Hi, in a lot of the preset models, I saw the use of the nn.Linear0 layer, which initializes the weights and biases to zero. I have never seen such an initialization technique; the one I usually see is orthogonal initialization with different scales. If you do not mind, could you elaborate on the justification for its usage? Furthermore, have you run experiments that suggest the usage of nn.Linear0 is important compared to just using a vanilla nn.Linear?

Thanks.

Enforce PEP 8

  • Add Pylint, set to enforce PEP 8
  • Make code meet PEP 8 standards

Explicit usage of target network

Right now, target network usage is implicit in calling q.eval(state, action). A different function should be added that makes this explicit: q.target(state, action). This removes the ambiguity and allows q.eval to be used explicitly for evaluating without creating the computation graph.
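
A simplified sketch of the proposed split (the class internals below are assumptions for illustration, not the library's implementation):

import torch


class Approximation:
    """Hypothetical simplification: eval() queries the online network without
    building a computation graph, while target() queries the target network explicitly."""

    def __init__(self, model, target_model):
        self.model = model
        self.target_model = target_model

    def eval(self, *inputs):
        # online network, no computation graph
        with torch.no_grad():
            return self.model(*inputs)

    def target(self, *inputs):
        # explicit, unambiguous target-network evaluation
        with torch.no_grad():
            return self.target_model(*inputs)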

Simplify Experiment API

Right now the Experiment API is like:

# old, bad way
experiment = Experiment(env)
experiment.run(agent)

It should be simplified to just:

Experiment(agent, env).run()

In the future, we could even add additional methods:

Experiment(agent, env)
    .run()
    .log(format='json')
    .show()

Or something similar. Alternately, we could do something like:

Experiments(agents, envs).run()

Idk.

Imports in scripts

@cpnota you may want to read through the imports in the files in the scripts folder. I'm pretty sure a decent number are unnecessary.

[Question] Change output path

Hello!

Really clean and good-looking RL library for PyTorch. I tried a lot of them, and this is the one I've liked the most so far!

However, I wonder how to change the output folder path. It seems I need to change the log_path defined inside the ExperimentWriter class, but I guess there is no way to override it without changing the run_experiment internals?

Make package structure flatter

e.g.:

# already have this: good
from all.agents import Sarsa

# current: bad
from all.approximation.bases import FourierBasis
# desired: good
from all.approximation import FourierBasis

We would like to do this without modifying the folder structure. Not sure how to set up the __init__.py to expose nested submodules.
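
One common way to do this is to re-export the nested class from the subpackage's __init__.py, roughly as sketched below (a sketch of the idea, not the actual file contents):

# all/approximation/__init__.py (hypothetical contents)
from all.approximation.bases import FourierBasis

__all__ = ['FourierBasis']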

Learning Curve

We need a way to run simple experiments, probably starting with a simple learning curve.

Evaluation Metrics

We could use some additional evaluation metrics, such as:

  1. Max reward achieved so far
  2. Average/max reward over the last 100 episodes

Save and Load Models

One problem is that there is no way to save or load models that have been trained. This should be added!

Test every nth episode

Currently the experiment runner first runs the training, then the testing. However, often it is interesting to get intermediate test results while training. E.g. run (and log) a test episode every nth training episode.

This should be pretty easy to implement by adding a new Experiment class that just calls test() every nth episode in train()
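
A rough sketch of that idea, using assumed Experiment method names (train/test taking an episode count):

class IntermittentTestExperiment:
    """Hypothetical wrapper: run and log a test episode every nth training episode."""

    def __init__(self, experiment, test_every=100):
        self.experiment = experiment
        self.test_every = test_every

    def train(self, episodes):
        for episode in range(episodes):
            self.experiment.train(episodes=1)
            if (episode + 1) % self.test_every == 0:
                self.experiment.test(episodes=1)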

Bodies Documentation

I'm sorting out an odd error in the time feature body related to the PPO MuJoCo thing. However, there's no documentation, in the code or the docs, on what specific bodies actually do.

Add `scripts` folder

Add a top-level scripts folder for running common tasks:

  1. local atari/classic control
  2. gypsum experiments
  3. etc.

policy.no_grad is used for q target computation in SAC

Hi cpnota,

Although it may not affect the performance too much, I think this may be the wrong implementation when it is combined with the Polyak update.

Your SAC uses policy.no_grad for the q target computation:

_actions, _log_probs = self.policy.no_grad(states)
According to OpenAI Spinning Up, the action should be output by policy.target:
https://github.com/openai/spinningup/blob/038665d62d569055401d91856abb287263096178/spinup/algos/pytorch/sac/sac.py#L191

Thank you!

Saving agents

Currently there seems to be no way to save an agent.

My suggestion is to add state_dict to the Agent API, similar to torch.nn.Module. For each agent, one can then manually implement this method to, for example, return the model, optimizer, and scheduler states in a single dict. Then an experiment could create checkpoints and resume training after interruption, or save the final trained agent.
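
A minimal sketch of the suggested API, mirroring torch.nn.Module (the class and field names below are illustrative assumptions, not existing library code):

class CheckpointableAgent:
    """Hypothetical agent exposing state_dict/load_state_dict for checkpointing."""

    def __init__(self, model, optimizer, scheduler=None):
        self.model = model
        self.optimizer = optimizer
        self.scheduler = scheduler

    def state_dict(self):
        return {
            'model': self.model.state_dict(),
            'optimizer': self.optimizer.state_dict(),
            'scheduler': self.scheduler.state_dict() if self.scheduler else None,
        }

    def load_state_dict(self, state):
        self.model.load_state_dict(state['model'])
        self.optimizer.load_state_dict(state['optimizer'])
        if self.scheduler and state['scheduler'] is not None:
            self.scheduler.load_state_dict(state['scheduler'])

An experiment could then call torch.save(agent.state_dict(), path) to checkpoint and agent.load_state_dict(torch.load(path)) to resume.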

Improve feature handling

Consider an actor-critic network with shared features. Currently, backpropagation through the features occurs twice: once for the backward pass through the actor and once for the backward pass through the critic. Even though only one gradient step is being taken, this is not good.

Fix it!

Make run_experiment more modular

def run_experiment(
    agents,
    envs,
    frames,
    test_episodes=100,
    render=False,
    quiet=False,
    write_loss=True,
):

run_experiment should only take agents, envs, and a make_experiment callable. Then we could call it like this:

def make_experiment(agent, env):
    return CustomExperiment(
        agent, env, render=False, quiet=False, write_loss=True # Allows for any interface
    )

run_experiment(agents, envs, make_experiment)

Frames argument issue

In examples/experiments.py, you consistently refer to max steps as "timesteps". Borrowing from that, I passed the argument timesteps=1e6 in a sample test I was doing, and it turns out the argument is called frames. This is different from what the example makes it seem like, and from the argument to plot_returns_100. Ideally, this would be standardized.

[VAC] No reset after episode end?

Hi @cpnota,
I was just wondering what happens at the end of an episode? As far as I know, the end of an episode is not registered by the agent, so self._features is never reset. At the end of an episode, the last state's features are used to calculate the value target for the first state of the next episode, which leads to wrong value targets?

def act(self, state, reward):
    self._train(state, reward)
    self._features = self.features(state)
    self._distribution = self.policy(self._features)
    self._action = self._distribution.sample()
    return self._action

def eval(self, state, _):

Command line scripts with pip install

Right now, to run the scripts, you need to clone the repository rather than installing with pip. The scripts should be available via the command line when the library is installed.

eval() or act()

Hi,

I'm not clear on whether self._agent.act() or self._agent.eval() should be used right after self._env.reset() in the _run_test_episode function of single_env_experiment.py. It seems that in this case eval() makes more sense.

Thanks!

torch-testing not maintained

The tests depend on the torch-testing library. It appears that this is no longer being maintained, so it would be good to refactor these tests so that the dependency can be removed. It might be nice to reuse PyTorch's internal test code, but I worry about the dependencies breaking.

log() outputs nan in SoftDeterministicPolicyNetwork.sample()

Hi, cpnota! Your repo indeed helps me a lot!

I reproduced your SAC implementation and found a bug in SoftDeterministicPolicyNetwork.

The torch.log(1 - action.pow(2) + 1e-6) in line 44 can output nan if the tanh_scale is larger than 1.

According to Appendix C of the SAC paper, I think the _sample function should be:

    def _sample(self, normal):
        raw = normal.rsample()
        log_prob = normal.log_prob(raw)
        log_prob -= torch.log(1 - torch.tanh(raw).pow(2) + 1e-6)
        log_prob = log_prob.sum(1)
        action = self._squash(raw)
        return action, log_prob

Thank you!

Should env.step() return done and info?

Returns
-------
State
    The state of the environment after the action is applied
float
    The reward achieved by the previous action
done
    True if the environment has entered a terminal state and should be reset
info
    Diagnostic information useful for debugging

In the abstract Environment class, it says that the step() method should return the state, the reward, an indicator of whether the episode has ended, and some diagnostic information.

However, the GymEnvironment only returns the state and the reward:

return self._state, self._reward

Which usage should be the correct one?

Support newer versions of PyTorch

Right now, the tests fail if you use the newest version of PyTorch (the library uses 1.5.1 instead). I think we briefly discussed this a few months ago; I just wanted to make sure I didn't forget.

Parameter scheduling

There should be an easy way to create parameter schedules over time. E.g., learning rates, epsilon (in epsilon greedy), etc.
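
A minimal sketch of one way to support this, with an assumed LinearSchedule helper (not an existing library class):

class LinearSchedule:
    """Hypothetical helper: linearly anneal a value from start to end over n steps."""

    def __init__(self, start, end, steps):
        self.start = start
        self.end = end
        self.steps = steps
        self.t = 0

    def __call__(self):
        fraction = min(self.t / self.steps, 1.0)
        self.t += 1
        return self.start + (self.end - self.start) * fraction


# e.g., decay epsilon from 1.0 to 0.1 over 100,000 steps
epsilon_schedule = LinearSchedule(1.0, 0.1, 100000)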

Add Requirements File

I'm not sure if requirements.txt or something else is the way to go, but we should make our dependencies explicit.
