
chainer / chainerrl

ChainerRL is a deep reinforcement learning library built on top of Chainer.

License: MIT License

Languages: Python 97.82%, Shell 2.18%
Topics: actor-critic, chainer, deep-learning, dqn, machine-learning, python, reinforcement-learning

chainerrl's Introduction

Notice: As announced, Chainer is under the maintenance phase and further development will be limited to bug-fixes and maintenance only.


Chainer: A deep learning framework


Website | Docs | Install Guide | Tutorials (ja) | Examples (Official, External) | Concepts | ChainerX

Forum (en, ja) | Slack invitation (en, ja) | Twitter (en, ja)

Chainer is a Python-based deep learning framework aiming at flexibility. It provides automatic differentiation APIs based on the define-by-run approach (a.k.a. dynamic computational graphs) as well as object-oriented high-level APIs to build and train neural networks. It also supports CUDA/cuDNN using CuPy for high performance training and inference. For more details about Chainer, see the documents and resources listed above and join the community in Forum, Slack, and Twitter.
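
As a small illustration of the define-by-run style (a sketch, not taken from the README): the computational graph is recorded while ordinary Python code executes, and gradients are obtained by calling backward().

# Minimal define-by-run sketch (illustrative): the graph is built as this
# code runs, and backward() differentiates through it.
import numpy as np
import chainer.functions as F
from chainer import Variable

x = Variable(np.array([[1.0, 2.0]], dtype=np.float32))
y = F.sum(x ** 2 + 3 * x)   # graph is recorded while this line executes
y.backward()
print(x.grad)               # [[5. 7.]] because dy/dx = 2x + 3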

Installation

For more details, see the installation guide.

To install Chainer, use pip.

$ pip install chainer

To enable CUDA support, CuPy is required. Refer to the CuPy installation guide.

Docker image

We provide an official Docker image. This image supports nvidia-docker. Log in to the environment with the following command, and run the Python interpreter to use Chainer with CUDA and cuDNN support.

$ nvidia-docker run -it chainer/chainer /bin/bash

Contribution

See the contribution guide.

ChainerX

See the ChainerX documentation.

License

MIT License (see LICENSE file).

More information

References

Tokui, Seiya, et al. "Chainer: A Deep Learning Framework for Accelerating the Research Cycle." Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. ACM, 2019. (URL, BibTex)

Tokui, S., Oono, K., Hido, S. and Clayton, J. "Chainer: a Next-Generation Open Source Framework for Deep Learning." Proceedings of the Workshop on Machine Learning Systems (LearningSys) at the Twenty-ninth Annual Conference on Neural Information Processing Systems (NIPS), 2015. (URL, BibTex)

Akiba, T., Fukuda, K. and Suzuki, S. "ChainerMN: Scalable Distributed Deep Learning Framework." Proceedings of the Workshop on ML Systems at the Thirty-first Annual Conference on Neural Information Processing Systems (NIPS), 2017. (URL, BibTex)

chainerrl's People

Contributors

corochann, delta2323, imos, iory, katrinleinweber, keisuke-nakata, kiyukuta, knorth55, kuni-kuni, ljvmiranda921, lyx-x, marioyc, mitmul, mmilk1231, monado3, mr4msm, muupan, okuta, prabhatnagarajan, seann999, toslunar, uidilr, ummavi, xinyuewang1


chainerrl's Issues

env.spec.timestep_limit has been deprecated

gym now complains:

DEPRECATION WARNING: env.spec.timestep_limit has been deprecated. Replace your call to env.spec.timestep_limit with env.spec.tags.get('wrapper_config.TimeLimit.max_episode_steps'). This change was made 12/28/2016 and is included in version 0.7.0
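
The warning itself names the replacement; a minimal sketch (Pendulum-v0 chosen only as an example env):

# Read the limit from the spec's tags instead of the deprecated attribute.
import gym

env = gym.make('Pendulum-v0')
timestep_limit = env.spec.tags.get('wrapper_config.TimeLimit.max_episode_steps')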

ValueError: On entry to SGEMV parameter number 8 had an illegal value

Travis CI failed on examples/gym/train_ddpg_gym.py:

Traceback (most recent call last):
  File "examples/gym/train_ddpg_gym.py", line 173, in <module>
    main()
  File "examples/gym/train_ddpg_gym.py", line 170, in main
    max_episode_len=timestep_limit)
  File "/home/travis/build/pfnet/chainerrl/chainerrl/experiments/train_agent.py", line 144, in train_agent_with_evaluation
    logger=logger)
  File "/home/travis/build/pfnet/chainerrl/chainerrl/experiments/train_agent.py", line 52, in train_agent
    action = agent.act_and_train(obs, r)
  File "/home/travis/build/pfnet/chainerrl/chainerrl/agents/ddpg.py", line 314, in act_and_train
    self.replay_updater.update_if_necessary(self.t)
  File "/home/travis/build/pfnet/chainerrl/chainerrl/replay_buffer.py", line 327, in update_if_necessary
    self.update_func(transitions)
  File "/home/travis/build/pfnet/chainerrl/chainerrl/agents/ddpg.py", line 246, in update
    self.actor_optimizer.update(lambda: self.compute_actor_loss(batch))
  File "/home/travis/virtualenv/python2.7.9/lib/python2.7/site-packages/chainer/optimizer.py", line 416, in update
    loss.backward()
  File "/home/travis/virtualenv/python2.7.9/lib/python2.7/site-packages/chainer/variable.py", line 398, in backward
    gxs = func.backward(in_data, out_grad)
  File "/home/travis/virtualenv/python2.7.9/lib/python2.7/site-packages/chainer/functions/connection/linear.py", line 59, in backward
    gW = gy.T.dot(x).astype(W.dtype, copy=False)
ValueError: On entry to SGEMV parameter number 8 had an illegal value

This may be the same issue as chainer/chainer#2744

average_loss always 0 when using episodic_replay=True (DQN)

Trying these two different q_functions:

(non recurrent)

# Imports implied by the snippet (module paths assumed): MLP and StateQFunction
# are ChainerRL utilities, F/L are Chainer's functions/links shortcuts.
import chainer
import chainer.functions as F
import chainer.links as L
import chainerrl
from chainerrl.links.mlp import MLP
from chainerrl.q_function import StateQFunction


class QFunction(chainer.Chain, StateQFunction):

    def __init__(self, n_input_channels=3, n_actions=4, bias=0.1):
        self.n_actions = n_actions
        self.n_input_channels = n_input_channels
        conv_layers = chainer.ChainList(
            L.Convolution2D(n_input_channels, 32, 8, stride=4, bias=bias),
            L.Convolution2D(32, 64, 4, stride=2, bias=bias),
            L.Convolution2D(64, 64, 3, stride=1, bias=bias),
            L.Convolution2D(64, 128, 7, stride=1, bias=bias))

        lin_layer = L.Linear(128, 128)

        a_stream = MLP(128, n_actions, [2])
        v_stream = MLP(128, 1, [2])

        super().__init__(conv_layers=conv_layers, lin_layer=lin_layer,
                         a_stream=a_stream, v_stream=v_stream)

    def __call__(self, x, test=False):
        """
        Args:
            x (ndarray or chainer.Variable): An observation
            test (bool): a flag indicating whether it is in test mode
        """
        h = x
        for l in self.conv_layers:
            h = F.relu(l(h))
        h = self.lin_layer(h)

        # Dueling-style heads: mean-centered advantage stream plus value stream.
        batch_size = x.shape[0]
        ya = self.a_stream(h, test=test)
        mean = F.reshape(F.sum(ya, axis=1) / self.n_actions, (batch_size, 1))
        ya, mean = F.broadcast(ya, mean)
        ya -= mean

        ys = self.v_stream(h, test=test)

        ya, ys = F.broadcast(ya, ys)
        q = ya + ys
        return chainerrl.action_value.DiscreteActionValue(q)


(recurrent)

class QFunctionRecurrent(chainer.Chain, StateQFunction):

    def __init__(self, n_input_channels=3, n_actions=4, bias=0.1):
        self.n_actions = n_actions
        self.n_input_channels = n_input_channels
        conv_layers = chainer.ChainList(
            L.Convolution2D(n_input_channels, 32, 8, stride=4, bias=bias),
            L.Convolution2D(32, 64, 4, stride=2, bias=bias),
            L.Convolution2D(64, 64, 3, stride=1, bias=bias),
            L.Convolution2D(64, 128, 7, stride=1, bias=bias))

        lstm_layer = L.LSTM(128, 128)

        a_stream = MLP(128, n_actions, [2])
        v_stream = MLP(128, 1, [2])

        super().__init__(conv_layers=conv_layers, lstm_layer=lstm_layer,
                         a_stream=a_stream, v_stream=v_stream)

    def __call__(self, x, test=False):
        """
        Args:
            x (ndarray or chainer.Variable): An observation
            test (bool): a flag indicating whether it is in test mode
        """
        h = x
        for l in self.conv_layers:
            h = F.relu(l(h))
        h = self.lstm_layer(h)

        batch_size = x.shape[0]
        ya = self.a_stream(h, test=test)
        mean = F.reshape(F.sum(ya, axis=1) / self.n_actions, (batch_size, 1))
        ya, mean = F.broadcast(ya, mean)
        ya -= mean

        ys = self.v_stream(h, test=test)

        ya, ys = F.broadcast(ya, ys)
        q = ya + ys
        return chainerrl.action_value.DiscreteActionValue(q)

I found that for the non-recurrent version the loss is not zero and the agent eventually masters the provided gym environment.

However, changing nothing other than adding an LSTM layer and setting episodic_replay to True, the average_loss becomes 0 all the time and the agent is not able to learn to interact better with its environment.

At first I thought this was due to some kind of rounding issue, so I set minibatch_size=1 and episodic_update_len=1 (assuming that one episodic replay would then only contain one time step), but still no change.

I wonder if this is some kind of bug or (which I think is more likely) an error on my side.

Any help is very much appreciated!
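
For context, a hedged sketch of the setup under discussion; the DQN argument names episodic_update and episodic_update_len are inferred from tracebacks elsewhere on this page and may differ between ChainerRL versions:

import chainerrl

# Placeholders standing in for the reporter's objects (assumptions, not real code):
q_func = None       # the recurrent Q-function shown above
opt = None          # a chainer optimizer already set up with q_func
explorer = None     # e.g. an epsilon-greedy explorer

# Episodic replay: an episode-aware buffer plus episodic updates on the agent.
rbuf = chainerrl.replay_buffer.EpisodicReplayBuffer(capacity=10 ** 5)
agent = chainerrl.agents.DQN(
    q_func, opt, rbuf, gamma=0.99, explorer=explorer,
    minibatch_size=4, episodic_update=True, episodic_update_len=16)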

Question on gym action space

Hi, I've defined my own OpenAI Gym environment and have specified my actions as follows:

self.actions = ["NOOP", "LEFT", "RIGHT", "FIRE", "CLOAK"]

self.action_space = spaces.Discrete(len(self.actions))

When I try my environment with the 'train_dqn_gym.py' example, I can see from my debug output that training correctly results in a variety of different actions being tried.

However, with both 'train_a3c_gym.py' and 'train_acer_gym.py', the action provided to my step function is always 0 (NOOP) - it never tries any other action.

Have I coded something wrong in my environment? I would appreciate any tips on how to investigate my issue further.
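
For reference, a minimal custom-environment sketch (hypothetical, not the reporter's code) following the classic gym.Env API, where step() receives an integer index into the Discrete action space:

# Hypothetical environment skeleton: step() must return (obs, reward, done, info).
import gym
import numpy as np
from gym import spaces

class MyEnv(gym.Env):
    def __init__(self):
        self.actions = ["NOOP", "LEFT", "RIGHT", "FIRE", "CLOAK"]
        self.action_space = spaces.Discrete(len(self.actions))
        self.observation_space = spaces.Box(low=-1.0, high=1.0, shape=(4,))

    def reset(self):
        return np.zeros(4, dtype=np.float32)

    def step(self, action):
        assert self.action_space.contains(action)  # action is an int index
        obs = np.zeros(4, dtype=np.float32)
        return obs, 0.0, False, {}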

The tutorial code causes TypeError on python 3.4

On Python 3.4, random.sample doesn't accept a collections.deque, so I got the following error.

Traceback (most recent call last):
  File "quickstart.py", line 111, in <module>
    action = agent.act_and_train(obs, reward)
  File "/opt/rl/lib/python3.4/site-packages/chainerrl/agents/dqn.py", line 340, in act_and_train
    self.replay_updator.update_if_necessary(self.t)
  File "/opt/rl/lib/python3.4/site-packages/chainerrl/replay_buffer.py", line 194, in update_if_necessary
    transitions = self.replay_buffer.sample(self.batchsize)
  File "/opt/rl/lib/python3.4/site-packages/chainerrl/replay_buffer.py", line 42, in sample
    return random.sample(self.memory, n)
  File "/opt/rl/lib/python3.4/random.py", line 311, in sample
    raise TypeError("Population must be a sequence or set.  For dicts, use list(d).")
TypeError: Population must be a sequence or set.  For dicts, use list(d).

Python 2.7 works fine, and maybe 3.5+ as well.
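
A minimal sketch of the usual workaround (illustrative values): copy the deque into a list before sampling, which is portable across Python versions.

import random
from collections import deque

memory = deque(range(10))
batch = random.sample(list(memory), 3)  # list() makes sampling portable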

Specify successful configurations for examples

Current examples don't specify the configurations in which they work well, except for the newer ones (train_pcl_gym.py and train_reinforce_gym.py). Such instructions are important because they let users easily confirm that the implementations actually work.

  • ale/train_a3c_ale.py
  • ale/train_acer_ale.py
  • ale/train_dqn_ale.py
  • ale/train_nsq_ale.py
  • gym/train_a3c_gym.py
  • gym/train_acer_gym.py
  • gym/train_ddpg_gym.py
  • gym/train_dqn_gym.py
  • gym/train_pcl_gym.py
  • gym/train_reinforce_gym.py

env.monitor has been deprecated as of 12/23/2016

gym.error.Error: env.monitor has been deprecated as of 12/23/2016. Remove your call to env.monitor.start(directory) and instead wrap your env with env = gym.wrappers.Monitor(env, directory) to record data.
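
The error message itself names the replacement; a small sketch, with 'monitor_out' as an assumed output directory:

# Wrap the env with gym.wrappers.Monitor instead of calling env.monitor.start().
import gym

env = gym.make('Pendulum-v0')
env = gym.wrappers.Monitor(env, 'monitor_out')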

Windows Bash Run Chainerrl unknown cuda error

Recently I got a Windows 10 computer and successfully installed Bash on Ubuntu on Windows, CUDA, cuDNN, Chainer, and ChainerRL. But when running the example, I got the following error. Any suggestions?

(py2env) neil@DESKTOP-C22605O:~/chainerrl$ xvfb-run -s "-screen 0 1400x900x24" python examples/gym/train_dqn_gym.py
Output files are saved in dqn_out/20170324141722891586
INFO:gym.envs.registration:Making new env: Pendulum-v0
Traceback (most recent call last):
  File "examples/gym/train_dqn_gym.py", line 179, in <module>
    main()
  File "examples/gym/train_dqn_gym.py", line 154, in main
    episodic_update=args.episodic_replay, episodic_update_len=16)
  File "/home/neil/py2env/local/lib/python2.7/site-packages/chainerrl/agents/dqn.py", line 115, in __init__
    cuda.get_device(gpu).use()
  File "cupy/cuda/device.pyx", line 75, in cupy.cuda.device.Device.use (cupy/cuda/device.cpp:2083)
  File "cupy/cuda/device.pyx", line 81, in cupy.cuda.device.Device.use (cupy/cuda/device.cpp:2035)
  File "cupy/cuda/runtime.pyx", line 178, in cupy.cuda.runtime.setDevice (cupy/cuda/runtime.cpp:2915)
  File "cupy/cuda/runtime.pyx", line 130, in cupy.cuda.runtime.check_status (cupy/cuda/runtime.cpp:2241)
cupy.cuda.runtime.CUDARuntimeError: cudaErrorUnknown: unknown error

Date and time format for experiments

Human-readability of the name of the subdirectory for an experiment might be improved. The current implementation (time_str = datetime.datetime.now().strftime('%Y%m%d%H%M%S%f') in chainerrl/experiments/prepare_output_dir.py) produces e.g. 21120903182945898662. How about one of the following (sketched in code after the list)?

  • strftime('%Y%m%d-%H%M%S-%f') (e.g. 21120903-182945-898662), or
  • the basic format in ISO 8601 (e.g. 21120903T182945.898662+0900)?
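
A small sketch comparing the current and proposed formats (illustrative; the ISO 8601 variant with a UTC offset would additionally need a timezone-aware datetime):

import datetime

now = datetime.datetime.now()
print(now.strftime('%Y%m%d%H%M%S%f'))    # current, e.g. 20170903182945898662
print(now.strftime('%Y%m%d-%H%M%S-%f'))  # proposed, e.g. 20170903-182945-898662
print(now.strftime('%Y%m%dT%H%M%S.%f'))  # ISO 8601 basic format, without offset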

Documentation on usage of recurrent models

In ChainerRL, to use user-defined recurrent models, you need to make sure they implement the chainerrl.recurrent.Recurrent interface; otherwise they won't be treated as recurrent models.

When your model's recurrent-ness comes from chainer.links.LSTM, all you have to do is inherit chainerrl.recurrent.RecurrentChainMixin.

This kind of information is missing from the documentation.
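
A minimal sketch of the point above (illustrative, not taken from the documentation): a Q-function whose recurrent-ness comes from chainer.links.LSTM only needs to inherit chainerrl.recurrent.RecurrentChainMixin to be treated as recurrent.

import chainer
import chainer.functions as F
import chainer.links as L
import chainerrl
from chainerrl.recurrent import RecurrentChainMixin

class MyRecurrentQFunction(chainer.Chain, RecurrentChainMixin):

    def __init__(self, obs_size, n_actions, n_hidden=64):
        super().__init__(
            l1=L.Linear(obs_size, n_hidden),
            lstm=L.LSTM(n_hidden, n_hidden),
            out=L.Linear(n_hidden, n_actions),
        )

    def __call__(self, x):
        h = F.relu(self.l1(x))
        h = self.lstm(h)  # the mixin lets ChainerRL find and reset this state
        return chainerrl.action_value.DiscreteActionValue(self.out(h))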

PyTorch as an additional backend

I'm curious about whether ChainerRL can support PyTorch as an additional NN backend. Its interface is similar to Chainer's, but I'm not sure how easy it would be to support both. Any suggestions and opinions are welcome.

Add suppression option for print messages during training loop?

In chainerrl.experiments.train_agent, statistical information is reported via print for each episode during the training loop. However, this can be quite verbose, and there is currently no good way to suppress these messages. Adding an option that enables/disables these prints would be beneficial.
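
A hedged workaround sketch, not a ChainerRL feature: since the statistics are printed to stdout as described above, redirecting stdout around the training call silences them; the argument names and values below are placeholders and may differ between versions.

import contextlib
import io
import chainerrl

agent, env = None, None  # placeholders for the user's agent and environment

with contextlib.redirect_stdout(io.StringIO()):
    chainerrl.experiments.train_agent_with_evaluation(
        agent=agent, env=env, steps=10 ** 5,
        eval_n_runs=10, eval_interval=10 ** 4, outdir='result')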

Extend gym.Wrapper instead of env_modifiers

Since gym has introduced its own interface for modifying envs, gym.Wrapper, I think it is better to use it in ChainerRL instead of directly modifying methods as env_modifiers does.
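
A minimal gym.Wrapper sketch (illustrative) of the interface the issue proposes using instead of patching env methods directly:

import gym

class ScaleReward(gym.Wrapper):
    """Multiplies rewards by a constant, as a tiny example of env modification."""

    def __init__(self, env, scale=0.01):
        super().__init__(env)
        self.scale = scale

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        return obs, reward * self.scale, done, info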

REINFORCE

A simple REINFORCE implementation that doesn't require a value function would be helpful.
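
A bare-bones sketch of such an update (illustrative, not ChainerRL's implementation): scale each step's log-probability by the discounted return and ascend the gradient, with no value function involved.

def reinforce_update(optimizer, log_probs, rewards, gamma=0.99):
    """optimizer: a chainer optimizer already set up with the policy;
    log_probs: list of chainer.Variable log pi(a_t|s_t) from one episode;
    rewards: list of float rewards for the same episode."""
    returns, g = [], 0.0
    for r in reversed(rewards):       # discounted return G_t, computed backwards
        g = r + gamma * g
        returns.insert(0, g)
    loss = -sum(lp * g for lp, g in zip(log_probs, returns))
    optimizer.target.cleargrads()
    loss.backward()
    optimizer.update()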

Type of observation and action space

As the gym examples show, the agent and q_functions seem to expect the observation space and action space each to be a Box or a Discrete. Is that correct?

If so, how should I use observation and action spaces of other types, especially a Tuple?
Do I need to modify the environment so that it returns a Box, or do I have another choice?

Thanks,
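
One common workaround, sketched here as an assumption rather than an official answer: wrap the env so a Tuple observation is flattened into a single numpy vector before it reaches the agent, preserving the Box-style assumption made by the examples.

import gym
import numpy as np

def _flatten(obs):
    # Concatenate every component of the Tuple observation into one vector.
    return np.concatenate([np.asarray(o, dtype=np.float32).ravel() for o in obs])

class FlattenTupleObs(gym.Wrapper):
    def reset(self, **kwargs):
        return _flatten(self.env.reset(**kwargs))

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        return _flatten(obs), reward, done, info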

MuJoCo-ACER Examples

Is there any example code for ACER in continuous action spaces, using the MuJoCo environments?
