thu-ml / tianshou

An elegant PyTorch deep reinforcement learning library.

Home Page: https://tianshou.org

License: MIT License

pytorch policy-gradient dqn double-dqn a2c ddpg ppo td3 sac imitation-learning

tianshou's Introduction



Tianshou (天授) is a reinforcement learning platform based on pure PyTorch and Gymnasium. Unlike other reinforcement learning libraries, which may have complex codebases, unfriendly high-level APIs, or may not be optimized for speed, Tianshou provides a high-performance, modularized framework and user-friendly interfaces for building deep reinforcement learning agents. One more aspect that sets Tianshou apart is its generality: it supports online and offline RL, multi-agent RL, and model-based algorithms.

Tianshou aims at enabling concise implementations, both for researchers and practitioners, without sacrificing flexibility.

Supported algorithms include DQN and Double DQN, REINFORCE, A2C, TRPO, PPO, DDPG, TD3, SAC, and imitation learning, among others; see the documentation for the full list.

Other noteworthy features:

  • Elegant framework with dual APIs:
    • Tianshou's high-level API maximizes ease of use for application development while still retaining a high degree of flexibility.
    • The fundamental procedural API provides a maximum of flexibility for algorithm development without being overly verbose.
  • State-of-the-art results in MuJoCo benchmarks for REINFORCE/A2C/TRPO/PPO/DDPG/TD3/SAC algorithms
  • Support for vectorized environments (synchronous or asynchronous) for all algorithms (see usage)
  • Support for super-fast vectorized environments based on EnvPool for all algorithms (see usage)
  • Support for recurrent state representations in actor networks and critic networks (RNN-style training for POMDPs) (see usage)
  • Support for any type of environment state/action (e.g. a dict, a self-defined class, ...) (see usage; a small illustration follows this list)
  • Support for customized training processes (see usage)
  • Support for n-step return estimation and prioritized experience replay (PER) for all Q-learning based algorithms; GAE, n-step and PER are highly optimized thanks to numba's just-in-time compilation and vectorized numpy operations
  • Support for multi-agent RL (see usage)
  • Support for logging based on both TensorBoard and W&B
  • Support for multi-GPU training (see usage)
  • Comprehensive documentation, PEP8 code-style checking, type checking and thorough tests
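
As a small illustration of the nested state/action support mentioned in the list above, here is a minimal sketch (assuming only numpy and tianshou are installed; Batch is the data container used throughout the library):

import numpy as np
from tianshou.data import Batch

# Batch transparently nests dict-like data, so multimodal observations
# (e.g. an image plus a low-dimensional state vector) can live side by side.
obs = Batch(image=np.zeros((1, 84, 84)), state=np.array([[0.1, 0.2, 0.3]]))
batch = Batch(obs=obs, act=np.array([1]), rew=np.array([0.0]))
print(batch.obs.image.shape)  # (1, 84, 84)
print(batch[0].obs.state)     # indexing slices every nested field consistently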

In Chinese, Tianshou means "divinely ordained" and, by extension, an innate gift. Tianshou is a reinforcement learning platform, and the nature of RL is not to learn from humans. Taking the name "Tianshou" thus conveys that there is no teacher to learn from; instead, the agent learns by itself through constant interaction with the environment.


Installation

Tianshou is currently hosted on PyPI and conda-forge. It requires Python >= 3.11.

To install the most recent version of Tianshou, the best way is to clone the repository and install it with poetry (which you first need to install on your system):

git clone git@github.com:thu-ml/tianshou.git
cd tianshou
poetry install

You can also install the dev requirements by adding --with dev, or install extras (e.g. for MuJoCo environments and acceleration via EnvPool) by adding --extras "mujoco envpool".
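
For example, a development install that also pulls in these two extras might look like this (a sketch; it assumes poetry is already available on your PATH):

poetry install --with dev --extras "mujoco envpool"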

Available extras are:

  • atari (for Atari environments)
  • box2d (for Box2D environments)
  • classic_control (for classic control (discrete) environments)
  • mujoco (for MuJoCo environments)
  • mujoco-py (for legacy mujoco-py environments; see footnote 1)
  • pybullet (for pybullet environments)
  • robotics (for gymnasium-robotics environments)
  • vizdoom (for ViZDoom environments)
  • envpool (for envpool integration)
  • argparse (in order to be able to run the high level API examples)

Otherwise, you can install the latest release from PyPI (currently far behind the master) with the following command:

$ pip install tianshou

If you are using Anaconda or Miniconda, you can install Tianshou from conda-forge:

$ conda install tianshou -c conda-forge

As an alternative to the poetry install, you can also install the latest source version directly from GitHub:

$ pip install git+https://github.com/thu-ml/tianshou.git@master --upgrade

Finally, you may check the installation via your Python console as follows:

import tianshou
print(tianshou.__version__)

If no errors are reported, you have successfully installed Tianshou.

Documentation

Tutorials and API documentation are hosted on tianshou.readthedocs.io.

Find example scripts in the test/ and examples/ folders.

Why Tianshou?

Comprehensive Functionality

RL Platform GitHub Stars # of Alg. (1) Custom Env Batch Training RNN Support Nested Observation Backend
Baselines GitHub stars 9 ✔️ (gym) (2) ✔️ TF1
Stable-Baselines GitHub stars 11 ✔️ (gym) (2) ✔️ TF1
Stable-Baselines3 GitHub stars 7 (3) ✔️ (gym) (2) ✔️ PyTorch
Ray/RLlib GitHub stars 16 ✔️ ✔️ ✔️ ✔️ TF/PyTorch
SpinningUp GitHub stars 6 ✔️ (gym) (2) PyTorch
Dopamine GitHub stars 7 TF/JAX
ACME GitHub stars 14 ✔️ (dm_env) ✔️ ✔️ ✔️ TF/JAX
keras-rl GitHub stars 7 ✔️ (gym) Keras
rlpyt GitHub stars 11 ✔️ ✔️ ✔️ PyTorch
ChainerRL GitHub stars 18 ✔️ (gym) ✔️ ✔️ Chainer
Sample Factory GitHub stars 1 (4) ✔️ (gym) ✔️ ✔️ ✔️ PyTorch
Tianshou GitHub stars 20 ✔️ (Gymnasium) ✔️ ✔️ ✔️ PyTorch

(1): access date: 2021-08-08

(2): not all algorithms support this feature

(3): TQC and QR-DQN in sb3-contrib instead of main repo

(4): super fast APPO!

High Software Engineering Standards

RL Platform Documentation Code Coverage Type Hints Last Update
Baselines GitHub last commit
Stable-Baselines Documentation Status coverage GitHub last commit
Stable-Baselines3 Documentation Status coverage report ✔️ GitHub last commit
Ray/RLlib (1) ✔️ GitHub last commit
SpinningUp GitHub last commit
Dopamine GitHub last commit
ACME (1) ✔️ GitHub last commit
keras-rl Documentation (1) GitHub last commit
rlpyt Docs codecov GitHub last commit
ChainerRL Documentation Status Coverage Status GitHub last commit
Sample Factory codecov GitHub last commit
Tianshou Read the Docs codecov ✔️ GitHub last commit

(1): it has continuous integration but the coverage rate is not available

Reproducible, High-Quality Results

Tianshou is rigorously tested. In contrast to other RL platforms, our tests include the full agent training procedure for all of the implemented algorithms. Our tests fail if any of the agents does not achieve a consistent level of performance within a limited number of epochs. Our tests thus ensure reproducibility. Check out the GitHub Actions page for more detail.

Atari and MuJoCo benchmark results can be found in the examples/atari/ and examples/mujoco/ folders respectively. Our MuJoCo results reach or exceed the level of performance of most existing benchmarks.

Policy Interface

All algorithms implement the following, highly general API:

  • __init__: initialize the policy;
  • forward: compute actions based on given observations;
  • process_buffer: process the initial buffer, which is useful for some offline learning algorithms;
  • process_fn: preprocess data from the replay buffer (since we have reformulated all algorithms to replay buffer-based algorithms);
  • learn: learn from a given batch of data;
  • post_process_fn: update the replay buffer from the learning process (e.g., prioritized replay buffer needs to update the weight);
  • update: the main interface for training, i.e., process_fn -> learn -> post_process_fn.

The implementation of this API suffices for a new algorithm to be applicable within Tianshou, making experimentation with new approaches particularly straightforward.
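
Conceptually, the update flow chains these methods roughly as follows (a simplified sketch of the process_fn -> learn -> post_process_fn cycle, not the actual implementation of BasePolicy.update):

def update(policy, sample_size, buffer):
    # draw a (batch, indices) pair from the replay buffer
    batch, indices = buffer.sample(sample_size)
    # preprocess, e.g. compute n-step returns or GAE targets
    batch = policy.process_fn(batch, buffer, indices)
    # gradient-based update; returns a dict of losses
    result = policy.learn(batch)
    # post-process, e.g. update prioritized replay weights
    policy.post_process_fn(batch, buffer, indices)
    return result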

Quick Start

Tianshou provides two API levels:

  • the high-level interface, which provides ease of use for end users seeking to run deep reinforcement learning applications
  • the procedural interface, which provides a maximum of control, especially for very advanced users and developers of reinforcement learning algorithms.

In the following, let us consider an example application using the CartPole gymnasium environment. We shall apply the deep Q network (DQN) learning algorithm using both APIs.

High-Level API

To get started, we need some imports.

from tianshou.highlevel.config import SamplingConfig
from tianshou.highlevel.env import (
    EnvFactoryRegistered,
    VectorEnvType,
)
from tianshou.highlevel.experiment import DQNExperimentBuilder, ExperimentConfig
from tianshou.highlevel.params.policy_params import DQNParams
from tianshou.highlevel.trainer import (
    EpochTestCallbackDQNSetEps,
    EpochTrainCallbackDQNSetEps,
    EpochStopCallbackRewardThreshold
)

In the high-level API, the basis for an RL experiment is an ExperimentBuilder with which we can build the experiment we then seek to run. Since we want to use DQN, we use the specialization DQNExperimentBuilder. The other imports serve to provide configuration options for our experiment.

The high-level API provides largely declarative semantics, i.e. the code is almost exclusively concerned with configuration that controls what to do (rather than how to do it).

experiment = (
    DQNExperimentBuilder(
        EnvFactoryRegistered(task="CartPole-v1", seed=0, venv_type=VectorEnvType.DUMMY),
        ExperimentConfig(
            persistence_enabled=False,
            watch=True,
            watch_render=1 / 35,
            watch_num_episodes=100,
        ),
        SamplingConfig(
            num_epochs=10,
            step_per_epoch=10000,
            batch_size=64,
            num_train_envs=10,
            num_test_envs=100,
            buffer_size=20000,
            step_per_collect=10,
            update_per_step=1 / 10,
        ),
    )
    .with_dqn_params(
        DQNParams(
            lr=1e-3,
            discount_factor=0.9,
            estimation_step=3,
            target_update_freq=320,
        ),
    )
    .with_model_factory_default(hidden_sizes=(64, 64))
    .with_epoch_train_callback(EpochTrainCallbackDQNSetEps(0.3))
    .with_epoch_test_callback(EpochTestCallbackDQNSetEps(0.0))
    .with_epoch_stop_callback(EpochStopCallbackRewardThreshold(195))
    .build()
)
experiment.run()

The experiment builder takes three arguments:

  • the environment factory for the creation of environments. In this case, we use an existing factory implementation for gymnasium environments.
  • the experiment configuration, which controls persistence and the overall experiment flow. In this case, we have configured that we want to observe the agent's behavior after it is trained (watch=True) for a number of episodes (watch_num_episodes=100). We have disabled persistence, because we do not want to save training logs, the agent or its configuration for future use.
  • the sampling configuration, which controls fundamental training parameters, such as the total number of epochs we run the experiment for (num_epochs=10)
    and the number of environment steps each epoch consists of (step_per_epoch=10000). Every epoch comprises a series of data collection (rollout) steps and training steps. The parameter step_per_collect controls the amount of data collected in each collection step; after each collection step, we perform a training step, applying a gradient-based update on a sample of data (batch_size=64) drawn from the buffer of collected data. For further details, see the documentation of SamplingConfig.

We then proceed to configure some of the parameters of the DQN algorithm itself and of the neural network model we want to use. A DQN-specific detail is the use of callbacks to configure the algorithm's epsilon parameter for exploration: we want random exploration during rollouts (train callback), but we do not want it when evaluating the agent's performance in the test environments (test callback).

Find the script in examples/discrete/discrete_dqn_hl.py.

Find many further applications of the high-level API in the examples/ folder; look for scripts ending with _hl.py. Note that most of these examples require the extra package argparse (install it by adding --extras argparse when invoking poetry).

Procedural API

Let us now consider an analogous example in the procedural API. Find the full script in examples/discrete/discrete_dqn.py.

First, import some relevant packages:

import gymnasium as gym
import torch
from torch.utils.tensorboard import SummaryWriter
import tianshou as ts

Define some hyper-parameters:

task = 'CartPole-v1'
lr, epoch, batch_size = 1e-3, 10, 64
train_num, test_num = 10, 100
gamma, n_step, target_freq = 0.9, 3, 320
buffer_size = 20000
eps_train, eps_test = 0.1, 0.05
step_per_epoch, step_per_collect = 10000, 10

Initialize the logger:

logger = ts.utils.TensorboardLogger(SummaryWriter('log/dqn'))
# For other loggers, see https://tianshou.readthedocs.io/en/master/01_tutorials/05_logger.html

Make environments:

# You can also try SubprocVectorEnv, which will use parallelization
train_envs = ts.env.DummyVectorEnv([lambda: gym.make(task) for _ in range(train_num)])
test_envs = ts.env.DummyVectorEnv([lambda: gym.make(task) for _ in range(test_num)])

Create the network as well as its optimizer:

from tianshou.utils.net.common import Net

# Note: You can easily define other networks.
# See https://tianshou.readthedocs.io/en/master/01_tutorials/00_dqn.html#build-the-network
env = gym.make(task, render_mode="human")
state_shape = env.observation_space.shape or env.observation_space.n
action_shape = env.action_space.shape or env.action_space.n
net = Net(state_shape=state_shape, action_shape=action_shape, hidden_sizes=[128, 128, 128])
optim = torch.optim.Adam(net.parameters(), lr=lr)

Set up the policy and collectors:

policy = ts.policy.DQNPolicy(
    model=net,
    optim=optim,
    discount_factor=gamma, 
    action_space=env.action_space,
    estimation_step=n_step,
    target_update_freq=target_freq
)
train_collector = ts.data.Collector(policy, train_envs, ts.data.VectorReplayBuffer(buffer_size, train_num), exploration_noise=True)
test_collector = ts.data.Collector(policy, test_envs, exploration_noise=True)  # because DQN uses epsilon-greedy method

Let's train it:

result = ts.trainer.OffpolicyTrainer(
    policy=policy,
    train_collector=train_collector,
    test_collector=test_collector,
    max_epoch=epoch,
    step_per_epoch=step_per_epoch,
    step_per_collect=step_per_collect,
    episode_per_test=test_num,
    batch_size=batch_size,
    update_per_step=1 / step_per_collect,
    train_fn=lambda epoch, env_step: policy.set_eps(eps_train),
    test_fn=lambda epoch, env_step: policy.set_eps(eps_test),
    stop_fn=lambda mean_rewards: mean_rewards >= env.spec.reward_threshold,
    logger=logger,
).run()
print(f"Finished training in {result.timing.total_time} seconds")

Save/load the trained policy (it works exactly like saving/loading a torch.nn.Module):

torch.save(policy.state_dict(), 'dqn.pth')
policy.load_state_dict(torch.load('dqn.pth'))

Watch the agent with 35 FPS:

policy.eval()
policy.set_eps(eps_test)
collector = ts.data.Collector(policy, env, exploration_noise=True)
collector.collect(n_episode=1, render=1 / 35)

Inspect the data saved in TensorBoard:

$ tensorboard --logdir log/dqn

Please read the documentation for advanced usage.

Contributing

Tianshou is still under development. Further algorithms and features are continuously being added, and we always welcome contributions to help make Tianshou better. If you would like to contribute, please check out this link.

Citing Tianshou

If you find Tianshou useful, please cite it in your publications.

@article{tianshou,
  author  = {Jiayi Weng and Huayu Chen and Dong Yan and Kaichao You and Alexis Duburcq and Minghao Zhang and Yi Su and Hang Su and Jun Zhu},
  title   = {Tianshou: A Highly Modularized Deep Reinforcement Learning Library},
  journal = {Journal of Machine Learning Research},
  year    = {2022},
  volume  = {23},
  number  = {267},
  pages   = {1--6},
  url     = {http://jmlr.org/papers/v23/21-1127.html}
}

Acknowledgments

Tianshou is supported by the appliedAI Institute for Europe, which is committed to providing long-term support and development.

Tianshou was previously a reinforcement learning platform based on TensorFlow. You can check out the priv branch for more detail. Many thanks to Haosheng Zou for his pioneering work on Tianshou before version 0.1.1.

We would like to thank TSAIL and Institute for Artificial Intelligence, Tsinghua University for providing such an excellent AI research platform.

Footnotes

  1. mujoco-py is a legacy package and is not recommended for new projects. It is only included for compatibility with older projects. Also note that there may be compatibility issues with macOS newer than Monterey.

tianshou's People

Contributors

alexnikulkov, arnaujc91, bfanas, blazejosinski, carlocagnetta, chendrag, danagi, dantp-ai, dependabot[bot], duburcqa, fengredrum, gogoduan, imoneoi, jamartinh, markus28, maxhuettenrauch, mehooz, michalgregor, mischapanch, nicoguertler, nuance1979, opcode81, rocknamx8, shengxiang19, stephenark30, trinkle23897, ultmaster, ycheng517, yingchengyang, youkaichao


tianshou's Issues

process obs(image + state)

Hi, I want to process observations that include both images and numerical states.
How can I add this info to the buffer?

Error with max_grad_norm in A2C Policy

  • I have marked all applicable categories:
    • exception-raising bug
    • RL algorithm bug
    • documentation request (i.e. "X is missing from the documentation.")
    • new feature request
  • I have visited the source website, and in particular read the known issues
  • I have searched through the issue categories for duplicates
  • I have mentioned version numbers, operating system and environment, where applicable:
    import tianshou, torch, sys
    print(tianshou.__version__, torch.__version__, sys.version, sys.platform)

Versions:
0.2.2 1.5.0 3.7.4 (default, Aug 13 2019, 20:35:49)
[GCC 7.3.0] linux

A2C policy cannot deal with max_grad_norm correctly. For example:

python test/discrete/test_a2c_with_il.py --max-grad-norm 1
Epoch #1:   0%|                                                                                                                                                                                                      | 0/1000 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "test/discrete/test_a2c_with_il.py", line 137, in <module>
    test_a2c()
  File "test/discrete/test_a2c_with_il.py", line 98, in test_a2c
    writer=writer)
  File "/research/dept6/zlhe/miniconda3/envs/fp/lib/python3.7/site-packages/tianshou/trainer/onpolicy.py", line 89, in onpolicy_trainer
    train_collector.sample(0), batch_size, repeat_per_collect)
  File "/research/dept6/zlhe/miniconda3/envs/fp/lib/python3.7/site-packages/tianshou/policy/modelfree/a2c.py", line 101, in learn
    self.model.parameters(), max_norm=self._grad_norm)
AttributeError: 'NoneType' object has no attribute 'parameters'

This is due to mixed-up variable names in the following code:

if self._grad_norm:
    nn.utils.clip_grad_norm_(
        self.model.parameters(), max_norm=self._grad_norm)

However, the policy defines self.actor and self.critic rather than self.model, which causes the error.
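
A sketch of the fix the reporter is suggesting (clipping the gradients of the actor and critic parameters instead of the non-existent self.model; the exact attribute names depend on the Tianshou version):

if self._grad_norm:
    nn.utils.clip_grad_norm_(
        list(self.actor.parameters()) + list(self.critic.parameters()),
        max_norm=self._grad_norm)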

Chinese documentation 中文文档

Please wait until my thesis passes the check.
To prevent my thesis from being flagged in the plagiarism check, the Chinese documentation will be released only after my thesis has gone through that check.

How to use custom loss ?

I would like to add the following extra term to the loss function,
|| y_{pred} - y_{ref} ||_2^2
where y_{pred} is the action sampled by the distribution, and y_{ref} can be computed by the actor.

What is the best way to do it using your framework? The point is being able to take advantage of the analytical gradient computation.

The only way I can think of is to overwrite the whole learn method of the policy (i.e. PPO algorithm), but it feels inconvenient just to add an extra line of code...

Thank you in advance,

Best,
Alexis

cannot run example 'pong_ppo.py'

Epoch #1: 0%| | 0/1000 [00:00<?, ?it/s]Process Process-1:
Process Process-2:
Process Process-3:
Process Process-5:
Process Process-4:
Process Process-6:
Process Process-7:
Process Process-8:
Traceback (most recent call last):
File "/home/z/anaconda2/envs/py36/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/home/z/anaconda2/envs/py36/lib/python3.6/multiprocessing/process.py", line 93, in run
self._target(*self._args, **self._kwargs)
File "/home/z/anaconda2/envs/py36/lib/python3.6/site-packages/tianshou/env/vecenv.py", line 99, in worker
p.send(env.step(data))
File "/home/z/anaconda2/envs/py36/lib/python3.6/site-packages/tianshou/env/atari.py", line 75, in step
_, reward, terminal, info = self.env.step(action)
File "/home/z/anaconda2/envs/py36/lib/python3.6/site-packages/gym/envs/atari/atari_env.py", line 113, in step
action = self._action_set[a]
IndexError: index 94 is out of bounds for axis 0 with size 6

Can running and training be separated?

Can running and training be separated? For example, we deploy on the cloud, send data to the cloud for training, and issue policies to local hosts intermittently or in real time. The local host is only responsible for execution.

Gazebo environment integration

  • I have marked all applicable categories:
    • exception-raising bug
    • RL algorithm bug
    • documentation request (i.e. "X is missing from the documentation.")
    • new feature request
  • I have visited the source website, and in particular read the known issues
  • I have searched through the issue tracker for duplicates
  • I have mentioned version numbers, operating system and environment:
    0.2.1 3.6.10 |Anaconda, Inc.| (default, Mar 25 2020, 23:51:54)
    [GCC 7.3.0] linux

Hello there, I'm using the PPO algorithm in this library.

I'm reading examples/pong_ppo.py. In line 52, it uses SubprocVectorEnv to create many parallel environments.

Gazebo is a popular simulator in robotics that supports ROS.
I'm using Gazebo as my environment to run robot simulations. What should I do to make my environment work with SubprocVectorEnv? I already have Gazebo working with a single environment instance.

How to add time limitation in my environment?

In tianshou/data/collector.py, you note that "Please make sure the given environment has a time limitation.", which confuses me. I wonder how to add one.

In tianshou/test/continuous/test_ppo.py, the environment is "Pendulum-v0". After each step, it always returns done=False, so I wonder how the end of an episode is determined.

Thanks!

Maybe a bug in the evaluation of DDPG

  • I have marked all applicable categories:
    • exception-raising bug
    • RL algorithm bug
    • documentation request (i.e. "X is missing from the documentation.")
    • new feature request
  • I have visited the source website, and in particular read the known issues
  • I have searched through the issue tracker and issue categories for duplicates
  • I have mentioned version numbers, operating system and environment, where applicable:
    import tianshou, torch, sys
    print(tianshou.__version__, torch.__version__, sys.version, sys.platform)

Hi, I created a pull request that got merged last night. I'm sorry that I forgot to report a bug fix in that commit.
I noticed a (possible) bug in DDPG when I was modifying the code that adds noise to actions.

In the previous implementation, the collector would pass eps = None to forward() both during training and evaluation.

def forward(self, batch: Batch,
            state: Optional[Union[dict, Batch, np.ndarray]] = None,
            model: str = 'actor',
            input: str = 'obs',
            eps: Optional[float] = None,
            **kwargs) -> Batch:

Then eps would be set to self._eps, so noise would be added to the action. But why should noise be added to the action during evaluation?

if eps is None:
    eps = self._eps
if eps > 0:
    # noise = np.random.normal(0, eps, size=logits.shape)
    # logits += to_torch(noise, device=logits.device)
    # noise = self.noise(logits.shape, eps)
    logits += torch.randn(
        size=logits.shape, device=logits.device) * eps

So I added a line in DDPG (and also in SAC) to avoid adding noise during evaluation.

if self.training and explorating:
    logits += to_torch_as(self._noise(logits.shape), logits)

Let me know if this is really a bug or not. Thank you.

Bugs in collector.sample

if self._multi_buf:
    if batch_size > 0:
        lens = [len(b) for b in self.buffer]
        total = sum(lens)
        batch_index = np.random.choice(
            total, batch_size, p=np.array(lens) / total)
    else:
        batch_index = np.array([])

should be

batch_index = np.random.choice(
    len(self.buffer), batch_size, p=np.array(lens) / total)

Automatically batch tuple/dict obs

  • I have marked all applicable categories:
    • exception-raising bug
    • RL algorithm bug
    • documentation request (i.e. "X is missing from the documentation.")
    • new feature request
  • I have visited the source website, and in particular read the known issues
  • I have searched through the issue categories for duplicates
  • I have mentioned version numbers, operating system and environment, where applicable:
    import tianshou, torch, sys
    print(tianshou.__version__, torch.__version__, sys.version, sys.platform)

Related to issue #27 : I have a multimodal observation space that consists of image, optical flow, segmentation mask (int array), and other low-dimensional states. The current obs seems to support only a single uniform tensor. Is there any way to have tuple/dict obs that gets collated along the batch dimension? For example, pytorch provides collate_fn that preserves data structure: https://pytorch.org/docs/stable/data.html#dataloader-collate-fn
For now I can put the other obs in info, but I have to manually batch them at run time, which can be tedious and error-prone.

Wrong dtype for replay buffers

  • I have marked all applicable categories:
    • exception-raising bug
    • RL algorithm bug
    • documentation request (i.e. "X is missing from the documentation.")
    • new feature request
  • I have visited the source website, and in particular read the known issues
  • I have searched through the issue tracker and issue categories for duplicates
  • I have mentioned version numbers, operating system and environment, where applicable

Replay buffers systematically use the default numpy dtype (float64) for their internal array buffers. This wrong dtype takes precedence over the original dtype of the data when samples are added to the buffer using the update method.

Missing sum over log_prob in SAC

  • I have marked all applicable categories:
    • exception-raising bug
    • RL algorithm bug
    • documentation request (i.e. "X is missing from the documentation.")
    • new feature request
  • I have visited the [source website], and in particular read the [known issues]
  • I have searched through the [issue categories] for duplicates
  • I have mentioned version numbers, operating system and environment, where applicable:
    import tianshou, torch, sys
    print(tianshou.__version__, torch.__version__, sys.version, sys.platform)

yields

0.2.2 1.5.0 3.7.5 (default, Nov 20 2019, 09:21:52) 
[GCC 9.2.1 20191008] linux

When running halfcheetahBullet_v0_sac.py I get the pytorch warning:

tianshou/tianshou/policy/modelfree/sac.py:111: UserWarning: Using a target size (torch.Size([128, 6])) that is different to the input size (torch.Size([128, 1])). This will likely lead to incorrect results due to broadcasting. Please ensure they have the same size.

This is caused by target_q having the wrong shape which in turn stems from log_prob. In order to fix this, log_prob should be summed over dimension 1 in the forward method:

log_prob = dist.log_prob(x) - torch.log(
            self._action_scale * (1 - y.pow(2)) + self.__eps)
log_prob = torch.unsqueeze(torch.sum(log_prob, 1), 1)

This should be correct because dist.log_prob(x) gives log_prob for a univariate Gaussian for each action dimension. As the covariance matrix is diagonal, the probability density factorizes and the log of the multivariate Gaussian is the sum of the individual log_probs.
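
In equation form (spelling out the reporter's argument, assuming a diagonal covariance matrix over d action dimensions):

\log p(\mathbf{x}) = \log \prod_{i=1}^{d} \mathcal{N}(x_i; \mu_i, \sigma_i^2) = \sum_{i=1}^{d} \log \mathcal{N}(x_i; \mu_i, \sigma_i^2)

so the per-dimension values returned by dist.log_prob(x) must be summed over the action dimension to obtain a single scalar log-probability per sample.
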
Code as it is now:

def forward(self, batch, state=None, input='obs', **kwargs):
    obs = getattr(batch, input)
    logits, h = self.actor(obs, state=state, info=batch.info)
    assert isinstance(logits, tuple)
    dist = torch.distributions.Normal(*logits)
    x = dist.rsample()
    y = torch.tanh(x)
    act = y * self._action_scale + self._action_bias
    log_prob = dist.log_prob(x) - torch.log(
        self._action_scale * (1 - y.pow(2)) + self.__eps)
    act = act.clamp(self._range[0], self._range[1])
    return Batch(
        logits=logits, act=act, state=h, dist=dist, log_prob=log_prob)

Let me know if you agree and if I should make a pull request.

By the way, I really like the modularity of Tianshou! Thank you for your work.

SAC policy produces nan action

  • I have marked all applicable categories:
    • exception-raising bug
    • RL algorithm bug
    • documentation request (i.e. "X is missing from the documentation.")
    • new feature request
  • I have visited the source website, and in particular read the known issues
  • I have searched through the issue categories for duplicates
  • I have mentioned version numbers, operating system and environment, where applicable:
    import tianshou, torch, sys
    print(tianshou.__version__, torch.__version__, sys.version, sys.platform)
  • version numbers: 0.2.2 1.4.0
  • problem: the SAC policy generates nan actions.
    Running python3 examples/halfcheetahBullet_v0_sac.py --task BipedalWalkerHardcore-v3
    cannot pass the nan assertion and causes an env exception.

Batch class wrapper missing 'values' method

  • I have marked all applicable categories:
    • exception-raising bug
    • RL algorithm bug
    • documentation request (i.e. "X is missing from the documentation.")
    • new feature request
  • I have visited the source website, and in particular read the known issues
  • I have searched through the issue tracker and issue categories for duplicates
  • I have mentioned version numbers, operating system and environment, where applicable:

0.2.2 (master commit de556fd) 1.5.0+cu101 3.6.9 (default, Apr 18 2020, 01:56:04)
[GCC 8.4.0] linux


The Batch class implements the keys method, but values is missing. It is not a major issue, but it would be nice to add it, both for convenience and for consistency with the dict API.

Refactor

  • DQN/DDPG/TD3/SAC with n-step return, in process_fn
  • PER interface
  • Batch over batch: do not copy

RNN support

  • I have marked all applicable categories:
    • exception-raising bug
    • RL algorithm bug
    • documentation request (i.e. "X is missing from the documentation.")
    • new feature request
  • I have visited the source website, and in particular read the known issues
  • I have searched through the issue tracker for duplicates
  • I have mentioned version numbers, operating system and environment, where applicable:
    import tianshou, sys
    print(tianshou.__version__, sys.version, sys.platform)

I see in the README that RNN support is on your TODO list. However, the module API seems to support RNNs (the forward(obs, state) method). Could you please provide some examples of how to train an RNN policy? Thanks!

state encoder for PG methods?

  • I have marked all applicable categories:
    • exception-raising bug
    • RL algorithm bug
    • documentation request (i.e. "X is missing from the documentation.")
    • new feature request
  • I have visited the [source website], and in particular read the [known issues]
  • I have searched through the [issue categories] for duplicates
  • I have mentioned version numbers, operating system and environment, where applicable:
    As of version 0.2.2, it seems that only the hidden state of the policy net is stored and used; how can one train RNN critic nets (for POMDP problems)?
    Maybe a state encoder should be added so that the policy and critic can share it.
    Are there any discussion groups? I really like this project :)
    [source website]: https://github.com/thu-ml/tianshou/
    [known issues]: https://github.com/thu-ml/tianshou/#faq-and-known-issues
    [issue categories]: https://github.com/thu-ml/tianshou/projects/2

Learning starts

  • I have marked all applicable categories:
    • exception-raising bug
    • RL algorithm bug
    • documentation request (i.e. "X is missing from the documentation.")
    • new feature request
  • I have visited the source website, and in particular read the known issues
  • I have searched through the issue tracker and issue categories for duplicates
  • I have mentioned version numbers, operating system and environment, where applicable:
    import tianshou, torch, sys
    print(tianshou.__version__, torch.__version__, sys.version, sys.platform)

I was wondering how to handle learning starts. As far as I can tell from the code, the actions are sampled from the policy from the beginning. How would one start the learning with e.g. 1000 timesteps of uniformly sampled actions?

Model-based algorithm?

  • I have marked all applicable categories:
    • exception-raising bug
    • RL algorithm bug
    • documentation request (i.e. "X is missing from the documentation.")
    • new feature request
  • I have visited the source website, and in particular read the known issues
  • I have searched through the issue tracker for duplicates
  • I have mentioned version numbers, operating system and environment, where applicable:
    import tianshou, sys
    print(tianshou.__version__, sys.version, sys.platform)

Are you thinking about including model-based algorithms in this framework in the future, e.g., PILCO?
If not, I wonder whether it is easy to implement model-based algorithms with the current code structure.

potential bug in the implementation of DDPG

  • I have marked all applicable categories:
    • exception-raising bug
    • RL algorithm bug
    • documentation request (i.e. "X is missing from the documentation.")
    • new feature request
  • I have visited the [source website], and in particular read the [known issues]
  • I have searched through the [issue tracker] for duplicates
  • I have mentioned version numbers, operating system and environment, where applicable:

my environment:
0.2.1 3.7.6 (default, Jan 8 2020, 19:59:22)
[GCC 7.3.0] linux

Hello there, I'm using the DDPG algorithm in this library.

To my understanding, the gradient of the policy network should be calculated as
(d(Q) / d(action)) * (d(action) / d(theta)), and I don't feel like this line of code will correctly propagate d(Q) / d(action).
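
For context, the usual way this chain rule is realized with autograd is to backpropagate through the critic into the actor (a generic DDPG-style sketch, not Tianshou's exact code):

# Deterministic policy gradient via autograd: the actor loss is -Q(s, pi(s)),
# so d(loss)/d(theta) = -(dQ/da) * (da/dtheta) is computed automatically.
actions = actor(obs)
actor_loss = -critic(obs, actions).mean()
actor_optim.zero_grad()
actor_loss.backward()
actor_optim.step()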

I'm raising this issue because I was playing around with https://github.com/thu-ml/tianshou/blob/master/test/continuous/net.py, and when I added a Softmax layer to the Actor network, I got an exception:
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation

How to sample an entire epoch?

  • I have marked all applicable categories:
    • exception-raising bug
    • RL algorithm bug
    • documentation request (i.e. "X is missing from the documentation.")
    • new feature request
  • I have visited the source website, and in particular read the known issues
  • I have searched through the issue tracker for duplicates
  • I have mentioned version numbers, operating system and environment, where applicable:
    import tianshou, sys
    print(tianshou.__version__, sys.version, sys.platform)

Hello, I'm going to implement an algorithm called Episodic Backward Update, which requires sampling an entire trajectory during an epoch rather than sampling transitions randomly. So I wonder if there is any mechanism in tianshou to achieve this?

sac example tensor size mismatch warning

  • I have marked all applicable categories:
    • exception-raising bug
    • RL algorithm bug
    • documentation request (i.e. "X is missing from the documentation.")
    • new feature request
  • I have visited the source website, and in particular read the known issues
  • I have searched through the issue categories for duplicates
  • I have mentioned version numbers, operating system and environment, where applicable:
    import tianshou, torch, sys
    print(tianshou.__version__, torch.__version__, sys.version, sys.platform)
  • version numbers:
0.2.2 1.4.0 3.7.3 (default, Mar 27 2019, 22:11:17) 
[GCC 7.3.0] linux
  • problem:
    UserWarning: Using a target size (torch.Size([128, 4])) that is different to the input size (torch.Size([128, 1]))
    It seems the shape of current_q1 and current_q2 is not correct.
python3 examples/halfcheetahBullet_v0_sac.py --task BipedalWalkerHardcore-v3 --run-id train
Epoch #1:   0%|                                                                                                                                                                    | 0/1000 [00:00<?, ?it/s]
/usr/local/lib/python3.6/site-packages/tianshou/policy/modelfree/sac.py:111: UserWarning: Using a target size (torch.Size([128, 4])) that is different to the input size (torch.Size([128, 1])). This will likely lead to incorrect results due to broadcasting. Please ensure they have the same size.
  critic1_loss = F.mse_loss(current_q1, target_q)
/usr/local/lib/python3.6/site-packages/tianshou/policy/modelfree/sac.py:117: UserWarning: Using a target size (torch.Size([128, 4])) that is different to the input size (torch.Size([128, 1])). This will likely lead to incorrect results due to broadcasting. Please ensure they have the same size.
  critic2_loss = F.mse_loss(current_q2, target_q)

compatible with torch 1.5.0

Fail in SACPolicy with torch==1.5.0

Traceback (most recent call last):
  File "test/continuous/test_sac_with_il.py", line 145, in <module>
    test_sac_with_il()
  File "test/continuous/test_sac_with_il.py", line 106, in test_sac_with_il
    args.batch_size, stop_fn=stop_fn, save_fn=save_fn, writer=writer)
  File "/home/trinkle/github/tianshou-new/tianshou/trainer/offpolicy.py", line 87, in offpolicy_trainer
    losses = policy.learn(train_collector.sample(batch_size))
  File "/home/trinkle/github/tianshou-new/tianshou/policy/modelfree/sac.py", line 131, in learn
    actor_loss.backward()
  File "/home/trinkle/.local/lib/python3.6/site-packages/torch/tensor.py", line 198, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home/trinkle/.local/lib/python3.6/site-packages/torch/autograd/__init__.py", line 100, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [128, 1]], which is output 0 of TBackward, is at version 2; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

Loss of PG

I'd like to ask about the loss of the policy gradient algorithm:
-(action_prob * r).sum()
where action_prob = dist.log_prob(a) is, in my opinion, the log-probability of each action. And what is the meaning of r?
I think r may be the reward used by the PG algorithm. However, in CartPole-v0 I printed r[i] and got values like 0.3532 and -1.7669, but the reward in CartPole-v0 at each step should be 1. So I'd like to ask about the meaning of r. And if r isn't the reward, how can we get the reward? Thanks :)

V-trace support?

  • I have marked all applicable categories:
    • exception-raising bug
    • RL algorithm bug
    • documentation request (i.e. "X is missing from the documentation.")
    • new feature request
  • I have visited the source website, and in particular read the known issues
  • I have searched through the issue tracker for duplicates
  • I have mentioned version numbers, operating system and environment, where applicable:
    import tianshou, sys
    print(tianshou.__version__, sys.version, sys.platform)

AttributeError: module 'tensorflow' has no attribute 'io'

  • I have marked all applicable categories:

    • exception-raising bug
    • RL algorithm bug
    • documentation request (i.e. "X is missing from the documentation.")
    • new feature request
  • I have visited the source website, and in particular read the known issues

  • I have searched through the issue categories for duplicates

  • I have mentioned version numbers, operating system and environment, where applicable:

    import tianshou, torch, sys
    print(tianshou.__version__, torch.__version__, sys.version, sys.platform)

    python test/discrete/test_pg.py --seed 0 --render 0.03
    Traceback (most recent call last):
    File "test/discrete/test_pg.py", line 173, in
    test_pg()
    File "test/discrete/test_pg.py", line 144, in test_pg
    writer = SummaryWriter(log_path)
    File "/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/utils/tensorboard/writer.py", line 225, in init
    self._get_file_writer()
    File "/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/utils/tensorboard/writer.py", line 256, in _get_file_writer
    self.flush_secs, self.filename_suffix)
    File "/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/utils/tensorboard/writer.py", line 66, in init
    log_dir, max_queue, flush_secs, filename_suffix)
    File "/anaconda3/envs/pytorch/lib/python3.6/site-packages/tensorboard/summary/writer/event_file_writer.py", line 76, in init
    if not tf.io.gfile.exists(logdir):
    File "/anaconda3/envs/pytorch/lib/python3.6/site-packages/tensorboard/lazy.py", line 68, in getattr
    return getattr(load_once(self), attr_name)
    AttributeError: module 'tensorflow' has no attribute 'io'

Does it support cuda10.0 and torch1.12?

  • I have marked all applicable categories:
    • exception-raising bug
    • RL algorithm bug
    • documentation request (i.e. "X is missing from the documentation.")
    • new feature request
  • I have visited the source website, and in particular read the known issues
  • I have searched through the issue tracker and issue categories for duplicates
  • I have mentioned version numbers, operating system and environment, where applicable:
    import tianshou, torch, sys
    print(tianshou.__version__, torch.__version__, sys.version, sys.platform)

Batch not serializable

  • I have marked all applicable categories:
    • exception-raising bug
    • RL algorithm bug
    • documentation request (i.e. "X is missing from the documentation.")
    • new feature request
  • I have visited the source website, and in particular read the known issues
  • I have searched through the issue tracker and issue categories for duplicates
  • I have mentioned version numbers, operating system and environment, where applicable

Batch instances create an infinite recursive loop when used as an argument of a multiprocessing Pipe. Here is a snippet to reproduce the issue.

from tianshou.data import Batch
from multiprocessing import Pipe
(p, c) = Pipe()
c.send(Batch(a=1.0))
print(p.recv())

Similarly, using Pickle without multiprocessing produces the same issue.

from tianshou.data import Batch
import pickle
pickle.dump(Batch(a=1.0), open("save.p", "wb"))                                                                                                                                                     
pickle.load(open("save.p", "rb"))

I'm going to open a PR that fixes the issue.

Different state and action data type in Fetch_env

Hello, I am using the FetchPickAndPlace-v1 env. The data types of 'obs', 'action' and reward_threshold (there seems to be no reward_threshold in this env) are different from a normal gym env. For example, the 'obs' of FetchPickAndPlace-v1 is a dict. This leads to problems when I run the collector in your code, because the collector seems to assume that 'obs' is a list, so it raises an error.

Collector `preprocess_fn` inconsistent input types

  • I have marked all applicable categories:
    • exception-raising bug
    • RL algorithm bug
    • documentation request (i.e. "X is missing from the documentation.")
    • new feature request
  • I have visited the source website, and in particular read the known issues
  • I have searched through the issue tracker and issue categories for duplicates
  • I have mentioned version numbers, operating system and environment, where applicable:

0.2.2 (master commit de556fd) 1.5.0+cu101 3.6.9 (default, Apr 18 2020, 01:56:04)
[GCC 8.4.0] linux


When using a custom preprocessor preprocess_fn, the Collector does not convert the observation into a Batch before calling it, while it does so for the action. I think it would be better to convert it into a Batch systematically, even when a custom preprocessor is defined, so that the end user only has to handle Batch, not a raw numpy array of dicts.

How can I use a self-defined state in tianshou?

  • I have marked all applicable categories:
    • exception-raising bug
    • RL algorithm bug
    • documentation request (i.e. "X is missing from the documentation.")
    • new feature request
  • I have visited the source website, and in particular read the known issues
  • I have searched through the issue tracker and issue categories for duplicates
  • I have mentioned version numbers, operating system and environment, where applicable:
    import tianshou, torch, sys
    print(tianshou.__version__, torch.__version__, sys.version, sys.platform)

I was wondering if it's possible for me to add some self-defined states, e.g., a networkx graph? Thanks!

Pytorch version requirement

  • I have marked all applicable categories:
    • exception-raising bug
    • RL algorithm bug
    • documentation request (i.e. "X is missing from the documentation.")
    • new feature request
  • I have visited the source website, and in particular read the known issues
  • I have searched through the issue tracker and issue categories for duplicates
  • I have mentioned version numbers, operating system and environment
    Tianshou cannot be installed successfully.
    1.1.0 3.7.7
    [GCC 7.3.0] linux

Hi,
I'm using Cuda 9.0, so the torch version I installed is 1.1.0. It seems that tianshou's requirement is torch>=1.4.0. Is it possible to install tianshou with torch 1.1.0?

batched env wrapper

Hi, could you tell me how to save the episodes?
I want to save the state-action pairs, which are stored in the ReplayBuffer, because I want to use these pairs to update another model.

Thanks.

env problem

  • I have marked all applicable categories:
    • exception-raising bug
    • RL algorithm bug
    • documentation request (i.e. "X is missing from the documentation.")
    • new feature request
  • I have visited the source website, and in particular read the known issues
  • I have searched through the issue tracker and issue categories for duplicates
  • I have mentioned version numbers, operating system and environment, where applicable:
    import tianshou, torch, sys
    print(tianshou.__version__, torch.__version__, sys.version, sys.platform)

Hey, Tianshou's DRL algorithms are implemented on top of gym environments. If I want to use Tianshou with my own RL environment, what rules should I follow to define an RL environment? I haven't found a related solution in Tianshou's tutorials.

support for google cloud TPU

  • I have marked all applicable categories:
    • exception-raising bug
    • RL algorithm bug
    • documentation request (i.e. "X is missing from the documentation.")
    • new feature request
  • I have visited the [source website], and in particular read the [known issues]
  • I have searched through the [issue tracker] for duplicates
  • I have mentioned version numbers, operating system and environment, where applicable:

Hi,
I see distributed training on your todo list; does that include support for Google Cloud TPUs?

save the info

Hi, the info includes a flag that indicates whether a grasp succeeded or failed.
How can I save the info to TensorBoard?
I don't see where env.step is handled, so I cannot save the info there.

Clip with Multi-dimensional action?

  • I have marked all applicable categories:
    • exception-raising bug
    • RL algorithm bug
    • documentation request (i.e. "X is missing from the documentation.")
    • new feature request
  • I have visited the source website, and in particular read the known issues
  • I have searched through the issue tracker and issue categories for duplicates
  • I have mentioned version numbers, operating system and environment, where applicable:
    import tianshou, torch, sys
    print(tianshou.__version__, torch.__version__, sys.version, sys.platform)

It seems VPG and PPO don't allow the action range to be multidimensional? (VPG doesn't seem to support an action range at all.)
E.g., action_low = [0, 1, 10], action_high = [1, 10, 100]

Any ideas about how to fix it?

action space and state space

How can I customize the state space and action space?
For example:
state = np.array([a, b, c], dtype=int)
action space = {0, 1}

Loss of A2C algorithm

I'd like to ask about the loss of the A2C algorithm. There are three parts of the loss in Tianshou's code:
loss = a_loss + self._w_vf * vf_loss - self._w_ent * ent_loss
What is the meaning of the third part of the loss? I tried this on CartPole-v0 and it converged after 1355 steps; then I tried using only the first and second parts of the loss on CartPole-v0, and it converged after 770 steps. It seems that using only the first and second parts of the loss is enough?

Refactoring of Batch class

  • I have marked all applicable categories:
    • exception-raising bug
    • RL algorithm bug
    • documentation request (i.e. "X is missing from the documentation.")
    • new feature request
  • I have visited the source website, and in particular read the known issues
  • I have searched through the issue tracker and issue categories for duplicates
  • I have mentioned version numbers, operating system and environment, where applicable.

I just reviewed your implementation of the Batch class. It is nice, but you are always accessing the values of the stored objects by key instead of via iterators. I think it would be both clearer and more efficient to rely on iterators everywhere, since that would avoid a hash-table lookup every time you want to get an item.

If you want, I can do it by myself and open a PR.

Fully support from/to numpy/pytorch for Batch

The current implementation of PPO and other policy algorithms does not support dict actions because of this line.

It could be solved by adding a new method to the Batch class to convert the relevant fields back to torch.Tensor.

I'm opening a PR to fix that.

support for other environments?

  • I have marked all applicable categories:

    • exception-raising bug
    • RL algorithm bug
    • documentation request
    • new feature request
  • I have visited the [source website], and in particular read the [known issues]

  • I have searched through the [issue tracker] for duplicates

  • I have mentioned version numbers, operating system and environment, where applicable:

Hello, I'm using the DDPG algorithm to solve a network optimization problem, so the state of the environment is defined by data from a network simulator in my model. Could you please provide some examples of how to interact with environments other than the gym libraries? Thanks!
