salesforce / awd-lstm-lm

LSTM and QRNN Language Model Toolkit for PyTorch

License: BSD 3-Clause "New" or "Revised" License

Python 97.43% Shell 2.57%
lstm pytorch language-model sgd qrnn

awd-lstm-lm's Introduction

LSTM and QRNN Language Model Toolkit

This repository contains the code used for two Salesforce Research papers:

  • Regularizing and Optimizing LSTM Language Models (Merity, Keskar, and Socher, 2017)

  • An Analysis of Neural Language Modeling at Multiple Scales (Merity, Keskar, and Socher, 2018)

The model comes with instructions to train:

  • word level language models over the Penn Treebank (PTB), WikiText-2 (WT2), and WikiText-103 (WT103) datasets

  • character level language models over the Penn Treebank (PTBC) and Hutter Prize dataset (enwik8)

The model can be composed of an LSTM or a Quasi-Recurrent Neural Network (QRNN) which is two or more times faster than the cuDNN LSTM in this setup while achieving equivalent or better accuracy.

  • Install PyTorch 0.4
  • Run getdata.sh to acquire the Penn Treebank and WikiText-2 datasets
  • Train the base model using main.py
  • (Optionally) Finetune the model using finetune.py
  • (Optionally) Apply the continuous cache pointer to the finetuned model using pointer.py

If you use this code or our results in your research, please cite as appropriate:

@article{merityRegOpt,
  title={{Regularizing and Optimizing LSTM Language Models}},
  author={Merity, Stephen and Keskar, Nitish Shirish and Socher, Richard},
  journal={arXiv preprint arXiv:1708.02182},
  year={2017}
}
@article{merityAnalysis,
  title={{An Analysis of Neural Language Modeling at Multiple Scales}},
  author={Merity, Stephen and Keskar, Nitish Shirish and Socher, Richard},
  journal={arXiv preprint arXiv:1803.08240},
  year={2018}
}

Update (June/13/2018)

The codebase is now PyTorch 0.4 compatible for most use cases (a big shoutout to https://github.com/shawntan for a fairly comprehensive PR #43). Mild readjustments to hyperparameters may be necessary to obtain quoted performance. If you desire exact reproducibility (or wish to run on PyTorch 0.3 or lower), we suggest using an older commit of this repository. We are still working on pointer, finetune and generate functionalities.

Software Requirements

Python 3 and PyTorch 0.4 are required for the current codebase.

Included below are hyperparameters to get results equivalent to or better than those included in the original paper.

If you need to use an earlier version of the codebase, the original code and hyperparameters are accessible at the PyTorch==0.1.12 release, which requires Python 3 and PyTorch 0.1.12. If you are using Anaconda, PyTorch 0.1.12 can be installed via: conda install pytorch=0.1.12 -c soumith.

Experiments

The codebase was modified during the writing of the paper, preventing exact reproduction due to minor differences in random seeds or similar. We have also seen exact reproduction numbers change when changing the underlying GPU. The guide below produces results largely similar to the numbers reported.

For data setup, run ./getdata.sh. This script collects the Mikolov pre-processed Penn Treebank and the WikiText-2 datasets and places them in the data directory.

Next, decide whether to use the QRNN or the LSTM as the underlying recurrent neural network model. The QRNN is many times faster than even NVIDIA's cuDNN optimized LSTM (and dozens of times faster than a naive LSTM implementation) yet achieves similar or better results than the LSTM for many word level datasets. At the time of writing, the QRNN models use the same number of parameters and are slightly deeper networks but are two to four times faster per epoch and require fewer epochs to converge.

The QRNN model uses a QRNN with convolutional size 2 for the first layer, allowing the model to view discrete natural language inputs (e.g. "New York"), while all other layers use a convolutional size of 1.
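As a rough illustration only (layer sizes and flags below are illustrative assumptions, not the exact values in model.py), such a stack could be assembled with the torchqrnn package roughly as follows:

from torchqrnn import QRNNLayer

ninp, nhid, nlayers = 400, 1550, 4  # illustrative sizes only
rnns = [
    QRNNLayer(input_size=ninp if l == 0 else nhid,
              hidden_size=nhid if l != nlayers - 1 else ninp,
              window=2 if l == 0 else 1,   # first layer sees bigram-like inputs, e.g. "New York"
              save_prev_x=True,            # keep the previous timestep's input for the width-2 convolution
              output_gate=True)
    for l in range(nlayers)
]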

Finetuning Note: Fine-tuning modifies the original saved model file (model.pt) in place - if you wish to keep the original weights you must copy the file first.

Pointer note: BPTT just changes the length of the sequence pushed onto the GPU but won't impact the final result.

Character level enwik8 with LSTM

  • python -u main.py --epochs 50 --nlayers 3 --emsize 400 --nhid 1840 --alpha 0 --beta 0 --dropoute 0 --dropouth 0.1 --dropouti 0.1 --dropout 0.4 --wdrop 0.2 --wdecay 1.2e-6 --bptt 200 --batch_size 128 --optimizer adam --lr 1e-3 --data data/enwik8 --save ENWIK8.pt --when 25 35

Character level Penn Treebank (PTB) with LSTM

  • python -u main.py --epochs 500 --nlayers 3 --emsize 200 --nhid 1000 --alpha 0 --beta 0 --dropoute 0 --dropouth 0.25 --dropouti 0.1 --dropout 0.1 --wdrop 0.5 --wdecay 1.2e-6 --bptt 150 --batch_size 128 --optimizer adam --lr 2e-3 --data data/pennchar --save PTBC.pt --when 300 400

Word level WikiText-103 (WT103) with QRNN

  • python -u main.py --epochs 14 --nlayers 4 --emsize 400 --nhid 2500 --alpha 0 --beta 0 --dropoute 0 --dropouth 0.1 --dropouti 0.1 --dropout 0.1 --wdrop 0 --wdecay 0 --bptt 140 --batch_size 60 --optimizer adam --lr 1e-3 --data data/wikitext-103 --save WT103.12hr.QRNN.pt --when 12 --model QRNN

Word level Penn Treebank (PTB) with LSTM

The instruction below trains a PTB model that without finetuning achieves perplexities of approximately 61.2 / 58.8 (validation / testing), with finetuning achieves perplexities of approximately 58.8 / 56.5, and with the continuous cache pointer augmentation achieves perplexities of approximately 53.2 / 52.5.

  • python main.py --batch_size 20 --data data/penn --dropouti 0.4 --dropouth 0.25 --seed 141 --epoch 500 --save PTB.pt
  • python finetune.py --batch_size 20 --data data/penn --dropouti 0.4 --dropouth 0.25 --seed 141 --epoch 500 --save PTB.pt
  • python pointer.py --data data/penn --save PTB.pt --lambdasm 0.1 --theta 1.0 --window 500 --bptt 5000

Word level Penn Treebank (PTB) with QRNN

The instruction below trains a QRNN model that without finetuning achieves perplexities of approximately 60.6 / 58.3 (validation / testing), with finetuning achieves perplexities of approximately 59.1 / 56.7, and with the continuous cache pointer augmentation achieves perplexities of approximately 53.4 / 52.6.

  • python -u main.py --model QRNN --batch_size 20 --clip 0.2 --wdrop 0.1 --nhid 1550 --nlayers 4 --emsize 400 --dropouth 0.3 --seed 9001 --dropouti 0.4 --epochs 550 --save PTB.pt
  • python -u finetune.py --model QRNN --batch_size 20 --clip 0.2 --wdrop 0.1 --nhid 1550 --nlayers 4 --emsize 400 --dropouth 0.3 --seed 404 --dropouti 0.4 --epochs 300 --save PTB.pt
  • python pointer.py --model QRNN --lambdasm 0.1 --theta 1.0 --window 500 --bptt 5000 --save PTB.pt

Word level WikiText-2 (WT2) with LSTM

The instruction below trains a WT2 model that without finetuning achieves perplexities of approximately 68.7 / 65.6 (validation / testing), with finetuning achieves perplexities of approximately 67.4 / 64.7, and with the continuous cache pointer augmentation achieves perplexities of approximately 52.2 / 50.6.

  • python main.py --epochs 750 --data data/wikitext-2 --save WT2.pt --dropouth 0.2 --seed 1882
  • python finetune.py --epochs 750 --data data/wikitext-2 --save WT2.pt --dropouth 0.2 --seed 1882
  • python pointer.py --save WT2.pt --lambdasm 0.1279 --theta 0.662 --window 3785 --bptt 2000 --data data/wikitext-2

Word level WikiText-2 (WT2) with QRNN

The instruction below trains a QRNN model that without finetuning achieves perplexities of approximately 69.3 / 66.8 (validation / testing), with finetuning achieves perplexities of approximately 68.5 / 65.9, and with the continuous cache pointer augmentation achieves perplexities of approximately 53.6 / 52.1. Better numbers are likely achievable, but the hyperparameters have not been extensively searched. These hyperparameters should serve as a good starting point, however.

  • python -u main.py --epochs 500 --data data/wikitext-2 --clip 0.25 --dropouti 0.4 --dropouth 0.2 --nhid 1550 --nlayers 4 --seed 4002 --model QRNN --wdrop 0.1 --batch_size 40 --save WT2.pt
  • python finetune.py --epochs 500 --data data/wikitext-2 --clip 0.25 --dropouti 0.4 --dropouth 0.2 --nhid 1550 --nlayers 4 --seed 4002 --model QRNN --wdrop 0.1 --batch_size 40 --save WT2.pt
  • python -u pointer.py --save WT2.pt --model QRNN --lambdasm 0.1279 --theta 0.662 --window 3785 --bptt 2000 --data data/wikitext-2

Speed

For speed regarding character-level PTB and enwik8 or word-level WikiText-103, refer to the relevant paper.

The default speeds for the models during training on an NVIDIA Quadro GP100:

  • Penn Treebank (batch size 20): LSTM takes 65 seconds per epoch, QRNN takes 28 seconds per epoch
  • WikiText-2 (batch size 20): LSTM takes 180 seconds per epoch, QRNN takes 90 seconds per epoch

The default QRNN models can be far faster than the cuDNN LSTM model, with the speed-ups depending on how much of a bottleneck the RNN is. The majority of the model time above is now spent in softmax or optimization overhead (see PyTorch QRNN discussion on speed).

Speeds are approximately three times slower on a K80. On a K80 or other cards with less memory, you may wish to enable the cap on the maximum sampled sequence length to prevent out-of-memory (OOM) errors, especially for WikiText-2.

If speed is a major issue, SGD converges more quickly than our non-monotonically triggered variant of ASGD (NT-ASGD), though it achieves a worse overall perplexity.

Details of the QRNN optimization

For full details, refer to the PyTorch QRNN repository.

Details of the LSTM optimization

All the augmentations to the LSTM, including our variant of DropConnect (Wan et al. 2013) termed weight dropping, which adds recurrent dropout, allow for the use of NVIDIA's cuDNN LSTM implementation. PyTorch will automatically use the cuDNN backend if run on CUDA with cuDNN installed. This ensures the model is fast to train even when convergence may take many hundreds of epochs.
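As a minimal sketch of that idea (dropout value and tensor sizes are illustrative; WeightDrop is the wrapper provided in this repository's weight_drop.py):

import torch
import torch.nn as nn
from weight_drop import WeightDrop  # provided by this repository

# Wrap a standard cuDNN-backed LSTM so DropConnect is applied to the
# hidden-to-hidden weights on each forward pass during training.
lstm = WeightDrop(nn.LSTM(400, 1150), ['weight_hh_l0'], dropout=0.5)

x = torch.randn(70, 20, 400)   # (seq_len, batch, emsize), illustrative shapes
output, (h, c) = lstm(x)       # the recurrence still runs through the usual LSTM kernel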

awd-lstm-lm's People

Contributors

julien-c, keskarnitish, racheltho, smerity, svc-scm


awd-lstm-lm's Issues

Why is the decoder using nhid as input size even when tie_weights is set to True?

I have never used torch, but if I understand your code correctly, the last LSTM layer's hidden size is equal to the first layer's input size when tie_weights is true. But the decoder always takes the hidden size as its input size:
LSTM Layers
self.rnns = [torch.nn.LSTM(ninp if l == 0 else nhid, nhid if l != nlayers - 1 else (ninp if tie_weights else nhid), 1, dropout=0) for l in range(nlayers)]
Decoder
self.decoder = nn.Linear(nhid, ntoken)

There is a commented-out raise ValueError for the case where nhid differs from ninp when using tie_weights:

if tie_weights:
            #if nhid != ninp:
            #    raise ValueError('When using the tied flag, nhid must be equal to emsize')
            self.decoder.weight = self.encoder.weight

So when using tie_weights, ninp should be equal to nhid?
I don't understand why there is this restriction instead of just using ninp as the input size of the decoder when using tie_weights.

I hope you will clarify this for me.
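For reference, a minimal sketch of why the sizes must line up when tying (illustrative sizes, not the repository's exact code): both weight matrices are (ntoken, ninp), so the decoder's input dimension has to equal the embedding size, which is why the last recurrent layer is sized down to ninp when tie_weights is set.

import torch.nn as nn

ntoken, ninp = 10000, 400          # illustrative vocabulary and embedding sizes
encoder = nn.Embedding(ntoken, ninp)
decoder = nn.Linear(ninp, ntoken)  # input size must be ninp for the tied shapes to match
decoder.weight = encoder.weight    # tied: both parameters are (ntoken, ninp)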

Redundant code in getdata.sh

Lines 8-14 and 54-60 in getdata.sh both contain the same lines of code:

echo "- Downloading WikiText-2 (WT2)"
wget --quiet --continue https://s3.amazonaws.com/research.metamind.io/wikitext/wikitext-2-v1.zip
unzip -q wikitext-2-v1.zip
cd wikitext-2
mv wiki.train.tokens train.txt
mv wiki.valid.tokens valid.txt
mv wiki.test.tokens test.txt

I think that lines 54-60 from getdata.sh can be deleted with no consequence.

Inconsistency between NT-ASGD described in the research paper and main.py

The NT-ASGD algorithm compares the current validation loss with the previous n loss values, but I think main.py compares it with the loss values from the 1st epoch up to epoch t-n, because of line 222:
if 't0' not in optimizer.param_groups[0] and (len(best_val_loss)>args.nonmono and val_loss > min(best_val_loss[:-args.nonmono])):
If the line is revised to
if 't0' not in optimizer.param_groups[0] and (len(best_val_loss)>args.nonmono and val_loss > min(best_val_loss[-args.nonmono:])):
the line is consistent with the NT-ASGD described in the research paper.

However, if we use the line, the code starts averaging immediately after the validation metric worsens.
So, what about using the following line?
if 't0' not in optimizer.param_groups[0] and (len(best_val_loss)>args.nonmono and val_loss > max(best_val_loss[-args.nonmono:])):
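As a standalone sketch of the trigger being discussed (the function below is illustrative, not the repository's API):

def should_switch_to_asgd(best_val_loss, val_loss, nonmono=5):
    # Trigger averaging when the current validation loss is worse than the best
    # loss seen over the last `nonmono` evaluations (the corrected slice above).
    return len(best_val_loss) > nonmono and val_loss > min(best_val_loss[-nonmono:])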

Dictionary - handling OOV tokens

I was looking into the data.py and saw that the dictionary consists of all tokens in train, val, and test files. I'm wondering if adding unseen tokens in val/test files to the dictionary will affect the testing in any way? Thanks!

Error in moving to GPU

@Smerity I tried the following code:

import torch
import torch.nn as nn
from weight_drop import WeightDrop  # WeightDrop comes from this repository

class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.gru = nn.GRU(10, 10)
        self.gru = WeightDrop(self.gru, ['weight_hh_l0'], dropout=0.5)

m = Model()
# The following operation throws an error
m.cuda()

Can you take a look and see where the problem occurs? I am using PyTorch 0.2.

Triggering condition for ASGD bug

I think there is a bug in the way ASGD is being triggered. Right now the code is

if args.optimizer == 'sgd' and 't0' not in optimizer.param_groups[0] and (len(best_val_loss)>args.nonmono and val_loss > min(best_val_loss[:-args.nonmono])):

I believe this should be

if args.optimizer == 'sgd' and 't0' not in optimizer.param_groups[0] and (len(best_val_loss)>args.nonmono and val_loss > min(best_val_loss[-args.nonmono:])):

with the difference being that we grab from the end of the best_val_loss list instead of the start.

GPU memory and cap

Hi, training crashed with not enough memory on a Titan X 12GB with the char-LSTM on enwik8.

The trick about reducing the "cap" on sequence length links to a 404 URL: could you please let me know where I can do that?

Thanks a lot for the great code!

Generate broken?

Generate still appears to be broken.
Or is finetune necessary before running generate?

Perhaps I'm doing something wrong, but I have generated with Pytorch's default word language model many times. I haven't dug into your code, assume it is correct... All help appreciated, I'd love to see what QRNN is capable of.

Trained using default settings for QRNN on WT2.
Exited training early.

Then called:

$ python -u generate.py --cuda --words=66 --checkpoint="WT2.pt" --model=QRNN --data=data/wikitext-2

Output:
| Generated 0/66 words
No text file generated.

model.decoder is never used?

I started to suspect something was wrong when the generate.py script crashed. Then I was surprised to see that the line output, hidden = model(input, hidden) yields an output variable with the hidden size of the last recurrent layer, not the size of the vocabulary. So I took a further look into model.py and was surprised to see that self.decoder is not used at all!
If I have misunderstood something, correct me, but it now seems that it should not work at all (at least if tie_weights is not used).

DataParallel

I am trying to run the model on multiple GPUs. Probably SplitCrossEntropyLoss causes some trouble; any hints?

File "main.py", line 209, in train
    raw_loss = criterion(model.module.decoder.weight, model.module.decoder.bias, output, targets)
  File "/net/people/plgkwrobel/env-pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/net/people/plgkwrobel/env-pytorch/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 114, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/net/people/plgkwrobel/env-pytorch/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 124, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/net/people/plgkwrobel/env-pytorch/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 65, in parallel_apply
    raise output
  File "/net/people/plgkwrobel/env-pytorch/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 41, in _worker
    output = module(*input, **kwargs)
  File "/net/people/plgkwrobel/env-pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/net/scratch/people/plgkwrobel/awd-lstm-lm/splitcross.py", line 115, in forward
    split_targets, split_hiddens = self.split_on_targets(hiddens, targets)
  File "/net/scratch/people/plgkwrobel/awd-lstm-lm/splitcross.py", line 103, in split_on_targets
    split_hiddens.append(hiddens.masked_select(tmp_mask.unsqueeze(1).expand_as(hiddens)).view(-1, hiddens.size(1)))
  File "/net/people/plgkwrobel/env-pytorch/lib/python3.6/site-packages/torch/tensor.py", line 302, in expand_as
    return self.expand(tensor.size())
RuntimeError: The expanded size of the tensor (280) must match the existing size (550) at non-singleton dimension 0

Multiple GPU option

I extended the code to multiple-GPU training, but the GPU usage is extremely imbalanced. The root cause is that we collect all outputs back and calculate the loss on one GPU. I tried to put the loss calculation inside model.forward() as follows:

class RNNModel(nn.Module):
        def __init__(...):
                super(RNNModel, self).__init__()
                from splitcross import SplitCrossEntropyLoss
                splits = [2800, 20000, 76000]
                self.criterion = SplitCrossEntropyLoss(ninp, splits=splits, verbose=False)
                ... ...
        def forward(...)
                ... ...
                result = output
                # calculate loss
                result = result.view(result.size(0)*result.size(1), -1)
                raw_loss = self.criterion(decoder_weight, decoder_bias, result, target)
                loss = raw_loss
                # activation regularization
                if args.alpha: loss = loss + sum(args.alpha * dropped_rnn_h.pow(2).mean() for dropped_rnn_h in outputs[-1:])
                # Temporal Activation Regularization (slowness)
                if args.beta: loss = loss + sum(args.beta * (rnn_h[1:] - rnn_h[:-1]).pow(2).mean() for rnn_h in raw_outputs[-1:])
                # expand loss to two dimensional space so it can be gathered via the second dimension
                loss = loss.unsqueeze(1)
                raw_loss = raw_loss.unsqueeze(1)
                if return_h:
                        return raw_loss, loss, hidden, raw_outputs, outputs
                return raw_loss, loss, hidden

Then, in my main.py, I collect the loss and use loss.mean().backward() to update parameters. The interesting thing is that I can successfully finish the first round of loss.mean().backward() but fail in the second round with the error:

RuntimeError: invalid argument 3: Index tensor must have same dimensions as input tensor at
/pytorch/torch/lib/THC/generic/THCTensorScatterGather.cu:199

Can anyone help?
Thanks in advance!

`ValueError: result of slicing is an empty tensor` when trying to run generate.py on QRNN

I've trained a QRNN, but when I try to use generate.py with it, I get the following:

  File "generate.py", line 68, in <module>
    output, hidden = model(input, hidden)
  File "/miniconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 224, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/awd-lstm-lm/model.py", line 82, in forward
    raw_output, new_h = rnn(raw_output, hidden[l])
  File "/miniconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 224, in __call__
    result = self.forward(*input, **kwargs)
  File "/miniconda3/lib/python3.6/site-packages/torchqrnn/qrnn.py", line 60, in forward
    Xm1 = [self.prevX if self.prevX is not None else X[:1, :, :] * 0, X[:-1, :, :]]
  File "/miniconda3/lib/python3.6/site-packages/torch/autograd/variable.py", line 76, in __getitem__
    return Index.apply(self, key)
  File "/miniconda3/lib/python3.6/site-packages/torch/autograd/_functions/tensor.py", line 16, in forward
    result = i.index(ctx.index)
ValueError: result of slicing is an empty tensor

Weight drop code masking the same "raw" weight?

Hey,

I was inspecting the weight drop (variant of dropconnect) code and I found it a bit confusing (https://github.com/salesforce/awd-lstm-lm/blob/master/weight_drop.py#L34):

for name_w in self.weights:
      raw_w = getattr(self.module, name_w + '_raw')
      w = None
      if self.variational:
          mask = torch.autograd.Variable(torch.ones(raw_w.size(0), 1))
          if raw_w.is_cuda: mask = mask.cuda()
          mask = torch.nn.functional.dropout(mask, p=self.dropout, training=True)
          w = mask.expand_as(raw_w) * raw_w
      else:
          w = torch.nn.functional.dropout(raw_w, p=self.dropout, training=self.training)
      setattr(self.module, name_w, w)

In every iteration the raw_w you get from name_w + '_raw' is the same, isn't it? Because you only setattr to name_w (e.g. weight_hh_l0) at the end. So every time the dropout mask operates on the same raw weight matrix...

Or maybe I just overlooked something. Can someone help me understand this?

Thanks!

Unpredictable behavior of adaptive softmax

The behavior of adaptive softmax is very unpredictable. Sometimes I can run through the whole code on dataset A the first time, but get an error message when training on dataset B with the same format and schema. Then, if I switch back to dataset A, the code fails again. Here is the error message:

Traceback (most recent call last):
  File "main.py", line 244, in <module>
    train()
  File "main.py", line 208, in train
    loss.backward()
  File "/opt/conda/lib/python3.6/site-packages/torch/autograd/variable.py", line 167, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, retain_variables)
  File "/opt/conda/lib/python3.6/site-packages/torch/autograd/__init__.py", line 99, in backward
    variables, grad_variables, retain_graph)
RuntimeError: invalid argument 3: Index tensor must have same dimensions as input tensor at /opt/conda/conda-bld/pytorch_1518243271935/work/torch/lib/THC/generic/THCTensorScatterGather.cu:199

This issue has blocked me for a long time. Please review it, thanks!

the script `getdata.sh` creates an empty `enwik8` folder and then finds a python script within the folder

In the getdata.sh script, lines 5 and 6 create an empty data folder.

mkdir -p data
cd data

Lines 7 to 25 never mention a script named prep_enwik8.py.

Lines 26 to 31 then create an empty folder named enwik8 and then magically find a python script named prep_enwik8.py within that folder!

echo "- Downloading enwik8 (Character)"
mkdir -p enwik8
cd enwik8
wget --continue http://mattmahoney.net/dc/enwik8.zip
python prep_enwik8.py
cd ..

Few questions about main.py

Hi, I am recently studying averaging methods in optimization.
I read your paper 'Regularizing and Optimizing LSTM Language Models' and am trying to follow your experiment on PTB only. I have a few questions about the source code.

  1. In main.py, at line 276, you use the condition 't0' not in optimizer.param_groups[0]. I can't understand this condition at all. What does it mean?

  2. On the same line, there is the condition len(best_val_loss)>args.nonmono and val_loss > min(best_val_loss[:-args.nonmono]).

Does this mean "after args.nonmono logging intervals L" and "the validation loss of the current epoch is bigger than that of the previous args.nonmono logging intervals L"?

  3. After switching from SGD to ASGD, how does the program keep updating the parameters?
    Does it update the parameters with SGD until the last epoch and return the averaged parameters at the end,
    or
    update the parameters by averaging every iteration, epoch, or some interval?

  4. In the same context as Q3: after the program switches the optimizer to ASGD, the validation PPL and BPC stop changing but the training PPL and BPC keep changing. Why does that happen?

  5. Is there any averaging stop criterion in this program? If so, what is it?

  6. Is there any training stop criterion other than the maximum number of epochs?

  7. Why did you choose 750 as the maximum number of epochs? Is it just because you thought it was large enough?

Adaptive softmax question

In the 'An Analysis of Neural Language Modeling at Multiple Scales' paper it states that the hierarchy of the words is determined by their frequency.
For some reason I can't find that in the code, neither in the dictionary nor in the corpus build.
It seems like the word ids are determined by the order of their occurrence.
Please point me to where that takes place.
Many thanks
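For what it's worth, a frequency-ordered id assignment could be sketched as below (a hypothetical helper, not part of data.py):

from collections import Counter

def build_frequency_sorted_vocab(tokens):
    # Assign low ids to frequent words, which is what a frequency-based
    # split hierarchy assumes. Hypothetical helper, not the repository's code.
    counts = Counter(tokens)
    idx2word = [w for w, _ in counts.most_common()]
    word2idx = {w: i for i, w in enumerate(idx2word)}
    return word2idx, idx2word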

Correct way to continue training?

My training was interrupted at epoch 150.
To continue training with python main.py, I've added a new argument:

parser.add_argument('--load', type=str, default='',
                    help='path to load the final model')

and modified the model instantiation:

if not args.load:
    model = model.RNNModel(args.model, ntokens, args.emsize, args.nhid, args.nlayers, args.dropout, args.dropouth, args.dropouti, args.dropoute, args.wdrop, args.tied)
else:
    with open(args.load, 'rb') as f:
        model = torch.load(f)

Then run the training procedure:
python3 -u main.py --model QRNN --batch_size 20 --clip 0.2 --wdrop 0.1 --nhid 1550 --nlayers 4 --emsize 400 --dropouth 0.3 --seed 9001 --dropouti 0.4 --epochs 400 --save PTB.pt --load PTB.pt

Do the following logs look fine?

| end of epoch   1 | time: 103.37s | valid loss  4.19 | valid ppl    65.86
| end of epoch   2 | time: 107.36s | valid loss  4.20 | valid ppl    66.46
| end of epoch   3 | time: 105.37s | valid loss  4.19 | valid ppl    66.01
| end of epoch   4 | time: 106.24s | valid loss  4.20 | valid ppl    66.56
| end of epoch   5 | time: 101.58s | valid loss  4.20 | valid ppl    66.42
| end of epoch   6 | time: 102.41s | valid loss  4.19 | valid ppl    66.22
| end of epoch   7 | time: 104.01s | valid loss  4.19 | valid ppl    66.00
Switching!
| end of epoch   8 | time: 110.03s | valid loss  4.14 | valid ppl    62.92
| end of epoch   9 | time: 109.40s | valid loss  4.14 | valid ppl    62.67
| end of epoch  10 | time: 109.45s | valid loss  4.14 | valid ppl    62.52
| end of epoch  11 | time: 110.47s | valid loss  4.13 | valid ppl    62.39
| end of epoch  12 | time: 111.34s | valid loss  4.13 | valid ppl    62.30
| end of epoch  13 | time: 107.84s | valid loss  4.13 | valid ppl    62.25

splits cross entropy can be further optimized

for idx in range(self.nsplits):

As the word is in a tail split, the probability of the word is p(C) * p(x=target|C), so the cross entropy is target * log(p(C) * p(x=target|C)) = target * log(p(C)) + target * log(p(x=target|C)).

We can just accumulate the cross entropy on the head (including the tombstones), then compute the cross entropy on each tail, so there is no need to pass head_entropy below.
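Written out, the factorization being relied on here is the standard chain rule over the cluster C containing the target:

p(x \mid h) = p(C \mid h)\, p(x \mid C, h)
\quad\Rightarrow\quad
-\log p(x \mid h) = -\log p(C \mid h) \;-\; \log p(x \mid C, h)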

generate.py producing bad samples

From a trained word level PTB model that gets 58 test perplexity, generate.py seems to be producing relatively bad samples, even with a low temperature. Is this expected?

E.g.
such four once billion other assume memotec boston years portfolio thought sooner four fund have than down modest findings compound
making it makes york-based appear u.s. declining number western rate again medical where makes fields parts institute nov. n't indicate
mass. brief areas events died questionable replaced relatively vermont asbestos an one latest even cluett reported yield before have director

AttributeError: 'LSTM' object has no attribute 'all_weights'

Failed on pytorch 0.2.0_1

python3.6 main.py --batch_size 20 --data data/penn --dropouti 0.4 --seed 28 --epoch 300 --save PTB.pt
[LSTM(400, 1150, dropout=0.3), LSTM(1150, 1150, dropout=0.3), LSTM(1150, 400, dropout=0.3)]
Applying weight drop of 0.5 to weight_hh_l0
Applying weight drop of 0.5 to weight_hh_l0
Applying weight drop of 0.5 to weight_hh_l0
Traceback (most recent call last):
  File "main.py", line 94, in <module>
    model.cuda()
  File "/data1/XXXX/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 147, in cuda
    return self._apply(lambda t: t.cuda(device_id))
  File "/data1/XXXX/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 118, in _apply
    module._apply(fn)
  File "/data1/XXXX/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 118, in _apply
    module._apply(fn)
  File "/data1/XXXX/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 118, in _apply
    module._apply(fn)
  File "/data1/XXXX/.local/lib/python3.6/site-packages/torch/nn/modules/rnn.py", line 116, in _apply
    self.flatten_parameters()
  File "/data1/XXXX/.local/lib/python3.6/site-packages/torch/nn/modules/rnn.py", line 104, in flatten_parameters
    all_weights = [[p.data for p in l] for l in self.all_weights]
  File "/data1/XXXX/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 262, in __getattr__
    type(self).__name__, name))
AttributeError: 'LSTM' object has no attribute 'all_weights'

Unable to load model from different directory

Hi there,

I'm trying to load a trained language model from another project. Unfortunately, I'm not able to load it because it requires the definition of the model module. As far as I know, this is a known problem for PyTorch models saved using torch.save (as discussed here: pytorch/pytorch#3678). How do you deal with this problem?

Thank you in advance!

Alessandro

Update codebase to work with PyTorch 0.2

The original codebase was written to be run on PyTorch 0.1.12_2. Updating the codebase to work on PyTorch 0.2 requires a number of steps, including modifying WeightDrop and others.

Best would be to provide two branches - one with the current PyTorch 0.1.12_2 codebase (allowing for exact result replication) and a second branch that is updated to allow for PyTorch 0.2.

Weights sharing

Hi guys! Thanks for sharing this awesome project.

I would like to share my weights, similar to the awd-lstm weights.
Is there any way I can do that?

Issues with SplitCrossEntropyLoss

  1. Parameters in SplitCrossEntropyLoss are not being updated since they are missing from the optimizer.
  2. Parameters in SplitCrossEntropyLoss are being initialized to 0. EDIT: After reflection, I'm not sure this matters.

Locally, I fixed these issues and ran some very short experiments on 140 batches on wikitext-103 as follows:

Fine-tune broken for QRNNs?

I made some modifications to the codebase, so this might be a problem on my end... But does finetune.py require SplitCrossEntropyLoss to be used for the criterion instead? The decoder is called in SplitCrossEntropyLoss only. I added the appropriate SplitCrossEntropyLoss in finetune.py, and it works as expected.

Question about embedding dropout vs lockeddropout

Why do you apply embedding dropout in line 70 of model.py and then apply LockedDropout in line 73?

Don't both functions have the same functionality regarding dropout?

Is it equivalent to applying embedding dropout with a higher rate?

Many thanks
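To illustrate the difference being asked about (a rough sketch under the usual definitions, not the repository's exact code): embedding dropout zeroes entire rows of the embedding matrix, so every occurrence of a dropped word disappears for that batch, while LockedDropout samples one mask over feature units and reuses it across all timesteps.

import torch

def embedding_dropout_mask(embed_weight, p):
    # One Bernoulli draw per vocabulary row: a dropped word disappears everywhere in the batch.
    mask = embed_weight.new_empty((embed_weight.size(0), 1)).bernoulli_(1 - p) / (1 - p)
    return mask.expand_as(embed_weight) * embed_weight

def locked_dropout(x, p):
    # x: (seq_len, batch, features). One mask per (batch, feature), shared across timesteps.
    mask = x.new_empty((1, x.size(1), x.size(2))).bernoulli_(1 - p) / (1 - p)
    return mask.expand_as(x) * x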

How-to generate after training word level qrnn?

After training using Word level WikiText-103 (WT103) with QRNN

Try to finetune:

File "finetune.py", line 107, in evaluate
    if args.model == 'QRNN': model.reset()
AttributeError: 'list' object has no attribute 'reset'

Try to generate:

File "generate.py", line 51, in <module>
    model.eval()
AttributeError: 'list' object has no attribute 'eval'

All help appreciated.

Model crashes under pytorch 0.4

Hi,
The folks over at pytorch are working on cutting a new 0.4 release. We'd like to make the transition as smooth as possible (if you were planning on upgrading), so we've been testing a number of community repos.

I ran a model and it errors out due to a change in pytorch. Minimal repro:

# Install pytorch-nightly (Currently our pre-release branch)
conda install pytorch-nightly -c pytorch

# Get data
./getdata.sh

# Run model
python main.py --batch_size 20 --data data/penn --dropouti 0.4 --dropouth 0.25 --seed 141 --epoch 1 && \
python -u main.py --model QRNN --batch_size 20 --clip 0.2 --wdrop 0.1 --nhid 1550 --nlayers 4 --emsize 400 --dropouth 0.3 --seed 9001 --dropouti 0.4 --epochs 1

Stack trace: https://gist.github.com/zou3519/142d48df1c03db9fe9c11717ad9a59f2

Pytorch 0.4 adds zero-dimensional tensors that cannot be iterated over, which seems to be what the error is complaining about. Changing

return tuple(repackage_hidden(v) for v in h)
in particular to handle this case should fix it.

cc @soumith
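A minimal sketch of the change being suggested (it mirrors the description above; the committed fix may differ):

import torch

def repackage_hidden(h):
    # Detach hidden state from its history. On PyTorch 0.4, h may be a Tensor
    # (possibly zero-dimensional) rather than an iterable of Variables.
    if isinstance(h, torch.Tensor):
        return h.detach()
    return tuple(repackage_hidden(v) for v in h)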

Finetuning on different corpus

I am trying to train a QRNN model on one dataset and then finetune it on another, but I get the following error when I try to finetune on a different dataset than the one the model was initially trained on:
raw_loss = criterion(output.view(-1, ntokens), targets) RuntimeError: invalid argument 2: size '[-1 x 8967]' is invalid for input with 14600000 elements at /pytorch/torch/lib/TH/THStorage.c:37
Would you be able to explain what this error means?

Do I need to pop the last layer and substitute it with a new Linear layer with the needed number of classes?

How can I use the Adam optimizer instead of SGD?

Hi! First of all, thanks for your code. I am recently studying your paper "Regularizing and Optimizing LSTM Language Models".

I want to compare the Adam optimizer and the SGD optimizer with the NT-ASGD scheme you proposed.

I tried your command with a small addition, using your python code.

"python main.py --batch_size 20 --data data/penn --dropouti 0.4 --dropouth 0.25 --seed 141 --epoch 500 --save SGD_PTB.pt --optimizer sgd"
"python main.py --batch_size 20 --data data/penn --dropouti 0.4 --dropouth 0.25 --seed 141 --epoch 500 --save Adam_PTB.pt --optimizer adam"

The thing is that the first command works well, but the second command runs without calculating the loss, ppl, or bpc. I copied its log below. Please give me any possible solution for this if you don't mind.

| end of epoch 14 | time: 48.42s | valid loss nan | valid ppl nan | valid bpc nan

| epoch 15 | 200/ 663 batches | lr 30.00000 | ms/batch 67.94 | loss nan | ppl nan | bpc nan
| epoch 15 | 400/ 663 batches | lr 30.00000 | ms/batch 68.18 | loss nan | ppl nan | bpc nan
| epoch 15 | 600/ 663 batches | lr 30.00000 | ms/batch 67.13 | loss nan | ppl nan | bpc nan

| end of epoch 15 | time: 48.31s | valid loss nan | valid ppl nan | valid bpc nan

| epoch 16 | 200/ 663 batches | lr 30.00000 | ms/batch 67.27 | loss nan | ppl nan | bpc nan
| epoch 16 | 400/ 663 batches | lr 30.00000 | ms/batch 65.48 | loss nan | ppl nan | bpc nan
| epoch 16 | 600/ 663 batches | lr 30.00000 | ms/batch 67.29 | loss nan | ppl nan | bpc nan

| end of epoch 16 | time: 48.28s | valid loss nan | valid ppl nan | valid bpc nan

| epoch 17 | 200/ 663 batches | lr 30.00000 | ms/batch 67.21 | loss nan | ppl nan | bpc nan
| epoch 17 | 400/ 663 batches | lr 30.00000 | ms/batch 65.92 | loss nan | ppl nan | bpc nan
| epoch 17 | 600/ 663 batches | lr 30.00000 | ms/batch 66.32 | loss nan | ppl nan | bpc nan

What does Finetune do?

The finetune.py file looks to be the same as the main.py file. The paper does not cover the techniques used in the fine-tuning stage. What are the important techniques that increase performance?

forward function takes too many arguments?

I copy/pasted the command to train the model, but got the error below:

$python34 C:/Users/dat/Desktop/awd-lstm-lm/main.py --batch_size 20 --data C:/Users/dat/Desktop/awd-lstm-lm/data/penn --dropouti 0.4 --dropouth 0.25 --seed 141 --epoch 500 --save C:/Users/dat/Desktop/awd-lstm-lm/PTB.pickle
Applying weight drop of 0.5 to weight_hh_l0
Applying weight drop of 0.5 to weight_hh_l0
Applying weight drop of 0.5 to weight_hh_l0
[WeightDrop (
(module): LSTM(400, 1150)
), WeightDrop (
(module): LSTM(1150, 1150)
), WeightDrop (
(module): LSTM(1150, 400)
)]
Args: Namespace(alpha=2, batch_size=20, beta=1, bptt=70, clip=0.25, cuda=True, data='C:/Users/dat/Desktop/awd-lstm-lm/data/penn', dropout=0.4, dropoute=0.1, dropouth=0.25, dropouti=0.4, emsize=400, epochs=500, log_interval=200, lr=30, model='LSTM', nhid=1150, nlayers=3, nonmono=5, save='C:/Users/dat/Desktop/awd-lstm-lm/PTB.pickle', seed=141, tied=True, wdecay=1.2e-06, wdrop=0.5)
Model total parameters: 24221600
Traceback (most recent call last):
  File "C:/Users/dat/Desktop/awd-lstm-lm/main.py", line 185, in <module>
    train()
  File "C:/Users/dat/Desktop/awd-lstm-lm/main.py", line 146, in train
    output, hidden, rnn_hs, dropped_rnn_hs = model(data, hidden, return_h=True)
  File "C:\ProgramData\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 206, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\dat\Desktop\awd-lstm-lm\model.py", line 70, in forward
    emb = embedded_dropout(self.encoder, input, dropout=self.dropoute if self.training else 0)
  File "C:\Users\dat\Desktop\awd-lstm-lm\embed_regularize.py", line 21, in embedded_dropout
    embed.scale_grad_by_freq, embed.sparse
TypeError: forward() takes 3 positional arguments but 8 were given
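For reference, one way to express embedded dropout through the public functional API on newer PyTorch versions might look like the sketch below (an illustration of the idea, not the repository's committed fix; the mask scaling follows the usual inverted-dropout convention):

import torch.nn.functional as F

def embedded_dropout(embed, words, dropout=0.1, training=True):
    # Drop whole rows (words) of the embedding matrix, then do the usual lookup.
    weight = embed.weight
    if training and dropout:
        mask = weight.new_empty((weight.size(0), 1)).bernoulli_(1 - dropout) / (1 - dropout)
        weight = mask.expand_as(weight) * weight
    return F.embedding(words, weight, embed.padding_idx, embed.max_norm,
                       embed.norm_type, embed.scale_grad_by_freq, embed.sparse)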

finetune & pointer bugs?

python finetune.py --epochs 750 --data data/wikitext-2 --save WT2.pt --dropouth 0.2 --seed 1882
python pointer.py --save WT2.pt --lambdasm 0.1279 --theta 0.662 --window 3785 --bptt 2000 --data data/wikitext-2

Traceback (most recent call last):
  File "finetune.py", line 183, in <module>
    stored_loss = evaluate(val_data)
  File "finetune.py", line 108, in evaluate
    model.eval()

Looks like model loading & more needs to be modified.

Also, I no longer get the reported ppls in main.py; the LSTM gets stuck around the 80s and the QRNN around the 90s.

Detail on WeightDrop class `_setup()` cuDNN RNN weight compacting issue & `register_parameter()`

Hi there,
cc @Smerity

Thanks for sharing the code, first of all. I've been diving into the details and would really appreciate it if you could share some insight into the WeightDrop class's self._setup() method.

I have 2 questions.

  1. Regarding the comment on the cuDNN RNN weight compacting issue (code here): could anyone expand on what exactly this issue is?

  2. Why does the code delete parameters and then register them again by calling register_parameter()? (code here)

Thanks.

Question about using monotonic AvSGD

Hi,

I have some questions about the "Regularizing and Optimizing LSTM Language Models" paper. In the second line of the first paragraph on page 10 of the paper, you mention "using a monotonic criterion instead also hampered performance." I am not sure what you mean by "a monotonic criterion". Do you mean using AvSGD as soon as the validation metric fails to improve?

In addition, I am also confused about the "dropouti" flag used in the main.py script at line 39. The help string says "dropout for input embedding layers (0 = no dropout)"; is "dropouti" the dropout applied to the input layer, i.e. before the embedding layer?

Thank you so much.

Multi-GPU size mismatch

I am trying to train the LM with multiple GPUs (in this case, 3 GPUs) by setting "model = torch.nn.DataParallel(model).cuda()" and changing "model.init_hidden" to "model.module.init_hidden", but then I hit an error:

(screenshot of the size-mismatch error omitted)

It seems that only one GPU's results have been collected; I can't explain what happened.

In line 252, val_loss should be val_loss2, shouldn't it?

Hi, I think I just found an error in main.py.

In line 252, val_loss should be val_loss2, right?

Because after the program switches to ASGD mode, it no longer calculates val_loss, so the program's log will show no change in the validation results after switching to ASGD.

Mention requirements and instructions for QRNN in readme/requirements.txt

Hi,

First thanks for releasing this, it has been quite helpful.
Would be great if the README page mentioned in software requirements the dependency on pytorch-qrnn (for QRNN-based models). Currently, following the instructions and running one of the standard QRNN models will just throw a ModuleNotFoundError with no instructions. Would be great if there was a prior mention and/or a try/catch with a link to https://github.com/salesforce/pytorch-qrnn .

Low number of unique words predicted

I would like to perform a sanity check by passing some input to the model and reading the output text.

Following the PyTorch tutorial on language modelling (https://github.com/pytorch/examples/blob/master/word_language_model/generate.py), I have edited the evaluate function:

def evaluate(data_source, batch_size=10):
    # Turn on evaluation mode which disables dropout.
    if args.model == 'QRNN': model.reset()
    model.eval()
    total_loss = 0
    ntokens = len(corpus.dictionary)
    hidden = model.init_hidden(batch_size)
    for i in range(0, data_source.size(0) - 1, args.bptt):
        data, targets = get_batch(data_source, i, args, evaluation=True)

        print ("inputs")
        inp = data.cpu().data.numpy()
        for input_ in inp:
            print ([created_inverse_tokenizer_during_training[i] for i in input_])

        output, hidden = model(data, hidden)

        word_weights = output.squeeze().data.div(args.temperature).exp().cpu()
        word_idx = torch.multinomial(word_weights, 10)

        print ("outputs")
        for word_ in word_idx:
            for item_ in word_:
                print ("next word", created_inverse_tokenizer_during_training[item_])
            print ("")

        output_flat = output.view(-1, ntokens)
        total_loss += len(data) * criterion(output_flat, targets).data
        hidden = repackage_hidden(hidden)
    return total_loss[0] / len(data_source)

where created_inverse_tokenizer_during_training is idx2word from the Dictionary class.

I am testing on the PTB dataset and I get the following, with a perplexity value of approximately 60:

inputs:
[made, value, $, their, intends, N, also, south, , or]
[much, criteria, N, office, to, return, closed, as, one, $]
[difference, devised, billion, visits, restrict, on, sharply, it, analyst, N]
[in, by, , as, the, assets, lower, became, peter, a]
[liquidity, benjamin, a, , rtc, for, across, more, , share]
[in, graham, , breaks, to, security, europe, clear, of, in]
[the, an, , , treasury, pacific, particularly, that, , the]
[pit, analyst, by, but, borrowings, and, in, a, &, fiscal]
[, and, an, massage, only, an, frankfurt, repeat, co., year]
[it, author, , no, unless, N, although, of, new, just]
["s", in, not, matter, the, N, london, the, york, ended]
[too, the, , how, agency, return, and, october, said, up]
[soon, 1930s, though, , receives, on, a, N, the, from]
[to, and, , is, specific, equity, few, crash, gold, $]
[tell, , , still, congressional, , other, was, market, N]
[but, who, english, associated, authorization, the, markets, "nt", already, million]
[people, is, butler, in, , loan, recovered, at, had, in]
[do, widely, in, many, such, growth, some, hand, some, fiscal]
["nt", considered, his, minds, agency, offset, ground, , good, N]
[seem, to, , with, , continuing, after, professionals, , and]
[to, be, proceeds, , borrowing, real-estate, stocks, dominated, technical, $]
[be, the, as, fronts, is, loan, began, municipal, factors, N]
[unhappy, father, if, for, unauthorized, losses, to, trading, that, million]
[with, of, the, , and, in, rebound, throughout, would, in]
[it, modern, realistic, and, expensive, the, in, the, have, N]

outputs:
[berlitz, hydro-quebec, banknote, centrust, gitano, cluett, guterman, aer, fromstein, calloway]
[berlitz, centrust, cluett, fromstein, aer, gitano, hydro-quebec, guterman, calloway, banknote]
[banknote, hydro-quebec, calloway, fromstein, berlitz, gitano, cluett, aer, guterman, centrust]
[calloway, berlitz, cluett, centrust, aer, gitano, hydro-quebec, banknote, guterman, fromstein]
[fromstein, hydro-quebec, aer, banknote, gitano, berlitz, calloway, cluett, centrust, guterman]
[calloway, hydro-quebec, guterman, fromstein, berlitz, banknote, cluett, centrust, gitano, aer]
[gitano, fromstein, hydro-quebec, cluett, calloway, centrust, berlitz, guterman, aer, banknote]
[berlitz, gitano, banknote, cluett, calloway, aer, centrust, fromstein, hydro-quebec, guterman]
[calloway, gitano, guterman, berlitz, centrust, hydro-quebec, cluett, aer, fromstein, banknote]
[hydro-quebec, berlitz, fromstein, gitano, cluett, calloway, aer, centrust, guterman, banknote]
[aer, cluett, fromstein, berlitz, guterman, calloway, hydro-quebec, centrust, banknote, gitano]
[cluett, calloway, centrust, fromstein, banknote, gitano, guterman, hydro-quebec, aer, berlitz]
[hydro-quebec, fromstein, calloway, aer, banknote, berlitz, cluett, gitano, centrust, guterman]
[banknote, gitano, aer, centrust, cluett, fromstein, calloway, guterman, hydro-quebec, berlitz]
[calloway, aer, gitano, berlitz, fromstein, cluett, guterman, banknote, hydro-quebec, centrust]
[banknote, cluett, fromstein, berlitz, gitano, aer, centrust, calloway, hydro-quebec, guterman]
[cluett, fromstein, aer, calloway, guterman, banknote, berlitz, gitano, centrust, hydro-quebec]
[aer, guterman, berlitz, gitano, centrust, cluett, calloway, hydro-quebec, fromstein, banknote]
[centrust, fromstein, cluett, berlitz, aer, banknote, guterman, gitano, calloway, hydro-quebec]
[guterman, banknote, fromstein, cluett, gitano, calloway, aer, centrust, berlitz, hydro-quebec]
[calloway, berlitz, aer, banknote, hydro-quebec, fromstein, cluett, guterman, gitano, centrust]
[banknote, hydro-quebec, berlitz, fromstein, guterman, calloway, cluett, centrust, gitano, aer]
[centrust, aer, fromstein, cluett, hydro-quebec, calloway, gitano, berlitz, guterman, banknote]
[fromstein, centrust, aer, banknote, berlitz, guterman, gitano, hydro-quebec, calloway, cluett]
[cluett, banknote, hydro-quebec, gitano, berlitz, fromstein, calloway, guterman, centrust, aer]

As you can see, the number of unique words in the output is rather small. Why is that? Or am I doing it wrong?
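One possible explanation (a guess, since the behaviour depends on your model.py): the forward pass here returns the final RNN hidden states rather than vocabulary logits, because decoding is deferred to SplitCrossEntropyLoss during training, so exponentiating output and sampling from it is not sampling over words. A rough sketch of projecting through the decoder before sampling, assuming the saved model keeps its decoder weights:

import torch
import torch.nn.functional as F

# output: (seq_len, batch, ninp) hidden states returned by model(data, hidden)
logits = F.linear(output, model.decoder.weight, model.decoder.bias)  # project onto the vocabulary
word_weights = F.softmax(logits.view(-1, logits.size(-1)) / args.temperature, dim=-1)
word_idx = torch.multinomial(word_weights, 10)  # 10 candidate next words per position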

Finetune issue

Hello guys, first thanks for sharing your code with us.

I have noticed a problem when running the fine-tune process as I'm getting an error

RuntimeError: invalid argument 2: size '[-1 x 10000]' is invalid for input with 227500 elements at /pytorch/aten/src/TH/THStorage.c:37

and it happens when

output_flat = output.view(-1, ntokens)

is called in the evaluate function of finetune.py.

After some investigation, I have found that the call for

decoded = self.decoder(output.view(output.size(0)*output.size(1), output.size(2)))

has been dropped from model.py.

I understand that SplitCrossEntropyLoss is doing this step for us during training but, given that the fine-tuning is done with regular cross entropy, shouldn't we include this line back in the code?

My apologies if I'm missing something!
