zihangdai / mos
License: MIT License
Hi author, I'm a newbie in NLP. I don't understand the use of "bptt" (args.bptt = 70) or some expressions related to it, such as the following:
bptt = args.bptt if np.random.random() < 0.95 else args.bptt / 2.
seq_len = max(5, int(np.random.normal(bptt, 5)))
seq_len = min(seq_len, args.bptt + args.max_seq_len_delta)
optimizer.param_groups[0]['lr'] = lr2 * seq_len / args.bptt
To my understanding, seq_len is the number of RNN time steps. So what is the relationship between "bptt" and "seq_len"? Could you please help explain it? Thanks
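(For what it's worth, here is my reading of those lines as a minimal sketch. The scheme appears to be the variable-length BPTT trick from the AWD-LSTM codebase that mos builds on, so treat the comments as an assumption rather than the author's answer.)

import numpy as np

def sample_seq_len(bptt=70, max_delta=40):
    # bptt is the base truncation window for backprop through time;
    # seq_len is the number of steps the RNN is actually unrolled
    # for on this particular batch.
    # 5% of the time, halve the window so batches start at varied
    # offsets and tokens can occupy different positions.
    base = bptt if np.random.random() < 0.95 else bptt / 2.
    # Jitter around the base window, but never below 5 steps...
    seq_len = max(5, int(np.random.normal(base, 5)))
    # ...and never above bptt + max_delta, to bound memory use.
    return min(seq_len, bptt + max_delta)

# The learning rate is then rescaled by seq_len / bptt so that short
# windows, which sum fewer loss terms, take proportionally smaller steps:
# optimizer.param_groups[0]['lr'] = base_lr * seq_len / bptt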
Hi,
I was trying to run MoS on WikiText-103 and the 1B Word dataset. I wonder whether you used adaptive softmax, as in the paper Efficient Softmax Approximation for GPUs, when running on the 1B Word dataset?
Thank you!
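(In case it helps later readers: PyTorch now ships the method from that paper as nn.AdaptiveLogSoftmaxWithLoss. A minimal usage sketch follows; the sizes and cutoffs are hypothetical, not from this repo, and note that adaptive softmax would replace the full softmaxes that MoS mixes, so combining the two is not straightforward.)

import torch
import torch.nn as nn

hidden_size, vocab_size = 620, 793471  # hypothetical 1B-word-scale sizes

# cutoffs split the vocabulary into a small, frequent "head" and
# progressively larger, cheaper "tail" clusters.
adaptive = nn.AdaptiveLogSoftmaxWithLoss(
    in_features=hidden_size,
    n_classes=vocab_size,
    cutoffs=[20000, 200000],
)

hidden = torch.randn(32, hidden_size)          # RNN outputs for 32 tokens
targets = torch.randint(0, vocab_size, (32,))  # next-token ids
out = adaptive(hidden, targets)
print(out.loss)  # mean negative log-likelihood over the batch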
Hello! I'm running into an issue with training the Penn Treebank model:
python main.py --data data/penn --dropouti 0.4 --dropoutl 0.29 --dropouth 0.225 --seed 28 --batch_size 12 --lr 20.0 --epoch 1000 --nhid 960 --nhidlast 620 --emsize 280 --n_experts 15 --save PTB --single_gpu
Traceback (most recent call last):
  File "main.py", line 260, in <module>
    train()
  File "main.py", line 202, in train
    hidden[s_id] = repackage_hidden(hidden[s_id])
  File "/mos/utils.py", line 12, in repackage_hidden
    return tuple(repackage_hidden(v) for v in h)
  File "/mos/utils.py", line 12, in <genexpr>
    return tuple(repackage_hidden(v) for v in h)
  [the previous two frames repeat several times as repackage_hidden recurses]
  File "/mos/utils.py", line 12, in repackage_hidden
    return tuple(repackage_hidden(v) for v in h)
  File "/usr/local/lib/python3.5/dist-packages/torch/tensor.py", line 360, in __iter__
    raise TypeError('iteration over a 0-d tensor')
TypeError: iteration over a 0-d tensor
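(The usual cause: utils.py's repackage_hidden was written for pre-0.4 PyTorch, where it type-checked against Variable; on newer versions that check never matches, so the function recurses into the tensor itself until it hits a 0-d element. A sketch of the commonly used fix, assuming that is what utils.py line 12 is doing:)

import torch

def repackage_hidden(h):
    # Detach hidden states from the graph so backprop stops at the
    # batch boundary, handling both tensors and nested tuples.
    if isinstance(h, torch.Tensor):
        return h.detach()
    return tuple(repackage_hidden(v) for v in h)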
Hi! Thanks for sharing this code base! Do you have a pre-trained model that we could use? We want to test the idea of incorporating a language model into our project first, so it would be great if we could utilize a pre-trained model for this purpose rather than train a new model from scratch.
Thanks so much!
I am trying to train the model under Windows 10 with CUDA 9.2 (NVIDIA 1070 GPU), PyTorch version 1.0.0.
I get the following error when training on wikitext-2:
C:\Users\vlad\Anaconda3\envs\py36\python.exe C:/Users/vlad/Documents/GitHub/mos/main.py --epochs 1000 --data data/wikitext-2 --save WT2 --dropouth 0.2 --seed 1882 --n_experts 15 --nhid 1150 --nhidlast 650 --emsize 300 --batch_size 15 --lr 15.0 --dropoutl 0.29 --small_batch_size 5 --max_seq_len_delta 20 --dropouti 0.55 --single_gpu
Experiment dir : WT2-20181223-005932
torch.Size([139241, 15])
torch.Size([21764, 10])
torch.Size([245569, 1])
Applying weight drop of 0.5 to weight_hh_l0
Applying weight drop of 0.5 to weight_hh_l0
Applying weight drop of 0.5 to weight_hh_l0
param size: 34909528
Args: Namespace(alpha=2, batch_size=15, beta=1, bptt=70, clip=0.25, continue_train=False, cuda=True, data='data/wikitext-2', dropout=0.4, dropoute=0.1, dropouth=0.2, dropouti=0.55, dropoutl=0.29, emsize=300, epochs=1000, log_interval=200, lr=15.0, max_seq_len_delta=20, model='LSTM', n_experts=15, nhid=1150, nhidlast=650, nlayers=3, nonmono=5, save='WT2-20181223-005932', seed=1882, single_gpu=True, small_batch_size=5, tied=True, wdecay=1.2e-06, wdrop=0.5)
Model total parameters: 34909528
C:\Users\vlad\Anaconda3\envs\py36\lib\site-packages\torch\nn\modules\rnn.py:179: RuntimeWarning: RNN module weights are not part of single contiguous chunk of memory. This means they need to be compacted at every call, possibly greatly increasing memory usage. To compact weights again call flatten_parameters().
self.dropout, self.training, self.bidirectional, self.batch_first)
Traceback (most recent call last):
  File "C:/Users/vlad/Documents/GitHub/mos/main.py", line 261, in <module>
    train()
  File "C:/Users/vlad/Documents/GitHub/mos/main.py", line 205, in train
    log_prob, hidden[s_id], rnn_hs, dropped_rnn_hs = parallel_model(cur_data, hidden[s_id], return_h=True)
  File "C:\Users\vlad\Anaconda3\envs\py36\lib\site-packages\torch\nn\modules\module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\vlad\Documents\GitHub\mos\model.py", line 84, in forward
    raw_output, new_h = rnn(raw_output, hidden[l])
  File "C:\Users\vlad\Anaconda3\envs\py36\lib\site-packages\torch\nn\modules\module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\vlad\Documents\GitHub\mos\weight_drop.py", line 47, in forward
    return self.module.forward(*args)
  File "C:\Users\vlad\Anaconda3\envs\py36\lib\site-packages\torch\nn\modules\rnn.py", line 179, in forward
    self.dropout, self.training, self.bidirectional, self.batch_first)
RuntimeError: shape '[5290000, 1]' is invalid for input of size 4600
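(If I read this right, it is the known incompatibility between weight_drop.py, written for an older PyTorch, and PyTorch 1.0: cuDNN flattens the LSTM weights into a single buffer, and the dropped-out weight_hh_l0 that WeightDrop writes back no longer lines up, hence both the flatten_parameters warning and the shape error. Below is a sketch of the workaround used in later AWD-LSTM forks, offered as an assumption rather than a tested fix for this repo.)

import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightDrop(nn.Module):
    # DropConnect on an RNN's hidden-to-hidden weights: keep a
    # '<name>_raw' parameter and, before every forward pass, write a
    # dropped-out copy back onto the wrapped module.
    def __init__(self, module, weights, dropout=0.5):
        super().__init__()
        self.module, self.weights, self.dropout = module, weights, dropout
        # Disable re-flattening so cuDNN does not reassemble the stale buffer.
        self.module.flatten_parameters = lambda *args, **kwargs: None
        for name in weights:
            w = getattr(module, name)
            del module._parameters[name]
            module.register_parameter(name + '_raw', nn.Parameter(w.data))

    def _set_weights(self):
        for name in self.weights:
            raw = getattr(self.module, name + '_raw')
            # Plain attribute, not a Parameter, so autograd still flows
            # through the dropout mask to the raw weight.
            setattr(self.module, name,
                    F.dropout(raw, p=self.dropout, training=self.training))

    def forward(self, *args):
        self._set_weights()
        return self.module(*args)

# e.g. WeightDrop(nn.LSTM(300, 1150), ['weight_hh_l0'], dropout=0.5)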
Hi Mr. Dai,
It seems the performance discrepancy mentioned in the README leads to slightly worse numbers on the Penn Treebank. I am able to reproduce the result for the Penn Treebank. However, for the WT2 benchmark, before finetuning I get 65.66/62.94 and after finetuning 64.41/61.77; with the default dynamic evaluation setting I get 43.34/41.49. I also observed a similar slight performance decrease (about 0.6) on PyTorch 0.4.0 and 0.2.0 in a follow-up work to MoS (ChengyueGongR/Frequency-Agnostic#2). Is this result in line with your benchmark on PyTorch 0.4.0? Thank you for your time!
I tried to run the script model.py, but a NotImplementedError occurs.
param size: 4426
Traceback (most recent call last):
  File "/Users/TONY/Downloads/mos-master/model.py", line 131, in <module>
    model(input, hidden)
  File "/Users/TONY/anaconda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/Users/TONY/Downloads/mos-master/model.py", line 71, in forward
    emb = embedded_dropout(self.encoder, input, dropout=self.dropoute if self.training else 0)
  File "/Users/TONY/Downloads/mos-master/embed_regularize.py", line 19, in embedded_dropout
    X = embed._backend.Embedding.apply(words, masked_embed_weight,
  File "/Users/TONY/anaconda/lib/python3.6/site-packages/torch/nn/backends/backend.py", line 10, in __getattr__
    raise NotImplementedError
NotImplementedError
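(The _backend hook used by embed_regularize.py was removed from newer PyTorch releases; the public equivalent is torch.nn.functional.embedding. Here is a sketch of embedded_dropout ported to that API, preserving the row-level word dropout as far as I can tell:)

import torch
import torch.nn.functional as F

def embedded_dropout(embed, words, dropout=0.1, scale=None):
    # Drop entire embedding rows (whole word types), rescaling the
    # survivors so the expected magnitude is unchanged.
    if dropout:
        mask = embed.weight.data.new_empty((embed.weight.size(0), 1)) \
                   .bernoulli_(1 - dropout) / (1 - dropout)
        masked_weight = mask.expand_as(embed.weight) * embed.weight
    else:
        masked_weight = embed.weight
    if scale is not None:
        masked_weight = scale.expand_as(masked_weight) * masked_weight
    return F.embedding(words, masked_weight, embed.padding_idx,
                       embed.max_norm, embed.norm_type,
                       embed.scale_grad_by_freq, embed.sparse)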
Hi,
When I try to reproduce the results, I find that the model converges at ~100 epochs with a valid ppl of 65.32, which is much higher than the published result. The only thing I changed is the number of experts (increased from 15 to 16), but I don't think that can explain the large gap between my result and yours.
My Python version is 3.6.7 and my PyTorch version is 0.4.0.
As for the training log: I trained for 300 epochs, but the last 200 epochs produce almost identical output to the 100th, so I didn't include them.
CUDA_VISIBLE_DEVICES=1 python3 main.py --data data/penn --dropouti 0.4 --dropoutl 0.29 --dropouth 0.225 --seed 28 --batch_size 12 --lr 20.0 --epoch 1000 --nhid 960 --nhidlast 620 --emsize 280 --n_experts 16 --save PTB --single_gpu
Experiment dir : PTB-20190520-055046
torch.Size([77465, 12])
torch.Size([7376, 10])
torch.Size([82430, 1])
ModuleList(
(0): LSTM(280, 960)
(1): LSTM(960, 960)
(2): LSTM(960, 620)
)
param size: 21675120
Args: Namespace(alpha=2, batch_size=12, beta=1, bptt=70, clip=0.25, continue_train=False, cuda=True, data='data/penn', dropout=0.4, dropoute=0.1, dropouth=0.225, dropouti=0.4, dropoutl=0.29, emsize=280, epochs=1000, log_interval=200, lr=20.0, max_seq_len_delta=40, model='LSTM', n_experts=16, nhid=960, nhidlast=620, nlayers=3, nonmono=5, save='PTB-20190520-055046', seed=28, single_gpu=True, small_batch_size=12, tied=True, wdecay=1.2e-06, wdrop=0.5)
Model total parameters: 21675120
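(For context on what n_experts changes: it is K, the number of softmax components the model mixes, so going from 15 to 16 adds one expert and a few parameters; it should not by itself explain a gap this large. A sketch of the MoS output layer as described in the paper, not copied from model.py:)

import torch
import torch.nn as nn
import torch.nn.functional as F

class MoSOutput(nn.Module):
    # Mixture of Softmaxes: K full softmaxes over the vocabulary,
    # combined with input-dependent mixture weights.
    def __init__(self, nhidlast, emsize, vocab, n_experts):
        super().__init__()
        self.n_experts, self.emsize = n_experts, emsize
        self.prior = nn.Linear(nhidlast, n_experts)            # mixture weights
        self.latent = nn.Linear(nhidlast, n_experts * emsize)  # per-expert contexts
        self.decoder = nn.Linear(emsize, vocab)                # tied to the embedding in the repo

    def forward(self, h):                                  # h: (batch, nhidlast)
        pi = F.softmax(self.prior(h), dim=-1)              # (batch, K)
        z = torch.tanh(self.latent(h))
        z = z.view(-1, self.n_experts, self.emsize)        # (batch, K, emsize)
        probs = F.softmax(self.decoder(z), dim=-1)         # (batch, K, vocab)
        # A convex mixture of softmaxes has higher rank than any single
        # softmax, which is the point of the paper.
        return (pi.unsqueeze(-1) * probs).sum(dim=1)       # (batch, vocab)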
Which GPU does the first recommended command line run well on? On an NVIDIA 1050 Ti (4 GB of GPU memory) it exhausts GPU memory with a batch size of 12, but runs with a batch size of 6. With that setting, I get on the order of 300 ms per batch. This is Ubuntu Linux 16.04 on a cheap used Dell 7500 with two 6-core Xeons, if it matters.
I am running this code on my Linux server, but it seems that this Python program runs on only one of my machine's CPU cores and leaves two thirds of total RAM free. Is there any way to accelerate the program?
Hi, Zihang
It seems that the original link for downloading the Penn Treebank data no longer exists. Could you please update it and let me know where I can download the same data to replicate your experiment?
Thanks,
Yuzhou
Is the "Dynamic Evaluation of Neural Sequence Models" ( https://arxiv.org/abs/1709.07432 ) already implemented?
I am asking because there are benchmark available with the evaluation method and this RNN-model.
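(Judging from another thread here that quotes numbers from a "default dynamic evaluation setting", it appears to be, though I haven't confirmed it in the code. For reference, a sketch of the core update rule from the Krause et al. paper, not of this repo's script: after scoring each test segment, take a small gradient step on its loss and decay the weights back toward their pre-adaptation values.)

import torch

def dynamic_eval_step(model, loss, theta0, lr=1e-4, lamb=2e-3):
    # theta0 maps parameter names to snapshots taken before evaluation:
    # theta0 = {n: p.detach().clone() for n, p in model.named_parameters()}
    model.zero_grad()
    loss.backward()
    with torch.no_grad():
        for name, p in model.named_parameters():
            if p.grad is None:
                continue
            p -= lr * p.grad                 # adapt to the recent text
            p += lamb * (theta0[name] - p)   # decay toward the global weights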