nyu-dl / dl4mt-nonauto
License: BSD 3-Clause "New" or "Revised" License
I'm having trouble reproducing the image captioning results using the pre-trained models.
For the pre-trained AR model, I get:
iter 1 | BLEU = 4.41, 44.0/12.4/2.3/0.3 (BP=1.000, ratio=1.042, hyp_len=56450, ref_len=54191)
For the pre-trained NAR model, I get:
iter 1 | BLEU = 0.75, 40.4/5.3/0.3/0.0 (BP=1.000, ratio=1.024, hyp_len=55862, ref_len=54575)
iter 2 | BLEU = 0.98, 41.2/4.7/0.3/0.0 (BP=1.000, ratio=1.024, hyp_len=55861, ref_len=54575)
iter 3 | BLEU = 0.95, 42.5/4.6/0.3/0.0 (BP=1.000, ratio=1.024, hyp_len=55862, ref_len=54575)
iter 4 | BLEU = 0.88, 42.9/4.4/0.3/0.0 (BP=1.000, ratio=1.024, hyp_len=55862, ref_len=54575)
Also, I'm assuming the output format is:
BLEU = [weighted BLEU], [BLEU-1]/[BLEU-2]/[BLEU-3]/[BLEU-4] -- is this correct?
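For what it's worth, that line looks like the standard multi-bleu.perl-style format: the headline number is the brevity penalty times the geometric mean of the four modified n-gram precisions (not cumulative BLEU-1..4 scores). A minimal sanity check against the AR numbers above (the precisions shown are rounded, so the result matches only approximately):

```python
import math

# Headline BLEU = BP * geometric mean of the four n-gram precisions.
# Numbers taken from the AR model line above.
precisions = [44.0, 12.4, 2.3, 0.3]   # 1- to 4-gram precisions, in percent
bp = 1.000                            # brevity penalty
bleu = bp * math.exp(sum(math.log(p / 100.0) for p in precisions) / 4.0) * 100.0
print(round(bleu, 2))  # ~4.4, matching the reported BLEU = 4.41 up to rounding
```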
I am trying to reproduce the results of the IWSLT-16 En-De experiment using the pre-trained models.
I got the dataset from the link in the README (https://drive.google.com/file/d/1m7dZqEXHWPYcre6xxsFwFLrb9CRCZGmn/view?usp=sharing).
I want to run on the test data, but the BPE version of it is not in the directory.
(There are valid.en-de.bpe.de and valid.en-de.bpe.en for the validation data, but no corresponding files for the test data.)
Can you share them?
Hi, I have been trying to reproduce the validation results of IWSLT-16 En-De experiments using the pre-trained models. However, I'm getting the following results for the AR model:
UserWarning:
The hypothesis contains 0 counts of 3-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()
iter 1 | BLEU = 0.00, 3.4/0.2/0.0/0.0 (BP=1.000, ratio=2.090, hyp_len=41421, ref_len=19823)
For the AR model, I'm running:
python run.py --dataset iwslt-ende --vocab_size 40000 --ffw_block highway --params small --lr_schedule anneal --batch_size 1024 --mode test --debug --gpu 0 --load_vocab --load_from ~/nlp/dl4mt-nonauto/models/iwslt16/ende/02.05_23.12.ar_voc40k_2048_5_278_507_2_drop_0.1_0.0003_tran_high_
For the NAT model, I'm running:
python run.py --dataset iwslt-ende --vocab_size 40000 --ffw_block highway --params small --lr_schedule anneal --batch_size 128 --fast --valid_repeat_dec 20 --use_argmax --next_dec_input both --mode test --remove_repeats --debug --gpu 0 --load_vocab --trg_len_option predict --use_predicted_trg_len --load_from ~/nlp/dl4mt-nonauto/models/iwslt16/ende/02.08_20.10.ptrn_model_voc40k_2048_5_278_507_2_drop_0.1_drop_len_pred_0.3_0.0003_anne_anneal_steps_250000_high_tr4_2decs__pred_both_copy_argmax_
May I know the correct flags to set? Thanks!
Thank you for providing the preprocessed dataset.
Could you please tell me how your WMT16 En-Ro dataset was preprocessed?
How did you get from the raw 612422 sentence pairs down to 608319?
Also, it seems that the dataset (En-Ro) has been shuffled or reorganized?
Thank you for sharing the code!
I tried running your model on multiple GPUs as follows, and I got an error from BucketIterator.
It seems that BucketIterator (from torchtext) does not accept a num_gpus argument.
I am using torchtext version 0.3.1.
python run.py --dataset iwslt-ende --vocab_size 40000 --ffw_block highway --params small --lr_schedule anneal --fast --valid_repeat_dec 8 --use_argmax --next_dec_input both --denoising_prob 0.5 --layerwise_denoising_weight --use_distillation --num_gpus 3
2019-02-20 16:05:59 INFO: - random seed is 19920206
2019-02-20 16:05:59 INFO: - TRAINING CORPUS : /work01/kiyono/dl4mt-nonauto-data/iwslt/en-de/distill/ende/train.tags.en-de.bpe
2019-02-20 16:06:02 INFO: - before pruning : 195897 training examples
2019-02-20 16:06:02 INFO: - after pruning : 195897 training examples
Traceback (most recent call last):
  File "run.py", line 572, in <module>
    num_gpus=args.num_gpus)
TypeError: __init__() got an unexpected keyword argument 'num_gpus'
Do you have any ideas about how to avoid this error?
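Not sure which torchtext version the authors developed against, but one generic workaround (a sketch, not the authors' fix; the clean solution is matching the expected torchtext version) is to drop keyword arguments that the installed class does not accept before constructing it:

```python
import inspect

def construct_compat(cls, *args, **kwargs):
    """Call cls(*args, **kwargs), silently dropping keyword arguments that the
    installed version of cls does not accept (e.g. num_gpus on torchtext 0.3.1).
    Workaround sketch only."""
    accepted = inspect.signature(cls.__init__).parameters
    kept = {k: v for k, v in kwargs.items() if k in accepted}
    return cls(*args, **kept)

# Illustration with a stand-in class (BucketIterator itself requires torchtext):
class FakeIterator:
    def __init__(self, dataset, batch_size):
        self.dataset, self.batch_size = dataset, batch_size

it = construct_compat(FakeIterator, [1, 2, 3], batch_size=2, num_gpus=3)
print(it.batch_size)  # 2; num_gpus was dropped
```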
Could you please share the BPE codes file?
Or could you please tell us how many BPE merge operations you learned?
While training the non-autoregressive model from scratch on CPU, the training loss evaluates to zero, so the parameters are not being updated. I have only removed the --use_distillation flag; everything else is the same except the dataset.
Hi,
In test mode, using the pre-trained non-autoregressive model with the MS COCO dataset, I got the RuntimeError about batch sizes shown below. Should I write a custom collate function, or do something else? Could you please advise me in this regard?
start decoding:   0%|          | 0/200 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/media/deeplab/f6321bd3-2eb4-461a-9abc-d10e94252592/Vahid Chahkandi/dl4mt-nonauto-master/run.py", line 612, in <module>
    names=["test." + xx for xx in names], maxsteps=None)
  File "/media/deeplab/f6321bd3-2eb4-461a-9abc-d10e94252592/Vahid Chahkandi/dl4mt-nonauto-master/decode.py", line 173, in decode_model
    for iters, dev_batch in enumerate(dev):
  File "/home/deeplab/dl4mt-nonauto-master/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 363, in __next__
    data = self._next_data()
  File "/home/deeplab/dl4mt-nonauto-master/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 989, in _next_data
    return self._process_data(data)
  File "/home/deeplab/dl4mt-nonauto-master/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1014, in _process_data
    data.reraise()
  File "/home/deeplab/dl4mt-nonauto-master/lib/python3.7/site-packages/torch/_utils.py", line 395, in reraise
    raise self.exc_type(msg)
RuntimeError: Caught RuntimeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/deeplab/dl4mt-nonauto-master/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 185, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/deeplab/dl4mt-nonauto-master/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 47, in fetch
    return self.collate_fn(data)
  File "/home/deeplab/dl4mt-nonauto-master/lib/python3.7/site-packages/torch/utils/data/_utils/collate.py", line 84, in default_collate
    return [default_collate(samples) for samples in transposed]
  File "/home/deeplab/dl4mt-nonauto-master/lib/python3.7/site-packages/torch/utils/data/_utils/collate.py", line 84, in <listcomp>
    return [default_collate(samples) for samples in transposed]
  File "/home/deeplab/dl4mt-nonauto-master/lib/python3.7/site-packages/torch/utils/data/_utils/collate.py", line 82, in default_collate
    raise RuntimeError('each element in list of batch should be of equal size')
RuntimeError: each element in list of batch should be of equal size
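In case it helps others hitting the same error: default_collate requires every element in the batch to have the same size, so variable-length samples need either padding or a custom collate_fn. A minimal padding sketch in plain Python, assuming each sample is a list of token ids (the real MS COCO samples here also carry image features, so the actual fix would pad per field):

```python
def pad_collate(batch, pad_id=0):
    """Pad a batch of variable-length token-id lists to the batch maximum,
    so that downstream stacking sees equal-sized elements. Sketch only."""
    max_len = max(len(sample) for sample in batch)
    return [sample + [pad_id] * (max_len - len(sample)) for sample in batch]

batch = [[5, 6, 7], [8, 9], [10]]
print(pad_collate(batch))  # [[5, 6, 7], [8, 9, 0], [10, 0, 0]]
```

Such a function would be passed as DataLoader(..., collate_fn=pad_collate) in place of the default.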
Hi, first of all, thank you for sharing your code.
Could you please advise what I should do about the following error, which occurs when a class from "model.py" is called?

line 265, in <module>
    args.block_cls = model.HighwayBlock
NameError: name 'model' is not defined
Thank you in advance.
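For anyone hitting the same NameError: it just means the name model is not bound at that point, typically because import model never ran in that context (for example, only from model import ... was used). A self-contained illustration, using a stand-in module since model.py itself is part of the repo:

```python
import sys
import types

# Build a stand-in module named "model" purely for illustration; in the repo
# this would be the real model.py found on sys.path.
fake = types.ModuleType("model")
class HighwayBlock:  # hypothetical stand-in for model.HighwayBlock
    pass
fake.HighwayBlock = HighwayBlock
sys.modules["model"] = fake

import model  # after this statement, the name `model` is defined
print(model.HighwayBlock is HighwayBlock)  # True
```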
I ran my code on my desktop PC and everything was OK, but when I try to run it on Google Colab I get a "No such file or directory" error, even though I'm pretty sure the directory exists and the path is correct.
The first line of the error mentions "No event loop integration for 'inline'", which I don't understand; maybe that is the real problem?
I appreciate any help.
The error is:
UnknownBackend: No event loop integration for 'inline'. Supported event loops are: qt, qt4, qt5, gtk, gtk2, gtk3, tk, wx, pyglet, glut, osx
Traceback (most recent call last):
  File "/content/gdrive/My Drive/Colab Notebooks/Retrieving/run.py", line 414, in <module>
    distill=(args.mode == "distill"), use_distillation=args.use_distillation)
  File "/content/gdrive/My Drive/Colab Notebooks/Retrieving/data.py", line 120, in __init__
    self.train_data, self.train_sampler = self.prepare_train_data(path, train_f, batch_size, max_len=max_len, size=None)
  File "/content/gdrive/My Drive/Colab Notebooks/Retrieving/data.py", line 131, in prepare_train_data
    bpes, features_path, bpe2img, img2bpes = process_json(dataPath, annFile, max_len=max_len, size=size)
  File "/content/gdrive/My Drive/Colab Notebooks/Retrieving/mscoco.py", line 18, in process_json
    annots = json.load(open(annPath, 'r'))
FileNotFoundError: [Errno 2] No such file or directory: 'mscoco/mscoco_data/mscoco/karpathy_split/train.json.bpe.fixed'
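One likely cause: the failing path 'mscoco/mscoco_data/...' is relative, so it resolves against the current working directory, which on Colab is usually /content rather than the notebook's Drive folder. A sketch of one fix (assuming the data lives next to the scripts) is to anchor the path to the script's own location instead of the cwd:

```python
import os

# Resolve the annotation file relative to this script rather than the cwd;
# on Colab, os.getcwd() is typically /content, not the Drive folder.
base = os.path.dirname(os.path.abspath(__file__)) if "__file__" in globals() else os.getcwd()
ann_path = os.path.join(base, "mscoco", "mscoco_data", "mscoco",
                        "karpathy_split", "train.json.bpe.fixed")
print(os.path.isabs(ann_path))  # True
```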
Hi,
First off, thanks for sharing your code. I'm trying to understand how the distillation for an AR model is implemented. From what I understand after looking at the code, the logits from the AR model are saved to disk.
I'm having trouble finding where/how they are loaded back during training. Could you point me to where this is actually done?
Thanks!
Lucas
size mismatch for encoder.out.weight: copying a param with shape torch.Size([36377, 278]) from checkpoint, the shape in current model is torch.Size([38022, 278]).
size mismatch for decoder.0.out.weight: copying a param with shape torch.Size([36377, 278]) from checkpoint, the shape in current model is torch.Size([38022, 278]).
size mismatch for decoder.1.out.weight: copying a param with shape torch.Size([36377, 278]) from checkpoint, the shape in current model is torch.Size([38022, 278]).
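Those mismatched first dimensions (36377 vs. 38022) look like vocabulary sizes, i.e. the checkpoint was built with a different vocabulary than the current run; the real fix is presumably loading the vocabulary shipped with the checkpoint (the --load_vocab flag appears in the run commands above). A generic sketch of how such mismatches can be detected before calling load_state_dict, with shapes as plain tuples (with real torch tensors one would compare tensor.shape):

```python
def mismatched_keys(ckpt_shapes, model_shapes):
    """Return parameter names whose shapes differ between a checkpoint and
    the current model. Sketch with plain (name -> shape tuple) dicts."""
    return [k for k in ckpt_shapes
            if k in model_shapes and ckpt_shapes[k] != model_shapes[k]]

ckpt = {"encoder.out.weight": (36377, 278)}
model = {"encoder.out.weight": (38022, 278)}
print(mismatched_keys(ckpt, model))  # ['encoder.out.weight']
```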
I found that the Transformer usually gets a BLEU score around 27-28 on WMT14 En-De. However, in the paper, the AR model only gets around 24? I am curious what the AR model is. Thanks!