nyu-dl / dl4mt-nonauto
License: BSD 3-Clause "New" or "Revised" License
I'm having trouble reproducing the image captioning results using the pre-trained models.
For the pre-trained AR model, I get:
iter 1 | BLEU = 4.41, 44.0/12.4/2.3/0.3 (BP=1.000, ratio=1.042, hyp_len=56450, ref_len=54191)
For the pre-trained NAR model, I get:
iter 1 | BLEU = 0.75, 40.4/5.3/0.3/0.0 (BP=1.000, ratio=1.024, hyp_len=55862, ref_len=54575)
iter 2 | BLEU = 0.98, 41.2/4.7/0.3/0.0 (BP=1.000, ratio=1.024, hyp_len=55861, ref_len=54575)
iter 3 | BLEU = 0.95, 42.5/4.6/0.3/0.0 (BP=1.000, ratio=1.024, hyp_len=55862, ref_len=54575)
iter 4 | BLEU = 0.88, 42.9/4.4/0.3/0.0 (BP=1.000, ratio=1.024, hyp_len=55862, ref_len=54575)
Also, I'm assuming the output format is:
BLEU = [weighted BLEU], [BLEU-1]/[BLEU-2]/[BLEU-3]/[BLEU-4] -- is this correct?
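For what it's worth, that line looks like the standard multi-bleu.perl-style format: the headline number is the brevity penalty times the geometric mean of the four modified n-gram precisions (not cumulative BLEU-1..4 scores). A minimal sanity check against the AR numbers above (the precisions shown are rounded, so the result matches only approximately):

```python
import math

# Headline BLEU = BP * geometric mean of the four n-gram precisions.
# Numbers taken from the AR model line above.
precisions = [44.0, 12.4, 2.3, 0.3]   # 1- to 4-gram precisions, in percent
bp = 1.000                            # brevity penalty
bleu = bp * math.exp(sum(math.log(p / 100.0) for p in precisions) / 4.0) * 100.0
print(round(bleu, 2))  # ~4.4, matching the reported BLEU = 4.41 up to rounding
```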
I am trying to reproduce the results of the IWSLT-16 En-De experiment using the pre-trained models.
I got the dataset from the link in the README (https://drive.google.com/file/d/1m7dZqEXHWPYcre6xxsFwFLrb9CRCZGmn/view?usp=sharing).
I want to run on the test data, but the BPE version of it is not in the directory.
(There are valid.en-de.bpe.de and valid.en-de.bpe.en for the validation data, but no corresponding files for the test data.)
Can you share them?
Hi, I have been trying to reproduce the validation results of IWSLT-16 En-De experiments using the pre-trained models. However, I'm getting the following results for the AR model:
UserWarning:
The hypothesis contains 0 counts of 3-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()
iter 1 | BLEU = 0.00, 3.4/0.2/0.0/0.0 (BP=1.000, ratio=2.090, hyp_len=41421, ref_len=19823)
For the AR model, I'm running:
python run.py --dataset iwslt-ende --vocab_size 40000 --ffw_block highway --params small --lr_schedule anneal --batch_size 1024 --mode test --debug --gpu 0 --load_vocab --load_from ~/nlp/dl4mt-nonauto/models/iwslt16/ende/02.05_23.12.ar_voc40k_2048_5_278_507_2_drop_0.1_0.0003_tran_high_
For the NAT model, I'm running:
python run.py --dataset iwslt-ende --vocab_size 40000 --ffw_block highway --params small --lr_schedule anneal --batch_size 128 --fast --valid_repeat_dec 20 --use_argmax --next_dec_input both --mode test --remove_repeats --debug --gpu 0 --load_vocab --trg_len_option predict --use_predicted_trg_len --load_from ~/nlp/dl4mt-nonauto/models/iwslt16/ende/02.08_20.10.ptrn_model_voc40k_2048_5_278_507_2_drop_0.1_drop_len_pred_0.3_0.0003_anne_anneal_steps_250000_high_tr4_2decs__pred_both_copy_argmax_
May I know the correct flags to set? Thanks!
Thank you for providing the preprocessed dataset.
Could you please tell me how your WMT16 En-Ro dataset was preprocessed?
How did you get from the raw 612422 sentence pairs down to 608319?
Also, it seems that the dataset (En-Ro) has been shuffled or reorganized?
Thank you for sharing the code!
I tried running your model on multiple GPUs as follows, and I got an error from BucketIterator.
It seems that BucketIterator (from torchtext) does not accept a num_gpus argument.
I am using torchtext version 0.3.1.
python run.py --dataset iwslt-ende --vocab_size 40000 --ffw_block highway --params small --lr_schedule anneal --fast --valid_repeat_dec 8 --use_argmax --next_dec_input both --denoising_prob 0.5 --layerwise_denoising_weight --use_distillation --num_gpus 3
2019-02-20 16:05:59 INFO: - random seed is 19920206
2019-02-20 16:05:59 INFO: - TRAINING CORPUS : /work01/kiyono/dl4mt-nonauto-data/iwslt/en-de/distill/ende/train.tags.en-de.bpe
2019-02-20 16:06:02 INFO: - before pruning : 195897 training examples
2019-02-20 16:06:02 INFO: - after pruning : 195897 training examples
Traceback (most recent call last):
  File "run.py", line 572, in <module>
    num_gpus=args.num_gpus)
TypeError: __init__() got an unexpected keyword argument 'num_gpus'
Do you have any ideas about how to avoid this error?
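Not sure which torchtext version the authors developed against, but one generic workaround (a sketch, not the authors' fix; the clean solution is matching the expected torchtext version) is to drop keyword arguments that the installed class does not accept before constructing it:

```python
import inspect

def construct_compat(cls, *args, **kwargs):
    """Call cls(*args, **kwargs), silently dropping keyword arguments that the
    installed version of cls does not accept (e.g. num_gpus on torchtext 0.3.1).
    Workaround sketch only."""
    accepted = inspect.signature(cls.__init__).parameters
    kept = {k: v for k, v in kwargs.items() if k in accepted}
    return cls(*args, **kept)

# Illustration with a stand-in class (BucketIterator itself requires torchtext):
class FakeIterator:
    def __init__(self, dataset, batch_size):
        self.dataset, self.batch_size = dataset, batch_size

it = construct_compat(FakeIterator, [1, 2, 3], batch_size=2, num_gpus=3)
print(it.batch_size)  # 2; num_gpus was dropped
```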
Could you please share the BPE codes file?
Or could you please tell us how many BPE merge operations you learned?
While training the non-autoregressive model from scratch on CPU, the training loss evaluates to zero, so the parameters are not being updated. I have only removed the --use_distillation flag; everything else is the same except the dataset.
Hi,
In test mode, using the pre-trained non-autoregressive model with the MS COCO dataset, I got the RuntimeError about batch sizes shown below. Should I write a custom collate function, or do something else? Could you please advise me in this regard?
start decoding:   0%|          | 0/200 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/media/deeplab/f6321bd3-2eb4-461a-9abc-d10e94252592/Vahid Chahkandi/dl4mt-nonauto-master/run.py", line 612, in <module>
    names=["test." + xx for xx in names], maxsteps=None)
  File "/media/deeplab/f6321bd3-2eb4-461a-9abc-d10e94252592/Vahid Chahkandi/dl4mt-nonauto-master/decode.py", line 173, in decode_model
    for iters, dev_batch in enumerate(dev):
  File "/home/deeplab/dl4mt-nonauto-master/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 363, in __next__
    data = self._next_data()
  File "/home/deeplab/dl4mt-nonauto-master/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 989, in _next_data
    return self._process_data(data)
  File "/home/deeplab/dl4mt-nonauto-master/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1014, in _process_data
    data.reraise()
  File "/home/deeplab/dl4mt-nonauto-master/lib/python3.7/site-packages/torch/_utils.py", line 395, in reraise
    raise self.exc_type(msg)
RuntimeError: Caught RuntimeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/deeplab/dl4mt-nonauto-master/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 185, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/deeplab/dl4mt-nonauto-master/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 47, in fetch
    return self.collate_fn(data)
  File "/home/deeplab/dl4mt-nonauto-master/lib/python3.7/site-packages/torch/utils/data/_utils/collate.py", line 84, in default_collate
    return [default_collate(samples) for samples in transposed]
  File "/home/deeplab/dl4mt-nonauto-master/lib/python3.7/site-packages/torch/utils/data/_utils/collate.py", line 84, in <listcomp>
    return [default_collate(samples) for samples in transposed]
  File "/home/deeplab/dl4mt-nonauto-master/lib/python3.7/site-packages/torch/utils/data/_utils/collate.py", line 82, in default_collate
    raise RuntimeError('each element in list of batch should be of equal size')
RuntimeError: each element in list of batch should be of equal size
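In case it helps others hitting the same error: default_collate requires every element in the batch to have the same size, so variable-length samples need either padding or a custom collate_fn. A minimal padding sketch in plain Python, assuming each sample is a list of token ids (the real MS COCO samples here also carry image features, so the actual fix would pad per field):

```python
def pad_collate(batch, pad_id=0):
    """Pad a batch of variable-length token-id lists to the batch maximum,
    so that downstream stacking sees equal-sized elements. Sketch only."""
    max_len = max(len(sample) for sample in batch)
    return [sample + [pad_id] * (max_len - len(sample)) for sample in batch]

batch = [[5, 6, 7], [8, 9], [10]]
print(pad_collate(batch))  # [[5, 6, 7], [8, 9, 0], [10, 0, 0]]
```

Such a function would be passed as DataLoader(..., collate_fn=pad_collate) in place of the default.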
Hi, first of all, thank you for sharing your code.
Could you please advise what I should do about the following error, which occurs when a class from "model.py" is called?

line 265, in <module>
    args.block_cls = model.HighwayBlock
NameError: name 'model' is not defined
Thank you in advance.
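For anyone hitting the same NameError: it just means the name model is not bound at that point, typically because import model never ran in that context (for example, only from model import ... was used). A self-contained illustration, using a stand-in module since model.py itself is part of the repo:

```python
import sys
import types

# Build a stand-in module named "model" purely for illustration; in the repo
# this would be the real model.py found on sys.path.
fake = types.ModuleType("model")
class HighwayBlock:  # hypothetical stand-in for model.HighwayBlock
    pass
fake.HighwayBlock = HighwayBlock
sys.modules["model"] = fake

import model  # after this statement, the name `model` is defined
print(model.HighwayBlock is HighwayBlock)  # True
```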
I ran my code on my desktop PC and everything was OK, but when I try to run it on Google Colab I get a "No such file or directory" error, even though I'm pretty sure the directory exists and the path is correct.
The first line of the error mentions "No event loop integration for 'inline'", which I don't understand; maybe that is the real problem?
I appreciate any help.
The error is:
UnknownBackend: No event loop integration for 'inline'. Supported event loops are: qt, qt4, qt5, gtk, gtk2, gtk3, tk, wx, pyglet, glut, osx
Traceback (most recent call last):
  File "/content/gdrive/My Drive/Colab Notebooks/Retrieving/run.py", line 414, in <module>
    distill=(args.mode == "distill"), use_distillation=args.use_distillation)
  File "/content/gdrive/My Drive/Colab Notebooks/Retrieving/data.py", line 120, in __init__
    self.train_data, self.train_sampler = self.prepare_train_data(path, train_f, batch_size, max_len=max_len, size=None)
  File "/content/gdrive/My Drive/Colab Notebooks/Retrieving/data.py", line 131, in prepare_train_data
    bpes, features_path, bpe2img, img2bpes = process_json(dataPath, annFile, max_len=max_len, size=size)
  File "/content/gdrive/My Drive/Colab Notebooks/Retrieving/mscoco.py", line 18, in process_json
    annots = json.load(open(annPath, 'r'))
FileNotFoundError: [Errno 2] No such file or directory: 'mscoco/mscoco_data/mscoco/karpathy_split/train.json.bpe.fixed'
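One likely cause: the failing path 'mscoco/mscoco_data/...' is relative, so it resolves against the current working directory, which on Colab is usually /content rather than the notebook's Drive folder. A sketch of one fix (assuming the data lives next to the scripts) is to anchor the path to the script's own location instead of the cwd:

```python
import os

# Resolve the annotation file relative to this script rather than the cwd;
# on Colab, os.getcwd() is typically /content, not the Drive folder.
base = os.path.dirname(os.path.abspath(__file__)) if "__file__" in globals() else os.getcwd()
ann_path = os.path.join(base, "mscoco", "mscoco_data", "mscoco",
                        "karpathy_split", "train.json.bpe.fixed")
print(os.path.isabs(ann_path))  # True
```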
Hi,
First off, thanks for sharing your code. I'm trying to understand how the distillation for an AR model is implemented. From what I understand after looking at the code, the logits from the AR model are saved to disk.
I'm having trouble finding where/how they are loaded back during training. Could you point me to where this is actually done?
Thanks!
Lucas
size mismatch for encoder.out.weight: copying a param with shape torch.Size([36377, 278]) from checkpoint, the shape in current model is torch.Size([38022, 278]).
size mismatch for decoder.0.out.weight: copying a param with shape torch.Size([36377, 278]) from checkpoint, the shape in current model is torch.Size([38022, 278]).
size mismatch for decoder.1.out.weight: copying a param with shape torch.Size([36377, 278]) from checkpoint, the shape in current model is torch.Size([38022, 278]).
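Those mismatched first dimensions (36377 vs. 38022) look like vocabulary sizes, i.e. the checkpoint was built with a different vocabulary than the current run; the real fix is presumably loading the vocabulary shipped with the checkpoint (the --load_vocab flag appears in the run commands above). A generic sketch of how such mismatches can be detected before calling load_state_dict, with shapes as plain tuples (with real torch tensors one would compare tensor.shape):

```python
def mismatched_keys(ckpt_shapes, model_shapes):
    """Return parameter names whose shapes differ between a checkpoint and
    the current model. Sketch with plain (name -> shape tuple) dicts."""
    return [k for k in ckpt_shapes
            if k in model_shapes and ckpt_shapes[k] != model_shapes[k]]

ckpt = {"encoder.out.weight": (36377, 278)}
model = {"encoder.out.weight": (38022, 278)}
print(mismatched_keys(ckpt, model))  # ['encoder.out.weight']
```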
I found that the Transformer usually gets a BLEU score around 27-28 on WMT14 En-De. However, in the paper, the AR model only gets around 24? I am curious what the AR model is. Thanks!