
dl4mt-nonauto's People

Contributors

jaseleephd · kyunghyuncho · mansimov


dl4mt-nonauto's Issues

Reproducing MSCOCO image captioning results

I'm having trouble reproducing the image captioning results, using the pre-trained models.

For the pre-trained AR model, I get:
iter 1 | BLEU = 4.41, 44.0/12.4/2.3/0.3 (BP=1.000, ratio=1.042, hyp_len=56450, ref_len=54191)

For the pre-trained NAR model, I get:
iter 1 | BLEU = 0.75, 40.4/5.3/0.3/0.0 (BP=1.000, ratio=1.024, hyp_len=55862, ref_len=54575)
iter 2 | BLEU = 0.98, 41.2/4.7/0.3/0.0 (BP=1.000, ratio=1.024, hyp_len=55861, ref_len=54575)
iter 3 | BLEU = 0.95, 42.5/4.6/0.3/0.0 (BP=1.000, ratio=1.024, hyp_len=55862, ref_len=54575)
iter 4 | BLEU = 0.88, 42.9/4.4/0.3/0.0 (BP=1.000, ratio=1.024, hyp_len=55862, ref_len=54575)

Also, I'm assuming the output format is:
BLEU = [weighted BLEU], [BLEU-1]/[BLEU-2]/[BLEU-3]/[BLEU-4] -- is this correct?
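For reference, this output format matches the multi-bleu / sacreBLEU convention, in which the four slash-separated numbers are modified 1-4-gram precisions rather than cumulative BLEU-1..4 scores. A minimal sketch with the sacrebleu package (an assumption; this repo's evaluation code may differ) that produces a line in the same style:

import sacrebleu

# Minimal sketch, assuming the sacrebleu package; this repo may compute
# BLEU differently. In this output convention the four slash-separated
# numbers are modified 1-4-gram precisions, not cumulative BLEU-n scores.
hyps = ["a man rides a horse on the beach"]
refs = [["a man is riding a horse on a beach"]]  # one reference stream
print(sacrebleu.corpus_bleu(hyps, refs))
# BLEU = <score> <p1>/<p2>/<p3>/<p4> (BP = ... ratio = ... hyp_len = ... ref_len = ...)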

IWSLT-16 En-De Decoding

Hi, I have been trying to reproduce the validation results of IWSLT-16 En-De experiments using the pre-trained models. However, I'm getting the following results for the AR model:

UserWarning:
The hypothesis contains 0 counts of 3-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()

iter 1 | BLEU = 0.00, 3.4/0.2/0.0/0.0 (BP=1.000, ratio=2.090, hyp_len=41421, ref_len=19823)
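The warning above comes from NLTK's corpus_bleu, emitted whenever some n-gram order has zero matches. A generic illustration (not this repo's evaluation code) of triggering it and suppressing it with a smoothing function:

from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

# Generic illustration of the NLTK warning quoted above: the hypothesis
# shares no 3-grams or 4-grams with the reference, so unsmoothed BLEU
# collapses to 0; a SmoothingFunction avoids the hard zero.
refs = [[["this", "is", "a", "small", "test"]]]  # one sentence, one reference
hyps = [["this", "is", "test"]]
print(corpus_bleu(refs, hyps))  # emits the warning, evaluates to ~0.0
print(corpus_bleu(refs, hyps, smoothing_function=SmoothingFunction().method1))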

For the AR model, I'm running:

python run.py --dataset iwslt-ende --vocab_size 40000 --ffw_block highway --params small --lr_schedule anneal --batch_size 1024 --mode test --debug --gpu 0 --load_vocab --load_from ~/nlp/dl4mt-nonauto/models/iwslt16/ende/02.05_23.12.ar_voc40k_2048_5_278_507_2_drop_0.1_0.0003_tran_high_

For the NAT model, I'm running:

python run.py --dataset iwslt-ende --vocab_size 40000 --ffw_block highway --params small --lr_schedule anneal --batch_size 128 --fast --valid_repeat_dec 20 --use_argmax --next_dec_input both --mode test --remove_repeats --debug --gpu 0 --load_vocab --trg_len_option predict --use_predicted_trg_len --load_from ~/nlp/dl4mt-nonauto/models/iwslt16/ende/02.08_20.10.ptrn_model_voc40k_2048_5_278_507_2_drop_0.1_drop_len_pred_0.3_0.0003_anne_anneal_steps_250000_high_tr4_2decs__pred_both_copy_argmax_

May I know the correct flags to set? Thanks!

How is your WMT16 EN-Ro Dataset Preprocessed?

Thank you for providing the preprocessed dataset.
Could you please tell me how the WMT16 En-Ro dataset was preprocessed? How did the raw 612,422 sentence pairs become 608,319 pairs?
Also, it seems that the dataset (En-Ro) has been shuffled or reorganized?

Training error (num_gpu argument)

Thank you for sharing the code!

I tried running your model in a multi-GPU setting as follows, and I got an error from BucketIterator.
It seems that BucketIterator (from torchtext) does not accept a num_gpus argument.
I am using torchtext version 0.3.1.

python run.py --dataset iwslt-ende --vocab_size 40000 --ffw_block highway --params small --lr_schedule anneal --fast --valid_repeat_dec 8 --use_argmax --next_dec_input both --denoising_prob 0.5 --layerwise_denoising_weight --use_distillation --num_gpus 3
2019-02-20 16:05:59 INFO: - random seed is 19920206
2019-02-20 16:05:59 INFO: - TRAINING CORPUS : /work01/kiyono/dl4mt-nonauto-data/iwslt/en-de/distill/ende/train.tags.en-de.bpe
2019-02-20 16:06:02 INFO: - before pruning : 195897 training examples
2019-02-20 16:06:02 INFO: - after pruning : 195897 training examples
Traceback (most recent call last):
  File "run.py", line 572, in <module>
    num_gpus=args.num_gpus)
TypeError: __init__() got an unexpected keyword argument 'num_gpus'

Do you have any ideas about how to avoid this error?
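One possible workaround, assuming the num_gpus keyword is simply absent from this torchtext version (a sketch, not a tested fix): strip the argument before constructing the iterator, and split work across GPUs at the model level instead.

from torchtext.data import BucketIterator

# Sketch of a workaround, assuming torchtext 0.3.1's BucketIterator
# simply lacks a num_gpus parameter: drop the unsupported keyword and
# handle multi-GPU execution at the model level (e.g. torch.nn.DataParallel).
def make_bucket_iterator(dataset, batch_size, **kwargs):
    kwargs.pop("num_gpus", None)  # not accepted by this torchtext version
    return BucketIterator(dataset, batch_size=batch_size, **kwargs)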

Different batch_size values lead to different results

Hi, I have been reproducing your results on the IWSLT-16 En-De experiments using the NAT pre-trained models. However, I get different results when I use different batch_size values.

  • When batch_size = 1: [screenshot of BLEU results]

  • But when batch_size = 1600: [screenshot of BLEU results]

Can you tell me why?

Train loss value computes to zero in every iteration

While training the non-autoregressive model from scratch on CPU, the train loss computes to zero and hence the parameters are not updated. I have only removed the use_distillation flag; everything else is the same except the dataset.

RuntimeError: each element in list of batch should be of equal size

Hi,
In test mode, with the pre-trained non-autoregressive model and the MS COCO dataset, I got the RuntimeError about batch size shown below. Should I write a custom collate function, or do I need to do something else? Could you please advise me in this regard?

start decoding:   0%|          | 0/200 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/media/deeplab/f6321bd3-2eb4-461a-9abc-d10e94252592/Vahid Chahkandi/dl4mt-nonauto-master/run.py", line 612, in <module>
    names=["test." + xx for xx in names], maxsteps=None)
  File "/media/deeplab/f6321bd3-2eb4-461a-9abc-d10e94252592/Vahid Chahkandi/dl4mt-nonauto-master/decode.py", line 173, in decode_model
    for iters, dev_batch in enumerate(dev):
  File "/home/deeplab/dl4mt-nonauto-master/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 363, in __next__
    data = self._next_data()
  File "/home/deeplab/dl4mt-nonauto-master/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 989, in _next_data
    return self._process_data(data)
  File "/home/deeplab/dl4mt-nonauto-master/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1014, in _process_data
    data.reraise()
  File "/home/deeplab/dl4mt-nonauto-master/lib/python3.7/site-packages/torch/_utils.py", line 395, in reraise
    raise self.exc_type(msg)
RuntimeError: Caught RuntimeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/deeplab/dl4mt-nonauto-master/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 185, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/deeplab/dl4mt-nonauto-master/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 47, in fetch
    return self.collate_fn(data)
  File "/home/deeplab/dl4mt-nonauto-master/lib/python3.7/site-packages/torch/utils/data/_utils/collate.py", line 84, in default_collate
    return [default_collate(samples) for samples in transposed]
  File "/home/deeplab/dl4mt-nonauto-master/lib/python3.7/site-packages/torch/utils/data/_utils/collate.py", line 84, in <listcomp>
    return [default_collate(samples) for samples in transposed]
  File "/home/deeplab/dl4mt-nonauto-master/lib/python3.7/site-packages/torch/utils/data/_utils/collate.py", line 82, in default_collate
    raise RuntimeError('each element in list of batch should be of equal size')
RuntimeError: each element in list of batch should be of equal size
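If a custom collate function is indeed the way to go, here is a minimal sketch, assuming each batch item is a 1-D tensor of token ids (the repo's MS COCO items may be structured differently, e.g. as image-caption pairs):

from torch.nn.utils.rnn import pad_sequence

# Minimal collate_fn sketch: pad variable-length token sequences to a
# common length so they can be stacked, instead of letting
# default_collate fail on unequal sizes. Assumes each item is a 1-D
# LongTensor of token ids; adapt for (image, caption) tuples.
def pad_collate(batch):
    return pad_sequence(batch, batch_first=True, padding_value=0)

# usage: DataLoader(dataset, batch_size=..., collate_fn=pad_collate)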

I receive an error for "model.py"

Hi, first of all, thank you for sharing your code.
Could you please advise what I should do about this error, which occurs when a class from model.py is referenced.

line 265, in <module>
    args.block_cls = model.HighwayBlock
NameError: name 'model' is not defined

Thank you in advance.
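A likely fix, judging from the NameError alone: the script that sets args.block_cls never imports the model module, so adding the import at the top should resolve the name.

# Likely fix, inferred from the NameError alone: import the module in
# the script that later executes `args.block_cls = model.HighwayBlock`.
# Requires model.py to be on the Python path / in the working directory.
import model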

No event loop integration for 'inline'

I ran my code on my desktop PC and everything was fine, but when I try to run it on Google Colab I get a "No such file or directory" error, even though I'm pretty sure the directory exists and the path is correct.
The first line of the error reports no event loop integration for 'inline'; I don't know what that is, and perhaps it is the main cause.

I appreciate any help.

The Error is:

UnknownBackend: No event loop integration for 'inline'. Supported event loops are: qt, qt4, qt5, gtk, gtk2, gtk3, tk, wx, pyglet, glut, osx
Traceback (most recent call last):
  File "/content/gdrive/My Drive/Colab Notebooks/Retrieving/run.py", line 414, in <module>
    distill=(args.mode == "distill"), use_distillation=args.use_distillation)
  File "/content/gdrive/My Drive/Colab Notebooks/Retrieving/data.py", line 120, in __init__
    self.train_data, self.train_sampler = self.prepare_train_data(path, train_f, batch_size, max_len=max_len, size=None)
  File "/content/gdrive/My Drive/Colab Notebooks/Retrieving/data.py", line 131, in prepare_train_data
    bpes, features_path, bpe2img, img2bpes = process_json(dataPath, annFile, max_len=max_len, size=size)
  File "/content/gdrive/My Drive/Colab Notebooks/Retrieving/mscoco.py", line 18, in process_json
    annots = json.load(open(annPath, 'r'))
FileNotFoundError: [Errno 2] No such file or directory: 'mscoco/mscoco_data/mscoco/karpathy_split/train.json.bpe.fixed'

If you suspect this is an IPython bug, please report it at:
    https://github.com/ipython/ipython/issues
or send an email to the mailing list at [email protected]
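Two observations, offered as guesses rather than a confirmed diagnosis: the UnknownBackend error comes from IPython when the 'inline' matplotlib backend is requested outside a notebook context, and is probably unrelated to the missing file; the FileNotFoundError points at a relative path, and Colab's default working directory is /content, not the Drive folder that holds run.py. A quick check along those lines:

import os

# Guess: 'mscoco/...' is a relative path, and Colab's default working
# directory is /content rather than the Drive folder containing run.py
# (path taken from the traceback above). Change into it before running.
os.chdir("/content/gdrive/My Drive/Colab Notebooks/Retrieving")
print(os.path.exists("mscoco/mscoco_data/mscoco/karpathy_split/train.json.bpe.fixed"))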

General information about distillation

Hi,

First off, thanks for sharing your code. I'm trying to understand how the distillation for an AR model is implemented. From what I understand after looking at the code, the logits from the AR model are saved to disk.
I'm having trouble finding where/how they are loaded back during training. Could you point me to where this is actually done?

Thanks!
Lucas

RuntimeError: Error(s) in loading state_dict for FastTransformer:

size mismatch for encoder.out.weight: copying a param with shape torch.Size([36377, 278]) from checkpoint, the shape in current model is torch.Size([38022, 278]).
size mismatch for decoder.0.out.weight: copying a param with shape torch.Size([36377, 278]) from checkpoint, the shape in current model is torch.Size([38022, 278]).
size mismatch for decoder.1.out.weight: copying a param with shape torch.Size([36377, 278]) from checkpoint, the shape in current model is torch.Size([38022, 278]).
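No description accompanies this error, but the pattern suggests (as an assumption, not something confirmed in the issue) that the vocabulary built at load time differs in size from the one the checkpoint was trained with; the decoding commands earlier on this page pass --load_vocab to reuse the saved vocabulary. The mismatch itself is easy to reproduce with the sizes from the message:

import torch.nn as nn

# Reproduce the shape mismatch using the sizes from the error message:
# a projection saved with a 36377-word vocabulary cannot be loaded into
# a model built with a 38022-word vocabulary.
saved = nn.Linear(278, 36377, bias=False).state_dict()
current = nn.Linear(278, 38022, bias=False)
try:
    current.load_state_dict(saved)
except RuntimeError as err:
    print(err)  # size mismatch for weight: ... [36377, 278] vs [38022, 278]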

Is the AR model for NMT tasks a Transformer?

I found that the Transformer usually gets a BLEU score around 27-28 on WMT14 EN-DE. However, in the paper the AR model only gets around 24. I am curious what the AR model is. Thanks!
