Comments (11)
Hi @aarzchan
Can you post here the script that you are running to reproduce the image captioning results? Likely some flag is missing or incorrectly set. Thanks!
And yes, the output format is correct.
from dl4mt-nonauto.
For the AR model, I'm running:
python run.py --dataset mscoco --params big --load_vocab --mode test --n_layers 4 --ffw_block highway --debug --load_from mscoco_models_final/ar_model --batch_size 1024
For the NAR model, I'm running:
python run.py --dataset mscoco --params big --use_argmax --load_vocab --mode test --n_layers 4 --fast --ffw_block highway --debug --trg_len_option predict --use_predicted_trg_len --load_from mscoco_models_final/nar_model --batch_size 1024
I just downloaded the pretrained MSCOCO models and data from the main branch, ran the scripts you posted here, and I can reproduce the previous results:
Here is the output for the autoregressive model
Here is the output for the non-autoregressive model
The flags and the script you tried are correct, so I think there is likely an issue with the data.
Have you set the correct paths to the data (lines 44 - 56) in your copy of https://github.com/nyu-dl/dl4mt-nonauto/blob/master/run.py#L418 ? Also, did you use the vocab.pkl file that was already provided rather than recreating the vocabulary for MSCOCO?
Also, for the MSCOCO models, --vocab_size should be 10000, but it shouldn't affect things, since the code forces you to load the already-created vocabulary: https://github.com/nyu-dl/dl4mt-nonauto/blob/master/run.py#L418
Let me know how I can help!
Thanks for looking into this for me!
Yes, I did change the paths in lines 44 - 56 of data.py to my own dataset path; otherwise, there would be a runtime error at that point.
I didn't make any modifications to the MSCOCO dataset. All I did was download it and unzip it.
Also, I forgot to mention that, when I initially ran the script, I got the following error at line 46 of model.py:
RuntimeError: Expected object of type torch.cuda.FloatTensor but found type torch.cuda.LongTensor for argument #2 'mat2'
I worked around this by adding channels = channels.float() right before that line. I'm not sure whether this affects the results at all.
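For context, PyTorch requires both operands of a matrix multiply to share a dtype, which is what this error is complaining about. A minimal standalone sketch of the failure and the cast (not the actual model.py code; shapes and names here are made up):

```python
import torch

# A float activation multiplied by a LongTensor "mat2" reproduces the
# error ("Expected ... FloatTensor but found ... LongTensor for
# argument #2 'mat2'"); shown on CPU dtypes here.
x = torch.randn(3, 8)                   # torch.float32
channels = torch.randint(0, 2, (8, 4))  # torch.int64 (LongTensor)

try:
    out = torch.matmul(x, channels)     # dtype mismatch -> RuntimeError
except RuntimeError:
    out = torch.matmul(x, channels.float())  # the cast described above

print(out.dtype)  # torch.float32
```

The cast itself is numerically lossless for small integer values, so whether it changes the model's results depends on what line 46 actually computes.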
Yes, the change you made on line 46 affects the results for some reason and makes performance much worse. I tried it myself after upgrading to PyTorch 1.0 and saw the results you attached at the beginning of this GitHub thread.
Would you mind temporarily running the code in an environment with PyTorch 0.3 or 0.4 while I try to figure out why this change causes problems?
Oh, I see. Actually, I was using PyTorch 0.4.1 to run the code from the multigpu branch.
With PyTorch 0.3.1 on the multigpu branch, I get this PyTorch-version-related error:
Traceback (most recent call last):
  File "run.py", line 667, in <module>
    names=["test."+xx for xx in names], maxsteps=None)
  File "/home/aarchan/dl4mt-nonauto_multigpu/decode.py", line 207, in decode_model
    with torch.no_grad():
AttributeError: module 'torch' has no attribute 'no_grad'
However, with PyTorch 0.3.1 on the main branch, I'm able to reproduce the same results you attached earlier:
AR model:
iter 1 | BLEU = 23.47, 68.3/33.2/16.3/8.2
NAR model:
iter 1 | BLEU = 20.12, 66.9/29.6/13.2/6.3
iter 2 | BLEU = 20.87, 67.2/30.4/13.9/6.7
iter 3 | BLEU = 21.04, 67.2/30.6/14.0/6.8
iter 4 | BLEU = 21.12, 67.2/30.6/14.1/6.9
I guess the MSCOCO experiments were not updated in the multigpu branch?
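In case it helps, the torch.no_grad incompatibility above could be papered over with a small shim (a sketch using only the standard library; torch.no_grad was added in PyTorch 0.4, and note the fallback does NOT actually disable autograd on 0.3.x, where volatile=True Variables were the mechanism):

```python
import contextlib

def get_no_grad(torch_module):
    """Return torch.no_grad when available (PyTorch >= 0.4); otherwise
    a no-op context manager so `with no_grad():` at least parses and
    runs. Caveat: the no-op fallback does not disable gradient
    tracking on 0.3.x."""
    return getattr(torch_module, "no_grad", contextlib.suppress)

# Hypothetical usage at the top of decode.py:
#   no_grad = get_no_grad(torch)
#   with no_grad():
#       output = model(batch)
```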
Good to hear that you managed to reproduce it.
Yes, I haven't looked in detail into running the multigpu branch on the MSCOCO dataset; I mainly used the multigpu branch for the WMT14 En-De experiments.
I was using PyTorch 0.4.0 on the master branch to reproduce the MSCOCO experiments.
I will keep you updated once this issue is figured out.
Thanks
Got it. Thanks again for your help!
I was looking at the MSCOCO image captioning results in some other image captioning papers, and I noticed your AR model's BLEU-4 score (8.2) is much lower than those reported in other papers (most recent papers report 30+ BLEU-4). For example, an older paper, Karpathy & Fei-Fei, 2015, reports BLEU-4 scores of 23.0 and 10.0 for their model and a nearest-neighbor baseline, respectively.
I understand that the purpose of your experiment was to show that NAR is much faster than AR while getting similar performance, but I was wondering why there is such a large performance gap between your AR baseline and other AR models. Please let me know if I'm overlooking something here. Thanks!
The BLEU-4 score reported by Karpathy & Fei-Fei and by Xu et al. (http://proceedings.mlr.press/v37/xuc15.pdf) is effectively the final BLEU score that we report in our paper. See the footnote in the Xu et al. paper ("BLEU-n is the geometric average of the n-gram precision. For instance, BLEU-1 is the unigram precision, and BLEU-2 is the geometric average of the unigram and bigram precision"). In our case, BLEU-4 is not a geometric average but only the 4-gram precision, which is why our BLEU-4 and theirs differ.
Also, in the paper by Xu et al. they say that "we report BLEU4 from 1 to 4 without a brevity penalty," whereas we use a brevity penalty.
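To make the distinction concrete, here is a toy pure-Python sketch (the example sentences are made up) contrasting the individual 4-gram precision with the cumulative geometric-mean BLEU-4; brevity penalty omitted for clarity:

```python
import math
from collections import Counter

def ngram_precision(hyp, ref, n):
    """Modified n-gram precision of a hypothesis against one reference."""
    h = Counter(tuple(hyp[i:i + n]) for i in range(len(hyp) - n + 1))
    r = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
    overlap = sum(min(c, r[g]) for g, c in h.items())
    return overlap / max(sum(h.values()), 1)

hyp = "a man riding a bike down a street".split()
ref = "a man riding a bicycle down the street".split()

# Individual 4-gram precision (the last number in the outputs above):
p4 = ngram_precision(hyp, ref, 4)

# Cumulative BLEU-4 in the Xu et al. sense: geometric mean of the
# 1- to 4-gram precisions.
ps = [ngram_precision(hyp, ref, n) for n in range(1, 5)]
bleu4 = math.exp(sum(math.log(p) for p in ps) / 4) if min(ps) > 0 else 0.0

print(p4, bleu4)  # the cumulative score is well above the plain 4-gram precision
```

Since the lower-order precisions are always at least as high as the 4-gram precision, the geometric mean is always at least as large, which accounts for the apparent gap between the two conventions.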
Okay, I see. Thanks for the clarification!