Comments (16)
Decoding from NA models: why does `--load_vocab` cause a size mismatch? The error is: `size mismatch for encoder.out.weight: copying a param with shape torch.Size([36377, 278]) from checkpoint, the shape in current model is torch.Size([38022, 278])`.
from dl4mt-nonauto.
Hm it is a bit hard for me to figure out why this happens.
Haven't used this codebase for about a year or so...
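The mismatch means the checkpoint was saved with a smaller vocabulary (36377 entries) than the one the current run built (38022), so the vocabulary-sized layers cannot be copied. A common workaround, not specific to this codebase, is to drop checkpoint parameters whose shapes don't match the model and load the rest. A minimal sketch of that filtering step, using plain dicts of shape tuples to stand in for tensors (with real PyTorch tensors you would compare `tensor.shape` and then call `model.load_state_dict(filtered, strict=False)`):

```python
def filter_matching_params(model_shapes, ckpt_shapes):
    """Keep only checkpoint params whose shape matches the current model.

    Both arguments map parameter names to shape tuples. Returns the
    params that can be safely copied and the names that were dropped.
    """
    kept, dropped = {}, []
    for name, shape in ckpt_shapes.items():
        if model_shapes.get(name) == shape:
            kept[name] = shape
        else:
            dropped.append(name)
    return kept, dropped

# Hypothetical shapes matching the error message above:
model = {"encoder.out.weight": (38022, 278), "encoder.layer.weight": (278, 278)}
ckpt = {"encoder.out.weight": (36377, 278), "encoder.layer.weight": (278, 278)}
kept, dropped = filter_matching_params(model, ckpt)
# dropped == ["encoder.out.weight"]: vocabulary-sized params would need retraining
```

The cleaner fix, though, is to point `--load_vocab` at the exact vocabulary file the checkpoint was trained with, so the embedding and output layers are built with matching sizes in the first place.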
How do `--lr_schedule anneal` and `--lr_schedule transformer` differ?
`--lr_schedule anneal` just anneals the learning rate down starting from a certain value.
`--lr_schedule transformer` first warms up the learning rate and then anneals it down. By warming up I mean it increases the learning rate from a very low value up to a certain value, like 3e-4, over the first couple of training iterations. See the original Transformer paper for reference.
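The transformer schedule from the original paper ("Attention Is All You Need") can be sketched as below. The model dimension of 278 and warmup of 4000 steps are illustrative values, not necessarily what this codebase uses:

```python
def transformer_lr(step, d_model=278, warmup=4000):
    """Transformer learning-rate schedule:
    lr = d_model^-0.5 * min(step^-0.5, step * warmup^-1.5).
    Rises linearly for `warmup` steps, then decays as 1/sqrt(step)."""
    step = max(step, 1)  # avoid division by zero at step 0
    return d_model ** -0.5 * min(step ** -0.5, step * warmup ** -1.5)
```

The rate peaks exactly at `step == warmup`, where the two terms inside `min` are equal; before that the linear warmup term dominates, after that the inverse-square-root decay takes over.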
Does the choice between anneal and transformer affect BLEU?
@mansimov
Why is the BLEU produced by choosing anneal vs. transformer different?
What is their impact on BLEU?
Thank you
@mansimov
Anneal works well for IWSLT; transformer works well for WMT.
In general I advise using the transformer schedule.
I don't remember the exact impact of the choice of learning rate schedule on BLEU.
In our experiments, we chose the Uyghur-Chinese corpus.
When using anneal and transformer, the BLEU scores of the two are very different.
I don't quite understand why.
I am very interested in this paper, and I would like to do some research building on your work.
I really hope to get your help.
I am glad that you are interested :)
It is very hard to say why the results are different without diving into details. But it is unsurprising to me: with Transformers you need to be careful with the learning rate schedule depending on the number of GPUs you have, the model size, and the dataset size!
Good luck!
@jasonleeinf @mansimov @kyunghyuncho
Hi
In your paper,
In Equation 2, with t=1 and l=0, the index l-1 = -1 is undefined, which seems wrong.
Excuse me, how do you explain this? I hope you can help.
Hi @mohengzxr
Thanks for pointing out this mistake in the paper. The first sum should start at l=1 instead of l=0.
Best, Elman