Comments (16)

mohengzxr commented on July 20, 2024

When decoding from NA models with --load_vocab, why do I get a size mismatch? The error is: size mismatch for encoder.out.weight: copying a param with shape torch.Size([36377, 278]) from checkpoint, the shape in current model is torch.Size([38022, 278]).

from dl4mt-nonauto.

mohengzxr commented on July 20, 2024

@mansimov

mansimov commented on July 20, 2024

Hm, it is a bit hard for me to figure out why this happens.
I haven't used this codebase for about a year or so...
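A note on the size mismatch reported above: the two shapes differ only in the first dimension (36377 vs 38022), which would be consistent with the checkpoint and the current model being built from vocabularies of different sizes. That is a guess, not something confirmed in this thread. A minimal sketch for listing such mismatches, assuming the parameter shapes have already been collected into plain dicts (the helper name find_shape_mismatches is hypothetical and not part of dl4mt-nonauto):

```python
def find_shape_mismatches(checkpoint_shapes, model_shapes):
    """Return {param_name: (checkpoint_shape, model_shape)} for every
    parameter present in both dicts whose shapes differ.

    Both arguments map parameter names to shape tuples. With PyTorch,
    such dicts could be built from a state dict, e.g.
    {name: tuple(tensor.shape) for name, tensor in state_dict.items()}.
    """
    mismatches = {}
    for name, ckpt_shape in checkpoint_shapes.items():
        model_shape = model_shapes.get(name)
        if model_shape is not None and tuple(model_shape) != tuple(ckpt_shape):
            mismatches[name] = (tuple(ckpt_shape), tuple(model_shape))
    return mismatches


# Shapes taken from the error message in this thread.
checkpoint_shapes = {"encoder.out.weight": (36377, 278)}
model_shapes = {"encoder.out.weight": (38022, 278)}

mismatches = find_shape_mismatches(checkpoint_shapes, model_shapes)
for name, (ckpt, current) in mismatches.items():
    print(f"{name}: checkpoint {ckpt} vs current model {current}")
```

If the first dimension is indeed the vocabulary size, comparing it against the number of entries in the vocabulary file passed at decoding time may narrow the problem down.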

mohengzxr commented on July 20, 2024

How do --lr_schedule anneal and --lr_schedule transformer differ?

mohengzxr commented on July 20, 2024

@mansimov

mansimov commented on July 20, 2024

--lr_schedule anneal just anneals the learning rate starting from a certain value.
--lr_schedule transformer first warms up the learning rate and then anneals it down. By warming up I mean it increases the learning rate from a very low value to a certain value, like 3e-4, over the first couple of training iterations. See the original Transformer paper for reference.
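The two schedules described above can be sketched roughly as follows. This is only an illustration, not the code in dl4mt-nonauto: the function names, the linear form of the anneal, and all default values (end_lr, anneal_steps, warmup_steps) are assumptions. Only the 3e-4 starting value comes from the comment above, the 278 is taken from the second dimension in the error message and assumed to be the model dimension, and the warmup formula is the one from the original Transformer paper.

```python
def anneal_lr(step, base_lr=3e-4, end_lr=1e-5, anneal_steps=250000):
    """Anneal only: start at base_lr and decay linearly to end_lr.

    The linear shape and the default values are assumptions made
    for this sketch.
    """
    frac = min(step / anneal_steps, 1.0)
    return base_lr + (end_lr - base_lr) * frac


def transformer_lr(step, d_model=278, warmup_steps=4000):
    """Warmup then decay, as in the original Transformer paper:

        lr = d_model^-0.5 * min(step^-0.5, step * warmup_steps^-1.5)

    The rate grows linearly for warmup_steps steps, then decays
    proportionally to the inverse square root of the step number.
    """
    step = max(step, 1)  # avoid 0 ** -0.5 at the very first step
    return d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)


if __name__ == "__main__":
    for step in (1, 1000, 4000, 16000, 100000):
        print(step, anneal_lr(step), transformer_lr(step))
```

Printing both columns side by side shows the main difference: anneal starts at its highest value, while transformer starts near zero and peaks at warmup_steps.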

mohengzxr commented on July 20, 2024

Does the choice between anneal and transformer affect BLEU?
@mansimov

mohengzxr commented on July 20, 2024

Why is the BLEU different when choosing anneal versus transformer?
What is their impact on BLEU?
Thank you
@mansimov

mansimov commented on July 20, 2024

Anneal works well for IWSLT, transformer works well for WMT.
In general I advise using the transformer schedule.
I don't remember the exact impact of the learning rate schedule choice on BLEU.

mohengzxr commented on July 20, 2024

In our experiment, we chose the Uyghur-Chinese corpus.
When using anneal and transformer, the BLEU scores of the two are very different.
I don't quite understand why.

mohengzxr commented on July 20, 2024

I am very interested in this paper, and I would like to do a little research building on your work.

mohengzxr commented on July 20, 2024

I really hope to get your help.

mansimov commented on July 20, 2024

I am glad that you are interested :)

It is very hard to say why the results are different without diving into the details. But it is unsurprising to me that with Transformers you need to be careful with the learning rate schedule, depending on the number of GPUs you have, the model size and the dataset size!

Good luck!

mohengzxr commented on July 20, 2024

@jasonleeinf @mansimov @kyunghyuncho
Hi,
In your paper, Equation 2 looks wrong: at t=1, l=0, the index l-1 equals -1.
Could you explain this? I hope this helps.

image

mohengzxr commented on July 20, 2024

image

mansimov commented on July 20, 2024

Hi @mohengzxr

Thanks for pointing out this mistake in the paper. It should be l=1 instead of l=0 in the first sum.
Best, Elman
