Comments (9)
I don't think it does, but I also haven't run any comparison tests
from opennmt-py.
@vene But with option -extra-shuffle, I guess things will be different.
Anecdotally speaking, I ran an informal comparison and it made almost no difference, since as @vene said my dataset was large enough and the batch size was small enough that the majority of batches had no padding.
The mask is used in: https://github.com/OpenNMT/OpenNMT-py/blob/master/onmt/Translator.py#L130
It seems that this apply mask is not used during training.
@magic282 There's no mask during training in the implementation. I'm not sure whether it would make a huge difference.
I assumed since the sentences are sorted by length, with small enough batches and large enough datasets, training batches will be fully filled out? Now I'm not sure anymore...
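One way to check this assumption is to count how many length-sorted batches actually mix sentence lengths. The following is only a rough sketch: `count_padded_batches` is a hypothetical helper, not part of OpenNMT-py, and the toy corpus (1000 sentences for each length from 5 to 14) and the batch size of 64 are made-up numbers.

```python
def count_padded_batches(lengths, batch_size):
    """Return (padded, total) for batches cut from a length-sorted corpus.

    A batch counts as padded when it contains sentences of more than
    one length, so the shorter ones must be padded to the longest.
    """
    ordered = sorted(lengths)
    batches = [ordered[i:i + batch_size]
               for i in range(0, len(ordered), batch_size)]
    padded = sum(1 for batch in batches if min(batch) != max(batch))
    return padded, len(batches)


# Hypothetical toy corpus: 1000 sentences for each length from 5 to 14.
lengths = [n for n in range(5, 15) for _ in range(1000)]
padded, total = count_padded_batches(lengths, batch_size=64)
print(f"{padded} of {total} batches need padding")
```

In this toy setup only the batches that straddle a boundary between two length bins are padded, so most batches are fully filled out but not all of them.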
Thanks for checking @nelson-liu, that makes sense!
I wonder if skipping the masking really saves a lot of time during training. With -extra-shuffle it indeed seems like this is a bug, as @magic282 points out. Even with sorted batches, and with a huge number of sentences for each length bin, there will be some unfortunate batches with one sentence of length d+1 and N-1 sentences of length d, where the code does not correctly reflect the intended model, then.
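To make the effect concrete, here is a minimal pure-Python sketch of attention weights for one padded sentence in such a batch, with and without a mask. It is not the OpenNMT-py code; the `softmax` helper, the scores, and the mask values are all made up for illustration, and it assumes each sentence has at least one real token.

```python
import math


def softmax(scores, mask=None):
    """Softmax over one row of attention scores.

    mask[i] is True for real tokens and False for padding. Padded
    positions are set to -inf before normalising, so they get weight 0.
    Assumes at least one position is not masked.
    """
    if mask is not None:
        scores = [s if keep else float("-inf")
                  for s, keep in zip(scores, mask)]
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical sentence of length d = 3, padded to d+1 = 4 because one
# other sentence in the batch has length 4. The last score belongs to
# the padding position.
scores = [0.5, 1.0, -0.2, 0.3]
mask = [True, True, True, False]

unmasked = softmax(scores)
masked = softmax(scores, mask)
print("weight on padding without mask:", round(unmasked[3], 3))
print("weight on padding with mask:", masked[3])
```

Without the mask, the padding position takes a share of the attention weight, which is the mismatch with the intended model described above.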
old thread, if someone is motivated to implement, just reopen.