Comments (6)
I can open a PR :)
But I'm not sure about setting padding_idx to None; it should be set from config.pad_token_id, as is done in e.g. LLaMA.
(Obviously you decide for BC, but why maintain bugs?)
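For concreteness, here is a minimal standalone sketch of the difference (the pad_token_id value is made up; in transformers it would come from config.pad_token_id):

import torch.nn as nn

pad_token_id = 3  # hypothetical value; in transformers this would be config.pad_token_id

# With padding_idx set, the pad row is zero-initialized and its gradient is always zero:
emb = nn.Embedding(10, 4, padding_idx=pad_token_id)
print(emb.weight[pad_token_id])  # tensor([0., 0., 0., 0.], grad_fn=<SelectBackward0>)

# Without it (the current BLOOM behavior), the pad row is randomly initialized
# and trained like any other token:
emb_no_pad = nn.Embedding(10, 4)
print(emb_no_pad.weight[pad_token_id])  # random, non-zero values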
Nice catch @PaulLerner! That does indeed sound like a bug; would you like to open a PR for the fix? I think we should add a warning if padding_idx is not None (and init it to None by default on the config) to ensure BC and educate users about the consequences - what do you think?
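Something like the following, perhaps? (A hypothetical sketch only; build_word_embeddings and its signature are made up for illustration, not actual transformers code.)

import logging

import torch.nn as nn

logger = logging.getLogger(__name__)

def build_word_embeddings(vocab_size, hidden_size, pad_token_id=None):
    # Default of None keeps backward compatibility; warn when a padding
    # index is actually used so users understand the consequences.
    if pad_token_id is not None:
        logger.warning(
            "padding_idx=%d: the pad token's embedding row will be "
            "zero-initialized and will not receive gradient updates.",
            pad_token_id,
        )
    return nn.Embedding(vocab_size, hidden_size, padding_idx=pad_token_id)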
Good point, yes! Sounds good to me! Let me know when you open the PR 🙏
Found the same problem in more GPT-based models as well (it's OK for position embeddings; I'm just showing the raw output of grep):
src/transformers/models/blip/modeling_blip.py: self.token_embedding = nn.Embedding(config.vocab_size, embed_dim)
src/transformers/models/clip/modeling_clip.py: self.token_embedding = nn.Embedding(config.vocab_size, embed_dim)
src/transformers/models/gpt_bigcode/modeling_gpt_bigcode.py: self.wte = nn.Embedding(config.vocab_size, self.embed_dim)
src/transformers/models/gpt_bigcode/modeling_gpt_bigcode.py: self.wpe = nn.Embedding(config.max_position_embeddings, self.embed_dim)
src/transformers/models/gptj/modeling_gptj.py: self.wte = nn.Embedding(config.vocab_size, self.embed_dim)
src/transformers/models/gpt_neo/modeling_gpt_neo.py: self.wte = nn.Embedding(config.vocab_size, self.embed_dim)
src/transformers/models/gpt_neo/modeling_gpt_neo.py: self.wpe = nn.Embedding(config.max_position_embeddings, self.embed_dim)
src/transformers/models/gpt_neox_japanese/modeling_gpt_neox_japanese.py: self.embed_in = nn.Embedding(config.vocab_size, config.hidden_size)
src/transformers/models/gpt_neox/modeling_gpt_neox.py: self.embed_in = nn.Embedding(config.vocab_size, config.hidden_size)
src/transformers/models/gptsan_japanese/modeling_gptsan_japanese.py: self.position_embeddings = nn.Embedding(config.max_position_embeddings, config.d_model)
src/transformers/models/gptsan_japanese/modeling_gptsan_japanese.py: self.embed_tokens = nn.Embedding(config.vocab_size, config.d_model)
src/transformers/models/gptsan_japanese/modeling_gptsan_japanese.py: self.extra_position_embeddings = nn.Embedding(config.max_position_embeddings, config.d_model)
Do you want me to correct them as well? (I only looked at the models I know of that matched the following regex; there may be more.)
grep "\.Embedding(" src/transformers/models/*/*py
Hey, so, after some thought, I'm not sure it makes sense to correct this.
It seems like BLOOM was not trained by specifying padding_idx:
In [10]: model.transformer.word_embeddings.weight[model.config.pad_token_id]
# would be all zeros if padding_idx had been specified, see https://pytorch.org/docs/stable/generated/torch.nn.Embedding.html
Out[10]:
tensor([ 0.0028, -0.0032, 0.0008, ..., -0.0020, -0.0012, -0.0015],
grad_fn=<SelectBackward0>)
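For reference, here is a self-contained way to reproduce that check (bigscience/bloom-560m is just the smallest checkpoint; any BLOOM variant should show the same):

from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("bigscience/bloom-560m")
row = model.transformer.word_embeddings.weight[model.config.pad_token_id]
print(row.abs().sum() == 0)  # False: the pad row is non-zero, so padding_idx was not used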
So fixing it would mean that, when instantiating a new BLOOM model from scratch, the padding embedding would be correctly initialized, but it would then be overwritten by the pre-trained BLOOM weights anyway.
Also, note that this actually does not affect the loss if the padding tokens are properly masked (btw, the BloomForCausalLM doc stating that it's enough to pass labels = input_ids is a bit underspecified, as the loss actually expects pad labels to be -100).
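To make the -100 point concrete, a small sketch (the input ids are made up; pad_token_id=3 is BLOOM's pad token):

import torch

pad_token_id = 3  # BLOOM's pad token id
input_ids = torch.tensor([[15, 27, 42, pad_token_id, pad_token_id]])

# The causal LM loss (CrossEntropyLoss) only ignores the label -100, so pad
# positions must be remapped before passing labels to the model:
labels = input_ids.clone()
labels[labels == pad_token_id] = -100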
Your choice :)
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.