Comments (6)
@emergenz @younesbelkada @ArthurZucker Outside of the debate of whether it's confusing or not, we're somewhat limited here on renaming. It's possible to rename the class itself (it's not importable from the __init__, and this shouldn't affect existing copies, other than creating a divergence). However, we can't rename the layer attribute in the other layers, i.e. self.mlp, as this would change the state dict. As this would then create a mismatch, i.e. self.mlp = LlamaGLU(...), I think we should leave it as is (possibly with a clarifying comment somewhere in the code).
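To illustrate the state dict constraint, here is a minimal sketch assuming PyTorch is installed. The class names (GatedBlock, RenamedGatedBlock, LayerOriginal, LayerClassRenamed, LayerAttrRenamed) are hypothetical stand-ins, not the actual transformers classes. It suggests that renaming a class leaves checkpoint keys untouched, while renaming the attribute changes them:

```python
from torch import nn


class GatedBlock(nn.Module):
    """Hypothetical stand-in for a Llama-style feed-forward block."""

    def __init__(self, hidden_size: int = 4, intermediate_size: int = 8):
        super().__init__()
        self.gate_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.up_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.down_proj = nn.Linear(intermediate_size, hidden_size, bias=False)


class RenamedGatedBlock(GatedBlock):
    """Same module under a new class name (a class-only rename)."""


class LayerOriginal(nn.Module):
    def __init__(self):
        super().__init__()
        self.mlp = GatedBlock()


class LayerClassRenamed(nn.Module):
    def __init__(self):
        super().__init__()
        self.mlp = RenamedGatedBlock()  # attribute name unchanged


class LayerAttrRenamed(nn.Module):
    def __init__(self):
        super().__init__()
        self.glu = RenamedGatedBlock()  # attribute name changed


# State dict keys are built from attribute names, not class names.
checkpoint = LayerOriginal().state_dict()
print(sorted(checkpoint))

# Class rename only: keys still start with "mlp.", so strict loading succeeds.
LayerClassRenamed().load_state_dict(checkpoint)

# Attribute rename: the checkpoint keys no longer match the module's keys.
result = LayerAttrRenamed().load_state_dict(checkpoint, strict=False)
print(result.missing_keys)     # keys the module expects but the checkpoint lacks
print(result.unexpected_keys)  # keys in the checkpoint the module does not have
```

With strict=True (the default), the last load would raise an error instead, which is why existing checkpoints would stop loading after an attribute rename.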
from transformers.
Oh, unfortunate, but I understand. Thanks for the clarification!
Hi @emergenz
Technically you are right, those classes correspond to Gated Linear Units. We could rename them, but we need to be careful with public classes, as these standalone classes are used by many codebases (e.g. here). Renaming them without affecting those codebases might introduce some complexity in the modeling codebase, so I will let the core maintainers decide on this @amyeroberts @ArthurZucker
I don't find it misleading at all TBH
"I don't find it misleading at all TBH"
@ArthurZucker Can you elaborate? GLUs are not MLPs. In fact, I was looking at the modeling_qwen2.py code when I saw hidden_states = self.mlp(hidden_states) in the DecoderLayer forward pass. I was extremely confused as to why the Qwen models are using MLPs instead of GLUs and therefore checked their technical report, where I realized that they indeed do not use MLPs. Only then did I realize that the Qwen2MLP class indeed represents a GLU.
I would argue that this is highly misleading naming.
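For readers unfamiliar with the distinction, here is a minimal sketch, assuming PyTorch is installed, that contrasts a plain two-layer MLP with a gated block. PlainMLP and GatedMLP are hypothetical names for illustration, and the SiLU activation and layer sizes are assumptions rather than values taken from any model config:

```python
import torch
import torch.nn.functional as F
from torch import nn


class PlainMLP(nn.Module):
    """Classic feed-forward block: project up, activate, project down."""

    def __init__(self, hidden_size: int, intermediate_size: int):
        super().__init__()
        self.up_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.down_proj = nn.Linear(intermediate_size, hidden_size, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down_proj(F.silu(self.up_proj(x)))


class GatedMLP(nn.Module):
    """Gated variant: an activated gate path multiplies a linear path."""

    def __init__(self, hidden_size: int, intermediate_size: int):
        super().__init__()
        self.gate_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.up_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.down_proj = nn.Linear(intermediate_size, hidden_size, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Elementwise product of the activated gate and the linear projection.
        return self.down_proj(F.silu(self.gate_proj(x)) * self.up_proj(x))


def count_params(module: nn.Module) -> int:
    return sum(p.numel() for p in module.parameters())


x = torch.randn(2, 4)  # (batch, hidden_size)
plain, gated = PlainMLP(4, 8), GatedMLP(4, 8)
plain_out, gated_out = plain(x), gated(x)

# Same input/output shapes, but the gated block has a third weight matrix.
print(plain_out.shape, gated_out.shape)
print(count_params(plain), count_params(gated))
```

Both blocks map hidden states to hidden states of the same shape, which is likely why the self.mlp attribute name reads naturally at the call site even when the block is gated.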
Agreed with the statement above!