
Comments (4)

tomsercu commented on July 26, 2024

For now, GitHub issues are a good place to ask questions πŸ‘
You're right, there are a number of tokens in the vocab that have no good reason to be there. We use fairseq to train the models and largely stick to its conventions when it comes to the vocab. The unusual tokens are completely unseen in the training data, so they shouldn't be used. But their dummy presence shouldn't hurt either.
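
A quick way to see those extra vocab entries yourself (a minimal sketch; the checkpoint name is just an example, and any `esm.pretrained` loader exposes the same `Alphabet`):

```python
# Minimal sketch: list the fairseq-style vocab bundled with an ESM checkpoint.
# The checkpoint below is only an example; other esm.pretrained loaders work the same way.
import esm

model, alphabet = esm.pretrained.esm2_t6_8M_UR50D()
print(alphabet.all_toks)
# Alongside the 20 standard amino acids you will see special and rarely/never-used
# entries such as <cls>, <pad>, <eos>, <unk>, X, B, U, Z, O, ".", "-", <null_1>, <mask>.
```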


joshim5 commented on July 26, 2024

@tueboesen To clarify further, it's important to follow the conventions if you use these models for downstream tasks. For example, <cls> needs to be prepended and <eos> appended to the sequences to get the best performance. Thanks for your interest and let us know if you have any more questions!
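
If you load the model through the esm package, the bundled batch converter already follows that convention, so you usually don't need to add the tokens by hand. A small sketch (model name and sequence are illustrative):

```python
# Sketch: the alphabet's batch converter prepends <cls> and appends <eos> for you.
import esm

model, alphabet = esm.pretrained.esm2_t6_8M_UR50D()
batch_converter = alphabet.get_batch_converter()

seq = "MKTVRQERLK"  # illustrative sequence
labels, strs, tokens = batch_converter([("protein1", seq)])

assert alphabet.prepend_bos and alphabet.append_eos        # True for ESM-2
assert tokens[0, 0].item() == alphabet.cls_idx             # <cls> at position 0
assert tokens[0, len(seq) + 1].item() == alphabet.eos_idx  # <eos> right after the residues
```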


jiosephlee commented on July 26, 2024

@joshim5 @tomsercu Just to jump in, I have a few quick follow-up questions. Regarding "The unusual tokens are completely unseen in training data": does this apply to the cls/eos tokens as well? I'd be surprised if the CLS token improved downstream performance without the model having seen it during training. Also, is there a need to manually prepend/append the cls/eos tokens? It seems like the Hugging Face version of the tokenizer adds these tokens automatically.

Regarding "to get the best performance": does this also depend on the fact that the CLS token is used for classifiers? For context, for other models like BERT or ViTs I've seen arguments for average pooling of the token embeddings rather than using the CLS token. I'm curious whether there's a recommendation for ESM.


gorj-tessella commented on July 26, 2024

I also have this question. I noticed that in the Hugging Face code, EsmForSequenceClassification uses EsmClassificationHead, which takes only the encoding at token position 0, which should be <cls> (the comment notes "take <s> token (equiv. to [CLS])"). This is obviously different from the "mean_representations" value typically generated by extract.py, which is the average over the sequence tokens, excluding <cls> and <eos>.

Is there some justification for using the <cls> token embedding vs. the mean sequence token embedding?
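
For reference, here is a small sketch of the two pooling strategies being compared, assuming a checkpoint loaded through esm.pretrained (the checkpoint, sequence, and layer choice are illustrative, not a recommendation of one strategy over the other):

```python
# Sketch: <cls> pooling (as in EsmClassificationHead) vs. mean-over-residues pooling
# (as in extract.py's "mean_representations").
import torch
import esm

model, alphabet = esm.pretrained.esm2_t6_8M_UR50D()
model.eval()
batch_converter = alphabet.get_batch_converter()

seq = "MKTVRQERLKSIVRILERSKEPVSGAQLAEELSVSRQVIVQDIAYLRSLGYNIVATPRGYVLAGG"
_, _, tokens = batch_converter([("protein1", seq)])

with torch.no_grad():
    out = model(tokens, repr_layers=[model.num_layers])
reps = out["representations"][model.num_layers]         # shape (1, len(seq) + 2, d)

cls_embedding = reps[0, 0]                               # embedding at the <cls> position
mean_embedding = reps[0, 1 : len(seq) + 1].mean(dim=0)   # average over residues only,
                                                         # excluding <cls> and <eos>
```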

