Hello, I am trying to adapt the idea of learning on triplets for the classification ta

Help with seminar on conversation about nlp_course HOT 2 OPEN

maiiabocharova commented on July 23, 2024

Help with seminar on conversation

from nlp_course.

Comments (2)

justheuristic commented on July 23, 2024

Hi!

Disclaimer: I'm not very well versed in metric learning tasks. An expert's opinion should take precedence over mine.

(0) Not all BERT-like models are created equal :) There is a particular subtype that is aimed at embedding whole sentences -- no guarantees, but it might be worth trying. Here's a library that has a bunch of them: https://github.com/UKPLab/sentence-transformers
(1) dim=16 seems too small based on my (limited) experience. The actual dimension should depend on the number of classes, but the last time I worked on a similar architecture for retrieval, the optimal dimensions were in the 128-1024 range.
(2) If the dataset is large enough (tens of thousands of examples rather than hundreds), it is usually beneficial to also fine-tune the BERT layers with a small learning rate. In that case, you can worry less about the architecture.
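
For readers following along, here is a minimal sketch of the triplet objective that the question is built around. It is a generic illustration in plain Python, not the code from the question: the function names, the Euclidean distance, and the margin of 1.0 are all assumptions made for the example.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss: the anchor should be closer to the
    positive than to the negative by at least `margin` (assumed value)."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

# Toy 2-d embeddings (hypothetical values, just to exercise the function).
anchor = [0.0, 0.0]
positive = [0.0, 1.0]   # distance 1 from the anchor
negative = [3.0, 4.0]   # distance 5 from the anchor

# Well-separated triplet: the margin is already satisfied, so the loss is zero.
easy = triplet_margin_loss(anchor, positive, negative)
# Swapping positive and negative violates the margin and gives a positive loss.
hard = triplet_margin_loss(anchor, negative, positive)
```

In a real setup the three vectors would come from the sentence encoder (and any projection layer on top of it), and the loss would be averaged over a batch of triplets.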

maiiabocharova commented on July 23, 2024

Thanks a lot for answering! I used sentence-transformers; I also had the idea that it would work better than plain BERT. For starters I used miniLM-v6, as it was quick to train.

I have 140k datapoints in the dataset. There are 6 classes, but 80% of the datapoints belong to a single class, "others", while the most under-represented class contains only 0.5% of the datapoints. The thing is that after training the embeddings I need to somehow use them to classify new sentences, and this is where I am stuck. I thought about using an SVM, but it will not work with such high-dimensional datapoints. I am now thinking about using a fully connected network with a couple of dense layers to classify datapoints from their embeddings. But that would once again leave me with the problem of a very imbalanced dataset.
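
To make the imbalance concrete, here is a small sketch of one common heuristic: weighting each class by the inverse of its frequency, so that a classifier's loss does not get dominated by "others". Only the 140k total, the 6 classes, the 80% share of "others", and the 0.5% share of the rarest class come from the description above. The names and sizes of the four middle classes are made up for illustration, and the weighting formula is an assumption, not something already tried in this thread.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}

# Hypothetical split of 140k datapoints over 6 classes:
# 80% in "others", 0.5% in "rare"; the four middle classes are invented.
class_counts = {
    "others": 112_000,
    "a": 9_100,
    "b": 9_100,
    "c": 6_000,
    "d": 3_100,
    "rare": 700,
}
labels = [cls for cls, n in class_counts.items() for _ in range(n)]

weights = balanced_class_weights(labels)
for cls in class_counts:
    print(f"{cls:>6}: {weights[cls]:.3f}")
```

With these weights a mistake on the rarest class costs far more than a mistake on "others". Whether that is enough for a 0.5% class is an empirical question.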
