
Comments (13)

Eureka6174 commented on May 14, 2024

Sorry for the late reply.

There are output layers in both pre-training and fine-tuning, but they are different. For different fine-tuning tasks, the simplest approach is to use a different output layer per task. For now, we just use linear layers as output layers; other options are still open to try.

For the BPE problem, we only produce one label for the first subword "materi@@". We chose this option simply to follow BERT; we didn't try other options. If you have a better choice, could you also tell us?
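
As an aside, here is a minimal sketch of this kind of setup, assuming a HuggingFace-style XLM-R encoder, a hypothetical 17-tag tagset, and hypothetical variable names (this is not the repo's actual fine-tuning code):

import torch.nn as nn
from transformers import AutoTokenizer, XLMRobertaModel

# Hypothetical sketch: one task-specific linear output layer on top of the
# shared encoder, predicting a tag per word at that word's FIRST subword.
tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")  # fast tokenizer by default
encoder = XLMRobertaModel.from_pretrained("xlm-roberta-base")
num_tags = 17  # placeholder tagset size, e.g. Universal POS tags
tagger_head = nn.Linear(encoder.config.hidden_size, num_tags)

words = ["The", "material", "hardens"]
enc = tokenizer(words, is_split_into_words=True, return_tensors="pt")
hidden = encoder(**enc).last_hidden_state        # (1, seq_len, hidden_size)
logits = tagger_head(hidden)                     # (1, seq_len, num_tags)

# Read off predictions only at the first subword of every word.
first_positions, seen = [], set()
for pos, word_id in enumerate(enc.word_ids()):   # None for special tokens
    if word_id is not None and word_id not in seen:
        seen.add(word_id)
        first_positions.append(pos)
tags = logits[0, first_positions].argmax(-1)     # one predicted tag per word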

Please feel free to contact us if this doesn't fully answer your question.


thomas-happify commented on May 14, 2024

@Eureka6174
Hi there!

So for XLM-R text generation, what exactly is the decoder?
Is it also a simple linear layer, or do you initialize another XLM-R as the decoder?


Eureka6174 commented on May 14, 2024

The decoder consists of Transformer layers. Its masked self-attention layers are initialized from XLM-R, while the attention from the decoder to the encoder (cross-attention) is randomly initialized.
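
As a rough illustration of that initialization scheme, here is a simplified sketch; the layer layout and names are hypothetical and do not match the real XLM-R/fairseq modules:

import torch.nn as nn

class DecoderLayer(nn.Module):
    """Simplified Transformer decoder layer: masked self-attention,
    encoder-decoder cross-attention, and a feed-forward block."""
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads)   # to be copied from XLM-R
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads)  # left randomly initialized
        self.ffn = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model)
        )

def init_decoder_from_xlmr(decoder_layers, xlmr_encoder_layers):
    # Copy each encoder layer's self-attention (and, as an assumption here, its
    # FFN) into the matching decoder layer; cross-attention keeps its random init.
    for dec, enc in zip(decoder_layers, xlmr_encoder_layers):
        dec.self_attn.load_state_dict(enc.self_attn.state_dict())
        dec.ffn.load_state_dict(enc.ffn.state_dict())
        # dec.cross_attn is intentionally untouched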


thomas-happify commented on May 14, 2024

@Eureka6174
Thanks! That makes sense.

BTW, is the pre-training code in this repo as well? I only see generation_from_pretrained_xlmr.py, but I don't think that's the one, right? I'm really interested in how you further pre-trained the model with the decoder.

Thanks!


ever4244 commented on May 14, 2024

Thanks for the answer. I just wonder whether it would be better to choose the last subword instead of the first subword as the position of the POS tag.

Some more questions:

Q1:
I notice that there is a model_type flag provided during testing to indicate which pre-trained model is used.

If I trained my own model using fairseq-train (for example, a classic NMT Transformer model), how do I use it in the POS tagging and NER evaluation tasks? It would have different dimensions and layers compared to the pre-trained XLM and BERT models you provided.

Can I just declare it as an XLM model and run the NER evaluation code (since they share the same encoder architecture)?

Q2: Is there any example or code for pre-training from scratch?
I am currently just trying to use the multilingual translation script in generation/example to pre-train a model, but there are many different pre-training tasks in your paper. I understand that you fine-tune the existing XLM-R models using language modeling. I just wonder whether there is a pre-training example that trains a model from scratch, so that I can change the size and dimensions with more flexibility.

Q3:
I see that in the "generation" folder the Unicoder X_dae model is fairseq-based.
In the "understanding" folder, the pre-trained model is HuggingFace Transformers-based.
So can I use them interchangeably? For example, if I train/fine-tune a model in the generation folder with fairseq, can I move it to the understanding folder and test the model there?
It seems to me that the fairseq-trained model is saved as xxx.pt, while the HuggingFace Transformers model is saved as pytorch_model.bin and config.json. So I am puzzled about how to use one encoder for both generation and understanding tasks.
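
For context, a hedged sketch of the difference between the two formats; the paths are placeholders and the checkpoint keys shown are only the typical fairseq layout:

import torch
from transformers import AutoModel

# A fairseq checkpoint is one torch-serialized dict; the weights typically sit
# under the "model" key, with fairseq-specific parameter names.
ckpt = torch.load("checkpoints/checkpoint_best.pt", map_location="cpu")  # placeholder path
print(sorted(ckpt.keys()))        # usually includes "model" (plus "args"/"cfg", etc.)
print(list(ckpt["model"])[:5])    # a few fairseq parameter names

# A HuggingFace model is a directory (config.json + pytorch_model.bin) loaded via
# from_pretrained(); the formats are not interchangeable without a conversion step
# that maps the parameter names onto the HuggingFace architecture.
hf_model = AutoModel.from_pretrained("path/to/converted_hf_model")  # placeholder path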

Thank you very much!


Eureka6174 commented on May 14, 2024

We tried the last token, but it got similar results to the first token.
Q1: You could use the code from HuggingFace to convert it to HuggingFace format and then run POS tagging and NER.
Q2: Our pre-training scripts are not ready for release for now.
Q3: You need to convert the model with the HuggingFace code.

Thanks!


ever4244 commented on May 14, 2024

We tried the last token, but it got similar results to the first token.
Q1: You could use the code from HuggingFace to convert it to HuggingFace format and then run POS tagging and NER.
Q2: Our pre-training scripts are not ready for release for now.
Q3: You need to convert the model with the HuggingFace code.

Thanks!

Thank you for the timely response!
Can you elaborate on Q1, or give me a link for "use the code from HuggingFace to convert it to HuggingFace format"?
Are you referring to this one?
https://github.com/stas00/porting/tree/master/transformers/fairseq-wmt19
I am not sure whether this works only for a standard model structure, or whether it can also convert a model with a different structure. My model may have a different size, different dimensions, or occasionally even different attention connections; can I use it to convert from xxx.pt to pytorch_model.bin?

BTW:
I trained a model using fairseq-train, but I found that in the generation folder your pre-trained model comes with a "sentencepiece.bpe.model". I don't get this file when compiling the data into BPE; I just get "check.pt" and "dict.txt". In which step do you obtain sentencepiece.bpe.model?

Thank you very much!


Eureka6174 commented on May 14, 2024

Here is the link: https://github.com/huggingface/transformers/blob/master/src/transformers/models/roberta/convert_roberta_original_pytorch_checkpoint_to_pytorch.py

I think you could read the HuggingFace Transformers documentation first.

Thanks,
Yaobo
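
As a rough illustration of how such a conversion is usually driven and then consumed (the flag names come from the linked script, so double-check them against your installed transformers version; the paths and label count are placeholders):

# The conversion is a one-off command along the lines of:
#
#   python convert_roberta_original_pytorch_checkpoint_to_pytorch.py \
#       --roberta_checkpoint_path /path/to/fairseq_checkpoint_dir \
#       --pytorch_dump_folder_path /path/to/hf_model_dir
#
# Afterwards, the dumped directory loads like any other HuggingFace model:
from transformers import AutoModelForTokenClassification

model = AutoModelForTokenClassification.from_pretrained(
    "/path/to/hf_model_dir",   # placeholder: wherever the converter wrote its output
    num_labels=17,             # placeholder tagset size for POS/NER fine-tuning
)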


ever4244 commented on May 14, 2024

Thanks for the link, but I have modified my question, so let me restate it a bit.

Q1:
I also found a link:
https://github.com/stas00/porting/tree/master/transformers/fairseq-wmt19
But both your link and mine seem to cover conversion of a standard model structure. What if I have a different model structure?
My model may have a different size, different dimensions, or occasionally even different attention connections; can I use it to convert from xxx.pt to pytorch_model.bin?
My current pre-trained model is a Transformer NMT model with 6 encoder layers and 1 decoder layer. Can I use the RoBERTa converter, given that BERT and NMT share a similar encoder but a different decoder? I suppose there is no universal converter between fairseq and HuggingFace for arbitrary model structures?

I am sorry, I have been using fairseq a lot and am new to HuggingFace. I will read more of its documentation.


ever4244 commented on May 14, 2024

I trained a model using fairseq-train without SPM, but I found that I need a "sentencepiece.bpe.model" for later tasks.

I used a script similar to the fairseq translation example, prepare-wmt14en2de.sh, which does not generate a SentencePiece model, and I prepared the data and trained the old model with it.

https://github.com/pytorch/fairseq/tree/master/examples/translation

The one that does produce a SentencePiece model is prepare-iwslt17-multilingual.sh:

python "$SPM_TRAIN" \
    --input=$TRAIN_FILES \
    --model_prefix=$DATA/sentencepiece.bpe \
    --vocab_size=$BPESIZE \
    --character_coverage=1.0 \
    --model_type=bpe

I currently want to re-learn the sentencepiece.bpe.model on the training data from prepare-wmt14en2de.sh.

Since I already trained the model without the sentencepiece.bpe.model, I just want to make sure I can get exactly the same training data when I reapply the SPM learning script to the old data, so that my previously trained model from prepare-wmt14en2de.sh can be coupled with the newly learnt sentencepiece.bpe.model.

However,
prepare-wmt14en2de.sh uses fastBPE's learn_bpe.py:
https://github.com/glample/fastBPE
prepare-iwslt17-multilingual.sh uses SentencePiece's spm_train.py:
https://github.com/google/sentencepiece

They use different code for BPE learning and encoding, and they even use different BPE markers (@@ as a continuation marker in fastBPE vs. ▁ as a word-boundary marker in SentencePiece). So how can I create a sentencepiece.bpe.model that can be used together with my old fastBPE-based model? (That is, a sentencepiece.bpe.model that will produce exactly the same training data as fastBPE.)
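
To make the mismatch concrete, a small illustrative sketch follows; the example splits are made up, and real segmentations depend on the learned merge tables:

# fastBPE marks NON-FINAL subwords with a trailing "@@":
#     "material" -> "materi@@ al"
# SentencePiece (BPE mode) marks WORD-INITIAL pieces with a leading "▁":
#     "material" -> "▁materi al"    (the actual split depends on the vocab)
fastbpe_tokens = ["materi@@", "al"]
spm_tokens = ["\u2581materi", "al"]   # i.e. "▁materi", "al"

def detok_fastbpe(tokens):
    # Undo fastBPE: join on spaces, then drop the "@@ " continuation marker.
    return " ".join(tokens).replace("@@ ", "")

def detok_spm(tokens):
    # Undo SentencePiece: concatenate, then turn "▁" back into spaces.
    return "".join(tokens).replace("\u2581", " ").strip()

# Both round-trip to the same surface text, but the token sequences (and hence
# any model trained on them) are not interchangeable.
assert detok_fastbpe(fastbpe_tokens) == detok_spm(spm_tokens) == "material"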

Thank you very much!


Eureka6174 commented on May 14, 2024

I think your questions are more about fairseq and HuggingFace, which are out of my knowledge. My model doesn't have a different structure or a different SentencePiece model. Maybe you should raise an issue in their GitHub repos.


ever4244 commented on May 14, 2024

I think your questions are more about fairseq and HuggingFace, which are out of my knowledge. My model doesn't have a different structure or a different SentencePiece model. Maybe you should raise an issue in their GitHub repos.

Thanks.
I have some questions on preprocessing as well.

I want my pre-training replication to be as close as possible to your model, so that there won't be a performance loss due to differences in text pre-processing between pre-training and testing. So I want to make sure that I follow your pre-processing procedure for the training data.

In prepare-wmt14en2de.sh, several Moses scripts are used to tokenize and clean the corpus, for example:
TOKENIZER=$SCRIPTS/tokenizer/tokenizer.perl
CLEAN=$SCRIPTS/training/clean-corpus-n.perl
NORM_PUNC=$SCRIPTS/tokenizer/normalize-punctuation.perl
REM_NON_PRINT_CHAR=$SCRIPTS/tokenizer/remove-non-printing-char.perl

https://github.com/pytorch/fairseq/tree/master/examples/translation

In prepare-iwslt17-multilingual.sh, this preprocessing is not used, since SentencePiece can be applied to raw text.

So what is your pre-processing procedure for pre-training? I want to use the same tokenization and normalization as your model, in both pre-training and fine-tuning.
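
For reference, the Moses perl tools listed above have a Python counterpart in the sacremoses package; the sketch below only mirrors what those scripts do and is not Unicoder's pre-processing:

# Illustrative only: a Python counterpart of the Moses perl tools above, using
# the sacremoses package (pip install sacremoses). This mirrors the
# prepare-wmt14en2de.sh style of pre-processing.
from sacremoses import MosesPunctNormalizer, MosesTokenizer

normalizer = MosesPunctNormalizer(lang="en")   # ~ normalize-punctuation.perl
tokenizer = MosesTokenizer(lang="en")          # ~ tokenizer.perl

line = "He said: \u00abhello   world\u00bb ..."
line = normalizer.normalize(line)                   # normalize punctuation
tokens = tokenizer.tokenize(line, return_str=True)  # Moses-style tokenization
print(tokens)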


Eureka6174 commented on May 14, 2024

I'm using just raw text, with no extra pre-processing. I didn't try the tokenizers you mentioned because they are different for different languages. If you would like to give them a try, I would appreciate it if you could share your results with us, whether they work or not.
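
In that raw-text setup, the only segmentation applied is the SentencePiece model itself; a quick sketch, using the public xlm-roberta-base tokenizer purely for illustration:

# Raw text goes straight into SentencePiece: no Moses tokenization or punctuation
# normalization beforehand. The repo's own sentencepiece.bpe.model would be loaded
# the same way through its tokenizer class.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("xlm-roberta-base")
pieces = tok.tokenize("Unicoder can read raw, untokenized text.")
print(pieces)  # '▁'-prefixed subword pieces; the exact split depends on the vocab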

