Question about the back translations. (mixtext, 9 comments, closed)

salt-nlp commented on May 18, 2024

Question about the back translations.

from mixtext.

Comments (9)

jiaaoc commented on May 18, 2024

We use back-translation to create paraphrases for unlabeled data and perform consistency training. You could use other ways to generate paraphrases.
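For readers following along, here is a minimal sketch of the round-trip idea behind back-translation. It is not the repository's code: `back_translate`, `toy_en_to_pivot`, and `toy_pivot_to_en` are hypothetical names, and the two toy functions only stand in for real translation models so the example runs on its own.

```python
from typing import Callable, List


def back_translate(
    texts: List[str],
    to_pivot: Callable[[str], str],
    from_pivot: Callable[[str], str],
) -> List[str]:
    """Round-trip each text through a pivot language to obtain a paraphrase."""
    return [from_pivot(to_pivot(text)) for text in texts]


# Toy stand-ins for real translation models (assumptions, illustration only).
def toy_en_to_pivot(text: str) -> str:
    return text.replace("quick", "schnell")


def toy_pivot_to_en(text: str) -> str:
    return text.replace("schnell", "fast")


unlabeled = ["a quick answer"]
paraphrases = back_translate(unlabeled, toy_en_to_pivot, toy_pivot_to_en)
print(paraphrases)
```

Any other paraphrase generator can be dropped in by passing different callables, which matches the remark that back-translation is only one option.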

callmeYe commented on May 18, 2024

So I have to create paraphrases, right? Also, looking at the code, I see that only the first 100,000 examples in the dataset have been back-translated. Don't I need to back-translate the whole dataset?

jiaaoc commented on May 18, 2024

It depends on how much unlabeled data you are going to use. In this work, we used 100,000 unlabeled examples, so we only back-translated those, not the whole dataset.

callmeYe commented on May 18, 2024

Sorry, I'm still a little confused.
When I test with:
python ./code/train.py --gpu 0,1 --n-labeled 10 \
    --data-path ./data/yahoo_answers_csv/ --batch-size 2 --batch-size-u 4 --epochs 20 --val-iteration 1000 \
    --lambda-u 1 --T 0.5 --alpha 16 --mix-layers-set 7 9 12 \
    --lrmain 0.000005 --lrlast 0.0005
The number of unlabeled examples per class seems to be 5,000. Do they add up to exactly 100,000?

jiaaoc commented on May 18, 2024

You could use up to 100,000

jiaaoc commented on May 18, 2024

10,000

jiaaoc commented on May 18, 2024

Anyway, the number of examples you need to paraphrase depends only on the number of unlabeled examples you are going to use.
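To make that concrete, here is a small sketch that picks a fixed number of unlabeled examples per class and returns only those indices for paraphrasing. It is an illustration under assumptions, not the repository's splitting code: `select_unlabeled_indices` is a hypothetical helper, and the toy label list (10 classes, 20 examples each) is made up. With n examples per class over C classes, the number to paraphrase is n × C.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence


def select_unlabeled_indices(
    labels: Sequence[int], n_per_class: int, seed: int = 0
) -> List[int]:
    """Pick up to n_per_class example indices from each class.

    Labels are used only to stratify the split, not for training.
    """
    rng = random.Random(seed)
    by_class: Dict[int, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    selected: List[int] = []
    for label in sorted(by_class):
        pool = by_class[label]
        rng.shuffle(pool)
        selected.extend(pool[:n_per_class])
    return sorted(selected)


# Toy dataset (assumption): 10 classes with 20 examples each.
labels = [c for c in range(10) for _ in range(20)]
to_paraphrase = select_unlabeled_indices(labels, n_per_class=5)
print(len(to_paraphrase))  # only these indices need back-translation
```

Only the returned indices need paraphrases; the rest of the dataset can be left untouched.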

callmeYe commented on May 18, 2024

Is it a one-to-one correspondence?

jiaaoc commented on May 18, 2024

One unlabeled example can be associated with multiple paraphrases. Please refer to the paper/code for details.
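As a rough illustration of a one-to-many mapping, the sketch below stores several paraphrases per unlabeled example index and draws one at a time. The dictionary layout, the sample sentences, and `sample_paraphrase` are assumptions for illustration; the actual storage format in the repository may differ.

```python
import random
from typing import Dict, List

# Hypothetical layout: example index -> all paraphrases generated for it.
paraphrases: Dict[int, List[str]] = {
    0: ["how do I reset my password", "how can I change my password"],
    1: ["what is the capital of France"],
}


def sample_paraphrase(idx: int, rng: random.Random) -> str:
    """Return one of the stored paraphrases for the given example index."""
    return rng.choice(paraphrases[idx])


rng = random.Random(0)
picked = sample_paraphrase(0, rng)
print(picked)
```

Keying by example index keeps each original text aligned with its paraphrases even when the number of paraphrases differs from one example to the next.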
