Comments (9)

jiaaoc commented on May 18, 2024

We use back-translation to create paraphrases of the unlabeled data and then perform consistency training. You could also use other methods to generate paraphrases.

from mixtext.
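Back-translation paraphrases a sentence by translating it into a pivot language and then back into the original language. Below is a minimal sketch, assuming you supply two batch-translation callables (for example English→German and German→English from a machine-translation model of your choice). The names `back_translate`, `to_pivot`, and `from_pivot` are hypothetical and not part of the MixText code. Toy word-level dictionaries stand in for real models.

```python
from typing import Callable, List

# Hypothetical type: a translator maps a batch of sentences to a batch of sentences.
Translator = Callable[[List[str]], List[str]]


def back_translate(texts: List[str], to_pivot: Translator, from_pivot: Translator) -> List[str]:
    """Paraphrase each text by translating it into a pivot language and back.

    `to_pivot` and `from_pivot` are placeholders for real MT models
    (assumption: any batch translator with this signature works).
    """
    pivot = to_pivot(texts)
    paraphrases = from_pivot(pivot)
    if len(paraphrases) != len(texts):
        raise ValueError("translator changed the number of sentences")
    return paraphrases


# Toy word-level "translators" standing in for real models.
EN_DE = {"the": "das", "movie": "film", "was": "war", "great": "toll"}
DE_EN = {"das": "the", "film": "film", "war": "was", "toll": "superb"}


def to_de(batch: List[str]) -> List[str]:
    return [" ".join(EN_DE.get(w, w) for w in s.split()) for s in batch]


def to_en(batch: List[str]) -> List[str]:
    return [" ".join(DE_EN.get(w, w) for w in s.split()) for s in batch]


print(back_translate(["the movie was great"], to_de, to_en))  # ['the film was superb']
```

With real models, the paraphrase differs from the original in wording but (ideally) not in meaning, which is what consistency training relies on.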

callmeYe commented on May 18, 2024

So I have to create the paraphrases, right? In addition, when I look at the code, I find that only the first 100,000 examples in the dataset have been back-translated. Don't I need to back-translate the whole dataset?

jiaaoc commented on May 18, 2024

It depends on the size of the unlabeled data you are going to use. In this work, we used 100,000 unlabeled examples, so we only back-translated those, not the whole dataset.
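If only a fixed-size slice of the training file serves as the unlabeled pool, only that slice needs paraphrasing. Below is a minimal sketch that reads the first `n` rows of a CSV stream, assuming the file has no header row (check your copy). `first_n_rows` is a hypothetical helper, not a MixText function.

```python
import csv
import io
from itertools import islice
from typing import List, TextIO


def first_n_rows(f: TextIO, n: int) -> List[List[str]]:
    """Return at most the first `n` rows of a headerless CSV stream."""
    return list(islice(csv.reader(f), n))


# In-memory stand-in for a training CSV (label, title, body).
sample = io.StringIO('1,"title a","body a"\n2,"title b","body b"\n3,"title c","body c"\n')
subset = first_n_rows(sample, 2)
print(len(subset), subset[0])  # 2 ['1', 'title a', 'body a']
```

For a real file, open it with `open(path, newline="", encoding="utf-8")` and pass, say, `n=100_000`; `islice` stops reading once `n` rows are consumed, so the rest of the file is never parsed.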

callmeYe commented on May 18, 2024

Sorry, I'm still a little confused.
When I run:
python ./code/train.py --gpu 0,1 --n-labeled 10 \
    --data-path ./data/yahoo_answers_csv/ --batch-size 2 --batch-size-u 4 --epochs 20 --val-iteration 1000 \
    --lambda-u 1 --T 0.5 --alpha 16 --mix-layers-set 7 9 12 \
    --lrmain 0.000005 --lrlast 0.0005
The number of unlabeled examples per class seems to be 5,000. Do they add up to exactly 100,000?
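The total unlabeled pool is the per-class count multiplied by the number of classes; Yahoo! Answers has 10 topic classes, so 5,000 per class gives 50,000 in total. Below is a minimal sketch of a per-class labeled/unlabeled split, assuming examples are grouped by label and shuffled with a fixed seed. `split_per_class` and its parameters are hypothetical names, not the MixText API.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def split_per_class(labels: Sequence[int], n_labeled: int, n_unlabeled: int,
                    seed: int = 0) -> Tuple[List[int], List[int]]:
    """Pick n_labeled labeled and n_unlabeled unlabeled indices from every class."""
    by_class: Dict[int, List[int]] = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    rng = random.Random(seed)
    labeled: List[int] = []
    unlabeled: List[int] = []
    for y in sorted(by_class):
        idxs = by_class[y]
        if len(idxs) < n_labeled + n_unlabeled:
            raise ValueError(f"class {y} has only {len(idxs)} examples")
        rng.shuffle(idxs)
        # Labeled and unlabeled indices are disjoint slices of the shuffled class.
        labeled.extend(idxs[:n_labeled])
        unlabeled.extend(idxs[n_labeled:n_labeled + n_unlabeled])
    return labeled, unlabeled


# Toy data: 10 classes with 20 examples each.
labels = [i % 10 for i in range(200)]
lab, unl = split_per_class(labels, n_labeled=1, n_unlabeled=5)
print(len(lab), len(unl))  # 10 50
```

Scaling the toy numbers up, 10 classes × 5,000 unlabeled examples each yields 50,000 unlabeled examples, and 10 × 10,000 would yield 100,000.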

jiaaoc commented on May 18, 2024

You could use up to 100,000.

jiaaoc commented on May 18, 2024

10,000

jiaaoc commented on May 18, 2024

Either way, the amount of data you need to paraphrase depends only on the amount of unlabeled data you are going to use.
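Since only the examples you actually use as unlabeled data need paraphrases, a cheap sanity check is to confirm that every chosen unlabeled index has at least one. Below is a minimal sketch, assuming paraphrases are kept in a dict keyed by training-set row index; that layout and the helper name `missing_paraphrases` are hypothetical, not necessarily how the repository stores its back-translations.

```python
from typing import Dict, Iterable, List


def missing_paraphrases(unlabeled_idxs: Iterable[int],
                        paraphrases: Dict[int, List[str]]) -> List[int]:
    """Return the unlabeled indices that have no paraphrase (absent or empty list)."""
    return sorted(i for i in set(unlabeled_idxs) if not paraphrases.get(i))


# Suppose only rows 0..99_999 were back-translated.
paraphrases = {i: [f"paraphrase of row {i}"] for i in range(100_000)}
needed = [5, 42, 99_999, 120_000]
print(missing_paraphrases(needed, paraphrases))  # [120000]
```

Any index it reports either needs to be back-translated or dropped from the unlabeled pool.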

callmeYe commented on May 18, 2024

Is there a one-to-one correspondence between them?

jiaaoc commented on May 18, 2024

One unlabeled example can be associated with multiple paraphrases. Please refer to the paper/code for details.
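One way several paraphrases of a single unlabeled example can be used together is MixMatch-style label guessing: average the model's predicted class distributions over the original and its paraphrases, then sharpen the average with a temperature T (the `--T 0.5` flag in the command earlier in this thread appears to play this role). Below is a minimal sketch under that assumption; equal weighting of all views is a simplification, and `guess_label` is a hypothetical name.

```python
from typing import List, Sequence


def guess_label(preds: Sequence[Sequence[float]], T: float = 0.5) -> List[float]:
    """Average class distributions across an example and its paraphrases, then sharpen.

    preds[k] is the predicted distribution for view k (original or a paraphrase).
    Equal weighting of views is an assumption made for this sketch.
    """
    n_views, n_classes = len(preds), len(preds[0])
    avg = [sum(p[c] for p in preds) / n_views for c in range(n_classes)]
    # Sharpening: raise to 1/T and renormalise; T < 1 makes the distribution peakier.
    powered = [a ** (1.0 / T) for a in avg]
    total = sum(powered)
    return [x / total for x in powered]


# One unlabeled example seen three ways: the original plus two back-translations.
views = [[0.6, 0.4], [0.8, 0.2], [0.7, 0.3]]
print([round(x, 3) for x in guess_label(views, T=0.5)])  # [0.845, 0.155]
```

Here the average is [0.7, 0.3]; squaring (1/T = 2) and renormalising gives 0.49/0.58 ≈ 0.845, so the guessed label is more confident than any single averaged prediction.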
