Comments (5)
The training_utils folder should be a great starting point here. Collect your data (Wikipedia/Reddit/News/etc.), convert it to TFRecords with the appropriate control codes, transfer the data to GCS if using TPUs, and then train. We used Adagrad with a linear warmup and no learning-rate decay. With TPUs, you can spawn a Cloud TPU Pod and the Estimator will take care of all data parallelism.
from ctrl.
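The preprocessing step described above can be sketched in a few lines: prepend a control-code token to each document's token ids and slice the result into fixed-length training windows. This is a minimal illustration, not the repo's actual training_utils code; the function name, the seq_len default, and the padding-free windowing are assumptions, and the final serialization (e.g. via tf.io.TFRecordWriter) is omitted.

```python
# Hypothetical sketch of TFRecord preparation: prepend the control code's
# token id, then cut the stream into fixed-length windows. Serialization
# to actual TFRecord files is left out for brevity.
def prepare_examples(doc_token_ids, control_code_id, seq_len=256):
    """Yield fixed-length token windows; the first window starts with the control code."""
    ids = [control_code_id] + list(doc_token_ids)
    for start in range(0, len(ids), seq_len):
        window = ids[start:start + seq_len]
        if len(window) == seq_len:  # drop the ragged tail instead of padding
            yield window
```

Each yielded window would then be wrapped in a tf.train.Example and written out; on TPUs the resulting files live on GCS so the input pipeline can stream them.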
How are these two files (codes and control_codes.txt) generated?
from ctrl.
codes has nothing to do with the control codes; it is the BPE codes file you get from fastBPE (see https://github.com/glample/fastBPE#learn-codes).

For control_codes.txt, you can first collect your data and decide on the list of control codes you want (this is the first column). Then convert each file to TFRecords with its corresponding control code, and figure out the percentage of data later (if this is relevant to you; for training it isn't).
from ctrl.
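A small sketch of what the answer describes for control_codes.txt: the control code sits in the first column, with a data percentage recorded alongside it. The exact column layout and the parser below are assumptions for illustration; the BPE codes file itself comes from fastBPE's learnbpe command, as in its README.

```python
# Hypothetical parser for a control_codes.txt-style file where the first
# column is the control code and an optional second column is that code's
# share of the training data. The layout is an assumption, not the repo's spec.
def parse_control_codes(lines):
    """Map each control code (first column) to its data share, or None if absent."""
    codes = {}
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        codes[parts[0]] = float(parts[1]) if len(parts) > 1 else None
    return codes

# Example (made-up codes and shares):
# parse_control_codes(["Wikipedia 0.25", "Reddit 0.60", "News"])
```

The percentages here are bookkeeping only; as the answer notes, training itself does not consume them.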
Is it feasible to train a CTRL model from scratch on Colab with a free TPU? If not, how many TPUs would be required, and at what cost?
from ctrl.
On a large amount of data, I don't think that would work. We trained on 256 cores of a Cloud TPU v3 Pod. You should be able to train on slightly smaller slices, with a commensurate increase in training time. Regarding pricing, I think the best resource would be Google's official pricing sheet.
from ctrl.
Related Issues (20)
- Using ctrl for summarization
- TPU configuration - fine tuning
- Is that a way to do "general" generation?
- Source attribution - Cannot replicate results
- why set "seq_length = min(args.generate_num, 256)"
- training curriculum used
- repeats the last word on AWS
- License for pre-trained model
- Sampling method used for translation
- tips and scripts related to data collection
- Are control codes required for finetuning?
- Altering the tone of the output
- Will BERT+transformer-decoder better than tensor2tensor for text-generation?
- control code not recognised
- Issues with pytorch_generation.py when running the Colab exercise
- CTRL model can not work in huggingface transformers
- 12 layer (huggingface gpt-2 equivalent) ctrl model?
- Cuda out of memory issue.
- A transformer decoder-based model or seq2seq model?
- control code