Comments (17)
Do you mean gradient_accumulation_steps? The code already implements it. You can add the option --gradient_accumulation_steps n for incremental training.
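For readers coming from TensorFlow: gradient accumulation in PyTorch just means delaying optimizer.step() for several micro-batches. A minimal sketch with an illustrative model, data, and step count (not the repo's actual loop):

```python
import torch
from torch import nn

# Illustrative gradient accumulation in PyTorch (model and data are made up).
model = nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()
accumulation_steps = 4
num_updates = 0

batches = [(torch.randn(8, 4), torch.randn(8, 1)) for _ in range(8)]

optimizer.zero_grad()
for step, (x, y) in enumerate(batches):
    # Scale the loss so the accumulated gradients average over micro-batches.
    loss = loss_fn(model(x), y) / accumulation_steps
    loss.backward()  # gradients accumulate in param.grad across calls
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()        # one optimizer update per 4 micro-batches
        optimizer.zero_grad()
        num_updates += 1
```

With 8 micro-batches and accumulation over 4, the optimizer steps twice, each step seeing an effective batch of 32 examples.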
from codexglue.
Alright, thanks much!
I come from a TensorFlow background, so I am unaware of how it's done in PyTorch.
I would be thankful if you could let me know what exactly I need to do.
Say I run training for 2 epochs, save a checkpoint, and want to resume from the saved checkpoint.
Change "pretrained_model=microsoft/codebert-base" to "pretrained_model=saved_checkpoint_path"
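Under the hood, pointing the script at a saved checkpoint boils down to PyTorch's state_dict save/load cycle. A minimal sketch, where a tiny nn.Linear stands in for the actual Seq2Seq model and the file name mirrors the pytorch_model.bin the script writes:

```python
import os
import tempfile
import torch
from torch import nn

# Hypothetical tiny model standing in for the Seq2Seq model built in run.py.
model = nn.Linear(4, 2)

# Save only the model weights (a state_dict), as a pytorch_model.bin file.
path = os.path.join(tempfile.gettempdir(), "pytorch_model.bin")
torch.save(model.state_dict(), path)

# In the next run: rebuild the same architecture, then load the saved weights.
model2 = nn.Linear(4, 2)
model2.load_state_dict(torch.load(path))
```

After load_state_dict, model2 has identical parameters to model, so training continues from the saved weights rather than from scratch.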
Alright, thanks much!
One more question, just to be sure.
My calls should look like the following.
Do I need to use --gradient_accumulation_steps somewhere now, or is just --pretrained_model fine?
Call 1, for the first two epochs:
python run.py --do_train --do_eval --model_type roberta --model_name_or_path "microsoft/codebert-base" --train_filename "../dataset/java/valid.jsonl" --dev_filename "../dataset/java/valid.jsonl" --output_dir "model/java" --max_source_length 256 --max_target_length 128 --beam_size 10 --train_batch_size 8 --eval_batch_size 8 --learning_rate 5e-5 --num_train_epochs 2
Call 2, for training the next two epochs:
python run.py --do_train --do_eval --model_type roberta --model_name_or_path "saved_checkpoint_path" --train_filename "../dataset/java/valid.jsonl" --dev_filename "../dataset/java/valid.jsonl" --output_dir "model/java" --max_source_length 256 --max_target_length 128 --beam_size 10 --train_batch_size 8 --eval_batch_size 8 --learning_rate 5e-5 --num_train_epochs 2
just --pretrained_model is fine
Thanks
Sorry, the option should be --load_model_path.
python run.py --do_train --do_eval --model_type roberta --model_name_or_path microsoft/codebert-base --train_filename "../dataset/java/valid.jsonl" --dev_filename "../dataset/java/valid.jsonl" --output_dir "model/java" --max_source_length 256 --max_target_length 128 --beam_size 10 --train_batch_size 8 --eval_batch_size 8 --learning_rate 5e-5 --num_train_epochs 2 --load_model_path "model/java/checkpoint-best-bleu/pytorch_model.bin"
Alright,
Thanks once again.
Hi,
Just to test the flow, I started training for 1 epoch, and the model was saved.
python run.py --do_train --do_eval --model_type roberta --model_name_or_path "microsoft/codebert-base" --train_filename "../dataset/javascript/valid.jsonl" --dev_filename "../dataset/javascript/valid.jsonl" --output_dir "model/javascript" --max_source_length 256 --max_target_length 128 --beam_size 10 --train_batch_size 16 --eval_batch_size 16 --learning_rate 5e-5 --num_train_epochs 1
Then I started training again from the trained model for the next 2 epochs:
python run.py --do_train --do_eval --model_type roberta --model_name_or_path "microsoft/codebert-base" --train_filename "../dataset/javascript/valid.jsonl" --dev_filename "../dataset/javascript/valid.jsonl" --output_dir "model/javascript" --max_source_length 256 --max_target_length 128 --beam_size 10 --train_batch_size 16 --eval_batch_size 16 --learning_rate 5e-5 --num_train_epochs 2 --load_model_path "/content/code/model/javascript/checkpoint-best-bleu/pytorch_model.bin"
Training started again, but the console says "Epoch 0" again instead of "Epoch 1".
Is it normal for the script to say Epoch 0 again? Is it actually Epoch 1, since I am essentially training incrementally on the last checkpoint model?
Log for first iteration (Epoch 1)
12/03/2020 08:34:08 - INFO - __main__ - Num examples = 3885
12/03/2020 08:34:08 - INFO - __main__ - Batch size = 16
12/03/2020 08:34:08 - INFO - __main__ - Num epoch = 1
epoch 0 loss 6.5622: 100% 243/243 [08:22<00:00, 2.07s/it]
12/03/2020 08:42:34 - INFO - __main__ - ***** Running evaluation *****
12/03/2020 08:42:34 - INFO - __main__ - Num examples = 3885
12/03/2020 08:42:34 - INFO - __main__ - Batch size = 16
12/03/2020 08:45:32 - INFO - __main__ - eval_ppl = 306.69674
12/03/2020 08:45:32 - INFO - __main__ - global_step = 244
12/03/2020 08:45:32 - INFO - __main__ - train_loss = 6.5622
12/03/2020 08:45:32 - INFO - __main__ - ********************
12/03/2020 08:45:34 - INFO - __main__ - Best ppl:306.69674
12/03/2020 08:45:34 - INFO - __main__ - ********************
Total: 1000
12/03/2020 08:53:21 - INFO - __main__ - bleu-4 = 7.58
12/03/2020 08:53:21 - INFO - __main__ - ********************
12/03/2020 08:53:21 - INFO - __main__ - Best bleu:7.58
12/03/2020 08:53:21 - INFO - __main__ - ********************
Log for second iteration (Epoch 2)
12/03/2020 08:58:29 - INFO - __main__ - ***** Running training *****
12/03/2020 08:58:29 - INFO - __main__ - Num examples = 3885
12/03/2020 08:58:29 - INFO - __main__ - Batch size = 16
12/03/2020 08:58:29 - INFO - __main__ - Num epoch = 2
epoch 0 loss 5.4316: 100% 243/243 [08:22<00:00, 2.07s/it]
12/03/2020 09:06:54 - INFO - __main__ - ***** Running evaluation *****
12/03/2020 09:06:54 - INFO - __main__ - Num examples = 3885
12/03/2020 09:06:54 - INFO - __main__ - Batch size = 16
12/03/2020 09:09:50 - INFO - __main__ - eval_ppl = 117.87884
12/03/2020 09:09:50 - INFO - __main__ - global_step = 244
12/03/2020 09:09:50 - INFO - __main__ - train_loss = 5.4316
12/03/2020 09:09:50 - INFO - __main__ - ********************
12/03/2020 09:09:52 - INFO - __main__ - Best ppl:117.87884
12/03/2020 09:09:52 - INFO - __main__ - ********************
from codexglue.
Since the loss decreased in the second run, shall I assume it is actually Epoch 1 and not Epoch 0?
In simple terms, I want to be sure that it is not training from scratch again.
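For what it's worth, the printed epoch index is just a fresh Python loop counter in each invocation, so restarting at 0 does not mean the weights restarted. An illustrative stand-in (not run.py's actual loop):

```python
# Each invocation starts a fresh loop over range(num_train_epochs); only the
# model weights come from the checkpoint, not the epoch counter, so every run
# prints "epoch 0" first even when it resumes from trained weights.
num_train_epochs = 2
printed = []
for epoch in range(num_train_epochs):
    printed.append(f"epoch {epoch}")  # "epoch 0", then "epoch 1", in every run
```

The decreasing loss (6.5622 to 5.4316) is the reliable signal that the second run really continued from the checkpoint.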
Note that I am training on valid.jsonl just to quickly test the flow.
--load_model_path only re-loads the model weights from the checkpoint; the optimizer state and logs will be reset. To properly implement incremental training, we would also need to save the optimizer state and logs.
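A common PyTorch pattern for such a fuller checkpoint, sketched here with made-up file and key names (run.py does not do this out of the box), is to bundle the model weights, optimizer state, and epoch counter into one file:

```python
import os
import tempfile
import torch
from torch import nn

# Illustrative model/optimizer; in run.py these would be the Seq2Seq model
# and its AdamW optimizer.
model = nn.Linear(4, 1)
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

# Save everything needed to truly resume, not just the weights.
path = os.path.join(tempfile.gettempdir(), "checkpoint-last.bin")
torch.save({
    "epoch": 1,                           # last finished epoch
    "model": model.state_dict(),
    "optimizer": optimizer.state_dict(),  # momentum/Adam moments, lr schedule state
}, path)

# On restart: restore weights AND optimizer state, then continue the loop.
state = torch.load(path)
model.load_state_dict(state["model"])
optimizer.load_state_dict(state["optimizer"])
start_epoch = state["epoch"] + 1          # resume at the next epoch
```

Restoring the optimizer matters for Adam-style optimizers, whose per-parameter moment estimates otherwise restart cold and can briefly destabilize training.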
Alright.
Resetting the logger is fine,
but not the optimizer, right?
Replace run.py with run.txt. You just need to re-run the following command and the program will restore the last checkpoint for incremental training.
lang=ruby #programming language
lr=5e-5
batch_size=32
beam_size=10
source_length=256
target_length=128
data_dir=../dataset
output_dir=model/$lang
train_file=$data_dir/$lang/train.jsonl
dev_file=$data_dir/$lang/valid.jsonl
epochs=10
pretrained_model=microsoft/codebert-base #Roberta: roberta-base
python run.py --do_train --do_eval --model_type roberta --model_name_or_path $pretrained_model --train_filename $train_file --dev_filename $dev_file --output_dir $output_dir --max_source_length $source_length --max_target_length $target_length --beam_size $beam_size --train_batch_size $batch_size --eval_batch_size $batch_size --learning_rate $lr --num_train_epochs $epochs
Many thanks for the prompt response!