Comments (18)
The converted PyTorch models can be fine-tuned similarly to other causal LMs in HuggingFace.
See tutorials like http://reyfarhan.com/posts/easy-gpt2-finetuning-huggingface/.
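For illustration, a minimal sketch of such a fine-tune using the HuggingFace Trainer API; the checkpoint name is the real 350M mono model, but the dataset file, sequence length, and hyperparameters are placeholder assumptions:

```python
# Minimal causal-LM fine-tuning sketch (illustrative settings, not official ones).
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("Salesforce/codegen-350M-mono")
tokenizer.pad_token = tokenizer.eos_token  # CodeGen ships without a pad token
model = AutoModelForCausalLM.from_pretrained("Salesforce/codegen-350M-mono")

# "train.txt" is a placeholder: plain text, one code sample per line.
data = load_dataset("text", data_files={"train": "train.txt"})
data = data.map(lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),
                batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="codegen-ft", per_device_train_batch_size=1,
                           gradient_accumulation_steps=8, num_train_epochs=1, fp16=True),
    train_dataset=data["train"],
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```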
@smith-co @thisisanshgupta @tlkh
For torch, I wrote up a minimal example with DeepSpeed, which can train the 16B model on a single ~24 GB GPU. You would need to sanity-test this, optimize the configuration, plug in the data loader, and save the weights to disk:
https://github.com/salesforce/CodeGen/blob/main/jaxformer/hf/train_deepspeed.py
For jax, the training library is undergoing sanity checks on TPU-v3 and should be released soon.
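For context on how 16B fits in ~24 GB of VRAM: the usual trick is ZeRO stage 3 with parameter and optimizer offload to CPU. A rough sketch of such a configuration follows; these values are illustrative assumptions, not the actual settings in train_deepspeed.py:

```python
# Illustrative ZeRO-3 offload config; not the exact settings of train_deepspeed.py.
import deepspeed

ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "gradient_accumulation_steps": 16,
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,                              # partition params, grads, optimizer state
        "offload_param": {"device": "cpu"},      # keep weights in host RAM
        "offload_optimizer": {"device": "cpu"},  # keep Adam state in host RAM
    },
}

# engine, _, _, _ = deepspeed.initialize(model=model, config=ds_config,
#                                        model_parameters=model.parameters())
```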
@thisisanshgupta @Ontopic Yes, I'm working on the release of my training library for TPU-v3/v4 and will keep you posted.
Hello @enijkamp, thank you for your work. Looking forward to some fine-tuning instructions and code.
I have tried fine-tuning the model as if it were GPT-2, but I am running into issues where the model's quality degrades significantly.
Is there any particular way the data has to be structured for fine-tuning? Currently, I am just concatenating the prompts and code as follows:
def xyz():
    """abc"""
    code()

def xyz():
    """abc"""
    code()
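One common convention for causal-LM fine-tuning, offered here as an assumption rather than CodeGen's official recipe, is to join independent samples with the tokenizer's EOS token and pack the result into fixed-length blocks, so the model sees explicit sample boundaries:

```python
# Join samples with EOS and pack into fixed-length blocks
# (a common convention, not a confirmed CodeGen recipe).
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Salesforce/codegen-350M-mono")

samples = ['def xyz():\n    """abc"""\n    code()\n'] * 2  # placeholder files
ids = tokenizer(tokenizer.eos_token.join(samples))["input_ids"]

block = 2048  # training context length
blocks = [ids[i:i + block] for i in range(0, len(ids) - block + 1, block)]
```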
@enijkamp I want to fine-tune the model on my own code data. How should I build the dataset? Are there any requirements for its format? Does the data need to be labeled, and if so, in what format? Could you give some guidance or examples? Thanks!
@enijkamp: I want to fine-tune the mono model. Can you please share the dataset format for Python and detailed steps or a notebook?
+1
Would you be releasing training code for the original models? Would be nice to try some on v3s (if possible).
I think this script might help with fine-tuning:
@TheodoreGalanos Working on a release of the JAX code. I trained the models on TPU-v4 and have to resolve a blocker for v3.
@enijkamp @thisisanshgupta I am checking the link you shared.
Still, I think it would greatly help everyone if fine-tuning steps could be provided in the repo.
I for one would appreciate any code/directions needed to run things on a TPU-v4. Great work all!
@smith-co @thisisanshgupta @tlkh @Ontopic @TheodoreGalanos @shmuelhizmi A first release of the training code for TPU-v3/v4 is here:
https://github.com/salesforce/jaxformer
> @smith-co @thisisanshgupta @tlkh
> For torch, I wrote up a minimal example with DeepSpeed, which can train the 16B model on a single ~24 GB GPU. You would need to sanity-test this, optimize the configuration, plug in the data loader, and save the weights to disk: https://github.com/salesforce/CodeGen/blob/main/jaxformer/hf/train_deepspeed.py
> For jax, the training library is undergoing sanity checks on TPU-v3 and should be released soon.

Besides the VRAM, how much RAM would be required to train the model?
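As a back-of-the-envelope answer (assumed sizes, not measurements): with ZeRO-3 CPU offload, host RAM has to hold the fp16 weights, an fp32 master copy, and the two fp32 Adam moments, i.e. roughly 14 bytes per parameter:

```python
# Back-of-the-envelope host-RAM estimate for ZeRO-3 CPU offload
# (an assumption, not a measurement).
params = 16e9                      # CodeGen-16B

fp16_weights = params * 2          # model weights
fp32_master  = params * 4          # fp32 master weights kept by the optimizer
adam_state   = params * 8          # fp32 momentum + variance
total = fp16_weights + fp32_master + adam_state

print(f"~{total / 2**30:.0f} GiB host RAM")  # ~209 GiB, before activations and buffers
```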
@enijkamp, or anyone who has used jaxformer to fine-tune on TPU-v4: what is the approximate cost?
@glicerico Roughly speaking, cost is a function of the size of the model and data. How much data do you have? Which model do you want to fine-tune?
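As a hedged back-of-the-envelope, the usual rule of thumb puts training compute at roughly 6 × parameters × tokens FLOPs; every constant below is an assumption to verify against current TPU pricing and your actual throughput:

```python
# Rough fine-tuning compute estimate; all constants are assumptions to verify.
params = 16e9            # CodeGen-16B
tokens = 1e6             # a few hundred (sentence, parse) pairs ~ order of 1M tokens
flops  = 6 * params * tokens                 # ~6*N*D rule of thumb

tpu_v4_bf16_peak = 275e12                    # per-chip peak FLOP/s
utilization = 0.3                            # assumed fraction of peak actually achieved
hours = flops / (tpu_v4_bf16_peak * utilization) / 3600
print(f"~{hours:.2f} TPU-v4 chip-hours")     # well under an hour; setup overhead dominates
```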
@enijkamp, I am trying to reproduce the work by Shin and Van Durme, who used a few hundred (sentence, parse) pairs to fine-tune Codex for semantic parsing. I would like to do this with CodeGen. Seeing your results, I would probably want to fine-tune the 16B model.
Is there an easier script template, without DeepSpeed, for fine-tuning CodeGen (350M)?
Plus: is the data format the same as for other pre-trained models like CodeT5 or CodeBERT?
Looking forward to the reply.
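For the 350M model, plain PyTorch without DeepSpeed fits comfortably on a single GPU. A minimal sketch, where the batch handling, learning rate, and inline sample are illustrative assumptions:

```python
# Minimal fine-tuning loop for CodeGen-350M without DeepSpeed (illustrative settings).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained("Salesforce/codegen-350M-mono")
model = AutoModelForCausalLM.from_pretrained("Salesforce/codegen-350M-mono").to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

# Unlabeled plain text; passing labels=input_ids makes the model compute the LM loss.
samples = ['def add(a, b):\n    """Add two numbers."""\n    return a + b\n']

model.train()
for text in samples:
    batch = tokenizer(text, return_tensors="pt", truncation=True, max_length=512).to(device)
    loss = model(**batch, labels=batch["input_ids"]).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

In this setup the dataset is just raw code text, with the labels being the input tokens themselves, which also answers the earlier questions about labeling.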
Related Issues (20)
- What is the hardware requirement for fine-tuning codegen 2B and higher models?
- Out-of-memory error. Hardware requirements
- A question about the detail of data preprocessing
- Limit of code generation
- instruct dataset
- Using LoRA with CodeGen 2B mono
- How to use infill sampling?
- What is the minimum loss for CodeGen 1B while fine-tuning?
- Clarity on training data for each of the codegen versions
- How to use a GPU to accelerate inference?
- How much VRAM do I need if I want to enable GPU acceleration? codegen25-7B-instruct
- Set different temperature
- fine-tuning: data format
- AttributeError: 'CodeGen25Tokenizer' object has no attribute 'encoder'
- What is the context window for Codegen2?
- Defect detection
- Error calling tokenizer.get_vocab() (Codegen2.5)
- AttributeError: 'AlignConfig' object has no attribute 'encoder', 'PoolFormerConfig' object has no attribute 'encoder'.
- Which dataset is used for fine-tuning CodeGen25-7B-multi resulting in CodeGen25-7B-mono?
- AttributeError: 'CodeGenTokenizer' object has no attribute 'encoder'. Did you mean: 'encode'?