Comments (7)
It seems that full fine-tuning has this problem, while LoRA doesn't. Could you share the YAML training configuration? Also, how many GPUs are you using?
from alignment-handbook.
Thanks for your reply. I didn't try full-model fine-tuning. For LoRA, I only changed: gradient_accumulation_steps: 1, per_device_train_batch_size: 16, per_device_eval_batch_size: 4, save_strategy: "epoch". I am using 8 A6000 GPUs. Also, I'm not sure if you observed the eval loss increasing during training.
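For reference, the overridden fields above would look like this in a handbook-style YAML recipe (only the values mentioned in the comment; everything else is assumed to stay at the recipe defaults):

```yaml
# Overrides relative to the default LoRA recipe -- only the fields
# listed above; all other values left at their defaults.
gradient_accumulation_steps: 1
per_device_train_batch_size: 16
per_device_eval_batch_size: 4
save_strategy: "epoch"
```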
Sorry, I did not encounter this problem. Are you using the official binarized dataset? What is your base model? Though I don't think they matter that much.
Yeah, I agree the eval loss does not matter. For LoRA, how many cards are you using?
8 A40 cards. My new experiments also encounter this problem.
Difference between the two configurations:
previous: batch size 4, accumulation 2, cards 8, lr 1e-7
new: batch size 8, accumulation 1, cards 8, lr 1e-4
I think the main change is that I increased the lr a lot. Are you sure you used lr=1e-7 in your experiments?
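A quick sanity check (a sketch; the helper name is mine) showing that the effective batch size is identical in both runs, so the learning rate really is the only substantive difference between the two configurations:

```python
def effective_batch(per_device: int, accum: int, gpus: int) -> int:
    """Effective batch size = per-device batch * grad accumulation * num GPUs."""
    return per_device * accum * gpus

previous = effective_batch(4, 2, 8)  # lr 1e-7
new = effective_batch(8, 1, 8)       # lr 1e-4
print(previous, new)  # both 64: identical effective batch, so the lr is the real change
```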
I'm currently training a LoRA across all Mistral modules with the standard settings, except with no eval and a batch size of 1, on a 3090. My loss is hitting 0.29 and it's only been training for 180 steps (0.4 epochs).
edit:
Epoch 0.52, 210 steps in, the loss is at 0.18 and rewards/accuracies is 1.0.
Quite weird. I just trained DPO and my loss is normal across epochs, pretty much similar to the results shared on the HF model card.
How about rebasing and trying again? Definitely, a loss of 0.29 or lower means the model is somehow seeing the right prediction token.
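For context, the standard DPO sigmoid loss starts at log 2 ≈ 0.693 when the policy matches the reference model, so a loss of 0.29 already implies a large margin in favour of the chosen responses. A minimal sketch (`dpo_loss` is my own helper, not part of the handbook):

```python
import math

def dpo_loss(policy_logratio: float, ref_logratio: float, beta: float = 0.1) -> float:
    """DPO loss for one (chosen, rejected) pair:
    -log sigmoid(beta * (policy log-ratio - reference log-ratio)),
    where each log-ratio is log p(chosen) - log p(rejected)."""
    margin = beta * (policy_logratio - ref_logratio)
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# At initialization the policy equals the reference, so the margin is 0
# and the loss starts at log(2) ~= 0.693.
print(dpo_loss(0.0, 0.0))   # ~0.693
# A loss of 0.29 requires a sizeable positive margin, i.e. the policy
# already strongly prefers the chosen response over the rejected one.
print(dpo_loss(10.0, 0.0))  # beta=0.1 -> margin=1.0 -> ~0.313
```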
Related Issues (20)
- Downloading latest CUDA version (11.6 or above) for macOS to use FlashAttention
- Not able to run Zephyr 7B Gemma with 4 80GB A100s
- Early stopping issue when used with ConstantLengthDataset
- Is there a way to freeze some layers of a model?
- Missing config_qlora.yaml
- How to select parts to backpropagate in SFT
- Can anyone share what params should be passed to run_dpo.py
- Efficient dialog data format for KTO training
- Can we please add the option to work with a tokenized dataset, especially for the CPT task
- Constitutional AI models do not achieve MT-Bench scores as reported
- Multi-GPU training with DPO full parameter gets stuck
- Cannot reproduce zephyr-7b-gemma-v0.1
- CPT training is giving pretty unstable results with the learning rate 2e-5
- Method to disable evaluation
- Different dtype while saving optimizer with FSDP
- Dependency updates for QLoRA+FSDP
- Clarification on dataset mixer
- How to work with local data
- FSDP + QDoRA support
- Issue running `run_sft.py` after configuration changes in GMAL folder (ChildFailedError)