I am training DPO with lora, the loss has weird behavior: will decrease sharply at the

DPO loss about alignment-handbook HOT 7 OPEN

huggingface commented on May 15, 2024

DPO loss

from alignment-handbook.

Comments (7)

ChenDRAG commented on May 15, 2024

It seems that full fine-tuning has this problem, while LoRA doesn't. Could you share the YAML training configuration? Also, how many GPUs are you using?


JiuhaiChen commented on May 15, 2024

Thanks for your reply. I haven't tried full-model fine-tuning. For LoRA, I only changed: gradient_accumulation_steps: 1, per_device_train_batch_size: 16, per_device_eval_batch_size: 4, save_strategy: "epoch". I am using 8 A6000 GPUs. Also, I am not sure whether you observed the eval loss increasing during training.


ChenDRAG commented on May 15, 2024

Sorry, I did not encounter this problem. Are you using the official binary dataset? What is your base model? I don't think they matter that much, though.


JiuhaiChen commented on May 15, 2024

Yeah, I agree the eval loss does not matter. For LoRA, how many cards are you using?


ChenDRAG commented on May 15, 2024

8 A40 cards. My new experiments also encounter this problem.

Difference between the two configurations:

previous
batch size 4, accumulation 2, cards 8, lr 1e-7

new
batch size 8, accumulation 1, cards 8, lr 1e-4

I think the main change is that I increased the lr a lot. Are you sure you used lr=1e-7 in your experiments?
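
For reference, here is a minimal sketch of the arithmetic behind the two configurations above. It assumes that "batch size" means the per-device batch size and that training is plain data parallel across the 8 cards. The `RunConfig` class is a hypothetical helper for illustration, not part of alignment-handbook.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunConfig:
    """Hypothetical container for the four values quoted in the comment."""

    per_device_batch_size: int  # assumption: "batch size" is per device
    gradient_accumulation_steps: int
    num_gpus: int
    learning_rate: float

    @property
    def effective_batch_size(self) -> int:
        # Number of samples that contribute to one optimizer step.
        return (
            self.per_device_batch_size
            * self.gradient_accumulation_steps
            * self.num_gpus
        )


previous = RunConfig(4, 2, 8, 1e-7)
new = RunConfig(8, 1, 8, 1e-4)

print(previous.effective_batch_size)  # 64
print(new.effective_batch_size)       # 64
print(round(new.learning_rate / previous.learning_rate))  # 1000
```

Under these assumptions the effective batch size is the same in both runs (64), so the 1000x larger learning rate is the only change that affects the optimizer.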


NicolasMejiaPetit commented on May 15, 2024

I’m currently training a LoRA across all Mistral modules with the standard settings, except with no eval and a batch size of 1, on a 3090. My loss is hitting 0.29 and it’s only been training for 180 steps (0.4 epochs).

edit:
At epoch 0.52, 210 steps in, the loss is at 0.18 and rewards/accuracy is 1.0.


fblgit commented on May 15, 2024

Quite weird. I just trained DPO and my loss is normal across epochs, pretty much similar to the results shared on the HF model card.
How about rebasing and trying again? A loss of 0.29 or lower is definitely because the model is somehow seeing the right prediction token.
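
As a rough way to read the loss values quoted in this thread (0.29 and 0.18), the sketch below assumes the standard sigmoid DPO loss for a single preference pair, `-log(sigmoid(margin))`, where `margin` is beta times the difference between the chosen and rejected log-ratios of policy to reference. The function names are hypothetical. The logged loss is averaged over a batch, so inverting it this way gives only an approximate picture.

```python
import math


def dpo_loss(margin: float) -> float:
    """Sigmoid DPO loss for one pair: -log(sigmoid(margin))."""
    # softplus(-margin) equals -log(sigmoid(margin)); branch for stability.
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def margin_for_loss(loss: float) -> float:
    """Invert dpo_loss: the margin at which a single pair has this loss."""
    p = math.exp(-loss)  # p = sigmoid(margin)
    return math.log(p / (1.0 - p))


# While the policy still matches the reference, margin = 0 and loss = ln 2.
print(round(dpo_loss(0.0), 3))  # 0.693

for loss in (0.29, 0.18):
    print(loss, round(margin_for_loss(loss), 2))
```

A loss well below ln 2 (about 0.693) after a few hundred steps therefore means the chosen responses are already being separated from the rejected ones by a clear margin, which is why the participants find such a fast drop suspicious.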
