Comments (7)
Yeah, I was thinking the same: it should be per_device_train_batch_size: 4 instead of 8, since the assumption is 8 GPUs here.
However, the mistake seems to have propagated into their own replication: https://huggingface.co/alignment-handbook/zephyr-7b-dpo-full
For the officially released model, as you mentioned, the DPO batch parameters seem to contradict their replication based on this repo.
Also, the official model page does not give any details on the SFT part, so I have no idea yet whether the HF team used LoRA or full fine-tuning for the official release.
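For reference, the global batch size arithmetic being discussed works out as follows (a minimal sketch; the helper name and gradient-accumulation default are illustrative, not from the repo's configs):

```python
def global_batch_size(per_device: int, num_gpus: int, grad_accum: int = 1) -> int:
    """Effective (global) batch size seen by the optimizer per update."""
    return per_device * num_gpus * grad_accum

# With the assumed 8 GPUs, per_device_train_batch_size: 8 doubles
# the global batch size relative to a per-device size of 4.
intended = global_batch_size(per_device=4, num_gpus=8)  # 32
actual = global_batch_size(per_device=8, num_gpus=8)    # 64
print(intended, actual)
```

So on a 4-GPU machine, keeping per_device_train_batch_size at 8 recovers the same global batch size of 32 that per-device 4 gives on 8 GPUs.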
from alignment-handbook.
Hi!
I re-ran MT-bench to compare the two public DPO-trained zephyr-7b checkpoints:
The MT-bench score of HuggingFaceH4/zephyr-7b-beta (blue curves above) closely reproduces the number reported in the paper. The number is 7.34 in the paper (Table 1), and the score from my re-run is 7.37.
But the MT-bench score of alignment-handbook/zephyr-7b-dpo-full (yellow curves above) was worse overall. The score is 7.09.
There could be multiple reasons, such as:
- the randomness of the GPT-4 evaluation used in MT-bench (if anyone has the resources to rerun MT-bench multiple times, that would be great)
- the difference in the SFT step (the two models used different SFT checkpoints)
- the difference in the DPO step (e.g., the global batch size difference that I mentioned; I am not sure if this is the only difference).
I am wondering if you have any insights @lewtun. It would be great if we could use the recipe to re-train the stronger HuggingFaceH4/zephyr-7b-beta with an MT-bench score of 7.37. 🙏
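On the GPT-4 randomness point: if someone can afford several reruns, the per-run scores could be aggregated along these lines (a minimal sketch with made-up scores; only the standard-library statistics module is assumed):

```python
import statistics

# Hypothetical MT-bench scores from repeated evaluation runs of one checkpoint.
runs = [7.37, 7.21, 7.30]

mean = statistics.mean(runs)
spread = statistics.stdev(runs)  # sample standard deviation across runs
print(f"{mean:.2f} +/- {spread:.2f}")
```

If the run-to-run spread turned out to be comparable to the 0.28-point gap between the two checkpoints, GPT-4 randomness alone could plausibly explain the difference.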
Anyway, I am running a DPO experiment on 4 GPUs, leaving per_device_train_batch_size at 8. If the loss is the same, then I can confirm that the official release used the ....-sft-full model and that the batch size is correct.
@liutianlin0121 I do not think the issue is with the replication of the model (since we re-train following the provided recipes). It seems that even the officially released Hugging Face model's score has degraded.
> @liutianlin0121 I do not think the issue is with the replication of the model (since we re-train following the provided recipes). It seems that even the officially released Hugging Face model's score has degraded.
Yeah, my objective is to reproduce the original model HuggingFaceH4/zephyr-7b-beta. Using the existing codebase, I suppose I can reproduce the handbook model alignment-handbook/zephyr-7b-dpo-full, but the latter is somehow weaker on MT-bench compared to the former.
> But the MT-bench score of alignment-handbook/zephyr-7b-dpo-full (yellow curves above) was worse overall. The score is 7.09.
It seems that I misunderstood your post.
Just to confirm, you are able to replicate (close enough) the MT-Bench score for the official HF model of 7.37?
@timothylimyl Yes, I was able to reproduce the MT-Bench score for the official model, but I ran the MT-bench evaluation a few weeks ago. To debug, perhaps it would be useful to take a look at the GPT-4-generated judgments at data/mt_bench/model_judgment/gpt-4_single.jsonl. Do they appear reasonable?
In one of my early MT-Bench runs, I used too many concurrent API calls with
python gen_judgment.py --model-list [LIST-OF-MODEL-ID] --parallel A_LARGE_NUMBER_LIKE_8_or_16
This caused some errors in the GPT-4 model judgments at data/mt_bench/model_judgment/gpt-4_single.jsonl. Specifically, some score fields were populated with $error, and these $error entries were silently omitted when computing the mean scores. After that, I only used a single concurrent API call, and the evaluation was not much slower. Not sure if this is the case for your evaluation, but perhaps it would be helpful to manually look at several model judgments.
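A quick way to check for this failure mode is to count unusable scores in the judgment file before trusting the mean. This is a sketch, assuming each JSONL record carries a "score" field as in the MT-bench single-judgment output; the function name and sample records are mine:

```python
import json

def count_bad_scores(lines):
    """Return (total judgments, judgments whose score is not numeric)."""
    total = bad = 0
    for line in lines:
        record = json.loads(line)
        total += 1
        score = record.get("score")
        # A failed GPT-4 call shows up as a non-numeric score
        # (e.g. the "$error" placeholder) or a missing field.
        if not isinstance(score, (int, float)):
            bad += 1
    return total, bad

# Usage with the real file:
#   with open("data/mt_bench/model_judgment/gpt-4_single.jsonl") as f:
#       total, bad = count_bad_scores(f)
sample = ['{"score": 8}', '{"score": "$error"}', '{"score": 7.5}']
print(count_bad_scores(sample))  # -> (3, 1)
```

If a non-trivial fraction of judgments are bad, the reported mean is computed over a smaller, possibly biased subset of questions.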