Comments (4)
Maybe this link will help: https://huggingface.co/docs/transformers/main/en/deepspeed?models=pretrained+model#non-trainer-deepspeed-integration
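For reference, a minimal sketch of what that page describes (the config path and model name here are just placeholders, not from this thread): HfDeepSpeedConfig has to be created, and kept alive, before from_pretrained so a ZeRO stage-3 config triggers zero.Init sharding at load time, and the engine is then built with deepspeed.initialize.

import deepspeed
from transformers import AutoModelForCausalLM
from transformers.integrations import HfDeepSpeedConfig

ds_config = "ds_config.json"          # hypothetical path to a ZeRO stage-3 JSON config
dschf = HfDeepSpeedConfig(ds_config)  # must stay alive before/through from_pretrained

# With stage 3 in the config, the weights are sharded while loading
model = AutoModelForCausalLM.from_pretrained("gpt2")

# deepspeed.initialize returns (engine, optimizer, dataloader, lr_scheduler)
engine, *_ = deepspeed.initialize(model=model, config=ds_config)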
@Taiinguyenn139 Thanks for your reply! I have tried it, but it didn't work. Here is my code; maybe it is not correct:
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0,1,2,3,4,5,6,7'

from datasets import load_dataset
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, AutoTokenizer, TrainingArguments
import bitsandbytes as bnb
from peft import LoraConfig
from trl import SFTTrainer
from accelerate import Accelerator
accelerator = Accelerator()

from transformers.integrations import HfDeepSpeedConfig
import deepspeed

ds_config = "ds_config/3.json"
dschf = HfDeepSpeedConfig(ds_config)

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

base_model_name = "/home/yangtong/data/llama2-hf/llama2-13b-chat_hf"
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    quantization_config=bnb_config,
    torch_dtype=torch.bfloat16,
)
base_model.config.use_cache = False
base_model.config.pretraining_tp = 1
# engine = deepspeed.initialize(model=base_model, config_params=ds_config)

dataset = load_dataset("json", data_files="Belle_open_source_0.5M_changed.json", split="train")

result_dir = "tmp"
training_args = TrainingArguments(
    report_to="wandb",
    output_dir=result_dir,
    # per_device_train_batch_size * gradient_accumulation_steps = batch_size
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,
    learning_rate=2e-4,
    logging_steps=10,
    # max_steps=520,
    num_train_epochs=0.037,
    save_steps=500,  # 65
    bf16=True,
    # optim='paged_adamw_32bit',
    gradient_checkpointing=True,
    # group_by_length=True,
    # remove_unused_columns=False,
    # warmup_ratio=0.03,
    # lr_scheduler_type='constant',
    # max_grad_norm=0.3
)

models = ['v_proj', 'gate_proj', 'down_proj', 'k_proj', 'q_proj', 'o_proj', 'up_proj']
peft_config = LoraConfig(
    lora_alpha=16,
    lora_dropout=0.1,
    r=64,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=models,
)

tokenizer = AutoTokenizer.from_pretrained(base_model_name, trust_remote_code=True)
tokenizer.deprecation_warnings["Asking-to-pad-a-fast-tokenizer"] = True
tokenizer.pad_token = tokenizer.eos_token

max_seq_length = 512
trainer = SFTTrainer(
    model=base_model,
    train_dataset=dataset,
    peft_config=peft_config,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    tokenizer=tokenizer,
    args=training_args,
)
trainer.train()

output_dir = os.path.join(result_dir, "final_checkpoint")
trainer.model.save_pretrained(output_dir)
# trainer.save_model(output_dir)  # Stage-3
I have some questions about the approach you suggested:
(1) The situation in that link is the "Non-Trainer DeepSpeed integration". Since I use SFTTrainer in my code, doesn't that count as the Trainer integration?
(2) I'm using accelerate, and in my original code I set TrainingArguments before from_pretrained, following https://huggingface.co/docs/transformers/v4.18.0/en/main_classes/deepspeed#constructing-massive-models:~:text=If%20you%20want%20to%20use%20a,is%20how%20example%20scripts%20are%20written.. Is it necessary to set HfDeepSpeedConfig?
(1) In my experience, you can run ZeRO-3 with either SFTTrainer or Trainer.
(2) I don't use accelerate; I launch with the deepspeed command, like this:
deepspeed train.py
You don't need to set HfDeepSpeedConfig.
(3) To be clearer: ZeRO stage 3 won't shard your parameters because you are using QLoRA, as discussed in this post:
https://www.reddit.com/r/LocalLLaMA/comments/1ai5mv3/thoughts_on_qlora_with_fsdp/
It only offloads your parameters to CPU, so is_zero_init_model always being False may be the expected behavior.
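A minimal sketch of that Trainer-integrated route (paths borrowed from this thread, the rest are placeholders, not a full script): pass the DeepSpeed config path through TrainingArguments and launch the script with the deepspeed launcher; no HfDeepSpeedConfig object is needed because Trainer/SFTTrainer sets it up internally.

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="tmp",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,
    bf16=True,
    deepspeed="ds_config/3.json",  # same ZeRO stage-3 config the launcher uses
)
# then launch with: deepspeed train.py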
@Taiinguyenn139 Thank you for your help! I'm using SFTTrainer, so I think I don't need HfDeepSpeedConfig. And running my original code with the command "accelerate launch --config_file "config/z3_3.yaml" --num_processes 1 ft_acc.py" is entirely equivalent to "deepspeed ft_acc.py" with deepspeed="config_path" added to TrainingArguments.
Based on the link you provided, I tried ZeRO-3 + LoRA instead of ZeRO-3 + QLoRA (I just removed bnb_config = BitsAndBytesConfig(...)). Then it magically worked! The parameters are sharded first and then loaded onto each GPU. It looks like zero3_init doesn't support QLoRA, but apart from that link I couldn't find any other information about this. Maybe I'll open another issue to ask about it. I'd truly appreciate any other information that could help!
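For anyone following along, this is roughly the change described above (a sketch of the edited load call, not the full script): drop the BitsAndBytesConfig so the base model loads in bf16, which lets ZeRO-3 shard it at load time; the LoRA adapters are still attached by SFTTrainer via peft_config as before.

base_model = AutoModelForCausalLM.from_pretrained(
    base_model_name,             # same llama2-13b-chat path as above
    torch_dtype=torch.bfloat16,  # no quantization_config, so zero.Init can shard the weights
)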