
After pre-training finishes, merging the LoRA throws an error; only the LoRA parameters were trained, but merging with the alpaca-lora you provide works fine, about chinese-llama-alpaca-2 HOT 11 CLOSED

ymcui commented on May 19, 2024
After pre-training finishes, merging the LoRA throws an error; only the LoRA parameters were trained, but merging with the alpaca-lora you provide works fine.


Comments (11)

musellama commented on May 19, 2024

Please help me.


airaria commented on May 19, 2024

Please provide the complete training and merging shell scripts.


musellama commented on May 19, 2024

Please provide the complete training and merging shell scripts.

#--modules_to_save ${modules_to_save}
#--gradient_checkpointing
#lora_trainable="q_proj,v_proj,k_proj,o_proj,gate_proj,down_proj,up_proj"
lr=2e-4
lora_rank=64
lora_alpha=128
lora_trainable="q_proj,v_proj"
modules_to_save="embed_tokens,lm_head"
lora_dropout=0.05

pretrained_model=path/model/chinese-alpaca-2-7b-hf
chinese_tokenizer_path=path/tokenizer
dataset_dir=path/yuliao
data_cache=temp_data_cache_dir
per_device_train_batch_size=1
per_device_eval_batch_size=1
gradient_accumulation_steps=3
output_dir=output_dir
block_size=512

deepspeed_config_file=ds_zero2_no_offload.json

torchrun --nnodes 1 --nproc_per_node 1 run_clm_pt_with_peft.py \
    --deepspeed ${deepspeed_config_file} \
    --model_name_or_path ${pretrained_model} \
    --tokenizer_name_or_path ${chinese_tokenizer_path} \
    --dataset_dir ${dataset_dir} \
    --data_cache_dir ${data_cache} \
    --validation_split_percentage 0.001 \
    --per_device_train_batch_size ${per_device_train_batch_size} \
    --per_device_eval_batch_size ${per_device_eval_batch_size} \
    --do_train \
    --seed $RANDOM \
    --fp16 \
    --num_train_epochs 1 \
    --lr_scheduler_type cosine \
    --learning_rate ${lr} \
    --warmup_ratio 0.05 \
    --weight_decay 0.01 \
    --logging_strategy steps \
    --logging_steps 10 \
    --save_strategy steps \
    --save_total_limit 3 \
    --save_steps 200 \
    --gradient_accumulation_steps ${gradient_accumulation_steps} \
    --preprocessing_num_workers 8 \
    --block_size ${block_size} \
    --output_dir ${output_dir} \
    --overwrite_output_dir \
    --ddp_timeout 30000 \
    --logging_first_step True \
    --lora_rank ${lora_rank} \
    --lora_alpha ${lora_alpha} \
    --trainable ${lora_trainable} \
    --lora_dropout ${lora_dropout} \
    --torch_dtype float16 \
    --ddp_find_unused_parameters False

Merge script
python scripts/merge_llama2_with_chinese_lora_low_mem.py \
    --base_model path_to_original_llama2_hf_dir \
    --lora_model pt_lora_model \
    --output_type huggingface \
    --output_dir path_to_output_dir

The merge script itself is unchanged:
"""
Usage:
python merge_llama2_with_chinese_lora_low_mem.py \
    --base_model path/to/llama2-hf-model \
    --lora_model path/to/chinese-llama2-or-alpaca2-lora \
    --output_type [huggingface|pth] \
    --output_dir path/to/output-dir
"""
import argparse
import json
import os
import gc
import torch
import peft
from transformers import LlamaTokenizer
from transformers.modeling_utils import dtype_byte_size
from huggingface_hub import snapshot_download
import re

parser = argparse.ArgumentParser(description='Script to merge Llama-2-hf with Chinese LLaMA-2 or Alpaca-2 LoRA weights')
parser.add_argument('--base_model', default=None, required=True,
                    type=str, help="Base model path (basically Llama-2-hf)")
parser.add_argument('--lora_model', default=None, required=True,
                    type=str, help="LoRA model path (Chinese-LLaMA-2-LoRA, Chinese-Alpaca-2-LoRA)")
parser.add_argument('--output_type', default='huggingface', choices=['huggingface', 'pth'],
                    type=str, help="Output model type can be 'huggingface' (default) or 'pth' format")
parser.add_argument('--output_dir', default='./merged_model',
                    type=str, help="Output path for the merged model")
parser.add_argument('--verbose', default=False, action='store_true',
                    help="Show detailed debugging messages")

emb_to_model_size = {
    4096 : '7B',
    5120 : '13B',
    8192 : '70B',
}
num_shards_of_models = {'7B': 1, '13B': 2, '70B': 8}
params_of_models = {
    '7B':
        {
            "dim": 4096,
            "multiple_of": 256,
            "n_heads": 32,
            "n_layers": 32,
            "norm_eps": 1e-05,
            "vocab_size": -1,
        },
    '13B':
        {
            "dim": 5120,
            "multiple_of": 256,
            "n_heads": 40,
            "n_layers": 40,
            "norm_eps": 1e-05,
            "vocab_size": -1,
        },
    '70B':
        {
            "dim": 8192,
            "multiple_of": 4096,
            "ffn_dim_multiplier": 1.3,
            "n_heads": 64,
            "n_kv_heads": 8,
            "n_layers": 80,
            "norm_eps": 1e-05,
            "vocab_size": -1,
        },
}

def transpose(weight, fan_in_fan_out):
    return weight.T if fan_in_fan_out else weight


# Borrowed and modified from https://github.com/tloen/alpaca-lora
def translate_state_dict_key(k):
    # Maps Hugging Face parameter names to the original Meta/pth checkpoint names.
    k = k.replace("base_model.model.", "")
    if k == "model.embed_tokens.weight":
        return "tok_embeddings.weight"
    elif k == "model.norm.weight":
        return "norm.weight"
    elif k == "lm_head.weight":
        return "output.weight"
    elif k.startswith("model.layers."):
        layer = k.split(".")[2]
        if k.endswith(".self_attn.q_proj.weight"):
            return f"layers.{layer}.attention.wq.weight"
        elif k.endswith(".self_attn.k_proj.weight"):
            return f"layers.{layer}.attention.wk.weight"
        elif k.endswith(".self_attn.v_proj.weight"):
            return f"layers.{layer}.attention.wv.weight"
        elif k.endswith(".self_attn.o_proj.weight"):
            return f"layers.{layer}.attention.wo.weight"
        elif k.endswith(".mlp.gate_proj.weight"):
            return f"layers.{layer}.feed_forward.w1.weight"
        elif k.endswith(".mlp.down_proj.weight"):
            return f"layers.{layer}.feed_forward.w2.weight"
        elif k.endswith(".mlp.up_proj.weight"):
            return f"layers.{layer}.feed_forward.w3.weight"
        elif k.endswith(".input_layernorm.weight"):
            return f"layers.{layer}.attention_norm.weight"
        elif k.endswith(".post_attention_layernorm.weight"):
            return f"layers.{layer}.ffn_norm.weight"
        elif k.endswith("rotary_emb.inv_freq") or "lora" in k:
            return None
        else:
            print(layer, k)
            raise NotImplementedError
    else:
        print(k)
        raise NotImplementedError


# Reverses the rotary-embedding permutation applied to wq/wk during HF conversion.
def unpermute(w):
    return (
        w.view(n_heads, 2, dim // n_heads // 2, dim).transpose(1, 2).reshape(dim, dim)
    )

def save_shards(model_sd, num_shards: int, prefix="", verbose=False):
    """
    Convert and save the HF format weights to PTH format weights
    """
    with torch.no_grad():
        if num_shards == 1:
            new_state_dict = {}
            for k, v in model_sd.items():
                new_k = translate_state_dict_key(k)
                if new_k is not None:
                    if "wq" in new_k or "wk" in new_k:
                        new_state_dict[new_k] = unpermute(v)
                    else:
                        new_state_dict[new_k] = v

            os.makedirs(output_dir, exist_ok=True)
            print(f"Saving shard 1 of {num_shards} into {output_dir}/{prefix}consolidated.00.pth")
            torch.save(new_state_dict, output_dir + f"/{prefix}consolidated.00.pth")
        else:
            new_state_dicts = [dict() for _ in range(num_shards)]
            for k in list(model_sd.keys()):
                v = model_sd[k]
                new_k = translate_state_dict_key(k)
                if new_k is not None:
                    if new_k=='tok_embeddings.weight':
                        assert v.size(1)%num_shards==0
                        splits = v.split(v.size(1)//num_shards,dim=1)
                    elif new_k=='output.weight':
                        if v.size(0)%num_shards==0:
                            splits = v.split(v.size(0)//num_shards,dim=0)
                        else:
                            size_list = [v.size(0)//num_shards] * num_shards
                            size_list[-1] += v.size(0)%num_shards
                            splits = v.split(size_list, dim=0)  # 13B: size_list == [24976,24977]
                    elif new_k=='norm.weight':
                        splits = [v] * num_shards
                    elif 'ffn_norm.weight' in new_k:
                        splits = [v] * num_shards
                    elif 'attention_norm.weight' in new_k:
                        splits = [v] * num_shards
                    elif 'w1.weight' in new_k:
                        splits = v.split(v.size(0)//num_shards,dim=0)
                    elif 'w2.weight' in new_k:
                        splits = v.split(v.size(1)//num_shards,dim=1)
                    elif 'w3.weight' in new_k:
                        splits = v.split(v.size(0)//num_shards,dim=0)
                    elif 'wo.weight' in new_k:
                        splits = v.split(v.size(1)//num_shards,dim=1)
                    elif 'wv.weight' in new_k:
                        splits = v.split(v.size(0)//num_shards,dim=0)
                    elif "wq.weight" in new_k or "wk.weight" in new_k:
                        v = unpermute(v)
                        splits = v.split(v.size(0)//num_shards,dim=0)
                    else:
                        print(f"Unexpected key {new_k}")
                        raise ValueError
                    if verbose:
                        print(f"Processing {new_k}")
                    for sd,split in zip(new_state_dicts,splits):
                        sd[new_k] = split.clone()
                        del split
                    del splits
                del model_sd[k],v
                gc.collect()    # Effectively enforce garbage collection

            os.makedirs(output_dir, exist_ok=True)
            for i,new_state_dict in enumerate(new_state_dicts):
                print(f"Saving shard {i+1} of {num_shards} into {output_dir}/{prefix}consolidated.0{i}.pth")
                torch.save(new_state_dict, output_dir + f"/{prefix}consolidated.0{i}.pth")

def merge_shards(output_dir, num_shards: int):
    ckpt_filenames = sorted([f for f in os.listdir(output_dir) if re.match('L(\d+)-consolidated.(\d+).pth',f)])

    for i in range(num_shards):
        shards_filenames = sorted([f for f in ckpt_filenames if re.match(f'L(\d+)-consolidated.0{i}.pth',f)])
        print(f"Loading {shards_filenames} ...")
        shards_dicts = [torch.load(os.path.join(output_dir,fn)) for fn in shards_filenames]
        shards_merged = {}
        for d in shards_dicts:
            shards_merged |= d

        print(f"Saving the merged shard to " + os.path.join(output_dir, f"consolidated.0{i}.pth"))
        torch.save(shards_merged, os.path.join(output_dir, f"consolidated.0{i}.pth"))

        print("Cleaning up...")
        del shards_merged
        for d in shards_dicts:
            del d
        del shards_dicts
        gc.collect()    # Effectively enforce garbage collection
        for fn in shards_filenames:
            os.remove(os.path.join(output_dir,fn))

if __name__ == '__main__':
    args = parser.parse_args()
    base_model_path = args.base_model
    lora_model_path = args.lora_model
    output_dir = args.output_dir
    output_type = args.output_type
    os.makedirs(output_dir, exist_ok=True)

    print("="*80)
    print(f"Base model: {base_model_path}")
    print(f"LoRA model: {lora_model_path}")

    tokenizers_and_loras = []
    print(f"Loading {lora_model_path}")
    if not os.path.exists(lora_model_path):
        print("Cannot find lora model on the disk. Downloading lora model from hub...")
        lora_model_path = snapshot_download(repo_id=lora_model_path)
    tokenizer = LlamaTokenizer.from_pretrained(lora_model_path, legacy=True)
    lora_config = peft.LoraConfig.from_pretrained(lora_model_path)
    lora_state_dict = torch.load(os.path.join(lora_model_path,'adapter_model.bin'),map_location='cpu')
    if 'base_model.model.model.embed_tokens.weight' in lora_state_dict:
        lora_vocab_size = lora_state_dict['base_model.model.model.embed_tokens.weight'].shape[0]
        assert lora_vocab_size==len(tokenizer), \
        (f"The vocab size of the tokenizer {len(tokenizer)} does not match the vocab size of the LoRA weight {lora_vocab_size}!\n")
    tokenizers_and_loras.append(
        {
            "tokenizer"  :tokenizer,
            "state_dict" :lora_state_dict,
            "config": lora_config,
            "scaling": lora_config.lora_alpha / lora_config.r,
            "fan_in_fan_out" : lora_config.fan_in_fan_out,
        })

    if not os.path.exists(base_model_path):
        print("Cannot find base model on the disk. Downloading base model from hub...")
        base_model_path = snapshot_download(repo_id=base_model_path)
    ckpt_filenames = sorted([f for f in os.listdir(base_model_path) if re.match('pytorch_model-(\d+)-of-(\d+).bin',f)])

    embedding_size = None
    model_size = None
    total_size = 0
    for index, filename in enumerate(ckpt_filenames):
        print(f"Loading ckpt {filename}")
        state_dict = torch.load(os.path.join(base_model_path,filename), map_location='cpu')
        if index == 0:
            embedding_size = state_dict['model.embed_tokens.weight'].shape[1]
            model_size = emb_to_model_size[embedding_size]
            if output_type=='pth':
                params = params_of_models[model_size]
                num_shards = num_shards_of_models[model_size]
                n_layers = params["n_layers"]
                n_heads = params["n_heads"]
                dim = params["dim"]
                dims_per_head = dim // n_heads
                base = 10000.0
                inv_freq = 1.0 / (base ** (torch.arange(0, dims_per_head, 2).float() / dims_per_head))
        print("Merging...")
        for k in state_dict:
            for tl_idx, t_and_l in enumerate(tokenizers_and_loras):
                saved_key = 'base_model.model.'+k
                lora_key_A = saved_key.replace('.weight','.lora_A.weight')
                if saved_key in t_and_l['state_dict']:
                    if args.verbose:
                        print(f"copying {saved_key} from {tl_idx}-th LoRA weight to {k}")
                    state_dict[k] = t_and_l['state_dict'][saved_key].half().clone() # do we need half()?
                if lora_key_A in t_and_l['state_dict']:
                    lora_key_B = lora_key_A.replace('lora_A.weight','lora_B.weight')
                    if args.verbose:
                        print(f"merging {lora_key_A} and lora_B.weight from {tl_idx}-th LoRA weight to {k}")
                    state_dict[k] += (
                        transpose(
                            t_and_l['state_dict'][lora_key_B].float()
                          @ t_and_l['state_dict'][lora_key_A].float(), t_and_l['fan_in_fan_out']) * t_and_l['scaling']
                    )
            weight_size = state_dict[k].numel() * dtype_byte_size(state_dict[k].dtype)
            total_size += weight_size

        if output_type=='huggingface':
            print(f"Saving ckpt {filename} to {output_dir} in HF format...")
            torch.save(state_dict,os.path.join(output_dir, filename))
        elif output_type=='pth':
            print(f"Converting to pth format...")
            save_shards(model_sd=state_dict, num_shards=num_shards,prefix=f"L{index+1}-", verbose=args.verbose)
        del state_dict
        gc.collect()    # Effectively enforce garbage collection

    print(f"Saving tokenizer")
    tokenizers_and_loras[-1]['tokenizer'].save_pretrained(output_dir)
    if output_type == 'pth':
        with open(output_dir + "/params.json", "w") as f:
            print(f"Saving params.json into {output_dir}/params.json")
            json.dump(params, f)
        merge_shards(output_dir, num_shards=num_shards)

    if output_type=='huggingface':
        configs = ('config.json', 'generation_config.json', 'pytorch_model.bin.index.json')
        for config in configs:
            if os.path.exists(os.path.join(base_model_path, config)):
                print(f"Saving {config}")
                with open(os.path.join(base_model_path, config),'r') as f:
                    obj = json.load(f)
                if config=='config.json':
                    obj['vocab_size'] = len(tokenizers_and_loras[-1]['tokenizer'])
                if config=='pytorch_model.bin.index.json':
                    obj['metadata']['total_size'] = total_size
                with open(os.path.join(output_dir, config), 'w') as f:
                    json.dump(obj, f, indent=2)
    print("Done.")
    print(f"Check output dir: {output_dir}")


iMountTai commented on May 19, 2024

The error message above is from text-generation-webui, not from the LoRA merge itself. Do you mean the merge succeeded but something went wrong at inference time?


musellama commented on May 19, 2024

Yes, it is not a problem with the LoRA merge; the problem shows up at inference time. The model merged from alpaca-lora runs fine, but the one merged after my pre-training (PT) does not.


airaria commented on May 19, 2024

Merge script
python scripts/merge_llama2_with_chinese_lora_low_mem.py \
    --base_model path_to_original_llama2_hf_dir \
    --lora_model pt_lora_model \
    --output_type huggingface \
    --output_dir path_to_output_dir

Is this merging against the original Llama-2? You trained the LoRA on chinese-alpaca-2, so --base_model should point to chinese-alpaca-2.
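In other words, keeping the rest of the posted command unchanged, only --base_model changes. A minimal sketch of the corrected call, assuming the chinese-alpaca-2-7b-hf directory used as pretrained_model in the training script above (all paths are placeholders):

python scripts/merge_llama2_with_chinese_lora_low_mem.py \
    --base_model path/model/chinese-alpaca-2-7b-hf \
    --lora_model pt_lora_model \
    --output_type huggingface \
    --output_dir path_to_output_dir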


musellama commented on May 19, 2024

Yes, the merge was done against the original Llama-2, and the LoRA was trained on chinese-alpaca-2. So when merging, --base_model should point to chinese-alpaca-2, right?


airaria commented on May 19, 2024

Yes, the merge was done against the original Llama-2, and the LoRA was trained on chinese-alpaca-2. So when merging, --base_model should point to chinese-alpaca-2, right?

Yes.


musellama commented on May 19, 2024

Yes, the merge was done against the original Llama-2, and the LoRA was trained on chinese-alpaca-2. So when merging, --base_model should point to chinese-alpaca-2, right?

Yes.

OK, thanks for the answer. In that case, when merging multiple LoRAs, is the old comma-separated mode no longer possible, so that it becomes several separate merges instead?
That is: train on llama2, then merge pointing at llama2;
then train on the merged model, and for the next merge point at that newly merged model?
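For reference, a minimal sketch of the two-pass flow described in the question above, not a confirmed recipe from the maintainers; each merge points --base_model at whatever model that particular LoRA was trained on, and every path below is a placeholder:

# Pass 1: merge the first LoRA into the base it was trained on.
python scripts/merge_llama2_with_chinese_lora_low_mem.py \
    --base_model path/to/llama-2-hf \
    --lora_model path/to/first_lora \
    --output_type huggingface \
    --output_dir path/to/merged_step1

# Pass 2: after training a second LoRA on merged_step1, merge it against that same merged model.
python scripts/merge_llama2_with_chinese_lora_low_mem.py \
    --base_model path/to/merged_step1 \
    --lora_model path/to/second_lora \
    --output_type huggingface \
    --output_dir path/to/merged_step2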


github-actions commented on May 19, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your consideration.


github-actions commented on May 19, 2024

Closing the issue, since no updates observed. Feel free to re-open if you need any further assistance.

