hscspring / hcgf

Humanable Chat Generative-model Fine-tuning | LLM fine-tuning

License: Apache License 2.0

Makefile 0.23% Python 84.71% Jupyter Notebook 14.87% Shell 0.18%
chatglm chatgpt fine-tuning chatglm2 large-language-models llama llm

hcgf's Introduction

hcgf

A Humanable Chat Generative-model Fine-tuning tool.

Install

pip install hcgf

Install dependencies:

pip install -r requirements.txt
  • PyTorch 2.0 is recommended.
  • Multi-node training is not supported yet.

Fine-tuning

Supported models:

Dataset

A .json file with one dict per line, which must contain the two fields prompt and completion. Example:

{"prompt": "你是谁?", "completion": "不告诉你。"}

Command Fine-tuning

Distributed Zero3, Zero2, and DDP modes are supported; see the help for usage:

hcgf_tune -h

At minimum, the model and data_path arguments must be specified, as follows:

hcgf_tune --model THUDM/chatglm-6b --data_path path/to/train_data.json lora

First, note that during training, GPU memory is occupied not only by the model (i.e., its parameters) but also by the optimizer states, gradients, and so on.

There are five strategies in total:

  • fsdp_zero3: the default strategy in command-line mode. FULL_SHARD: parameters, gradients, and optimizer states are all sharded. Slow but memory-efficient; data parallel.
  • fsdp_zero2: GRAD_OP_SHARD: gradients and optimizer states are sharded. Somewhat faster than the above; data parallel.
  • mpdp (ddp): NO_SHARD, similar to DDP: the whole model is loaded onto each GPU. Faster than the two above; data parallel.
  • mpds (8bit): 8bit mode (see "8bit Fine-tuning" below): the model is split across multiple GPUs (or even the CPU). No data parallelism; very slow.
  • msds (single_gpu): single-GPU mode (see "Single Device Fine-tuning" below): relatively fast when the model fits.
GPUs       | GPU memory                    | Training data | Strategy
Multi-GPU  | model does not fit on one GPU | lots of data  | fsdp_zero3/fsdp_zero2
Multi-GPU  | model fits on one GPU         | lots of data  | mpdp
Multi-GPU  | model does not fit on one GPU | little data   | mpds
Multi-GPU  | model fits on one GPU         | little data   | msds
Single GPU | model does not fit on one GPU | -             | mpds
Single GPU | model fits on one GPU         | -             | msds

Notes:

  • The memory figures here are for training mode, which differs from inference mode; see "Configuration" below. Inference only supports the last two modes.
  • FSDP mode can be slower than a single GPU (when the model fits on one). This is normal: FSDP shards the data and, to make maximum use of GPU memory, may also need to shuttle some data to the CPU.
  • In distributed training, batch_size is really per_device_batch_size; the true batch size is device_num × per_device_batch_size. In other words, given the same batch_size, data, and configuration, a single GPU performs more parameter updates than multiple GPUs do.
  • If an accumulate_steps parameter is set, multiply by it as well to get the batch size at which parameters are actually updated, as illustrated in the sketch after this list.
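
As a worked example of the effective batch size described in the notes above (all numbers are hypothetical):

# Hypothetical values for illustration.
per_device_batch_size = 8   # the batch_size passed on each device
device_num = 4              # number of GPUs in distributed training
accumulate_steps = 2        # gradient accumulation steps (None behaves like 1)

# The batch size at which parameters are actually updated:
effective_batch_size = per_device_batch_size * device_num * accumulate_steps
print(effective_batch_size)  # 64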

Single Device Fine-tuning

At least one GPU with 16G of memory is required. If no device is specified, the default is cuda.

#===== Fine-tuning =====#
import hcgf
gl = hcgf.GlmLora("THUDM/chatglm-6b", device="cuda:0")
gl.load_data("/path/to/data.json").tune()

#===== Inference =====#
gl = hcgf.GlmLora("THUDM/chatglm-6b", device="cuda:0")
gl.load_pretrained("/path/to/lora_pt").eval()
gl.chat("你是谁?")

#===== Switching modes =====#
gl = hcgf.GlmLora("THUDM/chatglm-6b", device="cuda:0")
gl.load_data("/path/to/data.json").tune()
# Switch to inference mode
gl.eval()
gl.chat("你是谁?")
# Switch back to fine-tuning mode and keep training on the original data
gl.tune()
# For a new dataset, load the data first, as above
gl.load_data("/path/to/new_data.json").tune()
# To continue fine-tuning from a previous run on new data, load the earlier pt file first, then load the data and tune
gl.load_pretrained("/path/to/lora_pt").load_data("/path/to/new_data.json").tune()

Alternatively, you can use hcgf_tune:

hcgf_tune strategy msds --model THUDM/chatglm-6b --data_path path/to/train_data.json lora

8bit Fine-tuning

At least one GPU with 12G of memory is required. Do not specify a device. Only the initialization changes; everything else works as in normal fine-tuning above.

Requires installing the dependency: bitsandbytes

gl = hcgf.GlmLora("THUDM/chatglm-6b", load_in_8bit=True)

Alternatively, you can use hcgf_tune:

hcgf_tune strategy mpds --model THUDM/chatglm-6b --data_path path/to/train_data.json lora

Continually Fine-tuning

Load the previous pt file first, then load the data and fine-tune.

gl.load_pretrained("/path/to/lora_pt").load_data("/path/to/new_data.json").tune()

Demo/Inference

Run hcgf_infer -h to see the help.

Parameters

Parameters of the main methods; where a value is shown, it is the default.

load_data(
    data_path: str, 
    max_seq_len: int = 512, # Maximum sequence length; longer inputs are truncated. Note: this is the length of the Prompt or of the Completion; make sure the two lengths together do not exceed the model's maximum length.
)
tune(
    batch_size: int = 8,
    lr: float = 2e-4,
    num_epochs: int = 3,
    warmup_steps: Optional[int] = None,     # When None, 1/3 of an Epoch is used for warmup
    accumulate_steps: Optional[int] = None, # When None, equivalent to 1
    out_dir: str = "./output/",
    print_every: Optional[int] = None,      # When None, prints (Step, Loss, LearningRate) every 1/10 Epoch of Steps
)
# Parameters not described here have the same meaning as in `chat`
generate(
    sents: Union[str, List[str]],           # Input sentence(s): a str or a list (multiple inputs). **Note**: construct the input to match the training sample format.
    do_sample: bool = True,
    num_beams: int = 1,
    temperature: float = 0.2,
    top_p: float = 0.7,
    repetition_penalty: float = 1.02,
)
# ChatGLM only
chat(
    inp: str, 
    history: List[Tuple[str, str]] = None,  # (question, answer) pairs
    max_new_tokens: int = 512,              # Maximum length of the generated text; Prompt length = maximum supported length - max_new_tokens, and longer Prompts are truncated
    do_sample: bool = True,                 # Whether to sample
    num_beams: int = 1,                     # Number of beams for Beam Search
    temperature: float = 0.95,              # Smaller = more deterministic, larger = more random; after fine-tuning you might set it to 0.1, for example
    top_p: float = 0.7,                     # Same idea as above; do not tune both at once
    repetition_penalty: float = 1.02,       # Repetition penalty; the larger it is, the less likely the output repeats itself
    stop: List[str] = []                    # Stop texts: punctuation, specific words, or sentences; the output does not include the stop text
)
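
For example, a minimal sketch of calling chat with a history and a stop list, based on the signature above (paths and texts are placeholders; the (response, history) return shape follows the usage shown in the issues below):

import hcgf

gl = hcgf.GlmLora("THUDM/chatglm-6b", device="cuda:0")
gl.load_pretrained("/path/to/lora_pt").eval()

history = [("你是谁?", "我是助手。")]  # previous (question, answer) pairs
response, history = gl.chat(
    "你能做什么?",
    history=history,
    temperature=0.1,  # lower temperature for more deterministic output after fine-tuning
    stop=["。"],      # stop generating at the first full stop
)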

Better Practice:

  • Usually you only need to adjust temperature.

Configuration

Several parameters that affect GPU memory can be configured: max_seq_len and batch_size.

(
gl
.load_data("./data/chatgpt_finetune_faq.json", max_seq_len=128)
.tune(batch_size=1)
)

The following configurations are for ChatGLM-6B.

8bit memory usage under different configurations:

max_seq_len batch_size memory
64 1 11G
128 1 12G
512 1 22G
128 2 15G
128 4 21G

Memory usage in normal mode under different configurations:

max_seq_len batch_size memory
64 1 15G
128 1 16G
512 1 30G
128 2 19G
128 4 25G

RM

Train with a small model (such as BERT).

Training

Dataset

Pair data is required. Computing the logits works just as with an ordinary pretrained model (one Batch contains multiple pairs); when computing the loss, the logits belonging to the same pair are grouped together.

At inference time, just use the logits directly.
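
A minimal sketch of that pairwise loss idea (illustrative only, not hcgf's actual implementation):

import torch
import torch.nn.functional as F

def pairwise_rm_loss(chosen: torch.Tensor, rejected: torch.Tensor) -> torch.Tensor:
    # chosen[i] and rejected[i] are the scalar scores (logits) the reward
    # model assigns to the two responses of pair i in the batch.
    # The loss pushes each chosen score above its rejected counterpart.
    return -F.logsigmoid(chosen - rejected).mean()

# Hypothetical scores for a batch of 3 pairs.
chosen = torch.tensor([1.2, 0.3, -0.5])
rejected = torch.tensor([0.7, 0.9, -1.0])
loss = pairwise_rm_loss(chosen, rejected)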

Test

# Run all tests
python -m pytest
# Test training and inference (slow)
python -m pytest -s -m slow
# Test everything else
python -m pytest -m "not slow"

Other

If model loading times out, you can load the model directly from the local cache:

GlmLora("/path/to/huggingface/models--THUDM--chatglm-6b/snapshots/<id>/")

ChangeLog

  • v0.4.0 20230909
    • Support Qwen, ChatGLM2, Baichuan, etc.
    • Support IA3 fine-tuning
  • v0.3.0 20230526
    • Support LLaMA (including Native, Alpaca, Ziya, etc.)
  • v0.2.0 20230513
    • Support distributed fine-tuning
    • Rework inference mode to support Batch
  • v0.1.0 20230412
    • Support the new ChatGLM Tokenizer
    • Use the officially adjusted MASK scheme
  • v0.0.7 20230405

hcgf's People

Contributors

hscspring

hcgf's Issues

Error when running inference after multi-GPU fine-tuning

gl.load_pretrained("lora-ckpt-last-1110.pt").eval()
Switch to inference mode...
gl.chat("你是谁?")
Traceback (most recent call last):
File "", line 1, in
File "/mnt/data/dev/anaconda3/envs/pytorch1131/lib/python3.10/site-packages/hcgf-0.0.7-py3.10.egg/hcgf/sft/lora_ft.py", line 253, in chat
File "/mnt/data/dev/anaconda3/envs/pytorch1131/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 43, in generator_context
response = gen.send(None)
File "/mnt/data/dev/anaconda3/envs/pytorch1131/lib/python3.10/site-packages/hcgf-0.0.7-py3.10.egg/hcgf/sft/lora_ft.py", line 225, in stream_chat
File "/mnt/data/dev/anaconda3/envs/pytorch1131/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 43, in generator_context
response = gen.send(None)
File "/mnt/data/dev/anaconda3/envs/pytorch1131/lib/python3.10/site-packages/hcgf-0.0.7-py3.10.egg/hcgf/sft/chatglm/modeling_chatglm.py", line 1345, in stream_generate
File "/mnt/data/dev/anaconda3/envs/pytorch1131/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/mnt/data/dev/anaconda3/envs/pytorch1131/lib/python3.10/site-packages/hcgf-0.0.7-py3.10.egg/hcgf/sft/chatglm/modeling_chatglm.py", line 1148, in forward
File "/mnt/data/dev/anaconda3/envs/pytorch1131/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/mnt/data/dev/anaconda3/envs/pytorch1131/lib/python3.10/site-packages/hcgf-0.0.7-py3.10.egg/hcgf/sft/chatglm/modeling_chatglm.py", line 895, in forward
File "/mnt/data/dev/anaconda3/envs/pytorch1131/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/mnt/data/dev/anaconda3/envs/pytorch1131/lib/python3.10/site-packages/torch/nn/modules/sparse.py", line 160, in forward
return F.embedding(
File "/mnt/data/dev/anaconda3/envs/pytorch1131/lib/python3.10/site-packages/torch/nn/functional.py", line 2210, in embedding
return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0! (when checking argument for argument index in method wrapper__index_select)

Cannot eval the model directly

gl = hcgf.GlmLora("THUDM/chatglm-6b", device="cuda:0")
gl.load_pretrained("lora.pt")
gl.eval()

If the gl.load_pretrained("lora.pt") line is missing, the result is:

Loading tokenizer THUDM/chatglm-6b
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Switch to inference mode...
Traceback (most recent call last):
  File "/home/user/git/hcgf/test2/webui.py", line 12, in <module>
    gl.eval()
  File "/home/user/anaconda3/envs/glm-finetune/lib/python3.10/site-packages/hcgf/sft/ft.py", line 279, in eval
    self.model.half()
AttributeError: 'GlmLora' object has no attribute 'model'. Did you mean: 'mode'?

ModuleNotFoundError: No module named 'hcgf'

import hcgf
gl = hcgf.GlmLora("model", load_in_8bit=True, lora_r=32)

ModuleNotFoundError Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_15056\644290863.py in <module>
----> 1 import hcgf
2 gl = hcgf.GlmLora("model", load_in_8bit=True, lora_r=32)

ModuleNotFoundError: No module named 'hcgf'

gl.load_data("./tests/test_data.json").tune()

NameError Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_15056\3044540026.py in <module>
----> 1 gl.load_data("./tests/test_data.json").tune()

NameError: name 'gl' is not defined

Can't have both fine-tuning and inference

# Fine-tuning
import hcgf
gl = hcgf.GlmLora("THUDM/chatglm-6b", device="cuda:0")
gl.load_data("./data/chatgpt_finetune_faq.json").tune()

# Inference
import hcgf
gl = hcgf.GlmLora("THUDM/chatglm-6b", device="cuda:0", infer_mode=True)
gl.load_pretrained("/path/to/lora_pt").eval()
gl.chat("你是谁?")

This is the fine-tuning and inference code. Is there a way, with only one GlmLora call (or reusing GPU memory as much as possible), to merge the two: fine-tune first, run inference directly on the fine-tuned result, then continue fine-tuning, instead of repeatedly reloading the model?

I tried and failed; there seems to be a half/float conflict, and I don't know how to merge fine-tuning and inference cleanly.

How to save the model after fine-tuning?

# Fine-tuning
import hcgf
gl = hcgf.GlmLora("THUDM/chatglm-6b", device="cuda:0")
gl.load_data("./data/chatgpt_finetune_faq.json").tune()

I don't see where to set the save path...

How to increase the trainable params?

Every fine-tuning run, lora.pt comes out at 14MB; I keep feeling that could be a bottleneck. Is its size determined by the number of samples, or does some other parameter need to be set?

Inference after fine-tuning fails: Expected 4-dimensional input...

(glm-finetune) user@calculator:~/git/hcgf/test2$ python3 infer.py
Loading tokenizer and model of THUDM/chatglm-6b
Loading checkpoint shards: 100%|█████████████████████████████| 8/8 [00:05<00:00,  1.60it/s]
Processing peft model
/home/user/anaconda3/envs/glm-finetune/lib/python3.10/site-packages/peft/tuners/lora.py:173: UserWarning: fan_in_fan_out is set to True but the target module is not a Conv1D. Setting fan_in_fan_out to False.
  warnings.warn(
trainable params: 3670016 || all params: 6258876416
trainable%: 0.05863697820615348
Traceback (most recent call last):
  File "/home/user/git/hcgf/test2/infer.py", line 4, in <module>
    gl.load_pretrained("output/ckpt/lora-ckpt-last-52.pt").eval()
  File "/home/user/anaconda3/envs/glm-finetune/lib/python3.10/site-packages/hcgf/sft/lora_ft.py", line 148, in eval
    self.model.to(self.device).eval()
  File "/home/user/anaconda3/envs/glm-finetune/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1930, in eval
    return self.train(False)
  File "/home/user/anaconda3/envs/glm-finetune/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1911, in train
    module.train(mode)
  File "/home/user/anaconda3/envs/glm-finetune/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1911, in train
    module.train(mode)
  File "/home/user/anaconda3/envs/glm-finetune/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1911, in train
    module.train(mode)
  [Previous line repeated 4 more times]
  File "/home/user/anaconda3/envs/glm-finetune/lib/python3.10/site-packages/peft/tuners/lora.py", line 417, in train
    delta_w = F.conv1d(
RuntimeError: Expected 4-dimensional input for 4-dimensional weight [8192, 8, 1, 1], but got 3-dimensional input of size [1, 16, 4096] instead

Inference code:

# Inference
import hcgf
gl = hcgf.GlmLora("THUDM/chatglm-6b", device="cuda:0", infer_mode=True)
gl.load_pretrained("output/ckpt/lora-ckpt-last-52.pt").eval()
gl.chat("你是谁?")

Fine-tuned for nothing

Fine-tuned with tests/test_data/test_data.json, and the model still doesn't recognize anyone.

JSONDecodeError when the dataset contains newline characters

If the dataset contains newline characters, the following error is raised:
Traceback (most recent call last):
File "", line 1, in
File "/mnt/data/dev/hcgf/hcgf/hcgf/sft/lora_ft.py", line 130, in load_data
self.dataloader = GlmDataLoader(data_path, self.tokenizer, max_seq_len)
File "/mnt/data/dev/hcgf/hcgf/hcgf/dataloader/data_loader.py", line 28, in init
self.data = self._read_files(data_path)
File "/mnt/data/dev/hcgf/hcgf/hcgf/dataloader/data_loader.py", line 34, in _read_files
js = json.loads(line.text.strip())
File "/mnt/data/dev/anaconda3/envs/pytorch1131/lib/python3.10/json/init.py", line 346, in loads
return _default_decoder.decode(s)
File "/mnt/data/dev/anaconda3/envs/pytorch1131/lib/python3.10/json/decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/mnt/data/dev/anaconda3/envs/pytorch1131/lib/python3.10/json/decoder.py", line 353, in raw_decode
obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Invalid control character at: line 1 column 63 (char 62)

Fix: in hcgf/dataloader/data_loader.py, line 34, change js = json.loads(line.text.strip()) to js = json.loads(line.text.strip(), strict=False).
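
For reference, a small sketch of why strict=False helps: json.loads rejects raw control characters (such as newlines) inside strings unless strict checking is disabled.

import json

line = '{"prompt": "第一行\n第二行", "completion": "好"}'  # raw newline inside a string

# json.loads(line) raises JSONDecodeError: Invalid control character ...
data = json.loads(line, strict=False)  # parses; the newline is kept in the value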

After adding the stop parameter to chat: slice() cannot be applied to a 0-dim tensor.

File "/home/user/git/hcgf/test2/./webui.py", line 97, in predict
    response, history = gl.chat(inp=input, history=history, max_len=max_length,stop=['<eop>'])
  File "/home/user/anaconda3/envs/glm-finetune/lib/python3.10/site-packages/hcgf/sft/lora_ft.py", line 250, in chat
    custom_stop_tensor_list = create_token_tensor_list(
  File "/home/user/anaconda3/envs/glm-finetune/lib/python3.10/site-packages/hcgf/utils/utils.py", line 60, in create_token_tensor_list
    tids = tokenizer(
IndexError: slice() cannot be applied to a 0-dim tensor.

Cannot import transformers.generation; how can this be fixed?

import hcgf
Traceback (most recent call last):
File "", line 1, in
File "/home/work/hcgf/hcgf/init.py", line 1, in
from .sft import GlmLora
File "/home/work/hcgf/hcgf/sft/init.py", line 1, in
from .lora_ft import GlmLora
File "/home/work/hcgf/hcgf/sft/lora_ft.py", line 13, in
from .chatglm import ChatGLMForConditionalGeneration, ChatGLMTokenizer
File "/home/work/hcgf/hcgf/sft/chatglm/init.py", line 1, in
from .modeling_chatglm import ChatGLMForConditionalGeneration
File "/home/work/hcgf/hcgf/sft/chatglm/modeling_chatglm.py", line 30, in
from transformers.generation.logits_process import LogitsProcessor
ModuleNotFoundError: No module named 'transformers.generation'

How can the failure to import transformers.generation be fixed? Thanks!

[Notice] v0.2.0 released

This release took a while. I originally just wanted to get DDP done, then decided to go all the way and implement FSDP as well. Along the way I considered accelerate and deepspeed, and did use those frameworks, but in the end decided to write it myself for more control.

Honestly, I've never wanted to offer too many configurations and choices; for industrial use it's best to be direct and just pick the best, or a relatively good, option. So this version went through a great many revisions; a design settled on one day was often torn up halfway through and redone. Even now I'm still far from satisfied, and it hasn't been tested all that thoroughly... but I really couldn't delay any longer.

The main update in this release is distributed fine-tuning (training), which requires the command line; see the README for details. The rest is compatible with previous versions (some parameters slightly adjusted).

A few extra words on LoRA (and similar fine-tuning methods): unlike normal full-parameter tuning, it is essentially a scheme for "perturbing" an existing model, and some experiments have already shown that it forgets. How well it works on multi-turn dialogue also has no authoritative evaluation yet. That said, we keep exploring new fine-tuning methods (or existing ones that were overlooked at the time because the conditions weren't met).
The point of fine-tuning is that we want the model to memorize new domain knowledge without its abilities degrading. This is really about adding new domain knowledge rather than updating existing knowledge. Of course, you can also have it update existing knowledge based on the new domain knowledge. These are two different approaches, and the industry is still exploring both.
In practice, the more common and practical approach today is Embedding retrieval plus LLM-assisted generation, which I mention in the Hugging-LLM tutorial. In short, be clear about your requirements first.

I haven't been looking at recent Issues, mainly because gmail somehow stopped sending notifications (it used to), so when checking email daily I didn't notice the new replies and Issues. Sorry for the late responses.

How to implement multi-turn dialogue fine-tuning?

gl.chat supports multi-turn dialogue, but the training examples provided are all single-turn. How should a multi-turn dialogue dataset be set up? Would directly prepending the conversation history to "prompt" work?
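
For what it's worth, a sketch of the approach the question suggests: flattening the history into the prompt field of the single-turn format. The separator convention here is an assumption, not hcgf's documented behavior.

import json

def make_record(history, question, answer):
    # Flatten previous (question, answer) turns into the prompt text.
    # The "问:/答:" separators are a hypothetical convention.
    context = "".join(f"问:{q}\n答:{a}\n" for q, a in history)
    return {"prompt": context + f"问:{question}\n答:", "completion": answer}

rec = make_record([("你是谁?", "我是助手。")], "你能做什么?", "聊天。")
print(json.dumps(rec, ensure_ascii=False))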
