hscspring / hcgf


Humanable Chat Generative-model Fine-tuning | LLM fine-tuning

License: Apache License 2.0

Makefile 0.23% Python 84.71% Jupyter Notebook 14.87% Shell 0.18%
chatglm chatgpt fine-tuning chatglm2 large-language-models llama llm

hcgf's Issues

Unable to import transformers.generation; how can this be fixed?

import hcgf
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/work/hcgf/hcgf/__init__.py", line 1, in <module>
    from .sft import GlmLora
  File "/home/work/hcgf/hcgf/sft/__init__.py", line 1, in <module>
    from .lora_ft import GlmLora
  File "/home/work/hcgf/hcgf/sft/lora_ft.py", line 13, in <module>
    from .chatglm import ChatGLMForConditionalGeneration, ChatGLMTokenizer
  File "/home/work/hcgf/hcgf/sft/chatglm/__init__.py", line 1, in <module>
    from .modeling_chatglm import ChatGLMForConditionalGeneration
  File "/home/work/hcgf/hcgf/sft/chatglm/modeling_chatglm.py", line 30, in <module>
    from transformers.generation.logits_process import LogitsProcessor
ModuleNotFoundError: No module named 'transformers.generation'

How can the failure to import transformers.generation be resolved? Thanks!
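The transformers.generation subpackage only exists in newer transformers releases (roughly 4.25 and later); in older releases the same class lives in transformers.generation_logits_process. Upgrading transformers should resolve this; alternatively, a version-tolerant import along these lines (an illustrative workaround, not an official hcgf fix) covers both layouts:

# Sketch: try the new module layout first, fall back to the old one.
try:
    # transformers >= ~4.25
    from transformers.generation.logits_process import LogitsProcessor
except ModuleNotFoundError:
    # older transformers releases
    from transformers.generation_logits_process import LogitsProcessor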

Cannot eval the model directly

gl = hcgf.GlmLora("THUDM/chatglm-6b", device="cuda:0")
gl.load_pretrained("lora.pt")
gl.eval()

Without the gl.load_pretrained("lora.pt") line, the following happens:

Loading tokenizer THUDM/chatglm-6b
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Switch to inference mode...
Traceback (most recent call last):
  File "/home/user/git/hcgf/test2/webui.py", line 12, in <module>
    gl.eval()
  File "/home/user/anaconda3/envs/glm-finetune/lib/python3.10/site-packages/hcgf/sft/ft.py", line 279, in eval
    self.model.half()
AttributeError: 'GlmLora' object has no attribute 'model'. Did you mean: 'mode'?
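From the traceback, it appears GlmLora only creates its internal model attribute once weights have been loaded (via load_pretrained or load_data), so eval() has to come after one of those calls rather than being invoked on a freshly constructed instance.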

JSONDecodeError when the dataset contains newline characters

If the dataset contains newline characters, the following error is raised:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/mnt/data/dev/hcgf/hcgf/hcgf/sft/lora_ft.py", line 130, in load_data
    self.dataloader = GlmDataLoader(data_path, self.tokenizer, max_seq_len)
  File "/mnt/data/dev/hcgf/hcgf/hcgf/dataloader/data_loader.py", line 28, in __init__
    self.data = self._read_files(data_path)
  File "/mnt/data/dev/hcgf/hcgf/hcgf/dataloader/data_loader.py", line 34, in _read_files
    js = json.loads(line.text.strip())
  File "/mnt/data/dev/anaconda3/envs/pytorch1131/lib/python3.10/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "/mnt/data/dev/anaconda3/envs/pytorch1131/lib/python3.10/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/mnt/data/dev/anaconda3/envs/pytorch1131/lib/python3.10/json/decoder.py", line 353, in raw_decode
    obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Invalid control character at: line 1 column 63 (char 62)

Fix: in hcgf/dataloader/data_loader.py, line 34, change js = json.loads(line.text.strip()) to js = json.loads(line.text.strip(), strict=False).
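strict=False tells Python's json decoder to accept raw control characters inside strings. An alternative that avoids patching hcgf is to escape the newlines when the dataset file is written, so every record stays on a single line of valid JSON (the {"prompt": ..., "completion": ...} record layout is an assumption here; verify it against tests/test_data/test_data.json):

import json

record = {"prompt": "第一行\n第二行", "completion": "好的"}
with open("data.json", "a", encoding="utf-8") as f:
    # json.dumps escapes the raw newline to \n inside the string;
    # ensure_ascii=False keeps the Chinese text readable in the file.
    f.write(json.dumps(record, ensure_ascii=False) + "\n")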

Fine-tuning had no effect

I fine-tuned with tests/test_data/test_data.json, and the model still does not recognize anyone.

[Notice] v0.2.0 released

This release took quite a while. The original plan was just to add DDP and be done, but I decided to go all the way and implement FSDP as well. Along the way I considered accelerate and deepspeed, and did try both frameworks, but in the end decided to write it myself; that keeps things more controllable.

I have never wanted to expose too many configuration options: for industrial use it is best to be direct and simply pick the best, or a reasonably good, option. As a result this version went through a great many revisions; a design settled on one day would be half-written and then torn down the next. Even now I am far from satisfied, and testing is still not very thorough... but it could not be delayed any longer.

The main addition is distributed fine-tuning (training), which is driven from the command line; see the README for details. Everything else is compatible with previous versions (a few parameters were slightly adjusted).

A few more words on LoRA (and similar fine-tuning methods). Unlike full-parameter tuning, it is essentially a "perturbation" of an existing model, and some experiments have already shown that it forgets; there is also no authoritative evaluation of how well it handles multi-turn dialogue. That said, we keep exploring new fine-tuning methods (as well as previously existing ones that were overlooked at the time because the conditions for them were not met).
The point of fine-tuning is to have the model memorize new domain knowledge without losing its existing abilities. This is not updating existing knowledge but adding new domain knowledge; alternatively, the model can be made to update its existing knowledge based on the new domain knowledge. These are two different approaches, and the industry is still exploring both.
In practice, the more common and practical approach today is embedding recall plus LLM-assisted generation, which I mention in the Hugging-LLM tutorial; a rough sketch of that pattern follows below. Above all, be clear about your own requirements first.
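As an illustration only (embed, embed_all, and generate are hypothetical placeholders for whatever embedding model and LLM are actually used, and corpus is the new-domain document collection):

import numpy as np

def retrieve(query_vec: np.ndarray, doc_vecs: np.ndarray, k: int = 3) -> np.ndarray:
    # Cosine similarity between the query and every document embedding.
    sims = doc_vecs @ query_vec / (
        np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec) + 1e-8
    )
    # Indices of the top-k most similar documents.
    return np.argsort(-sims)[:k]

# Hypothetical pipeline: recall relevant passages, then let the LLM answer
# with them prepended to the question.
# top = retrieve(embed(question), embed_all(corpus))
# context = "\n".join(corpus[i] for i in top)
# answer = generate(context + "\n\n" + question)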

I had not seen the recent issues: Gmail stopped sending notifications (it used to), and since no alerts showed up in my daily mail check I missed the new replies and issues. Apologies for not responding sooner.

Error during inference after fine-tuning on multiple GPUs

gl.load_pretrained("lora-ckpt-last-1110.pt").eval()
Switch to inference mode...
gl.chat("你是谁?")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/mnt/data/dev/anaconda3/envs/pytorch1131/lib/python3.10/site-packages/hcgf-0.0.7-py3.10.egg/hcgf/sft/lora_ft.py", line 253, in chat
  File "/mnt/data/dev/anaconda3/envs/pytorch1131/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 43, in generator_context
    response = gen.send(None)
  File "/mnt/data/dev/anaconda3/envs/pytorch1131/lib/python3.10/site-packages/hcgf-0.0.7-py3.10.egg/hcgf/sft/lora_ft.py", line 225, in stream_chat
  File "/mnt/data/dev/anaconda3/envs/pytorch1131/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 43, in generator_context
    response = gen.send(None)
  File "/mnt/data/dev/anaconda3/envs/pytorch1131/lib/python3.10/site-packages/hcgf-0.0.7-py3.10.egg/hcgf/sft/chatglm/modeling_chatglm.py", line 1345, in stream_generate
  File "/mnt/data/dev/anaconda3/envs/pytorch1131/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/data/dev/anaconda3/envs/pytorch1131/lib/python3.10/site-packages/hcgf-0.0.7-py3.10.egg/hcgf/sft/chatglm/modeling_chatglm.py", line 1148, in forward
  File "/mnt/data/dev/anaconda3/envs/pytorch1131/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/data/dev/anaconda3/envs/pytorch1131/lib/python3.10/site-packages/hcgf-0.0.7-py3.10.egg/hcgf/sft/chatglm/modeling_chatglm.py", line 895, in forward
  File "/mnt/data/dev/anaconda3/envs/pytorch1131/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/data/dev/anaconda3/envs/pytorch1131/lib/python3.10/site-packages/torch/nn/modules/sparse.py", line 160, in forward
    return F.embedding(
  File "/mnt/data/dev/anaconda3/envs/pytorch1131/lib/python3.10/site-packages/torch/nn/functional.py", line 2210, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0! (when checking argument for argument index in method wrapper__index_select)
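The error means the input ids and the embedding weights ended up on different GPUs. One workaround is to pin the whole inference process to a single GPU before torch initializes (a sketch, not a root-cause fix inside hcgf):

import os
# Must be set before torch/hcgf are imported so only one device is visible.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

import hcgf
gl = hcgf.GlmLora("THUDM/chatglm-6b", device="cuda:0", infer_mode=True)
gl.load_pretrained("lora-ckpt-last-1110.pt").eval()
gl.chat("你是谁?")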

slice() cannot be applied to a 0-dim tensor after passing the stop argument to chat

File "/home/user/git/hcgf/test2/./webui.py", line 97, in predict
    response, history = gl.chat(inp=input, history=history, max_len=max_length,stop=['<eop>'])
  File "/home/user/anaconda3/envs/glm-finetune/lib/python3.10/site-packages/hcgf/sft/lora_ft.py", line 250, in chat
    custom_stop_tensor_list = create_token_tensor_list(
  File "/home/user/anaconda3/envs/glm-finetune/lib/python3.10/site-packages/hcgf/utils/utils.py", line 60, in create_token_tensor_list
    tids = tokenizer(
IndexError: slice() cannot be applied to a 0-dim tensor.
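The IndexError suggests the tokenizer produced an empty (0-dim) id tensor for '<eop>', which would happen if that string is not a known token. A quick diagnostic, assuming the stock ChatGLM tokenizer loaded through transformers:

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
# If "<eop>" is not in the vocabulary, the encoded result can be empty,
# which would match the slice() error above.
print(tok("<eop>", return_tensors="pt"))
print(tok.convert_tokens_to_ids("<eop>"))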

Fine-tuning and inference cannot coexist

# Fine-tuning
import hcgf
gl = hcgf.GlmLora("THUDM/chatglm-6b", device="cuda:0")
gl.load_data("./data/chatgpt_finetune_faq.json").tune()

# Inference
import hcgf
gl = hcgf.GlmLora("THUDM/chatglm-6b", device="cuda:0", infer_mode=True)
gl.load_pretrained("/path/to/lora_pt").eval()
gl.chat("你是谁?")

This is the fine-tuning and inference code. Is there a way to merge the two with only a single GlmLora call (or at least reuse the GPU memory): fine-tune first, run inference directly on the tuned result, then continue fine-tuning, instead of repeatedly reloading the model?

My attempts failed; there seems to be a half/float conflict, and I don't know how to combine fine-tuning and inference cleanly.
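A sketch of one possible pattern, under two assumptions: that gl.model is the underlying nn.Module (the attribute name that appears in hcgf's own tracebacks above), and that the half/float dtype mismatch is the only conflict. Untested:

import hcgf

gl = hcgf.GlmLora("THUDM/chatglm-6b", device="cuda:0")
gl.load_data("./data/chatgpt_finetune_faq.json").tune()

# Assumption: gl.model exposes the underlying network. Cast it to half
# precision for generation...
gl.model.half().eval()
print(gl.chat("你是谁?"))

# ...then back to float to continue tuning without reloading the 6B weights.
gl.model.float().train()
gl.load_data("./data/chatgpt_finetune_faq.json").tune()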

ModuleNotFoundError: No module named 'hcgf'

import hcgf
gl = hcgf.GlmLora("model", load_in_8bit=True, lora_r=32)

ModuleNotFoundError                       Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_15056\644290863.py in <module>
----> 1 import hcgf
      2 gl = hcgf.GlmLora("model", load_in_8bit=True, lora_r=32)

ModuleNotFoundError: No module named 'hcgf'

gl.load_data("./tests/test_data.json").tune()

NameError                                 Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_15056\3044540026.py in <module>
----> 1 gl.load_data("./tests/test_data.json").tune()

NameError: name 'gl' is not defined
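Both errors point to the environment rather than the library: hcgf is simply not installed in the environment the notebook kernel uses, and once the import fails, gl is never defined, hence the NameError. Installing it into that same environment, e.g. with pip install hcgf (or pip install -e . from a checkout of the repo), should resolve both.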

How to fine-tune on multi-turn dialogue?

gl.chat supports multi-turn dialogue, but the provided training samples are all single-turn. How should a multi-turn dataset be set up? Would directly prepending the conversation history to "prompt" work?
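Prepending the history to the prompt is indeed one common approach, and it matches what ChatGLM-6B itself does at inference time with its [Round i] template. A sketch (the {"prompt", "completion"} record keys are an assumption; verify them against tests/test_data/test_data.json):

import json

def build_multi_turn_record(history, query, answer):
    # Fold earlier turns into the prompt using ChatGLM-6B's chat template,
    # then train on the final answer as the completion.
    prompt = ""
    for i, (q, a) in enumerate(history):
        prompt += f"[Round {i}]\n问:{q}\n答:{a}\n"
    prompt += f"[Round {len(history)}]\n问:{query}\n答:"
    return json.dumps({"prompt": prompt, "completion": answer}, ensure_ascii=False)

print(build_multi_turn_record([("你好", "你好!")], "你是谁?", "我是你的助手。"))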

Inference after fine-tuning fails with Expected 4-dimensional input...

(glm-finetune) user@calculator:~/git/hcgf/test2$ python3 infer.py
Loading tokenizer and model of THUDM/chatglm-6b
Loading checkpoint shards: 100%|█████████████████████████████| 8/8 [00:05<00:00,  1.60it/s]
Processing peft model
/home/user/anaconda3/envs/glm-finetune/lib/python3.10/site-packages/peft/tuners/lora.py:173: UserWarning: fan_in_fan_out is set to True but the target module is not a Conv1D. Setting fan_in_fan_out to False.
  warnings.warn(
trainable params: 3670016 || all params: 6258876416
trainable%: 0.05863697820615348
Traceback (most recent call last):
  File "/home/user/git/hcgf/test2/infer.py", line 4, in <module>
    gl.load_pretrained("output/ckpt/lora-ckpt-last-52.pt").eval()
  File "/home/user/anaconda3/envs/glm-finetune/lib/python3.10/site-packages/hcgf/sft/lora_ft.py", line 148, in eval
    self.model.to(self.device).eval()
  File "/home/user/anaconda3/envs/glm-finetune/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1930, in eval
    return self.train(False)
  File "/home/user/anaconda3/envs/glm-finetune/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1911, in train
    module.train(mode)
  File "/home/user/anaconda3/envs/glm-finetune/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1911, in train
    module.train(mode)
  File "/home/user/anaconda3/envs/glm-finetune/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1911, in train
    module.train(mode)
  [Previous line repeated 4 more times]
  File "/home/user/anaconda3/envs/glm-finetune/lib/python3.10/site-packages/peft/tuners/lora.py", line 417, in train
    delta_w = F.conv1d(
RuntimeError: Expected 4-dimensional input for 4-dimensional weight [8192, 8, 1, 1], but got 3-dimensional input of size [1, 16, 4096] instead

Inference code:

# Inference
import hcgf
gl = hcgf.GlmLora("THUDM/chatglm-6b", device="cuda:0", infer_mode=True)
gl.load_pretrained("output/ckpt/lora-ckpt-last-52.pt").eval()
gl.chat("你是谁?")

How to increase the number of trainable params?

Every fine-tuning run yields a lora.pt of exactly 14 MB, which feels like it could be a bottleneck. Is the checkpoint size determined by the number of samples, or does some other parameter need to be set?
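The size is set by the LoRA configuration, not by the dataset: another issue above reports trainable params: 3670016, and 3,670,016 × 4 bytes (fp32) ≈ 14 MB, which matches the lora.pt you see. To enlarge the trainable portion, raise the LoRA rank via the lora_r argument that appears elsewhere in these issues (its default and valid range should be checked in the README):

import hcgf

# Doubling lora_r roughly doubles the LoRA parameter count, and with it
# the size of the saved checkpoint.
gl = hcgf.GlmLora("THUDM/chatglm-6b", device="cuda:0", lora_r=32)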

How to save the model after fine-tuning?

# Fine-tuning
import hcgf
gl = hcgf.GlmLora("THUDM/chatglm-6b", device="cuda:0")
gl.load_data("./data/chatgpt_finetune_faq.json").tune()

I don't see where the save path can be set...
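Judging from the "Expected 4-dimensional input" issue above, checkpoints are written automatically under ./output/ckpt/ (e.g. output/ckpt/lora-ckpt-last-52.pt); whether tune() accepts a custom output location would need to be confirmed in the README or the source.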
