hscspring / hcgf
Humanable Chat Generative-model Fine-tuning | LLM fine-tuning
License: Apache License 2.0
With the new version of hcgf, the output is at most 10 characters long. Is there somewhere I need to set the output length?
import hcgf
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/work/hcgf/hcgf/__init__.py", line 1, in <module>
from .sft import GlmLora
File "/home/work/hcgf/hcgf/sft/__init__.py", line 1, in <module>
from .lora_ft import GlmLora
File "/home/work/hcgf/hcgf/sft/lora_ft.py", line 13, in <module>
from .chatglm import ChatGLMForConditionalGeneration, ChatGLMTokenizer
File "/home/work/hcgf/hcgf/sft/chatglm/__init__.py", line 1, in <module>
from .modeling_chatglm import ChatGLMForConditionalGeneration
File "/home/work/hcgf/hcgf/sft/chatglm/modeling_chatglm.py", line 30, in <module>
from transformers.generation.logits_process import LogitsProcessor
ModuleNotFoundError: No module named 'transformers.generation'
How do I fix the failure to import transformers.generation? Thanks!
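In the traceback, the bundled modeling_chatglm.py imports `transformers.generation.logits_process`, and the installed transformers does not have that subpackage. Upgrading transformers (for example `pip install -U transformers`) is the usual remedy. This page does not state the minimum version hcgf needs. The small helper below (a hypothetical name, not part of hcgf) checks whether a dotted module path imports in the current environment, so you can confirm the fix after upgrading.

```python
import importlib


def module_available(dotted_name: str) -> bool:
    """Return True if `dotted_name` can be imported in this environment.

    ModuleNotFoundError is a subclass of ImportError, so catching ImportError
    covers both a missing top-level package and a missing submodule.
    """
    try:
        importlib.import_module(dotted_name)
    except ImportError:
        return False
    return True
```

In the environment from the traceback above, `module_available("transformers.generation")` returns False. After a successful upgrade it should return True.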
gl = hcgf.GlmLora("THUDM/chatglm-6b", device="cuda:0")
gl.load_pretrained("lora.pt")
gl.eval()
Without the gl.load_pretrained("lora.pt") line, I get:
Loading tokenizer THUDM/chatglm-6b
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Switch to inference mode...
Traceback (most recent call last):
File "/home/user/git/hcgf/test2/webui.py", line 12, in <module>
gl.eval()
File "/home/user/anaconda3/envs/glm-finetune/lib/python3.10/site-packages/hcgf/sft/ft.py", line 279, in eval
self.model.half()
AttributeError: 'GlmLora' object has no attribute 'model'. Did you mean: 'mode'?
If the dataset contains newline characters, an error is raised:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/mnt/data/dev/hcgf/hcgf/hcgf/sft/lora_ft.py", line 130, in load_data
self.dataloader = GlmDataLoader(data_path, self.tokenizer, max_seq_len)
File "/mnt/data/dev/hcgf/hcgf/hcgf/dataloader/data_loader.py", line 28, in __init__
self.data = self._read_files(data_path)
File "/mnt/data/dev/hcgf/hcgf/hcgf/dataloader/data_loader.py", line 34, in _read_files
js = json.loads(line.text.strip())
File "/mnt/data/dev/anaconda3/envs/pytorch1131/lib/python3.10/json/__init__.py", line 346, in loads
return _default_decoder.decode(s)
File "/mnt/data/dev/anaconda3/envs/pytorch1131/lib/python3.10/json/decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/mnt/data/dev/anaconda3/envs/pytorch1131/lib/python3.10/json/decoder.py", line 353, in raw_decode
obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Invalid control character at: line 1 column 63 (char 62)
Fix: in hcgf/dataloader/data_loader.py, line 34, change js = json.loads(line.text.strip()) to js = json.loads(line.text.strip(), strict=False)
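By default (`strict=True`), Python's `json.loads` rejects unescaped control characters such as tab or newline inside JSON strings. That is the "Invalid control character" error above, and `strict=False` relaxes it. Another option is to write the dataset with `json.dumps`, which always escapes these characters. That matters if the loader reads one record per physical line, as `_read_files` appears to: a literal newline would split the record in two, and `strict=False` cannot repair that. The demo below uses a tab so the record stays on one line.

```python
import json

# One JSON-lines record with a literal (unescaped) tab inside a string value.
# A literal newline triggers the same error, but would also split the record
# across two physical lines if the file is read one line at a time.
raw_line = '{"prompt": "col one\tcol two"}'

try:
    json.loads(raw_line)  # strict=True (the default) rejects control characters
    strict_ok = True
except json.JSONDecodeError as err:
    strict_ok = False
    print("strict parse failed:", err.msg)

# strict=False accepts control characters inside strings (the fix above).
record = json.loads(raw_line, strict=False)

# json.dumps escapes control characters (\t, \n, ...), so the written line
# parses with the default strict=True and stays one record per line.
safe_line = json.dumps(record, ensure_ascii=False)
assert json.loads(safe_line) == record
```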
import hcgf
gl = hcgf.GlmLora("THUDM/chatglm-6b", device="cuda:0", lora_r=32)
Error: UnicodeDecodeError: 'gbk' codec can't decode byte 0x80 in position 68: illegal multibyte sequence
My environment is Windows.
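The full traceback isn't included, so the exact call site is unknown. A common cause of this error is `open()` without an explicit encoding. On Windows that falls back to the locale encoding, which is gbk on Simplified Chinese systems, so any UTF-8 file containing non-ASCII text can fail to decode. There are two general workarounds. One is to pass `encoding="utf-8"` where the file is opened. The other is to run Python in UTF-8 mode (`set PYTHONUTF8=1` or `python -X utf8`, Python 3.7+) so hcgf's own file reads use UTF-8 without code changes. The sketch shows the explicit-encoding pattern.

```python
import os
import tempfile

text = '{"prompt": "你好"}\n'

# Write a UTF-8 file, as most datasets and model files are.
with tempfile.NamedTemporaryFile("w", encoding="utf-8", suffix=".json", delete=False) as f:
    f.write(text)
    path = f.name

try:
    # An explicit encoding makes the read independent of the OS locale;
    # without it, a Chinese-locale Windows machine would decode with gbk.
    with open(path, encoding="utf-8") as f:
        content = f.read()
finally:
    os.remove(path)

assert content == text
```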
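The chatgpt_finetune_faq.json used in the example can't be found. Where can I download it? I'd like to understand the format so I can fine-tune on my own data. Thanks.
I tried fine-tuning with tests/test_data/test_data.json, but the model still doesn't recognize anyone.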
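This page can't confirm the exact schema of that file. The traceback above shows `GlmDataLoader._read_files` calling `json.loads` on each line, which suggests a JSON-lines file with one object per line. A later question mentions a `"prompt"` field. The sketch below serializes records in that shape. The `"completion"` key for the answer is an assumption, so check the README or tests/test_data/test_data.json for the real field names.

```python
import json

# Hypothetical Q&A pairs; "completion" as the answer key is an assumption.
samples = [
    {"prompt": "Who are you?", "completion": "I am an assistant fine-tuned on my own FAQ data."},
    {"prompt": "你是谁?", "completion": "A reply written in the style you want the model to learn."},
]


def to_jsonl(records):
    """Serialize records as JSON lines; json.dumps escapes any \\n inside strings."""
    # ensure_ascii=False keeps Chinese text readable in the file.
    return "".join(json.dumps(r, ensure_ascii=False) + "\n" for r in records)


def from_jsonl(text):
    """Parse JSON-lines text back into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]
```

To produce a file, write the string with an explicit encoding, e.g. `Path("my_data.json").write_text(to_jsonl(samples), encoding="utf-8")`, then pass that path to `gl.load_data(...)`.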
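As in the title: training started with 800 samples and used only a little over 10 GB of GPU memory. After switching to 50,000 samples, 32 GB of GPU memory now overflows immediately. What could be the cause?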
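In a standard training loop, per-step GPU memory depends on batch size and sequence length (attention scores grow quadratically with length), not on how many samples the dataset holds. One plausible cause is that the larger dataset contains much longer samples. This matters if batches are padded to their longest sample or if max_seq_len is large. The earlier traceback shows `load_data` forwarding a `max_seq_len` to `GlmDataLoader`. The sketch below compares length distributions across the two files. Character counts stand in for token counts, and the `"completion"` field name is an assumption.

```python
import json
import math


def nearest_rank_percentile(values, q):
    """Nearest-rank percentile for integer q in [0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def sample_lengths(jsonl_text, fields=("prompt", "completion")):
    """Combined character length of the given fields for each JSON-lines record.

    Characters are only a rough proxy for tokens; run the real tokenizer
    for exact numbers. The "completion" field name is an assumption.
    """
    lengths = []
    for line in jsonl_text.splitlines():
        if line.strip():
            record = json.loads(line)
            lengths.append(sum(len(record.get(k, "")) for k in fields))
    return lengths


def summarize(lengths):
    """Summary statistics for comparing the 800-sample and 50k-sample files."""
    return {
        "n": len(lengths),
        "p50": nearest_rank_percentile(lengths, 50),
        "p95": nearest_rank_percentile(lengths, 95),
        "max": max(lengths),
    }
```

Suppose the large file's p95 or max is far above the small file's. Then a first thing to try is lowering max_seq_len or dropping the longest samples. If lengths are similar, the cause is probably elsewhere (e.g. a different batch size).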
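This release took a while. Originally I only wanted to add DDP and be done, but then decided to go all the way and implement FSDP as well. Along the way I considered accelerate and DeepSpeed, and used both, but in the end I decided to write it myself for better control.
I've never wanted to expose too many configuration options and choices; for industrial use it's best to be straightforward and simply pick the best, or a reasonably good, option. So this release involved a great many changes, and a design I settled on one day was often torn down and rewritten halfway through. Even now I'm far from satisfied, and it hasn't been tested very thoroughly yet... but I really couldn't put it off any longer.
This release mainly adds distributed fine-tuning (training), which must be launched from the command line; see the README for details. Everything else is compatible with earlier versions (a few parameters have been slightly adjusted).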
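A few words about LoRA (and similar fine-tuning methods): unlike ordinary full-parameter tuning, it is essentially a way of "perturbing" an existing model, and some experiments have already shown that it causes forgetting. There is also no authoritative evaluation of how well it works for multi-turn dialogue. That said, we keep exploring new fine-tuning methods (as well as older ones that were overlooked at the time because the conditions for them weren't met).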
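For reference, the usual LoRA formulation keeps the pretrained weight $W_0 \in \mathbb{R}^{d \times k}$ frozen and trains only a low-rank additive update. This is what makes it a perturbation of the existing model rather than a rewrite of it:

$$
h = W_0 x + \Delta W\, x = W_0 x + \frac{\alpha}{r} B A x, \qquad B \in \mathbb{R}^{d \times r},\; A \in \mathbb{R}^{r \times k},\; r \ll \min(d, k)
$$

Here $r$ is the rank (presumably what hcgf's `lora_r` argument sets) and $\alpha$ is a scaling hyperparameter. Removing $BA$ recovers the original model exactly. While the update is applied, though, it shifts every output of the adapted layers.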
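The point of fine-tuning is to make the model remember new domain knowledge without losing capability. This is really adding new domain knowledge rather than updating existing knowledge. Of course, you can also have the model update its existing knowledge based on the new domain knowledge. These are two different approaches, and the industry is still exploring both.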
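In practice, the more common and practical approach today is embedding-based retrieval plus LLM-assisted generation, which I cover in the Hugging-LLM tutorial. In short, be clear about your requirements first.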
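The retrieve-then-generate flow can be sketched in a few lines. The documents are scored against the query by embedding similarity, and the top matches are placed in the prompt the LLM answers from. The "embedding" here is a toy character-count vector standing in for a real embedding model. All function names are hypothetical and not part of hcgf or the Hugging-LLM tutorial.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: character counts. A real system would use an embedding model."""
    return Counter(text)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[ch] * b[ch] for ch in a if ch in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query, docs, k=2):
    """Return the k documents most similar to the query."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]


def build_prompt(query, docs, k=2):
    """Put the retrieved passages in front of the question for the LLM to answer from."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs, k))
    return f"Answer using the context below.\nContext:\n{context}\nQuestion: {query}\nAnswer:"
```

The string from `build_prompt` would then be sent to the chat model (e.g. `gl.chat(prompt)`) unchanged. The model itself is not fine-tuned on the documents.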
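I hadn't looked at recent issues, mainly because Gmail unexpectedly stopped notifying me (it used to). I check my email every day but saw no notifications, so I missed new replies and issues. Sorry for not responding sooner.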
After fine-tuning once, loading the model again raises this error. Could someone take a look? Thanks.
As in the title.
gl.load_pretrained("lora-ckpt-last-1110.pt").eval()
Switch to inference mode...
gl.chat("你是谁?")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/mnt/data/dev/anaconda3/envs/pytorch1131/lib/python3.10/site-packages/hcgf-0.0.7-py3.10.egg/hcgf/sft/lora_ft.py", line 253, in chat
File "/mnt/data/dev/anaconda3/envs/pytorch1131/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 43, in generator_context
response = gen.send(None)
File "/mnt/data/dev/anaconda3/envs/pytorch1131/lib/python3.10/site-packages/hcgf-0.0.7-py3.10.egg/hcgf/sft/lora_ft.py", line 225, in stream_chat
File "/mnt/data/dev/anaconda3/envs/pytorch1131/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 43, in generator_context
response = gen.send(None)
File "/mnt/data/dev/anaconda3/envs/pytorch1131/lib/python3.10/site-packages/hcgf-0.0.7-py3.10.egg/hcgf/sft/chatglm/modeling_chatglm.py", line 1345, in stream_generate
File "/mnt/data/dev/anaconda3/envs/pytorch1131/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/mnt/data/dev/anaconda3/envs/pytorch1131/lib/python3.10/site-packages/hcgf-0.0.7-py3.10.egg/hcgf/sft/chatglm/modeling_chatglm.py", line 1148, in forward
File "/mnt/data/dev/anaconda3/envs/pytorch1131/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/mnt/data/dev/anaconda3/envs/pytorch1131/lib/python3.10/site-packages/hcgf-0.0.7-py3.10.egg/hcgf/sft/chatglm/modeling_chatglm.py", line 895, in forward
File "/mnt/data/dev/anaconda3/envs/pytorch1131/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/mnt/data/dev/anaconda3/envs/pytorch1131/lib/python3.10/site-packages/torch/nn/modules/sparse.py", line 160, in forward
return F.embedding(
File "/mnt/data/dev/anaconda3/envs/pytorch1131/lib/python3.10/site-packages/torch/nn/functional.py", line 2210, in embedding
return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0! (when checking argument for argument index in method wrapper__index_select)
File "/home/user/git/hcgf/test2/./webui.py", line 97, in predict
response, history = gl.chat(inp=input, history=history, max_len=max_length,stop=['<eop>'])
File "/home/user/anaconda3/envs/glm-finetune/lib/python3.10/site-packages/hcgf/sft/lora_ft.py", line 250, in chat
custom_stop_tensor_list = create_token_tensor_list(
File "/home/user/anaconda3/envs/glm-finetune/lib/python3.10/site-packages/hcgf/utils/utils.py", line 60, in create_token_tensor_list
tids = tokenizer(
IndexError: slice() cannot be applied to a 0-dim tensor.
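As in the title: the generated output sometimes contains long repeated passages. Can this be fixed by adding a loss constraint or by post-processing?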
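On the decoding side, Hugging Face `generate()` accepts `repetition_penalty` and `no_repeat_ngram_size`. This page does not show whether hcgf's `gl.chat` exposes them. For post-processing, one simple heuristic (a hypothetical helper, not part of hcgf) cuts the response right after the first chunk of at least `min_len` characters that immediately repeats itself.

```python
def truncate_repetition(text, min_len=4):
    """Cut text right after the first chunk (>= min_len chars) that is immediately repeated.

    Scans start positions left to right and, at each, chunk lengths from
    min_len up to half the remaining text; O(n^3) worst case, which is fine
    for chat-length responses.
    """
    n = len(text)
    for i in range(n):
        for length in range(min_len, (n - i) // 2 + 1):
            if text[i:i + length] == text[i + length:i + 2 * length]:
                return text[:i + length]
    return text
```

For example, `truncate_repetition("abcdXYZXYZXYZ", min_len=3)` keeps `"abcdXYZ"`. A small `min_len` also trims legitimate repeats such as "hahahaha", so tune it on your own outputs.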
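# Fine-tuning
import hcgf
gl = hcgf.GlmLora("THUDM/chatglm-6b", device="cuda:0")
gl.load_data("./data/chatgpt_finetune_faq.json").tune()
# Inference
import hcgf
gl = hcgf.GlmLora("THUDM/chatglm-6b", device="cuda:0", infer_mode=True)
gl.load_pretrained("/path/to/lora_pt").eval()
gl.chat("你是谁?")
This is the fine-tuning and inference code. Is there a way to merge the two while calling GlmLora only once (or reusing GPU memory as much as possible)? I'd like to fine-tune first, run inference on the fine-tuned result right away, and then continue fine-tuning, instead of reloading the model over and over.
I tried, but it didn't work; there seems to be a half/float conflict, and I don't know how to combine fine-tuning and inference cleanly.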
ModuleNotFoundError Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_15056\644290863.py in <module>
----> 1 import hcgf
2 gl = hcgf.GlmLora("model", load_in_8bit=True, lora_r=32)
NameError Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_15056\3044540026.py in <module>
----> 1 gl.load_data("./tests/test_data.json").tune()
NameError: name 'gl' is not defined
Beginner question: how do I set the number of steps after which training stops, and how do I save the fine-tuned model?
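gl.chat supports multi-turn dialogue, but the provided training examples are all single-turn. How should a multi-turn dataset be set up? Would it work to simply prepend the dialogue history to "prompt"?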
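One option is to render the history into "prompt" in the same format the model sees at inference time, so training and chat inputs match. One revision of ChatGLM-6B's own `chat()` flattens history as `[Round i]\n问：...\n答：...\n` rounds. Whether hcgf's bundled modeling_chatglm.py uses exactly this format is an assumption to verify, and the format changed across ChatGLM revisions. The `"completion"` key is also assumed. Both helpers are hypothetical.

```python
def build_multiturn_prompt(history, query):
    """Render (question, answer) history plus a new query as one prompt string.

    Mirrors the round format of one revision of ChatGLM-6B's chat();
    check the modeling_chatglm.py bundled with hcgf before relying on it.
    """
    if not history:
        return query
    prompt = ""
    for i, (old_query, response) in enumerate(history):
        prompt += "[Round {}]\n问：{}\n答：{}\n".format(i, old_query, response)
    prompt += "[Round {}]\n问：{}\n答：".format(len(history), query)
    return prompt


def make_sample(dialogue):
    """Turn a dialogue [(q1, a1), ..., (qn, an)] into one training record.

    The "completion" key is an assumed field name; match your data loader.
    """
    *history, (last_q, last_a) = dialogue
    return {"prompt": build_multiturn_prompt(history, last_q), "completion": last_a}
```

A dialogue of n turns can also yield n records, one for each prefix, so the model learns to answer at every turn.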
(glm-finetune) user@calculator:~/git/hcgf/test2$ python3 infer.py
Loading tokenizer and model of THUDM/chatglm-6b
Loading checkpoint shards: 100%|█████████████████████████████| 8/8 [00:05<00:00, 1.60it/s]
Processing peft model
/home/user/anaconda3/envs/glm-finetune/lib/python3.10/site-packages/peft/tuners/lora.py:173: UserWarning: fan_in_fan_out is set to True but the target module is not a Conv1D. Setting fan_in_fan_out to False.
warnings.warn(
trainable params: 3670016 || all params: 6258876416
trainable%: 0.05863697820615348
Traceback (most recent call last):
File "/home/user/git/hcgf/test2/infer.py", line 4, in <module>
gl.load_pretrained("output/ckpt/lora-ckpt-last-52.pt").eval()
File "/home/user/anaconda3/envs/glm-finetune/lib/python3.10/site-packages/hcgf/sft/lora_ft.py", line 148, in eval
self.model.to(self.device).eval()
File "/home/user/anaconda3/envs/glm-finetune/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1930, in eval
return self.train(False)
File "/home/user/anaconda3/envs/glm-finetune/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1911, in train
module.train(mode)
File "/home/user/anaconda3/envs/glm-finetune/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1911, in train
module.train(mode)
File "/home/user/anaconda3/envs/glm-finetune/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1911, in train
module.train(mode)
[Previous line repeated 4 more times]
File "/home/user/anaconda3/envs/glm-finetune/lib/python3.10/site-packages/peft/tuners/lora.py", line 417, in train
delta_w = F.conv1d(
RuntimeError: Expected 4-dimensional input for 4-dimensional weight [8192, 8, 1, 1], but got 3-dimensional input of size [1, 16, 4096] instead
Inference code:
# Inference
import hcgf
gl = hcgf.GlmLora("THUDM/chatglm-6b", device="cuda:0", infer_mode=True)
gl.load_pretrained("output/ckpt/lora-ckpt-last-52.pt").eval()
gl.chat("你是谁?")
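Every time I fine-tune, lora.pt is always 14 MB, which makes me worry there's a bottleneck. Is its size determined by the number of samples, or do other parameters need to be set?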
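The checkpoint size follows from the number of LoRA parameters, not from the number of training samples. The log above reports `trainable params: 3670016`. Assume ChatGLM-6B's shapes (28 layers, hidden size 4096, a fused `query_key_value` projection from 4096 to 3 × 4096), rank r = 8, and LoRA applied only to that projection. Those assumptions reproduce the logged count exactly. At 4 bytes per fp32 value, that count is exactly 14 MiB.

```python
def lora_param_count(d_in, d_out, r, n_layers):
    """Trainable LoRA parameters when one d_in -> d_out linear per layer gets
    A (r x d_in) and B (d_out x r)."""
    return n_layers * r * (d_in + d_out)


def checkpoint_bytes(n_params, bytes_per_param=4):
    """Raw tensor size: 4 bytes per fp32 value, 2 per fp16/bf16."""
    return n_params * bytes_per_param


# Assumed ChatGLM-6B shapes: 28 layers, fused query_key_value 4096 -> 3 * 4096.
n_params = lora_param_count(4096, 3 * 4096, r=8, n_layers=28)
size_mib = checkpoint_bytes(n_params) / 2**20
```

So the file grows with `lora_r`, the set of targeted modules, and the stored dtype, and the pickle adds a little overhead. With `lora_r=32`, as in another issue on this page, it would be about four times larger. More data changes the values of these parameters but not their count.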
# Fine-tuning
import hcgf
gl = hcgf.GlmLora("THUDM/chatglm-6b", device="cuda:0")
gl.load_data("./data/chatgpt_finetune_faq.json").tune()
I don't see how to set the save path...