xxw1995 / chatglm3-finetune
The easiest zero-barrier chatglm3 & agent & langchain project to get started with
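Did the author remove the bge-large-zh model? Running the agent directly does not work.
Single-GPU speed is still too slow.
The corresponding path does not exist.
Running the agent code requires the file data/npc_data.csv, but it is not included in the project.
Has the author run into this problem during inference with infer.py?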
Traceback (most recent call last):
File "infer.py", line 48, in <module>
out = model.generate(
File "/data/Wangkh/anaconda3/envs/langchain/lib/python3.8/site-packages/peft/peft_model.py", line 1130, in generate
outputs = self.base_model.generate(**kwargs)
File "/data/Wangkh/anaconda3/envs/langchain/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/data/Wangkh/anaconda3/envs/langchain/lib/python3.8/site-packages/transformers/generation/utils.py", line 1572, in generate
return self.sample(
File "/data/Wangkh/anaconda3/envs/langchain/lib/python3.8/site-packages/transformers/generation/utils.py", line 2655, in sample
next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
When running finetune, the following error appears:
Traceback (most recent call last):
File "D:\dev\chatglm3-finetune\finetune.py", line 8, in <module>
from model.modeling_chatglm import ChatGLMForConditionalGeneration
ModuleNotFoundError: No module named 'model'
The first line in finetune.py is
from model.modeling_chatglm import ChatGLMForConditionalGeneration
Should this be the chatglm model instead?
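The import in the traceback only resolves when a `model` directory containing `modeling_chatglm.py` is on Python's import path, typically next to finetune.py. Below is a minimal sketch of a pre-flight check under that assumption. The helper name `missing_model_files` is hypothetical, and the expected layout is inferred only from the import statement, not confirmed by the repository.

```python
from pathlib import Path


def missing_model_files(repo_dir, required=("modeling_chatglm.py",)):
    """Return the files that `from model.<name> import ...` needs but cannot find.

    Assumption: the repository expects a local `model/` directory next to
    finetune.py. This is inferred from the import in the traceback.
    """
    model_dir = Path(repo_dir) / "model"
    return [name for name in required if not (model_dir / name).is_file()]


if __name__ == "__main__":
    missing = missing_model_files(".")
    if missing:
        print("model/ is missing:", ", ".join(missing))
    else:
        print("model/ looks complete")
```

If the check reports missing files, copying the ChatGLM3 model files into `model/` (or running the script from the directory that contains it) is one plausible way to make the import resolve.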
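Have you ever run into a case where, when training chatglm3 with LoRA, the saved checkpoint contains a 12 GB pytorch_model.bin instead of an adapter_model.bin of a few dozen MB?
The outputs are all very short fragments. What went wrong?
During preprocessing I already increased max_seq_length to 2500, but the output almost never exceeds 50 Chinese characters.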
python tokenize_dataset_rows.py --jsonl_path ./alpaca_data.jsonl --save_path ./alpaca --max_seq_length 2500
2023-11-03 20:10:25,978 - WARNING - Loading data...
Traceback (most recent call last):
File "D:\test\chatglm3-base-tuning-master\train.py", line 52, in <module>
trainer.train()
File "D:\test\chatglm3-base-tuning-master\trainer.py", line 19, in train
self.data_module = ChatDataModule(
^^^^^^^^^^^^^^^
File "D:\test\chatglm3-base-tuning-master\chat_data_module.py", line 75, in __init__
self.train_dataset = ChatDataset(tokenizer=tokenizer, data_path=data_path_train, max_tokens=max_tokens)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\test\chatglm3-base-tuning-master\chat_data_module.py", line 37, in __init__
conversations = jload(data_path)
^^^^^^^^^^^^^^^^
File "D:\test\chatglm3-base-tuning-master\chat_data_module.py", line 28, in jload
jdict = json.load(f)
^^^^^^^^^^^^
File "D:\test\chatglm3-base-tuning-master\env\Lib\json\__init__.py", line 293, in load
return loads(fp.read(),
^^^^^^^^^
UnicodeDecodeError: 'gbk' codec can't decode byte 0xa4 in position 64: illegal multibyte sequence
The file used is formatted_samples.json.
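The `'gbk' codec can't decode` message indicates that the file was opened with the platform's default encoding rather than UTF-8, which is what Python's `open()` does on Windows when no `encoding` is given. A minimal sketch of a loader that passes the encoding explicitly follows. It assumes formatted_samples.json is UTF-8 encoded, and the function is a stand-in for the `jload` in the traceback, whose real signature is not shown.

```python
import json


def jload(path, encoding="utf-8"):
    """Load a JSON file, decoding it as UTF-8 instead of the platform default."""
    # Without an explicit encoding, open() falls back to the locale's code page
    # (for example gbk), which fails on UTF-8 multibyte characters.
    with open(path, "r", encoding=encoding) as f:
        return json.load(f)
```

Calling `jload("formatted_samples.json")` would then read the file the same way on Windows and Linux.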
This is a great repository that provides fine-tuning for ChatGLM3. But when I followed the process in the README and ran the tokenize_dataset_rows.py script, it reported these errors:
`python tokenize_dataset_rows.py --jsonl_path ./alpaca_data.jsonl --save_path ./alpaca --max_seq_length 200
Downloading and preparing dataset generator/default to C:/Users/yt758/.cache/huggingface/datasets/generator/default-10116cbfdb8a1e8b/0.0.0...
HF google storage unreachable. Downloading and preparing it from source
Generating train split: 0 examples [00:00, ? examples/s]'(MaxRetryError("HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /model/resolve/main/tokenizer_config.json (Caused by ProxyError('Unable to connect to proxy', SSLError(SSLZeroReturnError(6, 'TLS/SSL connection has been closed (EOF) (_ssl.c:1131)'))))"), '(Request ID: 09963958-bc38-4941-bfac-92e4491eae09)')' thrown while requesting HEAD https://huggingface.co/model/resolve/main/tokenizer_config.json
Generating train split: 0 examples [00:02, ? examples/s]urllib3.exceptions.SSLError: TLS/SSL connection has been closed (EOF) (_ssl.c:1131)
The above exception was the direct cause of the following exception:
urllib3.exceptions.ProxyError: ('Unable to connect to proxy', SSLError(SSLZeroReturnError(6, 'TLS/SSL connection has been closed (EOF) (_ssl.c:1131)')))
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "C:\software\anaconda3\envs\t001\lib\site-packages\requests\adapters.py", line 486, in send
resp = conn.urlopen(
File "C:\software\anaconda3\envs\t001\lib\site-packages\urllib3\connectionpool.py", line 845, in urlopen
retries = retries.increment(
File "C:\software\anaconda3\envs\t001\lib\site-packages\urllib3\util\retry.py", line 515, in increment
raise MaxRetryError(_pool, url, reason) from reason # type: ignore[arg-type]
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /model/resolve/main/tokenizer_config.json (Caused by ProxyError('Unable to connect to proxy', SSLError(SSLZeroReturnError(6, 'TLS/SSL connection has been closed (EOF) (_ssl.c:1131)'))))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\software\anaconda3\envs\t001\lib\site-packages\datasets\builder.py", line 1608, in _prepare_split_single
for key, record in generator:
File "C:\software\anaconda3\envs\t001\lib\site-packages\datasets\packaged_modules\generator\generator.py", line 30, in _generate_examples
for idx, ex in enumerate(self.config.generator(**gen_kwargs)):
File "tokenize_dataset_rows.py", line 21, in read_jsonl
tokenizer = transformers.AutoTokenizer.from_pretrained(
File "C:\software\anaconda3\envs\t001\lib\site-packages\transformers\models\auto\tokenization_auto.py", line 643, in from_pretrained
tokenizer_config = get_tokenizer_config(pretrained_model_name_or_path, **kwargs)
File "C:\software\anaconda3\envs\t001\lib\site-packages\transformers\models\auto\tokenization_auto.py", line 487, in get_tokenizer_config
resolved_config_file = cached_file(
File "C:\software\anaconda3\envs\t001\lib\site-packages\transformers\utils\hub.py", line 417, in cached_file
resolved_file = hf_hub_download(
File "C:\software\anaconda3\envs\t001\lib\site-packages\huggingface_hub\utils\_validators.py", line 118, in _inner_fn
return fn(*args, **kwargs)
File "C:\software\anaconda3\envs\t001\lib\site-packages\huggingface_hub\file_download.py", line 1233, in hf_hub_download
metadata = get_hf_file_metadata(
File "C:\software\anaconda3\envs\t001\lib\site-packages\huggingface_hub\utils\_validators.py", line 118, in _inner_fn
return fn(*args, **kwargs)
File "C:\software\anaconda3\envs\t001\lib\site-packages\huggingface_hub\file_download.py", line 1613, in get_hf_file_metadata
r = _request_wrapper(
File "C:\software\anaconda3\envs\t001\lib\site-packages\huggingface_hub\file_download.py", line 418, in _request_wrapper
response = _request_wrapper(
File "C:\software\anaconda3\envs\t001\lib\site-packages\huggingface_hub\file_download.py", line 453, in _request_wrapper
return http_backoff(
File "C:\software\anaconda3\envs\t001\lib\site-packages\huggingface_hub\utils\_http.py", line 274, in http_backoff
raise err
File "C:\software\anaconda3\envs\t001\lib\site-packages\huggingface_hub\utils\_http.py", line 258, in http_backoff
response = session.request(method=method, url=url, **kwargs)
File "C:\software\anaconda3\envs\t001\lib\site-packages\requests\sessions.py", line 589, in request
resp = self.send(prep, **send_kwargs)
File "C:\software\anaconda3\envs\t001\lib\site-packages\requests\sessions.py", line 703, in send
r = adapter.send(request, **kwargs)
File "C:\software\anaconda3\envs\t001\lib\site-packages\huggingface_hub\utils\_http.py", line 63, in send
return super().send(request, *args, **kwargs)
File "C:\software\anaconda3\envs\t001\lib\site-packages\requests\adapters.py", line 513, in send
raise ProxyError(e, request=request)
requests.exceptions.ProxyError: (MaxRetryError("HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /model/resolve/main/tokenizer_config.json (Caused by ProxyError('Unable to connect to proxy', SSLError(SSLZeroReturnError(6, 'TLS/SSL connection has been closed (EOF) (_ssl.c:1131)'))))"), '(Request ID: 09963958-bc38-4941-bfac-92e4491eae09)')
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "tokenize_dataset_rows.py", line 48, in <module>
main()
File "tokenize_dataset_rows.py", line 42, in main
dataset = datasets.Dataset.from_generator(
File "C:\software\anaconda3\envs\t001\lib\site-packages\datasets\arrow_dataset.py", line 1012, in from_generator
return GeneratorDatasetInputStream(
File "C:\software\anaconda3\envs\t001\lib\site-packages\datasets\io\generator.py", line 47, in read
self.builder.download_and_prepare(
File "C:\software\anaconda3\envs\t001\lib\site-packages\datasets\builder.py", line 872, in download_and_prepare
self._download_and_prepare(
File "C:\software\anaconda3\envs\t001\lib\site-packages\datasets\builder.py", line 1649, in _download_and_prepare
super()._download_and_prepare(
File "C:\software\anaconda3\envs\t001\lib\site-packages\datasets\builder.py", line 967, in _download_and_prepare
self._prepare_split(split_generator, **prepare_split_kwargs)
File "C:\software\anaconda3\envs\t001\lib\site-packages\datasets\builder.py", line 1488, in _prepare_split
for job_id, done, content in self._prepare_split_single(
File "C:\software\anaconda3\envs\t001\lib\site-packages\datasets\builder.py", line 1644, in _prepare_split_single
raise DatasetGenerationError("An error occurred while generating the dataset") from e
datasets.builder.DatasetGenerationError: An error occurred while generating the dataset`
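The failing request targets `https://huggingface.co/model/resolve/main/tokenizer_config.json`, which suggests that the string `model` was passed to `AutoTokenizer.from_pretrained` and, because no usable local directory of that name was found, was looked up as a Hub repository ID. Below is a minimal sketch of a check that fails early with a clearer message. The helper name `require_local_tokenizer_dir` is hypothetical, and the assumption that the script expects a local `model` directory is inferred from the URL only.

```python
from pathlib import Path


def require_local_tokenizer_dir(path):
    """Return `path` as a string if it holds tokenizer_config.json, else raise.

    Assumption: the tokenizer is meant to be loaded from a local directory,
    so a missing directory should be reported instead of triggering a download.
    """
    config = Path(path) / "tokenizer_config.json"
    if not config.is_file():
        raise FileNotFoundError(
            f"{config} not found; place the ChatGLM3 model files there "
            "or pass the correct local path"
        )
    return str(Path(path))
```

The returned path could then be handed to `transformers.AutoTokenizer.from_pretrained(..., trust_remote_code=True)`, so that a missing directory surfaces as a `FileNotFoundError` rather than a proxy or TLS error.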
Traceback (most recent call last):
File "finetune.py", line 70, in <module>
main()
File "finetune.py", line 55, in main
model = get_peft_model(model, peft_config).to("cuda")
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 989, in to
return self._apply(convert)
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 641, in _apply
module._apply(fn)
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 641, in _apply
module._apply(fn)
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 641, in _apply
module._apply(fn)
[Previous line repeated 5 more times]
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 664, in _apply
param_applied = fn(param)
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 987, in convert
return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 72.00 MiB (GPU 0; 22.03 GiB total capacity; 20.75 GiB already allocated; 56.88 MiB free; 21.26 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
(chatglm3-finetune) root@g101:/data/ChatGLM3/chatglm3-finetune# python finetune.py --dataset_path ./alpaca --lora_rank 4 --per_device_train_batch_size 1 --gradient_accumulation_steps 1 --max_steps 52000 --save_steps 1000 --save_total_limit 20 --learning_rate 1e-4 --remove_unused_columns false --logging_steps 50 --output_dir output
The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
Loading checkpoint shards: 100%|████████████████████████████| 7/7 [00:08<00:00, 1.22s/it]
Traceback (most recent call last):
File "/data/ChatGLM3/chatglm3-finetune/finetune.py", line 70, in <module>
main()
File "/data/ChatGLM3/chatglm3-finetune/finetune.py", line 55, in main
model = get_peft_model(model, peft_config).to("cuda:1")
File "/root/miniconda3/envs/chatglm3-finetune/lib/python3.10/site-packages/torch/nn/modules/module.py", line 989, in to
return self._apply(convert)
File "/root/miniconda3/envs/chatglm3-finetune/lib/python3.10/site-packages/torch/nn/modules/module.py", line 641, in _apply
module._apply(fn)
File "/root/miniconda3/envs/chatglm3-finetune/lib/python3.10/site-packages/torch/nn/modules/module.py", line 641, in _apply
module._apply(fn)
File "/root/miniconda3/envs/chatglm3-finetune/lib/python3.10/site-packages/torch/nn/modules/module.py", line 641, in _apply
module._apply(fn)
[Previous line repeated 1 more time]
File "/root/miniconda3/envs/chatglm3-finetune/lib/python3.10/site-packages/torch/nn/modules/module.py", line 664, in _apply
param_applied = fn(param)
File "/root/miniconda3/envs/chatglm3-finetune/lib/python3.10/site-packages/torch/nn/modules/module.py", line 987, in convert
return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 1016.00 MiB (GPU 1; 23.69 GiB total capacity; 22.27 GiB already allocated; 691.69 MiB free; 22.66 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
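Both tracebacks fail inside `.to("cuda")`, that is, while the weights are being moved to the GPU and before any training step runs. One possible explanation, stated here as an assumption, is that the weights are held in float32. The sketch below estimates weight memory alone for a model of roughly 6.2 billion parameters (an approximate figure assumed for ChatGLM3-6B); activations, gradients, and optimizer state are not included.

```python
GIB = 1024 ** 3

# Assumed, approximate parameter count for a "6B" model.
N_PARAMS = 6.2e9


def weight_memory_gib(n_params, bytes_per_param):
    """Memory needed to hold the weights alone, in GiB."""
    return n_params * bytes_per_param / GIB


if __name__ == "__main__":
    for dtype_name, size in (("float32", 4), ("float16", 2)):
        print(f"{dtype_name}: {weight_memory_gib(N_PARAMS, size):.1f} GiB")
```

Under these assumptions float32 weights come to roughly 23 GiB, which is close to the allocated amounts reported in both errors and would not fit comfortably on a 22 to 24 GiB card, while half precision needs about half of that. Whether the script actually loads in float32 would need to be checked in finetune.py.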
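I added eval_dataset to the trainer and wrote a compute_metric function to compute some metrics during eval, such as precision/recall for function calling and the BLEU score of the reply text.
I ran into a problem: memory surges during evaluation. Training used 10+ GB of GPU memory, but during eval it suddenly grew to 60+ GB and eventually hit OOM.
Have you run into anything similar?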
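A common cause of this pattern with the Hugging Face `Trainer` is that, when a metrics function is supplied, the predictions for every evaluation batch are accumulated before the function is called, and for a language model those predictions are full-vocabulary logits. The sketch below estimates how large that tensor gets. The sample count and sequence length are illustrative, and the vocabulary size of 65024 is an assumed value for ChatGLM3.

```python
GIB = 1024 ** 3


def eval_logits_gib(n_samples, seq_len, vocab_size, bytes_per_value=4):
    """Size in GiB of the logits if every evaluation sample's logits are kept."""
    return n_samples * seq_len * vocab_size * bytes_per_value / GIB


if __name__ == "__main__":
    # Illustrative numbers only: 200 samples of 512 tokens, assumed vocab 65024.
    print(f"{eval_logits_gib(200, 512, 65024):.1f} GiB of float32 logits")
```

Two options that `transformers` provides for this situation are the `preprocess_logits_for_metrics` argument of `Trainer`, which can reduce logits to token IDs (for example with an argmax) before they are accumulated, and `eval_accumulation_steps` in `TrainingArguments`, which moves accumulated predictions to the CPU every few steps. Whether either resolves this particular case depends on how the metrics are computed.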