
chatglm3-finetune's Issues



RuntimeError: probability tensor contains either `inf`, `nan` or element < 0

Traceback (most recent call last):
File "", line 48, in
out = model.generate(
File "/data/Wangkh/anaconda3/envs/langchain/lib/python3.8/site-packages/peft/", line 1130, in generate
outputs = self.base_model.generate(**kwargs)
File "/data/Wangkh/anaconda3/envs/langchain/lib/python3.8/site-packages/torch/utils/", line 115, in decorate_context
return func(*args, **kwargs)
File "/data/Wangkh/anaconda3/envs/langchain/lib/python3.8/site-packages/transformers/generation/", line 1572, in generate
return self.sample(
File "/data/Wangkh/anaconda3/envs/langchain/lib/python3.8/site-packages/transformers/generation/", line 2655, in sample
next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
RuntimeError: probability tensor contains either inf, nan or element < 0
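
The error message names three conditions: `inf`, `nan`, or an element below zero. To see which one applies, the same conditions can be tested on a plain list of probabilities. This is a minimal sketch in pure Python, not the check that `torch.multinomial` actually runs; `diagnose_probs` and `softmax` are hypothetical helpers written for illustration.

```python
import math

def diagnose_probs(probs):
    """Return the problems found in a probability vector.

    Covers the three conditions named in the error message:
    inf, nan, or an element < 0. Hypothetical helper.
    """
    problems = []
    if any(math.isinf(p) for p in probs):
        problems.append("inf")
    if any(math.isnan(p) for p in probs):
        problems.append("nan")
    if any(p < 0 for p in probs):
        problems.append("negative")
    return problems

def softmax(logits):
    """Max-subtracted softmax; one infinite logit still turns every output into nan."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]  # inf - inf evaluates to nan
    total = sum(exps)
    return [e / total for e in exps]

# A single overflowed logit is enough to poison the whole distribution.
print(diagnose_probs(softmax([1.0, float("inf")])))
```

If the logits coming out of the fine-tuned model already contain `inf` or `nan`, sampling fails no matter which generation settings are used, so inspecting the logits before sampling is a reasonable first step.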

No module named 'model'

Running the finetune script fails with this error:

Traceback (most recent call last):
File "D:\dev\chatglm3-finetune\", line 8, in
from model.modeling_chatglm import ChatGLMForConditionalGeneration
ModuleNotFoundError: No module named 'model'

The first import in the script is:
from model.modeling_chatglm import ChatGLMForConditionalGeneration

Is `model` here supposed to be the ChatGLM model?
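
The import `from model.modeling_chatglm import ...` only works if Python can find a top-level package named `model` on `sys.path`, typically a `model` directory next to the script. Here is a minimal sketch for checking that. It assumes nothing about the repository layout beyond the import line, and `can_import` is a hypothetical helper.

```python
import importlib.util
import sys

def can_import(module_name, extra_path=None):
    """Report whether a top-level module resolves.

    If extra_path is given, it is prepended to sys.path first
    (note: this modifies sys.path for the rest of the process).
    """
    if extra_path is not None and extra_path not in sys.path:
        sys.path.insert(0, extra_path)
    return importlib.util.find_spec(module_name) is not None

# False here means no "model" package is visible from the current directory.
print(can_import("model"))
```

If this prints `False`, the directory that holds `modeling_chatglm.py` is either missing or not named `model` relative to where the script is launched.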




python --jsonl_path ./alpaca_data.jsonl --save_path ./alpaca --max_seq_length 2500

UnicodeDecodeError: 'gbk' codec can't decode byte 0xa4 in position 64: illegal multibyte sequence

2023-11-03 20:10:25,978 - WARNING - Loading data...
Traceback (most recent call last):
File "D:\test\chatglm3-base-tuning-master\", line 52, in
File "D:\test\chatglm3-base-tuning-master\", line 19, in train
self.data_module = ChatDataModule(
File "D:\test\chatglm3-base-tuning-master\", line 75, in init
self.train_dataset = ChatDataset(tokenizer=tokenizer, data_path=data_path_train, max_tokens=max_tokens)
File "D:\test\chatglm3-base-tuning-master\", line 37, in init
conversations = jload(data_path)
File "D:\test\chatglm3-base-tuning-master\", line 28, in jload
jdict = json.load(f)
File "D:\test\chatglm3-base-tuning-master\env\Lib\", line 293, in load
return loads(,
UnicodeDecodeError: 'gbk' codec can't decode byte 0xa4 in position 64: illegal multibyte sequence
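
The traceback shows `json.load` reading a file that was opened with the platform default codec (`gbk` on this Windows machine). If the data file is UTF-8, which is an assumption here, passing the encoding explicitly avoids the error. Here is a minimal sketch. The name `jload` is borrowed from the traceback, but the body is illustrative and not the repository's actual code.

```python
import json

def jload(path, encoding="utf-8"):
    """Load a JSON file with an explicit encoding instead of the locale default."""
    with open(path, "r", encoding=encoding) as f:
        return json.load(f)
```

Usage would be `conversations = jload(data_path)`, the same call that appears in the traceback.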


Something wrong in the data preprocessing

This is a great repository that provides fine-tuning for ChatGLM3. However, when I followed the steps in the README and ran the scripts, I got the following errors:

`python --jsonl_path ./alpaca_data.jsonl --save_path ./alpaca --max_seq_length 200
Downloading and preparing dataset generator/default to C:/Users/yt758/.cache/huggingface/datasets/generator/default-10116cbfdb8a1e8b/0.0.0...
HF google storage unreachable. Downloading and preparing it from source
Generating train split: 0 examples [00:00, ? examples/s]'(MaxRetryError("HTTPSConnectionPool(host='', port=443): Max retries exceeded with url: /model/resolve/main/tokenizer_config.json (Caused by ProxyError('Unable to connect to proxy', SSLError(SSLZeroReturnError(6, 'TLS/SSL connection has been closed (EOF) (_ssl.c:1131)'))))"), '(Request ID: 09963958-bc38-4941-bfac-92e4491eae09)')' thrown while requesting HEAD
Generating train split: 0 examples [00:02, ? examples/s]urllib3.exceptions.SSLError: TLS/SSL connection has been closed (EOF) (_ssl.c:1131)

The above exception was the direct cause of the following exception:

urllib3.exceptions.ProxyError: ('Unable to connect to proxy', SSLError(SSLZeroReturnError(6, 'TLS/SSL connection has been closed (EOF) (_ssl.c:1131)')))

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "C:\software\anaconda3\envs\t001\lib\site-packages\requests\", line 486, in send
resp = conn.urlopen(
File "C:\software\anaconda3\envs\t001\lib\site-packages\urllib3\", line 845, in urlopen
retries = retries.increment(
File "C:\software\anaconda3\envs\t001\lib\site-packages\urllib3\util\", line 515, in increment
raise MaxRetryError(_pool, url, reason) from reason # type: ignore[arg-type]
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='', port=443): Max retries exceeded with url: /model/resolve/main/tokenizer_config.json (Caused by ProxyError('Unable to connect to proxy', SSLError(SSLZeroReturnError(6, 'TLS/SSL connection has been closed (EOF) (_ssl.c:1131)'))))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "C:\software\anaconda3\envs\t001\lib\site-packages\datasets\", line 1608, in _prepare_split_single
for key, record in generator:
File "C:\software\anaconda3\envs\t001\lib\site-packages\datasets\packaged_modules\generator\", line 30, in _generate_examples
for idx, ex in enumerate(self.config.generator(**gen_kwargs)):
File "", line 21, in read_jsonl
tokenizer = transformers.AutoTokenizer.from_pretrained(
File "C:\software\anaconda3\envs\t001\lib\site-packages\transformers\models\auto\", line 643, in from_pretrained
tokenizer_config = get_tokenizer_config(pretrained_model_name_or_path, **kwargs)
File "C:\software\anaconda3\envs\t001\lib\site-packages\transformers\models\auto\", line 487, in get_tokenizer_config
resolved_config_file = cached_file(
File "C:\software\anaconda3\envs\t001\lib\site-packages\transformers\utils\", line 417, in cached_file
resolved_file = hf_hub_download(
File "C:\software\anaconda3\envs\t001\lib\site-packages\huggingface_hub\", line 118, in _inner_fn
return fn(*args, **kwargs)
File "C:\software\anaconda3\envs\t001\lib\site-packages\huggingface_hub\", line 1233, in hf_hub_download
metadata = get_hf_file_metadata(
File "C:\software\anaconda3\envs\t001\lib\site-packages\huggingface_hub\", line 118, in _inner_fn
return fn(*args, **kwargs)
File "C:\software\anaconda3\envs\t001\lib\site-packages\huggingface_hub\", line 1613, in get_hf_file_metadata
r = _request_wrapper(
File "C:\software\anaconda3\envs\t001\lib\site-packages\huggingface_hub\", line 418, in _request_wrapper
response = _request_wrapper(
File "C:\software\anaconda3\envs\t001\lib\site-packages\huggingface_hub\", line 453, in _request_wrapper
return http_backoff(
File "C:\software\anaconda3\envs\t001\lib\site-packages\huggingface_hub\", line 274, in http_backoff
raise err
File "C:\software\anaconda3\envs\t001\lib\site-packages\huggingface_hub\", line 258, in http_backoff
response = session.request(method=method, url=url, **kwargs)
File "C:\software\anaconda3\envs\t001\lib\site-packages\requests\", line 589, in request
resp = self.send(prep, **send_kwargs)
File "C:\software\anaconda3\envs\t001\lib\site-packages\requests\", line 703, in send
r = adapter.send(request, **kwargs)
File "C:\software\anaconda3\envs\t001\lib\site-packages\huggingface_hub\", line 63, in send
return super().send(request, *args, **kwargs)
File "C:\software\anaconda3\envs\t001\lib\site-packages\requests\", line 513, in send
raise ProxyError(e, request=request)
requests.exceptions.ProxyError: (MaxRetryError("HTTPSConnectionPool(host='', port=443): Max retries exceeded with url: /model/resolve/main/tokenizer_config.json (Caused by ProxyError('Unable to connect to proxy', SSLError(SSLZeroReturnError(6, 'TLS/SSL connection has been closed (EOF) (_ssl.c:1131)'))))"), '(Request ID: 09963958-bc38-4941-bfac-92e4491eae09)')

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "", line 48, in
File "", line 42, in main
dataset = datasets.Dataset.from_generator(
File "C:\software\anaconda3\envs\t001\lib\site-packages\datasets\", line 1012, in from_generator
return GeneratorDatasetInputStream(
File "C:\software\anaconda3\envs\t001\lib\site-packages\datasets\io\", line 47, in read
File "C:\software\anaconda3\envs\t001\lib\site-packages\datasets\", line 872, in download_and_prepare
File "C:\software\anaconda3\envs\t001\lib\site-packages\datasets\", line 1649, in _download_and_prepare
File "C:\software\anaconda3\envs\t001\lib\site-packages\datasets\", line 967, in _download_and_prepare
self._prepare_split(split_generator, **prepare_split_kwargs)
File "C:\software\anaconda3\envs\t001\lib\site-packages\datasets\", line 1488, in _prepare_split
for job_id, done, content in self._prepare_split_single(
File "C:\software\anaconda3\envs\t001\lib\site-packages\datasets\", line 1644, in _prepare_split_single
raise DatasetGenerationError("An error occurred while generating the dataset") from e
datasets.builder.DatasetGenerationError: An error occurred while generating the dataset`
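
The failing request is for `/model/resolve/main/tokenizer_config.json`, which suggests the tokenizer was requested under the name `model` and then looked up online. This is a reading of the log, not something the source confirms. `AutoTokenizer.from_pretrained` loads from disk when given an existing local directory, so one quick check is whether that directory actually holds the tokenizer files. Here is a minimal sketch, where `has_local_tokenizer` is a hypothetical helper:

```python
import os

def has_local_tokenizer(model_dir):
    """True if model_dir contains a tokenizer_config.json file."""
    return os.path.isfile(os.path.join(model_dir, "tokenizer_config.json"))

# "model" is the path inferred from the request URL in the log above.
print(has_local_tokenizer("model"))
```

If this prints `False`, the script falls back to a network download, which is where the proxy error is raised.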

It is recommended to support multiple GPU cards

Traceback (most recent call last):
File "", line 70, in
File "", line 55, in main
model = get_peft_model(model, peft_config).to("cuda")
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/", line 989, in to
return self._apply(convert)
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/", line 641, in _apply
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/", line 641, in _apply
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/", line 641, in _apply
[Previous line repeated 5 more times]
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/", line 664, in _apply
param_applied = fn(param)
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/", line 987, in convert
return, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 72.00 MiB (GPU 0; 22.03 GiB total capacity; 20.75 GiB already allocated; 56.88 MiB free; 21.26 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

FineTune CUDA out of memory

(chatglm3-finetune) root@g101:/data/ChatGLM3/chatglm3-finetune# python --dataset_path ./alpaca --lora_rank 4 --per_device_train_batch_size 1 --gradient_accumulation_steps 1 --max_steps 52000 --save_steps 1000 --save_total_limit 20 --learning_rate 1e-4 --remove_unused_columns false --logging_steps 50 --output_dir output
The argument trust_remote_code is to be used with Auto classes. It has no effect here and is ignored.
Loading checkpoint shards: 100%|████████████████████████████| 7/7 [00:08<00:00, 1.22s/it]
Traceback (most recent call last):
File "/data/ChatGLM3/chatglm3-finetune/", line 70, in
File "/data/ChatGLM3/chatglm3-finetune/", line 55, in main
model = get_peft_model(model, peft_config).to("cuda:1")
File "/root/miniconda3/envs/chatglm3-finetune/lib/python3.10/site-packages/torch/nn/modules/", line 989, in to
return self._apply(convert)
File "/root/miniconda3/envs/chatglm3-finetune/lib/python3.10/site-packages/torch/nn/modules/", line 641, in _apply
File "/root/miniconda3/envs/chatglm3-finetune/lib/python3.10/site-packages/torch/nn/modules/", line 641, in _apply
File "/root/miniconda3/envs/chatglm3-finetune/lib/python3.10/site-packages/torch/nn/modules/", line 641, in _apply
[Previous line repeated 1 more time]
File "/root/miniconda3/envs/chatglm3-finetune/lib/python3.10/site-packages/torch/nn/modules/", line 664, in _apply
param_applied = fn(param)
File "/root/miniconda3/envs/chatglm3-finetune/lib/python3.10/site-packages/torch/nn/modules/", line 987, in convert
return, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 1016.00 MiB (GPU 1; 23.69 GiB total capacity; 22.27 GiB already allocated; 691.69 MiB free; 22.66 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
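
Both logs fail while the model is being moved to the GPU, before any training step, so the weights alone appear not to fit. A rough estimate of weight memory by precision makes this concrete. The parameter count below (about 6.2 billion) is an assumption, not a figure from the source, and the source does not show which dtype the script loads the model in.

```python
GIB = 1024 ** 3

def weights_gib(num_params, bytes_per_param):
    """Memory needed just to hold the weights, in GiB."""
    return num_params * bytes_per_param / GIB

# Assumption: roughly 6.2e9 parameters for a 6B-class model.
NUM_PARAMS = 6.2e9

fp32_gib = weights_gib(NUM_PARAMS, 4)  # 4 bytes per parameter
fp16_gib = weights_gib(NUM_PARAMS, 2)  # 2 bytes per parameter

print(f"fp32 weights: {fp32_gib:.1f} GiB")
print(f"fp16 weights: {fp16_gib:.1f} GiB")
```

Under that assumption, full-precision weights come to about 23 GiB, which is at or beyond the 22.03 GiB and 23.69 GiB cards in the logs once CUDA overhead is included, while half precision needs roughly half of that.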


I added an eval_dataset to the trainer and wrote a compute_metric function to calculate some metrics during evaluation, such as precision/recall for function calling and the BLEU score of the reply text.
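
The post does not show the compute_metric code. As an illustration of the precision/recall part only, here is a minimal sketch that treats each function call as its name and compares sets. `precision_recall` and the example call names are hypothetical.

```python
def precision_recall(predicted_calls, reference_calls):
    """Set-based precision and recall over function-call names."""
    predicted = set(predicted_calls)
    reference = set(reference_calls)
    true_positives = len(predicted & reference)
    # Guard against empty sets to avoid division by zero.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall

# One of two predicted calls is correct; one of three expected calls is found.
p, r = precision_recall(["get_weather", "send_mail"],
                        ["get_weather", "get_time", "search"])
print(p, r)
```

A fuller metric would also compare call arguments, which this sketch leaves out.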


