Comments (2)
You loaded the GLM model via AutoModel. There is a modeling_chatglm.py file inside the model — try putting that file in place.
from chatglm_lora_multi-gpu.
Thanks for the guidance :)
I don't quite understand what "there is a modeling_chatglm.py inside the model — try putting that file in place" means. Does it mean appending modeling_chatglm.py after the command-line arguments? That also errors out.
Sorry, I'm a bit new to this.
The run goes as follows:
(GLM) wxd7@wxd7-EG341W-G21:~/glm/Chatglm_lora_multi-gpu-main$ torchrun --nproc_per_node=2 multi_gpu_fintune_belle.py --dataset_path /home/wxd7/glm/ChatGLM-Tuning-master/data/alpaca --lora_rank 8 --per_device_train_batch_size 1 --gradient_accumulation_steps 1 --save_steps 2000 --save_total_limit 2 --learning_rate 2e-5 --fp16 --num_train_epochs 2 --remove_unused_columns false --logging_steps 50 --report_to wandb --output_dir output --deepspeed ds_config_zero3.json modeling_chatglm.py
WARNING:torch.distributed.run:
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
CUDA SETUP: CUDA runtime path found: /home/wxd7/anaconda3/envs/GLM/lib/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 6.1
CUDA SETUP: Detected CUDA version 113
CUDA SETUP: CUDA runtime path found: /home/wxd7/anaconda3/envs/GLM/lib/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 6.1
CUDA SETUP: Detected CUDA version 113
/home/wxd7/anaconda3/envs/GLM/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: Compute capability < 7.5 detected! Only slow 8-bit matmul is supported for your GPU!
warn(msg)
CUDA SETUP: Loading binary /home/wxd7/anaconda3/envs/GLM/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda113_nocublaslt.so...
/home/wxd7/anaconda3/envs/GLM/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: Compute capability < 7.5 detected! Only slow 8-bit matmul is supported for your GPU!
warn(msg)
CUDA SETUP: Loading binary /home/wxd7/anaconda3/envs/GLM/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda113_nocublaslt.so...
Explicitly passing a revision is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a revision is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Traceback (most recent call last):
File "/home/wxd7/glm/Chatglm_lora_multi-gpu-main/multi_gpu_fintune_belle.py", line 360, in <module>
main()
File "/home/wxd7/glm/Chatglm_lora_multi-gpu-main/multi_gpu_fintune_belle.py", line 211, in main
).parse_args_into_dataclasses()
File "/home/wxd7/anaconda3/envs/GLM/lib/python3.10/site-packages/transformers/hf_argparser.py", line 341, in parse_args_into_dataclasses
raise ValueError(f"Some specified arguments are not used by the HfArgumentParser: {remaining_args}")
ValueError: Some specified arguments are not used by the HfArgumentParser: ['--modeling_chatglm.py']
Traceback (most recent call last):
File "/home/wxd7/glm/Chatglm_lora_multi-gpu-main/multi_gpu_fintune_belle.py", line 360, in <module>
main()
File "/home/wxd7/glm/Chatglm_lora_multi-gpu-main/multi_gpu_fintune_belle.py", line 211, in main
).parse_args_into_dataclasses()
File "/home/wxd7/anaconda3/envs/GLM/lib/python3.10/site-packages/transformers/hf_argparser.py", line 341, in parse_args_into_dataclasses
raise ValueError(f"Some specified arguments are not used by the HfArgumentParser: {remaining_args}")
ValueError: Some specified arguments are not used by the HfArgumentParser: ['--modeling_chatglm.py']
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 34807) of binary: /home/wxd7/anaconda3/envs/GLM/bin/python
Traceback (most recent call last):
File "/home/wxd7/anaconda3/envs/GLM/bin/torchrun", line 8, in <module>
sys.exit(main())
File "/home/wxd7/anaconda3/envs/GLM/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
return f(*args, **kwargs)
File "/home/wxd7/anaconda3/envs/GLM/lib/python3.10/site-packages/torch/distributed/run.py", line 794, in main
run(args)
File "/home/wxd7/anaconda3/envs/GLM/lib/python3.10/site-packages/torch/distributed/run.py", line 785, in run
elastic_launch(
File "/home/wxd7/anaconda3/envs/GLM/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/home/wxd7/anaconda3/envs/GLM/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
multi_gpu_fintune_belle.py FAILED
Failures:
[1]:
time : 2023-04-13_12:11:39
host : wxd7-EG341W-G21
rank : 1 (local_rank: 1)
exitcode : 1 (pid: 34808)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
Root Cause (first observed failure):
[0]:
time : 2023-04-13_12:11:39
host : wxd7-EG341W-G21
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 34807)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
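The ValueError in the log comes from how argument parsing works here: transformers' HfArgumentParser subclasses argparse.ArgumentParser, so any command-line token that doesn't match a declared field is left over after parsing, and parse_args_into_dataclasses raises ValueError on those leftovers instead of ignoring them. A minimal sketch with plain argparse (the --lora_rank/--output_dir definitions are assumptions standing in for the script's dataclass fields):

```python
# Sketch: an unrecognized trailing token such as "modeling_chatglm.py" is
# not consumed by any declared argument, so it remains in the leftover list.
# HfArgumentParser raises ValueError for exactly these leftovers.
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--lora_rank", type=int)
parser.add_argument("--output_dir")

argv = ["--lora_rank", "8", "--output_dir", "output", "modeling_chatglm.py"]
_, remaining = parser.parse_known_args(argv)

print(remaining)  # ['modeling_chatglm.py']
```

So appending modeling_chatglm.py to the torchrun command line can never work: the parser has no rule that accepts it.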
from chatglm_lora_multi-gpu.
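What the first comment most likely means is a file-system step, not a command-line one: copy modeling_chatglm.py into the local model directory, so that the AutoModel loader (with trust_remote_code) picks it up from there. A hypothetical sketch, with temporary directories standing in for the real paths:

```python
# Hypothetical sketch of the suggested fix: place modeling_chatglm.py inside
# the model directory rather than passing it to torchrun. The directories
# here are stand-ins created for illustration, not the real paths.
import shutil
import tempfile
from pathlib import Path

model_dir = Path(tempfile.mkdtemp())  # stand-in for the local chatglm model dir
src_dir = Path(tempfile.mkdtemp())    # stand-in for the repo checkout
src = src_dir / "modeling_chatglm.py"
src.write_text("# patched ChatGLM modeling code\n")

# Copy the modeling file into the model directory.
shutil.copy(src, model_dir / "modeling_chatglm.py")
print(sorted(p.name for p in model_dir.iterdir()))  # ['modeling_chatglm.py']
```

After that, the torchrun command should be run exactly as before, but without the trailing modeling_chatglm.py token.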