thu-ml / low-bit-optimizers
Low-bit optimizers for PyTorch
License: Apache License 2.0
File "sft_low_bit.py", line 869, in train
train_result = trainer.train()
File "/root/miniconda3/lib/python3.8/site-packages/transformers/trainer.py", line 1591, in train
return inner_training_loop(
File "/root/miniconda3/lib/python3.8/site-packages/transformers/trainer.py", line 1971, in _inner_training_loop
self.optimizer.step()
File "/root/miniconda3/lib/python3.8/site-packages/accelerate/optimizer.py", line 145, in step
self.optimizer.step(closure)
File "/root/miniconda3/lib/python3.8/site-packages/torch/optim/lr_scheduler.py", line 69, in wrapper
return wrapped(*args, **kwargs)
File "/root/miniconda3/lib/python3.8/site-packages/torch/optim/optimizer.py", line 280, in wrapper
out = func(*args, **kwargs)
File "/root/miniconda3/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/root/miniconda3/lib/python3.8/site-packages/lpmm-0.0.0-py3.8-linux-x86_64.egg/lpmm/optim/adamw.py", line 230, in step
_single_tensor_adamw4bit(**kwargs)
File "/root/miniconda3/lib/python3.8/site-packages/lpmm-0.0.0-py3.8-linux-x86_64.egg/lpmm/optim/adamw.py", line 426, in _single_tensor_adamw4bit
qx, gen = vectorwise_quant(exp_avg, qmap=exp_avgs_qmap[i], shape=param.shape, **exp_avg_qmetadata)
File "/root/miniconda3/lib/python3.8/site-packages/lpmm-0.0.0-py3.8-linux-x86_64.egg/lpmm/functional.py", line 53, in vectorwise_quant
qx = nonlinear_quant(qx, qmap, b, round_type=kwargs['round_type'])
File "/root/miniconda3/lib/python3.8/site-packages/lpmm-0.0.0-py3.8-linux-x86_64.egg/lpmm/functional.py", line 369, in nonlinear_quant
idx = real_nonlinear_quant(qx, qmap, b, False)
File "/root/miniconda3/lib/python3.8/site-packages/lpmm-0.0.0-py3.8-linux-x86_64.egg/lpmm/functional.py", line 363, in real_nonlinear_quant
return ext_quantization.pack_nonlinear(grouped_qx, qmap, b, stochastic)
RuntimeError: The type of data is not kFloat32 or kFloat16!
qx: tensor([[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
...,
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan]], device='cuda:0',
dtype=torch.bfloat16)
Does this data not meet the requirements?
I removed the
TORCH_CHECK((name.dtype() == c10::BFloat16 || name.dtype() == torch::kFloat16), \ "The type of " #name " is not kFloat32 or kFloat16!");\
and got
RuntimeError: "pack_nonlinear_4bit" not implemented for 'BFloat16'
How can I apply the optimizer to a bf16 model?
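One possible workaround, sketched here with plain PyTorch calls only: the error message names float32 and float16 as the accepted dtypes, while the printed tensor is bfloat16 and also entirely NaN. The sketch assumes (the traceback suggests this but does not confirm it) that the optimizer state takes its dtype from the parameters, so casting the parameters to float32 before building the optimizer would keep the state in a supported dtype. The helper names `check_quant_input` and `cast_params_to_fp32` are hypothetical and not part of lpmm.

```python
import torch

# Dtypes named as acceptable by the error message in the traceback above.
SUPPORTED_DTYPES = (torch.float32, torch.float16)


def check_quant_input(t: torch.Tensor) -> torch.Tensor:
    """Hypothetical pre-quantization guard (not an lpmm function).

    Raises if the tensor contains NaN, since quantizing NaN values cannot
    produce a meaningful state; otherwise returns the tensor in a dtype
    from SUPPORTED_DTYPES, upcasting to float32 when needed.
    """
    if torch.isnan(t).any():
        raise ValueError("tensor contains NaN; training diverged before this step")
    if t.dtype not in SUPPORTED_DTYPES:
        t = t.to(torch.float32)
    return t


def cast_params_to_fp32(model: torch.nn.Module) -> torch.nn.Module:
    """Cast all floating-point parameters and buffers to float32."""
    return model.float()


# Usage on a tiny stand-in model (CPU only, no lpmm required).
model = torch.nn.Linear(4, 2).to(torch.bfloat16)
model = cast_params_to_fp32(model)
state = check_quant_input(torch.ones(3, dtype=torch.bfloat16))
```

Casting to float32 raises the memory used by the weights, so this trades part of the saving from the 4-bit state for compatibility. The all-NaN tensor is a separate problem: a dtype change alone would not remove NaN values that were already present before the quantization call.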
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/transformers/trainer.py", line 1779, in train
return inner_training_loop(
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/transformers/trainer.py", line 2176, in _inner_training_loop
self.optimizer.step()
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/accelerate/optimizer.py", line 145, in step
self.optimizer.step(closure)
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/torch/optim/lr_scheduler.py", line 68, in wrapper
return wrapped(*args, **kwargs)
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/torch/optim/optimizer.py", line 373, in wrapper
out = func(*args, **kwargs)
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/lpmm/optim/adamw.py", line 230, in step
_single_tensor_adamw4bit(**kwargs)
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/lpmm/optim/adamw.py", line 426, in _single_tensor_adamw4bit
qx, gen = vectorwise_quant(exp_avg, qmap=exp_avgs_qmap[i], shape=param.shape, **exp_avg_qmetadata)
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/lpmm/functional.py", line 53, in vectorwise_quant
qx = nonlinear_quant(qx, qmap, b, round_type=kwargs['round_type'])
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/lpmm/functional.py", line 369, in nonlinear_quant
idx = real_nonlinear_quant(qx, qmap, b, False)
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/lpmm/functional.py", line 363, in real_nonlinear_quant
return ext_quantization.pack_nonlinear(grouped_qx, qmap, b, stochastic)
RuntimeError: The type of data is not kFloat32 or kFloat16!
Hi, thank you for the interesting idea and the very helpful implementation! I tried to use lpmm.optim.AdamW with the transformers Trainer for multi-GPU training, but got the error below.
lib/python3.10/site-packages/accelerate/utils/operations.py", line 167, in send_to_device
return tensor.to(device, non_blocking=non_blocking)
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
Does your current code not support multi-GPU training? Thanks!
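Because the error text notes that the stack trace may be misleading and suggests CUDA_LAUNCH_BLOCKING=1, a first debugging step could look like the sketch below. It uses only the Python standard library to rerun a command with that variable set, so that kernel launches are synchronous and the failing call is reported where it happens. The helper names `debug_env` and `run_with_blocking_launches` are hypothetical, and `train.py` in the usage note stands in for whatever training script is being run.

```python
import os
import subprocess
import sys


def debug_env(base=None):
    """Return a copy of the environment with CUDA_LAUNCH_BLOCKING=1 set."""
    env = dict(os.environ if base is None else base)
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


def run_with_blocking_launches(cmd):
    """Run cmd in a child process with synchronous CUDA launches enabled.

    Returns the child's exit code instead of raising, so the caller can
    inspect the output of a failing run.
    """
    return subprocess.run(cmd, env=debug_env(), check=False).returncode
```

A call such as `run_with_blocking_launches([sys.executable, "train.py"])` would then rerun the hypothetical script. Synchronous launches slow training down noticeably, so this is meant for locating the fault rather than for regular runs.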