Comments (3)
Hi @zhaoyuchen1128, for the single-GPU out-of-memory issue you can load the model as follows:
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "Trelis/Llama-2-7b-chat-hf-function-calling-v2",
    low_cpu_mem_usage=True,
    offload_state_dict=True,
)
low_cpu_mem_usage
: loads the model using roughly 1x the model size in CPU memory.
offload_state_dict
: temporarily offloads the CPU state dict to the hard drive so that loading does not run out of RAM.
To load the model across multiple GPUs, add device_map='auto' as follows:
model = AutoModelForCausalLM.from_pretrained(
    "Trelis/Llama-2-7b-chat-hf-function-calling-v2",
    low_cpu_mem_usage=True,
    offload_state_dict=True,
    device_map="auto",
)
device_map
: with 'auto', the model is loaded across the available GPUs.
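If the automatic placement still runs out of memory, one option is to cap how much each device may hold. The sketch below assembles keyword arguments for from_pretrained using max_memory and offload_folder alongside device_map. The helper names (build_load_kwargs, load_model) and the specific limits (22 GiB per GPU, 64 GiB of CPU RAM, an "offload" directory) are illustrative assumptions, not values from this thread, so adjust them to your hardware.

```python
from typing import Dict, Union


def build_load_kwargs(
    num_gpus: int,
    gib_per_gpu: int,
    cpu_gib: int,
    offload_folder: str = "offload",  # hypothetical directory for disk offload
) -> Dict[str, object]:
    """Assemble from_pretrained keyword arguments with per-device memory caps."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    # Keys are GPU indices (ints) plus the string "cpu"; values are size strings.
    max_memory: Dict[Union[int, str], str] = {
        i: f"{gib_per_gpu}GiB" for i in range(num_gpus)
    }
    max_memory["cpu"] = f"{cpu_gib}GiB"
    return {
        "device_map": "auto",
        "low_cpu_mem_usage": True,
        "max_memory": max_memory,
        "offload_folder": offload_folder,
    }


def load_model(model_id: str, **load_kwargs):
    """Load a causal LM; transformers is imported lazily so the helper above has no dependencies."""
    from transformers import AutoModelForCausalLM

    return AutoModelForCausalLM.from_pretrained(model_id, **load_kwargs)


# Assumed limits for two GPUs, leaving some headroom below each card's capacity.
kwargs = build_load_kwargs(num_gpus=2, gib_per_gpu=22, cpu_gib=64)
print(kwargs["max_memory"])
```

Calling load_model("Trelis/Llama-2-7b-chat-hf-function-calling-v2", **kwargs) would then pass these limits through to from_pretrained. Leaving headroom below each GPU's full capacity is a design choice intended to keep room for activations during generation.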
Cheers!
from transformers.
Thank you. But when I load a model, it seems too slow with two RTX 4090s. Is that normal? The model is llama3:70b, which is 140 GB.
@zhaoyuchen1128 if model loading feels slow, please go through this guide on improving the speed with the accelerate
library. It covers the details and should answer most questions; if anything remains unclear, the issues are always open.
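As a rough back-of-the-envelope check on the numbers in the question above, the sketch below estimates how much of the model cannot fit on the GPUs. It assumes about 70 billion parameters stored at 2 bytes each (fp16 or bf16) and 24 GB of memory per RTX 4090; the function names are hypothetical. With device_map='auto', weights that do not fit on the GPUs are placed on CPU or disk, which is one plausible reason for the slowdown.

```python
def weight_footprint_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


def offloaded_gb(model_gb: float, num_gpus: int, gb_per_gpu: float) -> float:
    """Weights that cannot fit in total GPU memory and must live on CPU or disk."""
    return max(0.0, model_gb - num_gpus * gb_per_gpu)


# Assumptions: 70e9 parameters at 2 bytes each, two GPUs with 24 GB each.
model_gb = weight_footprint_gb(70e9)
spill_gb = offloaded_gb(model_gb, num_gpus=2, gb_per_gpu=24)

print(f"weights: {model_gb:.0f} GB, outside GPU memory: {spill_gb:.0f} GB")
```

This ignores activations and the KV cache, so the real shortfall is somewhat larger than the estimate.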
Cheers!