
Comments (3)

RUFFY-369 commented on August 16, 2024

Hi @zhaoyuchen1128, for a single-GPU out-of-memory issue you can do the following:

from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "Trelis/Llama-2-7b-chat-hf-function-calling-v2",
    low_cpu_mem_usage=True,
    offload_state_dict=True,
)

low_cpu_mem_usage: loads the model using roughly 1x the model size in CPU memory, instead of the default 2x.
offload_state_dict: temporarily offloads the CPU state dict to disk while loading, so you don't run out of RAM.
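One more knob worth knowing (not mentioned in the original comment, so treat it as an assumption about your setup): loading the weights in half precision roughly halves the memory footprint on both CPU and GPU. A minimal sketch combining it with the flags above:

import torch
from transformers import AutoModelForCausalLM

# torch_dtype=torch.float16 loads the weights in half precision,
# roughly halving memory use versus the default float32.
model = AutoModelForCausalLM.from_pretrained(
    "Trelis/Llama-2-7b-chat-hf-function-calling-v2",
    low_cpu_mem_usage=True,
    offload_state_dict=True,
    torch_dtype=torch.float16,
)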

For loading a model across multiple GPUs, use device_map='auto' as follows:

model = AutoModelForCausalLM.from_pretrained(
    "Trelis/Llama-2-7b-chat-hf-function-calling-v2",
    low_cpu_mem_usage=True,
    offload_state_dict=True,
    device_map="auto",
)

device_map: automatically splits the model across the available GPUs.
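If the GPUs together still cannot hold the whole model, device_map='auto' can be combined with per-device memory caps and a disk offload folder. A minimal sketch; the max_memory values and folder name are illustrative assumptions, not taken from this thread:

model = AutoModelForCausalLM.from_pretrained(
    "Trelis/Llama-2-7b-chat-hf-function-calling-v2",
    low_cpu_mem_usage=True,
    device_map="auto",
    # Assumed caps for two 24 GB cards; tune these for your hardware.
    max_memory={0: "22GiB", 1: "22GiB", "cpu": "60GiB"},
    # Layers that fit on no device are offloaded to this folder on disk.
    offload_folder="offload",
)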

Cheers!


zhaoyuchen1128 commented on August 16, 2024


Thank you. But when I load a model, it seems very slow with two RTX 4090s. Is that normal? The model is Llama 3 70B, about 140 GB.


RUFFY-369 commented on August 16, 2024

@zhaoyuchen1128 if model loading feels slow, please go through this guide on improving loading speed with the accelerate library. It covers all the details and should answer most of your questions; if any remain, the issues are always open.
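For reference, a minimal sketch of the big-model loading pattern that guide describes, using Accelerate's utilities; the local checkpoint path is an illustrative assumption:

from accelerate import init_empty_weights, load_checkpoint_and_dispatch
from transformers import AutoConfig, AutoModelForCausalLM

# Build the model skeleton on the "meta" device, allocating no real memory.
config = AutoConfig.from_pretrained("Trelis/Llama-2-7b-chat-hf-function-calling-v2")
with init_empty_weights():
    model = AutoModelForCausalLM.from_config(config)

# Stream the weights straight onto the available devices, skipping the
# intermediate full copy in CPU RAM that makes loading slow.
model = load_checkpoint_and_dispatch(
    model,
    checkpoint="path/to/local/checkpoint",  # assumed local weight files
    device_map="auto",
)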

Cheers!


