Comments (3)
We cannot control the level of hierarchy project maintainers want to have in their projects.
For the things that are within our scope of control, we try to document them explicitly (such as enable_model_cpu_offload or enable_sequential_cpu_offload). When combined with other techniques such as prompt pre-computation and 8-bit inference of the text encoders, they reduce VRAM consumption considerably.
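For reference, a minimal sketch of those documented hooks in use (the checkpoint and prompt are placeholders; enable_sequential_cpu_offload() can be substituted for stronger savings at the cost of speed):

import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
)
# each sub-model is moved to the GPU only while it runs, then back to CPU
pipe.enable_model_cpu_offload()
image = pipe("an astronaut riding a horse").images[0]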
We cannot do everything on behalf of users, as requirements vary from project to project, but we can provide simple, easy-to-use APIs that cover the most common use cases.
On a related note, @Wauplin has started to include a couple of modeling utilities within huggingface_hub (such as a model-sharding utility), so I wonder if this is something for him to consider. Or maybe it lies more within the scope of accelerate (perhaps something already exists there that I am unable to recall). Cc @muellerzr @SunMarc.
I observe:
x = AutoModel.from_pretrained(...)
y = AutoModel.from_pretrained(...)
# together, the two models exceed the available VRAM
assert memory_usage(x) + memory_usage(y) > gpu_memory_available()
load(x)    # e.g. x.to("cuda")
x()        # run inference
unload(x)  # e.g. x.to("cpu")
load(y)
y()
unload(y)
reinvented over and over again by downstream users of diffusers and PyTorch. load and unload are somewhat hacky ideas; they only make sense in a single-GPU, personal-computer context that runs one task at a time.
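In practice, the hand-rolled load and unload usually end up as something like this sketch (the helper names come from the pattern above; they are not a diffusers or PyTorch API):

import torch

def load(model):
    model.to("cuda")

def unload(model):
    model.to("cpu")
    # release cached blocks so the next model can claim the VRAM
    torch.cuda.empty_cache()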
Example implementation:
x = AutoModel.from_pretrained(...)
y = AutoModel.from_pretrained(...)
with sequential_offload(offload_device=torch.device('cpu'), load_device=torch.device('cuda:0')):
    x(...)
    y(...)
Here, forward in the mixin would check a contextvar for loaded and unloaded models. Alternatively:
with sequential_offload(offload_device=torch.device('cpu'), load_device=torch.device('cuda:0'), models=[x, y]):
    x(...)
    y(...)

This variant could be implemented now with little issue.
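A minimal sketch of how the explicit models=[...] variant could work, assuming plain torch.nn.Module models; the forward-pre-hook eviction policy here is my assumption, not a settled design:

import contextvars
from contextlib import contextmanager

_offload_state = contextvars.ContextVar("offload_state", default=None)

@contextmanager
def sequential_offload(offload_device, load_device, models):
    state = {"loaded": None}

    def pre_hook(module, args):
        # evict whichever model currently occupies the load device
        prev = state["loaded"]
        if prev is not None and prev is not module:
            prev.to(offload_device)
        module.to(load_device)
        state["loaded"] = module

    handles = [m.register_forward_pre_hook(pre_hook) for m in models]
    token = _offload_state.set(state)
    try:
        yield
    finally:
        _offload_state.reset(token)
        for h in handles:
            h.remove()
        if state["loaded"] is not None:
            state["loaded"].to(offload_device)

Since the state lives in a contextvar, the implicit variant (without models=) could have the mixin's forward consult _offload_state instead of registering hooks up front.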
More broadly:
# for example, prefer weights and inference in bfloat16, then GPTQ 8-bit if supported, then bitsandbytes 8-bit, etc.
llm_strategy = BinPacking(
    devices=["cuda:0", "cuda:1", ...],
    levels=[
        BitsAndBytesConfig(...),
        GPTQConfig(...),
        torch.bfloat16,
    ],
)
# some models do not perform well at 8-bit, for example, so shouldn't be quantized at all
unet_strategy = BinPacking(
    levels=[torch.float16, torch.bfloat16]
)
# maybe there are separate weights and inference strategies
t5_strategy = BinPacking(load_in=[torch.float8_e4m3fn, torch.float16], compute_in=[torch.float16])
with model_management(strategy=llm_strategy):
    x(...)
with model_management(strategy=unet_strategy):
    y(...)
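To make the levels idea concrete, here is a hypothetical sketch of just the selection step: try each precision level in order and take the first whose estimated weight footprint fits in free VRAM (pick_level and the bytes-per-parameter table are mine, not a real API):

import torch

def pick_level(param_count, levels, device="cuda:0"):
    free, _total = torch.cuda.mem_get_info(torch.device(device))
    # rough bytes per parameter; 8-bit quantization configs fall through to ~1
    bytes_per_param = {torch.bfloat16: 2, torch.float16: 2}
    for level in levels:
        estimated = param_count * bytes_per_param.get(level, 1)
        if estimated < free * 0.9:  # leave headroom for activations
            return level
    return None  # nothing fits; fall back to offloading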
Ultimately downstream projects keep writing this, and it fuels a misconception that "Hugging Face" doesn't "run on my machine."
For example: "Automatically loading/unloading models from memory [is something that Ollama supports that Hugging Face does not]"
You guys are fighting pretty pervasive misconceptions too: "Huggingface isn't local"
> We cannot control the level of hierarchy project maintainers want to have in their projects.

Perhaps the contextlib approach is best.