NotImplementedError: Memory efficient attention with `xformers` is currently not supported when `self.added_kv_proj_dim` is defined

deep-floyd commented on August 18, 2024
NotImplementedError: Memory efficient attention with `xformers` is currently not supported when `self.added_kv_proj_dim` is defined
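For context, a minimal sketch of how the error is triggered (the checkpoint name is an assumption taken from the DeepFloyd IF model cards; the original report did not include a script):

import torch
from diffusers import DiffusionPipeline

# stage 1 of DeepFloyd IF; any IF checkpoint should reproduce this
pipe = DiffusionPipeline.from_pretrained(
    "DeepFloyd/IF-I-XL-v1.0", variant="fp16", torch_dtype=torch.float16
)

# raises NotImplementedError: the IF UNet's attention layers define
# `added_kv_proj_dim`, which the xformers attention processor rejects
pipe.enable_xformers_memory_efficient_attention()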


Comments (10)

kanttouchthis commented on August 18, 2024

I had the same issue; my solution was to switch to pytorch 2.0.0+cu118 and disable the xformers memory efficient attention.
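A minimal sketch of that setup, assuming torch >= 2.0 is already installed (the checkpoint name is assumed): simply never call enable_xformers_memory_efficient_attention(), and diffusers uses torch's built-in scaled-dot-product attention instead.

import torch
from diffusers import DiffusionPipeline

assert torch.__version__.startswith("2")  # e.g. 2.0.0+cu118

stage_1 = DiffusionPipeline.from_pretrained(
    "DeepFloyd/IF-I-XL-v1.0", variant="fp16", torch_dtype=torch.float16
)
# do NOT call stage_1.enable_xformers_memory_efficient_attention();
# on torch >= 2.0 diffusers already uses scaled_dot_product_attention
stage_1.enable_model_cpu_offload()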


bennyguo commented on August 18, 2024

Same here. I'm using torch==1.12.1+cu113 and xformers==0.0.20+6425fd0.d20230429, with xformers built from source.


bluestyle97 commented on August 18, 2024

Same here.


klei22 commented on August 18, 2024

@kanttouchthis Thanks, this worked for me as well!

  1. Cloned a prior conda environment that already had torch >= 2.0 with CUDA support (an environment originally set up for nanoGPT).
  2. Commented out the xformers lines.
  3. Installed requirements.txt (after removing torch from the list).
  4. Upgraded the transformers library.
  5. Upgraded the accelerate library.

In summary:

After installing torch 2.0 with CUDA support:

pip install -r requirements.txt # after removing torch
pip install transformers --upgrade
pip install accelerate --upgrade

Then the above test code ran without errors.
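A quick sanity check for the resulting environment (nothing DeepFloyd-specific, just version and CUDA checks):

import torch, transformers, accelerate

print(torch.__version__)          # expect 2.x with a +cuXXX suffix
print(torch.cuda.is_available())  # expect True
print(transformers.__version__, accelerate.__version__)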


ddPn08 commented on August 18, 2024

Same here. I want to use xformers because I want to run DeepFloyd on torch < 2; without xformers I get an OOM error.


kanttouchthis commented on August 18, 2024

> Same here. I want to use xformers because I want to run DeepFloyd on torch < 2; without xformers I get an OOM error.

pytorch 2.0 automatically applies the same memory efficient attention that xformers offers (see here). How much VRAM do you have? You can run the diffusers code with very little VRAM by using cpu offloading or sequential offloading:

# ~16 GB VRAM
stage_1.enable_model_cpu_offload()
stage_2.enable_model_cpu_offload()
stage_3.enable_model_cpu_offload()

# ~6 GB VRAM
stage_1.enable_sequential_cpu_offload()
stage_2.enable_model_cpu_offload()
stage_3.enable_model_cpu_offload()
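Put together, a loading sketch for all three stages with offloading (model names and fp16 variants are assumptions from the DeepFloyd model cards, not stated in this thread):

import torch
from diffusers import DiffusionPipeline

stage_1 = DiffusionPipeline.from_pretrained(
    "DeepFloyd/IF-I-XL-v1.0", variant="fp16", torch_dtype=torch.float16)
stage_2 = DiffusionPipeline.from_pretrained(
    "DeepFloyd/IF-II-L-v1.0", text_encoder=None, variant="fp16", torch_dtype=torch.float16)
stage_3 = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-x4-upscaler", torch_dtype=torch.float16)

for stage in (stage_1, stage_2, stage_3):
    # ~16 GB peak; swap stage_1 to enable_sequential_cpu_offload() for ~6 GB
    stage.enable_model_cpu_offload()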


ddPn08 commented on August 18, 2024

> pytorch 2.0 automatically applies the same memory efficient attention that xformers offers [...] You can run the diffusers code with very little VRAM by using cpu offloading or sequential offloading

Sure, that works. But it's important to me that it works with pytorch < 2.


KernelA commented on August 18, 2024

I have the same issue; diffusers raises an explicit exception here:

https://github.com/huggingface/diffusers/blob/79c0e24a1442741c59c9b1d2764ba4dbfe56ac71/src/diffusers/models/attention_processor.py#L162
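The guard looks roughly like this (paraphrased from the linked attention_processor.py; exact wording varies by version):

# inside Attention.set_use_memory_efficient_attention_xformers (paraphrased)
if use_memory_efficient_attention_xformers:
    if self.added_kv_proj_dim is not None:
        raise NotImplementedError(
            "Memory efficient attention with `xformers` is currently not"
            " supported when `self.added_kv_proj_dim` is defined."
        )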


astroyh commented on August 18, 2024

Is there a solution to this problem with torch 1.x?


KernelA commented on August 18, 2024

I found the following solution for torch 1.x: load the model with an 8bit text encoder and don't use xformers at any stage except the upscale stage.

For 8bit loading you need bitsandbytes. It should also be possible to use the XL models by reusing memory on the GPU:

# load stage 1
pipe = ...
# run stage 1, then free it
del pipe
# load stage 2
pipe = ...

I run it on a 24 GB GPU, but you can use smaller models.
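For the memory reuse to actually work, the old pipeline has to be fully released before loading the next one; a sketch of that pattern (the gc/empty_cache calls are my addition, not from the comment):

import gc
import torch

# ... run stage 1, keep only the output tensors you still need ...
del pipe                  # drop the last reference to the stage 1 pipeline
gc.collect()              # collect the now-unreachable modules
torch.cuda.empty_cache()  # hand the cached CUDA blocks back to the driver
# now load stage 2 into the freed memory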

Preload

import torch
from diffusers import (DiffusionPipeline, IFPipeline,
                       IFSuperResolutionPipeline,
                       StableDiffusionUpscalePipeline)
from transformers import T5EncoderModel

if __name__ == "__main__":
    stage_1_model_name = "DeepFloyd/IF-I-L-v1.0"
    stage_2_model_name = "DeepFloyd/IF-II-L-v1.0"
    stage_3_model_name = "stabilityai/stable-diffusion-x4-upscaler"

    text_encoder = T5EncoderModel.from_pretrained(
        stage_1_model_name,
        subfolder="text_encoder",
        device_map="auto",
        load_in_8bit=True,
        variant="8bit",
        torch_dtype=torch.float16
    )

    text_encoder_pipeline = DiffusionPipeline.from_pretrained(
        stage_1_model_name,
        text_encoder=text_encoder,  # pass the previously instantiated 8bit text encoder
        unet=None,
        device_map="auto",
    )

    text_encoder_pipeline.save_pretrained("checkpoints/text_encoder")

    pipe1 = IFPipeline.from_pretrained(
        stage_1_model_name,
        text_encoder=text_encoder,  # pass the previously instantiated 8bit text encoder
        watermarker=None,
        feature_extractor=None,
        safety_checker=None,
        require_safety_checker=False
    )
    pipe1.save_pretrained("checkpoints/stage1")

    del pipe1

    pipe2 = IFSuperResolutionPipeline.from_pretrained(
        stage_2_model_name, feature_extractor=None, safety_checker=None, watermarker=None,
        text_encoder=None, variant="fp16", torch_dtype=torch.float16,
        require_safety_checker=False
    )

    pipe2.save_pretrained("checkpoints/stage2")

    del pipe2

    pipe3 = StableDiffusionUpscalePipeline.from_pretrained(
        stage_3_model_name,
        feature_extractor=None,
        safety_checker=None,
        watermarker=None,
        variant="fp16",
        torch_dtype=torch.float16,
    )

    pipe3.save_pretrained("checkpoints/stage3-super-resol")

Inference

import os
from typing import Optional

import torch
from diffusers import (DiffusionPipeline, IFPipeline,
                    IFSuperResolutionPipeline,
                    StableDiffusionUpscalePipeline)


class DeeplFloyd:
    def __init__(self, checkpoint_dir: str, device: torch.device):
        self._device = device
        self._text_encoder_pipeline = DiffusionPipeline.from_pretrained(
            os.path.join(checkpoint_dir, "text_encoder"),
            device_map="auto",
            unet=None,
            local_files_only=True,
            low_cpu_mem_usage=True,
        )

        self._text_encoder_pipeline.to(device)

        self._pipe_stage_1 = IFPipeline.from_pretrained(
            os.path.join(checkpoint_dir, "stage1"),
            # pass the previously instantiated 8bit text encoder
            text_encoder=self._text_encoder_pipeline.text_encoder,
            watermarker=None,
            feature_extractor=None,
            safety_checker=None,
            low_cpu_mem_usage=True,
            local_files_only=True,
            require_safety_checker=False
        )
        self._pipe_stage_1.set_progress_bar_config(mininterval=5)
        self._pipe_stage_1.to(device)

        self._pipe_stage_2 = IFSuperResolutionPipeline.from_pretrained(
            os.path.join(checkpoint_dir, "stage2"),
            feature_extractor=None, safety_checker=None, watermarker=None,
            text_encoder=None,
            variant="fp16",
            torch_dtype=torch.float16,
            require_safety_checker=False,
            local_files_only=True,
            low_cpu_mem_usage=True,
        )
        self._pipe_stage_2.set_progress_bar_config(mininterval=5)
        self._pipe_stage_2.to(device)

        self._pipe_stage_3 = StableDiffusionUpscalePipeline.from_pretrained(
            os.path.join(checkpoint_dir, "stage3-super-resol"),
            feature_extractor=None,
            safety_checker=None,
            watermarker=None,
            variant="fp16",
            local_files_only=True,
            torch_dtype=torch.float16,
            low_cpu_mem_usage=True
        )

        self._pipe_stage_3.set_progress_bar_config(mininterval=5)
        self._pipe_stage_3.enable_xformers_memory_efficient_attention()
        self._pipe_stage_3.enable_model_cpu_offload()
        self._generator = torch.Generator()

    def __call__(self,
                 prompt: str,
                 seed: int,
                 num_inference_steps: int = 100,
                 num_upscale_steps: int = 70,
                 neg_prompt: Optional[str] = None):
        generator = self._generator.manual_seed(seed)
        prompt_embeds, negative_embeds = self._text_encoder_pipeline.encode_prompt(
            prompt,
            negative_prompt=neg_prompt)

        with torch.autocast(self._pipe_stage_1.device.type):
            image = self._pipe_stage_1(
                prompt_embeds=prompt_embeds,
                negative_prompt_embeds=negative_embeds,
                output_type="pt",
                num_inference_steps=num_inference_steps,
                generator=generator,
            ).images

        with torch.autocast(self._pipe_stage_2.device.type):
            image = self._pipe_stage_2(
                image=image,
                num_inference_steps=max(num_inference_steps // 2, 1),
                prompt_embeds=prompt_embeds,
                negative_prompt_embeds=negative_embeds,
                output_type="pt",
                generator=generator,
            ).images

        images = self._pipe_stage_3(
            image=image,
            prompt=prompt,
            num_inference_steps=num_upscale_steps,
            generator=generator,
        ).images

        return images[0]
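Usage would then look something like this (checkpoint directory, prompt, and filename are placeholders; stage 3 returns PIL images by default):

model = DeeplFloyd("checkpoints", torch.device("cuda"))
image = model("a photo of a red panda reading a book", seed=42)
image.save("result.png")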

