shivamshrirao / diffusers

This project forked from huggingface/diffusers

1.9K 1.9K 508.0 22.89 MB

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch

Home Page: https://huggingface.co/docs/diffusers

License: Apache License 2.0

Python 99.78% Makefile 0.06% Dockerfile 0.16%

diffusers's People

Contributors

anton-l, apolinario, daspartho, duongna21, haofanwang, hysts, isamu-isozaki, kashif, kig, mishig25, narsil, natolambert, nouamanetazi, osanseviero, patil-suraj, patrickvonplaten, pcuenca, prathikr, ryanrussell, santiviquez, sayakpaul, shirayu, shivamshrirao, skytnt, standardai, stevhliu, vvvm23, williamberman, ydshieh, yiyixuxu

diffusers's Issues

Subject name as opposed to class for better results?

Is your feature request related to a problem? Please describe.
Yes. I am not getting quite the same results as with the Colab DreamBooth.

Describe the solution you'd like
A person on Reddit posted that he uses SUBJECT_NAME in the Colab, skips the class entirely, and gets better results on subjects.

Describe alternatives you've considered
Is there a way to do this?

Additional context
I don't know how to do it; I only see class-related items in the Launch.sh file.
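For what it's worth, the class-related flags are optional in train_dreambooth.py, so you can put the subject's name straight into the instance prompt and drop the class/prior-preservation flags entirely. A minimal sketch with placeholder paths (not a confirmed recipe for better results):

export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export INSTANCE_DIR="training"   # folder with the subject's photos
export OUTPUT_DIR="my_model"

accelerate launch train_dreambooth.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --instance_data_dir=$INSTANCE_DIR \
  --output_dir=$OUTPUT_DIR \
  --instance_prompt="a photo of SUBJECT_NAME" \
  --resolution=512 \
  --train_batch_size=1 \
  --learning_rate=5e-6 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --max_train_steps=800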

Please could somebody help

Hey guys, I'm not quite sure what's going on here. I followed Nerdy Rodent's video to get this working on Ubuntu (in Windows), but I am getting this error. Have I done something wrong? Any help/advice would be welcome, as I'm not the best at this kind of thing. Thank you :)

The following values were not passed to `accelerate launch` and had defaults used instead:
	`--num_cpu_threads_per_process` was set to `12` to improve out-of-box performance
To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
For effortless bug reporting copy-paste your error into this form: https://docs.google.com/forms/d/e/1FAIpQLScPB8emS3Thkp66nvqwmjTEgxp8Y9ufuWTzFyr9kJ5AoI47dQ/viewform?usp=sf_link

/home/bobross/anaconda3/envs/diffusers/lib/python3.9/site-packages/bitsandbytes/cuda_setup/paths.py:86: UserWarning: /home/bobross/anaconda3/envs/diffusers did not contain libcudart.so as expected! Searching further paths...
warn(
/home/bobross/anaconda3/envs/diffusers/lib/python3.9/site-packages/bitsandbytes/cuda_setup/paths.py:98: UserWarning: /usr/lib/wsl/lib: did not contain libcudart.so as expected! Searching further paths...
warn(
/home/bobross/anaconda3/envs/diffusers/lib/python3.9/site-packages/bitsandbytes/cuda_setup/paths.py:20: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('CompVis/stable-diffusion-v1-4')}
warn(
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64...
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 8.6
CUDA SETUP: Detected CUDA version 117
CUDA SETUP: Loading binary /home/bobross/anaconda3/envs/diffusers/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cuda117.so...
Traceback (most recent call last):
  File "/home/bobross/github/diffusers/examples/dreambooth/train_dreambooth.py", line 637, in <module>
    main()
  File "/home/bobross/github/diffusers/examples/dreambooth/train_dreambooth.py", line 450, in main
    train_dataset = DreamBoothDataset(
  File "/home/bobross/github/diffusers/examples/dreambooth/train_dreambooth.py", line 230, in __init__
    raise ValueError("Instance images root doesn't exists.")
ValueError: Instance images root doesn't exists.
Traceback (most recent call last):
  File "/home/bobross/anaconda3/envs/diffusers/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/home/bobross/anaconda3/envs/diffusers/lib/python3.9/site-packages/accelerate/commands/accelerate_cli.py", line 43, in main
    args.func(args)
  File "/home/bobross/anaconda3/envs/diffusers/lib/python3.9/site-packages/accelerate/commands/launch.py", line 837, in launch_command
    simple_launcher(args)
  File "/home/bobross/anaconda3/envs/diffusers/lib/python3.9/site-packages/accelerate/commands/launch.py", line 354, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/home/bobross/anaconda3/envs/diffusers/bin/python', 'train_dreambooth.py', '--pretrained_model_name_or_path=CompVis/stable-diffusion-v1-4', '--instance_data_dir=path-to-instance-images', '--output_dir=path-to-save-model', '--instance_prompt=a photo of sks dog', '--resolution=512', '--train_batch_size=1', '--gradient_accumulation_steps=2', '--gradient_checkpointing', '--use_8bit_adam', '--learning_rate=5e-6', '--lr_scheduler=constant', '--lr_warmup_steps=0', '--max_train_steps=400']' returned non-zero exit status 1.

I have added my 512x512 images into the "training folder" but that's it.
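The traceback explains itself: the launch command passed --instance_data_dir=path-to-instance-images literally, so the script looked for a folder with that placeholder name. Swap the two placeholder arguments for real paths; a sketch assuming your images are in a folder named "training" next to the script (the output path is likewise an assumption):

accelerate launch train_dreambooth.py \
  --pretrained_model_name_or_path="CompVis/stable-diffusion-v1-4" \
  --instance_data_dir="training" \
  --output_dir="model_out" \
  --instance_prompt="a photo of sks dog" \
  --resolution=512 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=2 \
  --gradient_checkpointing \
  --use_8bit_adam \
  --learning_rate=5e-6 \
  --lr_scheduler=constant \
  --lr_warmup_steps=0 \
  --max_train_steps=400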

caching latents doubt...

Describe the bug


I downloaded 6650 images from a source to use as class images, so there is no need to generate them before training.

Now I see that this caching-latents step eats a lot of VRAM, so what is the maximum number of images in the class images folder I can use without running out of memory before training starts?
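There isn't a fixed maximum (it depends on your GPU), but the script's usage output quoted later on this page includes a --not_cache_latents switch; a hedged sketch that skips the caching pass entirely, so the class-image count stops mattering for that step (paths and prompts are placeholders):

accelerate launch train_dreambooth.py \
  --pretrained_model_name_or_path="CompVis/stable-diffusion-v1-4" \
  --instance_data_dir="training" \
  --class_data_dir="classes" \
  --with_prior_preservation --prior_loss_weight=1.0 \
  --instance_prompt="a photo of sks dog" \
  --class_prompt="a photo of dog" \
  --not_cache_latents \
  --resolution=512 \
  --train_batch_size=1 \
  --max_train_steps=800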

Reproduction

No response

Logs

No response

System Info

  • diffusers version: 0.4.0.dev0
  • Platform: Linux-5.10.102.1-microsoft-standard-WSL2-x86_64-with-glibc2.31
  • Python version: 3.9.13
  • PyTorch version (GPU?): 1.12.1+cu116 (True)
  • Huggingface_hub version: 0.10.0
  • Transformers version: 4.22.2
  • Using GPU in script?:
  • Using distributed or parallel set-up in script?:

Support for multiple fine-tuned concepts & classes on one model with Dreambooth

Is your feature request related to a problem? Please describe.
I'd like to add multiple people and/or objects to a single model and finetune my model for that. For example, I'd like to add an entire family, together with their pets and their car, and this with as little degradation of my overall model as humanly possible.

Describe the solution you'd like
I'd like to add multiple people and/or objects to a model and finetune my model with a minimal amount of degradation.

Describe alternatives you've considered
I've been tinkering around with different classes and concepts, but thus far every attempt to use a different class or concept has led to extreme degradation.

Additional context
Apparently someone already forked JoePenna/Dreambooth-Stable-Diffusion at kanewallmann/Dreambooth-Stable-Diffusion to add support for multiple concepts & classes on one model. However, that version of Dreambooth SD requires at least 24GB VRAM, which means I can't run it on either my machine or a free Google Colab, which is what I've been using so far.
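A lighter-weight option may exist in this fork itself: later revisions of its train_dreambooth.py accept a JSON list of concepts instead of a single instance/class pair. The flag name and schema below are assumptions from those revisions, so check your script's --help before relying on them; a sketch:

# Sketch, assuming the fork's --concepts_list option (verify with --help):
cat > concepts_list.json <<'EOF'
[
  {
    "instance_prompt": "photo of zwx person",
    "class_prompt": "photo of a person",
    "instance_data_dir": "/data/person1_images",
    "class_data_dir": "/data/person_class_images"
  },
  {
    "instance_prompt": "photo of sks car",
    "class_prompt": "photo of a car",
    "instance_data_dir": "/data/car_images",
    "class_data_dir": "/data/car_class_images"
  }
]
EOF

accelerate launch train_dreambooth.py \
  --pretrained_model_name_or_path="CompVis/stable-diffusion-v1-4" \
  --output_dir="multi_concept_model" \
  --with_prior_preservation --prior_loss_weight=1.0 \
  --resolution=512 \
  --train_batch_size=1 \
  --max_train_steps=2000 \
  --concepts_list="concepts_list.json"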

Finetuning

Is it possible to use this with DeepSpeed and Adam to do traditional finetuning at 8 GB? If so, how?
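Not verified on this fork, but accelerate itself can drive DeepSpeed, and whether 8 GB suffices depends on ZeRO/CPU offloading; a hedged sketch of the usual sequence (the config answers are interactive prompts, not flags, and all paths/prompts are placeholders):

# Answer the prompts: enable DeepSpeed, ZeRO stage 2, CPU offload.
accelerate config
# Then launch as usual; gradient checkpointing further cuts memory:
accelerate launch train_dreambooth.py \
  --pretrained_model_name_or_path="CompVis/stable-diffusion-v1-4" \
  --instance_data_dir="training" \
  --output_dir="finetune_out" \
  --instance_prompt="a photo of sks dog" \
  --gradient_checkpointing \
  --max_train_steps=800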

Works great but can't run offline, Can't dreambooth train custom models

Describe the bug

I have everything working, but I want to run this offline. Nerdy Rodent mentioned that I can do this by setting
HF_DATASETS_OFFLINE=1 TRANSFORMERS_OFFLINE=1, since the transformers need an internet connection; info here: https://huggingface.co/docs/transformers/v4.15.0/installation

I tried setting those, but it still does not run offline and gives an HTTPS error if the internet is disconnected. I also tried to follow the section of that page about running the model offline and specifying one manually. That did not work either.

Can anyone give any clear assistance here? I really don't want to just merge the models in the automatic1111 GUI; I feel I would get better results by dreambooth-training a custom model instead. Thanks!
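For reference, those variables only help if they are set in the same shell that runs the launch command, and the model must already be fully cached or available as a local directory; a hedged sketch (HF_HUB_OFFLINE is an assumption: diffusers' own Hub calls may need it in addition to the two documented variables):

export HF_DATASETS_OFFLINE=1
export TRANSFORMERS_OFFLINE=1
export HF_HUB_OFFLINE=1   # assumption, see above
accelerate launch train_dreambooth.py \
  --pretrained_model_name_or_path="/path/to/local/stable-diffusion-v1-4" \
  --instance_data_dir="training" \
  --output_dir="offline_out" \
  --instance_prompt="a photo of sks dog" \
  --max_train_steps=800

Pointing --pretrained_model_name_or_path at a local folder avoids the Hub lookup entirely, which is the most reliable way to stay offline.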

Reproduction

Disconnect internet

Logs

No response

System Info

RTX 3080 10GB

UnboundLocalError: local variable 'text_encoder' referenced before assignment

Describe the bug

Normal run through Colab
https://colab.research.google.com/github/ShivamShrirao/diffusers/blob/main/examples/dreambooth/DreamBooth_Stable_Diffusion.ipynb

without gdrive and with --gradient_checkpointing

https://github.com/ShivamShrirao/diffusers/blob/main/examples/dreambooth/train_dreambooth.py#L665

Reproduction

No response

Logs

Steps: 100% 1000/1000 [18:37<00:00,  1.11s/it, loss=0.193, lr=5e-6]Traceback (most recent call last):
  File "train_dreambooth.py", line 677, in <module>
    main()
  File "train_dreambooth.py", line 665, in main
    text_encoder=accelerator.unwrap_model(text_encoder),
UnboundLocalError: local variable 'text_encoder' referenced before assignment
Steps: 100% 1000/1000 [18:37<00:00,  1.12s/it, loss=0.193, lr=5e-6]
Traceback (most recent call last):
  File "/usr/local/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/accelerate_cli.py", line 43, in main
    args.func(args)
  File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/launch.py", line 837, in launch_command
    simple_launcher(args)
  File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/launch.py", line 354, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)

System Info

https://colab.research.google.com/github/ShivamShrirao/diffusers/blob/main/examples/dreambooth/DreamBooth_Stable_Diffusion.ipynb

T4, Colab free
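From the traceback, the crash happens in the final save step (line 665), where the script unwraps text_encoder even though that variable is only assigned on some code paths, so all 1000 steps complete and then nothing is saved. As a hedged workaround (an assumption, not a confirmed fix), training the text encoder explicitly keeps the variable defined:

# Hedged workaround: pass --train_text_encoder so text_encoder is assigned
# before the final save calls accelerator.unwrap_model(text_encoder).
!accelerate launch train_dreambooth.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --instance_data_dir=$INSTANCE_DIR \
  --output_dir=$OUTPUT_DIR \
  --instance_prompt="a photo of sks dog" \
  --gradient_checkpointing \
  --train_text_encoder \
  --max_train_steps=1000

Note that --train_text_encoder raises VRAM use, so it may not fit on a free T4 without the other memory-saving flags.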

Error while generating class images

Describe the bug

(diffusers) user@powerpc:~/github/diffusers/examples/dreambooth$ ./my_training.sh
The following values were not passed to `accelerate launch` and had defaults used instead:
        `--num_cpu_threads_per_process` was set to `12` to improve out-of-box performance
To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
Fetching 16 files: 100%|█████████████████████████████████████████████████████████████| 16/16 [00:00<00:00, 16221.63it/s]
Generating class images:   0%|                                                                   | 0/25 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/home/user/github/diffusers/examples/dreambooth/train_dreambooth.py", line 658, in <module>
    main()
  File "/home/user/github/diffusers/examples/dreambooth/train_dreambooth.py", line 389, in main
    with context:
AttributeError: __enter__
Traceback (most recent call last):
  File "/home/user/anaconda3/envs/diffusers/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/home/user/anaconda3/envs/diffusers/lib/python3.9/site-packages/accelerate/commands/accelerate_cli.py", line 43, in main
    args.func(args)
  File "/home/user/anaconda3/envs/diffusers/lib/python3.9/site-packages/accelerate/commands/launch.py", line 837, in launch_command
    simple_launcher(args)
  File "/home/user/anaconda3/envs/diffusers/lib/python3.9/site-packages/accelerate/commands/launch.py", line 354, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: 
Command '['/home/user/anaconda3/envs/diffusers/bin/python', 
'train_dreambooth.py', 
'--pretrained_model_name_or_path=CompVis/stable-diffusion-v1-4', 
'--use_auth_token', 
'--instance_data_dir=training', 
'--with_prior_preservation', 
'--prior_loss_weight=1.0', 
'--instance_prompt=a photo of a dog', 
'--class_prompt=a photo of dog', 
'--resolution=512', 
'--train_batch_size=1', 
'--gradient_accumulation_steps=2', 
'--gradient_checkpointing', 
'--use_8bit_adam', 
'--learning_rate=5e-6', 
'--lr_scheduler=constant', 
'--lr_warmup_steps=0', 
'--max_train_steps=800', 
'--class_data_dir=classes']' returned non-zero exit status 1.

Reproduction

Using this shell script I get the error above:

export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export INSTANCE_DIR="training"
export OUTPUT_DIR="my_model"

accelerate launch train_dreambooth.py \
  --pretrained_model_name_or_path=$MODEL_NAME --use_auth_token \
  --instance_data_dir=$INSTANCE_DIR \
  --output_dir=$OUTPUT_DIR \
  --with_prior_preservation --prior_loss_weight=1.0 \
  --instance_prompt="a photo of a dog" \
  --class_prompt="a photo of dog" \
  --resolution=512 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=2 --gradient_checkpointing \
  --use_8bit_adam \
  --learning_rate=5e-6 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 --max_train_steps=800 \
  --class_data_dir="classes"

Logs

No response

System Info

Ubuntu on WSL, 12GB 3060

Stable Diffusion v1.5

Since Stable Diffusion v1.5 is out, I was hoping to use it with the Imagic Colab. Could you update diffusers to use the new model instead of the old one?
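Until the notebook is updated, swapping the model id may be all that's needed; a hedged sketch (assumes you have accepted the v1-5 license on the Hub):

#@markdown Name/Path of the initial model.
MODEL_NAME = "runwayml/stable-diffusion-v1-5" #@param {type:"string"}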

How to use a custom initial model?

I tried replacing the "MODEL_NAME" in the "Settings and run" section with the path to my own ckpt file on Google Drive, but it doesn't seem to work. I can't get it to work with anything other than the default "CompVis/stable-diffusion-v1-4".

Did I misunderstand or do something wrong?
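A bare ckpt won't load: MODEL_NAME must point at a diffusers-format folder (the layout that contains model_index.json), not a single checkpoint file. A hedged sketch using the conversion script discussed further down this page (paths are placeholders):

!python convert_original_stable_diffusion_to_diffusers.py \
  --checkpoint_path="/content/drive/MyDrive/my_model.ckpt" \
  --dump_path="/content/my_model_diffusers"
# Then point the settings cell at the converted folder:
MODEL_NAME = "/content/my_model_diffusers" #@param {type:"string"}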

Getting an Error

Describe the bug

I get a huge error when I fill in the steps as shown in the video by James Cunliffe here: https://www.youtube.com/watch?v=FaLTztGGueQ

I have posted the error below; the Colab GPU was a Tesla T4, 15109 MiB.

This error occurs when I press "play" on the "Start Training" section.

Reproduction

Follow his steps to the letter, except change his "realjames" to something else, like whatever you called your Hugging Face token.

These are my settings:

!accelerate launch train_dreambooth.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --instance_data_dir=$INSTANCE_DIR \
  --class_data_dir=$CLASS_DIR \
  --output_dir=$OUTPUT_DIR \
  --with_prior_preservation --prior_loss_weight=1.0 \
  --instance_prompt="MY TOKEN NAME" \
  --class_prompt="person" \
  --seed=1337 \
  --resolution=512 \
  --train_batch_size=1 \
  --train_text_encoder \
  --mixed_precision="fp16" \
  --use_8bit_adam \
  --gradient_accumulation_steps=1 \
  --learning_rate=5e-6 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --num_class_images=50 \
  --sample_batch_size=4 \
  --max_train_steps=1000

Logs

The following values were not passed to `accelerate launch` and had defaults used instead:
	`--num_processes` was set to a value of `1`
	`--num_machines` was set to a value of `1`
	`--mixed_precision` was set to a value of `'no'`
	`--num_cpu_threads_per_process` was set to `1` to improve out-of-box performance
To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/huggingface_hub/utils/_errors.py", line 213, in hf_raise_for_status
    response.raise_for_status()
  File "/usr/local/lib/python3.7/dist-packages/requests/models.py", line 941, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://huggingface.co/CompVis/stable-diffusion-v1-4/resolve/main/model_index.json

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/diffusers/configuration_utils.py", line 234, in get_config_dict
    revision=revision,
  File "/usr/local/lib/python3.7/dist-packages/huggingface_hub/file_download.py", line 1057, in hf_hub_download
    timeout=etag_timeout,
  File "/usr/local/lib/python3.7/dist-packages/huggingface_hub/file_download.py", line 1359, in get_hf_file_metadata
    hf_raise_for_status(r)
  File "/usr/local/lib/python3.7/dist-packages/huggingface_hub/utils/_errors.py", line 254, in hf_raise_for_status
    raise HfHubHTTPError(str(HTTPError), response=response) from e
huggingface_hub.utils._errors.HfHubHTTPError: <class 'requests.exceptions.HTTPError'> (Request ID: ssVb9YQlJa6gYKCWwu9Tk)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "train_dreambooth.py", line 695, in <module>
    main()
  File "train_dreambooth.py", line 376, in main
    args.pretrained_model_name_or_path, torch_dtype=torch_dtype, use_auth_token=True
  File "/usr/local/lib/python3.7/dist-packages/diffusers/pipeline_utils.py", line 373, in from_pretrained
    revision=revision,
  File "/usr/local/lib/python3.7/dist-packages/diffusers/configuration_utils.py", line 256, in get_config_dict
    "There was a specific connection error when trying to load"
OSError: There was a specific connection error when trying to load CompVis/stable-diffusion-v1-4:
<class 'requests.exceptions.HTTPError'> (Request ID: ssVb9YQlJa6gYKCWwu9Tk)
Traceback (most recent call last):
  File "/usr/local/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/accelerate_cli.py", line 43, in main
    args.func(args)
  File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/launch.py", line 837, in launch_command
    simple_launcher(args)
  File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/launch.py", line 354, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', 'train_dreambooth.py', '--pretrained_model_name_or_path=CompVis/stable-diffusion-v1-4', '--instance_data_dir=/content/data/RealErnesttInput', '--class_data_dir=/content/data/person', '--output_dir=/content/drive/MyDrive/stable_diffusion_weights/sks', '--with_prior_preservation', '--prior_loss_weight=1.0', '--instance_prompt=RealErnest', '--class_prompt=person', '--seed=1337', '--resolution=512', '--train_batch_size=1', '--mixed_precision=fp16', '--use_8bit_adam', '--gradient_accumulation_steps=1', '--learning_rate=1e-6', '--lr_scheduler=constant', '--lr_warmup_steps=0', '--num_class_images=50', '--sample_batch_size=4', '--max_train_steps=800']' returned non-zero exit status 1.

System Info

Google Colab: https://colab.research.google.com/github/ShivamShrirao/diffusers/blob/main/examples/dreambooth/DreamBooth_Stable_Diffusion.ipynb
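A 403 Forbidden on model_index.json usually means the token behind the login isn't valid, or the account never accepted the CompVis/stable-diffusion-v1-4 license; a hedged checklist:

# 1. Accept the license once at https://huggingface.co/CompVis/stable-diffusion-v1-4
# 2. Log in with a read-access token (re-run the notebook's login cell, or):
huggingface-cli login
# 3. Re-run the "Start Training" cell.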

Cannot Run the Colab Version

Describe the bug

I get a huge error when I fill in the steps as shown in the video by James Cunliffe here: https://www.youtube.com/watch?v=FaLTztGGueQ

I have posted the error below; the Colab GPU was a Tesla T4, 15109 MiB.

This error occurs when I press "play" on the "Start Training" section.

Reproduction

Follow his steps to the letter, except change his "realjames" to something else, like whatever you called your Hugging Face token.

These are my settings:

!accelerate launch train_dreambooth.py \
  --pretrained_model_name_or_path=$MODEL_NAME --use_auth_token \
  --instance_data_dir=$INSTANCE_DIR \
  --class_data_dir=$CLASS_DIR \
  --output_dir=$OUTPUT_DIR \
  --with_prior_preservation --prior_loss_weight=1.0 \
  --instance_prompt="THE NAME I GAVE MY TOKEN ON THE HUGGING FACE SITE" \
  --class_prompt="person" \
  --seed=1337 \
  --resolution=512 \
  --center_crop \
  --train_batch_size=1 \
  --mixed_precision="no" \
  --use_8_bit_adam \
  --gradient_accumulation_steps=1 \
  --learning_rate=5e-6 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --num_class_images=12 \
  --sample_batch_size=4 \
  --max_train_steps=900

Logs

The following values were not passed to `accelerate launch` and had defaults used instead:
	`--num_processes` was set to a value of `1`
	`--num_machines` was set to a value of `1`
	`--mixed_precision` was set to a value of `'no'`
	`--num_cpu_threads_per_process` was set to `1` to improve out-of-box performance
To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
usage: train_dreambooth.py [-h] --pretrained_model_name_or_path
                           PRETRAINED_MODEL_NAME_OR_PATH
                           [--tokenizer_name TOKENIZER_NAME]
                           --instance_data_dir INSTANCE_DATA_DIR
                           [--class_data_dir CLASS_DATA_DIR]
                           [--instance_prompt INSTANCE_PROMPT]
                           [--class_prompt CLASS_PROMPT]
                           [--with_prior_preservation]
                           [--prior_loss_weight PRIOR_LOSS_WEIGHT]
                           [--num_class_images NUM_CLASS_IMAGES]
                           [--output_dir OUTPUT_DIR] [--seed SEED]
                           [--resolution RESOLUTION] [--center_crop]
                           [--train_batch_size TRAIN_BATCH_SIZE]
                           [--sample_batch_size SAMPLE_BATCH_SIZE]
                           [--num_train_epochs NUM_TRAIN_EPOCHS]
                           [--max_train_steps MAX_TRAIN_STEPS]
                           [--gradient_accumulation_steps GRADIENT_ACCUMULATION_STEPS]
                           [--gradient_checkpointing]
                           [--learning_rate LEARNING_RATE] [--scale_lr]
                           [--lr_scheduler LR_SCHEDULER]
                           [--lr_warmup_steps LR_WARMUP_STEPS]
                           [--use_8bit_adam] [--adam_beta1 ADAM_BETA1]
                           [--adam_beta2 ADAM_BETA2]
                           [--adam_weight_decay ADAM_WEIGHT_DECAY]
                           [--adam_epsilon ADAM_EPSILON]
                           [--max_grad_norm MAX_GRAD_NORM] [--push_to_hub]
                           [--use_auth_token] [--hub_token HUB_TOKEN]
                           [--hub_model_id HUB_MODEL_ID]
                           [--logging_dir LOGGING_DIR]
                           [--log_interval LOG_INTERVAL]
                           [--mixed_precision {no,fp16,bf16}]
                           [--not_cache_latents] [--local_rank LOCAL_RANK]
train_dreambooth.py: error: unrecognized arguments: --use_8_bit_adam
Traceback (most recent call last):
  File "/usr/local/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/accelerate_cli.py", line 43, in main
    args.func(args)
  File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/launch.py", line 837, in launch_command
    simple_launcher(args)
  File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/launch.py", line 354, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', 'train_dreambooth.py', '--pretrained_model_name_or_path=CompVis/stable-diffusion-v1-4', '--use_auth_token', '--instance_data_dir=/content/data/SteveInput', '--class_data_dir=/content/data/person', '--output_dir=/content/drive/MyDrive/stable_diffusion_weights/SteveOutput', '--with_prior_preservation', '--prior_loss_weight=1.0', '--instance_prompt=nut', '--class_prompt=person', '--seed=1337', '--resolution=512', '--center_crop', '--train_batch_size=2', '--mixed_precision=no', '--use_8_bit_adam', '--gradient_accumulation_steps=1', '--learning_rate=5e-6', '--lr_scheduler=constant', '--lr_warmup_steps=0', '--num_class_images=12', '--sample_batch_size=4', '--max_train_steps=900']' returned non-zero exit status 2.
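The usage text in this log already shows the accepted spelling: train_dreambooth.py takes --use_8bit_adam, while the command passes --use_8_bit_adam, hence exit status 2. Correcting the flag in the launch cell should let training start:

  --use_8bit_adam \

The follow-up OSError in the Additional Info below is a consequence rather than a separate bug: because training aborted, nothing was ever saved to the output directory, so model_index.json does not exist there.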

Additional Info

If I ignore it then I get another issue further on that states:

---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
in <module>
      6 model_path = OUTPUT_DIR  # If you want to use previously trained model saved in gdrive, replace this with the full path of model in gdrive
      7
----> 8 pipe = StableDiffusionPipeline.from_pretrained(model_path, torch_dtype=torch.float16).to("cuda")
      9 g_cuda = None

1 frames
/usr/local/lib/python3.7/dist-packages/diffusers/configuration_utils.py in get_config_dict(cls, pretrained_model_name_or_path, **kwargs)
    215             else:
    216                 raise EnvironmentError(
--> 217                     f"Error no file named {cls.config_name} found in directory {pretrained_model_name_or_path}."
    218                 )
    219         else:

OSError: Error no file named model_index.json found in directory /content/drive/MyDrive/stable_diffusion_weights/SteveOutput.

System Info

Google Colab: https://colab.research.google.com/github/ShivamShrirao/diffusers/blob/main/examples/dreambooth/DreamBooth_Stable_Diffusion.ipynb

convert_original_stable_diffusion_to_diffusers error

Describe the bug

Traceback (most recent call last):
  File "/content/convert_original_stable_diffusion_to_diffusers.py", line 698, in <module>
    feature_extractor=feature_extractor,
TypeError: __init__() got an unexpected keyword argument 'safety_checker'

Reproduction

No response

Logs

[!] Not using xformers memory efficient attention.
--2022-10-20 07:01:02--  https://raw.githubusercontent.com/CompVis/stable-diffusion/main/configs/stable-diffusion/v1-inference.yaml
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.110.133, 185.199.109.133, 185.199.111.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.110.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1873 (1.8K) [text/plain]
Saving to: ‘v1-inference.yaml’

v1-inference.yaml   100%[===================>]   1.83K  --.-KB/s    in 0s      

2022-10-20 07:01:02 (29.2 MB/s) - ‘v1-inference.yaml’ saved [1873/1873]

Downloading: 100% 4.52k/4.52k [00:00<00:00, 4.14MB/s]
Downloading: 100% 1.71G/1.71G [00:25<00:00, 68.4MB/s]
Some weights of the model checkpoint at openai/clip-vit-large-patch14 were not used when initializing CLIPTextModel: ['vision_model.encoder.layers.21.self_attn.q_proj.bias', 'vision_model.encoder.layers.5.self_attn.out_proj.bias', 'vision_model.encoder.layers.8.self_attn.q_proj.weight', ... (several hundred more vision_model.*, visual_projection, text_projection and logit_scale entries omitted) ...]
- This IS expected if you are initializing CLIPTextModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing CLIPTextModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Downloading: 100% 961k/961k [00:00<00:00, 6.35MB/s]
Downloading: 100% 525k/525k [00:00<00:00, 3.69MB/s]
Downloading: 100% 389/389 [00:00<00:00, 406kB/s]
Downloading: 100% 905/905 [00:00<00:00, 870kB/s]
Downloading: 100% 4.55k/4.55k [00:00<00:00, 4.02MB/s]
Downloading: 100% 1.22G/1.22G [00:29<00:00, 41.1MB/s]
Downloading: 100% 342/342 [00:00<00:00, 341kB/s]
Traceback (most recent call last):
  File "/content/convert_original_stable_diffusion_to_diffusers.py", line 698, in <module>
    feature_extractor=feature_extractor,
TypeError: __init__() got an unexpected keyword argument 'safety_checker'

System Info

google colab
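That __init__() error usually indicates a version mismatch: the installed diffusers predates the safety_checker argument the conversion script passes to StableDiffusionPipeline. A hedged fix, assuming the mismatch is the cause, is to install a newer diffusers (or this fork, as the notebook itself does) before converting:

%pip install -U diffusers transformers
# or, to match this fork's scripts:
%pip install -qq git+https://github.com/ShivamShrirao/diffusers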

Can't find stable-diffusion-v1-4 when running in terminal

Describe the bug

When running the training script via terminal (Ubuntu 18.04) I get an error

train_booth.sh: line 5: --pretrained_model_name_or_path=CompVis/stable-diffusion-v1-4: No such file or directory

This doesn't happen when running the script in PyCharm.
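Not confirmed in the thread, but that exact message is what bash prints when a line continuation is broken: without a trailing backslash (or with whitespace after it), line 5 of the script is executed as its own command, and since --pretrained_model_name_or_path=CompVis/stable-diffusion-v1-4 contains a slash, bash tries to run it as a path and reports "No such file or directory". A minimal sketch of the correct form (compare the Reproduction below):

# Every line except the last must end in a backslash with nothing after it:
accelerate launch train_dreambooth.py \
  --pretrained_model_name_or_path="CompVis/stable-diffusion-v1-4" \
  --instance_data_dir="PATH" \
  --output_dir="PATH"

PyCharm likely works because it passes the command line differently, sidestepping the broken continuation.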

Reproduction

accelerate launch train_dreambooth.py
--pretrained_model_name_or_path="CompVis/stable-diffusion-v1-4"
--instance_data_dir="PATH"
--class_data_dir="woman"
--output_dir="PATH"
--with_prior_preservation --prior_loss_weight=1.0
--instance_prompt="propt"
--class_prompt="photo of a woman"
--seed=1337
--resolution=512
--train_batch_size=1
--mixed_precision="fp16"
--use_8bit_adam
--gradient_accumulation_steps=1
--learning_rate=5e-6
--lr_scheduler="constant"
--lr_warmup_steps=0
--num_class_images=50
--sample_batch_size=4
--max_train_steps=1000
--gradient_checkpointing

Logs

CUDA_VISIBLE_DEVICES=1 bash train_booth.sh
The following values were not passed to `accelerate launch` and had defaults used instead:
	`--num_cpu_threads_per_process` was set to `8` to improve out-of-box performance
To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
usage: train_dreambooth.py [-h] --pretrained_model_name_or_path PRETRAINED_MODEL_NAME_OR_PATH [--tokenizer_name TOKENIZER_NAME] --instance_data_dir INSTANCE_DATA_DIR [--class_data_dir CLASS_DATA_DIR]
                           [--instance_prompt INSTANCE_PROMPT] [--class_prompt CLASS_PROMPT] [--with_prior_preservation] [--prior_loss_weight PRIOR_LOSS_WEIGHT] [--num_class_images NUM_CLASS_IMAGES] [--output_dir OUTPUT_DIR]
                           [--seed SEED] [--resolution RESOLUTION] [--center_crop] [--train_batch_size TRAIN_BATCH_SIZE] [--sample_batch_size SAMPLE_BATCH_SIZE] [--num_train_epochs NUM_TRAIN_EPOCHS]
                           [--max_train_steps MAX_TRAIN_STEPS] [--gradient_accumulation_steps GRADIENT_ACCUMULATION_STEPS] [--gradient_checkpointing] [--learning_rate LEARNING_RATE] [--scale_lr] [--lr_scheduler LR_SCHEDULER]
                           [--lr_warmup_steps LR_WARMUP_STEPS] [--use_8bit_adam] [--adam_beta1 ADAM_BETA1] [--adam_beta2 ADAM_BETA2] [--adam_weight_decay ADAM_WEIGHT_DECAY] [--adam_epsilon ADAM_EPSILON]
                           [--max_grad_norm MAX_GRAD_NORM] [--push_to_hub] [--hub_token HUB_TOKEN] [--hub_model_id HUB_MODEL_ID] [--logging_dir LOGGING_DIR] [--log_interval LOG_INTERVAL] [--mixed_precision {no,fp16,bf16}]
                           [--not_cache_latents] [--local_rank LOCAL_RANK]
train_dreambooth.py: error: the following arguments are required: --pretrained_model_name_or_path, --instance_data_dir
Traceback (most recent call last):
  File "/home/galgozes/anaconda3/envs/dbooth/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/home/galgozes/anaconda3/envs/dbooth/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 43, in main
    args.func(args)
  File "/home/galgozes/anaconda3/envs/dbooth/lib/python3.10/site-packages/accelerate/commands/launch.py", line 837, in launch_command
    simple_launcher(args)
  File "/home/galgozes/anaconda3/envs/dbooth/lib/python3.10/site-packages/accelerate/commands/launch.py", line 354, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/home/galgozes/anaconda3/envs/dbooth/bin/python', 'train_dreambooth.py']' returned non-zero exit status 2.
train_booth.sh: line 5: --pretrained_model_name_or_path=CompVis/stable-diffusion-v1-4: No such file or directory

System Info

  • diffusers version: 0.6.0.dev0
  • Platform: Linux-5.4.0-1089-gcp-x86_64-with-glibc2.27
  • Python version: 3.10.6
  • PyTorch version (GPU?): 1.12.1 (True)
  • Huggingface_hub version: 0.10.1
  • Transformers version: 4.23.1

Getting an error

I've been following this guide to try to get this to work and am getting an error I can't seem to fix.
https://www.youtube.com/watch?v=w6PTviOCYQY

It looks like it's a CUDA issue. I ran the following and it looks like it ran fine, but when I run the training .sh script it throws an error.

--Error--

The following values were not passed to `accelerate launch` and had defaults used instead:
	`--num_cpu_threads_per_process` was set to `12` to improve out-of-box performance
To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
For effortless bug reporting copy-paste your error into this form: https://docs.google.com/forms/d/e/1FAIpQLScPB8emS3Thkp66nvqwmjTEgxp8Y9ufuWTzFyr9kJ5AoI47dQ/viewform?usp=sf_link

/root/anaconda3/envs/diffusers/lib/python3.9/site-packages/bitsandbytes/cuda_setup/paths.py:86: UserWarning: /root/anaconda3/envs/diffusers did not contain libcudart.so as expected! Searching further paths...
warn(
/root/anaconda3/envs/diffusers/lib/python3.9/site-packages/bitsandbytes/cuda_setup/paths.py:20: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('CompVis/stable-diffusion-v1-4')}
warn(
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64...
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: WARNING! libcuda.so not found! Do you have a CUDA driver installed? If you are on a cluster, make sure you are on a CUDA machine!
Traceback (most recent call last):
  File "/root/github/diffusers/examples/dreambooth/train_dreambooth.py", line 638, in <module>
    main()
  File "/root/github/diffusers/examples/dreambooth/train_dreambooth.py", line 429, in main
    import bitsandbytes as bnb
  File "/root/anaconda3/envs/diffusers/lib/python3.9/site-packages/bitsandbytes/__init__.py", line 6, in <module>
    from .autograd._functions import (
  File "/root/anaconda3/envs/diffusers/lib/python3.9/site-packages/bitsandbytes/autograd/_functions.py", line 5, in <module>
    import bitsandbytes.functional as F
  File "/root/anaconda3/envs/diffusers/lib/python3.9/site-packages/bitsandbytes/functional.py", line 13, in <module>
    from .cextension import COMPILED_WITH_CUDA, lib
  File "/root/anaconda3/envs/diffusers/lib/python3.9/site-packages/bitsandbytes/cextension.py", line 41, in <module>
    lib = CUDALibrary_Singleton.get_instance().lib
  File "/root/anaconda3/envs/diffusers/lib/python3.9/site-packages/bitsandbytes/cextension.py", line 37, in get_instance
    cls._instance.initialize()
  File "/root/anaconda3/envs/diffusers/lib/python3.9/site-packages/bitsandbytes/cextension.py", line 15, in initialize
    binary_name = evaluate_cuda_setup()
  File "/root/anaconda3/envs/diffusers/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py", line 132, in evaluate_cuda_setup
    cc = get_compute_capability(cuda)
  File "/root/anaconda3/envs/diffusers/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py", line 105, in get_compute_capability
    ccs = get_compute_capabilities(cuda)
  File "/root/anaconda3/envs/diffusers/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py", line 83, in get_compute_capabilities
    check_cuda_result(cuda, cuda.cuDeviceGetCount(ctypes.byref(nGpus)))
AttributeError: 'NoneType' object has no attribute 'cuDeviceGetCount'
Traceback (most recent call last):
  File "/root/anaconda3/envs/diffusers/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/root/anaconda3/envs/diffusers/lib/python3.9/site-packages/accelerate/commands/accelerate_cli.py", line 43, in main
    args.func(args)
  File "/root/anaconda3/envs/diffusers/lib/python3.9/site-packages/accelerate/commands/launch.py", line 837, in launch_command
    simple_launcher(args)
  File "/root/anaconda3/envs/diffusers/lib/python3.9/site-packages/accelerate/commands/launch.py", line 354, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/root/anaconda3/envs/diffusers/bin/python', 'train_dreambooth.py', '--pretrained_model_name_or_path=CompVis/stable-diffusion-v1-4', '--instance_data_dir=training', '--class_data_dir=classes', '--output_dir=output', '--instance_prompt=fillertext', '--class_prompt=fillertext', '--seed=3434554', '--resolution=512', '--center_crop', '--train_batch_size=1', '--mixed_precision=fp16', '--use_8bit_adam', '--gradient_accumulation_steps=1', '--gradient_checkpointing', '--learning_rate=5e-6', '--lr_scheduler=constant', '--lr_warmup_steps=0', '--num_class_images=100', '--sample_batch_size=4', '--max_train_steps=800']' returned non-zero exit status 1.
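The decisive line is "libcuda.so not found": bitsandbytes located the CUDA runtime but not the driver library, so its cuda handle is None when cuDeviceGetCount is called. A hedged diagnosis sketch (the WSL path is an assumption; adjust for your system):

# Check whether the loader can see the driver library at all:
ldconfig -p | grep libcuda
# On WSL the driver's libcuda.so lives under /usr/lib/wsl/lib; exporting it
# before launching is a common fix (path is an assumption for non-WSL setups):
export LD_LIBRARY_PATH=/usr/lib/wsl/lib:$LD_LIBRARY_PATH
bash my_training.sh   # your training script name here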

UnboundLocalError: local variable 'text_encoder' referenced before assignment | OSError: Error no file named model_index.json found

Describe the bug

I've been using the Dreambooth training script for a while, but all of a sudden I am getting this error when the training completes.

UnboundLocalError: local variable 'text_encoder' referenced before assignment

Since the training completed, I tried to run inference and it says:

OSError: Error no file named model_index.json found

Reproduction

!wget -q https://github.com/ShivamShrirao/diffusers/raw/main/examples/dreambooth/train_dreambooth.py
%pip install -qq git+https://github.com/ShivamShrirao/diffusers
%pip install -q -U --pre triton
%pip install -q accelerate==0.12.0 transformers ftfy bitsandbytes gradio

%pip install -q https://github.com/metrolobo/xformers_wheels/releases/download/1d31a3ac_various_6/xformers-0.0.14.dev0-cp37-cp37m-linux_x86_64.whl

#@markdown Name/Path of the initial model.
MODEL_NAME = "/content/drive/MyDrive/stable_diffusion_weights/andrewjarvis5" #@param {type:"string"}

#@markdown Path for images of the concept for training.
INSTANCE_DIR = "/content/data/andrewjarvis" #@param {type:"string"}
!mkdir -p $INSTANCE_DIR

#@markdown A general name for class like dog for dog images.
CLASS_NAME = "image" #@param {type:"string"}
CLASS_DIR = f"/content/data/{CLASS_NAME}"

#@markdown If model weights should be saved directly in google drive (takes around 4-5 GB).
save_to_gdrive = True #@param {type:"boolean"}
if save_to_gdrive:
    from google.colab import drive
    drive.mount('/content/drive')

#@markdown Enter the directory name to save model at.

OUTPUT_DIR = "stable_diffusion_weights/andrewjarvis6" #@param {type:"string"}
if save_to_gdrive:
    OUTPUT_DIR = "/content/drive/MyDrive/" + OUTPUT_DIR
else:
    OUTPUT_DIR = "/content/" + OUTPUT_DIR

print(f"[*] Weights will be saved at {OUTPUT_DIR}")

!mkdir -p $OUTPUT_DIR

#@markdown sks is a rare identifier, feel free to replace it.

accelerate launch train_dreambooth.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --instance_data_dir=$INSTANCE_DIR \
  --class_data_dir=$CLASS_DIR \
  --output_dir=$OUTPUT_DIR \
  --with_prior_preservation --prior_loss_weight=1.0 \
  --instance_prompt="andrewjarvis {CLASS_NAME}" \
  --class_prompt="{CLASS_NAME}" \
  --seed=14100 \
  --resolution=512 \
  --train_batch_size=1 \
  --mixed_precision="fp16" \
  --use_8bit_adam \
  --gradient_accumulation_steps=1 \
  --learning_rate=5e-6 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --num_class_images=50 \
  --sample_batch_size=4 \
  --max_train_steps=3000

Logs

CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64...
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 7.5
CUDA SETUP: Detected CUDA version 112
CUDA SETUP: Loading binary /usr/local/lib/python3.7/dist-packages/bitsandbytes/libbitsandbytes_cuda112.so...
Caching latents: 100% 5563/5563 [19:19<00:00,  4.80it/s]
Steps: 100% 3000/3000 [46:10<00:00,  1.08it/s, loss=0.19, lr=5e-6]Traceback (most recent call last):
  File "train_dreambooth.py", line 688, in <module>
    main()
  File "train_dreambooth.py", line 676, in main
    text_encoder=accelerator.unwrap_model(text_encoder),
UnboundLocalError: local variable 'text_encoder' referenced before assignment
Steps: 100% 3000/3000 [46:10<00:00,  1.08it/s, loss=0.19, lr=5e-6]
Traceback (most recent call last):
  File "/usr/local/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/accelerate_cli.py", line 43, in main
    args.func(args)
  File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/launch.py", line 837, in launch_command
    simple_launcher(args)
  File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/launch.py", line 354, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', 'train_dreambooth.py', '--pretrained_model_name_or_path=/content/drive/MyDrive/stable_diffusion_weights/andrewjarvis5', '--instance_data_dir=/content/data/andrewjarvis', '--class_data_dir=/content/data/image', '--output_dir=/content/drive/MyDrive/stable_diffusion_weights/andrewjarvis6', '--with_prior_preservation', '--prior_loss_weight=1.0', '--instance_prompt=andrewjarvis image', '--class_prompt=image', '--seed=14100', '--resolution=512', '--train_batch_size=1', '--mixed_precision=fp16', '--use_8bit_adam', '--gradient_accumulation_steps=1', '--learning_rate=5e-6', '--lr_scheduler=constant', '--lr_warmup_steps=0', '--num_class_images=50', '--sample_batch_size=4', '--max_train_steps=3000']' returned non-zero exit status 1.


## After this error trying to run inference
---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
<ipython-input-7-bb26acbc4cb5> in <module>
      6 model_path = OUTPUT_DIR             # If you want to use previously trained model saved in gdrive, replace this with the full path of model in gdrive
      7 
----> 8 pipe = StableDiffusionPipeline.from_pretrained(model_path, torch_dtype=torch.float16).to("cuda")
      9 g_cuda = None

1 frames
/usr/local/lib/python3.7/dist-packages/diffusers/configuration_utils.py in get_config_dict(cls, pretrained_model_name_or_path, **kwargs)
    216             else:
    217                 raise EnvironmentError(
--> 218                     f"Error no file named {cls.config_name} found in directory {pretrained_model_name_or_path}."
    219                 )
    220         else:

OSError: Error no file named model_index.json found in directory /content/drive/MyDrive/stable_diffusion_weights/andrewjarvis6.

System Info

Google Colab Pro
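
For what it's worth, the OSError is a consequence of the first failure, not a separate bug: the run crashed at the saving step (the unwrap_model(text_encoder) line in the traceback), so no model_index.json was ever written to OUTPUT_DIR. A minimal pre-flight check before inference (a sketch, reusing the notebook's OUTPUT_DIR):

import os

OUTPUT_DIR = "/content/drive/MyDrive/stable_diffusion_weights/andrewjarvis6"  # from the cells above
index_file = os.path.join(OUTPUT_DIR, "model_index.json")
if not os.path.isfile(index_file):
    raise FileNotFoundError(f"{index_file} is missing - the training run likely crashed before the pipeline was saved")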

Extending Dreambooth with Kane Wallmann's implementation

This is the relevant repo I'm considering for this feature request.

It's a fork of Joe Penna's implementation, so I believe it uses checkpoint files instead of diffusers; however, the implementation seems quite interesting and probably methodology-independent.

The gist is simply replacing the instance prompt and class prompt with filenames from the training dataset, both for the trained subject and the regularization images. The main benefit seems to be that we can train more than one subject/class into the same model, making it essentially more of a fine-tuning approach. Here's an example model using this method.

Can this be adapted to your implementation? Since this will require more steps than a regular Dreambooth session to reach satisfactory results, and considering Colab's usage quotas, I think more streamlined training interruption and continuation would also be important to consider.
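
To illustrate the idea (a rough sketch of the filename-as-caption approach, not Kane Wallmann's actual code; the folder path and naming convention here are made up):

from pathlib import Path

def caption_for(image_path: Path) -> str:
    # hypothetical convention: the caption is the file stem,
    # with underscores standing in for spaces
    return image_path.stem.replace("_", " ").strip()

for p in sorted(Path("/content/data/training").glob("*.jpg")):
    print(p.name, "->", caption_for(p))

Each image then trains against its own prompt instead of one shared instance prompt, which is what makes multiple subjects per model possible.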

Error in the Dreambooth

Describe the bug

It happens when you start the training. There's not a lot of information on why.

Reproduction

Regular flow of the colab file produces this

Logs

Traceback (most recent call last):
  File "train_dreambooth.py", line 658, in <module>
    main()
  File "train_dreambooth.py", line 510, in main
    for batch in tqdm(train_dataloader, desc="Caching latents"):
  File "/opt/conda/lib/python3.7/site-packages/tqdm/std.py", line 1195, in __iter__
    for obj in iterable:
  File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 652, in __next__
    data = self._next_data()
  File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 692, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "train_dreambooth.py", line 270, in __getitem__
    instance_image = Image.open(self.instance_images_path[index % self.num_instance_images])
ZeroDivisionError: integer division or modulo by zero
Traceback (most recent call last):
  File "/opt/conda/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/opt/conda/lib/python3.7/site-packages/accelerate/commands/accelerate_cli.py", line 43, in main
    args.func(args)
  File "/opt/conda/lib/python3.7/site-packages/accelerate/commands/launch.py", line 837, in launch_command
    simple_launcher(args)
  File "/opt/conda/lib/python3.7/site-packages/accelerate/commands/launch.py", line 354, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/opt/conda/bin/python', 'train_dreambooth.py', '--pretrained_model_name_or_path=CompVis/stable-diffusion-v1-4', '--use_auth_token', '--instance_data_dir=/content/data/tomellis', '--class_data_dir=/content/data/guy', '--output_dir=/content/stable_diffusion_weights/tomellis', '--with_prior_preservation', '--prior_loss_weight=1.0', '--instance_prompt=tomellis', '--class_prompt=tomellis guy']' returned non-zero exit status 1.

System Info

RTX 3090
(A Colab A100 also produces another error at the same step.)
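
For reference, the ZeroDivisionError comes from index % self.num_instance_images with zero instance images, i.e. the script found nothing usable in --instance_data_dir (/content/data/tomellis here). A guard that fails fast instead (a sketch; load_instance_images is a hypothetical helper, not the script's own function):

from pathlib import Path
from typing import List

def load_instance_images(instance_data_root: str) -> List[Path]:
    paths = sorted(Path(instance_data_root).iterdir())
    if not paths:
        # this is the condition that later surfaces as modulo-by-zero
        raise ValueError(f"No instance images found in {instance_data_root!r}")
    return paths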

Request: fix for mount issue when resuming (explanation provided)

Is your feature request related to a problem? Please describe.
mount issues when trying to resume from custom folder on this notebook:
https://colab.research.google.com/github/ShivamShrirao/diffusers/blob/main/examples/dreambooth/DreamBooth_Stable_Diffusion.ipynb

Describe the solution you'd like
Need to move the gdrive mount script in "Settings and Run" to the top of the file, so that it does not create physical folders before the virtual MyDrive is mounted.

e.g. in order to resume training, my MODEL_NAME=/content/drive/MyDrive/sd/stable_diffusion_weights/jmp909

so !mkdir -p $INSTANCE_DIR creates a physical MyDrive folder in the Colab instance before the virtual MyDrive is mounted, causing MyDrive to fail to mount.

The same issue applies to OUTPUT_DIR: again, moving the mount first will stop the mkdir from creating a directory in the Colab space instead of inside the mounted gdrive (see the sketch below).

thanks
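
A sketch of the requested cell order (the paths are just examples):

# mount Drive first...
from google.colab import drive
drive.mount('/content/drive')

# ...then create the directories, so mkdir lands inside the mounted Drive
import os
INSTANCE_DIR = "/content/drive/MyDrive/sd/instance_images"
OUTPUT_DIR = "/content/drive/MyDrive/sd/stable_diffusion_weights/jmp909"
os.makedirs(INSTANCE_DIR, exist_ok=True)
os.makedirs(OUTPUT_DIR, exist_ok=True)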

feature request: respect num_class_images when pointing to existing directory

Is your feature request related to a problem? Please describe.
If I have a folder of, say, 1500 class images and I set --num_class_images=500, it'll still cache all 1500.

Describe the solution you'd like
It would be useful if it just took the amount specified in --num_class_images (i.e. the first 500 out of 1500 in the case above), with the flag potentially left blank or set to -1 to mean "all in folder" (see the sketch below).

thanks
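
A sketch of the requested behaviour (class_data_root and the helper name are made up; --num_class_images is the script's existing flag):

from pathlib import Path

def pick_class_images(class_data_root: str, num_class_images: int):
    paths = sorted(Path(class_data_root).iterdir())
    if num_class_images > 0:          # -1 (or 0) would mean "all in folder"
        paths = paths[:num_class_images]
    return paths

# e.g. pick_class_images("/content/data/person", 500) -> first 500 of 1500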

File model_index.json not found

Describe the bug

(screenshot attached: Screen Shot 2022-10-21 at 8 54 51 PM)

Also, it doesn't seem to recognize my uploaded screenshot image:
"train_imagic.py: error: unrecognized arguments: Shot 2022-10-20 at 8.38.18 AM.png"

Thank you so much in advance for help!

Reproduction

Just running the cells as is

Logs

No response

System Info

(screenshot attached: Screen Shot 2022-10-21 at 8 57 46 PM)

Still returned non-zero exit status 1.

Describe the bug

(diffusers) zerocool@DESKTOP-IFR8E96:~/github/diffusers/examples/dreambooth$ ./my_training.sh
The following values were not passed to `accelerate launch` and had defaults used instead:
        `--num_cpu_threads_per_process` was set to `12` to improve out-of-box performance
To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
WARNING:root:Blocksparse is not available: the current GPU does not expose Tensor cores
The cache for model files in Transformers v4.22.0 has been updated. Migrating your old cache. This is a one-time only operation. You can interrupt this and resume the migration later on by calling `transformers.utils.move_cache()`.
Moving 0 files to the new cache system
0it [00:00, ?it/s]
Caching latents:   9%|โ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆ                                                             | 4/44 [00:02<00:23,  1.73it/s]
Traceback (most recent call last):
  File "/home/zerocool/github/diffusers/examples/dreambooth/train_dreambooth.py", line 637, in <module>
    main()
  File "/home/zerocool/github/diffusers/examples/dreambooth/train_dreambooth.py", line 492, in main
    for batch in tqdm(train_dataloader, desc="Caching latents"):
  File "/home/zerocool/anaconda3/envs/diffusers/lib/python3.9/site-packages/tqdm/std.py", line 1195, in __iter__
    for obj in iterable:
  File "/home/zerocool/anaconda3/envs/diffusers/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 681, in __next__
    data = self._next_data()
  File "/home/zerocool/anaconda3/envs/diffusers/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 721, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/home/zerocool/anaconda3/envs/diffusers/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/zerocool/anaconda3/envs/diffusers/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/zerocool/github/diffusers/examples/dreambooth/train_dreambooth.py", line 261, in __getitem__
    instance_image = Image.open(self.instance_images_path[index % self.num_instance_images])
  File "/home/zerocool/anaconda3/envs/diffusers/lib/python3.9/site-packages/PIL/Image.py", line 3147, in open
    raise UnidentifiedImageError(
PIL.UnidentifiedImageError: cannot identify image file '/home/zerocool/github/diffusers/examples/dreambooth/training/is-that-what-you-think-photo-u2.jpg:Zone.Identifier'
Traceback (most recent call last):
  File "/home/zerocool/anaconda3/envs/diffusers/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/home/zerocool/anaconda3/envs/diffusers/lib/python3.9/site-packages/accelerate/commands/accelerate_cli.py", line 43, in main
    args.func(args)
  File "/home/zerocool/anaconda3/envs/diffusers/lib/python3.9/site-packages/accelerate/commands/launch.py", line 837, in launch_command
    simple_launcher(args)
  File "/home/zerocool/anaconda3/envs/diffusers/lib/python3.9/site-packages/accelerate/commands/launch.py", line 354, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/home/zerocool/anaconda3/envs/diffusers/bin/python', 'train_dreambooth.py', '--pretrained_model_name_or_path=CompVis/stable-diffusion-v1-4', '--instance_data_dir=training', '--output_dir=my_model', '--instance_prompt=beaninstance', '--class_prompt=guy', '--resolution=512', '--train_batch_size=1', '--learning_rate=5e-6', '--lr_scheduler=constant', '--lr_warmup_steps=0', '--gradient_accumulation_steps=2', '--gradient_checkpointing', '--max_train_steps=1000']' returned non-zero exit status 1.
(diffusers) zerocool@DESKTOP-IFR8E96:~/github/diffusers/examples/dreambooth$

Reproduction

No response

Logs

No response

System Info

  • diffusers version: 0.4.0.dev0
  • Platform: Linux-5.10.102.1-microsoft-standard-WSL2-x86_64-with-glibc2.31
  • Python version: 3.9.13
  • PyTorch version (GPU?): 1.12.1+cu116 (True)
  • Huggingface_hub version: 0.10.0
  • Transformers version: 4.22.2
  • Using GPU in script?:
  • Using distributed or parallel set-up in script?:
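
The unreadable file in the traceback is not an image at all: it is an NTFS ':Zone.Identifier' companion file that WSL exposes for downloads copied from Windows. Deleting those before training avoids the PIL error (a small cleanup sketch; "training" is the --instance_data_dir from the command above):

import os

train_dir = "training"
for name in os.listdir(train_dir):
    if name.endswith(":Zone.Identifier"):
        os.remove(os.path.join(train_dir, name))
        print("removed", name)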

Request: optional backup and restore (1 version)

Is your feature request related to a problem? Please describe.
Since resuming seems to be possible by loading the old weights path as MODEL_NAME from gdrive, it would be useful to have the option to back up the weights prior to training.

Describe the solution you'd like
An option to back up before training, and also to restore the last backup if necessary (one version is OK, just in case extra training broke something and I need to revert; see the sketch below).

Describe alternatives you've considered
Of course, I could just set up a new OUTPUT_DIR rather than setting it the same as MODEL_NAME?

thanks
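
A sketch of the one-version backup idea (the path is an example; the copy takes the same 4-5 GB as the weights themselves):

import os
import shutil

OUTPUT_DIR = "/content/drive/MyDrive/stable_diffusion_weights/my_model"
BACKUP_DIR = OUTPUT_DIR + ".bak"

# run before training; to restore, copy BACKUP_DIR back over OUTPUT_DIR
if os.path.isdir(BACKUP_DIR):
    shutil.rmtree(BACKUP_DIR)
shutil.copytree(OUTPUT_DIR, BACKUP_DIR)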

Dreambooth colab - add option to reset cuda memory

Is your feature request related to a problem? Please describe.
Sometimes retraining in the same session gives a CUDA memory allocation error

Describe the solution you'd like
suggest potentially adding this as an option to clear memory:

pip install numba

then

from numba import cuda 
device = cuda.get_current_device()
device.reset()

worked for me!

thanks
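
For what it's worth, a lighter alternative that needs no extra package is letting PyTorch release its cached blocks (a sketch; note this only frees cached memory, not tensors that are still referenced):

import gc
import torch

gc.collect()
torch.cuda.empty_cache()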

Extreme degradation of existing model when running the Dreambooth script

Describe the bug

I did some testing regarding the impact of Dreambooth on different prompts, using the same seed.

Pretty much all of my tests produced results similar to this when running Dreambooth with class "man" and concept "johnslegers":

(image attached)

Reproduction

Just run Dreambooth once, with "man" as a class and pretty much anything as a concept identifier.

Then compare output of "man" & a celebrity (eg. "Johnny Depp") of the original model with the new model. You'll notice rather extreme degradation.

I've tried using different configs, but to no avail. The degradation persists no matter how many input pics I use, how many class pics I use, what value I use for prior preservation, etc.

Logs

No response

System Info

The issue is system-independent.

See also huggingface#712

does it work on windows? winerror123 directory syntax incorrect

Describe the bug

I'm trying to make Dreambooth work on Windows, but with no luck;
the following error appears:
OSError: [WinError 123] The filename, directory name, or volume label syntax is incorrect: '"CompVis\stable-diffusion-v1-4"'

full traceback:

launch.bat
accelerate launch train_dreambooth.py --pretrained_model_name_or_path="CompVis/stable-diffusion-v1-4" --use_auth_token --instance_data_dir="training_data\dog"   --class_data_dir="regularization"   --output_dir="outputs\models\dog"   --with_prior_preservation --prior_loss_weight=1.0   --instance_prompt="A sks dog"   --class_prompt="A dog"   --seed=3434554   --resolution=512   --center_crop   --train_batch_size=1      --use_8bit_adam   --gradient_accumulation_steps=1   --learning_rate=5e-6   --lr_scheduler="constant"   --lr_warmup_steps=0   --num_class_images=12 --sample_batch_size=4   --max_train_steps=800

The following values were not passed to `accelerate launch` and had defaults used instead:
        `--num_cpu_threads_per_process` was set to `12` to improve out-of-box performance
To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
C:\Users\andre\anaconda3\envs\ldm\lib\site-packages\torchvision\io\image.py:13: UserWarning: Failed to load image Python extension: Could not find module 'C:\Users\andre\anaconda3\envs\ldm\Lib\site-packages\torchvision\image.pyd' (or one of its dependencies). Try using the full path with constructor syntax.
  warn(f"Failed to load image Python extension: {e}")

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
For effortless bug reporting copy-paste your error into this form: https://docs.google.com/forms/d/e/1FAIpQLScPB8emS3Thkp66nvqwmjTEgxp8Y9ufuWTzFyr9kJ5AoI47dQ/viewform?usp=sf_link
================================================================================
C:\Users\andre\anaconda3\envs\ldm\lib\site-packages\bitsandbytes\cuda_setup\paths.py:20: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {WindowsPath('C')}
  warn(
C:\Users\andre\anaconda3\envs\ldm\lib\site-packages\bitsandbytes\cuda_setup\paths.py:86: UserWarning: C:\Users\andre\anaconda3\envs\ldm did not contain libcudart.so as expected! Searching further paths...
  warn(
Traceback (most recent call last):
  File "train_dreambooth.py", line 658, in <module>
    main()
  File "train_dreambooth.py", line 446, in main
    import bitsandbytes as bnb
  File "C:\Users\andre\anaconda3\envs\ldm\lib\site-packages\bitsandbytes\__init__.py", line 6, in <module>
    from .autograd._functions import (
  File "C:\Users\andre\anaconda3\envs\ldm\lib\site-packages\bitsandbytes\autograd\_functions.py", line 5, in <module>
    import bitsandbytes.functional as F
  File "C:\Users\andre\anaconda3\envs\ldm\lib\site-packages\bitsandbytes\functional.py", line 13, in <module>
    from .cextension import COMPILED_WITH_CUDA, lib
  File "C:\Users\andre\anaconda3\envs\ldm\lib\site-packages\bitsandbytes\cextension.py", line 41, in <module>
    lib = CUDALibrary_Singleton.get_instance().lib
  File "C:\Users\andre\anaconda3\envs\ldm\lib\site-packages\bitsandbytes\cextension.py", line 37, in get_instance
    cls._instance.initialize()
  File "C:\Users\andre\anaconda3\envs\ldm\lib\site-packages\bitsandbytes\cextension.py", line 15, in initialize
    binary_name = evaluate_cuda_setup()
  File "C:\Users\andre\anaconda3\envs\ldm\lib\site-packages\bitsandbytes\cuda_setup\main.py", line 123, in evaluate_cuda_setup
    cudart_path = determine_cuda_runtime_lib_path()
  File "C:\Users\andre\anaconda3\envs\ldm\lib\site-packages\bitsandbytes\cuda_setup\paths.py", line 110, in determine_cuda_runtime_lib_path
    cuda_runtime_libs.update(find_cuda_lib_in(value))
  File "C:\Users\andre\anaconda3\envs\ldm\lib\site-packages\bitsandbytes\cuda_setup\paths.py", line 46, in find_cuda_lib_in
    resolve_paths_list(paths_list_candidate)
  File "C:\Users\andre\anaconda3\envs\ldm\lib\site-packages\bitsandbytes\cuda_setup\paths.py", line 41, in resolve_paths_list
    return remove_non_existent_dirs(extract_candidate_paths(paths_list_candidate))
  File "C:\Users\andre\anaconda3\envs\ldm\lib\site-packages\bitsandbytes\cuda_setup\paths.py", line 15, in remove_non_existent_dirs
    non_existent_directories: Set[Path] = {
  File "C:\Users\andre\anaconda3\envs\ldm\lib\site-packages\bitsandbytes\cuda_setup\paths.py", line 16, in <setcomp>
    path for path in candidate_paths if not path.exists()
  File "C:\Users\andre\anaconda3\envs\ldm\lib\pathlib.py", line 1388, in exists
    self.stat()
  File "C:\Users\andre\anaconda3\envs\ldm\lib\pathlib.py", line 1194, in stat
    return self._accessor.stat(self)
OSError: [WinError 123] The filename, directory name, or volume label syntax is incorrect: '"CompVis\\stable-diffusion-v1-4"'
Traceback (most recent call last):
  File "C:\Users\andre\anaconda3\envs\ldm\Scripts\accelerate-script.py", line 9, in <module>
    sys.exit(main())
  File "C:\Users\andre\anaconda3\envs\ldm\lib\site-packages\accelerate\commands\accelerate_cli.py", line 43, in main
    args.func(args)
  File "C:\Users\andre\anaconda3\envs\ldm\lib\site-packages\accelerate\commands\launch.py", line 837, in launch_command
    simple_launcher(args)
  File "C:\Users\andre\anaconda3\envs\ldm\lib\site-packages\accelerate\commands\launch.py", line 354, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['C:\\Users\\andre\\anaconda3\\envs\\ldm\\python.exe', 'train_dreambooth.py', '--pretrained_model_name_or_path=CompVis/stable-diffusion-v1-4', '--use_auth_token', '--instance_data_dir=training_data\\dog', '--class_data_dir=regularization', '--output_dir=outputs\\models\\dog', '--with_prior_preservation', '--prior_loss_weight=1.0', '--instance_prompt=A sks dog', '--class_prompt=A dog', '--seed=3434554', '--resolution=512', '--center_crop', '--train_batch_size=1', '--use_8bit_adam', '--gradient_accumulation_steps=1', '--learning_rate=5e-6', '--lr_scheduler=constant', '--lr_warmup_steps=0', '--num_class_images=12', '--sample_batch_size=4', '--max_train_steps=800']' returned non-zero exit status 1.

Reproduction

No response

Logs

No response

System Info

Windows
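
Note that the path in the OSError literally contains quote characters ('"CompVis\stable-diffusion-v1-4"'), so quotes from the batch file or an environment variable are ending up inside the value itself. One common Windows pitfall (an assumption about the cause here, not a confirmed diagnosis) is that cmd.exe keeps quotes typed in set VAR="value" as part of the value; setting the variable from Python, or dropping the quotes, avoids that:

import os

# no stray quotes end up in the value this way
os.environ["MODEL_NAME"] = "CompVis/stable-diffusion-v1-4"
print(repr(os.environ["MODEL_NAME"]))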

colab code not running in linux (issue with libtorch_cuda_cpp.so and xformers)

Describe the bug

I'm trying to run the Colab code on Linux, but I'm getting errors.

first:
libtorch_cuda_cpp.so: cannot open shared object file: No such file or directory
WARNING:root:WARNING: libtorch_cuda_cpp.so: cannot open shared object file: No such file or directory
Need to compile C++ extensions to get sparse attention suport. Please run python setup.py build develop

later:
RuntimeError: No such operator xformers::efficient_attention_forward_generic - did you forget to build xformers with python setup.py develop?
What am I doing wrong?

Reproduction

created a new conda env and ran the lines:

pip install -qq git+https://github.com/ShivamShrirao/diffusers
pip install -q -U --pre triton
pip install -q accelerate==0.12.0 transformers ftfy bitsandbytes gradio
pip install https://github.com/metrolobo/xformers_wheels/releases/download/1d31a3ac_various_6/xformers-0.0.14.dev0-cp37-cp37m-linux_x86_64.whl

then ran the code (using bash):

accelerate launch train_dreambooth.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --instance_data_dir=$INSTANCE_DIR \
  --class_data_dir=$CLASS_DIR \
  --output_dir=$OUTPUT_DIR \
  --with_prior_preservation --prior_loss_weight=1.0 \
  --instance_prompt="photo of sapirmo {CLASS_NAME}" \
  --class_prompt="photo of a {CLASS_NAME}" \
  --seed=1337 \
  --resolution=512 \
  --train_batch_size=1 \
  --mixed_precision="fp16" \
  --use_8bit_adam \
  --gradient_accumulation_steps=1 \
  --learning_rate=5e-6 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --num_class_images=50 \
  --sample_batch_size=4 \
  --max_train_steps=1000 \
  --gradient_checkpointing

Logs

bash train_booth.sh 
libtorch_cuda_cpp.so: cannot open shared object file: No such file or directory
WARNING:root:WARNING: libtorch_cuda_cpp.so: cannot open shared object file: No such file or directory
Need to compile C++ extensions to get sparse attention suport. Please run python setup.py build develop
Fetching 16 files: 100%|โ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆ| 16/16 [00:00<00:00, 32832.13it/s]
The config attributes {'feature_extractor': ['transformers', 'CLIPFeatureExtractor'], 'safety_checker': ['stable_diffusion', 'StableDiffusionSafetyChecker']} were passed to StableDiffusionPipeline, but are not expected and will be ignored. Please verify your model_index.json configuration file.
Generating class images:   0%|                                                                                                                                                                                                                                            | 0/13 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "train_dreambooth.py", line 638, in <module>
    main()
  File "train_dreambooth.py", line 381, in main
    images = pipeline(example["prompt"]).images
  File "/home/galgozes/anaconda3/envs/dreambooth/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/galgozes/anaconda3/envs/dreambooth/lib/python3.7/site-packages/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py", line 312, in __call__
    noise_pred = self.unet(latent_model_input, t, encoder_hidden_states=text_embeddings).sample
  File "/home/galgozes/anaconda3/envs/dreambooth/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/galgozes/anaconda3/envs/dreambooth/lib/python3.7/site-packages/diffusers/models/unet_2d_condition.py", line 286, in forward
    encoder_hidden_states=encoder_hidden_states,
  File "/home/galgozes/anaconda3/envs/dreambooth/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/galgozes/anaconda3/envs/dreambooth/lib/python3.7/site-packages/diffusers/models/unet_blocks.py", line 565, in forward
    hidden_states = attn(hidden_states, context=encoder_hidden_states)
  File "/home/galgozes/anaconda3/envs/dreambooth/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/galgozes/anaconda3/envs/dreambooth/lib/python3.7/site-packages/diffusers/models/attention.py", line 154, in forward
    hidden_states = block(hidden_states, context=context)
  File "/home/galgozes/anaconda3/envs/dreambooth/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/galgozes/anaconda3/envs/dreambooth/lib/python3.7/site-packages/diffusers/models/attention.py", line 203, in forward
    hidden_states = self.attn1(self.norm1(hidden_states)) + hidden_states
  File "/home/galgozes/anaconda3/envs/dreambooth/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/galgozes/anaconda3/envs/dreambooth/lib/python3.7/site-packages/diffusers/models/attention.py", line 276, in forward
    hidden_states = xformers.ops.memory_efficient_attention(query, key, value)
  File "/home/galgozes/anaconda3/envs/dreambooth/lib/python3.7/site-packages/xformers/ops.py", line 575, in memory_efficient_attention
    query=query, key=key, value=value, attn_bias=attn_bias, p=p
  File "/home/galgozes/anaconda3/envs/dreambooth/lib/python3.7/site-packages/xformers/ops.py", line 196, in forward_no_grad
    causal=isinstance(attn_bias, LowerTriangularMask),
  File "/home/galgozes/anaconda3/envs/dreambooth/lib/python3.7/site-packages/xformers/ops.py", line 46, in no_such_operator
    f"No such operator xformers::{name} - did you forget to build xformers with `python setup.py develop`?"
RuntimeError: No such operator xformers::efficient_attention_forward_generic - did you forget to build xformers with `python setup.py develop`?

System Info

  • diffusers version: 0.5.0.dev0
  • Platform: Linux-5.4.0-1087-gcp-x86_64-with-debian-buster-sid
  • Python version: 3.7.13
  • PyTorch version (GPU?): 1.12.1+cu102 (True)
  • Huggingface_hub version: 0.10.1
  • Transformers version: 4.23.1

Dreambooth training

Describe the bug

Hello,

I'd like to ask a question about training Dreambooth on smaller GPUs: the README reports that 9.82 GB should be enough to run the training with 8-bit Adam, gradient checkpointing enabled, and mixed_precision set to fp16.

We are trying to run the training, and it succeeds only if we downscale the resolution to 256 (512 is the default, I believe). Were those values reported for 256 resolution without that being specified? We are working with an NVIDIA GeForce RTX 2080 Ti (11264 MiB).

Reproduction

No response

Logs

No response

System Info

  • diffusers version: 0.4.0.dev0
  • Platform: Linux-5.4.0-125-generic-x86_64-with-glibc2.31
  • Python version: 3.9.14
  • PyTorch version (GPU?): 1.12.1+cu102 (True)
  • Huggingface_hub version: 0.10.0
  • Transformers version: 4.22.2
  • Using GPU in script?:
  • Using distributed or parallel set-up in script?:

Need a bit clearer instructions

Edit: Everything I posted below was before Nerdy Rodent's video on YouTube. I was doing it all completely wrong, haha. I left a comment below with some tips, though, because even with his video it took me 2 days to figure out. 3080 10 GB here; works great!

OLD POST:

Thanks for doing this; I have some feedback and issues though. The whole part in the instructions about "running the command" gave me an "export is not a recognized command". It took me forever to realize I only need to run the "accelerate launch" part, and after that to add those additional 2 parameters you mention in the beginning for 10 GB VRAM (because I'm an idiot). But I wanted to mention that the launch.sh file does not match the instructions either and has different parameters. I tried to edit and use that but also got an error. (I'm using Windows 11/Anaconda, btw.)

I'm also having issues with the fact that I'm pretty new to this and I don't know how to format export MODEL_NAME="(MODEL LOCATION)".
If it's located in C:\models\model.ckpt, do I put MODEL_NAME="C:\models\model.ckpt", or simply "model.ckpt", or "./model.ckpt"?

Basically, I just need ELI5 instructions for people like me. I tried all combinations earlier but I still got this error:

The following values were not passed to accelerate launch and had defaults used instead:
--num_cpu_threads_per_process was set to 12 to improve out-of-box performance
To avoid this warning pass in values for each of the problematic parameters or run accelerate config.
usage: train_dreambooth.py [-h] --pretrained_model_name_or_path PRETRAINED_MODEL_NAME_OR_PATH [--tokenizer_name TOKENIZER_NAME] --instance_data_dir INSTANCE_DATA_DIR [--class_data_dir CLASS_DATA_DIR] [--instance_prompt INSTANCE_PROMPT] [--class_prompt CLASS_PROMPT] [--with_prior_preservation] [--prior_loss_weight PRIOR_LOSS_WEIGHT]
[--num_class_images NUM_CLASS_IMAGES] [--output_dir OUTPUT_DIR] [--seed SEED] [--resolution RESOLUTION] [--center_crop] [--train_batch_size TRAIN_BATCH_SIZE] [--sample_batch_size SAMPLE_BATCH_SIZE] [--num_train_epochs NUM_TRAIN_EPOCHS] [--max_train_steps MAX_TRAIN_STEPS]
[--gradient_accumulation_steps GRADIENT_ACCUMULATION_STEPS] [--gradient_checkpointing] [--learning_rate LEARNING_RATE] [--scale_lr] [--lr_scheduler LR_SCHEDULER] [--lr_warmup_steps LR_WARMUP_STEPS] [--use_8bit_adam] [--adam_beta1 ADAM_BETA1] [--adam_beta2 ADAM_BETA2] [--adam_weight_decay ADAM_WEIGHT_DECAY]
[--adam_epsilon ADAM_EPSILON] [--max_grad_norm MAX_GRAD_NORM] [--push_to_hub] [--use_auth_token] [--hub_token HUB_TOKEN] [--hub_model_id HUB_MODEL_ID] [--logging_dir LOGGING_DIR] [--log_interval LOG_INTERVAL] [--mixed_precision {no,fp16,bf16}] [--not_cache_latents] [--local_rank LOCAL_RANK]
train_dreambooth.py: error: unrecognized arguments: \ \
Traceback (most recent call last):
File "C:\Users\New\anaconda3\lib\runpy.py", line 197, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Users\New\anaconda3\lib\runpy.py", line 87, in run_code
exec(code, run_globals)
File "C:\Users\New\anaconda3\Scripts\accelerate.exe_main
.py", line 7, in
File "C:\Users\New\anaconda3\lib\site-packages\accelerate\commands\accelerate_cli.py", line 43, in main
args.func(args)
File "C:\Users\New\anaconda3\lib\site-packages\accelerate\commands\launch.py", line 837, in launch_command
simple_launcher(args)
File "C:\Users\New\anaconda3\lib\site-packages\accelerate\commands\launch.py", line 354, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['C:\Users\New\anaconda3\python.exe', 'train_dreambooth.py', '--pretrained_model_name_or_path=model.ckpt', '--instance_data_dir=', '--output_dir=/dog2', '--instance_prompt=a photo of sks dog', '--resolution=512', '\', '--train_batch_size=1', '\', '--gradient_accumulation_steps=1', '--learning_rate=5e-6', '\', '--lr_scheduler=constant', '--lr_warmup_steps=0', '--max_train_steps=400', '--gradient_checkpointing', '--use_8bit_adam']' returned non-zero exit status 2.

EDIT: The accelerate config part also needs instructions for the various popup menus; I have no idea what DeepSpeed is, or a bunch of the other stuff. Do I press 0 in the first part about no distributed training if I have a single GPU, for instance? Would also love to see an option to simply download the zip file instead of using git. Thanks!

guide on how to run it on windows

I can't make it run on Windows natively; I'm getting this error after running python train_dreambooth.py:

train_dreambooth.py: error: the following arguments are required: --pretrained_model_name_or_path, --pretrained_vae_name_or_path, --instance_data_dir

Maybe I'm doing something wrong? The only difference I made is to run python train_dreambooth.py instead of launch.sh as on Linux.

Not saving ckpt converted module in my google drive

Describe the bug

I did everything and I can use the model.ckpt in Google Colab.
I can see the model.ckpt (converted for AUTOMATIC1111) in the sks folder, but only in the file browser on the left side of Colab. When I go to my Google Drive I can see all the folders, but not the model.ckpt in the sks folder; there is only a .json file.
When I try to download it I receive an error after a few minutes; I can't download it to my local PC.
I'm going crazy...
thank you

Reproduction

No response

Logs

No response

System Info

windows 11, colab

Error caused by reg_images/images not being a tensor during start of training

Describe the bug

Something is going wrong with my reg class images during training when running with --with_prior_preservation.
I can train just fine without --with_prior_preservation on the same images.
Using the latest code. Any ideas?

Traceback (most recent call last):
  File "train_dreambooth.py", line 658, in <module>
    main()
  File "train_dreambooth.py", line 605, in main
    noise_pred, noise_pred_prior = torch.chunk(noise_pred, 2, dim=0)
TypeError: chunk(): argument 'input' (position 1) must be Tensor, not dict

Reproduction

Here are my training params

accelerate launch train_dreambooth.py \
  --pretrained_model_name_or_path="../models/" \
  --instance_data_dir="../images/${FOLDER}" \
  --class_data_dir="../images/${REG_FOLDER}" \
  --output_dir="./output" \
  --with_prior_preservation \
  --instance_prompt="${KEYWORD}" \
  --class_prompt="${REG_KEYWORD}" \
  --resolution=512 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=1 \
  --learning_rate=5e-6 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --num_class_images=92 \
  --max_train_steps=${STEPS} \
  --mixed_precision="fp16" \
  --not_cache_latents

Logs

I can verify the images and reg_images are both loading into dataloader

System Info

diffusers 0.3.0
torch 1.10.1+cu111
python 3.8

Using WSL2 on windows

CKPT FILES won't load on AUTOMATIC 1111 - it gives an error, see description

Describe the bug

Since it's a .ckpt, I am loading it into the model folder in AUTOMATIC1111, since that's where I have many other working .ckpt models like Waifu and EMA. (.bin models, by contrast, go in the embeddings folder.) So I need to make sure these .ckpt files also go in the same folder as the main model.ckpt.

Anyways,
I made 4 female models, and they all read the same hash "e02601f3", see below.

start error:

Loading weights [e02601f3] from C:\TEMP\Stable diff\Autom111\stable-diffusion-webui\stable-diffusion-webui\models\Stable-diffusion\lacarlacapellisexy.ckpt
Traceback (most recent call last):
File "C:\TEMP\Stable diff\Autom111\stable-diffusion-webui\stable-diffusion-webui\venv\lib\site-packages\gradio\routes.py", line 273, in run_predict
output = await app.blocks.process_api(
File "C:\TEMP\Stable diff\Autom111\stable-diffusion-webui\stable-diffusion-webui\venv\lib\site-packages\gradio\blocks.py", line 742, in process_api
result = await self.call_function(fn_index, inputs, iterator)
File "C:\TEMP\Stable diff\Autom111\stable-diffusion-webui\stable-diffusion-webui\venv\lib\site-packages\gradio\blocks.py", line 653, in call_function
prediction = await anyio.to_thread.run_sync(
File "C:\TEMP\Stable diff\Autom111\stable-diffusion-webui\stable-diffusion-webui\venv\lib\site-packages\anyio\to_thread.py", line 31, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "C:\TEMP\Stable diff\Autom111\stable-diffusion-webui\stable-diffusion-webui\venv\lib\site-packages\anyio_backends_asyncio.py", line 937, in run_sync_in_worker_thread
return await future
File "C:\TEMP\Stable diff\Autom111\stable-diffusion-webui\stable-diffusion-webui\venv\lib\site-packages\anyio_backends_asyncio.py", line 867, in run
result = context.run(func, *args)
File "C:\TEMP\Stable diff\Autom111\stable-diffusion-webui\stable-diffusion-webui\modules\ui.py", line 955, in run_settings
opts.data_labels[key].onchange()
File "C:\TEMP\Stable diff\Autom111\stable-diffusion-webui\stable-diffusion-webui\webui.py", line 42, in f
res = func(*args, **kwargs)
File "C:\TEMP\Stable diff\Autom111\stable-diffusion-webui\stable-diffusion-webui\webui.py", line 78, in
shared.opts.onchange("sd_model_checkpoint", wrap_queued_call(lambda: modules.sd_models.reload_model_weights(shared.sd_model)))
File "C:\TEMP\Stable diff\Autom111\stable-diffusion-webui\stable-diffusion-webui\modules\sd_models.py", line 176, in reload_model_weights
load_model_weights(sd_model, checkpoint_info.filename, checkpoint_info.hash)
File "C:\TEMP\Stable diff\Autom111\stable-diffusion-webui\stable-diffusion-webui\modules\sd_models.py", line 124, in load_model_weights
pl_sd = torch.load(checkpoint_file, map_location="cpu")
File "C:\TEMP\Stable diff\Autom111\stable-diffusion-webui\stable-diffusion-webui\venv\lib\site-packages\torch\serialization.py", line 705, in load
with _open_zipfile_reader(opened_file) as opened_zipfile:
File "C:\TEMP\Stable diff\Autom111\stable-diffusion-webui\stable-diffusion-webui\venv\lib\site-packages\torch\serialization.py", line 242, in init
super(_open_zipfile_reader, self).init(torch._C.PyTorchFileReader(name_or_buffer))
RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory

end error

Reproduction

No response

Logs

No response

System Info

Sorry, I don't know how to get this, but let me know how...
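
The PytorchStreamReader error usually means the .ckpt file is truncated or corrupted (PyTorch saves these as zip archives by default), which might also explain why all four files show the same hash. A quick integrity check (a sketch, assuming the files were saved with the default zip serializer):

import zipfile

path = r"C:\TEMP\Stable diff\Autom111\stable-diffusion-webui\stable-diffusion-webui\models\Stable-diffusion\lacarlacapellisexy.ckpt"
print("valid zip archive:", zipfile.is_zipfile(path))  # False -> re-copy or re-convert the file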

UnboundLocalError: local variable 'num_processes' referenced before assignment

Describe the bug

Do you wish to use FP16 or BF16 (mixed precision)? [NO/fp16/bf16]: fp16
Traceback (most recent call last):
  File "/home/zerocool/anaconda3/envs/diffusers/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/home/zerocool/anaconda3/envs/diffusers/lib/python3.9/site-packages/accelerate/commands/accelerate_cli.py", line 43, in main
    args.func(args)
  File "/home/zerocool/anaconda3/envs/diffusers/lib/python3.9/site-packages/accelerate/commands/config/__init__.py", line 64, in config_command
    config = get_user_input()
  File "/home/zerocool/anaconda3/envs/diffusers/lib/python3.9/site-packages/accelerate/commands/config/__init__.py", line 37, in get_user_input
    config = get_cluster_input()
  File "/home/zerocool/anaconda3/envs/diffusers/lib/python3.9/site-packages/accelerate/commands/config/cluster.py", line 334, in get_cluster_input
    num_processes=num_processes,
UnboundLocalError: local variable 'num_processes' referenced before assignment

Reproduction

No response

Logs

No response

System Info

  • diffusers version: 0.3.0
  • Platform: Windows-10-10.0.19044-SP0
  • Python version: 3.10.6
  • PyTorch version (GPU?): 1.12.1+cu113 (True)
  • Huggingface_hub version: 0.9.1
  • Transformers version: 4.20.0
  • Using GPU in script?:
  • Using distributed or parallel set-up in script?:

RuntimeError: CUDA out of memory on PC with RTX3060

Describe the bug

1. I run launch.sh on my PC with an RTX 3060, which has 12 GB of VRAM; the error info is listed below.
2. I tried setting "export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:32", but it didn't help.

Reproduction

  1. Here is my launch.sh. I hope someone can reproduce the process.

cat launch.sh

export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export INSTANCE_DIR="/workspace/develop/huggingface-diffusers/dreambooth/data/alvan"
export CLASS_DIR="/workspace/develop/huggingface-diffusers/dreambooth/data/dog"
export OUTPUT_DIR="/workspace/develop/huggingface-diffusers/dreambooth/models/dog2"

accelerate launch train_dreambooth.py \
  --pretrained_model_name_or_path=$MODEL_NAME --use_auth_token \
  --instance_data_dir=$INSTANCE_DIR \
  --class_data_dir=$CLASS_DIR \
  --output_dir=$OUTPUT_DIR \
  --with_prior_preservation --prior_loss_weight=1.0 \
  --instance_prompt="photo of sks dog" \
  --class_prompt="photo of a dog" \
  --seed=3434554 \
  --resolution=512 \
  --center_crop \
  --train_batch_size=1 \
  --mixed_precision="fp16" \
  --use_8bit_adam \
  --gradient_accumulation_steps=1 --gradient_checkpointing \
  --learning_rate=5e-6 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --num_class_images=12 \
  --sample_batch_size=4 \
  --max_train_steps=800

Logs

/workspace/develop/dreambooth# ./launch.sh
The following values were not passed to `accelerate launch` and had defaults used instead:
        `--num_cpu_threads_per_process` was set to `8` to improve out-of-box performance
To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
For effortless bug reporting copy-paste your error into this form: https://docs.google.com/forms/d/e/1FAIpQLScPB8emS3Thkp66nvqwmjTEgxp8Y9ufuWTzFyr9kJ5AoI47dQ/viewform?usp=sf_link
================================================================================
/opt/conda/envs/huggingface/lib/python3.8/site-packages/bitsandbytes/cuda_setup/paths.py:86: UserWarning: /opt/conda/envs/huggingface did not contain libcudart.so as expected! Searching further paths...
  warn(
/opt/conda/envs/huggingface/lib/python3.8/site-packages/bitsandbytes/cuda_setup/paths.py:20: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/usr/local/nvidia/lib'), PosixPath('/usr/local/cuda/compat/lib'), PosixPath('/usr/local/nvidia/lib64')}
  warn(
/opt/conda/envs/huggingface/lib/python3.8/site-packages/bitsandbytes/cuda_setup/paths.py:98: UserWarning: /opt/conda/lib/python3.8/site-packages/torch/lib:/opt/conda/lib/python3.8/site-packages/torch_tensorrt/lib:/usr/local/cuda/compat/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64 did not contain libcudart.so as expected! Searching further paths...
  warn(
/opt/conda/envs/huggingface/lib/python3.8/site-packages/bitsandbytes/cuda_setup/paths.py:20: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('CompVis/stable-diffusion-v1-4')}
  warn(
/opt/conda/envs/huggingface/lib/python3.8/site-packages/bitsandbytes/cuda_setup/paths.py:20: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('41091'), PosixPath('http'), PosixPath('//172.30.0.1')}
  warn(
/opt/conda/envs/huggingface/lib/python3.8/site-packages/bitsandbytes/cuda_setup/paths.py:20: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('socks5'), PosixPath('1090'), PosixPath('//172.30.0.1')}
  warn(
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64...
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 8.6
CUDA SETUP: Detected CUDA version 117
CUDA SETUP: Loading binary /opt/conda/envs/huggingface/lib/python3.8/site-packages/bitsandbytes/libbitsandbytes_cuda117.so...
Caching latents: 100%|โ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆ| 12/12 [00:05<00:00,  2.40it/s]
Steps:   0%|                                                                                    | 0/800 [00:00<?, ?it/s]Traceback (most recent call last):
  File "train_dreambooth.py", line 658, in <module>
    main()
  File "train_dreambooth.py", line 618, in main
    accelerator.backward(loss)
  File "/opt/conda/envs/huggingface/lib/python3.8/site-packages/accelerate/accelerator.py", line 882, in backward
    self.scaler.scale(loss).backward(**kwargs)
  File "/opt/conda/envs/huggingface/lib/python3.8/site-packages/torch/_tensor.py", line 402, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/opt/conda/envs/huggingface/lib/python3.8/site-packages/torch/autograd/__init__.py", line 191, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
  File "/opt/conda/envs/huggingface/lib/python3.8/site-packages/torch/autograd/function.py", line 253, in apply
    return user_fn(self, *args)
  File "/opt/conda/envs/huggingface/lib/python3.8/site-packages/torch/utils/checkpoint.py", line 151, in backward
    torch.autograd.backward(outputs_with_grad, args_with_grad)
  File "/opt/conda/envs/huggingface/lib/python3.8/site-packages/torch/autograd/__init__.py", line 191, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: CUDA out of memory. Tried to allocate 1024.00 MiB (GPU 0; 12.00 GiB total capacity; 9.04 GiB already allocated; 384.22 MiB free; 10.48 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Steps:   0%|                                                                                    | 0/800 [00:01<?, ?it/s]
Traceback (most recent call last):
  File "/opt/conda/envs/huggingface/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/opt/conda/envs/huggingface/lib/python3.8/site-packages/accelerate/commands/accelerate_cli.py", line 43, in main
    args.func(args)
  File "/opt/conda/envs/huggingface/lib/python3.8/site-packages/accelerate/commands/launch.py", line 837, in launch_command
    simple_launcher(args)
  File "/opt/conda/envs/huggingface/lib/python3.8/site-packages/accelerate/commands/launch.py", line 354, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/opt/conda/envs/huggingface/bin/python3.8', 'train_dreambooth.py', '--pretrained_model_name_or_path=CompVis/stable-diffusion-v1-4', '--use_auth_token', '--instance_data_dir=/workspace/develop/huggingface-diffusers/dreambooth/data/alvan', '--class_data_dir=/workspace/develop/huggingface-diffusers/dreambooth/data/dog', '--output_dir=/workspace/develop/huggingface-diffusers/dreambooth/models/dog2', '--with_prior_preservation', '--prior_loss_weight=1.0', '--instance_prompt=photo of sks dog', '--class_prompt=photo of a dog', '--seed=3434554', '--resolution=512', '--center_crop', '--train_batch_size=1', '--mixed_precision=fp16', '--use_8bit_adam', '--gradient_accumulation_steps=1', '--gradient_checkpointing', '--learning_rate=5e-6', '--lr_scheduler=constant', '--lr_warmup_steps=0', '--num_class_images=12', '--sample_batch_size=4', '--max_train_steps=800']' returned non-zero exit status 1.

System Info

  • diffusers version: 0.4.0.dev0
  • Platform: Linux-5.15.62.1-microsoft-standard-WSL2-x86_64-with-glibc2.10
  • Python version: 3.8.13
  • PyTorch version (GPU?): 1.13.0a0+d321be6 (True)
  • Huggingface_hub version: 0.9.1
  • Transformers version: 4.22.2
  • Using GPU in script?:
  • Using distributed or parallel set-up in script?:
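
For what it's worth, when chasing OOMs like this it can help to log what PyTorch has actually allocated versus reserved right before the failing step (a sketch using standard torch APIs; where to call it is up to you, e.g. just before accelerator.backward(loss)):

import torch

def log_mem(tag: str) -> None:
    alloc = torch.cuda.memory_allocated() / 2**30
    reserved = torch.cuda.memory_reserved() / 2**30
    print(f"{tag}: allocated {alloc:.2f} GiB, reserved {reserved:.2f} GiB")

log_mem("before backward")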

PyTorch version (GPU?): 1.12.1+cu116 (False)

Describe the bug

When I run diffusers-cli env, it says:

PyTorch version (GPU?): 1.12.1+cu116 (False)

Why?

Reproduction

No response

Logs

No response

System Info

  • diffusers version: 0.4.0.dev0
  • Platform: Linux-4.4.0-19041-Microsoft-x86_64-with-glibc2.31
  • Python version: 3.9.13
  • PyTorch version (GPU?): 1.12.1+cu116 (False)
  • Huggingface_hub version: 0.10.0
  • Transformers version: 4.22.2
  • Using GPU in script?:
  • Using distributed or parallel set-up in script?:
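
The (False) in that line is just the result of torch.cuda.is_available(): the wheel was built against CUDA 11.6 (the +cu116 suffix), but no GPU/driver is visible from this environment. The platform string above ("Linux-4.4.0-19041-Microsoft") looks like WSL1, which has no GPU support, unlike the WSL2 strings elsewhere in this thread. A minimal diagnostic sketch:

import torch

print(torch.__version__)          # 1.12.1+cu116 -> built with CUDA support
print(torch.version.cuda)         # CUDA version the wheel was built against
print(torch.cuda.is_available())  # False -> no usable driver/GPU visible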

CUDA Error

Describe the bug

The Start Training step produces a CUBLAS_STATUS_INTERNAL_ERROR.

Reproduction

Do the regular steps with
--max_train_steps=1500

Logs

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
For effortless bug reporting copy-paste your error into this form: https://docs.google.com/forms/d/e/1FAIpQLScPB8emS3Thkp66nvqwmjTEgxp8Y9ufuWTzFyr9kJ5AoI47dQ/viewform?usp=sf_link
================================================================================
/usr/local/lib/python3.7/dist-packages/bitsandbytes/cuda_setup/paths.py:99: UserWarning: /usr/lib64-nvidia did not contain libcudart.so as expected! Searching further paths...
  f'{candidate_env_vars["LD_LIBRARY_PATH"]} did not contain '
/usr/local/lib/python3.7/dist-packages/bitsandbytes/cuda_setup/paths.py:21: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('"/usr/local/bin/dap_multiplexer","enableLsp"'), PosixPath('true}'), PosixPath('{"kernelManagerProxyPort"'), PosixPath('"172.28.0.3","jupyterArgs"'), PosixPath('6000,"kernelManagerProxyHost"'), PosixPath('["--ip=172.28.0.2"],"debugAdapterMultiplexerPath"')}
  "WARNING: The following directories listed in your path were found to "
/usr/local/lib/python3.7/dist-packages/bitsandbytes/cuda_setup/paths.py:21: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('module'), PosixPath('//ipykernel.pylab.backend_inline')}
  "WARNING: The following directories listed in your path were found to "
/usr/local/lib/python3.7/dist-packages/bitsandbytes/cuda_setup/paths.py:21: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/env/python')}
  "WARNING: The following directories listed in your path were found to "
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64...
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 8.0
CUDA SETUP: Detected CUDA version 111
CUDA SETUP: Loading binary /usr/local/lib/python3.7/dist-packages/bitsandbytes/libbitsandbytes_cuda111.so...
Caching latents: 100% 26/26 [00:02<00:00,  9.91it/s]
Steps:   0% 0/1500 [00:00<?, ?it/s]Traceback (most recent call last):
  File "train_dreambooth.py", line 658, in <module>
    main()
  File "train_dreambooth.py", line 600, in main
    noise_pred = unet(noisy_latents, timesteps, encoder_hidden_states).sample
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/accelerate/utils/operations.py", line 507, in __call__
    return convert_to_fp32(self.model_forward(*args, **kwargs))
  File "/usr/local/lib/python3.7/dist-packages/torch/amp/autocast_mode.py", line 12, in decorate_autocast
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/diffusers/models/unet_2d_condition.py", line 309, in forward
    upsample_size=upsample_size,
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/diffusers/models/unet_blocks.py", line 1151, in forward
    hidden_states = attn(hidden_states, context=encoder_hidden_states)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/diffusers/models/attention.py", line 154, in forward
    hidden_states = block(hidden_states, context=context)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/diffusers/models/attention.py", line 205, in forward
    hidden_states = self.ff(self.norm3(hidden_states)) + hidden_states
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/diffusers/models/attention.py", line 335, in forward
    return self.net(hidden_states)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/container.py", line 139, in forward
    input = module(input)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: CUDA error: CUBLAS_STATUS_INTERNAL_ERROR when calling `cublasGemmEx( handle, opa, opb, m, n, k, &falpha, a, CUDA_R_16F, lda, b, CUDA_R_16F, ldb, &fbeta, c, CUDA_R_16F, ldc, CUDA_R_32F, CUBLAS_GEMM_DFALT_TENSOR_OP)`
Steps:   0% 0/1500 [00:02<?, ?it/s]
Traceback (most recent call last):
  File "/usr/local/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/accelerate_cli.py", line 43, in main
    args.func(args)
  File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/launch.py", line 837, in launch_command
    simple_launcher(args)
  File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/launch.py", line 354, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', 'train_dreambooth.py', '--pretrained_model_name_or_path=CompVis/stable-diffusion-v1-4', '--use_auth_token', '--instance_data_dir=/content/data/fatihInput', '--class_data_dir=/content/data/person', '--output_dir=/content/drive/MyDrive/stable_diffusion_weights/fatihOutput', '--with_prior_preservation', '--prior_loss_weight=1.0', '--instance_prompt=fatih', '--class_prompt=person', '--seed=1337', '--resolution=512', '--center_crop', '--train_batch_size=1', '--mixed_precision=fp16', '--use_8bit_adam', '--gradient_accumulation_steps=1', '--learning_rate=5e-6', '--lr_scheduler=constant', '--lr_warmup_steps=0', '--num_class_images=12', '--sample_batch_size=4', '--max_train_steps=1500']' returned non-zero exit status 1.

System Info

A100 Google Colab Pro
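
For what it's worth, `CUBLAS_STATUS_INTERNAL_ERROR` on an fp16 GEMM is frequently an out-of-memory condition in disguise. A minimal smoke test, independent of the training script (assuming only that a CUDA runtime is visible from Python), can help separate a genuine cuBLAS/driver problem from an OOM:

```python
import torch

# a minimal sketch: check free VRAM and smoke-test a tiny half-precision
# matmul outside the training loop; if even this fails, the problem is the
# CUDA/driver setup rather than the script
print(torch.cuda.get_device_name(0))
free, total = torch.cuda.mem_get_info()      # requires a reasonably recent PyTorch (>=1.11)
print(f"free: {free / 2**30:.1f} GiB / total: {total / 2**30:.1f} GiB")

x = torch.randn(64, 64, device="cuda", dtype=torch.float16)
print((x @ x).sum())                         # fp16 GEMM succeeds -> suspect OOM during training
```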

WARNING: The following directories listed in your path were found to be non-existent

Describe the bug

(diffusers) zerocool@DESKTOP-IFR8E96:~/github/diffusers/examples/dreambooth$ ./my_training_press.sh
The following values were not passed to `accelerate launch` and had defaults used instead:
        `--num_cpu_threads_per_process` was set to `12` to improve out-of-box performance
To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
WARNING:root:Blocksparse is not available: the current GPU does not expose Tensor cores

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
For effortless bug reporting copy-paste your error into this form: https://docs.google.com/forms/d/e/1FAIpQLScPB8emS3Thkp66nvqwmjTEgxp8Y9ufuWTzFyr9kJ5AoI47dQ/viewform?usp=sf_link
================================================================================
/home/zerocool/anaconda3/envs/diffusers/lib/python3.9/site-packages/bitsandbytes/cuda_setup/paths.py:86: UserWarning: /home/zerocool/anaconda3/envs/diffusers did not contain libcudart.so as expected! Searching further paths...
  warn(
/home/zerocool/anaconda3/envs/diffusers/lib/python3.9/site-packages/bitsandbytes/cuda_setup/paths.py:98: UserWarning: /usr/lib/wsl/lib: did not contain libcudart.so as expected! Searching further paths...
  warn(
/home/zerocool/anaconda3/envs/diffusers/lib/python3.9/site-packages/bitsandbytes/cuda_setup/paths.py:20: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('CompVis/stable-diffusion-v1-4')}
  warn(
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64...
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 6.1
CUDA SETUP: Detected CUDA version 117
CUDA SETUP: Loading binary /home/zerocool/anaconda3/envs/diffusers/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cuda117_nocublaslt.so...
Caching latents:  18%|โ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–‹                                                  | 1771/10110 [06:46<31:51,  4.36it/s]

Can I ignore these warnings?
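
Generally the path warnings are harmless as long as a CUDA binary is eventually loaded, which the log above shows (`Loading binary ... libbitsandbytes_cuda117_nocublaslt.so`). A quick way to confirm is to exercise the 8-bit optimizer directly; a minimal sketch, assuming a CUDA-capable GPU:

```python
import torch
import bitsandbytes as bnb

# a minimal sketch: the warnings only matter if bitsandbytes failed to load a
# CUDA binary; a one-step run of the 8-bit optimizer is a sufficient smoke test
# (param is >4096 elements so the 8-bit state path is actually used)
p = torch.nn.Parameter(torch.randn(128, 128, device="cuda"))
opt = bnb.optim.AdamW8bit([p], lr=1e-4)      # same optimizer train_dreambooth.py uses
p.sum().backward()
opt.step()                                    # no crash here -> the warnings are ignorable
```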

Reproduction

No response

Logs

No response

System Info

  • diffusers version: 0.4.0.dev0
  • Platform: Linux-5.10.102.1-microsoft-standard-WSL2-x86_64-with-glibc2.31
  • Python version: 3.9.13
  • PyTorch version (GPU?): 1.12.1+cu116 (True)
  • Huggingface_hub version: 0.10.0
  • Transformers version: 4.22.2
  • Using GPU in script?:
  • Using distributed or parallel set-up in script?:

NotImplementedError when running training on WSL Ubuntu

Hello! I am trying to get diffusers DreamBooth running with WSL Ubuntu on my 3090 Ti computer. I have been following this YouTube video by Nerdy Rodent. However, when I run the training I receive this error:

--- Full Error Spoiler ---
(diffusers) user@DESKTOP-3PGA1RN:~/github/diffusers/examples/dreambooth$ ./my_training.sh
The following values were not passed to `accelerate launch` and had defaults used instead:
        `--num_cpu_threads_per_process` was set to `8` to improve out-of-box performance
To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
Caching latents: 100%|โ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆ| 200/200 [59:25<00:00, 17.83s/it]
Steps:   0%|                                                                                    | 0/800 [00:00<?, ?it/s]Traceback (most recent call last):
  File "/home/user/github/diffusers/examples/dreambooth/train_dreambooth.py", line 637, in <module>
    main()
  File "/home/user/github/diffusers/examples/dreambooth/train_dreambooth.py", line 582, in main
    noise_pred = unet(noisy_latents, timesteps, encoder_hidden_states).sample
  File "/home/user/anaconda3/envs/diffusers/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/user/anaconda3/envs/diffusers/lib/python3.9/site-packages/diffusers/models/unet_2d_condition.py", line 283, in forward
    sample, res_samples = downsample_block(
  File "/home/user/anaconda3/envs/diffusers/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/user/anaconda3/envs/diffusers/lib/python3.9/site-packages/diffusers/models/unet_blocks.py", line 565, in forward
    hidden_states = attn(hidden_states, context=encoder_hidden_states)
  File "/home/user/anaconda3/envs/diffusers/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/user/anaconda3/envs/diffusers/lib/python3.9/site-packages/diffusers/models/attention.py", line 154, in forward
    hidden_states = block(hidden_states, context=context)
  File "/home/user/anaconda3/envs/diffusers/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/user/anaconda3/envs/diffusers/lib/python3.9/site-packages/diffusers/models/attention.py", line 203, in forward
    hidden_states = self.attn1(self.norm1(hidden_states)) + hidden_states
  File "/home/user/anaconda3/envs/diffusers/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/user/anaconda3/envs/diffusers/lib/python3.9/site-packages/diffusers/models/attention.py", line 276, in forward
    hidden_states = xformers.ops.memory_efficient_attention(query, key, value)
  File "/home/user/anaconda3/envs/diffusers/lib/python3.9/site-packages/xformers/ops.py", line 568, in memory_efficient_attention
    op = AttentionOpDispatch.from_arguments(
  File "/home/user/anaconda3/envs/diffusers/lib/python3.9/site-packages/xformers/ops.py", line 531, in op
    raise NotImplementedError(f"No operator found for this attention: {self}")
NotImplementedError: No operator found for this attention: AttentionOpDispatch(dtype=torch.float32, device=device(type='cpu'), k=40, has_dropout=False, attn_bias_type=<class 'NoneType'>, kv_len=4096, q_len=4096)
Steps:   0%|                                                                                    | 0/800 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/home/user/anaconda3/envs/diffusers/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/home/user/anaconda3/envs/diffusers/lib/python3.9/site-packages/accelerate/commands/accelerate_cli.py", line 43, in main
    args.func(args)
  File "/home/user/anaconda3/envs/diffusers/lib/python3.9/site-packages/accelerate/commands/launch.py", line 837, in launch_command
    simple_launcher(args)
  File "/home/user/anaconda3/envs/diffusers/lib/python3.9/site-packages/accelerate/commands/launch.py", line 354, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/home/user/anaconda3/envs/diffusers/bin/python', 'train_dreambooth.py', '--pretrained_model_name_or_path=CompVis/stable-diffusion-v1-4', '--instance_data_dir=training', '--class_data_dir=classes', '--output_dir=output_models', '--with_prior_preservation', '--prior_loss_weight=1.0', '--instance_prompt=a photo of sks dog', '--class_prompt=a photo of dog', '--resolution=512', '--train_batch_size=1', '--gradient_accumulation_steps=1', '--learning_rate=5e-6', '--lr_scheduler=constant', '--lr_warmup_steps=0', '--num_class_images=200', '--max_train_steps=800']' returned non-zero exit status 1.
(diffusers) user@DESKTOP-3PGA1RN:~/github/diffusers/examples/dreambooth$

I have 16 images of my subject in ./training and 200 images of the subject class in ./classes.

It seems to me that it may be trying to run on the CPU, but I don't know for sure. The dispatch info points that way:
NotImplementedError: No operator found for this attention: AttentionOpDispatch(dtype=torch.float32, device=device(type='cpu'), k=40, has_dropout=False, attn_bias_type=<class 'NoneType'>, kv_len=4096, q_len=4096)

If anyone has any insight into this error I would very much appreciate your wisdom.
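
The `device(type='cpu')` in the dispatch message is the key detail: xformers' `memory_efficient_attention` only ships CUDA kernels, so if the tensors live on the CPU there is no operator to select. A minimal check (a sketch, nothing DreamBooth-specific) is to confirm PyTorch can see the GPU under WSL at all:

```python
import torch

# a minimal sketch: under WSL, a missing or mismatched NVIDIA driver often
# leaves PyTorch CPU-only, which is exactly what the xformers error reports
print(torch.cuda.is_available())   # must be True for memory_efficient_attention
print(torch.version.cuda)          # CUDA version the installed wheel was built for
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))
```

If `torch.cuda.is_available()` prints `False`, the fix is on the driver/wheel side (WSL CUDA driver plus a matching `+cuXXX` PyTorch build), not in the training script.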

Returned non-zero exit status 1.

Describe the bug

(diffusers) zerocool@DESKTOP-IFR8E96:~/github/diffusers/examples/dreambooth$ ./my_training.sh
The following values were not passed to `accelerate launch` and had defaults used instead:
        `--num_cpu_threads_per_process` was set to `12` to improve out-of-box performance
To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
Traceback (most recent call last):
  File "/home/zerocool/github/diffusers/examples/dreambooth/train_dreambooth.py", line 637, in <module>
    main()
  File "/home/zerocool/github/diffusers/examples/dreambooth/train_dreambooth.py", line 481, in main
    train_dataloader = torch.utils.data.DataLoader(
  File "/home/zerocool/anaconda3/envs/diffusers/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 353, in __init__
    sampler = RandomSampler(dataset, generator=generator)  # type: ignore[arg-type]
  File "/home/zerocool/anaconda3/envs/diffusers/lib/python3.9/site-packages/torch/utils/data/sampler.py", line 107, in __init__
    raise ValueError("num_samples should be a positive integer "
ValueError: num_samples should be a positive integer value, but got num_samples=0
Traceback (most recent call last):
  File "/home/zerocool/anaconda3/envs/diffusers/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/home/zerocool/anaconda3/envs/diffusers/lib/python3.9/site-packages/accelerate/commands/accelerate_cli.py", line 43, in main
    args.func(args)
  File "/home/zerocool/anaconda3/envs/diffusers/lib/python3.9/site-packages/accelerate/commands/launch.py", line 837, in launch_command
    simple_launcher(args)
  File "/home/zerocool/anaconda3/envs/diffusers/lib/python3.9/site-packages/accelerate/commands/launch.py", line 354, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/home/zerocool/anaconda3/envs/diffusers/bin/python', 'train_dreambooth.py', '--pretrained_model_name_or_path=CompVis/stable-diffusion-v1-4', '--instance_data_dir=training', '--output_dir=my_model', '--instance_prompt=beaninstance', '--resolution=512', '--train_batch_size=1', '--gradient_accumulation_steps=1', '--learning_rate=5e-6', '--lr_scheduler=constant', '--lr_warmup_steps=0', '--max_train_steps=1000', '--gradient_accumulation_steps=2', '--gradient_checkpointing']' returned non-zero exit status 1.
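
`num_samples=0` means the `DataLoader` received an empty dataset, i.e. the script found no images under `--instance_data_dir=training`. A quick way to see what the script sees (a minimal sketch; the relative path is the one from the failing command, resolved against the directory you launch from):

```python
import pathlib

# a minimal sketch: list the files train_dreambooth.py would find; an empty
# list (or a FileNotFoundError) means the relative path does not resolve to
# your image folder from the directory you launch the script in
instance_dir = pathlib.Path("training")      # --instance_data_dir from the command
print(instance_dir.resolve())
print(sorted(p.name for p in instance_dir.iterdir()))
```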

Reproduction

No response

Logs

No response

System Info

  • diffusers version: 0.4.0.dev0
  • Platform: Linux-4.4.0-19041-Microsoft-x86_64-with-glibc2.31
  • Python version: 3.9.13
  • PyTorch version (GPU?): 1.12.1+cu116 (False)
  • Huggingface_hub version: 0.10.0
  • Transformers version: 4.22.2
  • Using GPU in script?:
  • Using distributed or parallel set-up in script?:

RuntimeError: expected scalar type Half but found Float

Describe the bug

The code throws this error:

RuntimeError: expected scalar type Half but found Float

I tried different environments and toggling mixed_precision on and off in the accelerate config, but it still throws this error.

Could you help me please?
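
For context, the error is easy to reproduce in isolation: an fp16 linear layer fed an fp32 tensor raises exactly this. Under DeepSpeed fp16, the model weights are halved but tensors built outside the engine stay fp32, so the usual fix is casting those inputs to the model's dtype. A minimal reproduction (a sketch, independent of the training script, assuming a CUDA GPU):

```python
import torch

# a minimal sketch reproducing "expected scalar type Half but found Float":
# an fp16 layer receives a default-fp32 input
layer = torch.nn.Linear(4, 4).half().cuda()
x = torch.randn(1, 4, device="cuda")         # float32 by default
try:
    layer(x)
except RuntimeError as e:
    print(e)                                 # expected scalar type Half but found Float
print(layer(x.half()))                       # casting the input to the weight dtype resolves it
```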

Reproduction

No response

Logs

Time to load utils op: 0.0018088817596435547 seconds
Steps:   0%|                                                                                                                                                             | 0/800 [00:00<?, ?it/s]Traceback (most recent call last):
  File "train_dreambooth.py", line 646, in <module>
    main()
  File "train_dreambooth.py", line 591, in main
    noise_pred = unet(noisy_latents, timesteps, encoder_hidden_states).sample
  File "/home/lm/miniconda3/envs/dreambooth/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/lm/miniconda3/envs/dreambooth/lib/python3.7/site-packages/deepspeed/utils/nvtx.py", line 11, in wrapped_fn
    return func(*args, **kwargs)
  File "/home/lm/miniconda3/envs/dreambooth/lib/python3.7/site-packages/deepspeed/runtime/engine.py", line 1666, in forward
    loss = self.module(*inputs, **kwargs)
  File "/home/lm/miniconda3/envs/dreambooth/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/lm/miniconda3/envs/dreambooth/lib/python3.7/site-packages/diffusers/models/unet_2d_condition.py", line 299, in forward
    encoder_hidden_states=encoder_hidden_states,
  File "/home/lm/miniconda3/envs/dreambooth/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/lm/miniconda3/envs/dreambooth/lib/python3.7/site-packages/diffusers/models/unet_blocks.py", line 559, in forward
    create_custom_forward(attn), hidden_states, encoder_hidden_states
  File "/home/lm/miniconda3/envs/dreambooth/lib/python3.7/site-packages/torch/utils/checkpoint.py", line 235, in checkpoint
    return CheckpointFunction.apply(function, preserve, *args)
  File "/home/lm/miniconda3/envs/dreambooth/lib/python3.7/site-packages/torch/utils/checkpoint.py", line 96, in forward
    outputs = run_function(*args)
  File "/home/lm/miniconda3/envs/dreambooth/lib/python3.7/site-packages/diffusers/models/unet_blocks.py", line 553, in custom_forward
    return module(*inputs)
  File "/home/lm/miniconda3/envs/dreambooth/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/lm/miniconda3/envs/dreambooth/lib/python3.7/site-packages/diffusers/models/attention.py", line 169, in forward
    hidden_states = block(hidden_states, context=context)
  File "/home/lm/miniconda3/envs/dreambooth/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/lm/miniconda3/envs/dreambooth/lib/python3.7/site-packages/diffusers/models/attention.py", line 218, in forward
    hidden_states = self.attn1(self.norm1(hidden_states)) + hidden_states
  File "/home/lm/miniconda3/envs/dreambooth/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/lm/miniconda3/envs/dreambooth/lib/python3.7/site-packages/diffusers/models/attention.py", line 297, in forward
    return self.to_out(hidden_states)
  File "/home/lm/miniconda3/envs/dreambooth/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/lm/miniconda3/envs/dreambooth/lib/python3.7/site-packages/torch/nn/modules/container.py", line 139, in forward
    input = module(input)
  File "/home/lm/miniconda3/envs/dreambooth/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/lm/miniconda3/envs/dreambooth/lib/python3.7/site-packages/torch/nn/modules/linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: expected scalar type Half but found Float

System Info

  • diffusers version: 0.6.0.dev0
  • Platform: Linux-5.15.0-50-generic-x86_64-with-debian-bullseye-sid
  • Python version: 3.7.13
  • PyTorch version (GPU?): 1.12.1 (True)
  • Huggingface_hub version: 0.10.1
  • Transformers version: 4.23.1
  • Using GPU in script?: Yes
  • Using distributed or parallel set-up in script?: No

Repository Not Found for url: https://huggingface.co/CompVis/stable-diffusion-v1-4/resolve/main/tokenizer/vocab.json.

Describe the bug

I followed all the steps from this video:

https://www.youtube.com/watch?v=w6PTviOCYQY

(diffusers) zerocool@DESKTOP-IFR8E96:~/github/diffusers/examples/dreambooth$ ./my_training.sh
The following values were not passed to `accelerate launch` and had defaults used instead:
        `--num_cpu_threads_per_process` was set to `12` to improve out-of-box performance
To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
Traceback (most recent call last):
  File "/home/zerocool/anaconda3/envs/diffusers/lib/python3.9/site-packages/huggingface_hub/utils/_errors.py", line 213, in hf_raise_for_status
    response.raise_for_status()
  File "/home/zerocool/anaconda3/envs/diffusers/lib/python3.9/site-packages/requests/models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: https://huggingface.co/CompVis/stable-diffusion-v1-4/resolve/main/tokenizer/vocab.json

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/zerocool/anaconda3/envs/diffusers/lib/python3.9/site-packages/transformers/utils/hub.py", line 408, in cached_file
    resolved_file = hf_hub_download(
  File "/home/zerocool/anaconda3/envs/diffusers/lib/python3.9/site-packages/huggingface_hub/file_download.py", line 1053, in hf_hub_download
    metadata = get_hf_file_metadata(
  File "/home/zerocool/anaconda3/envs/diffusers/lib/python3.9/site-packages/huggingface_hub/file_download.py", line 1359, in get_hf_file_metadata
    hf_raise_for_status(r)
  File "/home/zerocool/anaconda3/envs/diffusers/lib/python3.9/site-packages/huggingface_hub/utils/_errors.py", line 242, in hf_raise_for_status
    raise RepositoryNotFoundError(message, response) from e
huggingface_hub.utils._errors.RepositoryNotFoundError: 401 Client Error. (Request ID: baPvU1nBim9S_79aRkpng)

Repository Not Found for url: https://huggingface.co/CompVis/stable-diffusion-v1-4/resolve/main/tokenizer/vocab.json.
Please make sure you specified the correct `repo_id` and `repo_type`.
If the repo is private, make sure you are authenticated.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/zerocool/github/diffusers/examples/dreambooth/train_dreambooth.py", line 657, in <module>
    main()
  File "/home/zerocool/github/diffusers/examples/dreambooth/train_dreambooth.py", line 420, in main
    tokenizer = CLIPTokenizer.from_pretrained(
  File "/home/zerocool/anaconda3/envs/diffusers/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 1734, in from_pretrained
    resolved_vocab_files[file_id] = cached_file(
  File "/home/zerocool/anaconda3/envs/diffusers/lib/python3.9/site-packages/transformers/utils/hub.py", line 423, in cached_file
    raise EnvironmentError(
OSError: CompVis/stable-diffusion-v1-4 is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models'
If this is a private repository, make sure to pass a token having permission to this repo with `use_auth_token` or log in with `huggingface-cli login` and pass `use_auth_token=True`.
Traceback (most recent call last):
  File "/home/zerocool/anaconda3/envs/diffusers/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/home/zerocool/anaconda3/envs/diffusers/lib/python3.9/site-packages/accelerate/commands/accelerate_cli.py", line 43, in main
    args.func(args)
  File "/home/zerocool/anaconda3/envs/diffusers/lib/python3.9/site-packages/accelerate/commands/launch.py", line 837, in launch_command
    simple_launcher(args)
  File "/home/zerocool/anaconda3/envs/diffusers/lib/python3.9/site-packages/accelerate/commands/launch.py", line 354, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/home/zerocool/anaconda3/envs/diffusers/bin/python', 'train_dreambooth.py', '--pretrained_model_name_or_path=CompVis/stable-diffusion-v1-4', '--instance_data_dir=training', '--output_dir=my_model', '--instance_prompt=beaninstance', '--resolution=512', '--train_batch_size=1', '--gradient_accumulation_steps=1', '--learning_rate=5e-6', '--lr_scheduler=constant', '--lr_warmup_steps=0', '--max_train_steps=1000']' returned non-zero exit status 1.
./my_training.sh: line 17: --gradient_accumulation_steps=2: command not found
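
Two separate things appear to be going wrong here. The 401 means this machine has no Hugging Face token, so the (license-gated, at the time) CompVis/stable-diffusion-v1-4 repo looks nonexistent to the Hub. And the final `--gradient_accumulation_steps=2: command not found` means `my_training.sh` is missing a trailing backslash on the preceding line, so line 17 is executed as its own shell command (note the flag is also absent from the launched command above). A minimal login sketch, assuming a huggingface_hub version recent enough to expose `login()`:

```python
# a minimal sketch: authenticate once so from_pretrained can resolve the repo;
# assumes huggingface_hub exposes login() (otherwise run `huggingface-cli login`
# in the shell, or use notebook_login() in Colab)
from huggingface_hub import login

login(token="hf_xxx")   # hypothetical placeholder -- paste a real access token
```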

Reproduction

No response

Logs

No response

System Info

  • diffusers version: 0.4.0.dev0
  • Platform: Linux-4.4.0-19041-Microsoft-x86_64-with-glibc2.31
  • Python version: 3.9.13
  • PyTorch version (GPU?): 1.12.1+cu116 (False)
  • Huggingface_hub version: 0.10.0
  • Transformers version: 4.22.2
  • Using GPU in script?:
  • Using distributed or parallel set-up in script?:

Not mounting with Drive correctly

Describe the bug

Not sure why this is happening. It occurs when I run the conversion step; looking at my Drive, it seems the weights were never saved there.

Reproduction

Just run the Colab, I guess.

Logs

Traceback (most recent call last):
  File "convert_diffusers_to_original_stable_diffusion.py", line 215, in <module>
    unet_state_dict = torch.load(unet_path, map_location="cpu")
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 699, in load
    with _open_file_like(f, 'rb') as opened_file:
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 230, in _open_file_like
    return _open_file(name_or_buffer, mode)
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 211, in __init__
    super(_open_file, self).__init__(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: '/content/drive/MyDrive/stable_diffusion_weights/sks/unet/diffusion_pytorch_model.bin'
[*] Converted ckpt saved at /content/drive/MyDrive/stable_diffusion_weights/sks/model.ckpt

Just now I realized that earlier on there was this:
/usr/local/lib/python3.7/dist-packages/bitsandbytes/cuda_setup/paths.py:99: UserWarning: /usr/lib64-nvidia did not contain libcudart.so as expected! Searching further paths...
  f'{candidate_env_vars["LD_LIBRARY_PATH"]} did not contain '
/usr/local/lib/python3.7/dist-packages/bitsandbytes/cuda_setup/paths.py:21: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('["--ip=172.28.0.2"],"debugAdapterMultiplexerPath"'), PosixPath('true}'), PosixPath('{"kernelManagerProxyPort"'), PosixPath('6000,"kernelManagerProxyHost"'), PosixPath('"172.28.0.3","jupyterArgs"'), PosixPath('"/usr/local/bin/dap_multiplexer","enableLsp"')}
  "WARNING: The following directories listed in your path were found to "
/usr/local/lib/python3.7/dist-packages/bitsandbytes/cuda_setup/paths.py:21: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('//ipykernel.pylab.backend_inline'), PosixPath('module')}
  "WARNING: The following directories listed in your path were found to "
/usr/local/lib/python3.7/dist-packages/bitsandbytes/cuda_setup/paths.py:21: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/env/python')}
  "WARNING: The following directories listed in your path were found to "
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64...
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 7.5
CUDA SETUP: Detected CUDA version 112
CUDA SETUP: Loading binary /usr/local/lib/python3.7/dist-packages/bitsandbytes/libbitsandbytes_cuda112.so...
Caching latents:  15% 3/20 [00:02<00:12,  1.35it/s]
Traceback (most recent call last):
  File "train_dreambooth.py", line 637, in <module>
    main()
  File "train_dreambooth.py", line 492, in main
    for batch in tqdm(train_dataloader, desc="Caching latents"):
  File "/usr/local/lib/python3.7/dist-packages/tqdm/std.py", line 1195, in __iter__
    for obj in iterable:
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py", line 681, in __next__
    data = self._next_data()
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py", line 721, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "train_dreambooth.py", line 264, in __getitem__
    example["instance_images"] = self.image_transforms(instance_image)
  File "/usr/local/lib/python3.7/dist-packages/torchvision/transforms/transforms.py", line 94, in __call__
    img = t(img)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/torchvision/transforms/transforms.py", line 349, in forward
    return F.resize(img, self.size, self.interpolation, self.max_size, self.antialias)
  File "/usr/local/lib/python3.7/dist-packages/torchvision/transforms/functional.py", line 430, in resize
    return F_pil.resize(img, size=size, interpolation=pil_interpolation, max_size=max_size)
  File "/usr/local/lib/python3.7/dist-packages/torchvision/transforms/functional_pil.py", line 275, in resize
    return img.resize((new_w, new_h), interpolation)
  File "/usr/local/lib/python3.7/dist-packages/PIL/Image.py", line 1886, in resize
    self.load()
  File "/usr/local/lib/python3.7/dist-packages/PIL/ImageFile.py", line 247, in load
    "(%d bytes not processed)" % len(b)
OSError: image file is truncated (76 bytes not processed)
Traceback (most recent call last):
  File "/usr/local/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/accelerate_cli.py", line 43, in main
    args.func(args)
  File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/launch.py", line 837, in launch_command
    simple_launcher(args)
  File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/launch.py", line 354, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', 'train_dreambooth.py', '--pretrained_model_name_or_path=CompVis/stable-diffusion-v1-4', '--instance_data_dir=/content/data/sks', '--class_data_dir=/content/data/guy', '--output_dir=/content/drive/MyDrive/stable_diffusion_weights/sks', '--with_prior_preservation', '--prior_loss_weight=1.0', '--instance_prompt=photo of sks guy', '--class_prompt=photo of a guy', '--seed=1337', '--resolution=512', '--train_batch_size=1', '--mixed_precision=fp16', '--use_8bit_adam', '--gradient_accumulation_steps=1', '--learning_rate=5e-6', '--lr_scheduler=constant', '--lr_warmup_steps=0', '--num_class_images=20', '--sample_batch_size=4', '--max_train_steps=900']' returned non-zero exit status 1
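
The root cause is in the middle of that log: `OSError: image file is truncated` aborts latent caching, so training never writes `unet/diffusion_pytorch_model.bin`, and the later conversion step then fails with the `FileNotFoundError` shown at the top. A minimal sketch for finding (or tolerating) the broken file; the folder paths are the ones from the failing command:

```python
import pathlib
from PIL import Image, ImageFile

# a minimal sketch: scan the training folders for files Pillow cannot fully
# decode; re-export or delete whatever this prints, then rerun training
for folder in ("/content/data/sks", "/content/data/guy"):
    for p in pathlib.Path(folder).iterdir():
        try:
            Image.open(p).load()
        except OSError as e:
            print(p, e)

# alternatively, tell Pillow to tolerate truncated files (lossy workaround):
# ImageFile.LOAD_TRUNCATED_IMAGES = True
```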

System Info

Google Colab

Returned non-zero exit status 1 (already using WSL 2). Can someone help, please?

Describe the bug

(diffusers) zerocool@DESKTOP-IFR124:~/github/diffusers/examples/dreambooth$ ./my_training.sh
The following values were not passed to `accelerate launch` and had defaults used instead:
        `--num_cpu_threads_per_process` was set to `12` to improve out-of-box performance
To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
WARNING:root:Blocksparse is not available: the current GPU does not expose Tensor cores

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
For effortless bug reporting copy-paste your error into this form: https://docs.google.com/forms/d/e/1FAIpQLScPB8emS3Thkp66nvqwmjTEgxp8Y9ufuWTzFyr9kJ5AoI47dQ/viewform?usp=sf_link
================================================================================
/home/zerocool/anaconda3/envs/diffusers/lib/python3.9/site-packages/bitsandbytes/cuda_setup/paths.py:86: UserWarning: /home/zerocool/anaconda3/envs/diffusers did not contain libcudart.so as expected! Searching further paths...
  warn(
/home/zerocool/anaconda3/envs/diffusers/lib/python3.9/site-packages/bitsandbytes/cuda_setup/paths.py:98: UserWarning: /usr/lib/wsl/lib: did not contain libcudart.so as expected! Searching further paths...
  warn(
/home/zerocool/anaconda3/envs/diffusers/lib/python3.9/site-packages/bitsandbytes/cuda_setup/paths.py:20: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('CompVis/stable-diffusion-v1-4')}
  warn(
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64...
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 6.1
CUDA SETUP: Detected CUDA version 117
CUDA SETUP: Loading binary /home/zerocool/anaconda3/envs/diffusers/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cuda117_nocublaslt.so...
Caching latents: 100%|โ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆ| 27/27 [00:03<00:00,  7.42it/s]
Traceback (most recent call last):
  File "/home/zerocool/github/diffusers/examples/dreambooth/train_dreambooth.py", line 637, in <module>
    main()
  File "/home/zerocool/github/diffusers/examples/dreambooth/train_dreambooth.py", line 533, in main
    accelerator.init_trackers("dreambooth", config=vars(args))
  File "/home/zerocool/anaconda3/envs/diffusers/lib/python3.9/site-packages/accelerate/accelerator.py", line 1061, in init_trackers
    tracker_init(project_name, self.logging_dir, **init_kwargs.get(str(tracker), {}))
  File "/home/zerocool/anaconda3/envs/diffusers/lib/python3.9/site-packages/accelerate/tracking.py", line 133, in __init__
    self.writer = tensorboard.SummaryWriter(self.logging_dir, **kwargs)
  File "/home/zerocool/anaconda3/envs/diffusers/lib/python3.9/site-packages/torch/utils/tensorboard/writer.py", line 246, in __init__
    self._get_file_writer()
  File "/home/zerocool/anaconda3/envs/diffusers/lib/python3.9/site-packages/torch/utils/tensorboard/writer.py", line 276, in _get_file_writer
    self.file_writer = FileWriter(
  File "/home/zerocool/anaconda3/envs/diffusers/lib/python3.9/site-packages/torch/utils/tensorboard/writer.py", line 75, in __init__
    self.event_writer = EventFileWriter(
  File "/home/zerocool/anaconda3/envs/diffusers/lib/python3.9/site-packages/tensorboard/summary/writer/event_file_writer.py", line 72, in __init__
    tf.io.gfile.makedirs(logdir)
  File "/home/zerocool/anaconda3/envs/diffusers/lib/python3.9/site-packages/tensorboard/compat/tensorflow_stub/io/gfile.py", line 900, in makedirs
    return get_filesystem(path).makedirs(path)
  File "/home/zerocool/anaconda3/envs/diffusers/lib/python3.9/site-packages/tensorboard/compat/tensorflow_stub/io/gfile.py", line 201, in makedirs
    os.makedirs(path, exist_ok=True)
  File "/home/zerocool/anaconda3/envs/diffusers/lib/python3.9/os.py", line 215, in makedirs
    makedirs(head, exist_ok=exist_ok)
  File "/home/zerocool/anaconda3/envs/diffusers/lib/python3.9/os.py", line 225, in makedirs
    mkdir(name, mode)
PermissionError: [Errno 13] Permission denied: 'my_model/logs'
Traceback (most recent call last):
  File "/home/zerocool/anaconda3/envs/diffusers/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/home/zerocool/anaconda3/envs/diffusers/lib/python3.9/site-packages/accelerate/commands/accelerate_cli.py", line 43, in main
    args.func(args)
  File "/home/zerocool/anaconda3/envs/diffusers/lib/python3.9/site-packages/accelerate/commands/launch.py", line 837, in launch_command
    simple_launcher(args)
  File "/home/zerocool/anaconda3/envs/diffusers/lib/python3.9/site-packages/accelerate/commands/launch.py", line 354, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/home/zerocool/anaconda3/envs/diffusers/bin/python', 'train_dreambooth.py', '--pretrained_model_name_or_path=CompVis/stable-diffusion-v1-4', '--instance_data_dir=training', '--output_dir=my_model', '--instance_prompt=beaninstance', '--class_prompt=guy', '--resolution=512', '--train_batch_size=1', '--learning_rate=5e-6', '--lr_scheduler=constant', '--lr_warmup_steps=0', '--gradient_accumulation_steps=2', '--gradient_checkpointing', '--use_8bit_adam', '--max_train_steps=1000']' returned non-zero exit status 1.
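
`PermissionError: [Errno 13] Permission denied: 'my_model/logs'` is an ordinary filesystem problem: the launching user cannot create directories under `my_model` (typically because it was created with sudo, or from the Windows side of WSL). A minimal pre-flight check (a sketch; `my_model` is the `--output_dir` from the launch command):

```python
import os

# a minimal sketch: verify the output directory is writable by the user who
# runs accelerate; if this raises PermissionError, fix ownership of my_model
output_dir = "my_model"                      # --output_dir from the launch command
os.makedirs(os.path.join(output_dir, "logs"), exist_ok=True)
print(os.access(output_dir, os.W_OK))        # False -> chown/chmod the directory
```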

Reproduction

No response

Logs

No response

System Info

  • diffusers version: 0.4.0.dev0
  • Platform: Linux-5.10.102.1-microsoft-standard-WSL2-x86_64-with-glibc2.31
  • Python version: 3.9.13
  • PyTorch version (GPU?): 1.12.1+cu116 (True)
  • Huggingface_hub version: 0.10.0
  • Transformers version: 4.22.2
  • Using GPU in script?:
  • Using distributed or parallel set-up in script?:
