Comments (10)
Which line of the training script does the error occur on?
from diffusers.
It seems to be on line 1473:
# Predict the noise residual
model_pred = transformer(
hidden_states=noisy_model_input,
timestep=timesteps,
encoder_hidden_states=prompt_embeds,
pooled_projections=pooled_prompt_embeds,
return_dict=False,
)[0]
Try train_batch_size=1? I remember the SD1.x/SD2.0 Colab DreamBooth tutorial suggests setting train_batch_size to 1 when prior_preservation=True.
Update: this should solve the problem. I've reproduced the same exception. Feel free to share the prior_preservation effect with me; my results with prior preservation are worse than without it, which is strange.
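For context on why prior preservation interacts with batch size: DreamBooth-style scripts concatenate instance and class examples along the batch dimension, then split the prediction back in two to weight the prior loss. A rough sketch of that standard pattern with toy tensors (not the actual SD3 script):

```python
import torch
import torch.nn.functional as F

# Toy stand-ins for the model output and training target; in the real
# script these come from the transformer and the noise/flow target.
prior_loss_weight = 1.0
model_pred = torch.randn(4, 8)  # instance + class halves concatenated
target = torch.randn(4, 8)

# Split the concatenated batch back into instance and prior (class) halves.
model_pred, model_pred_prior = torch.chunk(model_pred, 2, dim=0)
target, target_prior = torch.chunk(target, 2, dim=0)

instance_loss = F.mse_loss(model_pred.float(), target.float(), reduction="mean")
prior_loss = F.mse_loss(model_pred_prior.float(), target_prior.float(), reduction="mean")
loss = instance_loss + prior_loss_weight * prior_loss
print(loss.item() >= 0.0)  # → True
```

Because the class batch is concatenated onto the instance batch, the effective batch the model sees is doubled, which is one reason batch-size-dependent failures tend to surface only with prior preservation enabled.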
Hi, I tried 1, 2, and 4; none of them worked. But I will try again...
Thanks for reporting. Will investigate :)
On both scripts, train_dreambooth_lora_sd3.py and train_dreambooth_sd3.py, I got the same error for any batch size other than 1.
I also never managed to run train_dreambooth_sd3.py; it fails with:
File "/root/train_dreambooth_sd3.py", line 1782, in <module>
main(args)
File "/root/train_dreambooth_sd3.py", line 1570, in main
model_pred = transformer(
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/accelerate/utils/operations.py", line 822, in forward
return model_forward(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/accelerate/utils/operations.py", line 810, in call
return convert_to_fp32(self.model_forward(*args, **kwargs))
File "/opt/conda/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 14, in decorate_autocast
return func(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/diffusers-0.30.0.dev0-py3.10.egg/diffusers/models/transformers/transformer_sd3.py", line 309, in forward
hidden_states = torch.utils.checkpoint.checkpoint(
File "/opt/conda/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 251, in checkpoint
return _checkpoint_without_reentrant(
File "/opt/conda/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 432, in _checkpoint_without_reentrant
output = function(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/diffusers-0.30.0.dev0-py3.10.egg/diffusers/models/transformers/transformer_sd3.py", line 304, in custom_forward
return module(*inputs)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/diffusers-0.30.0.dev0-py3.10.egg/diffusers/models/attention.py", line 162, in forward
norm_hidden_states, gate_msa, shift_mlp, scale_mlp, gate_mlp = self.norm1(hidden_states, emb=temb)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/diffusers-0.30.0.dev0-py3.10.egg/diffusers/models/normalization.py", line 83, in forward
x = self.norm(x) * (1 + scale_msa[:, None]) + shift_msa[:, None]
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/normalization.py", line 190, in forward
return F.layer_norm(
File "/opt/conda/lib/python3.10/site-packages/torch/nn/functional.py", line 2515, in layer_norm
return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
TypeError: layer_norm(): argument 'input' (position 1) must be Tensor, not tuple
Oh, and there is probably a bug on line 1582 of train_dreambooth_sd3.py:
prompt_embeds, pooled_prompt_embeds = encode_prompt(
text_encoders=[text_encoder_one, text_encoder_two, text_encoder_three],
tokenizers=None,
prompt=None,
text_input_ids_list=[tokens_one, tokens_two, tokens_three],
)
There is no 'text_input_ids_list' parameter in encode_prompt, so the script fails if --train_text_encoder is set.
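A generic way to confirm a mismatch like this before calling the function is to check its signature with the standard library. In this sketch, `encode_prompt` is a hypothetical stand-in with the older signature, not the real script's function:

```python
import inspect

def encode_prompt(text_encoders, tokenizers, prompt, max_sequence_length=77):
    """Hypothetical stand-in for the script's encode_prompt."""
    ...

# Check whether the callable accepts the keyword before passing it.
accepted = inspect.signature(encode_prompt).parameters
print("text_input_ids_list" in accepted)  # → False
```

If the check prints False, passing `text_input_ids_list=` will raise a TypeError at call time, which matches the failure described above.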
I will look into the problem of training failing with batch size > 1, but could you please open a new issue for it? That would be much appreciated.
With the latest version of the script, I am able to train with prior preservation, though:
export MODEL_NAME="stabilityai/stable-diffusion-3-medium-diffusers"
export INSTANCE_DIR="dog"
export CLASS_DIR="dog-class"
export OUTPUT_DIR="/raid/.cache/huggingface/trained-sd3"
accelerate launch train_dreambooth_sd3.py \
--pretrained_model_name_or_path=$MODEL_NAME \
--class_data_dir=$CLASS_DIR \
--instance_data_dir=$INSTANCE_DIR \
--output_dir=$OUTPUT_DIR \
--mixed_precision="fp16" \
--with_prior_preservation --prior_loss_weight=0.9 \
--instance_prompt="a photo of sks dog" \
--class_prompt="a photo of dog" \
--resolution=1024 \
--train_batch_size=1 \
--gradient_accumulation_steps=4 \
--learning_rate=1e-4 \
--report_to="wandb" \
--lr_scheduler="constant" \
--lr_warmup_steps=0 \
--max_train_steps=10 \
--validation_prompt="A photo of sks dog in a bucket" \
--validation_epochs=1 \
--seed="0"
It's a dummy run, so don't consider the results, but here's the wandb dashboard: https://wandb.ai/sayakpaul/dreambooth-sd3/runs/phkljaow
What am I missing?
I'm also unable to reproduce your error when training with a batch size > 1.
Command:
export MODEL_NAME="stabilityai/stable-diffusion-3-medium-diffusers"
export INSTANCE_DIR="dog"
export OUTPUT_DIR="/raid/.cache/huggingface/trained-sd3"
accelerate launch train_dreambooth_sd3.py \
--pretrained_model_name_or_path=$MODEL_NAME \
--instance_data_dir=$INSTANCE_DIR \
--output_dir=$OUTPUT_DIR \
--mixed_precision="fp16" \
--instance_prompt="a photo of sks dog" \
--resolution=1024 \
--train_batch_size=2 \
--gradient_accumulation_steps=4 \
--learning_rate=1e-4 \
--report_to="wandb" \
--lr_scheduler="constant" \
--lr_warmup_steps=0 \
--max_train_steps=10 \
--validation_prompt="A photo of sks dog in a bucket" \
--validation_epochs=1 \
--seed="0"
WandB: https://wandb.ai/sayakpaul/dreambooth-sd3/runs/uj8ms66k
Thanks for the update; I will try again later. Also, could it be related to "bf16" precision, maybe?
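If bf16 is the suspect, one quick, generic check (my own suggestion, not from the maintainers) is to confirm the hardware actually supports bfloat16 before choosing `--mixed_precision="bf16"`, since fp16 and bf16 can behave quite differently numerically:

```python
import torch

if torch.cuda.is_available():
    # Older GPUs (pre-Ampere) report False here; use fp16 instead.
    print("CUDA bf16 supported:", torch.cuda.is_bf16_supported())
else:
    # CPU autocast can still exercise the bf16 code path as a smoke test.
    with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
        y = torch.randn(4, 4) @ torch.randn(4, 4)
    print("autocast matmul dtype:", y.dtype)
```

If the GPU reports False, accelerate/PyTorch may silently fall back or error in ways that look unrelated to the training script itself.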