Hello, interesting work! I want to test on SD 1.4 with the default but the

ASPL script fails training on SD 1.4. (anti-dreambooth, CLOSED)

vinairesearch commented on June 1, 2024

ASPL script fails training on SD 1.4.

from anti-dreambooth.

Comments (6)

whulizheng commented on June 1, 2024 1

Thank you for your interest in our project! We'll try to reproduce your issue of NaN loss for ASPL with Stable Diffusion 1.4 later today. In the meantime, could you please provide some more details that would help us investigate:

Did you modify the default script at all, or are you using it as-is?

Which Python library versions are you using for PyTorch, Transformers, and Diffusers?

Does this happen right away in the first epoch, or after training for some amount of time?

Hi, thanks a lot. I only modified the model path in the script, and PyTorch, Transformers, and Diffusers are all the same versions as in requirements.txt. Only the first epoch outputs a normal loss; after that, every loss becomes "nan", like this:

Step #0, loss: 0.23414941132068634, prior_loss: 0.2278386950492859, instance_loss: 0.006310714408755302
Step #1, loss: nan, prior_loss: nan, instance_loss: nan
Step #2, loss: nan, prior_loss: nan, instance_loss: nan
PGD loss - step 0, loss: nan
PGD loss - step 1, loss: nan
PGD loss - step 2, loss: nan
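
For longer runs, it can help to locate the first step where the loss stops being finite rather than scanning the log by eye. The sketch below is a minimal example that assumes the log lines follow the "Step #N, loss: X, ..." format shown above; `first_non_finite_step` is a hypothetical helper, not part of the anti-dreambooth scripts.

```python
import math
import re

# Matches lines such as "Step #1, loss: nan, prior_loss: nan, instance_loss: nan".
# The format is assumed from the log excerpt in this thread.
STEP_RE = re.compile(r"Step #(\d+), loss: (\S+?),")


def first_non_finite_step(log_text):
    """Return the step number of the first NaN/inf loss, or None if all are finite."""
    for match in STEP_RE.finditer(log_text):
        step = int(match.group(1))
        loss = float(match.group(2))  # float() accepts "nan" and "inf"
        if not math.isfinite(loss):
            return step
    return None
```

Note that the step counter restarts in each training round, so the returned number is relative to the round in which the loss first breaks.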

thuanz123 commented on June 1, 2024 1

Yeah, xformers is causing NaN loss in other repos as well. Try a different version of xformers to see if that fixes it.

Luvata commented on June 1, 2024

Thank you for your interest in our project! We'll try to reproduce your issue of NaN loss for ASPL with Stable Diffusion 1.4 later today.
In the meantime, could you please provide some more details that would help us investigate:

Did you modify the default script at all, or are you using it as-is?
Which Python library versions are you using for PyTorch, Transformers, and Diffusers?
Does this happen right away in the first epoch, or after training for some amount of time?
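
One way to gather the requested version information is to query the installed package metadata. This is a minimal sketch using only the standard library; the package names in `PACKAGES` are assumptions based on the libraries mentioned in this thread, and `collect_versions` is a hypothetical helper.

```python
from importlib import metadata

# Assumed distribution names for the libraries discussed in this thread.
PACKAGES = ("torch", "transformers", "diffusers", "xformers", "accelerate")


def collect_versions(packages=PACKAGES):
    """Return {package: version string, or None if the package is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in collect_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

Pasting this output alongside the GPU model and driver version makes it easier to compare environments.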

Luvata commented on June 1, 2024

Sorry, I can't reproduce your issue. I've tested bf16, fp16, and no by changing mixed_precision in attack_with_aspl.sh.

Below is the expected output in the terminal:

bash scripts/attack_with_aspl.sh
The following values were not passed to `accelerate launch` and had defaults used instead:
        `--num_processes` was set to a value of `1`
        `--num_machines` was set to a value of `1`
        `--mixed_precision` was set to a value of `'no'`
        `--dynamo_backend` was set to a value of `'no'`
To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
11/09/2023 17:39:43 - INFO - __main__ - Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda

Mixed precision type: no

You are using a model of type clip_text_model to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
{'timestep_post_act', 'resnet_time_scale_shift', 'mid_block_type', 'time_embedding_act_fn', 'addition_time_embed_dim', 'addition_embed_type', 'class_embed_type', 'use_linear_projection', 'projection_class_embeddings_input_dim', 'transformer_layers_per_block', 'dual_cross_attention', 'resnet_skip_time_act', 'addition_embed_type_num_heads', 'cross_attention_norm', 'time_embedding_dim', 'encoder_hid_dim_type', 'mid_block_only_cross_attention', 'time_cond_proj_dim', 'attention_type', 'num_class_embeds', 'resnet_out_scale_factor', 'encoder_hid_dim', 'num_attention_heads', 'time_embedding_type', 'conv_out_kernel', 'upcast_attention', 'only_cross_attention', 'conv_in_kernel', 'class_embeddings_concat'} was not found in config. Values will be initialized to default values.
{'clip_sample_range', 'thresholding', 'prediction_type', 'timestep_spacing', 'sample_max_value', 'dynamic_thresholding_ratio', 'variance_type'} was not found in config. Values will be initialized to default values.
{'scaling_factor', 'force_upcast', 'norm_num_groups'} was not found in config. Values will be initialized to default values.
Step #0, loss: 0.752631425857544, prior_loss: 0.7393704652786255, instance_loss: 0.013260948471724987
Step #1, loss: 0.18438610434532166, prior_loss: 0.1214214637875557, instance_loss: 0.06296463310718536
Step #2, loss: 0.23523551225662231, prior_loss: 0.13061320781707764, instance_loss: 0.10462230443954468
PGD loss - step 0, loss: 0.06669247150421143
PGD loss - step 1, loss: 0.23700952529907227
PGD loss - step 2, loss: 0.17454129457473755
PGD loss - step 3, loss: 0.30680063366889954
PGD loss - step 4, loss: 0.2727632522583008
PGD loss - step 5, loss: 0.3792399764060974
Step #0, loss: 0.5648417472839355, prior_loss: 0.5227689146995544, instance_loss: 0.042072828859090805
Step #1, loss: 0.244808629155159, prior_loss: 0.2364426851272583, instance_loss: 0.008365947753190994
Step #2, loss: 0.31962481141090393, prior_loss: 0.0035879784263670444, instance_loss: 0.31603682041168213

Luvata commented on June 1, 2024

Since I can't reproduce it, could you please double-check by re-running the default script attack_with_aspl.sh (changing the Stable Diffusion path to your own SD path) and let me know your hardware specs?
That will really help me understand the difference between our environments.

whulizheng commented on June 1, 2024

Hi, after double checking and re-running, it still happens.

However, when I disabled the argument "--enable_xformers_memory_efficient_attention", it went back to normal, like this:

11/09/2023 11:12:49 - INFO - __main__ - Distributed environment: NO

Num processes: 1
Process index: 0
Local process index: 0
Device: cuda
Mixed precision type: no

You are using a model of type clip_text_model to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
{'class_embed_type', 'resnet_time_scale_shift', 'projection_class_embeddings_input_dim', 'upcast_attention', 'dual_cross_attention', 'conv_out_kernel', 'use_linear_projection', 'timestep_post_act', 'only_cross_attention', 'num_class_embeds', 'mid_block_type', 'time_cond_proj_dim', 'time_embedding_type', 'conv_in_kernel'} was not found in config. Values will be initialized to default values.
{'variance_type', 'clip_sample_range', 'prediction_type'} was not found in config. Values will be initialized to default values.
{'norm_num_groups'} was not found in config. Values will be initialized to default values.
Step #0, loss: 0.11835033446550369, prior_loss: 0.06459389626979828, instance_loss: 0.053756438195705414
Step #1, loss: 0.47151514887809753, prior_loss: 0.0061028883792459965, instance_loss: 0.4654122591018677
Step #2, loss: 0.3747083842754364, prior_loss: 0.309255450963974, instance_loss: 0.0654529333114624

My xformers was installed via "pip install -r requirements.txt", and my GPU is an NVIDIA RTX A6000 with driver version 535.129.03. I guess it's a problem with xformers and my GPU drivers, but it is still weird that SD 2.1 works with xformers while SD 1.4 fails with the same environment and config.
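
For anyone who wants the workaround as a switch in their own training code, the sketch below shows one way to make xformers attention opt-in with a fallback to the default attention. It assumes the model exposes `enable_xformers_memory_efficient_attention()`, as diffusers models do; `configure_attention` is a hypothetical helper and is not taken from the anti-dreambooth scripts. It only toggles the attention backend and does not explain why SD 1.4 produces NaN with xformers.

```python
def configure_attention(model, use_xformers):
    """Enable xformers attention only when requested; return True if it is active.

    `model` is assumed to provide `enable_xformers_memory_efficient_attention()`
    (diffusers models do). This helper is illustrative, not part of any library.
    """
    if not use_xformers:
        return False
    try:
        model.enable_xformers_memory_efficient_attention()
    except Exception as exc:  # e.g. xformers missing or incompatible with the GPU
        print(f"xformers attention unavailable, using default attention: {exc}")
        return False
    return True
```

Calling `configure_attention(unet, use_xformers=False)` leaves the model on its default attention, which matches the configuration that produced finite losses above.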
