
Comments (6)

whulizheng commented on June 1, 2024

Thank you for your interest in our project! We'll try to reproduce your issue of NaN loss for ASPL with Stable Diffusion 1.4 later today. In the meantime, could you please provide some more details that would help us investigate:

  • Did you modify the default script at all, or are you using it as-is?
  • Which Python library versions are you using for PyTorch, Transformers, and Diffusers?
  • Does this happen right away in the first epoch, or after training for some amount of time?

Hi, thanks a lot. I only modified the model path in the script, and PyTorch, Transformers, and Diffusers are all at the versions listed in requirements.txt. Only the first epoch outputs a normal loss; after that, every loss becomes "nan", like this:

Step #0, loss: 0.23414941132068634, prior_loss: 0.2278386950492859, instance_loss: 0.006310714408755302
Step #1, loss: nan, prior_loss: nan, instance_loss: nan
Step #2, loss: nan, prior_loss: nan, instance_loss: nan
PGD loss - step 0, loss: nan
PGD loss - step 1, loss: nan
PGD loss - step 2, loss: nan
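To pin down exactly where a run like the one above goes wrong, the terminal output can be scanned for the first non-finite value. The following is a minimal sketch that assumes the log keeps the "name: value" layout shown in this thread; the function name `first_non_finite` and the sample lines are illustrative and not part of the anti-dreambooth scripts.

```python
import math
import re

# Matches "name: value" pairs such as "loss: 0.2341" or "prior_loss: nan".
# The nan/inf alternatives come first so that "-inf" is not cut short.
PAIR = re.compile(r"(\w+):\s*(nan|[-+]?inf|[-+0-9.eE]+)", re.IGNORECASE)


def first_non_finite(log_lines):
    """Return (index, line) for the first line reporting NaN or inf, else None."""
    for index, line in enumerate(log_lines):
        for _name, raw in PAIR.findall(line):
            try:
                value = float(raw)
            except ValueError:
                # Not a number after all (for example a stray "." or "e").
                continue
            if not math.isfinite(value):
                return index, line
    return None


# Sample lines copied from the log in this thread.
sample = [
    "Step #0, loss: 0.23414941132068634, prior_loss: 0.2278386950492859, instance_loss: 0.006310714408755302",
    "Step #1, loss: nan, prior_loss: nan, instance_loss: nan",
    "PGD loss - step 0, loss: nan",
]
print(first_non_finite(sample))
```

Knowing the first affected step helps separate "NaN from the very first backward pass" from "NaN after some training", which is the distinction the maintainers asked about.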

from anti-dreambooth.

thuanz123 commented on June 1, 2024

Yeah, xformers is causing NaN losses in other repos as well. Try a different version of xformers to see if that fixes it.
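Before swapping xformers versions, it can help to record which versions are actually installed so that two environments can be compared. This is a small sketch using only the standard library; the package list is an assumption based on the libraries named in this thread, and `installed_versions` is a hypothetical helper name.

```python
from importlib import metadata


def installed_versions(packages=("torch", "xformers", "diffusers", "transformers")):
    """Map each package name to its installed version, or None if it is absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in this environment.
            versions[name] = None
    return versions


for name, version in installed_versions().items():
    print(f"{name}: {version if version is not None else 'not installed'}")
```

Pasting this output into the issue alongside the GPU model and driver version gives maintainers a concrete environment to compare against.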


Luvata commented on June 1, 2024

Thank you for your interest in our project! We'll try to reproduce your issue of NaN loss for ASPL with Stable Diffusion 1.4 later today.
In the meantime, could you please provide some more details that would help us investigate:

  • Did you modify the default script at all, or are you using it as-is?
  • Which Python library versions are you using for PyTorch, Transformers, and Diffusers?
  • Does this happen right away in the first epoch, or after training for some amount of time?


Luvata commented on June 1, 2024

Sorry, I can't reproduce your issue. I've tested all three settings (bf16, fp16, and no) by changing mixed_precision in attack_with_aspl.sh.

Below is the expected output in the terminal:

bash scripts/attack_with_aspl.sh
The following values were not passed to `accelerate launch` and had defaults used instead:
        `--num_processes` was set to a value of `1`
        `--num_machines` was set to a value of `1`
        `--mixed_precision` was set to a value of `'no'`
        `--dynamo_backend` was set to a value of `'no'`
To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
11/09/2023 17:39:43 - INFO - __main__ - Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda

Mixed precision type: no

You are using a model of type clip_text_model to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
{'timestep_post_act', 'resnet_time_scale_shift', 'mid_block_type', 'time_embedding_act_fn', 'addition_time_embed_dim', 'addition_embed_type', 'class_embed_type', 'use_linear_projection', 'projection_class_embeddings_input_dim', 'transformer_layers_per_block', 'dual_cross_attention', 'resnet_skip_time_act', 'addition_embed_type_num_heads', 'cross_attention_norm', 'time_embedding_dim', 'encoder_hid_dim_type', 'mid_block_only_cross_attention', 'time_cond_proj_dim', 'attention_type', 'num_class_embeds', 'resnet_out_scale_factor', 'encoder_hid_dim', 'num_attention_heads', 'time_embedding_type', 'conv_out_kernel', 'upcast_attention', 'only_cross_attention', 'conv_in_kernel', 'class_embeddings_concat'} was not found in config. Values will be initialized to default values.
{'clip_sample_range', 'thresholding', 'prediction_type', 'timestep_spacing', 'sample_max_value', 'dynamic_thresholding_ratio', 'variance_type'} was not found in config. Values will be initialized to default values.
{'scaling_factor', 'force_upcast', 'norm_num_groups'} was not found in config. Values will be initialized to default values.
Step #0, loss: 0.752631425857544, prior_loss: 0.7393704652786255, instance_loss: 0.013260948471724987
Step #1, loss: 0.18438610434532166, prior_loss: 0.1214214637875557, instance_loss: 0.06296463310718536
Step #2, loss: 0.23523551225662231, prior_loss: 0.13061320781707764, instance_loss: 0.10462230443954468
PGD loss - step 0, loss: 0.06669247150421143
PGD loss - step 1, loss: 0.23700952529907227
PGD loss - step 2, loss: 0.17454129457473755
PGD loss - step 3, loss: 0.30680063366889954
PGD loss - step 4, loss: 0.2727632522583008
PGD loss - step 5, loss: 0.3792399764060974
Step #0, loss: 0.5648417472839355, prior_loss: 0.5227689146995544, instance_loss: 0.042072828859090805
Step #1, loss: 0.244808629155159, prior_loss: 0.2364426851272583, instance_loss: 0.008365947753190994
Step #2, loss: 0.31962481141090393, prior_loss: 0.0035879784263670444, instance_loss: 0.31603682041168213


Luvata commented on June 1, 2024

Since I can't reproduce it, could you please double-check by re-running the default script attack_with_aspl.sh (changing only the Stable Diffusion path to your own SD path), and let me know your hardware specs?
That will really help me understand the difference between our environments.


whulizheng commented on June 1, 2024

Hi, after double checking and re-running, it still happens.

However, when I disabled the argument "--enable_xformers_memory_efficient_attention", it went back to normal, like this:

11/09/2023 11:12:49 - INFO - __main__ - Distributed environment: NO

Num processes: 1
Process index: 0
Local process index: 0
Device: cuda
Mixed precision type: no

You are using a model of type clip_text_model to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
{'class_embed_type', 'resnet_time_scale_shift', 'projection_class_embeddings_input_dim', 'upcast_attention', 'dual_cross_attention', 'conv_out_kernel', 'use_linear_projection', 'timestep_post_act', 'only_cross_attention', 'num_class_embeds', 'mid_block_type', 'time_cond_proj_dim', 'time_embedding_type', 'conv_in_kernel'} was not found in config. Values will be initialized to default values.
{'variance_type', 'clip_sample_range', 'prediction_type'} was not found in config. Values will be initialized to default values.
{'norm_num_groups'} was not found in config. Values will be initialized to default values.
Step #0, loss: 0.11835033446550369, prior_loss: 0.06459389626979828, instance_loss: 0.053756438195705414
Step #1, loss: 0.47151514887809753, prior_loss: 0.0061028883792459965, instance_loss: 0.4654122591018677
Step #2, loss: 0.3747083842754364, prior_loss: 0.309255450963974, instance_loss: 0.0654529333114624

My xformers was installed via "pip install -r requirements.txt", and my GPU is an NVIDIA RTX A6000 with driver version 535.129.03. I guess the problem lies with xformers and my GPU drivers, but it is still strange that SD 2.1 works with xformers while SD 1.4 fails in the same environment with the same config.

