
Comments (17)

sayakpaul commented on July 20, 2024

We don't support full text-to-image fine-tuning. However, you can refer to SimpleTuner, which provides this support: https://github.com/bghira/SimpleTuner.

from diffusers.

Luciennnnnnn commented on July 20, 2024

I also encounter the NaN problem when adapting SD3 to another task.

bghira commented on July 20, 2024

please try --max_grad_norm=1
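For context, what `--max_grad_norm=1` does under the hood is global gradient-norm clipping before each optimizer step, so one bad batch cannot blow up the weights. A minimal sketch (the `Linear` model and shapes are toy stand-ins, not SD3):

```python
import torch
import torch.nn as nn

# Toy stand-in for the trained model; not SD3.
torch.manual_seed(0)
model = nn.Linear(8, 8)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

loss = model(torch.randn(4, 8)).pow(2).mean()
loss.backward()

# Returns the gradient norm *before* clipping; gradients are rescaled
# in place so their global 2-norm is at most max_norm.
pre_clip_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
post_clip_norm = torch.norm(torch.stack([p.grad.norm() for p in model.parameters()]))
optimizer.step()
```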

bghira commented on July 20, 2024

also #8592 might help

Luciennnnnnn commented on July 20, 2024

@bghira I do use these strategies; however, the NaNs still appear.

bghira commented on July 20, 2024

can you try adding QK Norm blocks back?

bghira/SimpleTuner#469

bghira commented on July 20, 2024

this will extend the model to 3B parameters for higher performance (the 2B model seems limited) and reintroduces the missing QK norm blocks

prepare for a longer training session
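For readers unfamiliar with the term, "QK norm" means normalizing the queries and keys before the attention dot product, which bounds the attention logits and is a common fix for NaNs in reduced precision. A minimal sketch of the idea (the `QKNormAttention` name and shapes are illustrative, and `LayerNorm` is used here as a stand-in normalizer; this is not the actual SD3 module):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class QKNormAttention(nn.Module):
    """Multi-head attention with per-head normalization of Q and K."""

    def __init__(self, dim: int, heads: int):
        super().__init__()
        self.heads = heads
        head_dim = dim // heads
        self.to_qkv = nn.Linear(dim, dim * 3, bias=False)
        # The "QK norm" blocks: normalize queries and keys per head.
        self.q_norm = nn.LayerNorm(head_dim)
        self.k_norm = nn.LayerNorm(head_dim)
        self.to_out = nn.Linear(dim, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, n, d = x.shape
        q, k, v = self.to_qkv(x).chunk(3, dim=-1)
        # (batch, heads, tokens, head_dim)
        q = q.view(b, n, self.heads, -1).transpose(1, 2)
        k = k.view(b, n, self.heads, -1).transpose(1, 2)
        v = v.view(b, n, self.heads, -1).transpose(1, 2)
        # Normalizing q and k keeps q @ k^T from overflowing in fp16/bf16.
        q, k = self.q_norm(q), self.k_norm(k)
        out = F.scaled_dot_product_attention(q, k, v)
        return self.to_out(out.transpose(1, 2).reshape(b, n, d))
```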

Luciennnnnnn commented on July 20, 2024

Hi @bghira, thank you for your suggestion! However, introducing an extra 1B parameters makes training hard for many users, i.e., it demands extensive computing power and longer training iterations.

Luciennnnnnn commented on July 20, 2024

I mean that in order to fine-tune a model, it is not wise to introduce at least 1B extra parameters. That should not be the correct direction for SD3.

bghira commented on July 20, 2024

someone has to!

AmericanPresidentJimmyCarter commented on July 20, 2024

I mean that in order to fine-tune a model, it is not wise to introduce at least 1B extra parameters. That should not be the correct direction for SD3.

You can just use the model there with the original number of blocks and copy in the parameters except QK norms, then init them as I had.

sayakpaul commented on July 20, 2024

My question is: what happens when you try a lower learning rate or try to overfit a single batch of training data? In any case, this seems more like a discussion to me than an "issue".
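The single-batch overfit check suggested above can be sketched as follows (toy regression model and shapes are assumptions, not SD3): with a healthy setup, the loss on one fixed batch should drop steadily, and a NaN here points at the model or loss rather than the data pipeline.

```python
import torch
import torch.nn as nn

# Toy stand-in model; the check itself is what matters.
torch.manual_seed(0)
model = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 8))
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

# One fixed batch, reused every step.
x, y = torch.randn(4, 8), torch.randn(4, 8)
losses = []
for _ in range(200):
    loss = nn.functional.mse_loss(model(x), y)
    if not torch.isfinite(loss):
        raise RuntimeError("loss went NaN/Inf: instability is in the model/loss")
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    losses.append(loss.item())
```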

HaozheZhao commented on July 20, 2024

I also encounter the NaN problem when tuning SD3 for image editing. Are there any new updates?

sayakpaul commented on July 20, 2024

Did you try the "logit_normal" weighting scheme?
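For reference, "logit_normal" refers to sampling training timesteps from a logit-normal distribution: a Gaussian squashed through a sigmoid, which concentrates t away from the extremes 0 and 1 where the loss is most prone to blowing up. A minimal sketch (the function name and mean/std defaults are assumptions for illustration, not the diffusers API):

```python
import torch

def sample_logit_normal_timesteps(batch_size, mean=0.0, std=1.0, generator=None):
    """Draw u ~ N(mean, std) and map it through a sigmoid.

    The result lies strictly in (0, 1), clustered around sigmoid(mean),
    so very small and very large timesteps are rarely sampled.
    """
    u = torch.randn(batch_size, generator=generator) * std + mean
    return torch.sigmoid(u)
```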

rardz commented on July 20, 2024

trying --mixed_precision="bf16" may help
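The reason bf16 often cures fp16 NaNs is its wider exponent range: bfloat16 keeps float32's 8 exponent bits, so magnitudes that overflow fp16 (max finite value is about 65504) remain representable, just with fewer mantissa bits. A quick illustration:

```python
import torch

# 70000 exceeds fp16's maximum finite value (~65504) but fits in bf16,
# which trades mantissa precision for float32's exponent range.
big = torch.tensor(70000.0)
as_fp16 = big.to(torch.float16)   # overflows to inf
as_bf16 = big.to(torch.bfloat16)  # finite, at reduced precision
```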

BJQ123456 commented on July 20, 2024

Where is this file located? I couldn't find train_text_to_imagesd3.py. Thanks!

sayakpaul commented on July 20, 2024

Closing this since switching to the "logit_normal" weighting scheme for the loss resolves this issue in most cases. If not, please re-open and I will turn it into a discussion.
