
Comments (17)

sayakpaul commented on July 20, 2024

We don't support full text-to-image fine-tuning. However, you can refer to SimpleTuner, which provides this support: https://github.com/bghira/SimpleTuner.

from diffusers.

Luciennnnnnn commented on July 20, 2024

I also encounter the NaN problem when adapting SD3 to another task.

bghira commented on July 20, 2024

please try --max_grad_norm=1
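For context, what `--max_grad_norm=1` does under the hood is global gradient-norm clipping before each optimizer step, so one bad batch cannot blow up the weights. A minimal sketch (the `Linear` model and shapes are toy stand-ins, not SD3):

```python
import torch
import torch.nn as nn

# Toy stand-in for the trained model; not SD3.
torch.manual_seed(0)
model = nn.Linear(8, 8)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

loss = model(torch.randn(4, 8)).pow(2).mean()
loss.backward()

# Returns the gradient norm *before* clipping; gradients are rescaled
# in place so their global 2-norm is at most max_norm.
pre_clip_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
post_clip_norm = torch.norm(torch.stack([p.grad.norm() for p in model.parameters()]))
optimizer.step()
```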

bghira commented on July 20, 2024

also #8592 might help

Luciennnnnnn commented on July 20, 2024

@bghira I do use these strategies; however, the NaNs still appear.

bghira commented on July 20, 2024

can you try adding QK Norm blocks back?

bghira/SimpleTuner#469

bghira commented on July 20, 2024

this will extend the model to 3B parameters for higher performance (the 2B model seems limited) and reintroduces the missing QK norm blocks

prepare for a longer training session
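For readers unfamiliar with the term, "QK norm" means normalizing the queries and keys before the attention dot product, which bounds the attention logits and is a common fix for NaNs in reduced precision. A minimal sketch of the idea (the `QKNormAttention` name and shapes are illustrative, and `LayerNorm` is used here as a stand-in normalizer; this is not the actual SD3 module):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class QKNormAttention(nn.Module):
    """Multi-head attention with per-head normalization of Q and K."""

    def __init__(self, dim: int, heads: int):
        super().__init__()
        self.heads = heads
        head_dim = dim // heads
        self.to_qkv = nn.Linear(dim, dim * 3, bias=False)
        # The "QK norm" blocks: normalize queries and keys per head.
        self.q_norm = nn.LayerNorm(head_dim)
        self.k_norm = nn.LayerNorm(head_dim)
        self.to_out = nn.Linear(dim, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, n, d = x.shape
        q, k, v = self.to_qkv(x).chunk(3, dim=-1)
        # (batch, heads, tokens, head_dim)
        q = q.view(b, n, self.heads, -1).transpose(1, 2)
        k = k.view(b, n, self.heads, -1).transpose(1, 2)
        v = v.view(b, n, self.heads, -1).transpose(1, 2)
        # Normalizing q and k keeps q @ k^T from overflowing in fp16/bf16.
        q, k = self.q_norm(q), self.k_norm(k)
        out = F.scaled_dot_product_attention(q, k, v)
        return self.to_out(out.transpose(1, 2).reshape(b, n, d))
```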

Luciennnnnnn commented on July 20, 2024

Hi @bghira, thank you for your suggestion! However, introducing an extra 1B parameters makes training hard for many users, i.e., it demands extensive computing power and longer training iterations.

Luciennnnnnn commented on July 20, 2024

I mean that in order to fine-tune a model, it is not wise to introduce at least 1B extra parameters. That should not be the correct direction for SD3.

bghira commented on July 20, 2024

someone has to!

AmericanPresidentJimmyCarter commented on July 20, 2024

I mean that in order to fine-tune a model, it is not wise to introduce at least 1B extra parameters. That should not be the correct direction for SD3.

You can just use the model there with the original number of blocks and copy in the parameters except QK norms, then init them as I had.

sayakpaul commented on July 20, 2024

My question is: what happens when you try a lower learning rate or try to overfit a single batch of training data? In any case, this seems more like a discussion to me than an "issue".
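The single-batch overfit check suggested above can be sketched as follows (toy regression model and shapes are assumptions, not SD3): with a healthy setup, the loss on one fixed batch should drop steadily, and a NaN here points at the model or loss rather than the data pipeline.

```python
import torch
import torch.nn as nn

# Toy stand-in model; the check itself is what matters.
torch.manual_seed(0)
model = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 8))
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

# One fixed batch, reused every step.
x, y = torch.randn(4, 8), torch.randn(4, 8)
losses = []
for _ in range(200):
    loss = nn.functional.mse_loss(model(x), y)
    if not torch.isfinite(loss):
        raise RuntimeError("loss went NaN/Inf: instability is in the model/loss")
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    losses.append(loss.item())
```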

HaozheZhao commented on July 20, 2024

I also encounter the NaN problem when tuning SD3 for image editing. Are there any new updates?

sayakpaul commented on July 20, 2024

Did you try the "logit_normal" weighting scheme?
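For reference, "logit_normal" refers to sampling training timesteps from a logit-normal distribution: a Gaussian squashed through a sigmoid, which concentrates t away from the extremes 0 and 1 where the loss is most prone to blowing up. A minimal sketch (the function name and mean/std defaults are assumptions for illustration, not the diffusers API):

```python
import torch

def sample_logit_normal_timesteps(batch_size, mean=0.0, std=1.0, generator=None):
    """Draw u ~ N(mean, std) and map it through a sigmoid.

    The result lies strictly in (0, 1), clustered around sigmoid(mean),
    so very small and very large timesteps are rarely sampled.
    """
    u = torch.randn(batch_size, generator=generator) * std + mean
    return torch.sigmoid(u)
```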

rardz commented on July 20, 2024

trying --mixed_precision="bf16" may help
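The reason bf16 often cures fp16 NaNs is its wider exponent range: bfloat16 keeps float32's 8 exponent bits, so magnitudes that overflow fp16 (max finite value is about 65504) remain representable, just with fewer mantissa bits. A quick illustration:

```python
import torch

# 70000 exceeds fp16's maximum finite value (~65504) but fits in bf16,
# which trades mantissa precision for float32's exponent range.
big = torch.tensor(70000.0)
as_fp16 = big.to(torch.float16)   # overflows to inf
as_bf16 = big.to(torch.bfloat16)  # finite, at reduced precision
```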

BJQ123456 commented on July 20, 2024

Where is this file located? I couldn't find train_text_to_imagesd3.py. Thanks!

sayakpaul commented on July 20, 2024

Closing this since switching to the "logit_normal" weighting scheme for the loss resolves this issue in most cases. If not, please re-open and I will turn it into a discussion.
