Questions about ip-adapter

bonlime commented on July 28, 2024
Questions

from ip-adapter.

Comments (4)

hkunzhe commented on July 28, 2024

@hkunzhe If you have only one GPU, you can't use DeepSpeed. In fact, we have very few trainable parameters, so DeepSpeed has only a small effect on training speed. For a DeepSpeed config, you can use one of the templates here: https://github.com/huggingface/accelerate/tree/main/examples/deepspeed_config_templates

Thanks for your reply! I got it!

xiaohu2015 commented on July 28, 2024

@bonlime Hi, thank you for your interest in our work. The face-conditioned model is an experimental version, so we are still working on improving it, and we would like to train a similar model for SDXL in the future. For the SD 1.5 IP-Adapter, we trained on a single machine with 8 V100 GPUs; it took about 9 days for 1M steps. However, in our experience, training for about 200k steps already gives good results.
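As a rough back-of-the-envelope reading of those figures, the sketch below converts "9 days for 1M steps" into a per-step time and estimates the wall-clock time for 200k steps. It assumes constant throughput on the same 8-GPU machine, which is a simplification and not something stated in the thread.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Figures quoted in the comment above.
total_steps = 1_000_000
total_days = 9

# Assumption: throughput is constant over the whole run.
seconds_per_step = total_days * SECONDS_PER_DAY / total_steps

# Estimated wall-clock time for the shorter 200k-step run.
target_steps = 200_000
target_days = target_steps * seconds_per_step / SECONDS_PER_DAY

print(f"{seconds_per_step:.2f} s/step, about {target_days:.1f} days for {target_steps:,} steps")
```

Under that assumption, a 200k-step run would take roughly a fifth of the full training time.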

Regarding your ideas:

  1. We use the final CLIP embeddings, as in DALL-E 2; they can represent the content and style of the image. As discussed in Section 4.4.2 of our paper, we compared fine-grained features with global features.
  2. The text encodings and CLIP embeddings are fast to compute, so we don't cache them. DeepSpeed is supported in accelerate: https://github.com/huggingface/accelerate#launching-training-using-deepspeed
  3. You are right. During training, we can segment the face and use it as the condition, which will reduce the effect of the background. Alternatively, we can add an attention score constraint during training.
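The face-segmentation idea in point 3 could look roughly like the sketch below, which blanks out background pixels before the image is passed to the image encoder. This is only an illustration, not the authors' pipeline: `mask_background` is a hypothetical helper, and the boolean `face_mask` is assumed to come from whatever face segmentation model is used.

```python
import numpy as np


def mask_background(image: np.ndarray, face_mask: np.ndarray, fill_value: int = 0) -> np.ndarray:
    """Keep pixels where face_mask is True and replace the rest with fill_value.

    image:     (H, W, 3) uint8 array.
    face_mask: (H, W) boolean array, True on face pixels (assumed to be the
               output of a separate face segmentation step).
    """
    if image.shape[:2] != face_mask.shape:
        raise ValueError("image and face_mask must have the same height and width")
    # Add a trailing axis so the (H, W) mask broadcasts over the channel axis.
    return np.where(face_mask[..., None], image, fill_value).astype(image.dtype)


# Toy example: a 4x4 gray image with a 2x2 "face" region in the center.
image = np.full((4, 4, 3), 128, dtype=np.uint8)
face_mask = np.zeros((4, 4), dtype=bool)
face_mask[1:3, 1:3] = True

masked = mask_background(image, face_mask)
```

The masked image would then replace the original as the conditioning input, so the background contributes nothing to the image embedding.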

hkunzhe commented on July 28, 2024

Does DeepSpeed help with single GPU training? Can you share the DeepSpeed config file for ZeRO stage 2 as mentioned in your paper?

xiaohu2015 commented on July 28, 2024

@hkunzhe If you have only one GPU, you can't use DeepSpeed. In fact, we have very few trainable parameters, so DeepSpeed has only a small effect on training speed. For a DeepSpeed config, you can use one of the templates here: https://github.com/huggingface/accelerate/tree/main/examples/deepspeed_config_templates
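For readers who want a starting point before opening the templates, here is a minimal sketch of a ZeRO stage 2 config built as a Python dict and serialized to JSON. It is not the authors' actual config: the choice of fp16 and the specific options are assumptions, and the `"auto"` values are meant to be filled in by accelerate from the training script's settings.

```python
import json

# Assumption: a minimal ZeRO stage 2 setup; not the config used in the paper.
ds_config = {
    # Assumption: fp16 mixed precision.
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 2,  # partition optimizer states and gradients across GPUs
        "overlap_comm": True,
        "contiguous_gradients": True,
    },
    # "auto" lets accelerate fill these in from the training setup.
    "gradient_accumulation_steps": "auto",
    "gradient_clipping": "auto",
    "train_batch_size": "auto",
    "train_micro_batch_size_per_gpu": "auto",
}

config_text = json.dumps(ds_config, indent=2)
print(config_text)
```

The printed JSON could be saved to a file (for example a hypothetical `zero2.json`) and referenced as the DeepSpeed config file when running `accelerate config`.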
