Hi! First of all, thanks for the really great work. I see that you've a

Questions about ip-adapter

bonlime commented on July 28, 2024

Questions

Comments (4)

hkunzhe commented on July 28, 2024

Thanks for your reply! I got it!

xiaohu2015 commented on July 28, 2024

@bonlime Hi, thank you for your interest in our work. The face-conditioned model is an experimental version, so we're still working on improving it, and we would like to train a similar model for SDXL in the future. For IP-Adapter on SD 1.5, we trained on a single machine with 8 V100 GPUs; it took about 9 days for 1M steps. However, in our observation, training for about 200k steps already gives good results.
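Those timing figures imply an average throughput and a rough cost for the shorter 200k-step run. A back-of-the-envelope sketch, assuming the step rate stays roughly constant over the run on the same 8 V100 setup with unchanged batch size and settings (the helper names are hypothetical):

```python
# Rough training-time arithmetic from the figures in the reply above:
# about 1M steps in about 9 days on one machine with 8 V100 GPUs.
SECONDS_PER_DAY = 24 * 60 * 60


def steps_per_second(total_steps: int, days: float) -> float:
    """Average optimizer steps per second over the whole run."""
    return total_steps / (days * SECONDS_PER_DAY)


def days_for_steps(steps: int, rate: float) -> float:
    """Wall-clock days for `steps` at a constant `rate` (steps per second)."""
    return steps / rate / SECONDS_PER_DAY


rate = steps_per_second(1_000_000, 9)
print(f"average rate: {rate:.2f} steps/s")  # → average rate: 1.29 steps/s
print(f"200k steps: about {days_for_steps(200_000, rate):.1f} days")  # → 200k steps: about 1.8 days
```

Under that constant-rate assumption, stopping at 200k steps costs roughly a fifth of the full run.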

Regarding your ideas:

We use the final CLIP image embedding, as DALL-E 2 does; it can represent both the content and the style of the image. As discussed in Section 4.4.2 of our paper, we compared fine-grained features with global features.

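Conditioning on a single global embedding means one vector has to be turned into a few key/value tokens for cross-attention, whereas fine-grained features would supply one vector per image patch. A minimal pure-Python sketch of one way to do that, assuming a linear projection to `num_tokens * token_dim` values followed by per-token layer normalization; the function names, toy sizes, and weights are hypothetical and this is not the authors' code:

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # row-major: out_features x in_features


def linear(x: Vector, weight: Matrix, bias: Vector) -> Vector:
    """Compute W @ x + b for a single input vector."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def layer_norm(x: Vector, eps: float = 1e-5) -> Vector:
    """Normalize to zero mean and unit variance (no learned scale or shift)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def global_embedding_to_tokens(embed: Vector, weight: Matrix, bias: Vector,
                               num_tokens: int, token_dim: int) -> List[Vector]:
    """Hypothetical helper: project one global embedding into context tokens.

    The linear layer emits num_tokens * token_dim values, which are split
    into num_tokens chunks and layer-normalized one by one.
    """
    flat = linear(embed, weight, bias)
    if len(flat) != num_tokens * token_dim:
        raise ValueError("weight must have num_tokens * token_dim rows")
    return [layer_norm(flat[i * token_dim:(i + 1) * token_dim])
            for i in range(num_tokens)]


# Toy sizes (hypothetical): a 3-dim embedding becomes 2 tokens of width 4.
embed_dim, num_tokens, token_dim = 3, 2, 4
weight = [[(r + 1) * 0.1 * (c + 1) for c in range(embed_dim)]
          for r in range(num_tokens * token_dim)]
bias = [0.0] * (num_tokens * token_dim)
tokens = global_embedding_to_tokens([1.0, -0.5, 2.0], weight, bias, num_tokens, token_dim)
print(len(tokens), len(tokens[0]))  # → 2 4
```

In a real model the weights are learned and the token width matches the UNet's cross-attention dimension; the toy values only show the shapes involved.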
The text encodings and CLIP image embeddings are computed quickly, so we don't cache them. DeepSpeed is supported in Accelerate: https://github.com/huggingface/accelerate#launching-training-using-deepspeed

You are right. During training, we can segment the face and use it as the condition, which will reduce the influence of the background. Alternatively, we can add a constraint on the attention scores during training.
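Both suggestions can be illustrated on toy grids. A minimal sketch, assuming a binary face mask is already available (for example from a face parser) and that `attn` is a non-negative map of how strongly each spatial position attends to the image-prompt tokens, resized to the mask's resolution. The fill value, the function names, and the exact form of the penalty are hypothetical; the reply does not specify them:

```python
from typing import List

Grid = List[List[float]]
Mask = List[List[int]]


def mask_background(image: Grid, face_mask: Mask, fill: float = 0.5) -> Grid:
    """Keep pixels where face_mask == 1 and replace the rest with `fill`.

    `image` is a single-channel H x W grid with values in [0, 1]; a real
    pipeline would apply the same mask to every RGB channel before the
    image goes to the image encoder.
    """
    return [[px if m else fill for px, m in zip(img_row, mask_row)]
            for img_row, mask_row in zip(image, face_mask)]


def outside_attention_penalty(attn: Grid, face_mask: Mask) -> float:
    """Fraction of attention mass that falls outside the face region.

    0.0 means all attention lies inside the mask and 1.0 means none does;
    adding this term to the loss would push attention toward the face.
    """
    total = sum(sum(row) for row in attn)
    if total <= 0:
        return 0.0
    inside = sum(a for a_row, m_row in zip(attn, face_mask)
                 for a, m in zip(a_row, m_row) if m)
    return 1.0 - inside / total


image = [[0.1, 0.9], [0.8, 0.2]]
mask = [[0, 1], [1, 0]]
print(mask_background(image, mask))  # → [[0.5, 0.9], [0.8, 0.5]]

attn = [[0.1, 0.4], [0.4, 0.1]]
print(round(outside_attention_penalty(attn, mask), 6))  # → 0.2
```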

hkunzhe commented on July 28, 2024

Does DeepSpeed help with single-GPU training? Could you share the DeepSpeed config file for ZeRO stage 2 mentioned in your paper?

xiaohu2015 commented on July 28, 2024

@hkunzhe If you only use one GPU, you can't use DeepSpeed. In fact, we have only a very small number of trainable parameters, so DeepSpeed has little effect on training speed. For a DeepSpeed config, you can use these templates: https://github.com/huggingface/accelerate/tree/main/examples/deepspeed_config_templates
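For orientation, a ZeRO stage 2 setup is just a JSON file with a `zero_optimization` section. A minimal sketch that builds one in Python; the keys are standard DeepSpeed config fields, but every value here is an illustrative assumption, and this is neither the config used for the paper nor a copy of the linked templates:

```python
import json


def zero2_config(micro_batch_per_gpu: int = 8, grad_accum_steps: int = 1,
                 fp16: bool = True) -> dict:
    """Build a minimal DeepSpeed ZeRO stage 2 config (illustrative values)."""
    return {
        "train_micro_batch_size_per_gpu": micro_batch_per_gpu,
        "gradient_accumulation_steps": grad_accum_steps,
        "gradient_clipping": 1.0,
        "fp16": {"enabled": fp16},
        "zero_optimization": {
            "stage": 2,                    # partition optimizer states and gradients
            "overlap_comm": True,          # overlap gradient reduction with backward
            "contiguous_gradients": True,  # copy gradients into a contiguous buffer
            "reduce_bucket_size": 5e8,
            "allgather_bucket_size": 5e8,
        },
    }


if __name__ == "__main__":
    # Print the JSON; save it to a file to use it as a DeepSpeed config.
    print(json.dumps(zero2_config(), indent=2))
```

Accelerate can load an external DeepSpeed JSON file like this one; check its DeepSpeed guide for the exact option name and for which fields must agree with the training script. As the reply notes, don't expect a speedup from ZeRO on a single GPU when the trainable parameter count is this small.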
