vislearn / controlnet-xs
License: Apache License 2.0
Your work is good!
I want to learn from it. Could you tell me where the ControlNet-XS model is located? I can't find it.
Thank you. @_@
I have been fine-tuning SDXL with ControlNet on remote-sensing images of buildings, using the segmentation map as control, but when I tried your work, the sample results became weird.
This is my controlnet+sdxl result below:
And this is controlnet-XS + sdxl:
We can see the difference between the two results: the ControlNet output looks more natural, while the ControlNet-XS output looks weird.
Is that normal, or did I miss some detail?
if self.learn_embedding:
    emb = (self.control_model.time_embed(t_emb) * self.control_scale ** 0.3
           + base_model.time_embed(t_emb) * (1 - self.control_scale ** 0.3))
Why do you compute the embedding this way when learn_embedding is set?
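To make the question concrete, here is a minimal sketch of what the quoted lines compute, read directly from the snippet: the two time embeddings are mixed with weight w = control_scale ** 0.3 on the control side and 1 - w on the base side. The function name blend_time_embeddings and the plain Python lists standing in for tensors are illustrative assumptions, not code from the repository.

```python
def blend_time_embeddings(control_emb, base_emb, control_scale):
    """Mix two embeddings with weight w = control_scale ** 0.3 on the control side."""
    w = control_scale ** 0.3
    return [c * w + b * (1.0 - w) for c, b in zip(control_emb, base_emb)]


control_emb = [1.0, 2.0]  # stand-in for control_model.time_embed(t_emb)
base_emb = [3.0, 4.0]     # stand-in for base_model.time_embed(t_emb)

full = blend_time_embeddings(control_emb, base_emb, 1.0)  # w = 1: control embedding only
none = blend_time_embeddings(control_emb, base_emb, 0.0)  # w = 0: base embedding only
half = blend_time_embeddings(control_emb, base_emb, 0.5)  # w = 0.5 ** 0.3, roughly 0.81
```

Note that the exponent 0.3 makes the weight nonlinear in control_scale: even at control_scale = 0.5 the control embedding still carries roughly 81% of the mix.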
How do I create a config to train this model?
When I ran the example code in README.md, I ran into a strange problem.
import scripts.control_utils as cu
import torch
from PIL import Image
path_to_config = 'configs/inference/sdxl/sdxl_encD_canny_48m.yaml'
model = cu.create_model(path_to_config).to('cuda')
image_path = 'IMAGES/00007.png'
canny_high_th = 250
canny_low_th = 100
size = 768
num_samples=2
image = cu.get_image(image_path, size=size)
edges = cu.get_canny_edges(image, low_th=canny_low_th, high_th=canny_high_th)
samples, controls = cu.get_sdxl_sample(
    guidance=edges,
    ddim_steps=10,
    num_samples=num_samples,
    model=model,
    shape=[4, size // 8, size // 8],
    control_scale=0.95,
    prompt='cinematic, shoe in the streets, made from meat, photorealistic shoe, highly detailed',
    n_prompt='lowres, bad anatomy, worst quality, low quality',
)
Image.fromarray(cu.create_image_grid(samples)).save('00007.png')
The error occurred at the following line in File "ControlNet-XS\sgm\modules\encoders\modules.py", line 428:
model, _, _ = open_clip.create_model_and_transforms(
To fix it, I manually downloaded the laion CLIP-ViT-H-14-laion2B-s32B-b79K model, put it in a directory, and then loaded it like this:
model, _, _ = open_clip.create_model_and_transforms(
    arch,
    device=torch.device("cpu"),
    # pretrained=version,
    pretrained="laion/CLIP-ViT-H-14-laion2B-s32B-b79K/open_clip_pytorch_model.bin",
)
Then I hit an error I could not fix:
RuntimeError: Error(s) in loading state_dict for CLIP:
Missing key(s) in state_dict: "visual.transformer.resblocks.32.ln_1.weight", "visual.transformer.resblocks.32.ln_1.bias", "visual.transformer.resblocks.32.attn.in_proj_weight", "visual.transformer.resblocks.32.attn.in_proj_bias", "visual.transformer.resblocks.32.attn.out_proj.weight", "visual.transformer.resblocks.32.attn.out_proj.bias",
......
"transformer.resblocks.31.mlp.c_proj.weight", "transformer.resblocks.31.mlp.c_proj.bias".
size mismatch for positional_embedding: copying a param with shape torch.Size([77, 1024]) from checkpoint, the shape in current model is torch.Size([77, 1280]).
size mismatch for text_projection: copying a param with shape torch.Size([1024, 1024]) from checkpoint, the shape in current model is torch.Size([1280, 1280]).
......
I wonder whether the problem is with the model or with something else.
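The size mismatches in the message (1024 in the checkpoint versus 1280 in the current model) suggest that the downloaded weights and the architecture named by arch may belong to different CLIP variants. A minimal sketch of how such a mismatch can be listed before loading, assuming only that both sides can be reduced to a mapping from parameter name to shape; the helper name diff_state_dict_shapes is hypothetical, and the shapes below are copied from the error message except where marked as placeholders.

```python
def diff_state_dict_shapes(model_shapes, checkpoint_shapes):
    """Return (keys missing from the checkpoint, keys whose shapes differ)."""
    missing = sorted(k for k in model_shapes if k not in checkpoint_shapes)
    mismatched = {
        k: (checkpoint_shapes[k], model_shapes[k])
        for k in model_shapes
        if k in checkpoint_shapes and checkpoint_shapes[k] != model_shapes[k]
    }
    return missing, mismatched


# Shapes expected by the instantiated model (from the error message).
model_shapes = {
    "positional_embedding": (77, 1280),
    "text_projection": (1280, 1280),
    # Placeholder shape: the error message does not report it.
    "visual.transformer.resblocks.32.ln_1.weight": None,
}
# Shapes found in the downloaded checkpoint (from the error message).
checkpoint_shapes = {
    "positional_embedding": (77, 1024),
    "text_projection": (1024, 1024),
}

missing, mismatched = diff_state_dict_shapes(model_shapes, checkpoint_shapes)
for name, (ckpt_shape, model_shape) in mismatched.items():
    print(f"{name}: checkpoint {ckpt_shape} vs model {model_shape}")
```

With real PyTorch objects, the two mappings could be built with {k: tuple(v.shape) for k, v in state_dict.items()} from the model's state_dict() and from the loaded checkpoint. If every mismatch shows the same pair of widths, the more likely fix is a checkpoint that matches arch, rather than a change to the loading code.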
Great job! How can I train ControlNet-XS with my own dataset?
When I load both the SDXL and the control checkpoints, the model produces pictures normally. But when I load only the SDXL checkpoint, the model generates noise. Given the zero-layer initialization described in the paper, shouldn't the model behave like plain SDXL in this case?
First, when trying to use sd21, I got
cannot import name 'rank_zero_only' from 'pytorch_lightning.utilities.distributed'
which needed pytorch-lightning to be rolled back to v1.6.5 to get past that.
But now I get...
Traceback (most recent call last):
  File "D:\ControlNet-XS\ControlNet-XS.py", line 69, in <module>
    samples, controls = cu.get_sdxl_sample(
  File "D:\venv\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "D:ControlNet-XS\scripts\control_utils.py", line 180, in get_sdxl_sample
    model.sampler.num_steps = ddim_steps
  File "D:\venv\lib\site-packages\torch\nn\modules\module.py", line 1614, in __getattr__
    raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'TwoStreamControlLDM' object has no attribute 'sampler'. Did you mean: 'sample'?
Any ideas on what needs to be tweaked for sd21 to work? sdxl works fine.
Thanks.
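For the first error (the rank_zero_only import), an alternative to pinning pytorch-lightning is a small fallback import that tries several module paths in order. This is only a sketch: the helper name import_first_available is hypothetical, and the demo uses standard-library modules so that it runs without pytorch-lightning installed.

```python
import importlib


def import_first_available(attr_name, module_paths):
    """Return attr_name from the first module in module_paths that provides it."""
    for path in module_paths:
        try:
            module = importlib.import_module(path)
        except ImportError:
            continue  # module path does not exist in this environment
        if hasattr(module, attr_name):
            return getattr(module, attr_name)
    raise ImportError(f"{attr_name!r} not found in any of {module_paths}")


# Demo with standard-library names: the first path fails, the second succeeds.
sqrt = import_first_available("sqrt", ["no_such_module_xyz", "math"])
print(sqrt(9.0))
```

For the error quoted above, the candidate list would be something like ["pytorch_lightning.utilities.rank_zero", "pytorch_lightning.utilities.distributed"]. The first path is an assumption about where newer releases keep rank_zero_only and should be checked against the installed version. This does not address the second error about the missing sampler attribute.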
How do I train? I haven't seen any tutorials about training.
I read the blog, but it seems that models A and B are the same as the original ControlNet: both have a complete UNet encoder, and model C has a whole UNet. Why do they have fewer weights than the original ControlNet?
Hello @Sipirius,
Just to let you know, I created a feature request on the diffusers library to support your architecture: huggingface/diffusers#5168
I have trained the original ControlNet; its loss is below 0.15 at the beginning of training, thanks to the zero-conv module.
ControlNet-XS still uses the zero-conv module, but its initial loss is about 0.20.
Is that normal? Could you describe or show your training loss, please?
Hello, please tell me how to download and use the datasets in the project. I downloaded the datasets in sgm, but it reports "Datasets not yet available". Could you explain?
The model's control_scale attribute is initialized in the constructor (CTOR) and never changes afterwards.
As a consequence, setting control_scale in the high-level get_sdxl_sample API has no effect on it, even though this factor is involved in the timestep embedding computation.
(In other words, the base model's time_embed has no effect, whatever the input control_scale.)
Also, setting control_scale=0.0 when calling get_sdxl_sample is different from completely disabling ControlNet-XS via no_control = True.
It looks to me like it should be changed with something like:
--- a/scripts/control_utils.py
+++ b/scripts/control_utils.py
@@ -185,6 +185,7 @@ def get_sdxl_sample(
if float(control_scale) != 1.0:
model.model.scale_list *= control_scale
+ model.model.control_scale = control_scale
print(f'[CONTROL CORRECTION OF {type(model).__name__} SCALED WITH {control_scale}]')
control = torch.stack([tt.ToTensor()(ds['hint'][..., None].repeat(3, 2))] * num_samples).float().to('cuda')
@@ -231,6 +232,7 @@ def get_sdxl_sample(
# reset scales
model.model.scale_list = model.model.scale_list * 0. + 1.
+ model.model.control_scale = 1.
return x_samples, control
--
Is it intended (bug or feature)? Am I missing something?
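One way to make the set-and-reset pattern from the patch harder to get wrong is a context manager that restores the previous values even if sampling raises. The sketch below runs on a stand-in object, not the repository's model class: temporary_control_scale is a hypothetical helper, scale_list is a plain list instead of a tensor, and, unlike the patch (which resets to 1), it restores whatever values were set before.

```python
from contextlib import contextmanager
from types import SimpleNamespace


@contextmanager
def temporary_control_scale(inner_model, control_scale):
    """Scale scale_list and set control_scale, restoring both on exit."""
    old_scale_list = list(inner_model.scale_list)
    old_control_scale = inner_model.control_scale
    inner_model.scale_list = [s * control_scale for s in old_scale_list]
    inner_model.control_scale = control_scale
    try:
        yield inner_model
    finally:
        # Runs even if the body raises, so the model is never left rescaled.
        inner_model.scale_list = old_scale_list
        inner_model.control_scale = old_control_scale


# Stand-in for model.model; the attribute names follow the patch above.
inner = SimpleNamespace(scale_list=[1.0, 1.0, 1.0], control_scale=1.0)

with temporary_control_scale(inner, 0.5):
    inside = (list(inner.scale_list), inner.control_scale)
after = (inner.scale_list, inner.control_scale)
```

Inside the with block both values follow the requested scale; afterwards both are back to their earlier state, which keeps scale_list and control_scale from drifting apart across calls.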
Is there a way to combine multiple controlnets, i.e. depth and canny, with ControlNet-XS?
Thanks for your wonderful work.
I want to ask two questions:
Hi! Awesome job! This is not an issue but a discussion. I think the idea in your paper comes down to two things: more connections (or control just in time) and smaller control nets. I saw that the diffusers team made smaller control nets when they trained a control net for SDXL here, and that Stability uses a LoRA as a control net here.
I haven't taken a closer look at the code of these two, but they make me wonder whether we could treat your smaller control nets as a LoRA that not only modifies the behaviour of SD but also extends it, i.e. lets it see the control images.
I also wonder whether we could implement your smaller control nets with LoRA alone; we could then probably achieve even faster training and even smaller weights. I guess we could replace your networks with LoRAs while keeping your connections between the generative encoders and the control encoders.
Hi, thanks for sharing the source code. It seems that the ControlNet uses the same UNet config. How, then, is the ControlNet made smaller?
How much GPU VRAM is needed to train ControlNet-XS on a custom dataset? The image size is 512x512.
Hi,
Thanks for the release of both the code and models.
I've set the paths to the SDXL checkpoint and the canny checkpoint in the config file, but the inference code still tries to download a 10.2GB checkpoint, "ip_pytorch_model.bin".
What is it?
Thanks
Thanks a lot for building this. I have 2 questions -
Hi,
Thanks for the release of both the code and models.
I've tried to run the code on Google Colab and got an error. I'm probably missing something, but I'm reporting it in case it is a real bug.
You can see the Colab.
Also, it looks like it needs too much VRAM for free Colab. Can you tell me if that is true?
Thanks
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-2-1b395f27c622> in <cell line: 1>()
----> 1 import scripts.control_utils as cu
2 import torch
3 from PIL import Image
4
5 path_to_config = '/content/ControlNet-XS/configs/inference/sdxl/sdxl_encD_canny_48m.yaml'
13 frames
/usr/local/lib/python3.10/dist-packages/numpy/testing/_private/utils.py in <module>
55 IS_PYSTON = hasattr(sys, "pyston_version_info")
56 HAS_REFCOUNT = getattr(sys, 'getrefcount', None) is not None and not IS_PYSTON
---> 57 HAS_LAPACK64 = numpy.linalg._umath_linalg._ilp64
58
59 _OLD_PROMOTION = lambda: np._get_promotion_state() == 'legacy'
AttributeError: module 'numpy.linalg._umath_linalg' has no attribute '_ilp64'