williamyang1991 / VToonify
[SIGGRAPH Asia 2022] VToonify: Controllable High-Resolution Portrait Video Style Transfer
License: Other
I trained a 256-resolution model and used the corresponding StyleGAN, pSp encoder, and DualStyleGAN; each of these models works fine on its own.
When training VToonify, I modified part of the code: the editing directions are based on the 1024 StyleGAN, so I commented out that latent transformation.
xc, _ = g_ema.stylegan()([wc], input_is_latent=True, truncation=0.5, truncation_latent=0)
This xc is used as the real output. Could the reason VToonify cannot handle closed-eye images be that the style images generated with DualStyleGAN never contain closed eyes?
Testing the pSp encoder alone, it reconstructs closed-eye images reasonably well, but DualStyleGAN's stylization of a closed-eye image cannot produce closed eyes.
We all know how important this is. Does it work with pets? 🐱
Awesome model!
I would like to know if I can change the size of the output image. The resolution of the generated image does not match that of the input image; it seems the result is simply resized. Can I adjust the settings to get more detailed toonified results?
It was a bit inconvenient to manually rename output files to avoid overwriting, so I made a rather provisional hack:
savename = os.path.join(args.output_path, basename + '_vtoonify_' + args.backbone[0] + '-' + args.ckpt[24:29] + '-' + str(args.style_id).zfill(3) + '-' + str(100*args.style_degree).zfill(3) + '.jpg')
I think it would be useful to add something like this ("args.ckpt[24:29]" needs to be replaced by something more robust) into your code.
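One more robust alternative to the fixed slice `args.ckpt[24:29]` is to derive the tag from the checkpoint file's basename, so it works regardless of where the checkpoint lives. A minimal sketch (the checkpoint path below is hypothetical):

```python
import os

def ckpt_tag(ckpt_path):
    # Derive a name tag from the checkpoint file itself instead of a
    # hard-coded string slice tied to one particular path layout.
    return os.path.splitext(os.path.basename(ckpt_path))[0]

# Example with a hypothetical checkpoint path:
tag = ckpt_tag('./checkpoint/vtoonify_d_cartoon/vtoonify_s026_d0.5.pt')
print(tag)  # vtoonify_s026_d0.5
```

This tag can then be dropped into the `savename` expression in place of `args.ckpt[24:29]`.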
Hello!
I am pretty much enjoying DualStyleGAN and VToonify, so I would like to know how I can control them.
In DualStyleGAN, I can adjust interp_weights to control styles (like the grid visualization in the DualStyleGAN inference playground notebook), but it seems that when training VToonify-D, the weight is fixed, so each fine-tuned VToonify model can generate only a single weight variation. Is that correct?
If I want to put different interp_weights in VToonify, is it possible?
Thank you!
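For reference, DualStyleGAN takes these per-layer weights at inference time; a sketch of the weight list following the convention in its inference notebook (the commented-out generator call is assumed from that repo and may differ in detail):

```python
# 18-entry per-layer interpolation weights, one per StyleGAN layer.
# In the DualStyleGAN convention, the first 7 entries control structure
# and the remaining 11 control color/texture.
interp_weights = [0.6] * 7 + [1.0] * 11

# Hypothetical call, following the DualStyleGAN inference notebook:
# img_gen, _ = generator([instyle], exstyle, use_res=True,
#                        z_plus_latent=True, interp_weights=interp_weights)
print(len(interp_weights))  # 18
```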
Hi, nice work, appreciate it! But two questions confuse me.
In train_vtoonify_d.py, for pre-training, why save the weights of g_ema at lines 172/387? It looks like g_ema.eval() keeps the weights unchanged, while g is the generator that should be trained.
After pre-training, in the full training process, it looks like the pre-trained model (vtoonify_d_cartoon/pretrain.pth) is never loaded.
Thanks!
I'm really impressed with your work.
May I ask if I can get the style datasets you used, such as cartoon, caricature, arcane, comic, and pixar?
Traceback (most recent call last):
  File "/VToonify/style_transfer.py", line 226, in <module>
    y_tilde = vtoonify(inputs, s_w.repeat(inputs.size(0), 1, 1), d_s = args.style_degree)
  File "/root/miniconda3/envs/python-app/lib/python3.9/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/VToonify/model/vtoonify.py", line 258, in forward
    out, m_E = self.fusion_out[fusion_index](out, f_E, d_s)
  File "/root/miniconda3/envs/python-app/lib/python3.9/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/VToonify/model/vtoonify.py", line 125, in forward
    out = torch.cat([f_G, abs(f_G-f_E)], dim=1)
RuntimeError: The size of tensor a (126) must match the size of tensor b (125) at non-singleton dimension 3
Hi, I ran the CPU version of the code and got the following message. Please help, thanks!
Traceback (most recent call last):
  File "/Users/chikiuso/Downloads/VToonify/style_transfer.py", line 63, in <module>
    vtoonify.load_state_dict(torch.load(args.ckpt, map_location=lambda storage, loc: storage)['g_ema'])
  File "/opt/homebrew/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1604, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for VToonify:
Missing key(s) in state_dict: "generator.generator.style.1.weight", "generator.generator.style.1.bias", "generator.generator.style.2.weight", "generator.generator.style.2.bias", "generator.generator.style.3.weight", "generator.generator.style.3.bias", "generator.generator.style.4.weight", "generator.generator.style.4.bias", "generator.generator.style.5.weight", "generator.generator.style.5.bias", "generator.generator.style.6.weight", "generator.generator.style.6.bias", "generator.generator.style.7.weight", "generator.generator.style.7.bias", "generator.generator.style.8.weight", "generator.generator.style.8.bias", "generator.generator.input.input", "generator.generator.conv1.conv.weight", "generator.generator.conv1.conv.modulation.weight", "generator.generator.conv1.conv.modulation.bias", "generator.generator.conv1.noise.weight", "generator.generator.conv1.activate.bias", "generator.generator.to_rgb1.bias", "generator.generator.to_rgb1.conv.weight", "generator.generator.to_rgb1.conv.modulation.weight", "generator.generator.to_rgb1.conv.modulation.bias", "generator.generator.convs.0.conv.weight", "generator.generator.convs.0.conv.blur.kernel", "generator.generator.convs.0.conv.modulation.weight", "generator.generator.convs.0.conv.modulation.bias", "generator.generator.convs.0.noise.weight", "generator.generator.convs.0.activate.bias", "generator.generator.convs.1.conv.weight", "generator.generator.convs.1.conv.modulation.weight", "generator.generator.convs.1.conv.modulation.bias", "generator.generator.convs.1.noise.weight", "generator.generator.convs.1.activate.bias", "generator.generator.convs.2.conv.weight", "generator.generator.convs.2.conv.blur.kernel", "generator.generator.convs.2.conv.modulation.weight", "generator.generator.convs.2.conv.modulation.bias", "generator.generator.convs.2.noise.weight", "generator.generator.convs.2.activate.bias", "generator.generator.convs.3.conv.weight", "generator.generator.convs.3.conv.modulation.weight", 
"generator.generator.convs.3.conv.modulation.bias", "generator.generator.convs.3.noise.weight", "generator.generator.convs.3.activate.bias", "generator.generator.convs.4.conv.weight", "generator.generator.convs.4.conv.blur.kernel", "generator.generator.convs.4.conv.modulation.weight", "generator.generator.convs.4.conv.modulation.bias", "generator.generator.convs.4.noise.weight", "generator.generator.convs.4.activate.bias", "generator.generator.convs.5.conv.weight", "generator.generator.convs.5.conv.modulation.weight", "generator.generator.convs.5.conv.modulation.bias", "generator.generator.convs.5.noise.weight", "generator.generator.convs.5.activate.bias", "generator.generator.convs.6.conv.weight", "generator.generator.convs.6.conv.blur.kernel", "generator.generator.convs.6.conv.modulation.weight", "generator.generator.convs.6.conv.modulation.bias", "generator.generator.convs.6.noise.weight", "generator.generator.convs.6.activate.bias", "generator.generator.convs.7.conv.weight", "generator.generator.convs.7.conv.modulation.weight", "generator.generator.convs.7.conv.modulation.bias", "generator.generator.convs.7.noise.weight", "generator.generator.convs.7.activate.bias", "generator.generator.convs.8.conv.weight", "generator.generator.convs.8.conv.blur.kernel", "generator.generator.convs.8.conv.modulation.weight", "generator.generator.convs.8.conv.modulation.bias", "generator.generator.convs.8.noise.weight", "generator.generator.convs.8.activate.bias", "generator.generator.convs.9.conv.weight", "generator.generator.convs.9.conv.modulation.weight", "generator.generator.convs.9.conv.modulation.bias", "generator.generator.convs.9.noise.weight", "generator.generator.convs.9.activate.bias", "generator.generator.convs.10.conv.weight", "generator.generator.convs.10.conv.blur.kernel", "generator.generator.convs.10.conv.modulation.weight", "generator.generator.convs.10.conv.modulation.bias", "generator.generator.convs.10.noise.weight", 
"generator.generator.convs.10.activate.bias", "generator.generator.convs.11.conv.weight", "generator.generator.convs.11.conv.modulation.weight", "generator.generator.convs.11.conv.modulation.bias", "generator.generator.convs.11.noise.weight", "generator.generator.convs.11.activate.bias", "generator.generator.convs.12.conv.weight", "generator.generator.convs.12.conv.blur.kernel", "generator.generator.convs.12.conv.modulation.weight", "generator.generator.convs.12.conv.modulation.bias", "generator.generator.convs.12.noise.weight", "generator.generator.convs.12.activate.bias", "generator.generator.convs.13.conv.weight", "generator.generator.convs.13.conv.modulation.weight", "generator.generator.convs.13.conv.modulation.bias", "generator.generator.convs.13.noise.weight", "generator.generator.convs.13.activate.bias", "generator.generator.convs.14.conv.weight", "generator.generator.convs.14.conv.blur.kernel", "generator.generator.convs.14.conv.modulation.weight", "generator.generator.convs.14.conv.modulation.bias", "generator.generator.convs.14.noise.weight", "generator.generator.convs.14.activate.bias", "generator.generator.convs.15.conv.weight", "generator.generator.convs.15.conv.modulation.weight", "generator.generator.convs.15.conv.modulation.bias", "generator.generator.convs.15.noise.weight", "generator.generator.convs.15.activate.bias", "generator.generator.to_rgbs.0.bias", "generator.generator.to_rgbs.0.upsample.kernel", "generator.generator.to_rgbs.0.conv.weight", "generator.generator.to_rgbs.0.conv.modulation.weight", "generator.generator.to_rgbs.0.conv.modulation.bias", "generator.generator.to_rgbs.1.bias", "generator.generator.to_rgbs.1.upsample.kernel", "generator.generator.to_rgbs.1.conv.weight", "generator.generator.to_rgbs.1.conv.modulation.weight", "generator.generator.to_rgbs.1.conv.modulation.bias", "generator.generator.to_rgbs.2.bias", "generator.generator.to_rgbs.2.upsample.kernel", "generator.generator.to_rgbs.2.conv.weight", 
"generator.generator.to_rgbs.2.conv.modulation.weight", "generator.generator.to_rgbs.2.conv.modulation.bias", "generator.generator.to_rgbs.3.bias", "generator.generator.to_rgbs.3.upsample.kernel", "generator.generator.to_rgbs.3.conv.weight", "generator.generator.to_rgbs.3.conv.modulation.weight", "generator.generator.to_rgbs.3.conv.modulation.bias", "generator.generator.to_rgbs.4.bias", "generator.generator.to_rgbs.4.upsample.kernel", "generator.generator.to_rgbs.4.conv.weight", "generator.generator.to_rgbs.4.conv.modulation.weight", "generator.generator.to_rgbs.4.conv.modulation.bias", "generator.generator.to_rgbs.5.bias", "generator.generator.to_rgbs.5.upsample.kernel", "generator.generator.to_rgbs.5.conv.weight", "generator.generator.to_rgbs.5.conv.modulation.weight", "generator.generator.to_rgbs.5.conv.modulation.bias", "generator.generator.to_rgbs.6.bias", "generator.generator.to_rgbs.6.upsample.kernel", "generator.generator.to_rgbs.6.conv.weight", "generator.generator.to_rgbs.6.conv.modulation.weight", "generator.generator.to_rgbs.6.conv.modulation.bias", "generator.generator.to_rgbs.7.bias", "generator.generator.to_rgbs.7.upsample.kernel", "generator.generator.to_rgbs.7.conv.weight", "generator.generator.to_rgbs.7.conv.modulation.weight", "generator.generator.to_rgbs.7.conv.modulation.bias", "generator.generator.noises.noise_0", "generator.generator.noises.noise_1", "generator.generator.noises.noise_2", "generator.generator.noises.noise_3", "generator.generator.noises.noise_4", "generator.generator.noises.noise_5", "generator.generator.noises.noise_6", "generator.generator.noises.noise_7", "generator.generator.noises.noise_8", "generator.generator.noises.noise_9", "generator.generator.noises.noise_10", "generator.generator.noises.noise_11", "generator.generator.noises.noise_12", "generator.generator.noises.noise_13", "generator.generator.noises.noise_14", "generator.generator.noises.noise_15", "generator.generator.noises.noise_16", 
"generator.res.0.conv.0.weight", "generator.res.0.conv.1.bias", "generator.res.0.conv2.0.weight", "generator.res.0.conv2.1.bias", "generator.res.0.norm.style.weight", "generator.res.0.norm.style.bias", "generator.res.0.norm2.style.weight", "generator.res.0.norm2.style.bias", "generator.res.1.conv.0.weight", "generator.res.1.conv.1.bias", "generator.res.1.conv2.0.weight", "generator.res.1.conv2.1.bias", "generator.res.1.norm.style.weight", "generator.res.1.norm.style.bias", "generator.res.1.norm2.style.weight", "generator.res.1.norm2.style.bias", "generator.res.2.conv.0.weight", "generator.res.2.conv.1.bias", "generator.res.2.conv2.0.weight", "generator.res.2.conv2.1.bias", "generator.res.2.norm.style.weight", "generator.res.2.norm.style.bias", "generator.res.2.norm2.style.weight", "generator.res.2.norm2.style.bias", "generator.res.3.conv.0.weight", "generator.res.3.conv.1.bias", "generator.res.3.conv2.0.weight", "generator.res.3.conv2.1.bias", "generator.res.3.norm.style.weight", "generator.res.3.norm.style.bias", "generator.res.3.norm2.style.weight", "generator.res.3.norm2.style.bias", "generator.res.4.conv.0.weight", "generator.res.4.conv.1.bias", "generator.res.4.conv2.0.weight", "generator.res.4.conv2.1.bias", "generator.res.4.norm.style.weight", "generator.res.4.norm.style.bias", "generator.res.4.norm2.style.weight", "generator.res.4.norm2.style.bias", "generator.res.5.conv.0.weight", "generator.res.5.conv.1.bias", "generator.res.5.conv2.0.weight", "generator.res.5.conv2.1.bias", "generator.res.5.norm.style.weight", "generator.res.5.norm.style.bias", "generator.res.5.norm2.style.weight", "generator.res.5.norm2.style.bias", "generator.res.6.conv.0.weight", "generator.res.6.conv.1.bias", "generator.res.6.conv2.0.weight", "generator.res.6.conv2.1.bias", "generator.res.6.norm.style.weight", "generator.res.6.norm.style.bias", "generator.res.6.norm2.style.weight", "generator.res.6.norm2.style.bias", "generator.res.7.weight", "generator.res.7.bias", 
"generator.res.8.weight", "generator.res.8.bias", "generator.res.9.weight", "generator.res.9.bias", "generator.res.10.weight", "generator.res.10.bias", "generator.res.11.weight", "generator.res.11.bias", "generator.res.12.weight", "generator.res.12.bias", "generator.res.13.weight", "generator.res.13.bias", "generator.res.14.weight", "generator.res.14.bias", "generator.res.15.weight", "generator.res.15.bias", "generator.res.16.weight", "generator.res.16.bias", "generator.res.17.weight", "generator.res.17.bias", "fusion_out.0.conv.weight", "fusion_out.0.conv.bias", "fusion_out.0.norm.style.weight", "fusion_out.0.norm.style.bias", "fusion_out.0.conv2.weight", "fusion_out.0.conv2.bias", "fusion_out.0.linear.0.weight", "fusion_out.0.linear.0.bias", "fusion_out.0.linear.2.weight", "fusion_out.0.linear.2.bias", "fusion_out.1.conv.weight", "fusion_out.1.conv.bias", "fusion_out.1.norm.style.weight", "fusion_out.1.norm.style.bias", "fusion_out.1.conv2.weight", "fusion_out.1.conv2.bias", "fusion_out.1.linear.0.weight", "fusion_out.1.linear.0.bias", "fusion_out.1.linear.2.weight", "fusion_out.1.linear.2.bias", "fusion_out.2.conv.weight", "fusion_out.2.conv.bias", "fusion_out.2.norm.style.weight", "fusion_out.2.norm.style.bias", "fusion_out.2.conv2.weight", "fusion_out.2.conv2.bias", "fusion_out.2.linear.0.weight", "fusion_out.2.linear.0.bias", "fusion_out.2.linear.2.weight", "fusion_out.2.linear.2.bias", "fusion_out.3.conv.weight", "fusion_out.3.conv.bias", "fusion_out.3.norm.style.weight", "fusion_out.3.norm.style.bias", "fusion_out.3.conv2.weight", "fusion_out.3.conv2.bias", "fusion_out.3.linear.0.weight", "fusion_out.3.linear.0.bias", "fusion_out.3.linear.2.weight", "fusion_out.3.linear.2.bias", "res.0.conv.0.weight", "res.0.conv.1.bias", "res.0.conv2.0.weight", "res.0.conv2.1.bias", "res.0.norm.style.weight", "res.0.norm.style.bias", "res.0.norm2.style.weight", "res.0.norm2.style.bias", "res.1.conv.0.weight", "res.1.conv.1.bias", "res.1.conv2.0.weight", 
"res.1.conv2.1.bias", "res.1.norm.style.weight", "res.1.norm.style.bias", "res.1.norm2.style.weight", "res.1.norm2.style.bias", "res.2.conv.0.weight", "res.2.conv.1.bias", "res.2.conv2.0.weight", "res.2.conv2.1.bias", "res.2.norm.style.weight", "res.2.norm.style.bias", "res.2.norm2.style.weight", "res.2.norm2.style.bias", "res.3.conv.0.weight", "res.3.conv.1.bias", "res.3.conv2.0.weight", "res.3.conv2.1.bias", "res.3.norm.style.weight", "res.3.norm.style.bias", "res.3.norm2.style.weight", "res.3.norm2.style.bias", "res.4.conv.0.weight", "res.4.conv.1.bias", "res.4.conv2.0.weight", "res.4.conv2.1.bias", "res.4.norm.style.weight", "res.4.norm.style.bias", "res.4.norm2.style.weight", "res.4.norm2.style.bias", "res.5.conv.0.weight", "res.5.conv.1.bias", "res.5.conv2.0.weight", "res.5.conv2.1.bias", "res.5.norm.style.weight", "res.5.norm.style.bias", "res.5.norm2.style.weight", "res.5.norm2.style.bias", "res.6.conv.0.weight", "res.6.conv.1.bias", "res.6.conv2.0.weight", "res.6.conv2.1.bias", "res.6.norm.style.weight", "res.6.norm.style.bias", "res.6.norm2.style.weight", "res.6.norm2.style.bias".
Unexpected key(s) in state_dict: "generator.input.input", "generator.conv1.conv.weight", "generator.conv1.conv.modulation.weight", "generator.conv1.conv.modulation.bias", "generator.conv1.noise.weight", "generator.conv1.activate.bias", "generator.to_rgb1.bias", "generator.to_rgb1.conv.weight", "generator.to_rgb1.conv.modulation.weight", "generator.to_rgb1.conv.modulation.bias", "generator.convs.0.conv.weight", "generator.convs.0.conv.blur.kernel", "generator.convs.0.conv.modulation.weight", "generator.convs.0.conv.modulation.bias", "generator.convs.0.noise.weight", "generator.convs.0.activate.bias", "generator.convs.1.conv.weight", "generator.convs.1.conv.modulation.weight", "generator.convs.1.conv.modulation.bias", "generator.convs.1.noise.weight", "generator.convs.1.activate.bias", "generator.convs.2.conv.weight", "generator.convs.2.conv.blur.kernel", "generator.convs.2.conv.modulation.weight", "generator.convs.2.conv.modulation.bias", "generator.convs.2.noise.weight", "generator.convs.2.activate.bias", "generator.convs.3.conv.weight", "generator.convs.3.conv.modulation.weight", "generator.convs.3.conv.modulation.bias", "generator.convs.3.noise.weight", "generator.convs.3.activate.bias", "generator.convs.4.conv.weight", "generator.convs.4.conv.blur.kernel", "generator.convs.4.conv.modulation.weight", "generator.convs.4.conv.modulation.bias", "generator.convs.4.noise.weight", "generator.convs.4.activate.bias", "generator.convs.5.conv.weight", "generator.convs.5.conv.modulation.weight", "generator.convs.5.conv.modulation.bias", "generator.convs.5.noise.weight", "generator.convs.5.activate.bias", "generator.convs.6.conv.weight", "generator.convs.6.conv.blur.kernel", "generator.convs.6.conv.modulation.weight", "generator.convs.6.conv.modulation.bias", "generator.convs.6.noise.weight", "generator.convs.6.activate.bias", "generator.convs.7.conv.weight", "generator.convs.7.conv.modulation.weight", "generator.convs.7.conv.modulation.bias", 
"generator.convs.7.noise.weight", "generator.convs.7.activate.bias", "generator.convs.8.conv.weight", "generator.convs.8.conv.blur.kernel", "generator.convs.8.conv.modulation.weight", "generator.convs.8.conv.modulation.bias", "generator.convs.8.noise.weight", "generator.convs.8.activate.bias", "generator.convs.9.conv.weight", "generator.convs.9.conv.modulation.weight", "generator.convs.9.conv.modulation.bias", "generator.convs.9.noise.weight", "generator.convs.9.activate.bias", "generator.convs.10.conv.weight", "generator.convs.10.conv.blur.kernel", "generator.convs.10.conv.modulation.weight", "generator.convs.10.conv.modulation.bias", "generator.convs.10.noise.weight", "generator.convs.10.activate.bias", "generator.convs.11.conv.weight", "generator.convs.11.conv.modulation.weight", "generator.convs.11.conv.modulation.bias", "generator.convs.11.noise.weight", "generator.convs.11.activate.bias", "generator.convs.12.conv.weight", "generator.convs.12.conv.blur.kernel", "generator.convs.12.conv.modulation.weight", "generator.convs.12.conv.modulation.bias", "generator.convs.12.noise.weight", "generator.convs.12.activate.bias", "generator.convs.13.conv.weight", "generator.convs.13.conv.modulation.weight", "generator.convs.13.conv.modulation.bias", "generator.convs.13.noise.weight", "generator.convs.13.activate.bias", "generator.convs.14.conv.weight", "generator.convs.14.conv.blur.kernel", "generator.convs.14.conv.modulation.weight", "generator.convs.14.conv.modulation.bias", "generator.convs.14.noise.weight", "generator.convs.14.activate.bias", "generator.convs.15.conv.weight", "generator.convs.15.conv.modulation.weight", "generator.convs.15.conv.modulation.bias", "generator.convs.15.noise.weight", "generator.convs.15.activate.bias", "generator.to_rgbs.0.bias", "generator.to_rgbs.0.upsample.kernel", "generator.to_rgbs.0.conv.weight", "generator.to_rgbs.0.conv.modulation.weight", "generator.to_rgbs.0.conv.modulation.bias", "generator.to_rgbs.1.bias", 
"generator.to_rgbs.1.upsample.kernel", "generator.to_rgbs.1.conv.weight", "generator.to_rgbs.1.conv.modulation.weight", "generator.to_rgbs.1.conv.modulation.bias", "generator.to_rgbs.2.bias", "generator.to_rgbs.2.upsample.kernel", "generator.to_rgbs.2.conv.weight", "generator.to_rgbs.2.conv.modulation.weight", "generator.to_rgbs.2.conv.modulation.bias", "generator.to_rgbs.3.bias", "generator.to_rgbs.3.upsample.kernel", "generator.to_rgbs.3.conv.weight", "generator.to_rgbs.3.conv.modulation.weight", "generator.to_rgbs.3.conv.modulation.bias", "generator.to_rgbs.4.bias", "generator.to_rgbs.4.upsample.kernel", "generator.to_rgbs.4.conv.weight", "generator.to_rgbs.4.conv.modulation.weight", "generator.to_rgbs.4.conv.modulation.bias", "generator.to_rgbs.5.bias", "generator.to_rgbs.5.upsample.kernel", "generator.to_rgbs.5.conv.weight", "generator.to_rgbs.5.conv.modulation.weight", "generator.to_rgbs.5.conv.modulation.bias", "generator.to_rgbs.6.bias", "generator.to_rgbs.6.upsample.kernel", "generator.to_rgbs.6.conv.weight", "generator.to_rgbs.6.conv.modulation.weight", "generator.to_rgbs.6.conv.modulation.bias", "generator.to_rgbs.7.bias", "generator.to_rgbs.7.upsample.kernel", "generator.to_rgbs.7.conv.weight", "generator.to_rgbs.7.conv.modulation.weight", "generator.to_rgbs.7.conv.modulation.bias", "generator.noises.noise_0", "generator.noises.noise_1", "generator.noises.noise_2", "generator.noises.noise_3", "generator.noises.noise_4", "generator.noises.noise_5", "generator.noises.noise_6", "generator.noises.noise_7", "generator.noises.noise_8", "generator.noises.noise_9", "generator.noises.noise_10", "generator.noises.noise_11", "generator.noises.noise_12", "generator.noises.noise_13", "generator.noises.noise_14", "generator.noises.noise_15", "generator.noises.noise_16", "generator.style.3.weight", "generator.style.3.bias", "generator.style.4.weight", "generator.style.4.bias", "generator.style.5.weight", "generator.style.5.bias", "generator.style.6.weight", 
"generator.style.6.bias", "generator.style.7.weight", "generator.style.7.bias", "generator.style.8.weight", "generator.style.8.bias", "fusion_out.0.weight", "fusion_out.0.bias", "fusion_out.1.weight", "fusion_out.1.bias", "fusion_out.2.weight", "fusion_out.2.bias", "fusion_out.3.weight", "fusion_out.3.bias".
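The key pattern above (every missing key has an extra `generator.` level of nesting relative to the unexpected keys) suggests the checkpoint was saved from a differently nested module hierarchy than the model being loaded. A quick, generic way to see such a mismatch is to diff the key sets; a sketch with toy modules, not specific to any VToonify release:

```python
import torch
from torch import nn

def diff_keys(model, state_dict):
    # Compare the model's expected parameter names against the checkpoint's.
    model_keys = set(model.state_dict().keys())
    ckpt_keys = set(state_dict.keys())
    return sorted(model_keys - ckpt_keys), sorted(ckpt_keys - model_keys)

# Toy demonstration with two differently nested key layouts:
expected = nn.Sequential(nn.Linear(2, 2))            # keys like '0.weight'
saved = {'net.0.weight': torch.zeros(2, 2), 'net.0.bias': torch.zeros(2)}
missing, unexpected = diff_keys(expected, saved)
print(missing)     # ['0.bias', '0.weight']
print(unexpected)  # ['net.0.bias', 'net.0.weight']
```

If the diff shows a consistent prefix mismatch like this, the checkpoint likely belongs to a different backbone or repo version than the loading code.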
Good job! I see that another repo (https://github.com/williamyang1991/DualStyleGAN) has many other style models; can you integrate them into this repo? I downloaded the models, but I can't use them with VToonify's code.
ResolvePackageNotFound:
The three packages cannot be downloaded.
I would like to know what I should prepare for training. As far as I understand, it needs a DualStyleGAN model trained on the target-style images, but I am not sure if there is anything else I need.
Thank you!
Hello!
In DualStyleGAN, it is interesting to get diverse images with style modifications. If I want to pick one specific result in the grid, for example the image at the 3x3 position, and I know the weights of the 18 layers, for instance [0.6 0.6 0.6 0.6 0.6 0.6 0.6 0.6 0.6 0.6 0.6 1 1 1 1 1 1 1], is it possible to train VToonify to create such a model?
I briefly tried to fine-tune VToonify with different weights, such as 0.3 and 0.5, by changing the --weight parameter, but it seems it only learned the extrinsic style from the style image...
Hey! Loved the paper.
What is the ideal image input to get the best results?
I tried the same image at multiple resolutions (small, medium, and large), and the resolution seems to affect the output drastically. What is your recommendation for the input size?
Hello, sir, I met this issue:
FileNotFoundError: [Errno 2] No such file or directory: './checkpoint/vtoonify_d_cartoon/pretrain.pt'
I can't find pretrain.pt in the Google Drive:
|--vtoonify_t_cartoon
    |--pretrain.pt % * Pre-trained encoder for Cartoon style
|--vtoonify_d_cartoon
    |--pretrain.pt % * Pre-trained encoder for Cartoon style
Could you tell me where pretrain.pt is?
Thanks!
Hi, I am wondering how to turn off automatic cropping in video toonification. Which function and class does this?
I am processing some videos to be toonified using the Colab notebook (PART II - Style Transfer with specialized VToonify-D model). How can I modify the code in this section to remove the automatic cropping? I'd like to keep the output uncropped.
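In the notebook's inference loop, the crop comes from the `[top:bottom, left:right]` slice applied to each resized frame. A minimal sketch of keeping the full frame instead (variable names taken from the notebook snippet, frame size hypothetical):

```python
import numpy as np

frame = np.zeros((720, 1280, 3), dtype=np.uint8)  # hypothetical full frame
h, w = frame.shape[:2]

# Setting the crop bounds to the full frame makes the slice a no-op:
top, bottom, left, right = 0, h, 0, w
uncropped = frame[top:bottom, left:right]
print(uncropped.shape == frame.shape)  # True
```

Note this is only a sketch of disabling the slice; downstream layers may still require the frame dimensions to be multiples of a power of two, so full-frame input may need additional cropping or padding to a compatible size.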
Thank you for the amazing work you are doing here.
Has anyone tried the inverse, Toon -> Real? Curious to know how it would work!
Great work! I'd like to read the paper carefully, and when will you upload it?
Hi Everyone,
I am new to Jupyter notebooks. I tried installing it locally, but I can't install the dependencies properly. Any tutorial or help would be appreciated.
this is what I get
(base) PS C:\Users\R\Vtoonify> conda env create -f ./environment/vtoonify_env.yaml
Collecting package metadata (repodata.json): done
Solving environment: failed
ResolvePackageNotFound:
I am not able to find CUDA for Mac; am I missing something?
I installed torch-cpu 1.7.0 and followed the steps on this.
But I still got an error: "AssertionError: Torch not compiled with CUDA enabled".
What should I do with only a CPU?
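For CPU-only use, the usual PyTorch pattern is to remap checkpoint tensors to the CPU at load time and keep every module on `torch.device('cpu')`; a generic sketch (the repo may additionally need its CPU-only ops, e.g. the `op_cpu` variants, selected):

```python
import io
import torch

device = torch.device('cpu')

# map_location='cpu' remaps tensors that were saved on a CUDA device,
# so loading works on a machine without a GPU. Demonstrated here with an
# in-memory buffer standing in for a checkpoint file.
buf = io.BytesIO()
torch.save({'w': torch.ones(2, 2)}, buf)
buf.seek(0)
loaded = torch.load(buf, map_location='cpu')
print(loaded['w'].device)  # cpu
```

The same `map_location='cpu'` argument can be passed wherever the scripts call `torch.load` on the model checkpoints.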
The file 'model/stylegan/op_cpu/conv2d_gradfix.py' is missing "import contextlib" at the beginning, which is referenced later by "@contextlib.contextmanager".
RuntimeError: The size of tensor a (2664) must match the size of tensor b (2663) at non-singleton dimension 2. This error occurs every time I run the file, even if I change the input image. The sizes of f_G and f_E do not match, so f_G - f_E cannot be computed. What should I do? Please help me!
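Off-by-one mismatches like 2664 vs. 2663 typically arise when an odd input dimension is halved and re-upsampled along one branch but not the other. One common workaround (an assumption about the cause here, not an official fix) is to crop or pad the input so height and width are multiples of a power of two before inference:

```python
import numpy as np

def crop_to_multiple(img, m=8):
    # Trim height/width down to the nearest multiple of m so repeated
    # downsample/upsample stages reproduce the same spatial size.
    h, w = img.shape[:2]
    return img[: h - h % m, : w - w % m]

img = np.zeros((2663, 1500, 3), dtype=np.uint8)  # odd height like the error
out = crop_to_multiple(img)
print(out.shape[:2])  # (2656, 1496)
```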
I ran
python train_vtoonify_d.py --pretrain
and saved images of the variables real_skip, fake_skip, and img_gen, because I wanted to see the relation between them.
I found that the pretrained 'fake_skip' image is a color face-segmentation-style image at 32x32, consistent with img_gen,
but 'real_skip' is a completely gray 32x32 image.
From this line of code:
recon_loss = F.mse_loss(fake_feat, real_feat) + F.mse_loss(fake_skip, real_skip)
the optimization direction seems to be wrong. What is wrong with my operation?
My shell command is:
python train_vtoonify_d.py --iter 1 --exstyle_path DualStyleGAN/checkpoint/arcane/exstyle_code.npy --batch 1 --name GG --stylegan_path DualStyleGAN/checkpoint/arcane/generator.pt --pretrain
My saving code is:
def save_image(img, filename):
    tmp = ((img.detach().numpy().transpose(1, 2, 0) + 1.0) * 127.5).astype(np.uint8)
    cv2.imwrite(filename, cv2.cvtColor(tmp, cv2.COLOR_RGB2BGR))

save_image(img_gen[0].cpu(), 'real_input.jpg')
save_image(real_skip[0].cpu(), 'real_skip.jpg')
save_image(fake_skip[0].cpu(), 'fake_skip.jpg')
Has anyone tried or thought about the possibility to use VToonify with live image inputs from a webcam or virtual camera? Or utilizing it for livestreams?
Someone mentioned being able to use this on Linux - https://github.com/umlaeute/v4l2loopback
Cool work!! Are you planning to share the vtoonify_s_d_c.pt checkpoint for the caricature style?
Like this project: https://github.com/bryandlee/animegan2-pytorch
This is very wonderful work! Thank you so much for open-sourcing the code.
Currently I am reading your code. When I was reading train_vtoonify_t.py I got confused.
basemodel = Generator(1024, 512, 8, 2).to(device)  # G0
finetunemodel = Generator(1024, 512, 8, 2).to(device)
basemodel.load_state_dict(torch.load(args.stylegan_path, map_location=lambda storage, loc: storage)['g_ema'])
finetunemodel.load_state_dict(torch.load(args.finetunegan_path, map_location=lambda storage, loc: storage)['g_ema'])
fused_state_dict = blend_models(finetunemodel, basemodel, args.weight)  # G1
generator.generator.load_state_dict(fused_state_dict)  # load G1
g_ema.generator.load_state_dict(fused_state_dict)
requires_grad(basemodel, False)
requires_grad(generator.generator, False)
requires_grad(g_ema.generator, False)
There are only cartoon params_low in g_ema.generator but no cartoon params_high.
xs, _ = g_ema.generator([xl], input_is_latent=True)
xs = torch.clamp(xs, -1, 1).detach()  # y'
Therefore, it should be impossible to learn cartoon-style textures and colors using xs as supervision. Yet the inference results actually obtained do have cartoon textures and colors.
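As background for the question above: blending two StyleGAN state dicts is, at its core, a per-parameter linear interpolation. A simplified sketch of the idea (the repo's actual `blend_models` applies per-resolution weights, which are reduced here to a single scalar for illustration):

```python
import torch

def blend_state_dicts(sd_a, sd_b, w):
    # Per-parameter linear interpolation: w=1 keeps sd_a, w=0 keeps sd_b.
    return {k: w * sd_a[k] + (1 - w) * sd_b[k] for k in sd_a}

# Toy state dicts standing in for the fine-tuned and base generators:
sd_fine = {'conv.weight': torch.ones(2, 2)}
sd_base = {'conv.weight': torch.zeros(2, 2)}
blended = blend_state_dicts(sd_fine, sd_base, 0.75)
print(blended['conv.weight'][0, 0].item())  # 0.75
```

With per-layer weights, the low-resolution layers can be taken mostly from one model and the high-resolution layers from the other, which is what makes the question about params_low vs. params_high meaningful.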
Hello, first of all thank you for such wonderful work. I'd like to know if there is any way to manipulate face attributes such as smiling, sad, or angry faces by using pretrained directions. As far as I know, these boundaries are usually trained on pre-processed images from the FFHQ dataset, but in your pipeline the faces are not cropped and aligned the same way as FFHQ. So I'm wondering what I need to do to perform cartoonization as well as further facial-attribute manipulation.
Your guidance is highly appreciated.
Hello, sir, I am new here. I read the code and ran into a problem; I have thought it over many times but got nowhere.
g_ema is used to generate the image pairs, and it should be frozen:
Line 238 in 6154ac0
generator is used to generate the fake images, and it should not be frozen:
Line 297 in 6154ac0
Question 1:
In the end we want the weights of generator, so why does the code save g_ema's weights?
Line 387 in 6154ac0
Question 2:
What is the effect of the function "accumulate"? Does it change g_ema's weights, and why?
Thank you!
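For context on Question 2: in rosinality-style StyleGAN training code, `accumulate` maintains an exponential moving average (EMA) of the trained generator's weights inside g_ema. That is why g_ema's weights change during training and why g_ema (the smoothed, better-behaved model) is what gets saved. A sketch of that conventional implementation (assumed from the rosinality codebase convention; check against this repo's version):

```python
import torch
from torch import nn

def accumulate(model_ema, model, decay=0.999):
    # EMA update: ema = decay * ema + (1 - decay) * current weights.
    par_ema = dict(model_ema.named_parameters())
    par = dict(model.named_parameters())
    for k in par_ema:
        par_ema[k].data.mul_(decay).add_(par[k].data, alpha=1 - decay)

# Toy demonstration: with decay=0 the EMA model copies the live model exactly.
ema, live = nn.Linear(2, 2), nn.Linear(2, 2)
accumulate(ema, live, decay=0.0)
print(torch.allclose(ema.weight, live.weight))  # True
```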
Hey guys, good work!
I'm new to GANs and ML in general, though I have some experience with the Python ecosystem as a web dev.
This might be a stupid question, but what do you think running VToonify requires in terms of GPU specs? I've set up the conda env and VToonify in WSL on a Lenovo Legion laptop with an Nvidia GeForce 2060 6 GB GPU, but almost immediately ran into an out-of-GPU-memory error from the CUDA driver. I don't know whether it's a GPU memory leak due to errors in my env setup, or whether VToonify requires more computing power. Would appreciate your help!
Hi,
I get a segmentation fault and the code crashes when it attempts to load the VToonify model to the device (cuda).
Would appreciate any help.
Hello.
I set up the environment for this repository and tried running the transfer program.
But I got an error that refers to the documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF.
That seems to indicate a lack of GPU capacity, so I tried another PC with an RTX 2080, but I got the same error.
Even when I stopped all programs except style_transfer.py, I got the same error.
Do you have any ideas to solve this problem?
Thank you for your attention.
Could someone share the steps for installing on Windows?
How do I get my own style code?
Is that right?
Currently, there are only a few models available: Pixar, cartoon, etc.
But is it possible for us to upload our own style input image (a single image) and do a style transfer onto a video?
Is that possible with VToonify? If so, could someone make a Colab notebook where users can input a single stylized frame and transfer its style to a target video?
This would allow fully customizable style transfer, like EbSynth.
Hi.
I've been trying to install VToonify on Ubuntu-WSL. Some packages were offline or unavailable, and I had to install them manually.
But I think everything was fine.
However, when I try to run python style_transfer.py --scale_image, I get this error:
(vtoonify_env) mercantigo@DESKTOP-SC64BP9:~/VToonify$ python style_transfer.py --scale_image
Traceback (most recent call last):
  File "style_transfer.py", line 6, in <module>
    import dlib
  File "/home/mercantigo/anaconda3/envs/vtoonify_env/lib/python3.8/site-packages/dlib/__init__.py", line 19, in <module>
    from _dlib_pybind11 import *
ImportError: /home/mercantigo/anaconda3/envs/vtoonify_env/bin/../lib/libstdc++.so.6: version `GLIBCXX_3.4.29' not found (required by /home/mercantigo/anaconda3/envs/vtoonify_env/lib/python3.8/site-packages/_dlib_pybind11.cpython-38-x86_64-linux-gnu.so)
I already tried to reinstall libstdcxx-ng and libgcc, but no lucky at all.
How can I solve this?
Can we toonify more than just a portrait? What I mean is, I want the rest of the scene to stay visible as normal but the head to be toonified, rather than cropped.
Hi. I am working on the code in the Colab Notebook in the repo, on PART II - Style Transfer with specialized VToonify-D model.
I got through all the steps just fine, but at the Video Toonification stage I can run the 'Visualize and Rescale Input' part, while 'Perform Inference' fails. The code works well on the default input video, but when I use my own video it runs into problems.
Running this:
```python
with torch.no_grad():
    batch_frames = []
    print(num)
    for i in tqdm(range(num)):
        if i == 0:
            I = align_face(frame, landmarkpredictor)
            I = transform(I).unsqueeze(dim=0).to(device)
            s_w = pspencoder(I)
            s_w = vtoonify.zplus2wplus(s_w)
            s_w[:,:7] = exstyle[:,:7]
        else:
            success, frame = video_cap.read()
            frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        if scale <= 0.75:
            frame = cv2.sepFilter2D(frame, -1, kernel_1d, kernel_1d)
        if scale <= 0.375:
            frame = cv2.sepFilter2D(frame, -1, kernel_1d, kernel_1d)
        frame = cv2.resize(frame, (w, h))[top:bottom, left:right]
        batch_frames += [transform(frame).unsqueeze(dim=0).to(device)]
        if len(batch_frames) == batch_size or (i+1) == num:
            x = torch.cat(batch_frames, dim=0)
            batch_frames = []
            # parsing network works best on 512x512 images, so we predict parsing maps on upsampled frames
            # followed by downsampling the parsing maps
            x_p = F.interpolate(parsingpredictor(2*(F.interpolate(x, scale_factor=2, mode='bilinear', align_corners=False)))[0],
                                scale_factor=0.5, recompute_scale_factor=False).detach()
            # we give parsing maps lower weight (1/16)
            inputs = torch.cat((x, x_p/16.), dim=1)
            # d_s has no effect when backbone is toonify
            y_tilde = vtoonify(inputs, s_w.repeat(inputs.size(0), 1, 1), d_s = 0.5)
            y_tilde = torch.clamp(y_tilde, -1, 1)
            for k in range(y_tilde.size(0)):
                videoWriter.write(tensor2cv2(y_tilde[k].cpu()))
    videoWriter.release()
    video_cap.release()
```
Gives:
0it [00:00, ?it/s]
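A hedged guess, not confirmed by the repo: `0it [00:00, ?it/s]` means `num` is 0, i.e. OpenCV reported zero frames for the uploaded video, which usually means the file could not be opened or decoded. A minimal stdlib check (the path name below is hypothetical) can rule out a broken upload before blaming the model:

```python
import os

def diagnose_video_file(path):
    """Coarse reasons why cv2.VideoCapture might report 0 frames."""
    if not os.path.isfile(path):
        return 'missing'   # wrong path, or the upload never finished
    if os.path.getsize(path) == 0:
        return 'empty'     # zero-byte upload
    return 'exists'        # file is present; next suspect is an unsupported codec

# Hypothetical usage in the notebook, before building video_cap:
# print(diagnose_video_file('./data/my_video.mp4'))
```

If the file checks out, re-encoding the video to a widely supported codec (e.g. H.264 MP4 via ffmpeg) is a common next step.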
In Part 3 of the inference notebook, while loading
`vtoonify.load_state_dict(torch.load(os.path.join(MODEL_DIR, style_type+'_generator.pt'), map_location=lambda storage, loc: storage)['g_ema'])`
I encountered the following error:

```
EOFError                                  Traceback (most recent call last)
<ipython-input-12-40b58cac2d4b> in <module>
      5
      6 vtoonify = VToonify(backbone = 'dualstylegan')
----> 7 vtoonify.load_state_dict(torch.load(os.path.join(MODEL_DIR, style_type+'_generator.pt'), map_location=lambda storage, loc: storage)['g_ema'])
      8 vtoonify.to(device)
      9

/usr/local/lib/python3.8/dist-packages/torch/serialization.py in _legacy_load(f, map_location, pickle_module, **pickle_load_args)
   1000                     "functionality.")
   1001
-> 1002         magic_number = pickle_module.load(f, **pickle_load_args)
   1003         if magic_number != MAGIC_NUMBER:
   1004             raise RuntimeError("Invalid magic number; corrupt file?")

EOFError: Ran out of input
```

Could this be due to an empty file being provided?
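Most likely, yes: `EOFError: Ran out of input` from `torch.load` typically means the checkpoint file is empty or truncated, e.g. from an interrupted download or an HTML error page saved in place of the weights. A quick stdlib sanity check before loading (the threshold is a heuristic, not a repo constant):

```python
import os

def checkpoint_looks_ok(path, min_bytes=1024):
    """Heuristic: a usable .pt checkpoint exists and is far larger than 1 KiB."""
    return os.path.isfile(path) and os.path.getsize(path) >= min_bytes

# Hypothetical usage in the notebook:
# path = os.path.join(MODEL_DIR, style_type + '_generator.pt')
# if not checkpoint_looks_ok(path):
#     print('checkpoint missing or truncated -- re-download it before torch.load')
```

If the check fails, deleting the file and re-running the download cell is the usual remedy.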
What is it?
And what does it affect?
Hello!
I faced a problem when training VToonify. I finished training my own-style DualStyleGAN model (thanks to the author for the help!) and would like to make my own VToonify model.
I was able to pre-train the encoder, but when training VToonify-D, the error below occurred. It seems the problem is related to my GPU, but my GPU is working, and there was no problem in pre-training the encoder part.
Could you have a look at my code and point out what is wrong here?
```
(vtoonify_env) donghyun@kr-03:~/Desktop/training/VToonify$ python -m torch.distributed.launch --nproc_per_node=1 --master_port=8765 train_vtoonify_d.py --iter 2000 --stylegan_path ./checkpoint/mystyle/generator.pt --exstyle_path ./checkpoint/mystyle/refined_exstyle_code.npy --batch 4 --name vtoonify_d_mystyle --fix_color
Load options
adv_loss: 0.01
batch: 4
direction_path: ./checkpoint/directions.npy
encoder_path: ./checkpoint/vtoonify_d_mystyle/pretrain.pt
exstyle_path: ./checkpoint/mystyle/refined_exstyle_code.npy
faceparsing_path: ./checkpoint/faceparsing.pth
fix_color: True
fix_degree: False
fix_style: False
grec_loss: 0.1
iter: 2000
local_rank: 0
log_every: 200
lr: 0.0001
msk_loss: 0.0005
name: vtoonify_d_mystyle
perc_loss: 0.01
pretrain: False
save_begin: 30000
save_every: 30000
start_iter: 0
style_degree: 0.5
style_encoder_path: ./checkpoint/encoder.pt
style_id: 26
stylegan_path: ./checkpoint/mystyle/generator.pt
tmp_loss: 1.0
Setting up Perceptual loss...
Loading model from: /home/donghyun/Desktop/training/VToonify/model/stylegan/lpips/weights/v0.1/vgg.pth
...[net-lin [vgg]] initialized
...Done
Load models and data successfully loaded!
  0%|          | 0/2000 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "train_vtoonify_d.py", line 515, in <module>
    train(args, generator, discriminator, g_optim, d_optim, g_ema, percept, parsingpredictor, down, pspencoder, directions, styles, device)
  File "train_vtoonify_d.py", line 286, in train
    fake_pred = discriminator(F.adaptive_avg_pool2d(fake_output, 256), degree_label, style_ind)
  File "/home/donghyun/anaconda3/envs/vtoonify_env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/donghyun/Desktop/training/VToonify/model/vtoonify.py", line 84, in forward
    condition = torch.cat((self.label_mapper(degree_label), self.style_mapper(style_ind)), dim=1)
  File "/home/donghyun/anaconda3/envs/vtoonify_env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/donghyun/anaconda3/envs/vtoonify_env/lib/python3.8/site-packages/torch/nn/modules/sparse.py", line 145, in forward
    return F.embedding(
  File "/home/donghyun/anaconda3/envs/vtoonify_env/lib/python3.8/site-packages/torch/nn/functional.py", line 1913, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Input, output and indices must be on the current device
```
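Not a confirmed fix, but the traceback points at `F.embedding` receiving index tensors on a different device from the embedding weights. Assuming the variable names shown in the traceback, moving `degree_label` and `style_ind` onto the training device before the discriminator call should resolve it. The pattern as a small, purely illustrative helper:

```python
def move_to(device, *tensors):
    """Move each tensor-like object (anything with a .to method) onto `device`."""
    moved = tuple(t.to(device) if hasattr(t, "to") else t for t in tensors)
    return moved[0] if len(moved) == 1 else moved

# Hypothetical usage in train() before the discriminator call:
# degree_label, style_ind = move_to(device, degree_label, style_ind)
# fake_pred = discriminator(F.adaptive_avg_pool2d(fake_output, 256), degree_label, style_ind)
```

Equivalently, a direct `degree_label = degree_label.to(device)` / `style_ind = style_ind.to(device)` at the point where those tensors are built would do the same thing.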
```
conda env create -f ./environment/vtoonify_env.yaml
Collecting package metadata (repodata.json): done
Solving environment: failed

ResolvePackageNotFound:
  - python-lmdb==1.2.1=py38h2531618_1
  - scikit-image==0.18.1=py38ha9443f7_0
  - libfaiss==1.7.1=hb573701_0_cpu
  - libstdcxx-ng==9.3.0=h6de172a_19
  - pillow==8.3.1=py38h2c7a002_0
  - libedit==3.1.20191231=he28a2e2_2
  - pytorch==1.7.1=py3.8_cuda10.1.243_cudnn7.6.3_0
  - libgcc-ng==9.3.0=h2828fa1_19
  - ca-certificates==2022.2.1=h06a4308_0
  - python==3.8.3=cpython_he5300dc_0
  - certifi==2021.10.8=py38h06a4308_2
  - faiss==1.7.1=py38h7b17aaf_0_cpu
  - _libgcc_mutex==0.1=conda_forge
  - setuptools==49.6.0=py38h578d9bd_3
  - matplotlib-base==3.3.4=py38h62a2d02_0
  - libffi==3.2.1=he1b5a44_1007
```
Any info on that? The builds seem too specific.
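A common workaround (an assumption on my part, not from the repo docs): `ResolvePackageNotFound` with entries like these is usually caused by the platform-specific build strings pinned after the version. Stripping the trailing `=build` suffix lets conda re-solve with whatever builds are available for your platform. Sketch of the edit, assuming the `name==version=build` form shown in the error (adjust the regex if your yaml uses a single `=` between name and version):

```shell
# In-place edit would be:
#   sed -E -i 's/^([[:space:]]*- [^=]+==[^=]+)=.*/\1/' environment/vtoonify_env.yaml
# Demonstrated here on one sample line:
printf '%s\n' '  - python-lmdb==1.2.1=py38h2531618_1' |
  sed -E 's/^([[:space:]]*- [^=]+==[^=]+)=.*/\1/'
```

After stripping, re-run `conda env create -f ./environment/vtoonify_env.yaml`; the exact package versions may still need relaxing on newer platforms.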
Excuse me, for "Collection-Based Portrait Video Style Transfer", VToonify uses two different encoders. Is this a speed consideration or a performance advantage? I ask because the pyramid structure in pSp also produces multi-scale feature maps before map2style, which is similar to the downsampling in E.
I tested my images with the vtoonify_t_arcane checkpoint using the code below:

```
python style_transfer.py --content ./data/038648.jpg \
       --scale_image --backbone toonify \
       --ckpt ./checkpoint/vtoonify_t_arcane/vtoonify.pt \
       --padding 600 600 600 600
```

I found it sometimes works badly in the body area; the background style bleeds into the body area: