williamyang1991 / VToonify
[SIGGRAPH Asia 2022] VToonify: Controllable High-Resolution Portrait Video Style Transfer
License: Other
I trained a 256-resolution model and used the corresponding StyleGAN, pSp encoder, and DualStyleGAN; each of these models works fine on its own.
When training VToonify, I modified part of the code: the editing directions are based on the 1024 StyleGAN, so I commented out that latent transformation.
xc, _ = g_ema.stylegan()([wc], input_is_latent=True, truncation=0.5, truncation_latent=0)
This xc is used as the real output. Could the reason VToonify cannot handle closed-eye images be that the style images generated with DualStyleGAN never contain closed eyes?
Testing the pSp encoder alone, it reconstructs closed-eye images reasonably well, but DualStyleGAN's stylization of a closed-eye image cannot produce closed eyes.
We all know how important this is. Does it work with pets? 🐱
Awesome model!
I would like to know if I can change the size of the output image. The resolution of the generated image does not match that of the input image; it seems the result is simply resized. Can I adjust the settings to get more detailed toonified results?
It was a bit inconvenient to manually rename output files to avoid overwriting, so I made a rather provisional hack:
savename = os.path.join(args.output_path, basename + '_vtoonify_' + args.backbone[0] + '-' + args.ckpt[24:29] + '-' + str(args.style_id).zfill(3) + '-' + str(100*args.style_degree).zfill(3) + '.jpg')
I think it would be useful to add something like this ("args.ckpt[24:29]" needs to be replaced by something more robust) into your code.
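One more robust alternative to the fixed slice `args.ckpt[24:29]` is to derive the tag from the checkpoint file's basename, so it works regardless of where the checkpoint lives. A minimal sketch (the checkpoint path below is hypothetical):

```python
import os

def ckpt_tag(ckpt_path):
    # Derive a name tag from the checkpoint file itself instead of a
    # hard-coded string slice tied to one particular path layout.
    return os.path.splitext(os.path.basename(ckpt_path))[0]

# Example with a hypothetical checkpoint path:
tag = ckpt_tag('./checkpoint/vtoonify_d_cartoon/vtoonify_s026_d0.5.pt')
print(tag)  # vtoonify_s026_d0.5
```

This tag can then be dropped into the `savename` expression in place of `args.ckpt[24:29]`.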
Hello!
I am pretty much enjoying DualStyleGAN and VToonify, so I would like to know how I can control them.
In DualStyleGAN, I can adjust interp_weights to control styles (like the grid visualization in the DualStyleGAN inference playground notebook), but it seems that when training VToonify-D, the weight is fixed, so each fine-tuned VToonify model can generate only a single weight variation. Is that correct?
If I want to put different interp_weights in VToonify, is it possible?
Thank you!
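For reference, DualStyleGAN takes these per-layer weights at inference time; a sketch of the weight list following the convention in its inference notebook (the commented-out generator call is assumed from that repo and may differ in detail):

```python
# 18-entry per-layer interpolation weights, one per StyleGAN layer.
# In the DualStyleGAN convention, the first 7 entries control structure
# and the remaining 11 control color/texture.
interp_weights = [0.6] * 7 + [1.0] * 11

# Hypothetical call, following the DualStyleGAN inference notebook:
# img_gen, _ = generator([instyle], exstyle, use_res=True,
#                        z_plus_latent=True, interp_weights=interp_weights)
print(len(interp_weights))  # 18
```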
Hi, nice work, appreciate it! But two questions confuse me.
In train_vtoonify_d.py, for pre-training, why save the weights of g_ema at lines 172/387? It looks like g_ema.eval() keeps the weights unchanged, while g is the generator that should be trained.
After pre-training, in the full training process, it looks like the pre-trained model (vtoonify_d_cartoon/pretrain.pth) is never loaded.
Thanks!
I'm really impressed with your work.
May I ask if I can get the style datasets you used, such as cartoon, caricature, arcane, comic, and pixar?
Traceback (most recent call last):
  File "/VToonify/style_transfer.py", line 226, in <module>
    y_tilde = vtoonify(inputs, s_w.repeat(inputs.size(0), 1, 1), d_s = args.style_degree)
  File "/root/miniconda3/envs/python-app/lib/python3.9/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/VToonify/model/vtoonify.py", line 258, in forward
    out, m_E = self.fusion_out[fusion_index](out, f_E, d_s)
  File "/root/miniconda3/envs/python-app/lib/python3.9/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/VToonify/model/vtoonify.py", line 125, in forward
    out = torch.cat([f_G, abs(f_G-f_E)], dim=1)
RuntimeError: The size of tensor a (126) must match the size of tensor b (125) at non-singleton dimension 3
Hi, I ran the CPU version of the code and got the following message. Please help, thanks!
Traceback (most recent call last):
  File "/Users/chikiuso/Downloads/VToonify/style_transfer.py", line 63, in <module>
    vtoonify.load_state_dict(torch.load(args.ckpt, map_location=lambda storage, loc: storage)['g_ema'])
  File "/opt/homebrew/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1604, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for VToonify:
Missing key(s) in state_dict: "generator.generator.style.1.weight", "generator.generator.style.1.bias", "generator.generator.style.2.weight", "generator.generator.style.2.bias", "generator.generator.style.3.weight", "generator.generator.style.3.bias", "generator.generator.style.4.weight", "generator.generator.style.4.bias", "generator.generator.style.5.weight", "generator.generator.style.5.bias", "generator.generator.style.6.weight", "generator.generator.style.6.bias", "generator.generator.style.7.weight", "generator.generator.style.7.bias", "generator.generator.style.8.weight", "generator.generator.style.8.bias", "generator.generator.input.input", "generator.generator.conv1.conv.weight", "generator.generator.conv1.conv.modulation.weight", "generator.generator.conv1.conv.modulation.bias", "generator.generator.conv1.noise.weight", "generator.generator.conv1.activate.bias", "generator.generator.to_rgb1.bias", "generator.generator.to_rgb1.conv.weight", "generator.generator.to_rgb1.conv.modulation.weight", "generator.generator.to_rgb1.conv.modulation.bias", "generator.generator.convs.0.conv.weight", "generator.generator.convs.0.conv.blur.kernel", "generator.generator.convs.0.conv.modulation.weight", "generator.generator.convs.0.conv.modulation.bias", "generator.generator.convs.0.noise.weight", "generator.generator.convs.0.activate.bias", "generator.generator.convs.1.conv.weight", "generator.generator.convs.1.conv.modulation.weight", "generator.generator.convs.1.conv.modulation.bias", "generator.generator.convs.1.noise.weight", "generator.generator.convs.1.activate.bias", "generator.generator.convs.2.conv.weight", "generator.generator.convs.2.conv.blur.kernel", "generator.generator.convs.2.conv.modulation.weight", "generator.generator.convs.2.conv.modulation.bias", "generator.generator.convs.2.noise.weight", "generator.generator.convs.2.activate.bias", "generator.generator.convs.3.conv.weight", "generator.generator.convs.3.conv.modulation.weight", 
"generator.generator.convs.3.conv.modulation.bias", "generator.generator.convs.3.noise.weight", "generator.generator.convs.3.activate.bias", "generator.generator.convs.4.conv.weight", "generator.generator.convs.4.conv.blur.kernel", "generator.generator.convs.4.conv.modulation.weight", "generator.generator.convs.4.conv.modulation.bias", "generator.generator.convs.4.noise.weight", "generator.generator.convs.4.activate.bias", "generator.generator.convs.5.conv.weight", "generator.generator.convs.5.conv.modulation.weight", "generator.generator.convs.5.conv.modulation.bias", "generator.generator.convs.5.noise.weight", "generator.generator.convs.5.activate.bias", "generator.generator.convs.6.conv.weight", "generator.generator.convs.6.conv.blur.kernel", "generator.generator.convs.6.conv.modulation.weight", "generator.generator.convs.6.conv.modulation.bias", "generator.generator.convs.6.noise.weight", "generator.generator.convs.6.activate.bias", "generator.generator.convs.7.conv.weight", "generator.generator.convs.7.conv.modulation.weight", "generator.generator.convs.7.conv.modulation.bias", "generator.generator.convs.7.noise.weight", "generator.generator.convs.7.activate.bias", "generator.generator.convs.8.conv.weight", "generator.generator.convs.8.conv.blur.kernel", "generator.generator.convs.8.conv.modulation.weight", "generator.generator.convs.8.conv.modulation.bias", "generator.generator.convs.8.noise.weight", "generator.generator.convs.8.activate.bias", "generator.generator.convs.9.conv.weight", "generator.generator.convs.9.conv.modulation.weight", "generator.generator.convs.9.conv.modulation.bias", "generator.generator.convs.9.noise.weight", "generator.generator.convs.9.activate.bias", "generator.generator.convs.10.conv.weight", "generator.generator.convs.10.conv.blur.kernel", "generator.generator.convs.10.conv.modulation.weight", "generator.generator.convs.10.conv.modulation.bias", "generator.generator.convs.10.noise.weight", 
"generator.generator.convs.10.activate.bias", "generator.generator.convs.11.conv.weight", "generator.generator.convs.11.conv.modulation.weight", "generator.generator.convs.11.conv.modulation.bias", "generator.generator.convs.11.noise.weight", "generator.generator.convs.11.activate.bias", "generator.generator.convs.12.conv.weight", "generator.generator.convs.12.conv.blur.kernel", "generator.generator.convs.12.conv.modulation.weight", "generator.generator.convs.12.conv.modulation.bias", "generator.generator.convs.12.noise.weight", "generator.generator.convs.12.activate.bias", "generator.generator.convs.13.conv.weight", "generator.generator.convs.13.conv.modulation.weight", "generator.generator.convs.13.conv.modulation.bias", "generator.generator.convs.13.noise.weight", "generator.generator.convs.13.activate.bias", "generator.generator.convs.14.conv.weight", "generator.generator.convs.14.conv.blur.kernel", "generator.generator.convs.14.conv.modulation.weight", "generator.generator.convs.14.conv.modulation.bias", "generator.generator.convs.14.noise.weight", "generator.generator.convs.14.activate.bias", "generator.generator.convs.15.conv.weight", "generator.generator.convs.15.conv.modulation.weight", "generator.generator.convs.15.conv.modulation.bias", "generator.generator.convs.15.noise.weight", "generator.generator.convs.15.activate.bias", "generator.generator.to_rgbs.0.bias", "generator.generator.to_rgbs.0.upsample.kernel", "generator.generator.to_rgbs.0.conv.weight", "generator.generator.to_rgbs.0.conv.modulation.weight", "generator.generator.to_rgbs.0.conv.modulation.bias", "generator.generator.to_rgbs.1.bias", "generator.generator.to_rgbs.1.upsample.kernel", "generator.generator.to_rgbs.1.conv.weight", "generator.generator.to_rgbs.1.conv.modulation.weight", "generator.generator.to_rgbs.1.conv.modulation.bias", "generator.generator.to_rgbs.2.bias", "generator.generator.to_rgbs.2.upsample.kernel", "generator.generator.to_rgbs.2.conv.weight", 
"generator.generator.to_rgbs.2.conv.modulation.weight", "generator.generator.to_rgbs.2.conv.modulation.bias", "generator.generator.to_rgbs.3.bias", "generator.generator.to_rgbs.3.upsample.kernel", "generator.generator.to_rgbs.3.conv.weight", "generator.generator.to_rgbs.3.conv.modulation.weight", "generator.generator.to_rgbs.3.conv.modulation.bias", "generator.generator.to_rgbs.4.bias", "generator.generator.to_rgbs.4.upsample.kernel", "generator.generator.to_rgbs.4.conv.weight", "generator.generator.to_rgbs.4.conv.modulation.weight", "generator.generator.to_rgbs.4.conv.modulation.bias", "generator.generator.to_rgbs.5.bias", "generator.generator.to_rgbs.5.upsample.kernel", "generator.generator.to_rgbs.5.conv.weight", "generator.generator.to_rgbs.5.conv.modulation.weight", "generator.generator.to_rgbs.5.conv.modulation.bias", "generator.generator.to_rgbs.6.bias", "generator.generator.to_rgbs.6.upsample.kernel", "generator.generator.to_rgbs.6.conv.weight", "generator.generator.to_rgbs.6.conv.modulation.weight", "generator.generator.to_rgbs.6.conv.modulation.bias", "generator.generator.to_rgbs.7.bias", "generator.generator.to_rgbs.7.upsample.kernel", "generator.generator.to_rgbs.7.conv.weight", "generator.generator.to_rgbs.7.conv.modulation.weight", "generator.generator.to_rgbs.7.conv.modulation.bias", "generator.generator.noises.noise_0", "generator.generator.noises.noise_1", "generator.generator.noises.noise_2", "generator.generator.noises.noise_3", "generator.generator.noises.noise_4", "generator.generator.noises.noise_5", "generator.generator.noises.noise_6", "generator.generator.noises.noise_7", "generator.generator.noises.noise_8", "generator.generator.noises.noise_9", "generator.generator.noises.noise_10", "generator.generator.noises.noise_11", "generator.generator.noises.noise_12", "generator.generator.noises.noise_13", "generator.generator.noises.noise_14", "generator.generator.noises.noise_15", "generator.generator.noises.noise_16", 
"generator.res.0.conv.0.weight", "generator.res.0.conv.1.bias", "generator.res.0.conv2.0.weight", "generator.res.0.conv2.1.bias", "generator.res.0.norm.style.weight", "generator.res.0.norm.style.bias", "generator.res.0.norm2.style.weight", "generator.res.0.norm2.style.bias", "generator.res.1.conv.0.weight", "generator.res.1.conv.1.bias", "generator.res.1.conv2.0.weight", "generator.res.1.conv2.1.bias", "generator.res.1.norm.style.weight", "generator.res.1.norm.style.bias", "generator.res.1.norm2.style.weight", "generator.res.1.norm2.style.bias", "generator.res.2.conv.0.weight", "generator.res.2.conv.1.bias", "generator.res.2.conv2.0.weight", "generator.res.2.conv2.1.bias", "generator.res.2.norm.style.weight", "generator.res.2.norm.style.bias", "generator.res.2.norm2.style.weight", "generator.res.2.norm2.style.bias", "generator.res.3.conv.0.weight", "generator.res.3.conv.1.bias", "generator.res.3.conv2.0.weight", "generator.res.3.conv2.1.bias", "generator.res.3.norm.style.weight", "generator.res.3.norm.style.bias", "generator.res.3.norm2.style.weight", "generator.res.3.norm2.style.bias", "generator.res.4.conv.0.weight", "generator.res.4.conv.1.bias", "generator.res.4.conv2.0.weight", "generator.res.4.conv2.1.bias", "generator.res.4.norm.style.weight", "generator.res.4.norm.style.bias", "generator.res.4.norm2.style.weight", "generator.res.4.norm2.style.bias", "generator.res.5.conv.0.weight", "generator.res.5.conv.1.bias", "generator.res.5.conv2.0.weight", "generator.res.5.conv2.1.bias", "generator.res.5.norm.style.weight", "generator.res.5.norm.style.bias", "generator.res.5.norm2.style.weight", "generator.res.5.norm2.style.bias", "generator.res.6.conv.0.weight", "generator.res.6.conv.1.bias", "generator.res.6.conv2.0.weight", "generator.res.6.conv2.1.bias", "generator.res.6.norm.style.weight", "generator.res.6.norm.style.bias", "generator.res.6.norm2.style.weight", "generator.res.6.norm2.style.bias", "generator.res.7.weight", "generator.res.7.bias", 
"generator.res.8.weight", "generator.res.8.bias", "generator.res.9.weight", "generator.res.9.bias", "generator.res.10.weight", "generator.res.10.bias", "generator.res.11.weight", "generator.res.11.bias", "generator.res.12.weight", "generator.res.12.bias", "generator.res.13.weight", "generator.res.13.bias", "generator.res.14.weight", "generator.res.14.bias", "generator.res.15.weight", "generator.res.15.bias", "generator.res.16.weight", "generator.res.16.bias", "generator.res.17.weight", "generator.res.17.bias", "fusion_out.0.conv.weight", "fusion_out.0.conv.bias", "fusion_out.0.norm.style.weight", "fusion_out.0.norm.style.bias", "fusion_out.0.conv2.weight", "fusion_out.0.conv2.bias", "fusion_out.0.linear.0.weight", "fusion_out.0.linear.0.bias", "fusion_out.0.linear.2.weight", "fusion_out.0.linear.2.bias", "fusion_out.1.conv.weight", "fusion_out.1.conv.bias", "fusion_out.1.norm.style.weight", "fusion_out.1.norm.style.bias", "fusion_out.1.conv2.weight", "fusion_out.1.conv2.bias", "fusion_out.1.linear.0.weight", "fusion_out.1.linear.0.bias", "fusion_out.1.linear.2.weight", "fusion_out.1.linear.2.bias", "fusion_out.2.conv.weight", "fusion_out.2.conv.bias", "fusion_out.2.norm.style.weight", "fusion_out.2.norm.style.bias", "fusion_out.2.conv2.weight", "fusion_out.2.conv2.bias", "fusion_out.2.linear.0.weight", "fusion_out.2.linear.0.bias", "fusion_out.2.linear.2.weight", "fusion_out.2.linear.2.bias", "fusion_out.3.conv.weight", "fusion_out.3.conv.bias", "fusion_out.3.norm.style.weight", "fusion_out.3.norm.style.bias", "fusion_out.3.conv2.weight", "fusion_out.3.conv2.bias", "fusion_out.3.linear.0.weight", "fusion_out.3.linear.0.bias", "fusion_out.3.linear.2.weight", "fusion_out.3.linear.2.bias", "res.0.conv.0.weight", "res.0.conv.1.bias", "res.0.conv2.0.weight", "res.0.conv2.1.bias", "res.0.norm.style.weight", "res.0.norm.style.bias", "res.0.norm2.style.weight", "res.0.norm2.style.bias", "res.1.conv.0.weight", "res.1.conv.1.bias", "res.1.conv2.0.weight", 
"res.1.conv2.1.bias", "res.1.norm.style.weight", "res.1.norm.style.bias", "res.1.norm2.style.weight", "res.1.norm2.style.bias", "res.2.conv.0.weight", "res.2.conv.1.bias", "res.2.conv2.0.weight", "res.2.conv2.1.bias", "res.2.norm.style.weight", "res.2.norm.style.bias", "res.2.norm2.style.weight", "res.2.norm2.style.bias", "res.3.conv.0.weight", "res.3.conv.1.bias", "res.3.conv2.0.weight", "res.3.conv2.1.bias", "res.3.norm.style.weight", "res.3.norm.style.bias", "res.3.norm2.style.weight", "res.3.norm2.style.bias", "res.4.conv.0.weight", "res.4.conv.1.bias", "res.4.conv2.0.weight", "res.4.conv2.1.bias", "res.4.norm.style.weight", "res.4.norm.style.bias", "res.4.norm2.style.weight", "res.4.norm2.style.bias", "res.5.conv.0.weight", "res.5.conv.1.bias", "res.5.conv2.0.weight", "res.5.conv2.1.bias", "res.5.norm.style.weight", "res.5.norm.style.bias", "res.5.norm2.style.weight", "res.5.norm2.style.bias", "res.6.conv.0.weight", "res.6.conv.1.bias", "res.6.conv2.0.weight", "res.6.conv2.1.bias", "res.6.norm.style.weight", "res.6.norm.style.bias", "res.6.norm2.style.weight", "res.6.norm2.style.bias".
Unexpected key(s) in state_dict: "generator.input.input", "generator.conv1.conv.weight", "generator.conv1.conv.modulation.weight", "generator.conv1.conv.modulation.bias", "generator.conv1.noise.weight", "generator.conv1.activate.bias", "generator.to_rgb1.bias", "generator.to_rgb1.conv.weight", "generator.to_rgb1.conv.modulation.weight", "generator.to_rgb1.conv.modulation.bias", "generator.convs.0.conv.weight", "generator.convs.0.conv.blur.kernel", "generator.convs.0.conv.modulation.weight", "generator.convs.0.conv.modulation.bias", "generator.convs.0.noise.weight", "generator.convs.0.activate.bias", "generator.convs.1.conv.weight", "generator.convs.1.conv.modulation.weight", "generator.convs.1.conv.modulation.bias", "generator.convs.1.noise.weight", "generator.convs.1.activate.bias", "generator.convs.2.conv.weight", "generator.convs.2.conv.blur.kernel", "generator.convs.2.conv.modulation.weight", "generator.convs.2.conv.modulation.bias", "generator.convs.2.noise.weight", "generator.convs.2.activate.bias", "generator.convs.3.conv.weight", "generator.convs.3.conv.modulation.weight", "generator.convs.3.conv.modulation.bias", "generator.convs.3.noise.weight", "generator.convs.3.activate.bias", "generator.convs.4.conv.weight", "generator.convs.4.conv.blur.kernel", "generator.convs.4.conv.modulation.weight", "generator.convs.4.conv.modulation.bias", "generator.convs.4.noise.weight", "generator.convs.4.activate.bias", "generator.convs.5.conv.weight", "generator.convs.5.conv.modulation.weight", "generator.convs.5.conv.modulation.bias", "generator.convs.5.noise.weight", "generator.convs.5.activate.bias", "generator.convs.6.conv.weight", "generator.convs.6.conv.blur.kernel", "generator.convs.6.conv.modulation.weight", "generator.convs.6.conv.modulation.bias", "generator.convs.6.noise.weight", "generator.convs.6.activate.bias", "generator.convs.7.conv.weight", "generator.convs.7.conv.modulation.weight", "generator.convs.7.conv.modulation.bias", 
"generator.convs.7.noise.weight", "generator.convs.7.activate.bias", "generator.convs.8.conv.weight", "generator.convs.8.conv.blur.kernel", "generator.convs.8.conv.modulation.weight", "generator.convs.8.conv.modulation.bias", "generator.convs.8.noise.weight", "generator.convs.8.activate.bias", "generator.convs.9.conv.weight", "generator.convs.9.conv.modulation.weight", "generator.convs.9.conv.modulation.bias", "generator.convs.9.noise.weight", "generator.convs.9.activate.bias", "generator.convs.10.conv.weight", "generator.convs.10.conv.blur.kernel", "generator.convs.10.conv.modulation.weight", "generator.convs.10.conv.modulation.bias", "generator.convs.10.noise.weight", "generator.convs.10.activate.bias", "generator.convs.11.conv.weight", "generator.convs.11.conv.modulation.weight", "generator.convs.11.conv.modulation.bias", "generator.convs.11.noise.weight", "generator.convs.11.activate.bias", "generator.convs.12.conv.weight", "generator.convs.12.conv.blur.kernel", "generator.convs.12.conv.modulation.weight", "generator.convs.12.conv.modulation.bias", "generator.convs.12.noise.weight", "generator.convs.12.activate.bias", "generator.convs.13.conv.weight", "generator.convs.13.conv.modulation.weight", "generator.convs.13.conv.modulation.bias", "generator.convs.13.noise.weight", "generator.convs.13.activate.bias", "generator.convs.14.conv.weight", "generator.convs.14.conv.blur.kernel", "generator.convs.14.conv.modulation.weight", "generator.convs.14.conv.modulation.bias", "generator.convs.14.noise.weight", "generator.convs.14.activate.bias", "generator.convs.15.conv.weight", "generator.convs.15.conv.modulation.weight", "generator.convs.15.conv.modulation.bias", "generator.convs.15.noise.weight", "generator.convs.15.activate.bias", "generator.to_rgbs.0.bias", "generator.to_rgbs.0.upsample.kernel", "generator.to_rgbs.0.conv.weight", "generator.to_rgbs.0.conv.modulation.weight", "generator.to_rgbs.0.conv.modulation.bias", "generator.to_rgbs.1.bias", 
"generator.to_rgbs.1.upsample.kernel", "generator.to_rgbs.1.conv.weight", "generator.to_rgbs.1.conv.modulation.weight", "generator.to_rgbs.1.conv.modulation.bias", "generator.to_rgbs.2.bias", "generator.to_rgbs.2.upsample.kernel", "generator.to_rgbs.2.conv.weight", "generator.to_rgbs.2.conv.modulation.weight", "generator.to_rgbs.2.conv.modulation.bias", "generator.to_rgbs.3.bias", "generator.to_rgbs.3.upsample.kernel", "generator.to_rgbs.3.conv.weight", "generator.to_rgbs.3.conv.modulation.weight", "generator.to_rgbs.3.conv.modulation.bias", "generator.to_rgbs.4.bias", "generator.to_rgbs.4.upsample.kernel", "generator.to_rgbs.4.conv.weight", "generator.to_rgbs.4.conv.modulation.weight", "generator.to_rgbs.4.conv.modulation.bias", "generator.to_rgbs.5.bias", "generator.to_rgbs.5.upsample.kernel", "generator.to_rgbs.5.conv.weight", "generator.to_rgbs.5.conv.modulation.weight", "generator.to_rgbs.5.conv.modulation.bias", "generator.to_rgbs.6.bias", "generator.to_rgbs.6.upsample.kernel", "generator.to_rgbs.6.conv.weight", "generator.to_rgbs.6.conv.modulation.weight", "generator.to_rgbs.6.conv.modulation.bias", "generator.to_rgbs.7.bias", "generator.to_rgbs.7.upsample.kernel", "generator.to_rgbs.7.conv.weight", "generator.to_rgbs.7.conv.modulation.weight", "generator.to_rgbs.7.conv.modulation.bias", "generator.noises.noise_0", "generator.noises.noise_1", "generator.noises.noise_2", "generator.noises.noise_3", "generator.noises.noise_4", "generator.noises.noise_5", "generator.noises.noise_6", "generator.noises.noise_7", "generator.noises.noise_8", "generator.noises.noise_9", "generator.noises.noise_10", "generator.noises.noise_11", "generator.noises.noise_12", "generator.noises.noise_13", "generator.noises.noise_14", "generator.noises.noise_15", "generator.noises.noise_16", "generator.style.3.weight", "generator.style.3.bias", "generator.style.4.weight", "generator.style.4.bias", "generator.style.5.weight", "generator.style.5.bias", "generator.style.6.weight", 
"generator.style.6.bias", "generator.style.7.weight", "generator.style.7.bias", "generator.style.8.weight", "generator.style.8.bias", "fusion_out.0.weight", "fusion_out.0.bias", "fusion_out.1.weight", "fusion_out.1.bias", "fusion_out.2.weight", "fusion_out.2.bias", "fusion_out.3.weight", "fusion_out.3.bias".
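The key pattern above (every missing key has an extra `generator.` level of nesting relative to the unexpected keys) suggests the checkpoint was saved from a differently nested module hierarchy than the model being loaded. A quick, generic way to see such a mismatch is to diff the key sets; a sketch with toy modules, not specific to any VToonify release:

```python
import torch
from torch import nn

def diff_keys(model, state_dict):
    # Compare the model's expected parameter names against the checkpoint's.
    model_keys = set(model.state_dict().keys())
    ckpt_keys = set(state_dict.keys())
    return sorted(model_keys - ckpt_keys), sorted(ckpt_keys - model_keys)

# Toy demonstration with two differently nested key layouts:
expected = nn.Sequential(nn.Linear(2, 2))            # keys like '0.weight'
saved = {'net.0.weight': torch.zeros(2, 2), 'net.0.bias': torch.zeros(2)}
missing, unexpected = diff_keys(expected, saved)
print(missing)     # ['0.bias', '0.weight']
print(unexpected)  # ['net.0.bias', 'net.0.weight']
```

If the diff shows a consistent prefix mismatch like this, the checkpoint likely belongs to a different backbone or repo version than the loading code.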
Good job! I see that another repo (https://github.com/williamyang1991/DualStyleGAN) has many other style models; can you integrate them into this repo? I downloaded the models, but I can't use them with VToonify's code.
ResolvePackageNotFound:
The three packages cannot be downloaded.
I would like to know what I should prepare for training. As far as I understand, it needs a DualStyleGAN model trained on the target-style images, but I am not sure if there is anything else I need.
Thank you!
Hello!
In DualStyleGAN, it is interesting to get diverse images with style modifications. If I want to pick one specific result in the grid, for example the image at the 3x3 position, and I know the weights of the 18 layers, for instance [0.6 0.6 0.6 0.6 0.6 0.6 0.6 0.6 0.6 0.6 0.6 1 1 1 1 1 1 1], is it possible to train VToonify to create such a model?
I briefly tried to fine-tune VToonify with different weights, such as 0.3 and 0.5, by changing the --weight parameter, but it seems it only learned the extrinsic style from the style image...
Hey! Loved the paper.
What is the ideal image input to get the best results?
I tried the same image at multiple resolutions (small, medium, and large), and the resolution seems to affect the output drastically. What is your recommendation for the input size?
Hello, sir, I met this issue:
FileNotFoundError: [Errno 2] No such file or directory: './checkpoint/vtoonify_d_cartoon/pretrain.pt'
I can't find pretrain.pt in the Google Drive:
|--vtoonify_t_cartoon
    |--pretrain.pt % * Pre-trained encoder for Cartoon style
|--vtoonify_d_cartoon
    |--pretrain.pt % * Pre-trained encoder for Cartoon style
Could you tell me where pretrain.pt is?
Thanks!
Hi, I am wondering how to turn off automatic cropping in video toonification. Which function and class does this?
I am processing some videos to be toonified using the Colab notebook (PART II - Style Transfer with specialized VToonify-D model). How can I modify the code in this section to remove the automatic cropping? I'd like to keep the output uncropped.
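In the notebook's inference loop, the crop comes from the `[top:bottom, left:right]` slice applied to each resized frame. A minimal sketch of keeping the full frame instead (variable names taken from the notebook snippet, frame size hypothetical):

```python
import numpy as np

frame = np.zeros((720, 1280, 3), dtype=np.uint8)  # hypothetical full frame
h, w = frame.shape[:2]

# Setting the crop bounds to the full frame makes the slice a no-op:
top, bottom, left, right = 0, h, 0, w
uncropped = frame[top:bottom, left:right]
print(uncropped.shape == frame.shape)  # True
```

Note this is only a sketch of disabling the slice; downstream layers may still require the frame dimensions to be multiples of a power of two, so full-frame input may need additional cropping or padding to a compatible size.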
Thank you for the amazing work you are doing here.
Has anyone tried the inverse, Toon -> Real? Curious to know how it would work!
Great work! I'd like to read the paper carefully, and when will you upload it?
Hi Everyone,
I am new to Jupyter notebooks. I tried installing it locally, but I can't install the dependencies properly. Any tutorial or help would be appreciated.
this is what I get
(base) PS C:\Users\R\Vtoonify> conda env create -f ./environment/vtoonify_env.yaml
Collecting package metadata (repodata.json): done
Solving environment: failed
ResolvePackageNotFound:
I am not able to find CUDA for Mac; am I missing something?
I installed torch-cpu 1.7.0 and followed the steps on this.
But I still got an error: "AssertionError: Torch not compiled with CUDA enabled".
What should I do with only a CPU?
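For CPU-only use, the usual PyTorch pattern is to remap checkpoint tensors to the CPU at load time and keep every module on `torch.device('cpu')`; a generic sketch (the repo may additionally need its CPU-only ops, e.g. the `op_cpu` variants, selected):

```python
import io
import torch

device = torch.device('cpu')

# map_location='cpu' remaps tensors that were saved on a CUDA device,
# so loading works on a machine without a GPU. Demonstrated here with an
# in-memory buffer standing in for a checkpoint file.
buf = io.BytesIO()
torch.save({'w': torch.ones(2, 2)}, buf)
buf.seek(0)
loaded = torch.load(buf, map_location='cpu')
print(loaded['w'].device)  # cpu
```

The same `map_location='cpu'` argument can be passed wherever the scripts call `torch.load` on the model checkpoints.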
The file 'model/stylegan/op_cpu/conv2d_gradfix.py' is missing "import contextlib" at the beginning, which is referenced later by "@contextlib.contextmanager".
RuntimeError: The size of tensor a (2664) must match the size of tensor b (2663) at non-singleton dimension 2. This error occurs every time I run the file, even if I change the input image. The sizes of f_G and f_E do not match, so f_G - f_E cannot be computed. What should I do? Please help me!
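Off-by-one mismatches like 2664 vs. 2663 typically arise when an odd input dimension is halved and re-upsampled along one branch but not the other. One common workaround (an assumption about the cause here, not an official fix) is to crop or pad the input so height and width are multiples of a power of two before inference:

```python
import numpy as np

def crop_to_multiple(img, m=8):
    # Trim height/width down to the nearest multiple of m so repeated
    # downsample/upsample stages reproduce the same spatial size.
    h, w = img.shape[:2]
    return img[: h - h % m, : w - w % m]

img = np.zeros((2663, 1500, 3), dtype=np.uint8)  # odd height like the error
out = crop_to_multiple(img)
print(out.shape[:2])  # (2656, 1496)
```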
I ran
python train_vtoonify_d.py --pretrain
and saved images of the variables real_skip, fake_skip, and img_gen, because I wanted to see the relation between them.
I found that the pretrained 'fake_skip' image is a color face-segmentation-style image at 32x32, consistent with img_gen,
but 'real_skip' is a completely gray 32x32 image.
From this line of code:
recon_loss = F.mse_loss(fake_feat, real_feat) + F.mse_loss(fake_skip, real_skip)
the optimization direction seems to be wrong. What is wrong with my operation?
My shell command is:
python train_vtoonify_d.py --iter 1 --exstyle_path DualStyleGAN/checkpoint/arcane/exstyle_code.npy --batch 1 --name GG --stylegan_path DualStyleGAN/checkpoint/arcane/generator.pt --pretrain
My saving code is:
def save_image(img, filename):
    tmp = ((img.detach().numpy().transpose(1, 2, 0) + 1.0) * 127.5).astype(np.uint8)
    cv2.imwrite(filename, cv2.cvtColor(tmp, cv2.COLOR_RGB2BGR))

save_image(img_gen[0].cpu(), 'real_input.jpg')
save_image(real_skip[0].cpu(), 'real_skip.jpg')
save_image(fake_skip[0].cpu(), 'fake_skip.jpg')
Has anyone tried or thought about the possibility to use VToonify with live image inputs from a webcam or virtual camera? Or utilizing it for livestreams?
Someone mentioned being able to use this on Linux - https://github.com/umlaeute/v4l2loopback
Cool work!! Are you planning to share the vtoonify_s_d_c.pt checkpoint for the caricature style?
Like this project: https://github.com/bryandlee/animegan2-pytorch
This is very wonderful work! Thank you so much for open-sourcing the code.
Currently I am reading your code. When I was reading train_vtoonify_t.py I got confused.
basemodel = Generator(1024, 512, 8, 2).to(device)  # G0
finetunemodel = Generator(1024, 512, 8, 2).to(device)
basemodel.load_state_dict(torch.load(args.stylegan_path, map_location=lambda storage, loc: storage)['g_ema'])
finetunemodel.load_state_dict(torch.load(args.finetunegan_path, map_location=lambda storage, loc: storage)['g_ema'])
fused_state_dict = blend_models(finetunemodel, basemodel, args.weight)  # G1
generator.generator.load_state_dict(fused_state_dict)  # load G1
g_ema.generator.load_state_dict(fused_state_dict)
requires_grad(basemodel, False)
requires_grad(generator.generator, False)
requires_grad(g_ema.generator, False)
There are only cartoon params_low in g_ema.generator but no cartoon params_high.
xs, _ = g_ema.generator([xl], input_is_latent=True)
xs = torch.clamp(xs, -1, 1).detach()  # y'
Therefore, it should be impossible to learn cartoon-style textures and colors using xs as supervision. Yet the inference results actually obtained do have cartoon textures and colors.
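As background for the question above: blending two StyleGAN state dicts is, at its core, a per-parameter linear interpolation. A simplified sketch of the idea (the repo's actual `blend_models` applies per-resolution weights, which are reduced here to a single scalar for illustration):

```python
import torch

def blend_state_dicts(sd_a, sd_b, w):
    # Per-parameter linear interpolation: w=1 keeps sd_a, w=0 keeps sd_b.
    return {k: w * sd_a[k] + (1 - w) * sd_b[k] for k in sd_a}

# Toy state dicts standing in for the fine-tuned and base generators:
sd_fine = {'conv.weight': torch.ones(2, 2)}
sd_base = {'conv.weight': torch.zeros(2, 2)}
blended = blend_state_dicts(sd_fine, sd_base, 0.75)
print(blended['conv.weight'][0, 0].item())  # 0.75
```

With per-layer weights, the low-resolution layers can be taken mostly from one model and the high-resolution layers from the other, which is what makes the question about params_low vs. params_high meaningful.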
Hello, first of all thank you for such wonderful work. I'd like to know if there is any way to manipulate face attributes such as smiling, sad, or angry faces by using pretrained directions. As far as I know, these boundaries are usually trained on pre-processed images from the FFHQ dataset, but in your pipeline the faces are not cropped and aligned the same way as FFHQ. So I'm wondering what I need to do to perform cartoonization as well as further facial-attribute manipulation.
Your guidance is highly appreciated.
Hello, sir, I am new here. I read the code and ran into a problem; I have thought it over many times but got nowhere.
g_ema is used to generate the image pairs, and it should be frozen:
Line 238 in 6154ac0
generator is used to generate the fake images, and it should not be frozen:
Line 297 in 6154ac0
Question 1:
In the end we want the weights of generator, so why does the code save g_ema's weights?
Line 387 in 6154ac0
Question 2:
What is the effect of the function "accumulate"? Does it change g_ema's weights, and why?
Thank you!
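For context on Question 2: in rosinality-style StyleGAN training code, `accumulate` maintains an exponential moving average (EMA) of the trained generator's weights inside g_ema. That is why g_ema's weights change during training and why g_ema (the smoothed, better-behaved model) is what gets saved. A sketch of that conventional implementation (assumed from the rosinality codebase convention; check against this repo's version):

```python
import torch
from torch import nn

def accumulate(model_ema, model, decay=0.999):
    # EMA update: ema = decay * ema + (1 - decay) * current weights.
    par_ema = dict(model_ema.named_parameters())
    par = dict(model.named_parameters())
    for k in par_ema:
        par_ema[k].data.mul_(decay).add_(par[k].data, alpha=1 - decay)

# Toy demonstration: with decay=0 the EMA model copies the live model exactly.
ema, live = nn.Linear(2, 2), nn.Linear(2, 2)
accumulate(ema, live, decay=0.0)
print(torch.allclose(ema.weight, live.weight))  # True
```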
Hey guys, good work!
I'm new to GANs and ML in general, though I have some experience with the Python ecosystem as a web dev.
This might be a stupid question, but what do you think running VToonify requires in terms of GPU specs? I've set up the conda env and VToonify in WSL on a Lenovo Legion laptop with an Nvidia GeForce 2060 6 GB GPU, but almost immediately ran into an out-of-GPU-memory error from the CUDA driver. I don't know whether it's a GPU memory leak due to errors in my env setup, or whether VToonify requires more computing power. Would appreciate your help!
Hi,
I get a segmentation fault and the code crashes when it attempts to load the VToonify model to the device (cuda).
Would appreciate any help.
Hello.
I set up the environment for this repository and tried running the transfer program.
But I got an error that refers to the documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF.
That seems to indicate a lack of GPU capacity, so I tried another PC with an RTX 2080, but I got the same error.
Even when I stopped all programs except style_transfer.py, I got the same error.
Do you have any ideas to solve this problem?
Thank you for your attention.
Could someone share the steps for installing on Windows?
How do I get my own style code?
Is that right?
Currently, there are only a few models available: Pixar, cartoon, etc.
But is it possible for us to upload our own style input image (a single image) and do a style transfer onto a video?
Is that possible with VToonify? If so, could someone make a Colab notebook where users can input a single stylized frame and transfer its style to a target video?
This would allow fully customizable style transfer, like EbSynth.
Hi.
I've been trying to install VToonify on Ubuntu-WSL. Some packages were offline or unavailable, and I had to install them manually.
But I think everything was fine.
However, when I try to run python style_transfer.py --scale_image, I get this error:
(vtoonify_env) mercantigo@DESKTOP-SC64BP9:~/VToonify$ python style_transfer.py --scale_image
Traceback (most recent call last):
  File "style_transfer.py", line 6, in <module>
    import dlib
  File "/home/mercantigo/anaconda3/envs/vtoonify_env/lib/python3.8/site-packages/dlib/__init__.py", line 19, in <module>
    from _dlib_pybind11 import *
ImportError: /home/mercantigo/anaconda3/envs/vtoonify_env/bin/../lib/libstdc++.so.6: version `GLIBCXX_3.4.29' not found (required by /home/mercantigo/anaconda3/envs/vtoonify_env/lib/python3.8/site-packages/_dlib_pybind11.cpython-38-x86_64-linux-gnu.so)
I already tried to reinstall libstdcxx-ng and libgcc, but no lucky at all.
How can I solve this?
Can we toonify more than just a portrait? What I mean is, I want the rest of the scene to stay visible as normal but the head to be toonified, rather than cropped.
Hi. I am working on the code in the Colab Notebook in the repo, on PART II - Style Transfer with specialized VToonify-D model.
I got through all the steps just fine, but at the Video Toonification stage I can run the 'Visualize and Rescale Input' part, while 'Perform Inference' fails. The code works well on the default input video, but when I use my own video it runs into problems.
Running this:
```python
with torch.no_grad():
    batch_frames = []
    print(num)
    for i in tqdm(range(num)):
        if i == 0:
            I = align_face(frame, landmarkpredictor)
            I = transform(I).unsqueeze(dim=0).to(device)
            s_w = pspencoder(I)
            s_w = vtoonify.zplus2wplus(s_w)
            s_w[:,:7] = exstyle[:,:7]
        else:
            success, frame = video_cap.read()
            frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        if scale <= 0.75:
            frame = cv2.sepFilter2D(frame, -1, kernel_1d, kernel_1d)
        if scale <= 0.375:
            frame = cv2.sepFilter2D(frame, -1, kernel_1d, kernel_1d)
        frame = cv2.resize(frame, (w, h))[top:bottom, left:right]
        batch_frames += [transform(frame).unsqueeze(dim=0).to(device)]
        if len(batch_frames) == batch_size or (i+1) == num:
            x = torch.cat(batch_frames, dim=0)
            batch_frames = []
            # parsing network works best on 512x512 images, so we predict parsing maps on upsampled frames
            # followed by downsampling the parsing maps
            x_p = F.interpolate(parsingpredictor(2*(F.interpolate(x, scale_factor=2, mode='bilinear', align_corners=False)))[0],
                                scale_factor=0.5, recompute_scale_factor=False).detach()
            # we give parsing maps lower weight (1/16)
            inputs = torch.cat((x, x_p/16.), dim=1)
            # d_s has no effect when backbone is toonify
            y_tilde = vtoonify(inputs, s_w.repeat(inputs.size(0), 1, 1), d_s = 0.5)
            y_tilde = torch.clamp(y_tilde, -1, 1)
            for k in range(y_tilde.size(0)):
                videoWriter.write(tensor2cv2(y_tilde[k].cpu()))
    videoWriter.release()
    video_cap.release()
```
Gives:
0it [00:00, ?it/s]
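A hedged guess, not confirmed by the repo: `0it [00:00, ?it/s]` means `num` is 0, i.e. OpenCV reported zero frames for the uploaded video, which usually means the file could not be opened or decoded. A minimal stdlib check (the path name below is hypothetical) can rule out a broken upload before blaming the model:

```python
import os

def diagnose_video_file(path):
    """Coarse reasons why cv2.VideoCapture might report 0 frames."""
    if not os.path.isfile(path):
        return 'missing'   # wrong path, or the upload never finished
    if os.path.getsize(path) == 0:
        return 'empty'     # zero-byte upload
    return 'exists'        # file is present; next suspect is an unsupported codec

# Hypothetical usage in the notebook, before building video_cap:
# print(diagnose_video_file('./data/my_video.mp4'))
```

If the file checks out, re-encoding the video to a widely supported codec (e.g. H.264 MP4 via ffmpeg) is a common next step.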
In Part 3 of the inference notebook, while loading
`vtoonify.load_state_dict(torch.load(os.path.join(MODEL_DIR, style_type+'_generator.pt'), map_location=lambda storage, loc: storage)['g_ema'])`
I encountered the following error:

```
EOFError                                  Traceback (most recent call last)
<ipython-input-12-40b58cac2d4b> in <module>
      5
      6 vtoonify = VToonify(backbone = 'dualstylegan')
----> 7 vtoonify.load_state_dict(torch.load(os.path.join(MODEL_DIR, style_type+'_generator.pt'), map_location=lambda storage, loc: storage)['g_ema'])
      8 vtoonify.to(device)
      9

/usr/local/lib/python3.8/dist-packages/torch/serialization.py in _legacy_load(f, map_location, pickle_module, **pickle_load_args)
   1000                     "functionality.")
   1001
-> 1002         magic_number = pickle_module.load(f, **pickle_load_args)
   1003         if magic_number != MAGIC_NUMBER:
   1004             raise RuntimeError("Invalid magic number; corrupt file?")

EOFError: Ran out of input
```

Could this be due to an empty file being provided?
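Most likely, yes: `EOFError: Ran out of input` from `torch.load` typically means the checkpoint file is empty or truncated, e.g. from an interrupted download or an HTML error page saved in place of the weights. A quick stdlib sanity check before loading (the threshold is a heuristic, not a repo constant):

```python
import os

def checkpoint_looks_ok(path, min_bytes=1024):
    """Heuristic: a usable .pt checkpoint exists and is far larger than 1 KiB."""
    return os.path.isfile(path) and os.path.getsize(path) >= min_bytes

# Hypothetical usage in the notebook:
# path = os.path.join(MODEL_DIR, style_type + '_generator.pt')
# if not checkpoint_looks_ok(path):
#     print('checkpoint missing or truncated -- re-download it before torch.load')
```

If the check fails, deleting the file and re-running the download cell is the usual remedy.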
What is it?
And what does it affect?
Hello!
I faced a problem when training VToonify. I finished training my own-style DualStyleGAN model (thanks to the author for the help!) and would like to make my own VToonify model.
I was able to pre-train the encoder, but when training VToonify-D, the error below occurred. It seems the problem is related to my GPU, but my GPU is working, and there was no problem in pre-training the encoder part.
Could you have a look at my code and point out what is wrong here?
```
(vtoonify_env) donghyun@kr-03:~/Desktop/training/VToonify$ python -m torch.distributed.launch --nproc_per_node=1 --master_port=8765 train_vtoonify_d.py --iter 2000 --stylegan_path ./checkpoint/mystyle/generator.pt --exstyle_path ./checkpoint/mystyle/refined_exstyle_code.npy --batch 4 --name vtoonify_d_mystyle --fix_color
Load options
adv_loss: 0.01
batch: 4
direction_path: ./checkpoint/directions.npy
encoder_path: ./checkpoint/vtoonify_d_mystyle/pretrain.pt
exstyle_path: ./checkpoint/mystyle/refined_exstyle_code.npy
faceparsing_path: ./checkpoint/faceparsing.pth
fix_color: True
fix_degree: False
fix_style: False
grec_loss: 0.1
iter: 2000
local_rank: 0
log_every: 200
lr: 0.0001
msk_loss: 0.0005
name: vtoonify_d_mystyle
perc_loss: 0.01
pretrain: False
save_begin: 30000
save_every: 30000
start_iter: 0
style_degree: 0.5
style_encoder_path: ./checkpoint/encoder.pt
style_id: 26
stylegan_path: ./checkpoint/mystyle/generator.pt
tmp_loss: 1.0
Setting up Perceptual loss...
Loading model from: /home/donghyun/Desktop/training/VToonify/model/stylegan/lpips/weights/v0.1/vgg.pth
...[net-lin [vgg]] initialized
...Done
Load models and data successfully loaded!
  0%|          | 0/2000 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "train_vtoonify_d.py", line 515, in <module>
    train(args, generator, discriminator, g_optim, d_optim, g_ema, percept, parsingpredictor, down, pspencoder, directions, styles, device)
  File "train_vtoonify_d.py", line 286, in train
    fake_pred = discriminator(F.adaptive_avg_pool2d(fake_output, 256), degree_label, style_ind)
  File "/home/donghyun/anaconda3/envs/vtoonify_env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/donghyun/Desktop/training/VToonify/model/vtoonify.py", line 84, in forward
    condition = torch.cat((self.label_mapper(degree_label), self.style_mapper(style_ind)), dim=1)
  File "/home/donghyun/anaconda3/envs/vtoonify_env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/donghyun/anaconda3/envs/vtoonify_env/lib/python3.8/site-packages/torch/nn/modules/sparse.py", line 145, in forward
    return F.embedding(
  File "/home/donghyun/anaconda3/envs/vtoonify_env/lib/python3.8/site-packages/torch/nn/functional.py", line 1913, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Input, output and indices must be on the current device
```
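Not a confirmed fix, but the traceback points at `F.embedding` receiving index tensors on a different device from the embedding weights. Assuming the variable names shown in the traceback, moving `degree_label` and `style_ind` onto the training device before the discriminator call should resolve it. The pattern as a small, purely illustrative helper:

```python
def move_to(device, *tensors):
    """Move each tensor-like object (anything with a .to method) onto `device`."""
    moved = tuple(t.to(device) if hasattr(t, "to") else t for t in tensors)
    return moved[0] if len(moved) == 1 else moved

# Hypothetical usage in train() before the discriminator call:
# degree_label, style_ind = move_to(device, degree_label, style_ind)
# fake_pred = discriminator(F.adaptive_avg_pool2d(fake_output, 256), degree_label, style_ind)
```

Equivalently, a direct `degree_label = degree_label.to(device)` / `style_ind = style_ind.to(device)` at the point where those tensors are built would do the same thing.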
```
conda env create -f ./environment/vtoonify_env.yaml
Collecting package metadata (repodata.json): done
Solving environment: failed

ResolvePackageNotFound:
  - python-lmdb==1.2.1=py38h2531618_1
  - scikit-image==0.18.1=py38ha9443f7_0
  - libfaiss==1.7.1=hb573701_0_cpu
  - libstdcxx-ng==9.3.0=h6de172a_19
  - pillow==8.3.1=py38h2c7a002_0
  - libedit==3.1.20191231=he28a2e2_2
  - pytorch==1.7.1=py3.8_cuda10.1.243_cudnn7.6.3_0
  - libgcc-ng==9.3.0=h2828fa1_19
  - ca-certificates==2022.2.1=h06a4308_0
  - python==3.8.3=cpython_he5300dc_0
  - certifi==2021.10.8=py38h06a4308_2
  - faiss==1.7.1=py38h7b17aaf_0_cpu
  - _libgcc_mutex==0.1=conda_forge
  - setuptools==49.6.0=py38h578d9bd_3
  - matplotlib-base==3.3.4=py38h62a2d02_0
  - libffi==3.2.1=he1b5a44_1007
```
Any info on that? The builds seem too specific.
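A common workaround (an assumption on my part, not from the repo docs): `ResolvePackageNotFound` with entries like these is usually caused by the platform-specific build strings pinned after the version. Stripping the trailing `=build` suffix lets conda re-solve with whatever builds are available for your platform. Sketch of the edit, assuming the `name==version=build` form shown in the error (adjust the regex if your yaml uses a single `=` between name and version):

```shell
# In-place edit would be:
#   sed -E -i 's/^([[:space:]]*- [^=]+==[^=]+)=.*/\1/' environment/vtoonify_env.yaml
# Demonstrated here on one sample line:
printf '%s\n' '  - python-lmdb==1.2.1=py38h2531618_1' |
  sed -E 's/^([[:space:]]*- [^=]+==[^=]+)=.*/\1/'
```

After stripping, re-run `conda env create -f ./environment/vtoonify_env.yaml`; the exact package versions may still need relaxing on newer platforms.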
Excuse me, for "Collection-Based Portrait Video Style Transfer", VToonify uses two different encoders. Is this a speed consideration or a performance advantage? I ask because the pyramid structure in pSp also produces multi-scale feature maps before map2style, which is similar to the downsampling in E.
I tested my images with the vtoonify_t_arcane checkpoint using the code below:

```
python style_transfer.py --content ./data/038648.jpg \
       --scale_image --backbone toonify \
       --ckpt ./checkpoint/vtoonify_t_arcane/vtoonify.pt \
       --padding 600 600 600 600
```

I found it sometimes works badly in the body area; the background style bleeds into the body area: