harlanhong / iccv2023-mcnet

The official code of our ICCV2023 work: Implicit Identity Representation Conditioned Memory Compensation Network for Talking Head video Generation

Python 97.24% Dockerfile 0.31% C++ 0.37% Cuda 2.08%
animation image-generation motion-transfer talking-face-generation talking-head deepfake face gan reenactment

iccv2023-mcnet's Introduction

📖 Implicit Identity Representation Conditioned Memory Compensation Network for Talking Head Video Generation (ICCV 2023)

🔥 If MCNet is helpful for your photos/projects, please help ⭐ it or recommend it to your friends. Thanks! 🔥

[Paper]   [Project Page]   [Poster Video]

Fa-Ting Hong, Dan Xu
The Hong Kong University of Science and Technology

Interesting Sample

mcnet-.01_1.-.Compressed.with.FlexClip.mp4

🚩 Updates

  • 🔥🔥✅ July 20, 2023: Our new talking head work MCNet was accepted by ICCV 2023. The code will be released in ten days. You can take a look at our previous work DaGAN first.

🔧 Dependencies and Installation

Installation

We now provide a clean version of MCNet, which does not require customized CUDA extensions.

  1. Clone repo

    git clone https://github.com/harlanhong/ICCV2023-MCNET.git
    cd ICCV2023-MCNET
  2. Install dependent packages

    pip install -r requirements.txt
    
    ## Install the Face Alignment lib
    cd face-alignment
    pip install -r requirements.txt
    python setup.py install
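
Optionally, a quick sanity check (not part of the repo) to confirm the key dependencies import correctly and a GPU is visible before running the demo or training:

    # Optional sanity check (not from the repo): verify torch and the bundled
    # face-alignment package import, and that CUDA is visible.
    import torch
    import face_alignment  # provided by the bundled face-alignment folder

    print("torch:", torch.__version__)
    print("CUDA available:", torch.cuda.is_available())
    print("face_alignment imported OK")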

⚡ Quick Inference

We take the paper version as an example. More models can be found here.

YAML configs

See config/vox-256.yaml for a description of each parameter.

Pre-trained checkpoint

The pre-trained checkpoint of the face depth network and our MCNet checkpoints can be found at the following link: OneDrive.

Inference! To run a demo, download a checkpoint and run the following command:

CUDA_VISIBLE_DEVICES=0 python demo.py  --config config/vox-256.yaml --driving_video path/to/driving --source_image path/to/source --checkpoint path/to/checkpoint --relative --adapt_scale --kp_num 15 --generator Unet_Generator_keypoint_aware --result_video path/to/result --mbunit ExpendMemoryUnit --memsize 1 

The result will be stored in path/to/result. The driving video and source image should be cropped before they can be used in our method. To obtain semi-automatic crop suggestions you can run python crop-video.py --inp some_youtube_video.mp4; it will generate ffmpeg commands for the crops.
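
If you need to animate many source/driving pairs, a rough batch-inference sketch (not part of the repo; the pair list, checkpoint path, and output names below are placeholders) is to invoke demo.py once per pair:

    # Hypothetical batch driver: runs demo.py once per (source, driving) pair.
    # Adjust the pair list, checkpoint path, and output directory to your setup.
    import os
    import subprocess
    from pathlib import Path

    pairs = [
        ("assets/source_1.png", "assets/driving_1.mp4"),
        ("assets/source_2.png", "assets/driving_2.mp4"),
    ]
    checkpoint = "path/to/checkpoint"   # e.g. the OneDrive checkpoint file
    out_dir = Path("results")
    out_dir.mkdir(exist_ok=True)

    env = {**os.environ, "CUDA_VISIBLE_DEVICES": "0"}
    for i, (source, driving) in enumerate(pairs):
        subprocess.run([
            "python", "demo.py",
            "--config", "config/vox-256.yaml",
            "--driving_video", driving,
            "--source_image", source,
            "--checkpoint", checkpoint,
            "--relative", "--adapt_scale",
            "--kp_num", "15",
            "--generator", "Unet_Generator_keypoint_aware",
            "--mbunit", "ExpendMemoryUnit",
            "--memsize", "1",
            "--result_video", str(out_dir / f"result_{i}.mp4"),
        ], check=True, env=env)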

💻 Training

Datasets

  1. VoxCeleb. Please follow the instruction from https://github.com/AliaksandrSiarohin/video-preprocessing.

Train on VoxCeleb

To train a model on a specific dataset, run:

CUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch --master_addr="0.0.0.0" --master_port=12347 run.py --config config/vox-256.yaml --name MCNet --batchsize 8 --kp_num 15 --generator Unet_Generator_keypoint_aware --GFM GeneratorFullModel --memsize 1 --kp_distance 10 --feat_consistent 10 --generator_gan 0 --mbunit ExpendMemoryUnit

The code will create a folder in the log directory (each run creates a new name-specific directory), and checkpoints will be saved to this folder. To check the loss values during training, see log.txt. By default the batch size is tuned to run on 8 GeForce RTX 3090 GPUs (you can obtain the best performance after about 150 epochs). You can change the batch size in train_params in the .yaml file.
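
For example, a small helper (not from the repo) that writes a copy of the config with a smaller batch size, assuming the train_params / batch_size keys used in config/vox-256.yaml:

    # Hypothetical helper: copy vox-256.yaml with a reduced batch size for
    # fewer or smaller GPUs. Assumes the config has a train_params/batch_size entry.
    import yaml

    with open("config/vox-256.yaml") as f:
        cfg = yaml.safe_load(f)

    cfg["train_params"]["batch_size"] = 4   # pick what fits your GPU memory

    with open("config/vox-256-bs4.yaml", "w") as f:
        yaml.safe_dump(cfg, f)

    # Then train with: --config config/vox-256-bs4.yaml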

Also, you can watch the training loss by running the following command:

tensorboard --logdir log/MCNet/log

If you kill your process in the middle of training for some reason, a zombie process may remain; you can kill it using our provided tool:

python kill_port.py PORT

Training on your own dataset

  1. Resize all the videos to the same size, e.g. 256x256; the videos can be '.gif', '.mp4', or a folder of images. We recommend the latter: for each video, make a separate folder with all the frames in '.png' format. This format is lossless and has better I/O performance (see the preprocessing sketch after this list).

  2. Create a folder data/dataset_name with two subfolders, train and test; put the training videos in train and the testing videos in test.

  3. Create a config config/dataset_name.yaml; in dataset_params, specify the root directory as root_dir: data/dataset_name. Also adjust the number of epochs in train_params.
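
A rough preprocessing sketch (not part of the repo) that converts each video into a folder of 256x256 PNG frames under the layout described above; imageio with ffmpeg support and Pillow are assumed to be available, and the paths are placeholders:

    # Hypothetical preprocessing helper: turn one video into a folder of
    # 256x256 PNG frames under data/dataset_name/{train,test}/<video_name>/.
    import os
    import imageio
    from PIL import Image

    def video_to_frames(video_path, out_dir, size=(256, 256)):
        os.makedirs(out_dir, exist_ok=True)
        reader = imageio.get_reader(video_path)
        for i, frame in enumerate(reader):
            Image.fromarray(frame).resize(size).save(
                os.path.join(out_dir, f"{i:07d}.png"))
        reader.close()

    # Example (placeholder paths):
    # video_to_frames("raw/train/person1_clip3.mp4",
    #                 "data/dataset_name/train/person1_clip3")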

📜 Acknowledgement

Our MCNet implementation is inspired by FOMM. We appreciate the authors of FOMM for making their code publicly available.

📜 BibTeX

@inproceedings{hong23implicit,
  title={Implicit Identity Representation Conditioned Memory Compensation Network for Talking Head Video Generation},
  author={Hong, Fa-Ting and Xu, Dan},
  booktitle={ICCV},
  year={2023}
}

@inproceedings{hong2022depth,
  title={Depth-Aware Generative Adversarial Network for Talking Head Video Generation},
  author={Hong, Fa-Ting and Zhang, Longhao and Shen, Li and Xu, Dan},
  booktitle={IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2022}
}

@article{hong2023depth,
  title={DaGAN++: Depth-Aware Generative Adversarial Network for Talking Head Video Generation},
  author={Hong, Fa-Ting and Shen, Li and Xu, Dan},
  journal={arXiv preprint arXiv:2305.06225},
  year={2023}
}

📧 Contact

If you have any questions or collaboration needs (research or commercial), please email [email protected].

iccv2023-mcnet's People

Contributors

harlanhong

iccv2023-mcnet's Issues

make animation in a loop

I tried to run this line in a loop to automatically generate multiple videos with different source and driving data. I noticed that if I run make_animation twice in a loop on the same input data, the results are slightly different (on the order of 1e-5). Is it because the model needs to be reset for each run, or is it just floating-point error? If the model needs to be reset, how do I do that? Thanks!
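
For reference, a minimal sketch (not from the repo, and not an official answer) of one way to check whether the ~1e-5 gap comes from floating-point nondeterminism rather than leftover model state; the switches below are standard PyTorch flags, and make_animation is assumed to be called exactly as in demo.py:

    # Standard PyTorch determinism switches; set once before looping over
    # make_animation. If the repeated outputs then match bit-for-bit, the
    # earlier differences were nondeterministic kernels, not model state.
    import numpy as np
    import torch

    torch.manual_seed(0)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

    def max_abs_diff(frames_a, frames_b):
        """Largest per-pixel difference between two predicted frame lists."""
        return max(float(np.abs(np.asarray(a) - np.asarray(b)).max())
                   for a, b in zip(frames_a, frames_b))

    # frames_1 = make_animation(...)   # identical arguments both times, as in demo.py
    # frames_2 = make_animation(...)
    # print(max_abs_diff(frames_1, frames_2))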

After cropping, how can the driven video be put back into the original image?

Goal: drive the motion of the head in an image.
Preprocessing: the image was cropped for better results (keeping only the head region and discarding the rest).
The preprocessed image was then driven to obtain the driven video.
How can the driven video now be placed back into the original image?
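
For reference only (not part of the repo), a rough sketch of pasting a driven frame back into the uncropped original, assuming the crop box (x, y, w, h) used during preprocessing was saved:

    # Hypothetical paste-back helper: resize a driven (cropped) frame to the
    # original crop box and paste it onto the full image. Paths and box
    # coordinates are placeholders.
    from PIL import Image

    def paste_back(original_path, driven_frame_path, box, out_path):
        x, y, w, h = box                      # crop box from the preprocessing step
        original = Image.open(original_path).convert("RGB")
        driven = Image.open(driven_frame_path).convert("RGB").resize((w, h))
        original.paste(driven, (x, y))
        original.save(out_path)

    # paste_back("full_photo.jpg", "driven_0001.png", (120, 80, 256, 256), "merged_0001.png")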

512 size model

Hi, thank you for your research.
MCNet produces more reliable results compared with other face animation models. I like it.

Do you have any plan to share a 512-size model?
If not, do you have any guidance or advice for training a 512-size model (e.g. number of keypoints, training time or epochs, or anything regarding config.yaml)?

Best regards.

Understanding the code

    out = out * (1 - occlusion_last) + encode_i 
    out = self.final(out)
    out = torch.sigmoid(out)
    out = out * (1 - occlusion_last) + deformed_source * occlusion_last

Could you explain the meaning of these lines of code? My understanding is that out * (1 - occlusion_last) keeps the un-occluded part, and then the source regions that need to be compensated are added back?

Training Hyperparameters?

Could you confirm if the training hyperparameters in vox-256.yaml are configured correctly?

I'm particularly puzzled about the generator_gan loss weight being set to 0, as seen here: https://github.com/harlanhong/ICCV2023-MCNET/blob/master/config/vox-256.yaml#L74. If I understand correctly, doesn't this imply that the discriminator's weights remain unchanged throughout training? https://github.com/harlanhong/ICCV2023-MCNET/blob/master/train.py#L138

Can you please clarify?

SyntaxError: unexpected EOF while parsing

self.mbUnit = eval(kwargs['mbunit'])(kwargs['mb_spatial'],kwargs['mb_channel'])
File "", line 0

^

SyntaxError: unexpected EOF while parsing

I find that kwargs['mbunit'] is an empty string ("").

pretrained model

Great work!
But it seems the link below is the old model for DaGAN (2022) instead of the new pretrained model? Could you please check it?

A thought about the global facial meta-memory

Hello, I have been following your work for a long time; both DaGAN++ and MCNet are impressive. In MCNet you introduce a novel approach to facial feature compensation through a global facial meta-memory bank, and as described in the paper, its capability is learned from a large amount of data. In real scenarios, as I understand it, face-driving datasets come in two kinds: driving data such as talking-head datasets, which are dynamic videos, and purely static face datasets. The two serve different purposes: dynamic video data mainly teaches dynamic changes, while static datasets mainly teach facial features. Given that, I have a bold idea: for the global facial meta-memory you propose, would it be possible to extract facial feature compensation directly with something like Stable Diffusion or LoRA? This would have two advantages: first, the module could directly reuse models that have already been trained; second, the datasets behind models like SD are surely far larger than the set of face identities contained in dynamic video material. Would this be feasible?

Which CSV partitioned dataset is used for Same-id Reenactment?

In run.py and reconstruction.py, the FramesDataset used for reconstruction doesn't seem to use the CSV partitioning based on pair_list. Could you confirm whether pre-partitioned test data is being used in the reconstruction process? Also, could you specify which file from the data directory is used for this partitioning?

512p training error

@harlanhong, when I try to extend to 512p training, I get this issue after changing the image size in the config file (as you recommended here: #4 (comment)). Which other changes should I keep in mind for 512p training?

Traceback (most recent call last):
  File "/home/ubuntu/code/ICCV2023-MCNET/run.py", line 256, in <module>
    train.train(config, generator, discriminator, kp_detector, opt.checkpoint, log_dir, dataset, opt.local_rank,device,opt,writer)
  File "/home/ubuntu/code/ICCV2023-MCNET/train.py", line 119, in train
    losses_generator, generated = generator_full(x,weight,epoch=epoch) 
                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/miniconda3/lib/python3.11/site-packages/torch-2.0.1-py3.11-linux-x86_64.egg/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/code/ICCV2023-MCNET/modules/model.py", line 319, in forward
    generated = self.generator(x['source'], kp_source=kp_source, kp_driving=kp_driving, source_depth = depth_source, driving_depth = depth_driving,driving_image=x['driving'])
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/miniconda3/lib/python3.11/site-packages/torch-2.0.1-py3.11-linux-x86_64.egg/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/miniconda3/lib/python3.11/site-packages/torch-2.0.1-py3.11-linux-x86_64.egg/torch/nn/parallel/distributed.py", line 1156, in forward
    output = self._run_ddp_forward(*inputs, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/miniconda3/lib/python3.11/site-packages/torch-2.0.1-py3.11-linux-x86_64.egg/torch/nn/parallel/distributed.py", line 1110, in _run_ddp_forward
    return module_to_run(*inputs[0], **kwargs[0])  # type: ignore[index]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/miniconda3/lib/python3.11/site-packages/torch-2.0.1-py3.11-linux-x86_64.egg/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/code/ICCV2023-MCNET/modules/generator.py", line 497, in forward
    out = self.mbUnit(out,output_dict,keypoints = kp_source['value'])
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/miniconda3/lib/python3.11/site-packages/torch-2.0.1-py3.11-linux-x86_64.egg/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/code/ICCV2023-MCNET/modules/generator.py", line 309, in forward
    feat = eval('self.feat_forward_proj_{}'.format(w))(out_cs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/miniconda3/lib/python3.11/site-packages/torch-2.0.1-py3.11-linux-x86_64.egg/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/miniconda3/lib/python3.11/site-packages/torch-2.0.1-py3.11-linux-x86_64.egg/torch/nn/modules/conv.py", line 463, in forward
    return self._conv_forward(input, self.weight, self.bias)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/miniconda3/lib/python3.11/site-packages/torch-2.0.1-py3.11-linux-x86_64.egg/torch/nn/modules/conv.py", line 459, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Given groups=1, weight of size [512, 128, 1, 1], expected input[8, 256, 64, 64] to have 128 channels, but got 256 channels instead

Training on anime data

Hello, I reproduced MCNet on real faces and the results are very good. However, when I trained an anime-face model with an anime dataset of roughly the same size and the same hyperparameters, the results are not as good as on real faces. The main issue is that subtle facial expressions (opening the mouth or closing the eyes) are often inaccurate, and compared with the real-face results the anime results are especially blurry. Could you tell me what might cause this?

id5_3.mp4
id4_4.mp4

About the dataset code

[screenshot: WX20240201-182259]
I have a question about the code in the red box: source takes the first frame of a video and driving takes the second frame of the same video. Isn't the difference between these two frames very small?

process killed

Hi, when I run the command
' CUDA_VISIBLE_DEVICES=0 python demo.py --config config/vox-256.yaml --driving_video crop.mp4 --source_image test_images/a3.jpg --checkpoint 00000099-checkpoint.pth.tar --relative --adapt_scale --kp_num 15 --generator Unet_Generator_keypoint_aware --result_video result.mp4 --mbunit ExpendMemoryUnit --memsize 1 '
the process gets killed partway through without any error message. Could you please help me find out the reason?

'GeneratorFullModel' object has no attribute 'mb'

Hi, this is very nice work.
But when I train the model, I get an error: 'GeneratorFullModel' object has no attribute 'mb'. It occurs at line 163 of train.py, and I can't find any code that defines 'mb'.

Questions about pairing the source image with a driving frame of the same id

Thank you for your excellent work. I'm trying to reproduce your work, but I have some problems. You said in your paper: "The source image and the driving video share the same identity in the training stage, so the sampled driving frame can be used as the ground-truth of a generated source-identity image." But I noticed that the file "vox256.csv" in the code does not seem to align with this statement: the source image and the driving frame do not share the same id. So I'm a little bit confused about that; please tell me the correct training dataset configuration. Thank you!

Is reenactment training across different identities supported?

From the method in the paper, there is no hard constraint that the source and target must be the same person. However, the experiments section explicitly states that during training the source and target share the same id, which is why a perceptual loss can be used; and in the data-processing module FramesDataset in the source code, the source and target also share the same id.

I now have a task that requires fairly fine-grained reenactment between different ids, so I would like to ask whether cross-id reenactment can be trained on top of this project. I would greatly appreciate a reply from the authors!

License

Thanks for your work!
What's the license for this repo?
