harlanhong / iccv2023-mcnet

The official code of our ICCV2023 work: Implicit Identity Representation Conditioned Memory Compensation Network for Talking Head video Generation

Python 97.24% Dockerfile 0.31% C++ 0.37% Cuda 2.08%
animation image-generation motion-transfer talking-face-generation talking-head deepfake face gan reenactment

iccv2023-mcnet's Introduction

📖 Implicit Identity Representation Conditioned Memory Compensation Network for Talking Head Video Generation (ICCV 2023)

🔥 If MCNet is helpful for your photos/projects, please help ⭐ it or recommend it to your friends. Thanks! 🔥

[Paper]   [Project Page]   [Poster Video]

Fa-Ting Hong, Dan Xu
The Hong Kong University of Science and Technology

Interesting Sample

mcnet-.01_1.-.Compressed.with.FlexClip.mp4

🚩 Updates

  • 🔥🔥✅ July 20, 2023: Our new talking head work MCNet was accepted by ICCV 2023. The code will be released in ten days. You can take a look at our previous work DaGAN first.

🔧 Dependencies and Installation

Installation

We now provide a clean version of MCNet, which does not require customized CUDA extensions.

  1. Clone repo

    git clone https://github.com/harlanhong/ICCV2023-MCNET.git
    cd ICCV2023-MCNET
  2. Install dependent packages

    pip install -r requirements.txt
    
    ## Install the Face Alignment lib
    cd face-alignment
    pip install -r requirements.txt
    python setup.py install
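
Optionally, a quick sanity check (not part of the repo) to confirm the key dependencies import correctly and a GPU is visible before running the demo or training:

    # Optional sanity check (not from the repo): verify torch and the bundled
    # face-alignment package import, and that CUDA is visible.
    import torch
    import face_alignment  # provided by the bundled face-alignment folder

    print("torch:", torch.__version__)
    print("CUDA available:", torch.cuda.is_available())
    print("face_alignment imported OK")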

⚡ Quick Inference

We take the paper version as an example. More models can be found here.

YAML configs

See config/vox-256.yaml for a description of each parameter.

Pre-trained checkpoint

The pre-trained checkpoint of the face depth network and our MCNet checkpoints can be found at the following link: OneDrive.

Inference! To run a demo, download a checkpoint and run the following command:

CUDA_VISIBLE_DEVICES=0 python demo.py  --config config/vox-256.yaml --driving_video path/to/driving --source_image path/to/source --checkpoint path/to/checkpoint --relative --adapt_scale --kp_num 15 --generator Unet_Generator_keypoint_aware --result_video path/to/result --mbunit ExpendMemoryUnit --memsize 1 

The result will be stored in path/to/result. The driving video and source image should be cropped before they can be used in our method. To obtain semi-automatic crop suggestions you can run python crop-video.py --inp some_youtube_video.mp4; it will generate ffmpeg commands for the crops.
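
If you need to animate many source/driving pairs, a rough batch-inference sketch (not part of the repo; the pair list, checkpoint path, and output names below are placeholders) is to invoke demo.py once per pair:

    # Hypothetical batch driver: runs demo.py once per (source, driving) pair.
    # Adjust the pair list, checkpoint path, and output directory to your setup.
    import os
    import subprocess
    from pathlib import Path

    pairs = [
        ("assets/source_1.png", "assets/driving_1.mp4"),
        ("assets/source_2.png", "assets/driving_2.mp4"),
    ]
    checkpoint = "path/to/checkpoint"   # e.g. the OneDrive checkpoint file
    out_dir = Path("results")
    out_dir.mkdir(exist_ok=True)

    env = {**os.environ, "CUDA_VISIBLE_DEVICES": "0"}
    for i, (source, driving) in enumerate(pairs):
        subprocess.run([
            "python", "demo.py",
            "--config", "config/vox-256.yaml",
            "--driving_video", driving,
            "--source_image", source,
            "--checkpoint", checkpoint,
            "--relative", "--adapt_scale",
            "--kp_num", "15",
            "--generator", "Unet_Generator_keypoint_aware",
            "--mbunit", "ExpendMemoryUnit",
            "--memsize", "1",
            "--result_video", str(out_dir / f"result_{i}.mp4"),
        ], check=True, env=env)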

💻 Training

Datasets

  1. VoxCeleb. Please follow the instruction from https://github.com/AliaksandrSiarohin/video-preprocessing.

Train on VoxCeleb

To train a model on a specific dataset, run:

CUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch --master_addr="0.0.0.0" --master_port=12347 run.py --config config/vox-256.yaml --name MCNet --batchsize 8 --kp_num 15 --generator Unet_Generator_keypoint_aware --GFM GeneratorFullModel --memsize 1 --kp_distance 10 --feat_consistent 10 --generator_gan 0 --mbunit ExpendMemoryUnit

The code will create a folder in the log directory (each run creates a new name-specific directory), and checkpoints will be saved to this folder. To check the loss values during training, see log.txt. By default the batch size is tuned to run on 8 GeForce RTX 3090 GPUs (you can obtain the best performance after about 150 epochs). You can change the batch size in train_params in the .yaml file.
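
For example, a small helper (not from the repo) that writes a copy of the config with a smaller batch size, assuming the train_params / batch_size keys used in config/vox-256.yaml:

    # Hypothetical helper: copy vox-256.yaml with a reduced batch size for
    # fewer or smaller GPUs. Assumes the config has a train_params/batch_size entry.
    import yaml

    with open("config/vox-256.yaml") as f:
        cfg = yaml.safe_load(f)

    cfg["train_params"]["batch_size"] = 4   # pick what fits your GPU memory

    with open("config/vox-256-bs4.yaml", "w") as f:
        yaml.safe_dump(cfg, f)

    # Then train with: --config config/vox-256-bs4.yaml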

Also, you can watch the training loss by running the following command:

tensorboard --logdir log/MCNet/log

If you kill your process in the middle of training for some reason, a zombie process may remain; you can kill it using our provided tool:

python kill_port.py PORT

Training on your own dataset

  1. Resize all the videos to the same size, e.g. 256x256; the videos can be '.gif', '.mp4', or a folder of images. We recommend the latter: for each video, make a separate folder with all the frames in '.png' format. This format is lossless and has better I/O performance (see the preprocessing sketch after this list).

  2. Create a folder data/dataset_name with two subfolders, train and test; put the training videos in train and the testing videos in test.

  3. Create a config config/dataset_name.yaml; in dataset_params, specify the root directory as root_dir: data/dataset_name. Also adjust the number of epochs in train_params.
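
A rough preprocessing sketch (not part of the repo) that converts each video into a folder of 256x256 PNG frames under the layout described above; imageio with ffmpeg support and Pillow are assumed to be available, and the paths are placeholders:

    # Hypothetical preprocessing helper: turn one video into a folder of
    # 256x256 PNG frames under data/dataset_name/{train,test}/<video_name>/.
    import os
    import imageio
    from PIL import Image

    def video_to_frames(video_path, out_dir, size=(256, 256)):
        os.makedirs(out_dir, exist_ok=True)
        reader = imageio.get_reader(video_path)
        for i, frame in enumerate(reader):
            Image.fromarray(frame).resize(size).save(
                os.path.join(out_dir, f"{i:07d}.png"))
        reader.close()

    # Example (placeholder paths):
    # video_to_frames("raw/train/person1_clip3.mp4",
    #                 "data/dataset_name/train/person1_clip3")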

📜 Acknowledgement

Our MCNet implementation is inspired by FOMM. We appreciate the authors of FOMM for making their code publicly available.

📜 BibTeX

@inproceedings{hong23implicit,
  title={Implicit Identity Representation Conditioned Memory Compensation Network for Talking Head Video Generation},
  author={Hong, Fa-Ting and Xu, Dan},
  booktitle={ICCV},
  year={2023}
}

@inproceedings{hong2022depth,
  title={Depth-Aware Generative Adversarial Network for Talking Head Video Generation},
  author={Hong, Fa-Ting and Zhang, Longhao and Shen, Li and Xu, Dan},
  booktitle={IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2022}
}

@article{hong2023depth,
  title={DaGAN++: Depth-Aware Generative Adversarial Network for Talking Head Video Generation},
  author={Hong, Fa-Ting and Shen, Li and Xu, Dan},
  journal={arXiv preprint arXiv:2305.06225},
  year={2023}
}

📧 Contact

If you have any questions or collaboration needs (research or commercial), please email [email protected].

iccv2023-mcnet's People

Contributors

harlanhong

iccv2023-mcnet's Issues

make animation in a loop

I tried to run this line in a loop to automatically generate multiple videos with different source and driving data. I noticed that if I run make_animation twice in a loop on the same input data, the results are slightly different (on the order of 1e-5). Is it because the model needs to be reset for each run, or is it just floating-point error? If the model needs to be reset, how do I do that? Thanks!
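
For reference, a minimal sketch (not from the repo, and not an official answer) of one way to check whether the ~1e-5 gap comes from floating-point nondeterminism rather than leftover model state; the switches below are standard PyTorch flags, and make_animation is assumed to be called exactly as in demo.py:

    # Standard PyTorch determinism switches; set once before looping over
    # make_animation. If the repeated outputs then match bit-for-bit, the
    # earlier differences were nondeterministic kernels, not model state.
    import numpy as np
    import torch

    torch.manual_seed(0)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

    def max_abs_diff(frames_a, frames_b):
        """Largest per-pixel difference between two predicted frame lists."""
        return max(float(np.abs(np.asarray(a) - np.asarray(b)).max())
                   for a, b in zip(frames_a, frames_b))

    # frames_1 = make_animation(...)   # identical arguments both times, as in demo.py
    # frames_2 = make_animation(...)
    # print(max_abs_diff(frames_1, frames_2))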

After cropping, how can the driven video be put back into the original image?

Goal: drive the motion of the head in an image.
Preprocessing: the image was cropped for better results (keeping only the head region and discarding the rest).
The preprocessed image was then driven to obtain the driven video.
How can the driven video now be placed back into the original image?
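
For reference only (not part of the repo), a rough sketch of pasting a driven frame back into the uncropped original, assuming the crop box (x, y, w, h) used during preprocessing was saved:

    # Hypothetical paste-back helper: resize a driven (cropped) frame to the
    # original crop box and paste it onto the full image. Paths and box
    # coordinates are placeholders.
    from PIL import Image

    def paste_back(original_path, driven_frame_path, box, out_path):
        x, y, w, h = box                      # crop box from the preprocessing step
        original = Image.open(original_path).convert("RGB")
        driven = Image.open(driven_frame_path).convert("RGB").resize((w, h))
        original.paste(driven, (x, y))
        original.save(out_path)

    # paste_back("full_photo.jpg", "driven_0001.png", (120, 80, 256, 256), "merged_0001.png")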

512 size model

Hi, thank you for your research.
MCNet produces more reliable results compared with other face animation models. I like it.

Do you have any plan to share a 512-size model?
If not, do you have any guidance or advice for training a 512-size model (e.g. number of keypoints, training time or epochs, or anything regarding config.yaml)?

Best regards.

Understanding the code

    out = out * (1 - occlusion_last) + encode_i 
    out = self.final(out)
    out = torch.sigmoid(out)
    out = out * (1 - occlusion_last) + deformed_source * occlusion_last

Could you explain the meaning of these lines of code? My understanding is that out * (1 - occlusion_last) keeps the un-occluded part, and then the source regions that need to be compensated are added back?

Training Hyperparameters?

Could you confirm if the training hyperparameters in vox-256.yaml are configured correctly?

I'm particularly puzzled about the generator_gan loss weight being set to 0, as seen here: https://github.com/harlanhong/ICCV2023-MCNET/blob/master/config/vox-256.yaml#L74. If I understand correctly, doesn't this imply that the discriminator's weights remain unchanged throughout training? https://github.com/harlanhong/ICCV2023-MCNET/blob/master/train.py#L138

Can you please clarify?

SyntaxError: unexpected EOF while parsing

self.mbUnit = eval(kwargs['mbunit'])(kwargs['mb_spatial'],kwargs['mb_channel'])
File "", line 0

^

SyntaxError: unexpected EOF while parsing

I find that kwargs['mbunit'] is an empty string ("").

pretrained model

Great work!
But it seems the link below is the old model for DaGAN (2022) instead of the new pretrained model? Could you please check it?

A thought about the global facial meta-memory

Hello, I have been following your work for a long time; both DaGAN++ and MCNet are impressive. In MCNet you introduce a novel approach to facial feature compensation through a global facial meta-memory bank, and as described in the paper, its capability is learned from a large amount of data. In real scenarios, as I understand it, face-driving datasets come in two kinds: driving data such as talking-head datasets, which are dynamic videos, and purely static face datasets. The two serve different purposes: dynamic video data mainly teaches dynamic changes, while static datasets mainly teach facial features. Given that, I have a bold idea: for the global facial meta-memory you propose, would it be possible to extract facial feature compensation directly with something like Stable Diffusion or LoRA? This would have two advantages: first, the module could directly reuse models that have already been trained; second, the datasets behind models like SD are surely far larger than the set of face identities contained in dynamic video material. Would this be feasible?

Which CSV partitioned dataset is used for Same-id Reenactment?

In run.py and reconstruction.py, the FramesDataset used for reconstruction doesn't seem to use the CSV partitioning based on pair_list. Could you confirm whether pre-partitioned test data is being used in the reconstruction process? Also, could you specify which file from the data directory is used for this partitioning?

512p training error

@harlanhong, when I try to extend to 512p training, I get this issue after changing the image size in the config file (as you recommended here: #4 (comment)). Which other changes should I keep in mind for 512p training?

Traceback (most recent call last):
  File "/home/ubuntu/code/ICCV2023-MCNET/run.py", line 256, in <module>
    train.train(config, generator, discriminator, kp_detector, opt.checkpoint, log_dir, dataset, opt.local_rank,device,opt,writer)
  File "/home/ubuntu/code/ICCV2023-MCNET/train.py", line 119, in train
    losses_generator, generated = generator_full(x,weight,epoch=epoch) 
                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/miniconda3/lib/python3.11/site-packages/torch-2.0.1-py3.11-linux-x86_64.egg/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/code/ICCV2023-MCNET/modules/model.py", line 319, in forward
    generated = self.generator(x['source'], kp_source=kp_source, kp_driving=kp_driving, source_depth = depth_source, driving_depth = depth_driving,driving_image=x['driving'])
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/miniconda3/lib/python3.11/site-packages/torch-2.0.1-py3.11-linux-x86_64.egg/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/miniconda3/lib/python3.11/site-packages/torch-2.0.1-py3.11-linux-x86_64.egg/torch/nn/parallel/distributed.py", line 1156, in forward
    output = self._run_ddp_forward(*inputs, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/miniconda3/lib/python3.11/site-packages/torch-2.0.1-py3.11-linux-x86_64.egg/torch/nn/parallel/distributed.py", line 1110, in _run_ddp_forward
    return module_to_run(*inputs[0], **kwargs[0])  # type: ignore[index]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/miniconda3/lib/python3.11/site-packages/torch-2.0.1-py3.11-linux-x86_64.egg/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/code/ICCV2023-MCNET/modules/generator.py", line 497, in forward
    out = self.mbUnit(out,output_dict,keypoints = kp_source['value'])
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/miniconda3/lib/python3.11/site-packages/torch-2.0.1-py3.11-linux-x86_64.egg/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/code/ICCV2023-MCNET/modules/generator.py", line 309, in forward
    feat = eval('self.feat_forward_proj_{}'.format(w))(out_cs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/miniconda3/lib/python3.11/site-packages/torch-2.0.1-py3.11-linux-x86_64.egg/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/miniconda3/lib/python3.11/site-packages/torch-2.0.1-py3.11-linux-x86_64.egg/torch/nn/modules/conv.py", line 463, in forward
    return self._conv_forward(input, self.weight, self.bias)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/miniconda3/lib/python3.11/site-packages/torch-2.0.1-py3.11-linux-x86_64.egg/torch/nn/modules/conv.py", line 459, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Given groups=1, weight of size [512, 128, 1, 1], expected input[8, 256, 64, 64] to have 128 channels, but got 256 channels instead

Training on anime data

Hello, I reproduced MCNet on real faces and the results are very good. However, when I trained an anime-face model with an anime dataset of roughly the same size and the same hyperparameters, the results are not as good as on real faces. The main issue is that subtle facial expressions (opening the mouth or closing the eyes) are often inaccurate, and compared with the real-face results the anime results are especially blurry. Could you tell me what might cause this?

id5_3.mp4
id4_4.mp4

About the dataset code

[screenshot: WX20240201-182259]
I have a question about the code in the red box: source takes the first frame of a video and driving takes the second frame of the same video. Isn't the difference between these two frames very small?

process killed

Hi, when I run the command
' CUDA_VISIBLE_DEVICES=0 python demo.py --config config/vox-256.yaml --driving_video crop.mp4 --source_image test_images/a3.jpg --checkpoint 00000099-checkpoint.pth.tar --relative --adapt_scale --kp_num 15 --generator Unet_Generator_keypoint_aware --result_video result.mp4 --mbunit ExpendMemoryUnit --memsize 1 '
the process gets killed partway through without any error message. Could you please help me find out the reason?

'GeneratorFullModel' object has no attribute 'mb'

Hi, this is very nice work.
But when I train the model, I get an error: 'GeneratorFullModel' object has no attribute 'mb'. It occurs at line 163 of train.py, and I can't find any code that defines 'mb'.

Questions about pairing the source image with a driving frame of the same id

Thank you for your excellent work. I'm trying to reproduce your work, but I have some problems. You said in your paper: "The source image and the driving video share the same identity in the training stage, so the sampled driving frame can be used as the ground-truth of a generated source-identity image." But I noticed that the file "vox256.csv" in the code does not seem to align with this statement: the source image and the driving frame do not share the same id. So I'm a little bit confused about that; please tell me the correct training dataset configuration. Thank you!

Is reenactment training across different identities supported?

From the method in the paper, there is no hard constraint that the source and target must be the same person. However, the experiments section explicitly states that during training the source and target share the same id, which is why a perceptual loss can be used; and in the data-processing module FramesDataset in the source code, the source and target also share the same id.

I now have a task that requires fairly fine-grained reenactment between different ids, so I would like to ask whether cross-id reenactment can be trained on top of this project. I would greatly appreciate a reply from the authors!

License

Thanks for your work!
What's the license for this repo?
