Learning Dynamic Tetrahedra for High-Quality Talking Head Synthesis

This repository contains a PyTorch re-implementation of the paper: Learning Dynamic Tetrahedra for High-Quality Talking Head Synthesis (CVPR 2024).

| arXiv | Video |

Installation

Requires Python 3.6+, CUDA 11.3+ and PyTorch 1.10+.

Tested on Linux with Anaconda3, Python 3.9 and PyTorch 1.10.

Please refer to scripts/install.sh:

conda create -n dyntet python=3.9
conda activate dyntet
conda install pytorch torchvision torchaudio cudatoolkit=11.6 -c pytorch -c conda-forge
pip install ninja imageio PyOpenGL glfw xatlas gdown
pip install git+https://github.com/NVlabs/nvdiffrast/
pip install git+https://github.com/facebookresearch/pytorch3d/
pip install --global-option="--no-networks" git+https://github.com/NVlabs/tiny-cuda-nn#subdirectory=bindings/torch
pip install scikit-learn configargparse face_alignment natsort matplotlib dominate tensorboard kornia trimesh open3d imageio-ffmpeg lpips easydict pysdf rich openpyxl gfpgan
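
After installation, the version requirements stated above (Python 3.6+, CUDA 11.3+, PyTorch 1.10+) can be checked with a short script. The following is only a sketch and is not part of this repository; the helper names `parse_version`, `meets_minimum` and `check_environment` are made up for illustration. Versions are compared as integer tuples so that, for example, 1.9 is correctly treated as older than 1.10.

```python
import re
import sys


def parse_version(text):
    """Turn a version string such as '1.10.2+cu113' into (1, 10, 2)."""
    numbers = []
    for piece in text.split("+")[0].split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        numbers.append(int(match.group()))
    return tuple(numbers)


def meets_minimum(found, minimum):
    """True if version string `found` is at least `minimum`."""
    return parse_version(found) >= parse_version(minimum)


def check_environment():
    """Return a dict of requirement name -> bool (hypothetical helper)."""
    python_version = "{}.{}".format(*sys.version_info[:2])
    report = {"python": meets_minimum(python_version, "3.6")}
    try:
        import torch
    except ImportError:
        report["torch"] = False
        report["cuda"] = False
        return report
    report["torch"] = meets_minimum(torch.__version__, "1.10")
    # torch.version.cuda is None for CPU-only builds of PyTorch.
    cuda = torch.version.cuda
    report["cuda"] = bool(
        cuda and meets_minimum(cuda, "11.3") and torch.cuda.is_available()
    )
    return report


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'ok' if ok else 'missing or too old'}")
```

This only checks versions and GPU visibility; it does not verify that nvdiffrast, pytorch3d or tiny-cuda-nn compiled correctly.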

Preparation

The following steps follow AD-NeRF.

Prepare the face-parsing model.

wget https://github.com/YudongGuo/AD-NeRF/blob/master/data_util/face_parsing/79999_iter.pth?raw=true -O data_utils/face_parsing/79999_iter.pth

Prepare the 3DMM model for head pose estimation.

wget https://github.com/YudongGuo/AD-NeRF/blob/master/data_util/face_tracking/3DMM/exp_info.npy?raw=true -O data_utils/face_tracking/3DMM/exp_info.npy
wget https://github.com/YudongGuo/AD-NeRF/blob/master/data_util/face_tracking/3DMM/keys_info.npy?raw=true -O data_utils/face_tracking/3DMM/keys_info.npy
wget https://github.com/YudongGuo/AD-NeRF/blob/master/data_util/face_tracking/3DMM/sub_mesh.obj?raw=true -O data_utils/face_tracking/3DMM/sub_mesh.obj
wget https://github.com/YudongGuo/AD-NeRF/blob/master/data_util/face_tracking/3DMM/topology_info.npy?raw=true -O data_utils/face_tracking/3DMM/topology_info.npy

Download the 3DMM model from Basel Face Model 2009:

# 1. copy 01_MorphableModel.mat to data_utils/face_tracking/3DMM/
# 2. cd data_utils/face_tracking && python convert_BFM.py

In addition, the following steps follow Deep3DFace. We use 3DMM coefficients to drive the talking heads.

Download the pre-trained model using this link (Google Drive) and organize the directory into the following structure:

data_utils
│
└───Deep3DFaceRecon
    │
    └─── checkpoints
        │
        └─── facerecon
            │
            └─── epoch_20.pth

For evaluation, download the pre-trained ArcFace model and organize the directory into the following structure:

evaluate_utils
│
└───arcface
    │
    └─── model_ir_se50.pth
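
Since the preparation steps place several files by hand, it can help to confirm that everything is where the code expects it before starting. The sketch below is not part of the repository; `EXPECTED_ASSETS` and `missing_assets` are illustrative names, and the paths are taken from the commands and directory trees in this section. The location of 01_MorphableModel.mat is assumed to be alongside the other 3DMM files.

```python
from pathlib import Path

# Paths collected from the preparation steps in this README.
EXPECTED_ASSETS = [
    "data_utils/face_parsing/79999_iter.pth",
    "data_utils/face_tracking/3DMM/exp_info.npy",
    "data_utils/face_tracking/3DMM/keys_info.npy",
    "data_utils/face_tracking/3DMM/sub_mesh.obj",
    "data_utils/face_tracking/3DMM/topology_info.npy",
    # Assumption: the Basel Face Model file sits next to the other 3DMM files.
    "data_utils/face_tracking/3DMM/01_MorphableModel.mat",
    "data_utils/Deep3DFaceRecon/checkpoints/facerecon/epoch_20.pth",
    "evaluate_utils/arcface/model_ir_se50.pth",
]


def missing_assets(repo_root="."):
    """Return the expected files that are not present under `repo_root`."""
    root = Path(repo_root)
    return [rel for rel in EXPECTED_ASSETS if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_assets()
    if missing:
        print("Missing files:")
        for rel in missing:
            print("  " + rel)
    else:
        print("All pre-trained assets found.")
```

Run it from the repository root. An empty list means every file listed above is in place; it does not check that the downloads are intact.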

Usage

Pre-processing

Put the training video at data/video/<ID>.mp4
- The video must be 25 FPS, with all frames containing the talking person.
- Because we use nvdiffrast, the video width and height are processed into integer multiples of 8, such as 448*448 and 512*512.

We obtained the experiment videos mainly from AD-NeRF, ER-NeRF, GeneFace and YouTube. Due to copyright restrictions, we can't distribute all of them. You may have to download and crop these videos yourself. Here is an example training video (Obama) from AD-NeRF.
```
mkdir -p data/video
wget https://github.com/YudongGuo/AD-NeRF/blob/master/dataset/vids/Obama.mp4?raw=true -O data/video/obama.mp4
```
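
Before spending hours on pre-processing, the two video requirements listed above (25 FPS, width and height that are multiples of 8) can be checked with a few lines of Python. This is a sketch rather than the repository's own logic: `snap_to_multiple` and `check_video` are hypothetical helpers, the width, height and frame rate are assumed to have been read beforehand with a tool such as ffprobe, and rounding down to the nearest multiple of 8 is an assumption about how a crop size could be chosen.

```python
def snap_to_multiple(value, base=8):
    """Largest multiple of `base` that does not exceed `value`."""
    return (value // base) * base


def check_video(width, height, fps, required_fps=25.0):
    """Return a list of human-readable problems; empty if the video looks fine."""
    problems = []
    if abs(fps - required_fps) > 1e-3:
        problems.append(f"frame rate is {fps:g} FPS, expected {required_fps:g}")
    for name, size in (("width", width), ("height", height)):
        if size % 8 != 0:
            problems.append(
                f"{name} {size} is not a multiple of 8 "
                f"(nearest lower multiple: {snap_to_multiple(size)})"
            )
    return problems


if __name__ == "__main__":
    # Example values only; replace them with your video's real metadata.
    for problem in check_video(width=450, height=512, fps=30.0):
        print(problem)
```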

Run the script to process the video (this may take several hours).

python data_utils/process.py --path "data/video/obama.mp4" --save_dir "data/video/obama" --task -1
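
When several videos need processing, the same command can be generated for every data/video/<ID>.mp4 file. The following sketch assumes it is run from the repository root and reuses only the flags shown above (`--path`, `--save_dir`, `--task -1`); `preprocess_commands` and `run_all` are illustrative names, not part of the repository.

```python
import subprocess
import sys
from pathlib import Path


def preprocess_commands(video_dir="data/video"):
    """Build one process.py command per .mp4 file in `video_dir`."""
    commands = []
    for video in sorted(Path(video_dir).glob("*.mp4")):
        # data/video/obama.mp4 -> data/video/obama
        save_dir = video.with_suffix("")
        commands.append([
            sys.executable, "data_utils/process.py",
            "--path", str(video),
            "--save_dir", str(save_dir),
            "--task", "-1",
        ])
    return commands


def run_all(video_dir="data/video"):
    """Run the commands one after another, stopping at the first failure."""
    for command in preprocess_commands(video_dir):
        subprocess.run(command, check=True)


if __name__ == "__main__":
    for command in preprocess_commands():
        print(" ".join(command))
```

The main block only prints the commands; call `run_all()` to execute them. The videos are processed sequentially because each run can already take several hours and uses the GPU.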

Train

To train the model on the Obama video:

python train.py --config configs/obama.json

Evaluation

To evaluate the trained model on the validation dataset:

python evaluate_utils/evaluate.py --train_dir out/obama

Inference

To run inference on the validation dataset:

python infer.py --config configs/obama.json

To run inference with customized 3DMM coefficients, and (optionally) merge the resulting video with audio:

python infer.py --config configs/obama.json --drive_3dmm data/test_audio/obama_sing_sadtalker.npy --audio data/test_audio/sing.wav

Note: Given an audio file (e.g., AUDIO.wav), you can try SadTalker to generate a .mat file of 3DMM coefficients (e.g., FILE.mat), then run:

python infer.py --config configs/obama.json --drive_3dmm FILE.mat --audio AUDIO.wav
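
When pairing a coefficient file with an audio track, a quick consistency check is to compare the number of driving frames with the length of the audio. The sketch below computes how many frames an uncompressed PCM WAV file corresponds to at 25 FPS, the frame rate this README requires for training videos. It assumes the driving coefficients contain one entry per video frame, which is not confirmed by this README; `audio_duration_seconds` and `expected_frame_count` are hypothetical helpers.

```python
import wave

FPS = 25  # frame rate required for the training videos


def audio_duration_seconds(wav_path):
    """Duration of an uncompressed PCM WAV file in seconds."""
    with wave.open(wav_path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def expected_frame_count(wav_path, fps=FPS):
    """Number of video frames that span the audio at `fps` frames per second."""
    return round(audio_duration_seconds(wav_path) * fps)


if __name__ == "__main__":
    import sys

    if len(sys.argv) == 2:
        print(expected_frame_count(sys.argv[1]))
```

If the number of coefficient frames differs noticeably from this value, the audio and the driving sequence were probably not generated from the same clip.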

TODO

Release Code.
We are considering uploading a script that fine-tunes GFPGAN on DynTet to enhance the visual quality of the talking heads.

Citation

Consider citing as below if you find this repository helpful to your project:

@InProceedings{zhang2024learning,
    title={Learning Dynamic Tetrahedra for High-Quality Talking Head Synthesis}, 
    author={Zicheng Zhang and Ruobing Zheng and Ziwen Liu and Congying Han and Tianqi Li and Meng Wang and Tiande Guo and Jingdong Chen and Bonan Li and Ming Yang},
    booktitle={CVPR},
    year={2024}
}

Acknowledgements

This code relies heavily on AD-NeRF for data processing, nvdiffrec for Marching Tetrahedra, and Deep3DFace for 3DMM extraction. Some of the code is drawn from OTAvatar, RAD-NeRF and ER-NeRF. Thanks to the authors of these great projects. Please follow the licenses of the above open-source code.