dyntet's Introduction

Learning Dynamic Tetrahedra for High-Quality Talking Head Synthesis

This repository contains a PyTorch re-implementation of the paper: Learning Dynamic Tetrahedra for High-Quality Talking Head Synthesis (CVPR 2024).

| Arxiv | Video |

Installation

Requires Python 3.6+, CUDA 11.3+ and PyTorch 1.10+.

Tested on Linux with Anaconda3, Python 3.9 and PyTorch 1.10.

Please refer to scripts/install.sh:

conda create -n dyntet python=3.9
conda activate dyntet
conda install pytorch torchvision torchaudio cudatoolkit=11.6 -c pytorch -c conda-forge
pip install ninja imageio PyOpenGL glfw xatlas gdown
pip install git+https://github.com/NVlabs/nvdiffrast/
pip install git+https://github.com/facebookresearch/pytorch3d/
pip install --global-option="--no-networks" git+https://github.com/NVlabs/tiny-cuda-nn#subdirectory=bindings/torch
pip install scikit-learn configargparse face_alignment natsort matplotlib dominate tensorboard kornia trimesh open3d imageio-ffmpeg lpips easydict pysdf rich openpyxl gfpgan
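
Before building the CUDA extensions, it can help to compare an environment against the version floors stated above (Python 3.6, CUDA 11.3, PyTorch 1.10). The following is a minimal sketch and not part of the repository; the helper names `parse_version` and `below_minimum` are hypothetical. In a real environment the strings would come from `platform.python_version()`, `torch.__version__` and `torch.version.cuda`.

```python
import re

# Minimum versions taken from the requirements stated above.
MINIMUMS = {"python": (3, 6), "cuda": (11, 3), "torch": (1, 10)}


def parse_version(text):
    """Turn a string such as '1.10.0+cu113' into (1, 10): keep major.minor only."""
    match = re.match(r"(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    return int(match.group(1)), int(match.group(2))


def below_minimum(versions):
    """Return the component names whose version is below the stated minimum."""
    return [
        name
        for name, minimum in MINIMUMS.items()
        if parse_version(versions[name]) < minimum
    ]


# Example with the tested setup (Python 3.9, PyTorch 1.10) and cudatoolkit 11.6.
print(below_minimum({"python": "3.9.18", "cuda": "11.6", "torch": "1.10.0"}))
```

Versions are compared as integer tuples rather than strings, so that `1.10` correctly ranks above `1.9`.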

Preparation

The following steps refer to AD-NeRF.

  • Prepare face-parsing model.

    wget https://github.com/YudongGuo/AD-NeRF/blob/master/data_util/face_parsing/79999_iter.pth?raw=true -O data_utils/face_parsing/79999_iter.pth
  • Prepare the 3DMM model for head pose estimation.

    wget https://github.com/YudongGuo/AD-NeRF/blob/master/data_util/face_tracking/3DMM/exp_info.npy?raw=true -O data_utils/face_tracking/3DMM/exp_info.npy
    wget https://github.com/YudongGuo/AD-NeRF/blob/master/data_util/face_tracking/3DMM/keys_info.npy?raw=true -O data_utils/face_tracking/3DMM/keys_info.npy
    wget https://github.com/YudongGuo/AD-NeRF/blob/master/data_util/face_tracking/3DMM/sub_mesh.obj?raw=true -O data_utils/face_tracking/3DMM/sub_mesh.obj
    wget https://github.com/YudongGuo/AD-NeRF/blob/master/data_util/face_tracking/3DMM/topology_info.npy?raw=true -O data_utils/face_tracking/3DMM/topology_info.npy
  • Download 3DMM model from Basel Face Model 2009:

    # 1. copy 01_MorphableModel.mat to data_utils/face_tracking/3DMM/
    # 2. cd data_utils/face_tracking && python convert_BFM.py
    

In addition, the following steps refer to Deep3DFace. We use 3DMM coefficients to drive talking heads.

  • Download the pre-trained model using this link (Google Drive) and organize the directory into the following structure:
data_utils
│
└───Deep3DFaceRecon
    │
    └─── checkpoints
        │
        └─── facerecon
            │
            └─── epoch_20.pth

For evaluation, download the pre-trained ArcFace model and organize the directory into the following structure:

evaluate_utils
│
└───arcface
    │
    └─── model_ir_se50.pth
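
Since the preparation steps place files in several directories, a short check can confirm that nothing was missed before pre-processing starts. The sketch below is not part of the repository: the helper `missing_files` is hypothetical, and the path list is assembled from the download commands and directory trees above. It assumes `01_MorphableModel.mat` sits under `data_utils/` next to the other 3DMM files, and it does not check the output of `convert_BFM.py`, whose file name is not stated here.

```python
from pathlib import Path

# Paths collected from the preparation steps above (relative to the repo root).
REQUIRED_FILES = [
    "data_utils/face_parsing/79999_iter.pth",
    "data_utils/face_tracking/3DMM/exp_info.npy",
    "data_utils/face_tracking/3DMM/keys_info.npy",
    "data_utils/face_tracking/3DMM/sub_mesh.obj",
    "data_utils/face_tracking/3DMM/topology_info.npy",
    # Assumed location, matching the other 3DMM files.
    "data_utils/face_tracking/3DMM/01_MorphableModel.mat",
    "data_utils/Deep3DFaceRecon/checkpoints/facerecon/epoch_20.pth",
    "evaluate_utils/arcface/model_ir_se50.pth",
]


def missing_files(repo_root):
    """Return the required paths that do not exist as files under repo_root."""
    root = Path(repo_root)
    return [rel for rel in REQUIRED_FILES if not (root / rel).is_file()]


# Run from the repository root; prints one line per missing file.
for rel in missing_files("."):
    print("missing:", rel)
```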

Usage

Pre-processing

  • Put the training video at data/video/<ID>.mp4

    • The video must be 25FPS, with all frames containing the talking person.
    • Because nvdiffrast is used, the video width and height are processed into integer multiples of 8, such as 448*448 and 512*512.

    We obtained the experiment videos mainly from AD-NeRF, ER-NeRF, GeneFace and YouTube. Due to copyright restrictions, we cannot distribute all of them, so you may have to download and crop these videos yourself. Here is an example training video (Obama) from AD-NeRF.

    mkdir -p data/video
    wget https://github.com/YudongGuo/AD-NeRF/blob/master/dataset/vids/Obama.mp4?raw=true -O data/video/obama.mp4
    
  • Run the script to process the video (may take several hours).

    python data_utils/process.py --path "data/video/obama.mp4" --save_dir "data/video/obama" --task -1
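
The two video constraints listed above (25 FPS, width and height as multiples of 8) can be checked before the multi-hour pre-processing run. This is a minimal sketch with hypothetical helper names. It assumes dimensions are rounded down to the nearest multiple, which may differ from what `data_utils/process.py` actually does. The frame rate and size are passed in as plain numbers, so reading them from the file (for example with ffprobe) is left to the caller.

```python
def floor_to_multiple(value, base=8):
    """Round value down to the nearest multiple of base (assumed rounding rule)."""
    return (value // base) * base


def check_video(fps, width, height, base=8):
    """Return a list of human-readable problems; an empty list means the video fits."""
    problems = []
    if abs(fps - 25.0) > 1e-3:
        problems.append(f"frame rate is {fps}, expected 25")
    target = (floor_to_multiple(width, base), floor_to_multiple(height, base))
    if target != (width, height):
        problems.append(
            f"resolution {width}x{height} is not a multiple of {base}; "
            f"nearest smaller size is {target[0]}x{target[1]}"
        )
    return problems


print(check_video(25, 512, 512))
print(check_video(30, 450, 450))
```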

Train

To train the model on the Obama video:

python train.py --config configs/obama.json

Evaluation

To evaluate the trained model on the validation dataset:

python evaluate_utils/evaluate.py --train_dir out/obama

Inference

To run inference on the validation dataset:

python infer.py --config configs/obama.json

To infer the video with customized 3DMM coefficients, and (optionally) merge the video and audio:

python infer.py --config configs/obama.json --drive_3dmm data/test_audio/obama_sing_sadtalker.npy --audio data/test_audio/sing.wav

Note: Given an audio file (e.g., AUDIO.wav), you can try SadTalker to generate the 3DMM coefficients as a .mat file (e.g., FILE.mat), then run:

python infer.py --config configs/obama.json --drive_3dmm FILE.mat --audio AUDIO.wav
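
When several driving files need to be rendered, the inference command can be assembled programmatically. The sketch below only uses the three flags shown above (`--config`, `--drive_3dmm`, `--audio`); the helper `build_infer_command` is hypothetical. Whether `infer.py` accepts `--audio` without `--drive_3dmm` is not confirmed here.

```python
import shlex


def build_infer_command(config, drive_3dmm=None, audio=None):
    """Build the argument list for infer.py using only the flags shown above."""
    command = ["python", "infer.py", "--config", config]
    if drive_3dmm is not None:
        command += ["--drive_3dmm", drive_3dmm]
    if audio is not None:
        command += ["--audio", audio]
    return command


# Print the command for inspection; pass the list to subprocess.run to execute it.
print(shlex.join(build_infer_command("configs/obama.json", "FILE.mat", "AUDIO.wav")))
```

Keeping the command as a list and handing it to `subprocess.run` avoids shell quoting problems with paths that contain spaces.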

TODO

  • Release Code.
  • We are considering uploading a script that fine-tunes GFPGAN on DynTet to enhance the visual quality of the talking head.

Citation

Consider citing as below if you find this repository helpful to your project:

@InProceedings{zhang2024learning,
    title={Learning Dynamic Tetrahedra for High-Quality Talking Head Synthesis}, 
    author={Zicheng Zhang and Ruobing Zheng and Ziwen Liu and Congying Han and Tianqi Li and Meng Wang and Tiande Guo and Jingdong Chen and Bonan Li and Ming Yang},
    booktitle={CVPR},
    year={2024}
}

Acknowledgements

This code relies heavily on AD-NeRF for data processing, nvdiffrec for Marching Tetrahedra, and Deep3DFace for 3DMM extraction. Some of the code is drawn from OTAvatar, RAD-NeRF and ER-NeRF. Thanks to these great projects. Please follow the licenses of the above open-source code.

dyntet's People

Contributors

zhangzc21

dyntet's Issues

The system cannot find the specified path

I tried to run the script to process the video and the system is reporting that the video path cannot be found. I put the video obama.mp4 in the data\video folder. What should I do?

error

(DynTet) D:\DynTet>python data_utils/process.py --path "data/video/obama.mp4" --save_dir "data/video/obama" --task -1
[INFO] ===== extract audio from data/video/obama.mp4 to data/video/obama\aud.wav =====
O sistema não pode encontrar o caminho especificado.
[INFO] ===== extracted audio =====
[INFO] ===== extract images from data/video/obama.mp4 to data/video/obama\ori_imgs =====
/usr/bin/ffmpeg -y -i data/video/obama.mp4 -vf "scale=trunc(iw/16)*16:trunc(ih/16)*16" -c:a copy -c:v h264 -crf 20 C:\Users\Bruno\AppData\Local\Temp\tmpho67t3d5.mp4
O sistema não pode encontrar o caminho especificado.
Traceback (most recent call last):
  File "D:\DynTet\data_utils\process.py", line 525, in <module>
    extract_images(opt.path, ori_imgs_dir)
  File "D:\DynTet\data_utils\process.py", line 36, in extract_images
    assert return_code == 0
AssertionError
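
The log above shows the command being launched as `/usr/bin/ffmpeg`, a POSIX-style path that normally does not exist on Windows. That would be consistent with the "cannot find the specified path" message, although the cause is not confirmed here. One possible workaround is to look up the executable at run time instead of hard-coding its location. The sketch below assumes ffmpeg is installed and on `PATH`; the helper `locate_executable` is hypothetical and not part of the repository.

```python
import shutil


def locate_executable(name="ffmpeg"):
    """Return the full path of `name` if it is found on PATH, otherwise None."""
    return shutil.which(name)


path = locate_executable()
if path is None:
    print("ffmpeg not found on PATH; install it or add its folder to PATH")
else:
    print("ffmpeg found at", path)
```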

Question to the Obama_sing_sadtalker.npy

Hello hello,

It's me again. What does this mean? --drive_3dmm data/test_audio/obama_sing_sadtalker.npy
How can I create this npy file for my own video? Is it necessary? Or what's the best approach to insert an audio file and get the lip-synced video back?

thanks!

ModuleNotFoundError: No module named 'evaluate_utils'

Hey there,

Me again. Great project! I successfully completed the training on a GCP Ubuntu 20.04 VM instance, but when I try to run the evaluation afterwards I get this error. Any idea what's happening here?

Screenshot 2024-04-25 at 06 01 43

File Structure:

Screenshot 2024-04-25 at 06 03 28

Thanks!
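
One common cause of this kind of error is offered here as a guess, since the screenshots are not reproduced. When a script is started as `python evaluate_utils/evaluate.py`, Python puts the script's own directory (`evaluate_utils/`) at the front of `sys.path`, not the repository root. `import evaluate_utils` then succeeds only if the root is on the path some other way, for example through the `PYTHONPATH` environment variable. The sketch below reproduces the effect with a throwaway package; `demo_pkg_dyntet` and `importable_from` are hypothetical names.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def importable_from(package, search_dir):
    """Report whether `package` can be imported with `search_dir` first on sys.path."""
    sys.path.insert(0, search_dir)
    importlib.invalidate_caches()
    try:
        importlib.import_module(package)
        return True
    except ModuleNotFoundError:
        return False
    finally:
        sys.path.remove(search_dir)     # restore the search path
        sys.modules.pop(package, None)  # forget the module so the next check is fresh


with tempfile.TemporaryDirectory() as repo_root:
    package_dir = Path(repo_root) / "demo_pkg_dyntet"
    package_dir.mkdir()
    (package_dir / "__init__.py").write_text("")

    # Searching from the "repository root" finds the package ...
    from_root = importable_from("demo_pkg_dyntet", repo_root)
    # ... but searching from inside the package directory does not.
    from_inside = importable_from("demo_pkg_dyntet", str(package_dir))

print(from_root, from_inside)  # prints: True False
```

If this is the cause, running the evaluation from the repository root with the root added to `PYTHONPATH` may resolve it.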

No module named 'tinycudann'

I tried to train the model but got this error. What should I do?

import tinycudann as tcnn
ModuleNotFoundError: No module named 'tinycudann'
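
The `tinycudann` module is provided by the tiny-cuda-nn PyTorch bindings listed in the installation commands, so this error suggests that the install step did not complete in the active environment. A quick check of which compiled dependencies are importable can narrow this down. This is a minimal sketch; the helper `missing_modules` is hypothetical, and the module list covers only the extensions named in the installation section.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Top-level import names of the extensions installed earlier.
print(missing_modules(["torch", "nvdiffrast", "pytorch3d", "tinycudann"]))
```

`find_spec` only locates a module without importing it, so the check stays fast and does not require a GPU.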

3DMM audio coefficients

Hello! I've been exploring the DynTet repository, and I came across the note in the README that mentions using SadTalker to generate the 3DMM coefficients given audio. The note states, "Note: Given an audio, you can try SadTalker to generate the 3DMM coefficients."

I followed the instructions, installed SadTalker, and ran it as directed. However, I did not see any npy file being produced. I'm particularly interested in obtaining another audio npy file.

Could you please provide more details on how I can obtain this file? Is there a specific audio input or additional steps required to generate the npy file using SadTalker?

Thank you for your assistance!
Best regards,
Manal
