Awesome Audio-driven Talking Face Generation

2D Encoder-Decoder Based

StyleHEAT: One-Shot High-Resolution Editable Talking Face Generation via Pre-trained StyleGAN [F Yin 2022] [arXiv] demo project page
Pose-Controllable Talking Face Generation by Implicitly Modularized Audio-Visual Representation [Hang Zhou 2021] [CVPR] demo project page
Talking Head Generation with Audio and Speech Related Facial Action Units [S Chen 2021] [BMVC]
Speech Driven Talking Face Generation from a Single Image and an Emotion Condition [SE Eskimez 2021] [arXiv] project page
HeadGAN: Video-and-Audio-Driven Talking Head Synthesis [MC Doukas 2021] [arXiv] demo project page
Arbitrary Talking Face Generation via Attentional Audio-Visual Coherence Learning [Hao Zhu 2020] [IJCAI]
A Lip Sync Expert Is All You Need for Speech to Lip Generation In The Wild [K R Prajwal 2020] [ACMMM] demo project page
Talking Face Generation with Expression-Tailored Generative Adversarial Network [D Zeng 2020] [ACMMM]
Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis [KR Prajwal 2020] [CVPR] demo project page
Robust One Shot Audio to Video Generation [N Kumar 2020] [CVPRW] demo project page
Talking Face Generation by Adversarially Disentangled Audio-Visual Representation [Hang Zhou 2019] [AAAI] demo project page
Talking face generation by conditional recurrent adversarial network [Yang Song 2019] [IJCAI] demo project page
Realistic Speech-Driven Facial Animation with GANs [Konstantinos Vougioukas 2019] [IJCV] demo project page
Animating Face using Disentangled Audio Representations [G Mittal 2019] [WACV]
Lip Movements Generation at a Glance [Lele Chen 2018] [ECCV] demo project page
X2Face: A network for controlling face generation using images, audio, and pose codes [Olivia Wiles 2018] [ECCV] demo project page
Generative Adversarial Talking Head: Bringing Portraits to Life with a Weakly Supervised Neural Network [HX Pham 2018] [arXiv] demo
You said that? [Chung 2017] [BMVC] demo project page

Landmark Based

Live Speech Portraits: Real-Time Photorealistic Talking-Head Animation [YUANXUN LU 2021] [SIGGRAPH] demo project page
Imitating Arbitrary Talking Style for Realistic Audio-Driven Talking Face Synthesis [H Wu 2021] [ACMMM] demo project page
MakeItTalk: Speaker-Aware Talking-Head Animation [YANG ZHOU 2020] [SIGGRAPH] demo project page
Speech-driven Facial Animation using Cascaded GANs for Learning of Motion and Texture [Dipanjan Das, Sandika Biswas 2020] [ECCV]
A Neural Lip-Sync Framework for Synthesizing Photorealistic Virtual News Anchors [R Zheng 2020] [ICPR]
Hierarchical Cross-Modal Talking Face Generation with Dynamic Pixel-Wise Loss [Lele Chen 2019] [CVPR] demo project page
Speech-Driven Facial Reenactment Using Conditional Generative Adversarial Networks [SA Jalalifar 2018] [arXiv]
Synthesizing Obama: learning lip sync from audio [SUPASORN SUWAJANAKORN 2017] [SIGGRAPH] demo

3D Model Based

Everybody’s Talkin’: Let Me Talk as You Want [Linsen Song 2022] [TIFS] demo
One-shot Talking Face Generation from Single-speaker Audio-Visual Correlation Learning [Suzhen Wang 2022] [AAAI] demo project page
FaceFormer: Speech-Driven 3D Facial Animation with Transformers [Y Fan 2022] [CVPR] demo project page
Iterative Text-based Editing of Talking-heads Using Neural Retargeting [Xinwei Yao 2021] [ICML] demo
AD-NeRF: Audio Driven Neural Radiance Fields for Talking Head Synthesis [Yudong Guo 2021] [ICCV] demo project page
Audio-driven emotional video portraits [X Ji 2021] [CVPR] demo project page
FACIAL: Synthesizing Dynamic Talking Face with Implicit Attribute Learning [C Zhang 2021] [ICCV] demo project page
Flow-guided One-shot Talking Face Generation with a High-resolution Audio-visual Dataset [Z Zhang 2021] [CVPR] demo project page
Audio2Head: Audio-driven One-shot Talking-head Generation with Natural Head Motion [Suzhen Wang 2021] [IJCAI] demo project page
MeshTalk: 3D Face Animation from Speech using Cross-Modality Disentanglement [A Richard 2021] [ICCV] demo project page
3D-TalkEmo: Learning to Synthesize 3D Emotional Talking Head [Q Wang 2021] [arXiv]
Write-a-speaker: Text-based Emotional and Rhythmic Talking-head Generation [L Li 2021] [AAAI] demo project page
Text2Video: Text-driven Talking-head Video Synthesis with Phonetic Dictionary [S Zhang 2021] [ICASSP] demo project page
Neural Voice Puppetry: Audio-driven Facial Reenactment [Justus Thies 2020] [ECCV] demo project page
Audio-driven Talking Face Video Generation with Learning-based Personalized Head Pose [Ran Yi 2020] [arXiv] project page
Talking-head Generation with Rhythmic Head Motion [Lele Chen 2020] [ECCV] demo project page
Modality Dropout for Improved Performance-driven Talking Faces [Hussen Abdelaziz 2020] [ICMI]
Audio- and Gaze-driven Facial Animation of Codec Avatars [A Richard 2020] [arXiv] demo project page
Text-based editing of talking-head video [OHAD FRIED 2019] [arXiv] demo
Capture, Learning, and Synthesis of 3D Speaking Styles [D Cudeiro 2019] [CVPR] demo project page
Visemenet: audio-driven animator-centric speech animation [YANG ZHOU 2018] [TOG] demo
Speech-Driven Expressive Talking Lips with Conditional Sequential Generative Adversarial Networks [N Sadoughi 2018] [TAC]
Speech-driven 3D Facial Animation with Implicit Emotional Awareness: A Deep Learning Approach [Hai X. Pham 2017] [IEEE Trans. Syst. Man Cybern.: Syst.]
Audio-Driven Facial Animation by Joint End-to-End Learning of Pose and Emotion [TERO KARRAS 2017] [TOG] demo project page
A deep learning approach for generalized speech animation [SARAH TAYLOR 2017] [SIGGRAPH] demo
End-to-end Learning for 3D Facial Animation from Speech [HX Pham 2017] [ICMI]
JALI: An Animator-Centric Viseme Model for Expressive Lip Synchronization [Pif Edwards 2016] [SIGGRAPH] demo

Survey

What comprises a good talking-head video generation?: A Survey and Benchmark [Lele Chen 2020] paper

Deep Audio-Visual Learning: A Survey [Hao Zhu 2020] paper

Handbook of Digital Face Manipulation and Detection [Yuxin Wang 2022] paper

Deep Learning for Visual Speech Analysis: A Survey paper

Datasets

GRID 2006 project page
TCD-TIMIT 2015 project page
LRW 2016 project page
MODALITY 2017 project page
ObamaSet 2017
Voxceleb1 2017 project page
Voxceleb2 2018 project page
LRS2-BBC 2018 project page
LRS3-TED 2018 project page
HDTF 2020 project page
CREMA-D 2014 project page
MSP-IMPROV 2016 project page
RAVDESS 2018 project page
MELD 2018 project page
MEAD 2020 project page
CAVSR1.0 1998
HIT Bi-CAV 2005
LRW-1000 2018 project page

Metrics

Metric	Paper
PSNR (peak signal-to-noise ratio)	-
SSIM (structural similarity index measure)	Image quality assessment: from error visibility to structural similarity
CPBD (cumulative probability of blur detection)	A no-reference image blur metric based on the cumulative probability of blur detection
LPIPS (Learned Perceptual Image Patch Similarity)	The Unreasonable Effectiveness of Deep Features as a Perceptual Metric
NIQE (Natural Image Quality Evaluator)	Making a ‘Completely Blind’ Image Quality Analyzer
FID (Fréchet inception distance)	GANs trained by a two time-scale update rule converge to a local Nash equilibrium
LMD (landmark distance error)	Lip Movements Generation at a Glance
LRA (lip-reading accuracy)	Talking Face Generation by Conditional Recurrent Adversarial Network
WER (word error rate)	LipNet: end-to-end sentence-level lipreading
LSE-D (Lip Sync Error - Distance)	Out of time: automated lip sync in the wild
LSE-C (Lip Sync Error - Confidence)	Out of time: automated lip sync in the wild
ACD (average content distance)	FaceNet: a unified embedding for face recognition and clustering
CSIM (cosine similarity)	ArcFace: additive angular margin loss for deep face recognition
EAR (eye aspect ratio)	Real-time eye blink detection using facial landmarks (Computer Vision Winter Workshop)
ESD (emotion similarity distance)	What comprises a good talking-head video generation?: A Survey and Benchmark
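
Several of the metrics above reduce to short closed-form computations once the inputs (pixel values, facial landmarks, identity embeddings, or transcripts) are available. The following is a minimal sketch in plain Python, not the reference implementation from any of the cited papers. The function names, the flat-list image representation, and the per-frame (rather than per-video) averaging are assumptions made for illustration. Metrics that depend on a pretrained network (LPIPS, FID, LSE-D/LSE-C, ESD) are not covered here, and CSIM is shown only as the cosine step applied to embeddings that some face-recognition model is assumed to have produced already.

```python
import math


def psnr(ref, test, max_value=255.0):
    """PSNR in dB between two images given as equal-length flat pixel sequences."""
    if len(ref) != len(test) or len(ref) == 0:
        raise ValueError("images must be non-empty and the same size")
    mse = sum((a - b) ** 2 for a, b in zip(ref, test)) / len(ref)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def landmark_distance(pred, target):
    """LMD for one frame: mean Euclidean distance between matching (x, y) landmarks."""
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("landmark lists must be non-empty and the same length")
    return sum(math.dist(p, q) for p, q in zip(pred, target)) / len(pred)


def cosine_similarity(u, v):
    """Cosine step of CSIM; u and v are assumed to be identity embeddings."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def eye_aspect_ratio(eye):
    """EAR from six (x, y) eye landmarks p1..p6, where p1 and p4 are the eye corners."""
    p1, p2, p3, p4, p5, p6 = eye
    vertical = math.dist(p2, p6) + math.dist(p3, p5)
    horizontal = math.dist(p1, p4)
    return vertical / (2.0 * horizontal)


def word_error_rate(reference, hypothesis):
    """WER: word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    prev = list(range(len(hyp) + 1))  # edit distances for the empty reference prefix
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1] / len(ref)
```

For a whole video, the per-frame values of `psnr`, `landmark_distance`, and `eye_aspect_ratio` would typically be averaged (or, for EAR, thresholded to detect blinks) over all frames. How a given paper aggregates them should be checked against that paper.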

kangweiiliu / awesome_audio-driven_talking-face-generation