kangweiiliu / awesome_audio-driven_talking-face-generation

A curated list of resources of audio-driven talking face generation

Awesome Audio-driven Talking Face Generation

2D Encoder-Decoder Based

  • StyleHEAT: One-Shot High-Resolution Editable Talking Face Generation via Pre-trained StyleGAN [F Yin 2022] [arXiv] demo project page
  • Pose-Controllable Talking Face Generation by Implicitly Modularized Audio-Visual Representation [Hang Zhou 2021] [CVPR] demo project page
  • Talking Head Generation with Audio and Speech Related Facial Action Units [S Chen 2021] [BMVC]
  • Speech Driven Talking Face Generation from a Single Image and an Emotion Condition [SE Eskimez 2021] [arXiv] project page
  • HeadGAN: Video-and-Audio-Driven Talking Head Synthesis [MC Doukas 2021] [arXiv] demo project page
  • Arbitrary Talking Face Generation via Attentional Audio-Visual Coherence Learning [Hao Zhu 2020] [IJCAI]
  • A Lip Sync Expert Is All You Need for Speech to Lip Generation In The Wild [K R Prajwal 2020] [ACMMM] demo project page
  • Talking Face Generation with Expression-Tailored Generative Adversarial Network [D Zeng 2020] [ACMMM]
  • Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis [KR Prajwal 2020] [CVPR] demo project page
  • Robust One Shot Audio to Video Generation [N Kumar 2020] [CVPRW] demo project page
  • Talking Face Generation by Adversarially Disentangled Audio-Visual Representation [Hang Zhou 2019] [AAAI] demo project page
  • Talking face generation by conditional recurrent adversarial network [Yang Song 2019] [IJCAI] demo project page
  • Realistic Speech-Driven Facial Animation with GANs [Konstantinos Vougioukas 2019] [IJCV] demo project page
  • Animating Face using Disentangled Audio Representations [G Mittal 2019] [WACV]
  • Lip Movements Generation at a Glance [Lele Chen 2018] [ECCV] demo project page
  • X2Face: A network for controlling face generation using images, audio, and pose codes [Olivia Wiles 2018] [ECCV] demo project page
  • Generative Adversarial Talking Head: Bringing Portraits to Life with a Weakly Supervised Neural Network [HX Pham 2018] [arXiv] demo
  • You said that? [Chung 2017] [BMVC] demo project page

Landmark Based

  • Live Speech Portraits: Real-Time Photorealistic Talking-Head Animation [Yuanxun Lu 2021] [SIGGRAPH] demo project page
  • Imitating Arbitrary Talking Style for Realistic Audio-Driven Talking Face Synthesis [H Wu 2021] [ACMMM] demo project page
  • MakeItTalk: Speaker-Aware Talking-Head Animation [Yang Zhou 2020] [SIGGRAPH] demo project page
  • Speech-driven Facial Animation using Cascaded GANs for Learning of Motion and Texture [Dipanjan Das, Sandika Biswas 2020] [ECCV]
  • A Neural Lip-Sync Framework for Synthesizing Photorealistic Virtual News Anchors [R Zheng 2020] [ICPR]
  • Hierarchical Cross-Modal Talking Face Generation with Dynamic Pixel-Wise Loss [Lele Chen 2019] [CVPR] demo project page
  • Speech-Driven Facial Reenactment Using Conditional Generative Adversarial Networks [SA Jalalifar 2018] [arXiv]
  • Synthesizing Obama: Learning Lip Sync from Audio [Supasorn Suwajanakorn 2017] [SIGGRAPH] demo

3D Model Based

  • Everybody’s Talkin’: Let Me Talk as You Want [Linsen Song 2022] [TIFS] demo
  • One-shot Talking Face Generation from Single-speaker Audio-Visual Correlation Learning [Suzhen Wang 2022] [AAAI] demo project page
  • FaceFormer: Speech-Driven 3D Facial Animation with Transformers [Y Fan 2022] [CVPR] demo project page
  • Iterative Text-based Editing of Talking-heads Using Neural Retargeting [Xinwei Yao 2021] [ICML] demo
  • AD-NeRF: Audio Driven Neural Radiance Fields for Talking Head Synthesis [Yudong Guo 2021] [ICCV] demo project page
  • Audio-Driven Emotional Video Portraits [X Ji 2021] [CVPR] demo project page
  • FACIAL: Synthesizing Dynamic Talking Face with Implicit Attribute Learning [C Zhang 2021] [ICCV] demo project page
  • Flow-guided One-shot Talking Face Generation with a High-resolution Audio-visual Dataset [Z Zhang 2021] [CVPR] demo project page
  • Audio2Head: Audio-driven One-shot Talking-head Generation with Natural Head Motion [Suzhen Wang 2021] [IJCAI] demo project page
  • MeshTalk: 3D Face Animation from Speech using Cross-Modality Disentanglement [A Richard 2021] [ICCV] demo project page
  • 3D-TalkEmo: Learning to Synthesize 3D Emotional Talking Head [Q Wang 2021] [arXiv]
  • Write-a-speaker: Text-based Emotional and Rhythmic Talking-head Generation [L Li 2021] [AAAI] demo project page
  • Text2Video: Text-driven Talking-head Video Synthesis with Phonetic Dictionary [S Zhang 2021] [ICASSP] demo project page
  • Neural Voice Puppetry: Audio-driven Facial Reenactment [Justus Thies 2020] [ECCV] demo project page
  • Audio-driven Talking Face Video Generation with Learning-based Personalized Head Pose [Ran Yi 2020] [arXiv] project page
  • Talking-head Generation with Rhythmic Head Motion [Lele Chen 2020] [ECCV] demo project page
  • Modality Dropout for Improved Performance-driven Talking Faces [Hussen Abdelaziz 2020] [ICMI]
  • Audio- and Gaze-driven Facial Animation of Codec Avatars [A Richard 2020] [arXiv] demo project page
  • Text-based Editing of Talking-head Video [Ohad Fried 2019] [arXiv] demo
  • Capture, Learning, and Synthesis of 3D Speaking Styles [D Cudeiro 2019] [CVPR] demo project page
  • VisemeNet: Audio-Driven Animator-Centric Speech Animation [Yang Zhou 2018] [TOG] demo
  • Speech-Driven Expressive Talking Lips with Conditional Sequential Generative Adversarial Networks [N Sadoughi 2018] [TAC]
  • Speech-driven 3D Facial Animation with Implicit Emotional Awareness: A Deep Learning Approach [Hai X. Pham 2017] [IEEE Trans. Syst. Man Cybern.: Syst.]
  • Audio-Driven Facial Animation by Joint End-to-End Learning of Pose and Emotion [Tero Karras 2017] [TOG] demo project page
  • A Deep Learning Approach for Generalized Speech Animation [Sarah Taylor 2017] [SIGGRAPH] demo
  • End-to-end Learning for 3D Facial Animation from Speech [HX Pham 2017] [ICMI]
  • JALI: An Animator-Centric Viseme Model for Expressive Lip Synchronization [Pif Edwards 2016] [SIGGRAPH] demo

Survey

  • What comprises a good talking-head video generation?: A Survey and Benchmark [Lele Chen 2020] paper
  • Deep Audio-Visual Learning: A Survey [Hao Zhu 2020] paper
  • Handbook of Digital Face Manipulation and Detection [Yuxin Wang 2022] paper
  • Deep Learning for Visual Speech Analysis: A Survey paper

Datasets

Metrics

Metric | Paper
PSNR (peak signal-to-noise ratio) | -
SSIM (structural similarity index measure) | Image Quality Assessment: From Error Visibility to Structural Similarity
CPBD (cumulative probability of blur detection) | A No-Reference Image Blur Metric Based on the Cumulative Probability of Blur Detection
LPIPS (learned perceptual image patch similarity) | The Unreasonable Effectiveness of Deep Features as a Perceptual Metric
NIQE (natural image quality evaluator) | Making a ‘Completely Blind’ Image Quality Analyzer
FID (Fréchet inception distance) | GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium
LMD (landmark distance error) | Lip Movements Generation at a Glance
LRA (lip-reading accuracy) | Talking Face Generation by Conditional Recurrent Adversarial Network
WER (word error rate) | LipNet: End-to-End Sentence-Level Lipreading
LSE-D (lip sync error, distance) | Out of Time: Automated Lip Sync in the Wild
LSE-C (lip sync error, confidence) | Out of Time: Automated Lip Sync in the Wild
ACD (average content distance) | FaceNet: A Unified Embedding for Face Recognition and Clustering
CSIM (cosine similarity) | ArcFace: Additive Angular Margin Loss for Deep Face Recognition
EAR (eye aspect ratio) | Real-Time Eye Blink Detection Using Facial Landmarks (Computer Vision Winter Workshop)
ESD (emotion similarity distance) | What comprises a good talking-head video generation?: A Survey and Benchmark
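
Several of the metrics above reduce to short formulas. The following is a minimal sketch in plain Python, not the evaluation code of any listed paper. The function names and input layouts (flat pixel lists, `(x, y)` landmark tuples, plain embedding vectors, whitespace-separated transcripts) are assumptions made for illustration. In practice, LMD is usually computed on mouth landmarks from a face-landmark detector and averaged over frames, CSIM is computed between identity embeddings from a face-recognition network, and WER is computed on the output of a lip-reading model; those models are outside this sketch.

```python
import math


def psnr(reference, generated, max_value=255.0):
    """PSNR in dB between two equally sized images given as flat pixel lists."""
    if len(reference) != len(generated) or not reference:
        raise ValueError("images must be non-empty and the same size")
    mse = sum((r - g) ** 2 for r, g in zip(reference, generated)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def landmark_distance(ref_landmarks, gen_landmarks):
    """Mean Euclidean distance between corresponding (x, y) landmarks of one frame."""
    if len(ref_landmarks) != len(gen_landmarks) or not ref_landmarks:
        raise ValueError("landmark lists must be non-empty and the same length")
    total = sum(math.dist(p, q) for p, q in zip(ref_landmarks, gen_landmarks))
    return total / len(ref_landmarks)


def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors (placeholder inputs here)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1] / len(ref)
```

Higher PSNR and CSIM indicate closer agreement with the reference, while lower LMD and WER are better. For example, `word_error_rate("a b c d", "a x c")` counts one substitution and one deletion against four reference words.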

awesome_audio-driven_talking-face-generation's Issues

Real time audio driven face generation

Hello,

Thank you for sharing this list. I'm new to this and it has helped me a lot.

I want to generate a photo-realistic avatar of myself and connect it to ChatGPT. For this, I'm looking for a lip-sync / audio-driven face generation model that works in real time. I'm okay with models that can be trained or overfitted to my face (i.e., I don't need it to be generic or one-shot). Any pointers, ideas, or suggestions are deeply appreciated. I have been trying to find a solution to this for a while and still have no clue.

Thank you,
Thomas.
