xymfei / adasr-talkinghead

This project is forked from songluchuan/adasr-talkinghead

AdaSR-TalkingHead: animates a head portrait so that it closely imitates the speaker in an input video. ICASSP 2024: Adaptive Super Resolution For One-Shot Talking-Head Generation

Shell 0.01% C++ 0.06% Python 11.28% Cuda 0.48% Jupyter Notebook 88.17%

Adaptive Super Resolution For One-Shot Talking-Head Generation

This is the repository for the ICASSP 2024 paper Adaptive Super Resolution For One-Shot Talking-Head Generation (AdaSR TalkingHead).

Abstract

One-shot talking-head generation learns to synthesize a talking-head video from a single source portrait image, driven by a video of the same or a different identity. These methods usually require plane-based pixel transformations via Jacobian matrices or facial image warps to generate novel poses. The constraints of using a single source image and pixel displacements often compromise the clarity of the synthesized images. Some methods try to improve the quality of synthesized videos by introducing additional super-resolution modules, but this increases computational cost and destroys the original data distribution. In this work, we propose an adaptive high-quality talking-head video generation method that synthesizes high-resolution video without additional pre-trained modules. Specifically, inspired by existing super-resolution methods, we down-sample the one-shot source image and then adaptively reconstruct high-frequency details via an encoder-decoder module, resulting in enhanced video clarity. Our method consistently improves the quality of generated videos through a straightforward yet effective strategy, substantiated by quantitative and qualitative evaluations. The code and demo video are available at: https://github.com/Songluchuan/AdaSR-TalkingHead/
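To make the phrase "high-frequency details" concrete, here is a toy 1-D illustration of what down-sampling discards. It is not the paper's method: the abstract describes a learned encoder-decoder operating on images, whereas this sketch only uses average pooling and nearest-neighbour up-sampling on a list of numbers. The function names are made up for this example.

```python
def downsample(signal, factor=2):
    """Average-pool a 1-D signal by `factor` (length must be a multiple)."""
    if len(signal) % factor:
        raise ValueError("length must be a multiple of factor")
    return [sum(signal[i:i + factor]) / factor
            for i in range(0, len(signal), factor)]


def upsample(signal, factor=2):
    """Nearest-neighbour up-sampling: repeat every sample `factor` times."""
    return [value for value in signal for _ in range(factor)]


def lost_detail(signal, factor=2):
    """Residual between the original and its down/up-sampled copy.

    This residual is the high-frequency content that a reconstruction
    module would have to restore.
    """
    coarse = upsample(downsample(signal, factor), factor)
    return [a - b for a, b in zip(signal, coarse)]


if __name__ == "__main__":
    # First half alternates quickly (fine detail), second half is flat.
    row = [0, 10, 0, 10, 5, 5, 5, 5]
    print(downsample(row))   # [5.0, 5.0, 5.0, 5.0]
    print(lost_detail(row))  # [-5.0, 5.0, -5.0, 5.0, 0.0, 0.0, 0.0, 0.0]
```

The coarse copy is identical for both halves of the row; only the residual still distinguishes the rapidly alternating part from the flat part.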

Updates

  • [03/2024] Inference code and pretrained model are released.
  • [03/2024] arXiv link: https://arxiv.org/abs/2403.15944.
  • [COMING] Super-resolution model (based on StyleGANEX and ESRGAN).
  • [COMING] Training code and processed datasets.

Installation

Clone this repo:

git clone git@github.com:Songluchuan/AdaSR-TalkingHead.git
cd AdaSR-TalkingHead

Dependencies:

We have tested on:

  • CUDA 11.3-11.6
  • PyTorch 1.10.1
  • Matplotlib 3.4.3; Matplotlib 3.4.2; opencv-python 4.7.0; scikit-learn 1.0; tqdm 4.62.3
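A quick way to compare a local environment against the tested versions is sketched below. It uses only the standard library. The distribution names (`torch`, `opencv-python`, `scikit-learn`, `tqdm`) are the usual PyPI names and are an assumption, since the list above does not spell them out. Matplotlib is left out because the list names two versions for it, and CUDA is not checked at all.

```python
from importlib import metadata

# Versions listed above as tested. Distribution names are assumed PyPI names.
TESTED = {
    "torch": "1.10.1",
    "opencv-python": "4.7.0",
    "scikit-learn": "1.0",
    "tqdm": "4.62.3",
}


def compare_versions(tested, lookup=metadata.version):
    """Return {package: (tested_version, installed_or_None, matches)}."""
    report = {}
    for name, wanted in tested.items():
        try:
            installed = lookup(name)
        except metadata.PackageNotFoundError:
            installed = None
        # Accept exact matches plus suffixed builds such as
        # "1.10.1+cu113" or "4.7.0.72".
        ok = installed is not None and (
            installed == wanted
            or installed.startswith(wanted + ".")
            or installed.startswith(wanted + "+")
        )
        report[name] = (wanted, installed, ok)
    return report


if __name__ == "__main__":
    for name, (wanted, installed, ok) in compare_versions(TESTED).items():
        status = "ok" if ok else "differs"
        print(f"{name}: tested {wanted}, "
              f"installed {installed or 'missing'} [{status}]")
```

A "differs" line does not mean the code will fail; it only flags a version the authors did not report testing.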

Inference Code

  1. Download the pretrained model from Google Drive: https://drive.google.com/file/d/1g58uuAyZFdny9_twvbv0AHxB9-03koko/view?usp=sharing (it is trained on the HDTF dataset), and put it under checkpoints/

  2. The demo video and reference image are under DEMO/

  3. The inference code is in run_demo.sh; run it with:

bash run_demo.sh
  4. You can set a different source image and driving video in run_demo.sh:
--source_image DEMO/demo_img_3.jpg

and

--driving_video DEMO/demo_video_1.mp4
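Instead of editing run_demo.sh by hand, the two flags can be swapped programmatically. The helper below is a minimal sketch: `set_flag` is a hypothetical name, and the example command line is invented for illustration because the actual contents of run_demo.sh are not shown here. It assumes each flag is followed by a single unquoted value without spaces.

```python
import re


def set_flag(script_text, flag, value):
    """Replace the value that follows `flag` in a shell script's text.

    Raises ValueError if the flag does not occur, so a typo is not silent.
    """
    # The flag must be followed directly by spaces or "=", so a longer
    # flag that merely starts with the same characters is not touched.
    pattern = re.compile("(" + re.escape(flag) + r"[ =]+)(\S+)")
    new_text, count = pattern.subn(lambda m: m.group(1) + value, script_text)
    if count == 0:
        raise ValueError(f"{flag} not found in script")
    return new_text


if __name__ == "__main__":
    # Hypothetical command line; the script name is a placeholder.
    example = ("python some_script.py "
               "--source_image DEMO/demo_img_3.jpg "
               "--driving_video DEMO/demo_video_1.mp4")
    example = set_flag(example, "--source_image", "DEMO/my_face.jpg")
    example = set_flag(example, "--driving_video", "DEMO/my_clip.mp4")
    print(example)
```

To apply it to the real file, read run_demo.sh into a string, call `set_flag` for each flag, and write the result back (ideally to a copy, so the original demo settings are kept).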

Video

Citation

@inproceedings{song2024adaptive,
  title={Adaptive Super Resolution for One-Shot Talking Head Generation},
  author={Song, Luchuan and Liu, Pinxin and Yin, Guojun and Xu, Chenliang},
  year={2024},
  organization={IEEE International Conference on Acoustics, Speech, and Signal Processing}
}

Acknowledgments

The code is developed mainly on top of StyleGANEX, ESRGAN, and an unofficial face2vid implementation. Thanks to the authors for their contributions.
