
LIHQ

Long-Inference, High Quality Synthetic Speaker (AI avatar / AI presenter)

LIHQ is not a new architecture; it is an application that chains several open source deep learning models to generate an artificial speaker of your own design. It was built to run in Google Colab, taking advantage of free or cheap GPUs with essentially zero setup, and was designed to be as user friendly as I could make it. It was not created for deepfake purposes, but you can give that a try if you want (see the LIHQ Examples Colab and the secondary Demo Video). Some voices and face images will not give your desired output and will take a little trial and error to get right. LIHQ works best with a StyleGAN2 face and a simple narrator voice: a speaker video made with a StyleGAN2 face and a simple TorToiSe voice is straightforward to create and usually produces good output.

Update:

Bark (https://github.com/suno-ai/bark) now appears to be the open source state of the art for speech generation. Try it instead of TorToiSe.
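If you try Bark, a minimal generation sketch based on the bark README API looks like this (the text and output filename are just examples; the resulting WAV is what you would feed into LIHQ):

```python
# Sketch of generating speech with Bark (API per the suno-ai/bark README).
from scipy.io.wavfile import write as write_wav
from bark import SAMPLE_RATE, generate_audio, preload_models

preload_models()  # downloads/caches the Bark checkpoints on first run
audio_array = generate_audio("Hello, I am a synthetic speaker.")
write_wav("speech.wav", SAMPLE_RATE, audio_array)  # WAV to feed into LIHQ
```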

LIHQ Examples

How it works

Steps you need to take:

  1. Run Setup
  2. Create/Upload Audio (TorToiSe, VITS)
  3. Upload Speaker Face Image (StyleGAN2 preferred)
  4. (Optional) Add reference video
  5. (Optional) Replace Background

Steps the program takes (sketched in code after this list):

  1. Face/Head Motion Transfer (FOMM)
  2. Mouth Motion Generation (Wav2Lip)
  3. Upscale and Restoration (GFPGAN)
  4. Second Run Through (FOMM and GFPGAN)
  5. (Optional) Frame Interpolation (QVI) (noticeable improvement, but long inference)
  6. (Optional) Background Matting (MODNet)
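To orient yourself before opening the Colab, here is a minimal sketch of how those stages chain together. The helper functions are hypothetical stand-ins for the Colab cells that run each model; only the model names (FOMM, Wav2Lip, GFPGAN, QVI, MODNet) come from LIHQ itself:

```python
# Hypothetical orchestration sketch; each stub stands in for a Colab cell
# that runs the named model and writes a video. Names and paths are illustrative.

def fomm_transfer(face, ref_vid):   # FOMM: head/eye motion transfer
    return "fomm.mp4"

def wav2lip(video, audio):          # Wav2Lip: mouth motion from audio
    return "lipsync.mp4"

def gfpgan_restore(video):          # GFPGAN: restore/upscale the 256x256 output
    return "restored.mp4"

def qvi_interpolate(video):         # QVI: frame interpolation (optional)
    return "interpolated.mp4"

def modnet_matte(video):            # MODNet: background matting (optional)
    return "matted.mp4"

def run_lihq(face_img, audio_wav, ref_vid, interpolate=False, matte=False):
    out = gfpgan_restore(wav2lip(fomm_transfer(face_img, ref_vid), audio_wav))
    out = gfpgan_restore(fomm_transfer(out, ref_vid))  # second run through
    if interpolate:
        out = qvi_interpolate(out)
    if matte:
        out = modnet_matte(out)
    return out
```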

Pick out a face image that is forward-facing with a closed mouth (https://github.com/NVlabs/stylegan2), then upload existing audio or create audio of anyone you want with TorToiSe, which is built into the LIHQ Colab (https://github.com/neonbjb/tortoise-tts).
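If you are curious what the TorToiSe cell does, a minimal sketch based on the tortoise-tts README API looks like this (the voice name and output path are illustrative; in LIHQ this all happens inside the Colab):

```python
# Sketch of generating speech with TorToiSe (API per the neonbjb/tortoise-tts README).
import torchaudio
from tortoise.api import TextToSpeech
from tortoise.utils.audio import load_voice

tts = TextToSpeech()  # loads the TorToiSe models
voice_samples, conditioning_latents = load_voice("train_dotrice")  # a bundled voice
gen = tts.tts_with_preset(
    "Hello, I am a synthetic speaker.",
    voice_samples=voice_samples,
    conditioning_latents=conditioning_latents,
    preset="fast",  # trades some quality for speed
)
torchaudio.save("speech.wav", gen.squeeze(0).cpu(), 24000)  # TorToiSe outputs 24 kHz
```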

LIHQ first transfers head and eye movement from my default reference video to your face image using a First Order Motion Model. Wav2Lip then creates mouth movement from your audio and pastes it onto the FOMM output. Since that output is very low resolution (256x256), it is run through a face restoration and super-resolution model. Repeating the process a second time makes the video look even better, and if you want the highest quality, you can add frame interpolation at the end to increase the FPS.
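The mouth-motion stage corresponds to Wav2Lip's standard inference script (the flags are from the Wav2Lip repo; the file paths are illustrative of what LIHQ passes between stages). A sketch of invoking it from Python:

```python
import subprocess

# Run Wav2Lip inference on the FOMM output; paths are illustrative.
subprocess.run([
    "python", "inference.py",
    "--checkpoint_path", "checkpoints/wav2lip_gan.pth",  # pretrained Wav2Lip+GAN weights
    "--face", "fomm.mp4",      # video whose mouth will be re-animated
    "--audio", "speech.wav",   # driving audio
    "--outfile", "lipsync.mp4",
], check=True, cwd="Wav2Lip")
```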

Demo Video

LIHQ Demo Video

Above is the primary LIHQ demo video, showing the software used as intended. Deepfakes are possible with LIHQ as well, but they take a bit more work; check out some examples in the Deepfake Example Video.

Colabs

The Google Colabs contain a lot of information, tips, and tricks that are not in this README. The LIHQ Colab has a cell for every feature you might want or need, with an explanation of each. It is lengthy and a little cluttered, but I recommend reading through everything. LIHQ Examples has four different examples, each trimmed down to only what you need.

LIHQ

LIHQ Examples

Possible Future Work

  • Create more reference videos. Lip movement seems to work best at ref_vid_offset = 0, and I don't think that is a coincidence; it likely has to do with the FOMM motion transfer. I may create more reference videos of shorter length and with varying emotions.
  • Make Wav2Lip and FOMM optional, for when your reference video already has correct lip movement or you have a speaker video that already contains the target speaker.
  • Add a randomizer to the reference video offset.
  • Expand post-processing capabilities.
  • Revise for local use. I'm not sure what it would take; it's probably 95% of the way there, I just haven't tried it outside of Colab.

