HeyGenClone

Welcome to HeyGenClone, an open-source analogue of the HeyGen system.

I am a developer from Moscow 🇷🇺 who devotes his free time to studying new technologies. The project is in active development, but I hope it will help you achieve your goals!

Currently, translation is supported only from English 🇬🇧!

Installation 🥸

  • Clone this repo
  • Install conda
  • Create an environment with Python 3.10 (for macOS, refer to link)
  • Activate the environment
  • Install requirements:
    cd path_to_project
    sh install.sh
    
  • In config.json, set HF_TOKEN to your HuggingFace token. Visit the speaker-diarization and segmentation pages and accept their user conditions
  • Download the weights from drive and unzip the downloaded file into the weights folder
  • Install ffmpeg
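
The checklist above can be verified before the first run. The following is a minimal sketch, not part of the project: the function name preflight is hypothetical, and it assumes that config.json and the weights folder sit in the project root and that an unset HF_TOKEN is an empty string.

```python
import json
import shutil
import sys
from pathlib import Path


def preflight(project_dir, python_version=None):
    """Return a list of setup problems; an empty list means every check passed."""
    root = Path(project_dir)
    version = tuple(python_version or sys.version_info[:2])
    problems = []

    # The installation steps ask for a Python 3.10 environment.
    if version != (3, 10):
        problems.append(f"Python 3.10 expected, found {version[0]}.{version[1]}")

    # ffmpeg must be installed and reachable on PATH.
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")

    # Assumption: config.json lives in the project root.
    config_path = root / "config.json"
    if not config_path.is_file():
        problems.append("config.json not found")
    else:
        config = json.loads(config_path.read_text(encoding="utf-8"))
        token = config.get("HF_TOKEN")
        # Assumption: an unset token is missing or an empty string.
        if not isinstance(token, str) or not token.strip():
            problems.append("HF_TOKEN in config.json is empty")

    # Assumption: the unzipped weights end up in <project>/weights.
    weights_dir = root / "weights"
    if not weights_dir.is_dir() or not any(weights_dir.iterdir()):
        problems.append("weights folder is missing or empty")

    return problems


if __name__ == "__main__":
    for problem in preflight("."):
        print("Problem:", problem)
```

Run it from the project root; no output means that none of the checks found a problem.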

Configurations (config.json) 🧙‍♂️

Key             Description
DET_TRESH       Face detection threshold [0.0:1.0]
DIST_TRESH      Face embeddings distance threshold [0.0:1.0]
HF_TOKEN        Your HuggingFace token (see Installation)
USE_ENHANCER    Whether to improve faces using GFPGAN
ADD_SUBTITLES   Whether to add subtitles to the output video
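
To make the expected shape of these settings concrete, here is a small validation sketch. It is not code from the project: validate_config is a hypothetical helper, the example values are placeholders rather than the project's defaults, and treating USE_ENHANCER and ADD_SUBTITLES as booleans is an assumption based on their descriptions.

```python
import json


def validate_config(config):
    """Return a list of problems found in a parsed config.json dictionary."""
    errors = []

    # Both thresholds are documented as values in [0.0:1.0].
    for key in ("DET_TRESH", "DIST_TRESH"):
        value = config.get(key)
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if not is_number or not 0.0 <= value <= 1.0:
            errors.append(f"{key} must be a number in [0.0, 1.0]")

    token = config.get("HF_TOKEN")
    if not isinstance(token, str) or not token.strip():
        errors.append("HF_TOKEN must be a non-empty string")

    # Assumption: these two switches are JSON booleans.
    for key in ("USE_ENHANCER", "ADD_SUBTITLES"):
        if not isinstance(config.get(key), bool):
            errors.append(f"{key} must be true or false")

    return errors


# Hypothetical values for illustration only; they are not the project's defaults.
example = json.loads("""
{
  "DET_TRESH": 0.3,
  "DIST_TRESH": 0.2,
  "HF_TOKEN": "hf_your_token_here",
  "USE_ENHANCER": false,
  "ADD_SUBTITLES": true
}
""")
print(validate_config(example))  # prints []
```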

Supported languages 🙂

English (en), Spanish (es), French (fr), German (de), Italian (it), Portuguese (pt), Polish (pl), Turkish (tr), Russian (ru), Dutch (nl), Czech (cs), Arabic (ar), Chinese (zh-cn), Japanese (ja), Hungarian (hu) and Korean (ko)
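
If you call the scripts from your own code, the list above can be kept as a lookup table. This is only a sketch: SUPPORTED_LANGUAGES and resolve_language are hypothetical names, and the project may store this list differently.

```python
# Codes and names copied from the list above.
SUPPORTED_LANGUAGES = {
    "en": "English",
    "es": "Spanish",
    "fr": "French",
    "de": "German",
    "it": "Italian",
    "pt": "Portuguese",
    "pl": "Polish",
    "tr": "Turkish",
    "ru": "Russian",
    "nl": "Dutch",
    "cs": "Czech",
    "ar": "Arabic",
    "zh-cn": "Chinese",
    "ja": "Japanese",
    "hu": "Hungarian",
    "ko": "Korean",
}


def resolve_language(value):
    """Map a language code or English name (case-insensitive) to its code."""
    key = value.strip().lower()
    if key in SUPPORTED_LANGUAGES:
        return key
    for code, name in SUPPORTED_LANGUAGES.items():
        if name.lower() == key:
            return code
    raise ValueError(f"Unsupported language: {value!r}")


print(resolve_language("Russian"))  # prints ru
```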

Usage 🤩

  • Activate your environment:
  conda activate your_env_name
  • Change to the project path:
  cd path_to_project

At the root of the project there is a translate.py script that translates the video you specify.

  • video_filename - the filename of your input video (.mp4)
  • output_language - the language to translate into. The available values are provided here (you can also find them in my code)
  • output_filename - the filename of output video (.mp4)
python translate.py video_filename output_language -o output_filename
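
To translate one video into several languages, the command above can be wrapped in a short script. This is a hedged sketch rather than a project feature: build_translate_command and translate_to_many are hypothetical helpers, the output naming scheme is invented for the example, and whether output_language expects a code such as ru or a full name is not stated here, so check the list linked above before running it.

```python
import subprocess
import sys
from pathlib import Path


def build_translate_command(video_filename, output_language, output_filename):
    """Build the documented call: translate.py video_filename output_language -o output_filename."""
    return [
        sys.executable,
        "translate.py",
        str(video_filename),
        output_language,
        "-o",
        str(output_filename),
    ]


def translate_to_many(video_filename, languages, project_dir=".", runner=subprocess.run):
    """Run translate.py once per language and return the output paths."""
    video = Path(video_filename)
    outputs = []
    for language in languages:
        # Hypothetical naming scheme: talk.mp4 -> talk_ru.mp4
        output = video.with_name(f"{video.stem}_{language}.mp4")
        command = build_translate_command(video, language, output)
        # check=True raises CalledProcessError if the script exits with an error.
        runner(command, cwd=project_dir, check=True)
        outputs.append(output)
    return outputs
```

A call would look like translate_to_many("talk.mp4", ["ru", "de"], project_dir="path_to_project"), run inside the activated environment. A relative video path is resolved against project_dir, because the script is started from there.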

I also added a script that overlays a voice on a video with lip sync, which lets you create a video of a person pronouncing your speech. Currently it works only for videos with one person.

  • voice_filename - the filename of your speech (.wav)
  • video_filename - the filename of your input video (.mp4)
  • output_filename - the filename of output video (.mp4)
python speech_changer.py voice_filename video_filename -o output_filename

How it works 😱

  1. Detecting scenes (PySceneDetect)
  2. Face detection (yolov8-face)
  3. Reidentification (deepface)
  4. Speech enhancement (MDXNet)
  5. Speaker transcription and diarization (whisperX)
  6. Text translation (googletrans)
  7. Voice cloning (TTS)
  8. Lip sync (lipsync)
  9. Face restoration (GFPGAN)
  10. [Need to fix] Search for talking faces, determining what this person is saying
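
The order of the steps above can be pictured as a chain of stages that pass a shared context along. The sketch below only illustrates that ordering: the stages are placeholders that record their names, none of the listed libraries is called, and the project's actual code structure may look quite different. Step 10 is left out because it is marked as needing a fix.

```python
# (step name, tool) pairs in the order listed above.
STEPS = [
    ("Detecting scenes", "PySceneDetect"),
    ("Face detection", "yolov8-face"),
    ("Reidentification", "deepface"),
    ("Speech enhancement", "MDXNet"),
    ("Speaker transcription and diarization", "whisperX"),
    ("Text translation", "googletrans"),
    ("Voice cloning", "TTS"),
    ("Lip sync", "lipsync"),
    ("Face restoration", "GFPGAN"),
]


def make_placeholder_stage(name, tool):
    """Return a stand-in stage that only logs its name; real work would go here."""
    def stage(context):
        updated = dict(context)
        updated["log"] = list(context.get("log", [])) + [f"{name} ({tool})"]
        return updated
    return stage


PIPELINE = [make_placeholder_stage(name, tool) for name, tool in STEPS]


def run_pipeline(stages, context):
    """Feed the context through every stage in order and return the final context."""
    for stage in stages:
        context = stage(context)
    return context


result = run_pipeline(PIPELINE, {"video": "input.mp4"})
for number, entry in enumerate(result["log"], start=1):
    print(number, entry)
```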

Translation results 🥺

Note that this example was created without GFPGAN usage!

Destination language   Source video      Output video
🇷🇺 (Russian)           Watch the video   Watch the video

To-Do List 🤷🏼‍♂️

  • Full GPU support
  • Multithreading support (optimizations)
  • Detecting talking faces (improvement)

Other 🤘🏻

  • Tested on macOS
  • ⚠️ The project is under development!

Contributors

brasd99, zellux