
heygenclone's Introduction


Welcome 👋

I'm Denis, a 24-year-old developer from Russia 🇷🇺

  • 👨‍💻 Currently working at MTS Digital
  • 💼 I love server-side development and use .NET, but I also use Python for my pet projects!
  • 🏫 Professionally educated as a mathematician and ML engineer
  • 👨‍🏫 Here you will find many of my personal projects, a significant part of which is devoted to applying neural networks to practical tasks!

Feel free to discuss any project with me via any of the contact links!


heygenclone's People

Contributors

brasd99, zellux


heygenclone's Issues

Do you need help?

The repo hasn't been updated in a long time, and on top of that the algorithm's quality could be improved. So the question is: do you need help?

TypeError: LipSync.sync() missing 1 required positional argument: 'use_enhancer'

python -V
Python 3.10.10

python speech_changer.py 2.wav els.mp4 -o out.mp4
E:\Program\python\python3.10.10\lib\site-packages\torchvision\transforms\functional_tensor.py:5: UserWarning: The torchvision.transforms.functional_tensor module is deprecated in 0.15 and will be removed in 0.17. Please don't rely on it. You probably just need to use APIs in torchvision.transforms.functional or in torchvision.transforms.v2.functional.
warnings.warn(
onnx load done
Processing: 100%|████████████████████| 1/1 [00:28<00:00, 28.53s/it]
Processing frames: 93it [00:12, 7.70it/s]
Traceback (most recent call last):
File "E:\git\HeyGen\HeyGenClone\speech_changer.py", line 66, in
update_voice(
File "E:\git\HeyGen\HeyGenClone\speech_changer.py", line 46, in update_voice
frames = lip_sync.sync(frames, voice_filename, orig_clip.fps)
TypeError: LipSync.sync() missing 1 required positional argument: 'use_enhancer'

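A hedged sketch of a workaround (assuming use_enhancer is a boolean that toggles the optional face enhancer; check the LipSync.sync signature in the current repo to confirm):

# speech_changer.py, update_voice (around line 46): supply the newly
# required argument; False skips the enhancer pass
frames = lip_sync.sync(frames, voice_filename, orig_clip.fps, use_enhancer=False)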

The source video's audio is in English; when converting it to Chinese, the synthesized audio is basically unlistenable

It keeps printing "[!] Character '我' not found in the vocabulary. Discarding it." Is there no Chinese vocabulary, and is that why no speech is synthesized? Please help me understand, thanks.

The command and log are below:

python translate.py it_cut.mp4 chinese -o it-cn.mp4

tts_models/hak/fairseq/vits is already downloaded.
Setting up Audio Processor...
| > sample_rate:22050
| > resample:False
| > num_mels:80
| > log_func:np.log10
| > min_level_db:0
| > frame_shift_ms:None
| > frame_length_ms:None
| > ref_level_db:None
| > fft_size:1024
| > power:None
| > preemphasis:0.0
| > griffin_lim_iters:None
| > signal_norm:None
| > symmetric_norm:None
| > mel_fmin:0
| > mel_fmax:None
| > pitch_fmin:None
| > pitch_fmax:None
| > spec_gain:20.0
| > stft_pad_mode:reflect
| > max_norm:1.0
| > clip_norm:True
| > do_trim_silence:False
| > trim_db:60
| > do_sound_norm:False
| > do_amp_to_db_linear:True
| > do_amp_to_db_mel:True
| > do_rms_norm:False
| > db_level:None
| > stats_path:None
| > base:10
| > hop_length:256
| > win_length:1024
Lightning automatically upgraded your loaded checkpoint from v1.5.4 to v2.1.1. To apply the upgrade to your files permanently, run python -m pytorch_lightning.utilities.upgrade_checkpoint ../../.cache/torch/whisperx-vad-segmentation.bin
Model was trained with pyannote.audio 0.0.1, yours is 3.0.0. Bad things might happen unless you revert pyannote.audio to 0.x.
Model was trained with torch 1.10.0+cu102, yours is 2.1.0. Bad things might happen unless you revert torch to 1.x.
Lightning automatically upgraded your loaded checkpoint from v1.5.4 to v2.1.1. To apply the upgrade to your files permanently, run python -m pytorch_lightning.utilities.upgrade_checkpoint ../../.cache/torch/pyannote/models--pyannote--segmentation/snapshots/c4c8ceafcbb3a7a280c2d357aee9fbc9b0be7f9b/pytorch_model.bin
Model was trained with pyannote.audio 0.0.1, yours is 3.0.0. Bad things might happen unless you revert pyannote.audio to 0.x.
Model was trained with torch 1.10.0+cu102, yours is 2.1.0. Bad things might happen unless you revert torch to 1.x.
onnx load done
onnx load done
Processing: 100%|█████████████████████████████████| 2/2 [00:28<00:00, 14.28s/it]
Detected language: en (1.00) in first 30s of audio...
VideoManager is deprecated and will be removed.
base_timecode argument is deprecated and has no effect.
Face detector [scene_id: 1]: 165it [00:25, 6.52it/s]
Face detector [scene_id: 2]: 104it [00:17, 5.88it/s]
Face detector [scene_id: 3]: 126it [00:05, 24.59it/s]
Face detector [scene_id: 4]: 137it [00:24, 5.58it/s]
Face detector [scene_id: 5]: 83it [00:02, 30.47it/s]
Face detector [scene_id: 6]: 124it [00:22, 5.44it/s]
Face detector [scene_id: 7]: 105it [00:04, 24.33it/s]
Face detector [scene_id: 8]: 79it [00:16, 4.73it/s]
Text splitted to sentences.
['我是來自 FreeCodeCamp.org 的 Beau Carnes,在本課程中,我將向您展示如何使用 AI 來簡化基礎架構和網站的部署。']
我是來自 freecodecamp.org 的 beau carnes,在本課程中,我將向您展示如何使用 ai 來簡化基礎架構和網站的部署。
[!] Character '我' not found in the vocabulary. Discarding it.
我是來自 freecodecamp.org 的 beau carnes,在本課程中,我將向您展示如何使用 ai 來簡化基礎架構和網站的部署。
[!] Character '是' not found in the vocabulary. Discarding it.
我是來自 freecodecamp.org 的 beau carnes,在本課程中,我將向您展示如何使用 ai 來簡化基礎架構和網站的部署。
[!] Character '來' not found in the vocabulary. Discarding it.
我是來自 freecodecamp.org 的 beau carnes,在本課程中,我將向您展示如何使用 ai 來簡化基礎架構和網站的部署。
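
A likely cause, offered as a hedged guess rather than a confirmed fix: tts_models/hak/fairseq/vits is a Hakka model, and its character vocabulary does not include these Chinese characters, so every one is discarded and no speech is left to synthesize. A sketch of switching to a Mandarin-capable Coqui TTS model (the model name is one example from Coqui's public catalog):

from TTS.api import TTS

# list available models and pick a zh-CN entry instead of the hak checkpoint
print(TTS().list_models())
tts = TTS("tts_models/zh-CN/baker/tacotron2-DDC-GST")
tts.tts_to_file(text="我是來自 freecodecamp.org 的 beau carnes", file_path="out.wav")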

KeyError: 'text'

Traceback (most recent call last):
File "/home/bhits-003/xcelore/Gen Ai Audio/hey_gen/HeyGenClone-main/speech_changer.py", line 66, in
update_voice(
File "/home/bhits-003/xcelore/Gen Ai Audio/hey_gen/HeyGenClone-main/speech_changer.py", line 47, in update_voice
temp_result_avi = to_avi(frames, orig_clip.fps)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/bhits-003/xcelore/Gen Ai Audio/hey_gen/HeyGenClone-main/core/helpers.py", line 61, in to_avi
if frame['text']:
~~~~~^^^^^^^^
I am getting this error when I use the voice changer.
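
A hedged sketch of a defensive fix, assuming (as the traceback suggests) that some entries in frames are missing the 'text' key: using dict.get instead of direct indexing avoids the KeyError. The helper below is hypothetical; the real change would replace the direct frame['text'] lookup in core/helpers.py's to_avi (around line 61).

def frame_has_text(frame) -> bool:
    # .get returns None instead of raising when 'text' is absent
    return isinstance(frame, dict) and bool(frame.get('text'))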

About the use of scenedetect

Thanks for sharing this wonderful work! It's amazing. I notice that you use scenedetect at the beginning of the program. What is the purpose of using it? The process seems to work fine without scenedetect. Could you explain the advantage of using it?

Thanks for your attention! Have a nice day.
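
For context, a minimal sketch of what scene detection contributes (using PySceneDetect's v0.6 API; the repo's exact usage may differ): splitting the video at hard cuts lets the face detector and lip-sync run one shot at a time, which is why the logs earlier on this page show a separate "Face detector [scene_id: N]" pass per scene.

from scenedetect import detect, ContentDetector

# find hard cuts; each (start, end) pair bounds one shot
scenes = detect('input.mp4', ContentDetector())
for i, (start, end) in enumerate(scenes, 1):
    print(f'scene {i}: {start.get_timecode()} - {end.get_timecode()}')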

Chinese support

Hello, comrade. Is there currently a solution for translating into Chinese? If it's convenient, could you share some details? I would like to implement it myself. Thanks.

TypeError: issubclass() arg 1 must be a class

Attaching the entire terminal output for better understanding.

(LIP) rishabharora@Rishs-MBP HeyGenClone % python translate.py 'Green Screen.mp4' russian -o 'done.mp4'
Traceback (most recent call last):
File "/Users/rishabharora/Documents/GitHub/HeyGenClone/translate.py", line 5, in <module>
from core.engine import Engine
File "/Users/rishabharora/Documents/GitHub/HeyGenClone/core/engine.py", line 20, in <module>
from core.whisperx.asr import load_model, load_audio
File "/Users/rishabharora/Documents/GitHub/HeyGenClone/core/whisperx/asr.py", line 12, in <module>
from .vad import load_vad_model, merge_chunks
File "/Users/rishabharora/Documents/GitHub/HeyGenClone/core/whisperx/vad.py", line 9, in <module>
from pyannote.audio import Model
File "/opt/anaconda3/envs/LIP/lib/python3.9/site-packages/pyannote/audio/__init__.py", line 29, in <module>
from .core.inference import Inference
File "/opt/anaconda3/envs/LIP/lib/python3.9/site-packages/pyannote/audio/core/inference.py", line 36, in <module>
from pyannote.audio.core.model import Model, Specifications
File "/opt/anaconda3/envs/LIP/lib/python3.9/site-packages/pyannote/audio/core/model.py", line 47, in <module>
from pyannote.audio.core.task import (
File "/opt/anaconda3/envs/LIP/lib/python3.9/site-packages/pyannote/audio/core/task.py", line 43, in <module>
from pyannote.database import Protocol
File "/opt/anaconda3/envs/LIP/lib/python3.9/site-packages/pyannote/database/__init__.py", line 36, in <module>
from .registry import registry, LoadingMode
File "/opt/anaconda3/envs/LIP/lib/python3.9/site-packages/pyannote/database/registry.py", line 38, in <module>
from .custom import create_protocol, get_init
File "/opt/anaconda3/envs/LIP/lib/python3.9/site-packages/pyannote/database/custom.py", line 66, in <module>
from .loader import load_lst, load_trial
File "/opt/anaconda3/envs/LIP/lib/python3.9/site-packages/pyannote/database/loader.py", line 44, in <module>
from spacy.tokens import Token
File "/opt/anaconda3/envs/LIP/lib/python3.9/site-packages/spacy/__init__.py", line 14, in <module>
from . import pipeline # noqa: F401
File "/opt/anaconda3/envs/LIP/lib/python3.9/site-packages/spacy/pipeline/__init__.py", line 1, in <module>
from .attributeruler import AttributeRuler
File "/opt/anaconda3/envs/LIP/lib/python3.9/site-packages/spacy/pipeline/attributeruler.py", line 6, in <module>
from .pipe import Pipe
File "spacy/pipeline/pipe.pyx", line 8, in init spacy.pipeline.pipe
File "/opt/anaconda3/envs/LIP/lib/python3.9/site-packages/spacy/training/__init__.py", line 11, in <module>
from .callbacks import create_copy_from_base_model # noqa: F401
File "/opt/anaconda3/envs/LIP/lib/python3.9/site-packages/spacy/training/callbacks.py", line 3, in <module>
from ..language import Language
File "/opt/anaconda3/envs/LIP/lib/python3.9/site-packages/spacy/language.py", line 25, in <module>
from .training.initialize import init_vocab, init_tok2vec
File "/opt/anaconda3/envs/LIP/lib/python3.9/site-packages/spacy/training/initialize.py", line 14, in <module>
from .pretrain import get_tok2vec_ref
File "/opt/anaconda3/envs/LIP/lib/python3.9/site-packages/spacy/training/pretrain.py", line 16, in <module>
from ..schemas import ConfigSchemaPretrain
File "/opt/anaconda3/envs/LIP/lib/python3.9/site-packages/spacy/schemas.py", line 216, in <module>
class TokenPattern(BaseModel):
File "/opt/anaconda3/envs/LIP/lib/python3.9/site-packages/pydantic/main.py", line 299, in __new__
fields[ann_name] = ModelField.infer(
File "/opt/anaconda3/envs/LIP/lib/python3.9/site-packages/pydantic/fields.py", line 411, in infer
return cls(
File "/opt/anaconda3/envs/LIP/lib/python3.9/site-packages/pydantic/fields.py", line 342, in __init__
self.prepare()
File "/opt/anaconda3/envs/LIP/lib/python3.9/site-packages/pydantic/fields.py", line 451, in prepare
self._type_analysis()
File "/opt/anaconda3/envs/LIP/lib/python3.9/site-packages/pydantic/fields.py", line 545, in _type_analysis
self._type_analysis()
File "/opt/anaconda3/envs/LIP/lib/python3.9/site-packages/pydantic/fields.py", line 550, in _type_analysis
if issubclass(origin, Tuple): # type: ignore
File "/opt/anaconda3/envs/LIP/lib/python3.9/typing.py", line 852, in __subclasscheck__
return issubclass(cls, self.__origin__)
TypeError: issubclass() arg 1 must be a class
(LIP) rishabharora@Rishs-MBP HeyGenClone %

About the requirements.txt file

Hello, I try to install the package but there is always a conflict in the requirements file.
For instance, audiostretchy==1.3.5 requires numpy>=1.23, but TTS==0.17.6 requires numpy==1.22.

TTS==0.17.6 will also install the latest torch (currently 2.1.0).

How can I solve this issue? Thanks in advance.

Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0!

I know the issue is that the data is not all on the same device; I have changed it several times, but it did not work.

Model was trained with pyannote.audio 0.0.1, yours is 3.0.1. Bad things might happen unless you revert pyannote.audio to 0.x.
Model was trained with torch 1.10.0+cu102, yours is 2.1.0+cu118. Bad things might happen unless you revert torch to 1.x.
device cuda
onnx load done
onnx load done
Processing:   0%|          | 0/18 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/mnt/sda/github/11yue/HeyGenClone/translate_fxy.py", line 35, in <module>
    translate(
  File "/mnt/sda/github/11yue/HeyGenClone/translate_fxy.py", line 12, in translate
    engine(video_filename, output_filename)
  File "/mnt/sda/github/11yue/HeyGenClone/core/engine.py", line 60, in __call__
    dereverb_out = self.dereverb.split(original_audio_file)
  File "/mnt/sda/github/11yue/HeyGenClone/core/dereverb.py", line 233, in split
    return self.pred.prediction(input)
  File "/mnt/sda/github/11yue/HeyGenClone/core/dereverb.py", line 201, in prediction
    sources = self.demix(mix.T)
  File "/mnt/sda/github/11yue/HeyGenClone/core/dereverb.py", line 126, in demix
    sources = self.demix_base(segmented_mix, margin_size=margin)
  File "/mnt/sda/github/11yue/HeyGenClone/core/dereverb.py", line 168, in demix_base
    tar_waves = model.istft(torch.tensor(spec_pred))
  File "/mnt/sda/github/11yue/HeyGenClone/core/dereverb.py", line 60, in istft
    x = torch.cat([x, freq_pad], -2)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument tensors in method wrapper_CUDA_cat)
Processing:   0%|          | 0/18 [00:04<?, ?it/s]
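
A hedged sketch of one possible fix (assuming, as the traceback suggests, that torch.tensor(spec_pred) in demix_base produces a CPU tensor while x lives on cuda:0; not verified against the repo):

# core/dereverb.py, istft (around line 60): move the padding onto the same
# device as x before concatenating
x = torch.cat([x, freq_pad.to(x.device)], -2)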

Need to Add Streaming Feature for Avatar

My task is to stream the avatar using three.js, but I am not able to do so. Please help: how can I build this pipeline with minimum latency so that a user can converse with my AI avatar?

TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.

Hi, I ran the program successfully according to the tutorial, but it has the following problem:
Traceback (most recent call last):
File "/root/autodl-tmp/HeyGenClone/translate.py", line 44, in
translate(
File "/root/autodl-tmp/HeyGenClone/translate.py", line 34, in translate
engine(video_filename, output_filename)
File "/root/autodl-tmp/HeyGenClone/core/engine.py", line 50, in call
dereverb_out = self.dereverb.split(original_audio_file)
File "/root/autodl-tmp/HeyGenClone/core/dereverb.py", line 225, in split
return self.pred.prediction(input)
File "/root/autodl-tmp/HeyGenClone/core/dereverb.py", line 194, in prediction
sources = self.demix(mix.T)
File "/root/autodl-tmp/HeyGenClone/core/dereverb.py", line 122, in demix
sources = self.demix_base(segmented_mix, margin_size=margin)
File "/root/autodl-tmp/HeyGenClone/core/dereverb.py", line 156, in demix_base
input_data_1 = -spek.cuda().numpy() if self.device.type == 'cuda' else -spek.cpu().numpy()
TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.

My environment information is below:
Ubuntu 20.04
Python 3.9
NVIDIA 3080
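
A hedged sketch of a fix (not verified against the repo): .numpy() only works on host tensors, and .cpu() is a no-op for data already on the CPU, so the device branch can be dropped entirely.

# core/dereverb.py, demix_base (around line 156): copy to host memory
# before converting to numpy, regardless of device
input_data_1 = -spek.detach().cpu().numpy()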

ValueError: You have tensorflow 2.16.1 and this requires tf-keras package

After following the install instructions I get the following error:

ValueError: You have tensorflow 2.16.1 and this requires tf-keras package. Please run pip install tf-keras or downgrade your tensorflow.

After pip install tf-keras I get the following:

ImportError: cannot import name 'distance' from 'deepface.commons'

Does anybody know what I am doing wrong?

About voice tools

Need some recommendations on this. coqui-ai is definitely not a good choice. Maybe sovits could be one. VALL-E needs further development or support.

Regarding the issue of synchronizing human voice audio

Hello, I have a question about a difficulty I'm encountering while replicating the HeyGen functionality. The core steps currently include text translation and text-to-speech (TTS). I see that the project uses the Google translation engine; let's set translation accuracy aside for now. My question is: since text length varies after translation between languages, the dubbed speech produced by TTS will also vary in length. When I need to keep the duration of the final output video consistent with the original, how can I synchronize the dubbed voice audio with the original footage? Do you have any suggested methods?
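
One common approach, sketched under the assumption that the audiostretchy package (which this repo already depends on) is acceptable: time-stretch each dubbed segment so its duration matches the original speech segment, then lay the stretched audio back over the video. The filenames and durations below are example values.

from audiostretchy.stretch import stretch_audio

orig_duration = 4.2  # seconds of the original speech segment
tts_duration = 5.1   # seconds of the TTS output for that segment

# ratio is new length / old length: <1 shortens the audio, >1 lengthens it
stretch_audio('tts_segment.wav', 'tts_segment_fit.wav',
              ratio=orig_duration / tts_duration)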

AttributeError: 'SpeakerDiarization' object has no attribute 'to'

I found the same issue: "feat: send pipeline to device with Pipeline.to(device)"

Using device: cuda
model_name is pyannote/[email protected]
Traceback (most recent call last):
  File "/mnt/sda/github/11yue/HeyGenClone/translate_fxy.py", line 47, in <module>
    translate(
  File "/mnt/sda/github/11yue/HeyGenClone/translate_fxy.py", line 33, in translate
    engine = Engine(config, output_language)
  File "/mnt/sda/github/11yue/HeyGenClone/core/engine.py", line 38, in __init__
    self.diarize_model = DiarizationPipeline(use_auth_token=config['HF_TOKEN'], device=self.device)
  File "/mnt/sda/github/11yue/HeyGenClone/core/whisperx/diarize.py", line 21, in __init__
    self.model = self.model.to(device)
  File "/opt/miniconda3/envs/heygenclone/lib/python3.10/site-packages/pyannote/pipeline/pipeline.py", line 100, in __getattr__
    raise AttributeError(msg)
AttributeError: 'SpeakerDiarization' object has no attribute 'to'
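
A hedged sketch of a workaround (upgrading pyannote.audio to a release that includes the Pipeline.to(device) feature linked above is the cleaner fix; this sketch assumes you need to stay on the older version):

# core/whisperx/diarize.py (around line 21): older pyannote pipelines have
# no .to() method, so guard the call instead of assuming it exists
if hasattr(self.model, 'to'):
    self.model = self.model.to(device)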
