
heygenclone's Introduction


Welcome 👋

I'm Denis, a 24-year-old developer from Russia 🇷🇺

  • 👨‍💻 Currently working at MTS Digital
  • 💼 I love server-side development and use .NET, but I also use Python for my pet projects!
  • 🏫 Professionally educated as a mathematician and ML engineer
  • 👨‍🏫 Here you will find many of my personal projects, a significant part of which is devoted to applying neural networks to practical tasks!

Feel free to discuss any project with me via any of the contact links!


heygenclone's People

Contributors

brasd99, zellux


heygenclone's Issues

Do you need help?

The repo hasn't been updated in a long time, and on top of that the algorithm's quality could be improved. So the question is: do you need help?

TypeError: LipSync.sync() missing 1 required positional argument: 'use_enhancer'

python -V
Python 3.10.10

python speech_changer.py 2.wav els.mp4 -o out.mp4
E:\Program\python\python3.10.10\lib\site-packages\torchvision\transforms\functional_tensor.py:5: UserWarning: The torchvision.transforms.functional_tensor module is deprecated in 0.15 and will be removed in 0.17. Please don't rely on it. You probably just need to use APIs in torchvision.transforms.functional or in torchvision.transforms.v2.functional.
warnings.warn(
onnx load done
Processing: 100%|████████████████████| 1/1 [00:28<00:00, 28.53s/it]
Processing frames: 93it [00:12, 7.70it/s]
Traceback (most recent call last):
File "E:\git\HeyGen\HeyGenClone\speech_changer.py", line 66, in
update_voice(
File "E:\git\HeyGen\HeyGenClone\speech_changer.py", line 46, in update_voice
frames = lip_sync.sync(frames, voice_filename, orig_clip.fps)
TypeError: LipSync.sync() missing 1 required positional argument: 'use_enhancer'

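A hedged sketch of a workaround (assuming use_enhancer is a boolean that toggles the optional face enhancer; check the LipSync.sync signature in the current repo to confirm):

# speech_changer.py, update_voice (around line 46): supply the newly
# required argument; False skips the enhancer pass
frames = lip_sync.sync(frames, voice_filename, orig_clip.fps, use_enhancer=False)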

The source video's audio is in English; when converting it to Chinese, the synthesized audio is basically unlistenable

It keeps printing "[!] Character '我' not found in the vocabulary. Discarding it." Is there no Chinese vocabulary, and is that why no speech is synthesized? Please help me understand, thanks.

The command and log are below:

python translate.py it_cut.mp4 chinese -o it-cn.mp4

tts_models/hak/fairseq/vits is already downloaded.
Setting up Audio Processor...
| > sample_rate:22050
| > resample:False
| > num_mels:80
| > log_func:np.log10
| > min_level_db:0
| > frame_shift_ms:None
| > frame_length_ms:None
| > ref_level_db:None
| > fft_size:1024
| > power:None
| > preemphasis:0.0
| > griffin_lim_iters:None
| > signal_norm:None
| > symmetric_norm:None
| > mel_fmin:0
| > mel_fmax:None
| > pitch_fmin:None
| > pitch_fmax:None
| > spec_gain:20.0
| > stft_pad_mode:reflect
| > max_norm:1.0
| > clip_norm:True
| > do_trim_silence:False
| > trim_db:60
| > do_sound_norm:False
| > do_amp_to_db_linear:True
| > do_amp_to_db_mel:True
| > do_rms_norm:False
| > db_level:None
| > stats_path:None
| > base:10
| > hop_length:256
| > win_length:1024
Lightning automatically upgraded your loaded checkpoint from v1.5.4 to v2.1.1. To apply the upgrade to your files permanently, run python -m pytorch_lightning.utilities.upgrade_checkpoint ../../.cache/torch/whisperx-vad-segmentation.bin
Model was trained with pyannote.audio 0.0.1, yours is 3.0.0. Bad things might happen unless you revert pyannote.audio to 0.x.
Model was trained with torch 1.10.0+cu102, yours is 2.1.0. Bad things might happen unless you revert torch to 1.x.
Lightning automatically upgraded your loaded checkpoint from v1.5.4 to v2.1.1. To apply the upgrade to your files permanently, run python -m pytorch_lightning.utilities.upgrade_checkpoint ../../.cache/torch/pyannote/models--pyannote--segmentation/snapshots/c4c8ceafcbb3a7a280c2d357aee9fbc9b0be7f9b/pytorch_model.bin
Model was trained with pyannote.audio 0.0.1, yours is 3.0.0. Bad things might happen unless you revert pyannote.audio to 0.x.
Model was trained with torch 1.10.0+cu102, yours is 2.1.0. Bad things might happen unless you revert torch to 1.x.
onnx load done
onnx load done
Processing: 100%|█████████████████████████████████| 2/2 [00:28<00:00, 14.28s/it]
Detected language: en (1.00) in first 30s of audio...
VideoManager is deprecated and will be removed.
base_timecode argument is deprecated and has no effect.
Face detector [scene_id: 1]: 165it [00:25, 6.52it/s]
Face detector [scene_id: 2]: 104it [00:17, 5.88it/s]
Face detector [scene_id: 3]: 126it [00:05, 24.59it/s]
Face detector [scene_id: 4]: 137it [00:24, 5.58it/s]
Face detector [scene_id: 5]: 83it [00:02, 30.47it/s]
Face detector [scene_id: 6]: 124it [00:22, 5.44it/s]
Face detector [scene_id: 7]: 105it [00:04, 24.33it/s]
Face detector [scene_id: 8]: 79it [00:16, 4.73it/s]
Text splitted to sentences.
['我是來自 FreeCodeCamp.org 的 Beau Carnes,在本課程中,我將向您展示如何使用 AI 來簡化基礎架構和網站的部署。']
我是來自 freecodecamp.org 的 beau carnes,在本課程中,我將向您展示如何使用 ai 來簡化基礎架構和網站的部署。
[!] Character '我' not found in the vocabulary. Discarding it.
我是來自 freecodecamp.org 的 beau carnes,在本課程中,我將向您展示如何使用 ai 來簡化基礎架構和網站的部署。
[!] Character '是' not found in the vocabulary. Discarding it.
我是來自 freecodecamp.org 的 beau carnes,在本課程中,我將向您展示如何使用 ai 來簡化基礎架構和網站的部署。
[!] Character '來' not found in the vocabulary. Discarding it.
我是來自 freecodecamp.org 的 beau carnes,在本課程中,我將向您展示如何使用 ai 來簡化基礎架構和網站的部署。
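
A likely cause, offered as a hedged guess rather than a confirmed fix: tts_models/hak/fairseq/vits is a Hakka model, and its character vocabulary does not include these Chinese characters, so every one is discarded and no speech is left to synthesize. A sketch of switching to a Mandarin-capable Coqui TTS model (the model name is one example from Coqui's public catalog):

from TTS.api import TTS

# list available models and pick a zh-CN entry instead of the hak checkpoint
print(TTS().list_models())
tts = TTS("tts_models/zh-CN/baker/tacotron2-DDC-GST")
tts.tts_to_file(text="我是來自 freecodecamp.org 的 beau carnes", file_path="out.wav")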

KeyError: 'text'

Traceback (most recent call last):
File "/home/bhits-003/xcelore/Gen Ai Audio/hey_gen/HeyGenClone-main/speech_changer.py", line 66, in
update_voice(
File "/home/bhits-003/xcelore/Gen Ai Audio/hey_gen/HeyGenClone-main/speech_changer.py", line 47, in update_voice
temp_result_avi = to_avi(frames, orig_clip.fps)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/bhits-003/xcelore/Gen Ai Audio/hey_gen/HeyGenClone-main/core/helpers.py", line 61, in to_avi
if frame['text']:
~~~~~^^^^^^^^
I am getting this error when I use the voice changer.
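
A hedged sketch of a defensive fix, assuming (as the traceback suggests) that some entries in frames are missing the 'text' key: using dict.get instead of direct indexing avoids the KeyError. The helper below is hypothetical; the real change would replace the direct frame['text'] lookup in core/helpers.py's to_avi (around line 61).

def frame_has_text(frame) -> bool:
    # .get returns None instead of raising when 'text' is absent
    return isinstance(frame, dict) and bool(frame.get('text'))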

About the use of scenedetect

Thanks for sharing this wonderful work! It's amazing. I notice that you use scenedetect at the beginning of the program. What is the purpose of using it? The process seems to work fine without scenedetect. Could you explain the advantage of using it?

Thanks for your attention! Have a nice day.
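
For context, a minimal sketch of what scene detection contributes (using PySceneDetect's v0.6 API; the repo's exact usage may differ): splitting the video at hard cuts lets the face detector and lip-sync run one shot at a time, which is why the logs earlier on this page show a separate "Face detector [scene_id: N]" pass per scene.

from scenedetect import detect, ContentDetector

# find hard cuts; each (start, end) pair bounds one shot
scenes = detect('input.mp4', ContentDetector())
for i, (start, end) in enumerate(scenes, 1):
    print(f'scene {i}: {start.get_timecode()} - {end.get_timecode()}')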

Chinese support

Hello, comrade. Is there currently a solution for translating into Chinese? If it's convenient, could you share some details? I would like to implement it myself. Thanks.

TypeError: issubclass() arg 1 must be a class

Attaching the entire terminal output for better understanding.

(LIP) rishabharora@Rishs-MBP HeyGenClone % python translate.py 'Green Screen.mp4' russian -o 'done.mp4'
Traceback (most recent call last):
File "/Users/rishabharora/Documents/GitHub/HeyGenClone/translate.py", line 5, in <module>
from core.engine import Engine
File "/Users/rishabharora/Documents/GitHub/HeyGenClone/core/engine.py", line 20, in <module>
from core.whisperx.asr import load_model, load_audio
File "/Users/rishabharora/Documents/GitHub/HeyGenClone/core/whisperx/asr.py", line 12, in <module>
from .vad import load_vad_model, merge_chunks
File "/Users/rishabharora/Documents/GitHub/HeyGenClone/core/whisperx/vad.py", line 9, in <module>
from pyannote.audio import Model
File "/opt/anaconda3/envs/LIP/lib/python3.9/site-packages/pyannote/audio/__init__.py", line 29, in <module>
from .core.inference import Inference
File "/opt/anaconda3/envs/LIP/lib/python3.9/site-packages/pyannote/audio/core/inference.py", line 36, in <module>
from pyannote.audio.core.model import Model, Specifications
File "/opt/anaconda3/envs/LIP/lib/python3.9/site-packages/pyannote/audio/core/model.py", line 47, in <module>
from pyannote.audio.core.task import (
File "/opt/anaconda3/envs/LIP/lib/python3.9/site-packages/pyannote/audio/core/task.py", line 43, in <module>
from pyannote.database import Protocol
File "/opt/anaconda3/envs/LIP/lib/python3.9/site-packages/pyannote/database/__init__.py", line 36, in <module>
from .registry import registry, LoadingMode
File "/opt/anaconda3/envs/LIP/lib/python3.9/site-packages/pyannote/database/registry.py", line 38, in <module>
from .custom import create_protocol, get_init
File "/opt/anaconda3/envs/LIP/lib/python3.9/site-packages/pyannote/database/custom.py", line 66, in <module>
from .loader import load_lst, load_trial
File "/opt/anaconda3/envs/LIP/lib/python3.9/site-packages/pyannote/database/loader.py", line 44, in <module>
from spacy.tokens import Token
File "/opt/anaconda3/envs/LIP/lib/python3.9/site-packages/spacy/__init__.py", line 14, in <module>
from . import pipeline # noqa: F401
File "/opt/anaconda3/envs/LIP/lib/python3.9/site-packages/spacy/pipeline/__init__.py", line 1, in <module>
from .attributeruler import AttributeRuler
File "/opt/anaconda3/envs/LIP/lib/python3.9/site-packages/spacy/pipeline/attributeruler.py", line 6, in <module>
from .pipe import Pipe
File "spacy/pipeline/pipe.pyx", line 8, in init spacy.pipeline.pipe
File "/opt/anaconda3/envs/LIP/lib/python3.9/site-packages/spacy/training/__init__.py", line 11, in <module>
from .callbacks import create_copy_from_base_model # noqa: F401
File "/opt/anaconda3/envs/LIP/lib/python3.9/site-packages/spacy/training/callbacks.py", line 3, in <module>
from ..language import Language
File "/opt/anaconda3/envs/LIP/lib/python3.9/site-packages/spacy/language.py", line 25, in <module>
from .training.initialize import init_vocab, init_tok2vec
File "/opt/anaconda3/envs/LIP/lib/python3.9/site-packages/spacy/training/initialize.py", line 14, in <module>
from .pretrain import get_tok2vec_ref
File "/opt/anaconda3/envs/LIP/lib/python3.9/site-packages/spacy/training/pretrain.py", line 16, in <module>
from ..schemas import ConfigSchemaPretrain
File "/opt/anaconda3/envs/LIP/lib/python3.9/site-packages/spacy/schemas.py", line 216, in <module>
class TokenPattern(BaseModel):
File "/opt/anaconda3/envs/LIP/lib/python3.9/site-packages/pydantic/main.py", line 299, in __new__
fields[ann_name] = ModelField.infer(
File "/opt/anaconda3/envs/LIP/lib/python3.9/site-packages/pydantic/fields.py", line 411, in infer
return cls(
File "/opt/anaconda3/envs/LIP/lib/python3.9/site-packages/pydantic/fields.py", line 342, in __init__
self.prepare()
File "/opt/anaconda3/envs/LIP/lib/python3.9/site-packages/pydantic/fields.py", line 451, in prepare
self._type_analysis()
File "/opt/anaconda3/envs/LIP/lib/python3.9/site-packages/pydantic/fields.py", line 545, in _type_analysis
self._type_analysis()
File "/opt/anaconda3/envs/LIP/lib/python3.9/site-packages/pydantic/fields.py", line 550, in _type_analysis
if issubclass(origin, Tuple): # type: ignore
File "/opt/anaconda3/envs/LIP/lib/python3.9/typing.py", line 852, in __subclasscheck__
return issubclass(cls, self.__origin__)
TypeError: issubclass() arg 1 must be a class
(LIP) rishabharora@Rishs-MBP HeyGenClone %

About the requirements.txt file

Hello, I try to install the package but there is always a conflict in the requirements file.
For instance, audiostretchy==1.3.5 requires numpy>=1.23, but TTS==0.17.6 requires numpy==1.22.

TTS==0.17.6 will also install the latest torch (currently 2.1.0).

How can I solve this issue? Thanks in advance.

Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0!

I know the issue is that the data is not all on the same device; I have changed it several times, but it did not work.

Model was trained with pyannote.audio 0.0.1, yours is 3.0.1. Bad things might happen unless you revert pyannote.audio to 0.x.
Model was trained with torch 1.10.0+cu102, yours is 2.1.0+cu118. Bad things might happen unless you revert torch to 1.x.
device cuda
onnx load done
onnx load done
Processing:   0%|          | 0/18 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/mnt/sda/github/11yue/HeyGenClone/translate_fxy.py", line 35, in <module>
    translate(
  File "/mnt/sda/github/11yue/HeyGenClone/translate_fxy.py", line 12, in translate
    engine(video_filename, output_filename)
  File "/mnt/sda/github/11yue/HeyGenClone/core/engine.py", line 60, in __call__
    dereverb_out = self.dereverb.split(original_audio_file)
  File "/mnt/sda/github/11yue/HeyGenClone/core/dereverb.py", line 233, in split
    return self.pred.prediction(input)
  File "/mnt/sda/github/11yue/HeyGenClone/core/dereverb.py", line 201, in prediction
    sources = self.demix(mix.T)
  File "/mnt/sda/github/11yue/HeyGenClone/core/dereverb.py", line 126, in demix
    sources = self.demix_base(segmented_mix, margin_size=margin)
  File "/mnt/sda/github/11yue/HeyGenClone/core/dereverb.py", line 168, in demix_base
    tar_waves = model.istft(torch.tensor(spec_pred))
  File "/mnt/sda/github/11yue/HeyGenClone/core/dereverb.py", line 60, in istft
    x = torch.cat([x, freq_pad], -2)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument tensors in method wrapper_CUDA_cat)
Processing:   0%|          | 0/18 [00:04<?, ?it/s]
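
A hedged sketch of one possible fix (assuming, as the traceback suggests, that torch.tensor(spec_pred) in demix_base produces a CPU tensor while x lives on cuda:0; not verified against the repo):

# core/dereverb.py, istft (around line 60): move the padding onto the same
# device as x before concatenating
x = torch.cat([x, freq_pad.to(x.device)], -2)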

Need to Add Streaming Feature for Avatar

My task is to stream the avatar using three.js, but I am not able to do so. Please help: how can I build this pipeline with minimum latency so that a user can converse with my AI avatar?

TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.

Hi, I ran the program successfully according to the tutorial, but it has the following problem:
Traceback (most recent call last):
File "/root/autodl-tmp/HeyGenClone/translate.py", line 44, in
translate(
File "/root/autodl-tmp/HeyGenClone/translate.py", line 34, in translate
engine(video_filename, output_filename)
File "/root/autodl-tmp/HeyGenClone/core/engine.py", line 50, in call
dereverb_out = self.dereverb.split(original_audio_file)
File "/root/autodl-tmp/HeyGenClone/core/dereverb.py", line 225, in split
return self.pred.prediction(input)
File "/root/autodl-tmp/HeyGenClone/core/dereverb.py", line 194, in prediction
sources = self.demix(mix.T)
File "/root/autodl-tmp/HeyGenClone/core/dereverb.py", line 122, in demix
sources = self.demix_base(segmented_mix, margin_size=margin)
File "/root/autodl-tmp/HeyGenClone/core/dereverb.py", line 156, in demix_base
input_data_1 = -spek.cuda().numpy() if self.device.type == 'cuda' else -spek.cpu().numpy()
TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.

My environment information is below:
Ubuntu 20.04
Python 3.9
NVIDIA 3080
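
A hedged sketch of a fix (not verified against the repo): .numpy() only works on host tensors, and .cpu() is a no-op for data already on the CPU, so the device branch can be dropped entirely.

# core/dereverb.py, demix_base (around line 156): copy to host memory
# before converting to numpy, regardless of device
input_data_1 = -spek.detach().cpu().numpy()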

ValueError: You have tensorflow 2.16.1 and this requires tf-keras package

After following the install instructions I get the following error:

ValueError: You have tensorflow 2.16.1 and this requires tf-keras package. Please run pip install tf-keras or downgrade your tensorflow.

After pip install tf-keras I get the following:

ImportError: cannot import name 'distance' from 'deepface.commons'

Does anybody know what I am doing wrong?

About voice tools

Need some recommendations on this. coqui-ai is definitely not a good choice. Maybe sovits could be one. VALL-E needs further development or support.

Regarding the issue of synchronizing human voice audio

Hello, I have a question about a difficulty I'm encountering while replicating the HeyGen functionality. The core steps currently include text translation and text-to-speech (TTS). I see that the project uses the Google translation engine; let's set translation accuracy aside for now. My question is: since text length varies after translation between languages, the dubbed speech produced by TTS will also vary in length. When I need to keep the duration of the final output video consistent with the original, how can I synchronize the dubbed voice audio with the original footage? Do you have any suggested methods?
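
One common approach, sketched under the assumption that the audiostretchy package (which this repo already depends on) is acceptable: time-stretch each dubbed segment so its duration matches the original speech segment, then lay the stretched audio back over the video. The filenames and durations below are example values.

from audiostretchy.stretch import stretch_audio

orig_duration = 4.2  # seconds of the original speech segment
tts_duration = 5.1   # seconds of the TTS output for that segment

# ratio is new length / old length: <1 shortens the audio, >1 lengthens it
stretch_audio('tts_segment.wav', 'tts_segment_fit.wav',
              ratio=orig_duration / tts_duration)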

AttributeError: 'SpeakerDiarization' object has no attribute 'to'

I found the same issue: "feat: send pipeline to device with Pipeline.to(device)"

Using device: cuda
model_name is pyannote/[email protected]
Traceback (most recent call last):
  File "/mnt/sda/github/11yue/HeyGenClone/translate_fxy.py", line 47, in <module>
    translate(
  File "/mnt/sda/github/11yue/HeyGenClone/translate_fxy.py", line 33, in translate
    engine = Engine(config, output_language)
  File "/mnt/sda/github/11yue/HeyGenClone/core/engine.py", line 38, in __init__
    self.diarize_model = DiarizationPipeline(use_auth_token=config['HF_TOKEN'], device=self.device)
  File "/mnt/sda/github/11yue/HeyGenClone/core/whisperx/diarize.py", line 21, in __init__
    self.model = self.model.to(device)
  File "/opt/miniconda3/envs/heygenclone/lib/python3.10/site-packages/pyannote/pipeline/pipeline.py", line 100, in __getattr__
    raise AttributeError(msg)
AttributeError: 'SpeakerDiarization' object has no attribute 'to'
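
A hedged sketch of a workaround (upgrading pyannote.audio to a release that includes the Pipeline.to(device) feature linked above is the cleaner fix; this sketch assumes you need to stay on the older version):

# core/whisperx/diarize.py (around line 21): older pyannote pipelines have
# no .to() method, so guard the call instead of assuming it exists
if hasattr(self.model, 'to'):
    self.model = self.model.to(device)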
