
Synchronized Translation for Videos. Video dubbing

License: Apache License 2.0

Python 84.11% Jupyter Notebook 15.89%
audio-processing diarization translation translate-audio translate-video video-dubbing asr automatic-dubbing document-translator dubbing

sonitranslate's Introduction

🎥 SoniTranslate 🈷️

🎬 Video Translation with Synchronized Audio 🌐

SoniTranslate is a powerful and user-friendly web application that allows you to easily translate videos into different languages. This repository hosts the code for the SoniTranslate web UI, which is built with the Gradio library to provide a seamless and interactive user experience.

📙 Colab Notebook: Open In Colab
🎉 Repository: GitHub Repository
🚀 Online DEMO: Hugging Face Spaces

SoniTranslate's web UI features a browser interface built on the Gradio library.

Using the project: A video guide

For a comprehensive understanding of the project, we highly recommend watching this video tutorial by DEV-MalletteS. You can watch it on YouTube by clicking the thumbnail below:

Watch the video

Supported languages for translation

Code Language
en English
fr French
de German
es Spanish
it Italian
ja Japanese
nl Dutch
uk Ukrainian
pt Portuguese
ar Arabic
zh Chinese - Simplified
zh-TW Chinese - Traditional
cs Czech
da Danish
fi Finnish
el Greek
he Hebrew
hu Hungarian
ko Korean
fa Persian
pl Polish
ru Russian
tr Turkish
ur Urdu
hi Hindi
vi Vietnamese
id Indonesian
bn Bengali
te Telugu
mr Marathi
ta Tamil
jw (or jv) Javanese
ca Catalan
ne Nepali
th Thai
sv Swedish
am Amharic
cy Welsh
hr Croatian
is Icelandic
ka Georgian
km Khmer
sk Slovak
sq Albanian
sr Serbian
az Azerbaijani
bg Bulgarian
gl Galician
gu Gujarati
kk Kazakh
kn Kannada
lt Lithuanian
lv Latvian
ml Malayalam
ro Romanian
si Sinhala
su Sundanese
et Estonian
mk Macedonian
sw Swahili
af Afrikaans
bs Bosnian
la Latin
my Myanmar Burmese
no Norwegian
as Assamese
eu Basque
ha Hausa
ht Haitian Creole
hy Armenian
lo Lao
mg Malagasy
mn Mongolian
mt Maltese
pa Punjabi
ps Pashto
sl Slovenian
sn Shona
so Somali
tg Tajik
tk Turkmen
tt Tatar
uz Uzbek
yo Yoruba

Non-transcription (available as translation targets, but automatic transcription of the source audio is not supported)

Code Language
ay Aymara
bm Bambara
ceb Cebuano
ny Chichewa
dv Divehi
doi Dogri
ee Ewe
gn Guarani
ilo Iloko
rw Kinyarwanda
kri Krio
ku Kurdish
ky Kirghiz
lg Ganda
mai Maithili
or Oriya
om Oromo
qu Quechua
sm Samoan
ti Tigrinya
ts Tsonga
ak Akan
ug Uighur

Example:

Original audio

Video_t.mp4

Translated audio

video_dub.mp4

Colab Runtime

To run SoniTranslate using Colab Runtime: Open In Colab

Install Locally (installation tested on Linux)

Before You Start

Before you start installing and using SoniTranslate, there are a few things you need to do:

  1. Install the NVIDIA drivers for CUDA 11.8.0. NVIDIA CUDA is a parallel computing platform and programming model that enables developers to use the power of NVIDIA graphics processing units (GPUs) to speed up compute-intensive tasks. You can find the drivers here. Follow the instructions on the website to download and install them.
  2. Accept the license agreement for using Pyannote. You need an account on Hugging Face and must accept the licenses to use the models: https://huggingface.co/pyannote/speaker-diarization and https://huggingface.co/pyannote/segmentation
  3. Create a Hugging Face token. Hugging Face is a natural language processing platform that provides access to state-of-the-art models and tools. You will need a token to use some of the automatic model download features in SoniTranslate. Follow the instructions on the Hugging Face website to create one. When creating the new Access Token, make sure to tick "Read access to contents of all public gated repos you can access". (A quick way to verify the token is sketched after this list.)
  4. Install Anaconda or Miniconda. Anaconda is a free and open-source distribution of Python and R. It includes a package manager called conda that makes it easy to install and manage Python environments and packages. Follow the instructions on the Anaconda website to download and install Anaconda on your system.
  5. Install Git for your system. Git is a version control system that helps you track changes to your code and collaborate with other developers. You can install Git with Anaconda by running conda install -c anaconda git -y in your terminal (do this after step 1 in the following section). If you have trouble installing Git via Anaconda, you can use the official installer instead.
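Before moving on, you can verify the token works. This is a minimal sketch, assuming the huggingface_hub Python package (pulled in later by the project requirements); the token string is a placeholder:

# check_hf_token.py -- verify the Hugging Face token is valid
from huggingface_hub import HfApi

token = "YOUR_HUGGING_FACE_TOKEN"  # placeholder: paste the Access Token from step 3
api = HfApi()
print(api.whoami(token=token)["name"])  # prints your HF username if the token is valid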

Once you have completed these steps, you will be ready to install SoniTranslate.

Getting Started

To install SoniTranslate, follow these steps:

  1. Create a suitable anaconda environment for SoniTranslate and activate it:
conda create -n sonitr python=3.10 -y
conda activate sonitr
python -m pip install pip==23.1.2
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
  2. Clone this GitHub repository and navigate to it:
git clone https://github.com/r3gm/SoniTranslate.git
cd SoniTranslate
  3. Install required packages:
pip install -r requirements_base.txt -v
pip install -r requirements_extra.txt -v
pip install onnxruntime-gpu
  4. Install ffmpeg. FFmpeg is a free software project that produces libraries and programs for handling multimedia data. You will need it to process audio and video files. You can install ffmpeg with Anaconda by running conda install -y ffmpeg in your terminal (recommended). If you have trouble installing ffmpeg via Anaconda, you can use the official build instead (https://ffmpeg.org/ffmpeg.html). Once it is installed, make sure it is on your PATH by running ffmpeg -h in your terminal. If you don't get an error message, you're good to go.

  5. Optional install:

After installing FFmpeg, you can install these optional packages.

Piper TTS is a fast, local neural text-to-speech system that sounds great and is optimized for the Raspberry Pi 4. Piper is used in a variety of projects. Voices are trained with VITS and exported to ONNX for use with onnxruntime.

pip install -q piper-tts==1.2.0

Coqui XTTS is a text-to-speech (TTS) model that lets you generate realistic voices in different languages. It can clone voices from just a short audio clip and can even speak in a different language! It's like having a personal voice mimic for any text you need spoken.

pip install -q -r requirements_xtts.txt
pip install -q TTS==0.21.1  --no-deps
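As a quick smoke test of the optional Coqui XTTS install, here is a minimal sketch using the TTS Python API; the reference WAV and output paths are illustrative, and the model is downloaded on first use:

# xtts_smoke_test.py -- minimal Coqui XTTS check (file paths are illustrative)
from TTS.api import TTS

tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")  # downloads the model on first run
tts.tts_to_file(
    text="Hello from SoniTranslate.",
    speaker_wav="reference_voice.wav",  # short clip of the voice to clone
    language="en",
    file_path="xtts_output.wav",
)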

Running SoniTranslate

To run SoniTranslate locally, make sure the sonitr conda environment is active:

conda activate sonitr

Setting your Hugging Face token as an environment variable in Linux:

export YOUR_HF_TOKEN="YOUR_HUGGING_FACE_TOKEN"
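Before launching, a small sanity check can confirm the pieces installed above are visible from the sonitr environment. A minimal sketch that only reads the environment and prints results:

# preflight.py -- sanity check for the GPU build of PyTorch, ffmpeg, and the token
import os
import shutil

import torch

print("CUDA available:", torch.cuda.is_available())           # False means CPU-only
print("ffmpeg on PATH:", shutil.which("ffmpeg") is not None)  # needed for media processing
print("HF token set:", bool(os.environ.get("YOUR_HF_TOKEN"))) # the variable exported above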

Then navigate to the SoniTranslate folder and run the app_rvc.py script:

python app_rvc.py

When the local URL http://127.0.0.1:7860 is displayed in the terminal, simply open this URL in your web browser to access the SoniTranslate interface.

Stop and close SoniTranslate.

In most environments, you can stop the execution by pressing Ctrl+C in the terminal where you launched the script app_rvc.py. This will interrupt the program and stop the Gradio app. To deactivate the Conda environment, you can use the following command:

conda deactivate

This will deactivate the currently active Conda environment sonitr, and you'll return to the base environment or the global Python environment.

Starting Over

If you need to start over from scratch, you can delete the SoniTranslate folder and remove the sonitr conda environment with the following set of commands:

conda deactivate
conda env remove -n sonitr

With the sonitr environment removed, you can start over with a fresh installation.

Notes

  • Alternatively, you can set your Hugging Face token as a permanent environment variable with:
conda activate sonitr
conda env config vars set YOUR_HF_TOKEN="YOUR_HUGGING_FACE_TOKEN_HERE"
conda deactivate
  • To use OpenAI's GPT API for translation, TTS, or transcription, set your OpenAI API key as an environment variable (in quotes):
conda activate sonitr
conda env config vars set OPENAI_API_KEY="your-api-key-here"
conda deactivate

Command line arguments

The app_rvc.py script supports command-line arguments to customize its behavior. Here's a brief guide on how to use them:

Argument | Default | Type | Description
--theme | Taithrah/Minimal | String | Sets the theme for the interface. Themes can be found in the Theme Gallery.
--language | english | String | Selects the interface language. Available options: afrikaans, arabic, azerbaijani, chinese_zh_cn, english, french, german, hindi, indonesian, italian, japanese, korean, marathi, persian, polish, portuguese, russian, spanish, swedish, turkish, ukrainian, vietnamese.
--verbosity_level | info | String | Sets the verbosity level of the logger: debug, info, warning, error, or critical.
--public_url | disabled | Boolean | Enables a public link.
--cpu_mode | disabled | Boolean | Enables CPU mode to run the program without GPU acceleration.
--logs_in_gui | disabled | Boolean | Shows the operations performed in Logs (obsolete).

Example usage:

python app_rvc.py --theme aliabid94/new-theme --language french

This command sets the theme to a custom theme and selects French as the interface language. Feel free to customize these arguments according to your preferences and requirements.
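Flags can also be combined; for example, to force CPU mode and share a public link (both flags are listed in the table above):

python app_rvc.py --cpu_mode --public_url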

📖 News

🔥 2024/05/18: New Update Details

  • Added option Overlap Reduction
  • OpenAI API Key Integration for Transcription, translation, and TTS
  • More output types: subtitles by speaker, separate audio sound, and video only with subtitles
  • Access to a better-performing version of Whisper for transcribing speech on the Hugging Face Whisper page. Copy the repository ID and paste it into the 'Whisper ASR model' section in 'Advanced Settings'; e.g., kotoba-tech/kotoba-whisper-v1.1 for Japanese transcription
  • Support for ASS subtitles and batch processing with subtitles
  • Vocal enhancement before transcription
  • Added CPU mode with app_rvc.py --cpu_mode
  • TTS now supports up to 12 speakers
  • OpenVoiceV2 integration for voice imitation
  • PDF to videobook (displays images from the PDF)
  • GUI language translation in Persian and Afrikaans
  • New Language Support:
    • Complete support: Estonian, Macedonian, Malay, Swahili, Afrikaans, Bosnian, Latin, Myanmar Burmese, Norwegian, Traditional Chinese, Assamese, Basque, Hausa, Haitian Creole, Armenian, Lao, Malagasy, Mongolian, Maltese, Punjabi, Pashto, Slovenian, Shona, Somali, Tajik, Turkmen, Tatar, Uzbek, and Yoruba
    • Non-transcription: Aymara, Bambara, Cebuano, Chichewa, Divehi, Dogri, Ewe, Guarani, Iloko, Kinyarwanda, Krio, Kurdish, Kirghiz, Ganda, Maithili, Oriya, Oromo, Quechua, Samoan, Tigrinya, Tsonga, Akan, and Uighur

🔥 2024/03/02: Preserve file names in output. Multiple files can now be submitted simultaneously by specifying their paths, directories, or URLs separated by commas. Processing of a full YouTube playlist. Regarding supported site URLs, please be aware that not all sites may work optimally. Added an option for disabling diarization. Implemented soft subtitles. Format output (MP3, MP4, MKV, WAV, and OGG), and resolved issues related to file reading and diarization.

🔥 2024/02/22: Added FreeVC for voice imitation, fixed the voiceless track, and improved segment division. New language support (Swedish, Amharic, Welsh, Croatian, Icelandic, Georgian, Khmer, Slovak, Albanian, Serbian, Azerbaijani, Bulgarian, Galician, Gujarati, Kazakh, Kannada, Lithuanian, Latvian, Malayalam, Romanian, Sinhala, and Sundanese). New translations of the GUI (Spanish, French, German, Italian, Japanese, Chinese Simplified, Ukrainian, Arabic, Russian, Turkish, Indonesian, Portuguese, Hindi, Vietnamese, Polish, Swedish, Korean, Marathi, and Azerbaijani). When a subtitle file is provided, alignment is skipped and the media file is not needed to process the SRT file. Burn subtitles into the video. The queue can accept multiple tasks simultaneously. Sound alert notification. Continue the process from the last checkpoint. Acceleration rate regulation.

🔥 2024/01/16: Expanded language support (Thai, Nepali, Catalan, Javanese, Tamil, Marathi, Telugu, Bengali, and Indonesian), the introduction of Whisper large-v3, configurable GUI options, and integration of BARK, Facebook MMS, Coqui XTTS, and Piper-TTS. Additional features include audio separation utilities, XTTS WAV creation, using an SRT file as a base for translation, document translation, manual speaker editing, and flexible output options (video, audio, subtitles).

🔥 2023/10/29: Edit the translated subtitles, download them, and adjust volume and speed options.

🔥 2023/08/03: Changed default options and added directory view of downloads.

🔥 2023/08/02: Added support for Arabic, Czech, Danish, Finnish, Greek, Hebrew, Hungarian, Korean, Persian, Polish, Russian, Turkish, Urdu, Hindi, and Vietnamese languages. 🌐

🔥 2023/08/01: Added options for using RVC models.

🔥 2023/07/27: Fixed some bugs in video and audio processing.

🔥 2023/07/26: New UI and added mix options.

Contributing

Contributions from the community are welcome! If you have any ideas, bug reports, or feature requests, please open an issue or submit a pull request. For more information, please refer to the contribution guidelines.

Credits

This project leverages a number of open-source projects, and we would like to acknowledge and thank their contributors.

License

Although the code is licensed under Apache 2.0, the models or weights may have commercial restrictions, as seen with pyannote diarization.


sonitranslate's Issues

rmvpe+

Hi, sorry for posting here, but you don't respond on Hugging Face. I have a question about rmvpe+. Where did rmvpe+ come from? I can't find anything about it on the Internet, and where can I download it for local use? It works very well.

API Keys

Hello, I'm running SoniTranslate on Windows (Anaconda), and I have a problem because I don't know where to put the OpenAI API key. Is there an option to save it permanently somewhere in a file? The same goes for the HF token. Would someone be kind enough to suggest how to deal with this?
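A possible answer based on the Notes section earlier in this README: both keys can be stored permanently in the conda environment itself, which also works with Anaconda on Windows:

conda activate sonitr
conda env config vars set YOUR_HF_TOKEN="YOUR_HUGGING_FACE_TOKEN_HERE"
conda env config vars set OPENAI_API_KEY="your-api-key-here"
conda deactivate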

error on M1 Mac when creating SRT file from audio file.

I'm running a local install on my M1 Mac Air. I'm getting this error when trying to create an SRT file from an audio file.

"Error
Model has been downloaded but the SHA256 checksum does not not match. Please retry loading the model."

Here's my log:

(sonitr) userName@MacBook-Air SoniTranslate % python app_rvc.py
objc[5317]: Class AVFFrameReceiver is implemented in both /Users/userName/anaconda3/envs/sonitr/lib/libavdevice.58.8.100.dylib (0x122394798) and /Users/userName/anaconda3/envs/sonitr/lib/python3.10/site-packages/av/.dylibs/libavdevice.60.1.100.dylib (0x179a54760). One of the two will be used. Which one is undefined.
objc[5317]: Class AVFAudioReceiver is implemented in both /Users/userName/anaconda3/envs/sonitr/lib/libavdevice.58.8.100.dylib (0x1223947e8) and /Users/userName/anaconda3/envs/sonitr/lib/python3.10/site-packages/av/.dylibs/libavdevice.60.1.100.dylib (0x179a547b0). One of the two will be used. Which one is undefined.
[INFO] >> Working in: cpu
[WARNING] >> No module named 'piper'
[INFO] >> PIPER TTS disabled
[INFO] >> Coqui XTTS enabled
[INFO] >> In this app, by using Coqui TTS (text-to-speech), you acknowledge and agree to the license.
You confirm that you have read, understood, and agreed to the Terms and Conditions specified at the following link:
https://coqui.ai/cpml.txt.
Running on local URL: http://127.0.0.1:7860

To create a public link, set share=True in launch().
[WARNING] >> Make sure to select a 'TTS Speaker' suitable for the translation language to avoid errors with the TTS.
[INFO] >> Cache flushed
[INFO] >> Processing audio...
[INFO] >> Transcribing...
Traceback (most recent call last):
File "/Users/userName/anaconda3/envs/sonitr/lib/python3.10/site-packages/gradio/queueing.py", line 495, in call_prediction
output = await route_utils.call_process_api(
File "/Users/userName/anaconda3/envs/sonitr/lib/python3.10/site-packages/gradio/route_utils.py", line 235, in call_process_api
output = await app.get_blocks().process_api(
File "/Users/userName/anaconda3/envs/sonitr/lib/python3.10/site-packages/gradio/blocks.py", line 1627, in process_api
result = await self.call_function(
File "/Users/userName/anaconda3/envs/sonitr/lib/python3.10/site-packages/gradio/blocks.py", line 1173, in call_function
prediction = await anyio.to_thread.run_sync(
File "/Users/userName/anaconda3/envs/sonitr/lib/python3.10/site-packages/anyio/to_thread.py", line 56, in run_sync
return await get_async_backend().run_sync_in_worker_thread(
File "/Users/userName/anaconda3/envs/sonitr/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 2144, in run_sync_in_worker_thread
return await future
File "/Users/userName/anaconda3/envs/sonitr/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 851, in run
result = context.run(func, *args)
File "/Users/userName/anaconda3/envs/sonitr/lib/python3.10/site-packages/gradio/utils.py", line 690, in wrapper
response = f(*args, **kwargs)
File "/Users/userName/Documents/projects/vsCode projects/soni_translate/SoniTranslate/app_rvc.py", line 436, in multilingual_media_conversion
audio, self.result = transcribe_speech(
File "/Users/userName/Documents/projects/vsCode projects/soni_translate/SoniTranslate/soni_translate/speech_segmentation.py", line 34, in transcribe_speech
model = whisperx.load_model(
File "/Users/userName/anaconda3/envs/sonitr/lib/python3.10/site-packages/whisperx/asr.py", line 347, in load_model
vad_model = load_vad_model(torch.device(device), use_auth_token=None, **default_vad_options)
File "/Users/userName/anaconda3/envs/sonitr/lib/python3.10/site-packages/whisperx/vad.py", line 47, in load_vad_model
raise RuntimeError(
RuntimeError: Model has been downloaded but the SHA256 checksum does not not match. Please retry loading the model.

Control working mode: cpu/cuda

Hi guys,

Can someone please suggest how to effectively control the working mode?

app_rvc.py is automatically started in cuda mode:

[INFO] >> Working in: cuda

However, I have a quite old MX150 GPU, and it constantly fails with CUDA out of memory at the Transcribing stage, no matter how I tweak Batch size / Compute type / Whisper ASR model or PYTORCH_CUDA_ALLOC_CONF.

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 78.00 MiB. GPU 0 has a total capacity of 2.00 GiB of which 0 bytes is free. Of the allocated memory 101.51 MiB is allocated by PyTorch, and 72.49 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

Therefore I would like to fall back to CPU mode.

Running SoniTranslate dev_24_3 on Windows 11.

Thanks!
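A possible workaround based on this README's own flag list: CPU mode can be forced at launch with the --cpu_mode argument described in the Command line arguments section:

python app_rvc.py --cpu_mode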

Collaboration Proposal

Hey, your work is truly amazing, love it!!!

I am a UI/UX designer and Python developer. Sometimes, you know, I want to translate Indian, Korean, and Chinese lectures into my target language, whether on YouTube or other streaming platforms. Some services translate videos into other languages, and yours does as well; however, wouldn't it be easier and simpler to develop something like an extension that translates right away, the way Google translates web pages at the click of a button? Plus, podcasts...

I am willing to collaborate and further develop your amazing program if you like the idea, you can reach me at:
[email protected]

Sincerely,
Shakhruz Bakhtiyarov

Issues with diarization and pyannote 3.1

Hey, this is truly incredible, thank you for all your efforts.

I am having an issue with pyannote 3.1 on Google Colab (pyannote 2.0 works fine):

Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/gradio/queueing.py", line 495, in call_prediction
output = await route_utils.call_process_api(
File "/usr/local/lib/python3.10/dist-packages/gradio/route_utils.py", line 235, in call_process_api
output = await app.get_blocks().process_api(
File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1627, in process_api
result = await self.call_function(
File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1173, in call_function
prediction = await anyio.to_thread.run_sync(
File "/usr/local/lib/python3.10/dist-packages/anyio/to_thread.py", line 33, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
return await future
File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 807, in run
result = context.run(func, *args)
File "/usr/local/lib/python3.10/dist-packages/gradio/utils.py", line 690, in wrapper
response = f(*args, **kwargs)
File "/content/SoniTranslate/app_rvc.py", line 288, in batch_multilingual_media_conversion
output_file = self.multilingual_media_conversion(
File "/content/SoniTranslate/app_rvc.py", line 549, in multilingual_media_conversion
self.result_diarize = diarize_speech(
File "/content/SoniTranslate/soni_translate/speech_segmentation.py", line 175, in diarize_speech
raise error
TypeError: exceptions must derive from BaseException

app_rvc.py

When running python app_rvc.py, how do I resolve this?
/tmp/gradio/6cb4020ad75bb1cb116c865ab91842f8753c7acc/Video_main.mp4
Process video... process audio... process audio... ...
Error can't create the audio file
Traceback (most recent call last):
File "/home/mohit/Projects/venv/lib/python3.10/site-packages/gradio/routes.py", line 488, in run_predict
output = await app.get_blocks().process_api(
File "/home/mohit/Projects/venv/lib/python3.10/site-packages/gradio/blocks.py", line 1434, in process_api
data = self.postprocess_data(fn_index, result["prediction"], state)
File "/home/mohit/Projects/venv/lib/python3.10/site-packages/gradio/blocks.py", line 1335, in postprocess_data
prediction_value = block.postprocess(prediction_value)
File "/home/mohit/Projects/venv/lib/python3.10/site-packages/gradio/components/file.py", line 254, in postprocess
"name": self.make_temp_copy_if_needed(y),
File "/home/mohit/Projects/venv/lib/python3.10/site-packages/gradio/components/base.py", line 226, in make_temp_copy_if_needed
temp_dir = self.hash_file(file_path)
File "/home/mohit/Projects/venv/lib/python3.10/site-packages/gradio/components/base.py", line 190, in hash_file
with open(file_path, "rb") as f:
FileNotFoundError: [Errno 2] No such file or directory: 'sub_ori.srt'

Voiceless Track Separation Error

A file is provided and the separation function works well, but afterwards another file is not found:

[INFO] >> Voiceless Track Separation...
100%|█████████████████████████████████████████| 210/210 [00:51<00:00,  4.05it/s]
[ERROR] >> Error comnand
Traceback (most recent call last):
  File "/home/bazza/src/miniconda3/envs/sonitr/lib/python3.10/site-packages/gradio/queueing.py", line 495, in call_prediction
    output = await route_utils.call_process_api(
  File "/home/bazza/src/miniconda3/envs/sonitr/lib/python3.10/site-packages/gradio/route_utils.py", line 235, in call_process_api
    output = await app.get_blocks().process_api(
  File "/home/bazza/src/miniconda3/envs/sonitr/lib/python3.10/site-packages/gradio/blocks.py", line 1627, in process_api
    result = await self.call_function(
  File "/home/bazza/src/miniconda3/envs/sonitr/lib/python3.10/site-packages/gradio/blocks.py", line 1173, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/home/bazza/src/miniconda3/envs/sonitr/lib/python3.10/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "/home/bazza/src/miniconda3/envs/sonitr/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 2144, in run_sync_in_worker_thread
    return await future
  File "/home/bazza/src/miniconda3/envs/sonitr/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 851, in run
    result = context.run(func, *args)
  File "/home/bazza/src/miniconda3/envs/sonitr/lib/python3.10/site-packages/gradio/utils.py", line 690, in wrapper
    response = f(*args, **kwargs)
  File "/home/bazza/src/sonitr/SoniTranslate/app_rvc.py", line 365, in batch_multilingual_media_conversion
    output_file = self.multilingual_media_conversion(
  File "/home/bazza/src/sonitr/SoniTranslate/app_rvc.py", line 1085, in multilingual_media_conversion
    run_command(command_volume_mix)
  File "/home/bazza/src/sonitr/SoniTranslate/soni_translate/utils.py", line 66, in run_command
    raise Exception(errors.decode())
Exception: ffmpeg version 6.1.1 Copyright (c) 2000-2023 the FFmpeg developers
  built with gcc 12.3.0 (conda-forge gcc 12.3.0-5)

ffmpeg error

How do I fix this?

Exception: ffmpeg version 6.1.1 Copyright (c) 2000-2023 the FFmpeg developers
built with clang version 17.0.6

Metadata:
encoder : Lavf60.16.100
Duration: 00:01:43.31, bitrate: 1411 kb/s
Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 44100 Hz, 2 channels, s16, 1411 kb/s
[aist#1:0/pcm_s32le @ 0000014CFD9B66C0] Guessed Channel Layout: mono
Input #1, wav, from 'audio_dub_solo.ogg':
Duration: 00:01:42.73, bitrate: 768 kb/s
Stream #1:0: Audio: pcm_s32le ([1][0][0][0] / 0x0001), 24000 Hz, 1 channels, s32, 768 kb/s
[aost#0:0 @ 0000014CFDA64CC0] Unknown encoder 'libmp3lame'
[aost#0:0 @ 0000014CFDA64CC0] Error selecting an encoder
Error opening output file audio_mix.mp3.
Error opening output files: Encoder not found
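A hedged note: the quoted ffmpeg build was compiled without the libmp3lame MP3 encoder. Reinstalling a full build, for example the conda package recommended in the Getting Started section, may resolve it:

conda install -y ffmpeg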

Need support for FastAPI in SoniTranslate

I am looking for FastAPI support for this project; I have gone through the codebase and cannot find it. Can anyone help me with that? I would be very grateful. Or can someone guide me step by step? I want to make a FastAPI service that takes a video, translates it using Coqui TTS, and then dubs the video with it.

Best Regards

help me please

Hello, I'm deploying on a computer and it seems that everything is installed. I get the link and everything works. I insert a token and a link to the video, then an error pops up. What's wrong?
Initial log: (screenshot)
End log with error: (screenshot)
Thanks a lot! Good luck)

Whisper

Whisper does a good translation of subtitles with this command:

whisper x.wav --model large-v3 --task translate --output_format srt --threads 10

It's even better than Google Translate and supports multiple languages,
so I hope you can add an option to use it.
Thank you!

[ERROR] >> [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running Transpose node. Name:'Transpose_2'

Firstly, thanks for your great project <3
I hope there will be a free AI translation option in a future update, since the OpenAI API seems a little expensive for me, but thank you for this awesome tool.
And MY ISSUE IS:
Everything works fine until this problem when I use Voiceless Track. The problem seems to be in onnxruntime-gpu, but I don't know why. I have a GeForce 960M with 4 GB VRAM.

[INFO] >> Creating final translated video...
1it [00:00,  5.05it/s][INFO] >> Avoid overlap for audio2/audio/5.1.ogg with 5.6
19it [00:01, 10.99it/s][INFO] >> Avoid overlap for audio2/audio/90.8.ogg with 91.02
29it [00:02, 10.91it/s][INFO] >> Avoid overlap for audio2/audio/160.8.ogg with 161.22000000000003
31it [00:02, 10.84it/s][INFO] >> Avoid overlap for audio2/audio/165.8.ogg with 166.54000000000005
[INFO] >> Avoid overlap for audio2/audio/170.8.ogg with 171.56000000000006
37it [00:03, 10.36it/s]
[INFO] >> Voiceless Track Separation...
2024-05-19 04:15:50.5631879 [E:onnxruntime:, sequential_executor.cc:514 onnxruntime::ExecuteKernel] Non-zero status code returned while running Transpose node. Name:'Transpose_2' Status Message: CUDA error cudaErrorNoKernelImageForDevice:no kernel image is available for execution on the device
[ERROR] >> [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running Transpose node. Name:'Transpose_2' Status Message: CUDA error cudaErrorNoKernelImageForDevice:no kernel image is available for execution on the device
[INFO] >> Done: C:\Users\PORTATIL\SoniTranslate\outputs\Fury __en.mp4
(sonitr) PS C:\Users\PORTATIL\SoniTranslate> python app_rvc.py
The torchaudio backend is switched to 'soundfile'. Note that 'sox_io' is not supported on Windows.
The torchaudio backend is switched to 'soundfile'. Note that 'sox_io' is not supported on Windows.
[INFO] >> PIPER TTS enabled
[INFO] >> Coqui XTTS enabled
[INFO] >> In this app, by using Coqui TTS (text-to-speech), you acknowledge and agree to the license.
You confirm that you have read, understood, and agreed to the Terms and Conditions specified at the following link:
https://coqui.ai/cpml.txt.
[INFO] >> Working in: cuda
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
IMPORTANT: You are using gradio version 4.19.2, however version 4.29.0 is available, please upgrade.

Sorry for my English

Document Translation (No generated SRT file)

Hey there, when translating documents and creating videobooks, I would love it if we could include the actual document, translated, or have the option to show the original on one side and the translation on the other.

But my real issue is that when I create an audiobook, I would definitely need an SRT file to be generated when translating the audio.

I know it's almost impossible because you have to define the number of characters, and since it's split into segments (let's pretend I've set the maximum characters to 200), it will be hard to generate an SRT file if the segments do not match the actual document exactly. But there has to be some way to achieve this. This would be a game changer for my work! We work in IT accessibility.

Thanks!

AI Dubbing API

Thank you for building this project! I work at a company called Sieve and this is a part of what inspired us to build our Dubbing API. It's a bit different than this as it supports voice cloning, different voice engines, and higher quality translations using other closed-source solutions but it's an example of the bounds of what this tech can do today.

I'd love to contribute our learnings to this project in some way. I think the most challenging part of the problem is how one handles audio speedups and slowdowns across languages. Different applications seem to want different tradeoffs between "sync"-ness and how drastic the speedup tends to be.

Curious if there are improvements in the queue on that vector for this project and if we can contribute in any way? Would also love feedback on what we've built as I think it's something the community would love!

[WinError 2] The system cannot find the specified file (Windows)

Hi, I just installed and ran SoniTranslate and everything works fine.
But the problem is that it only works when I pass it a YouTube URL; when I want to upload a file locally, it gives me the error "[WinError 2] The system cannot find the specified file".
I attach the details of the console: (screenshot)

The translation speed is incredible

Hi, your work is incredible.
The translator has become much faster; it now runs 10 times faster (on GPU).
Is it possible to speed up RVC processing? During processing, my video card is only loaded to 30%; could it be processed in several threads to speed it up (an option to select the number of threads would be really cool)? WhisperX uses 100% of the video card and converts audio into text in just a few seconds.

Also, as I understand it, Piper TTS does not work on Windows:
[WARNING] >> No module named 'piper'
[INFO] >> PIPER TTS disabled

During installation it says: ERROR: Cannot install piper-tts==1.1.0 and piper-tts==1.2.0 because these package versions have conflicting dependencies.
Maybe there is some solution?

There is also an excellent fast TTS, Silero: https://github.com/snakers4/silero-models
Maybe it will suit you.
Having a choice of TTS options is always good.

Thank you so much for your work!

any way to add srt file in the source?

Hi,
translating directly from Danish to English never works correctly with anything I have tried, but I can get AI-translated subtitles that are 70-80% correct and then modify them to be understandable in English.
So is there any way to either modify whatever SoniTranslate translates, or get it to take an SRT file with timing into account when generating new audio?

I would like to suggest a couple of functions

Hello, your creation is beautiful!
I would like to suggest a couple of functions.
Simple ones:
Add a checkbox to turn audio acceleration on/off.
Detailed volume settings (volume percentage) in the audio mixer output settings.
RVC settings (index_rate, rms_mix_rate, protect).
More complex ones:
Save .srt subtitle files in the original and target languages.
Editing of the translated subtitles (a field containing the translation text that can be changed and saved, after which the sound is assembled again and RVC applied).

How to increase the number of max speakers?

I tried to increase the number of speakers to at least 8, but I end up with error messages such as: "NameError: name 'model_voice_path08' is not defined. Did you mean: 'model_voice_path00'?"
I modified three folders, but this is obviously not enough. How do I get 8 speakers?
Modif.zip
Thanks

P.S.: In app_rvc.py, look at lines 307 and 1409 for the "auto" compute mode; I found this in the whisperX documentation.

Add background music back after processing is done, please!

Wow, this is the best model chain I have ever seen! But here is an improvement I'd like to ask for: can you add a function to add the background music of the input video back after all processing is done?

Issue installing Piper TTS and Coqui XTTS

So I don't know much about coding, but these are the last steps that appeared:

DEPRECATION: omegaconf 2.0.6 has a non-standard dependency specifier PyYAML>=5.1.*. pip 24.0 will enforce this behaviour change. A possible replacement is to upgrade to a newer version of omegaconf or contact the author to suggest that they release a version with a conforming dependency specifiers. Discussion can be found at pypa/pip#12063
Installing collected packages: onnxruntime-gpu
Successfully installed onnxruntime-gpu-1.17.1

(sonitr) A:\Art_intel\SoniTranslate>pip install -q piper-tts==1.2.0
ERROR: Could not find a version that satisfies the requirement piper-phonemize~=1.1.0 (from piper-tts) (from versions: none)
ERROR: No matching distribution found for piper-phonemize~=1.1.0

(sonitr) A:\Art_intel\SoniTranslate>pip install -q -r requirements_xtts.txt
DEPRECATION: omegaconf 2.0.6 has a non-standard dependency specifier PyYAML>=5.1.*. pip 24.0 will enforce this behaviour change. A possible replacement is to upgrade to a newer version of omegaconf or contact the author to suggest that they release a version with a conforming dependency specifiers. Discussion can be found at pypa/pip#12063
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
pyannote-audio 3.1.1 requires omegaconf<3.0,>=2.1, but you have omegaconf 2.0.6 which is incompatible.
pyannote-database 5.1.0 requires typer>=0.12.1, but you have typer 0.9.4 which is incompatible.

Uroman Error!

Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/gradio/queueing.py", line 495, in call_prediction
output = await route_utils.call_process_api(
File "/usr/local/lib/python3.10/dist-packages/gradio/route_utils.py", line 235, in call_process_api
output = await app.get_blocks().process_api(
File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1627, in process_api
result = await self.call_function(
File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1173, in call_function
prediction = await anyio.to_thread.run_sync(
File "/usr/local/lib/python3.10/dist-packages/anyio/to_thread.py", line 33, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
return await future
File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 807, in run
result = context.run(func, *args)
File "/usr/local/lib/python3.10/dist-packages/gradio/utils.py", line 690, in wrapper
response = f(*args, **kwargs)
File "/content/ALEPH-WEBETA/app_rvc.py", line 355, in batch_multilingual_media_conversion
return self.multilingual_media_conversion(
File "/content/ALEPH-WEBETA/app_rvc.py", line 955, in multilingual_media_conversion
self.valid_speakers = audio_segmentation_to_voice(
File "/content/ALEPH-WEBETA/soni_translate/text_to_speech.py", line 1057, in audio_segmentation_to_voice
segments_vits_tts(filtered_vits, TRANSLATE_AUDIO_TO) # wav
File "/content/ALEPH-WEBETA/soni_translate/text_to_speech.py", line 346, in segments_vits_tts
romanize_text = uromanize(text)
File "/content/ALEPH-WEBETA/soni_translate/text_to_speech.py", line 316, in uromanize
raise ValueError(f"Error {process.returncode}: {stderr.decode()}")
ValueError: Error 2: Can't open perl script "./uroman/bin/uroman.pl": No such file or directory

Status gets stuck in transcription (30%)

Status gets stuck in transcription; do you know what the problem could be?

I'm running on Ubuntu WSL. I installed all the dependencies and set my HF token, but even so, when I load the video and start dubbing it into Portuguese, it gets stuck (so far 3 hours at 30%, the transcription stage).

Problem with Doc translate (PDF and DOCX)

Hello,

I have installed SoniTranslate on Windows.

(sonitr) D:\GITHUB\Sonitranslate\SoniTranslate>python app_rvc.py --theme aliabid94/new-theme --language french
The torchaudio backend is switched to 'soundfile'. Note that 'sox_io' is not supported on Windows.
The torchaudio backend is switched to 'soundfile'. Note that 'sox_io' is not supported on Windows.
[INFO] >> PIPER TTS disabled
[INFO] >> Coqui XTTS enabled
[INFO] >> In this app, by using Coqui TTS (text-to-speech), you acknowledge and agree to the license. You confirm that you have read, understood, and agreed to the Terms and Conditions specified at the following link: https://coqui.ai/cpml.txt.
[INFO] >> Working in: cuda
Running on local URL: http://127.0.0.1:7860

DOCX DOCUMENT
I can translate a plain text document, but with a docx document SoniTranslate stops and disconnects.
The docx is being read but stops at: " de 2014 à 2017 il est élu député en 2017 dans la dixième >> audio/1.0.ogg"

PDF DOCUMENT

Exception in callback _ProactorBasePipeTransport._call_connection_lost(None) handle: <Handle _ProactorBasePipeTransport._call_connection_lost(None)>
Traceback (most recent call last):
File "C:\Users\ryzen\miniconda3\envs\sonitr\lib\asyncio\events.py", line 80, in _run
self._context.run(self._callback, *self._args)
File "C:\Users\ryzen\miniconda3\envs\sonitr\lib\asyncio\proactor_events.py", line 165, in _call_connection_lost
self._sock.shutdown(socket.SHUT_RDWR)
ConnectionResetError: [WinError 10054] Une connexion existante a dû être fermée par l’hôte distant (an existing connection was forcibly closed by the remote host)

Thanks for your help.

Selecting a speaker in the subtitle editor

Speaker detection does not always work correctly. Instead of a female voice there may be a male voice, and vice versa. Is it possible to make it so that the speaker can be assigned in the subtitle editor?
Let's say something like:

   {
     "speaker": 1,
     "start": 1.172,
     "text": "Your work is very cool."
   },
   {
     "speaker": 3,
     "start": 2.372,
     "text": "Yes, I agree too, SoniTranslate is great."
   }

So that, if necessary, you could fix it manually.

Speech too fast and out of sync

Hello and congratulations on this work! I just tested a lot of other projects and yours is clearly the most efficient :)

Unfortunately, and I do not understand why, the translated speech sometimes plays at something like 10x speed, and of course nothing can be understood ^^; then, 15 seconds later, the speech returns to a correct speed.
Here is a short extract to illustrate the problem: FinalTest

Could you give me a clue so that I can find a solution?

(For the same project, I've done transcription + translation with Whisper without any problems, but I'm missing the voice translation and video sync.)

Many thanks in advance for your reply!

Problem after the installation

Trying to run it after the installation gave me this problem:


python app_rvc.py
Traceback (most recent call last):
  File "/home/bazza/src/sonitr/SoniTranslate/app_rvc.py", line 7, in <module>
    import whisperx
  File "/home/bazza/src/miniconda3/envs/sonitr/lib/python3.10/site-packages/whisperx/__init__.py", line 1, in <module>
    from .transcribe import load_model
  File "/home/bazza/src/miniconda3/envs/sonitr/lib/python3.10/site-packages/whisperx/transcribe.py", line 10, in <module>
    from .asr import load_model
  File "/home/bazza/src/miniconda3/envs/sonitr/lib/python3.10/site-packages/whisperx/asr.py", line 13, in <module>
    from .vad import load_vad_model, merge_chunks
  File "/home/bazza/src/miniconda3/envs/sonitr/lib/python3.10/site-packages/whisperx/vad.py", line 11, in <module>
    from pyannote.audio.pipelines import VoiceActivityDetection
  File "/home/bazza/src/miniconda3/envs/sonitr/lib/python3.10/site-packages/pyannote/audio/pipelines/__init__.py", line 26, in <module>
    from .speaker_diarization import SpeakerDiarization
  File "/home/bazza/src/miniconda3/envs/sonitr/lib/python3.10/site-packages/pyannote/audio/pipelines/speaker_diarization.py", line 42, in <module>
    from pyannote.audio.pipelines.speaker_verification import PretrainedSpeakerEmbedding
  File "/home/bazza/src/miniconda3/envs/sonitr/lib/python3.10/site-packages/pyannote/audio/pipelines/speaker_verification.py", line 45, in <module>
    from speechbrain.pretrained import (
  File "/home/bazza/src/miniconda3/envs/sonitr/lib/python3.10/site-packages/speechbrain/__init__.py", line 4, in <module>
    from .core import Stage, Brain, create_experiment_directory, parse_arguments
  File "/home/bazza/src/miniconda3/envs/sonitr/lib/python3.10/site-packages/speechbrain/core.py", line 38, in <module>
    from speechbrain.utils.optimizers import rm_vector_weight_decay
  File "/home/bazza/src/miniconda3/envs/sonitr/lib/python3.10/site-packages/speechbrain/utils/__init__.py", line 11, in <module>
    from . import *  # noqa
  File "/home/bazza/src/miniconda3/envs/sonitr/lib/python3.10/site-packages/speechbrain/utils/train_logger.py", line 268, in <module>
    class ProgressSampleLogger:
  File "/home/bazza/src/miniconda3/envs/sonitr/lib/python3.10/site-packages/speechbrain/utils/train_logger.py", line 337, in ProgressSampleLogger
    "saver": _get_image_saver(),
  File "/home/bazza/src/miniconda3/envs/sonitr/lib/python3.10/site-packages/speechbrain/utils/train_logger.py", line 260, in _get_image_saver
    import torchvision
  File "/home/bazza/src/miniconda3/envs/sonitr/lib/python3.10/site-packages/torchvision/__init__.py", line 6, in <module>
    from torchvision import _meta_registrations, datasets, io, models, ops, transforms, utils
  File "/home/bazza/src/miniconda3/envs/sonitr/lib/python3.10/site-packages/torchvision/_meta_registrations.py", line 164, in <module>
    def meta_nms(dets, scores, iou_threshold):
  File "/home/bazza/src/miniconda3/envs/sonitr/lib/python3.10/site-packages/torch/library.py", line 440, in inner
    handle = entry.abstract_impl.register(func_to_register, source)
  File "/home/bazza/src/miniconda3/envs/sonitr/lib/python3.10/site-packages/torch/_library/abstract_impl.py", line 30, in register
    if torch._C._dispatch_has_kernel_for_dispatch_key(self.qualname, "Meta"):
RuntimeError: operator torchvision::nms does not exist
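A hedged note: "operator torchvision::nms does not exist" usually indicates mismatched torch and torchvision builds. Reinstalling the matched set from the Getting Started step may help:

conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia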

fairseq can't be installed

Hello, I'm using an Anaconda 3 environment, and when I try to use RVC, which uses fairseq, it can't be installed.

      running build_ext
      building 'fairseq.libbleu' extension
      error: Microsoft Visual C++ 14.0 or greater is required. Get it with "Microsoft C++ Build Tools": https://visualstudio.microsoft.com/visual-cpp-build-tools/
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for fairseq
Failed to build fairseq
ERROR: Could not build wheels for fairseq, which is required to install pyproject.toml-based projects 

Please Help!

Stuck after acceleration

[INFO] >> Apply acceleration
[INFO] >> Content in 'audio2/audio/' removed.
0it [00:00, ?it/s]

and it runs forever.
A few days ago everything was fine. How do I fix this?

Installation

Could someone write here how to install it properly? Not everyone is experienced enough to install it without instructions.

Using RVC 2 model starts the process, does everything up to 90% but then it crashes

I'm using an RVC model pth and index file, and everything works fine until the "Using RVC 2 model" step starts; it does everything up to 90% but then it crashes, and I'm not sure why. I changed all the settings and cleared the audios and audios2 folders.

I played those audios and it did a really good job. So what's going on lol

[INFO] >> audio2/audio/0.287.ogg, Tony Robbins.pth
[INFO] >> audio2/audio/2.168.ogg, Tony Robbins.pth
[INFO] >> audio2/audio/6.811.ogg, Tony Robbins.pth
[INFO] >> audio2/audio/8.052.ogg, Tony Robbins.pth
[INFO] >> audio2/audio/14.757.ogg, Tony Robbins.pth
[INFO] >> audio2/audio/18.699.ogg, Tony Robbins.pth
[INFO] >> audio2/audio/20.461.ogg, Tony Robbins.pth
[INFO] >> audio2/audio/21.641.ogg, Tony Robbins.pth
[INFO] >> audio2/audio/23.162.ogg, Tony Robbins.pth

(sonitr) C:\Tools\AI\SoniTranslate>

IndexError in text_to_speech.py when Processing Certain WAV Files

Hi, I ran into a problem when trying to process a specific audio file. The trouble comes up exactly when the code tries to handle a WAV file called "XTTS/AUTOMATIC_SPEAKER_00.wav". Below is the traceback providing more information:

[INFO] >> XTTS/AUTOMATIC_SPEAKER_00.wav
Traceback (most recent call last):
File "/home/lapo/anaconda3/envs/sonitr/lib/python3.10/site-packages/gradio/queueing.py", line 495, in call_prediction
output = await route_utils.call_process_api(
...
File "/home/lapo/SoniTranslate/soni_translate/text_to_speech.py", line 474, in create_new_files_for_vc
if filtered_speaker[0]["tts_name"] == "XTTS/AUTOMATIC.wav":
IndexError: list index out of range

It seems like the code running the text-to-speech process has a bug. Specifically, when it makes new audio files for voice conversion, it indexes into a list that is empty, which causes the IndexError.

Could you please take a look at this issue? I'm unsure if the problem is with how the audio files are named or somewhere else in the pipeline.

Thank you for your assistance on this project and the great work you've done so far.
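A hypothetical guard for the failing line, based only on the traceback above and not the project's actual fix; filtered_speaker is the list from the quoted frame in text_to_speech.py:

# Hypothetical guard (not the project's actual fix): check that the filtered
# list is non-empty before indexing element [0].
if filtered_speaker and filtered_speaker[0]["tts_name"] == "XTTS/AUTOMATIC.wav":
    ...  # original handling for the automatic XTTS speaker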

AttributeError: 'list' object has no attribute 'endswith'

After installing dependencies on Windows 10, I got this error:

AttributeError: 'list' object has no attribute 'endswith'
Traceback (most recent call last):
File "C:\SoniTranslate\soni\lib\site-packages\gradio\queueing.py", line 407, in call_prediction
output = await route_utils.call_process_api(
File "C:\SoniTranslate\soni\lib\site-packages\gradio\route_utils.py", line 226, in call_process_api
output = await app.get_blocks().process_api(
File "C:\SoniTranslate\soni\lib\site-packages\gradio\blocks.py", line 1559, in process_api
data = self.postprocess_data(fn_index, result["prediction"], state)
File "C:\SoniTranslate\soni\lib\site-packages\gradio\blocks.py", line 1447, in postprocess_data
prediction_value = block.postprocess(prediction_value)
File "C:\SoniTranslate\soni\lib\site-packages\gradio\components\file.py", line 247, in postprocess
"name": self.make_temp_copy_if_needed(y),
File "C:\SoniTranslate\soni\lib\site-packages\gradio\components\base.py", line 233, in make_temp_copy_if_needed
temp_dir = self.hash_file(file_path)
File "C:\SoniTranslate\soni\lib\site-packages\gradio\components\base.py", line 197, in hash_file
with open(file_path, "rb") as f:
FileNotFoundError: [Errno 2] No such file or directory: 'sub_ori.[]'

RuntimeError: Model has been downloaded but the SHA256 checksum does not not match

[INFO] >> Cache flushed
[INFO] >> Processing video...
[INFO] >> Process video...
[INFO] >> Process audio...
[INFO] >> Transcribing...
Traceback (most recent call last):
File "C:\Users\ded\bin\miniconda\envs\venvst\lib\site-packages\gradio\queueing.py", line 495, in call_prediction
output = await route_utils.call_process_api(
File "C:\Users\ded\bin\miniconda\envs\venvst\lib\site-packages\gradio\route_utils.py", line 235, in call_process_api
output = await app.get_blocks().process_api(
File "C:\Users\ded\bin\miniconda\envs\venvst\lib\site-packages\gradio\blocks.py", line 1627, in process_api
result = await self.call_function(
File "C:\Users\ded\bin\miniconda\envs\venvst\lib\site-packages\gradio\blocks.py", line 1173, in call_function
prediction = await anyio.to_thread.run_sync(
File "C:\Users\ded\bin\miniconda\envs\venvst\lib\site-packages\anyio\to_thread.py", line 56, in run_sync
return await get_async_backend().run_sync_in_worker_thread(
File "C:\Users\ded\bin\miniconda\envs\venvst\lib\site-packages\anyio_backends_asyncio.py", line 2144, in run_sync_in_worker_thread
return await future
File "C:\Users\ded\bin\miniconda\envs\venvst\lib\site-packages\anyio_backends_asyncio.py", line 851, in run
result = context.run(func, *args)
File "C:\Users\ded\bin\miniconda\envs\venvst\lib\site-packages\gradio\utils.py", line 690, in wrapper
response = f(*args, **kwargs)
File "C:\SoniTranslate\app_rvc.py", line 436, in multilingual_media_conversion
audio, self.result = transcribe_speech(
File "C:\SoniTranslate\soni_translate\speech_segmentation.py", line 34, in transcribe_speech
model = whisperx.load_model(
File "C:\Users\ded\bin\miniconda\envs\venvst\lib\site-packages\whisperx\asr.py", line 347, in load_model
vad_model = load_vad_model(torch.device(device), use_auth_token=None, **default_vad_options)
File "C:\Users\ded\bin\miniconda\envs\venvst\lib\site-packages\whisperx\vad.py", line 47, in load_vad_model
raise RuntimeError(
RuntimeError: Model has been downloaded but the SHA256 checksum does not not match. Please retry loading the model.
How do I solve this problem?
