
Synchronized Translation for Videos. Video dubbing

License: Apache License 2.0

Python 84.11% Jupyter Notebook 15.89%
audio-processing diarization translation translate-audio translate-video video-dubbing asr automatic-dubbing document-translator dubbing

sonitranslate's Introduction

🎥 SoniTranslate 🈷️

🎬 Video Translation with Synchronized Audio 🌐

SoniTranslate is a powerful and user-friendly web application that allows you to easily translate videos into different languages. This repository hosts the code for the SoniTranslate web UI, which is built with the Gradio library to provide a seamless and interactive user experience.

📙 Colab Notebook: Open In Colab
🎉 Repository: GitHub Repository
🚀 Online DEMO: Hugging Face Spaces

SoniTranslate's web UI features a browser interface built on the Gradio library.

Using the project: A video guide

For a comprehensive understanding of the project, we highly recommend watching this video tutorial by DEV-MalletteS. You can watch it on YouTube by clicking the thumbnail below:

Watch the video

Supported languages for translation

Code Language
en English
fr French
de German
es Spanish
it Italian
ja Japanese
nl Dutch
uk Ukrainian
pt Portuguese
ar Arabic
zh Chinese - Simplified
zh-TW Chinese - Traditional
cs Czech
da Danish
fi Finnish
el Greek
he Hebrew
hu Hungarian
ko Korean
fa Persian
pl Polish
ru Russian
tr Turkish
ur Urdu
hi Hindi
vi Vietnamese
id Indonesian
bn Bengali
te Telugu
mr Marathi
ta Tamil
jw (or jv) Javanese
ca Catalan
ne Nepali
th Thai
sv Swedish
am Amharic
cy Welsh
hr Croatian
is Icelandic
ka Georgian
km Khmer
sk Slovak
sq Albanian
sr Serbian
az Azerbaijani
bg Bulgarian
gl Galician
gu Gujarati
kk Kazakh
kn Kannada
lt Lithuanian
lv Latvian
ml Malayalam
ro Romanian
si Sinhala
su Sundanese
et Estonian
mk Macedonian
sw Swahili
af Afrikaans
bs Bosnian
la Latin
my Myanmar Burmese
no Norwegian
as Assamese
eu Basque
ha Hausa
ht Haitian Creole
hy Armenian
lo Lao
mg Malagasy
mn Mongolian
mt Maltese
pa Punjabi
ps Pashto
sl Slovenian
sn Shona
so Somali
tg Tajik
tk Turkmen
tt Tatar
uz Uzbek
yo Yoruba

Non-transcription (available as translation targets, but automatic transcription of the source audio is not supported)

Code Language
ay Aymara
bm Bambara
ceb Cebuano
ny Chichewa
dv Divehi
doi Dogri
ee Ewe
gn Guarani
ilo Iloko
rw Kinyarwanda
kri Krio
ku Kurdish
ky Kirghiz
lg Ganda
mai Maithili
or Oriya
om Oromo
qu Quechua
sm Samoan
ti Tigrinya
ts Tsonga
ak Akan
ug Uighur

Example:

Original audio

Video_t.mp4

Translated audio

video_dub.mp4

Colab Runtime

To run SoniTranslate using Colab Runtime: Open In Colab

Install Locally (installation tested on Linux)

Before You Start

Before you start installing and using SoniTranslate, there are a few things you need to do:

  1. Install the NVIDIA drivers for CUDA 11.8.0. NVIDIA CUDA is a parallel computing platform and programming model that enables developers to use the power of NVIDIA graphics processing units (GPUs) to speed up compute-intensive tasks. You can find the drivers here. Follow the instructions on the website to download and install them.
  2. Accept the license agreement for using Pyannote. You need an account on Hugging Face and must accept the licenses to use the models: https://huggingface.co/pyannote/speaker-diarization and https://huggingface.co/pyannote/segmentation
  3. Create a Hugging Face token. Hugging Face is a natural language processing platform that provides access to state-of-the-art models and tools. You will need a token to use some of the automatic model download features in SoniTranslate. Follow the instructions on the Hugging Face website to create one. When creating the new Access Token, make sure to tick "Read access to contents of all public gated repos you can access". (A quick way to verify the token is sketched after this list.)
  4. Install Anaconda or Miniconda. Anaconda is a free and open-source distribution of Python and R. It includes a package manager called conda that makes it easy to install and manage Python environments and packages. Follow the instructions on the Anaconda website to download and install Anaconda on your system.
  5. Install Git for your system. Git is a version control system that helps you track changes to your code and collaborate with other developers. You can install Git with Anaconda by running conda install -c anaconda git -y in your terminal (do this after step 1 in the following section). If you have trouble installing Git via Anaconda, you can use the official installer instead.
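Before moving on, you can verify the token works. This is a minimal sketch, assuming the huggingface_hub Python package (pulled in later by the project requirements); the token string is a placeholder:

# check_hf_token.py -- verify the Hugging Face token is valid
from huggingface_hub import HfApi

token = "YOUR_HUGGING_FACE_TOKEN"  # placeholder: paste the Access Token from step 3
api = HfApi()
print(api.whoami(token=token)["name"])  # prints your HF username if the token is valid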

Once you have completed these steps, you will be ready to install SoniTranslate.

Getting Started

To install SoniTranslate, follow these steps:

  1. Create a suitable anaconda environment for SoniTranslate and activate it:
conda create -n sonitr python=3.10 -y
conda activate sonitr
python -m pip install pip==23.1.2
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
  2. Clone this GitHub repository and navigate to it:
git clone https://github.com/r3gm/SoniTranslate.git
cd SoniTranslate
  3. Install required packages:
pip install -r requirements_base.txt -v
pip install -r requirements_extra.txt -v
pip install onnxruntime-gpu
  4. Install ffmpeg. FFmpeg is a free software project that produces libraries and programs for handling multimedia data. You will need it to process audio and video files. You can install ffmpeg with Anaconda by running conda install -y ffmpeg in your terminal (recommended). If you have trouble installing ffmpeg via Anaconda, you can use the official build instead (https://ffmpeg.org/ffmpeg.html). Once it is installed, make sure it is on your PATH by running ffmpeg -h in your terminal. If you don't get an error message, you're good to go.

  5. Optional install:

After installing FFmpeg, you can install these optional packages.

Piper TTS is a fast, local neural text-to-speech system that sounds great and is optimized for the Raspberry Pi 4. Piper is used in a variety of projects. Voices are trained with VITS and exported to ONNX for use with onnxruntime.

pip install -q piper-tts==1.2.0

Coqui XTTS is a text-to-speech (TTS) model that lets you generate realistic voices in different languages. It can clone voices from just a short audio clip and can even speak in a different language! It's like having a personal voice mimic for any text you need spoken.

pip install -q -r requirements_xtts.txt
pip install -q TTS==0.21.1  --no-deps
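As a quick smoke test of the optional Coqui XTTS install, here is a minimal sketch using the TTS Python API; the reference WAV and output paths are illustrative, and the model is downloaded on first use:

# xtts_smoke_test.py -- minimal Coqui XTTS check (file paths are illustrative)
from TTS.api import TTS

tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")  # downloads the model on first run
tts.tts_to_file(
    text="Hello from SoniTranslate.",
    speaker_wav="reference_voice.wav",  # short clip of the voice to clone
    language="en",
    file_path="xtts_output.wav",
)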

Running SoniTranslate

To run SoniTranslate locally, make sure the sonitr conda environment is active:

conda activate sonitr

Setting your Hugging Face token as an environment variable in Linux:

export YOUR_HF_TOKEN="YOUR_HUGGING_FACE_TOKEN"
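Before launching, a small sanity check can confirm the pieces installed above are visible from the sonitr environment. A minimal sketch that only reads the environment and prints results:

# preflight.py -- sanity check for the GPU build of PyTorch, ffmpeg, and the token
import os
import shutil

import torch

print("CUDA available:", torch.cuda.is_available())           # False means CPU-only
print("ffmpeg on PATH:", shutil.which("ffmpeg") is not None)  # needed for media processing
print("HF token set:", bool(os.environ.get("YOUR_HF_TOKEN"))) # the variable exported above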

Then navigate to the SoniTranslate folder and run the app_rvc.py script:

python app_rvc.py

When the local URL http://127.0.0.1:7860 is displayed in the terminal, simply open this URL in your web browser to access the SoniTranslate interface.

Stop and close SoniTranslate.

In most environments, you can stop the execution by pressing Ctrl+C in the terminal where you launched the script app_rvc.py. This will interrupt the program and stop the Gradio app. To deactivate the Conda environment, you can use the following command:

conda deactivate

This will deactivate the currently active Conda environment sonitr, and you'll return to the base environment or the global Python environment.

Starting Over

If you need to start over from scratch, you can delete the SoniTranslate folder and remove the sonitr conda environment with the following set of commands:

conda deactivate
conda env remove -n sonitr

With the sonitr environment removed, you can start over with a fresh installation.

Notes

  • Alternatively, you can set your Hugging Face token as a permanent environment variable with:
conda activate sonitr
conda env config vars set YOUR_HF_TOKEN="YOUR_HUGGING_FACE_TOKEN_HERE"
conda deactivate
  • To use OpenAI's GPT API for translation, TTS, or transcription, set your OpenAI API key as an environment variable (in quotes):
conda activate sonitr
conda env config vars set OPENAI_API_KEY="your-api-key-here"
conda deactivate

Command line arguments

The app_rvc.py script supports command-line arguments to customize its behavior. Here's a brief guide on how to use them:

Argument | Default | Type | Description
--theme | Taithrah/Minimal | String | Sets the theme for the interface. Themes can be found in the Theme Gallery.
--language | english | String | Selects the interface language. Available options: afrikaans, arabic, azerbaijani, chinese_zh_cn, english, french, german, hindi, indonesian, italian, japanese, korean, marathi, persian, polish, portuguese, russian, spanish, swedish, turkish, ukrainian, vietnamese.
--verbosity_level | info | String | Sets the verbosity level of the logger: debug, info, warning, error, or critical.
--public_url | disabled | Boolean | Enables a public link.
--cpu_mode | disabled | Boolean | Enables CPU mode to run the program without GPU acceleration.
--logs_in_gui | disabled | Boolean | Shows the operations performed in Logs (obsolete).

Example usage:

python app_rvc.py --theme aliabid94/new-theme --language french

This command sets the theme to a custom theme and selects French as the interface language. Feel free to customize these arguments according to your preferences and requirements.
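Flags can also be combined; for example, to force CPU mode and share a public link (both flags are listed in the table above):

python app_rvc.py --cpu_mode --public_url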

📖 News

🔥 2024/05/18: New Update Details

  • Added option Overlap Reduction
  • OpenAI API Key Integration for Transcription, translation, and TTS
  • More output types: subtitles by speaker, separate audio sound, and video only with subtitles
  • Access to a better-performing version of Whisper for transcribing speech on the Hugging Face Whisper page. Copy the repository ID and paste it into the 'Whisper ASR model' section in 'Advanced Settings'; e.g., kotoba-tech/kotoba-whisper-v1.1 for Japanese transcription
  • Support for ASS subtitles and batch processing with subtitles
  • Vocal enhancement before transcription
  • Added CPU mode with app_rvc.py --cpu_mode
  • TTS now supports up to 12 speakers
  • OpenVoiceV2 integration for voice imitation
  • PDF to videobook (displays images from the PDF)
  • GUI language translation in Persian and Afrikaans
  • New Language Support:
    • Complete support: Estonian, Macedonian, Malay, Swahili, Afrikaans, Bosnian, Latin, Myanmar Burmese, Norwegian, Traditional Chinese, Assamese, Basque, Hausa, Haitian Creole, Armenian, Lao, Malagasy, Mongolian, Maltese, Punjabi, Pashto, Slovenian, Shona, Somali, Tajik, Turkmen, Tatar, Uzbek, and Yoruba
    • Non-transcription: Aymara, Bambara, Cebuano, Chichewa, Divehi, Dogri, Ewe, Guarani, Iloko, Kinyarwanda, Krio, Kurdish, Kirghiz, Ganda, Maithili, Oriya, Oromo, Quechua, Samoan, Tigrinya, Tsonga, Akan, and Uighur

🔥 2024/03/02: Preserve file names in output. Multiple files can now be submitted simultaneously by specifying their paths, directories, or URLs separated by commas. Processing of a full YouTube playlist. Regarding supported site URLs, please be aware that not all sites may work optimally. Added an option for disabling diarization. Implemented soft subtitles. Format output (MP3, MP4, MKV, WAV, and OGG), and resolved issues related to file reading and diarization.

🔥 2024/02/22: Added FreeVC for voice imitation, fixed the voiceless track, and improved segment division. New language support (Swedish, Amharic, Welsh, Croatian, Icelandic, Georgian, Khmer, Slovak, Albanian, Serbian, Azerbaijani, Bulgarian, Galician, Gujarati, Kazakh, Kannada, Lithuanian, Latvian, Malayalam, Romanian, Sinhala, and Sundanese). New translations of the GUI (Spanish, French, German, Italian, Japanese, Chinese Simplified, Ukrainian, Arabic, Russian, Turkish, Indonesian, Portuguese, Hindi, Vietnamese, Polish, Swedish, Korean, Marathi, and Azerbaijani). When a subtitle file is provided, alignment is skipped and the media file is not needed to process the SRT file. Burn subtitles into the video. The queue can accept multiple tasks simultaneously. Sound alert notification. Continue the process from the last checkpoint. Acceleration rate regulation.

🔥 2024/01/16: Expanded language support (Thai, Nepali, Catalan, Javanese, Tamil, Marathi, Telugu, Bengali, and Indonesian), the introduction of Whisper large-v3, configurable GUI options, and integration of BARK, Facebook MMS, Coqui XTTS, and Piper-TTS. Additional features include audio separation utilities, XTTS WAV creation, using an SRT file as a base for translation, document translation, manual speaker editing, and flexible output options (video, audio, subtitles).

🔥 2023/10/29: Edit the translated subtitles, download them, and adjust volume and speed options.

🔥 2023/08/03: Changed default options and added directory view of downloads.

🔥 2023/08/02: Added support for Arabic, Czech, Danish, Finnish, Greek, Hebrew, Hungarian, Korean, Persian, Polish, Russian, Turkish, Urdu, Hindi, and Vietnamese languages. 🌐

🔥 2023/08/01: Added options for using RVC models.

🔥 2023/07/27: Fixed some bugs in video and audio processing.

🔥 2023/07/26: New UI and added mix options.

Contributing

Contributions from the community are welcome! If you have any ideas, bug reports, or feature requests, please open an issue or submit a pull request. For more information, please refer to the contribution guidelines.

Credits

This project leverages a number of open-source projects, and we would like to acknowledge and thank their contributors.

License

Although the code is licensed under Apache 2.0, the models or weights may have commercial restrictions, as seen with pyannote diarization.


sonitranslate's Issues

rmvpe+

Hi, sorry for posting here, but you don't respond on Hugging Face. I have a question about rmvpe+. Where did rmvpe+ come from? I can't find anything about it on the Internet, and where can I download it for local use? It works very well.

API Keys

Hello, I'm running SoniTranslate on Windows (Anaconda), and I have a problem because I don't know where to put the OpenAI API key. Is there an option to save it permanently somewhere in a file? The same goes for the HF token. Would someone be kind enough to suggest how to deal with this?
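A possible answer based on the Notes section earlier in this README: both keys can be stored permanently in the conda environment itself, which also works with Anaconda on Windows:

conda activate sonitr
conda env config vars set YOUR_HF_TOKEN="YOUR_HUGGING_FACE_TOKEN_HERE"
conda env config vars set OPENAI_API_KEY="your-api-key-here"
conda deactivate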

error on M1 Mac when creating SRT file from audio file.

I'm running a local install on my M1 Mac Air. I'm getting this error when trying to create an SRT file from an audio file.

"Error
Model has been downloaded but the SHA256 checksum does not not match. Please retry loading the model."

Here's my log:

(sonitr) userName@MacBook-Air SoniTranslate % python app_rvc.py
objc[5317]: Class AVFFrameReceiver is implemented in both /Users/userName/anaconda3/envs/sonitr/lib/libavdevice.58.8.100.dylib (0x122394798) and /Users/userName/anaconda3/envs/sonitr/lib/python3.10/site-packages/av/.dylibs/libavdevice.60.1.100.dylib (0x179a54760). One of the two will be used. Which one is undefined.
objc[5317]: Class AVFAudioReceiver is implemented in both /Users/userName/anaconda3/envs/sonitr/lib/libavdevice.58.8.100.dylib (0x1223947e8) and /Users/userName/anaconda3/envs/sonitr/lib/python3.10/site-packages/av/.dylibs/libavdevice.60.1.100.dylib (0x179a547b0). One of the two will be used. Which one is undefined.
[INFO] >> Working in: cpu
[WARNING] >> No module named 'piper'
[INFO] >> PIPER TTS disabled
[INFO] >> Coqui XTTS enabled
[INFO] >> In this app, by using Coqui TTS (text-to-speech), you acknowledge and agree to the license.
You confirm that you have read, understood, and agreed to the Terms and Conditions specified at the following link:
https://coqui.ai/cpml.txt.
Running on local URL: http://127.0.0.1:7860

To create a public link, set share=True in launch().
[WARNING] >> Make sure to select a 'TTS Speaker' suitable for the translation language to avoid errors with the TTS.
[INFO] >> Cache flushed
[INFO] >> Processing audio...
[INFO] >> Transcribing...
Traceback (most recent call last):
File "/Users/userName/anaconda3/envs/sonitr/lib/python3.10/site-packages/gradio/queueing.py", line 495, in call_prediction
output = await route_utils.call_process_api(
File "/Users/userName/anaconda3/envs/sonitr/lib/python3.10/site-packages/gradio/route_utils.py", line 235, in call_process_api
output = await app.get_blocks().process_api(
File "/Users/userName/anaconda3/envs/sonitr/lib/python3.10/site-packages/gradio/blocks.py", line 1627, in process_api
result = await self.call_function(
File "/Users/userName/anaconda3/envs/sonitr/lib/python3.10/site-packages/gradio/blocks.py", line 1173, in call_function
prediction = await anyio.to_thread.run_sync(
File "/Users/userName/anaconda3/envs/sonitr/lib/python3.10/site-packages/anyio/to_thread.py", line 56, in run_sync
return await get_async_backend().run_sync_in_worker_thread(
File "/Users/userName/anaconda3/envs/sonitr/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 2144, in run_sync_in_worker_thread
return await future
File "/Users/userName/anaconda3/envs/sonitr/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 851, in run
result = context.run(func, *args)
File "/Users/userName/anaconda3/envs/sonitr/lib/python3.10/site-packages/gradio/utils.py", line 690, in wrapper
response = f(*args, **kwargs)
File "/Users/userName/Documents/projects/vsCode projects/soni_translate/SoniTranslate/app_rvc.py", line 436, in multilingual_media_conversion
audio, self.result = transcribe_speech(
File "/Users/userName/Documents/projects/vsCode projects/soni_translate/SoniTranslate/soni_translate/speech_segmentation.py", line 34, in transcribe_speech
model = whisperx.load_model(
File "/Users/userName/anaconda3/envs/sonitr/lib/python3.10/site-packages/whisperx/asr.py", line 347, in load_model
vad_model = load_vad_model(torch.device(device), use_auth_token=None, **default_vad_options)
File "/Users/userName/anaconda3/envs/sonitr/lib/python3.10/site-packages/whisperx/vad.py", line 47, in load_vad_model
raise RuntimeError(
RuntimeError: Model has been downloaded but the SHA256 checksum does not not match. Please retry loading the model.

Control working mode: cpu/cuda

Hi guys,

Can someone please suggest how to effectively control the working mode?

app_rvc.py is automatically started in cuda mode:

[INFO] >> Working in: cuda

However, I have a quite old MX150 GPU, and it constantly fails with CUDA out of memory at the Transcribing stage, no matter how I tweak Batch size / Compute type / Whisper ASR model or PYTORCH_CUDA_ALLOC_CONF.

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 78.00 MiB. GPU 0 has a total capacity of 2.00 GiB of which 0 bytes is free. Of the allocated memory 101.51 MiB is allocated by PyTorch, and 72.49 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

Therefore I would like to fall back to CPU mode.

Running SoniTranslate dev_24_3 on Windows 11.

Thanks!
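A possible workaround based on this README's own flag list: CPU mode can be forced at launch with the --cpu_mode argument described in the Command line arguments section:

python app_rvc.py --cpu_mode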

Collaboration Proposal

Hey, your work is truly amazing, love it!!!

I am a UI/UX designer and Python developer. Sometimes, you know, I want to translate Indian, Korean, and Chinese lectures into my target language, whether on YouTube or other streaming platforms. Some services translate videos into other languages, and yours does as well; however, wouldn't it be easier and simpler to develop something like an extension that translates right away, the way Google translates web pages at the click of a button? Plus, podcasts...

I am willing to collaborate and further develop your amazing program if you like the idea, you can reach me at:
[email protected]

Sincerely,
Shakhruz Bakhtiyarov

Issues with diarization and pyannote 3.1

Hey, this is truly incredible, thank you for all your efforts.

I am having an issue with pyannote 3.1 on Google Colab (pyannote 2.0 works fine):

Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/gradio/queueing.py", line 495, in call_prediction
output = await route_utils.call_process_api(
File "/usr/local/lib/python3.10/dist-packages/gradio/route_utils.py", line 235, in call_process_api
output = await app.get_blocks().process_api(
File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1627, in process_api
result = await self.call_function(
File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1173, in call_function
prediction = await anyio.to_thread.run_sync(
File "/usr/local/lib/python3.10/dist-packages/anyio/to_thread.py", line 33, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
return await future
File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 807, in run
result = context.run(func, *args)
File "/usr/local/lib/python3.10/dist-packages/gradio/utils.py", line 690, in wrapper
response = f(*args, **kwargs)
File "/content/SoniTranslate/app_rvc.py", line 288, in batch_multilingual_media_conversion
output_file = self.multilingual_media_conversion(
File "/content/SoniTranslate/app_rvc.py", line 549, in multilingual_media_conversion
self.result_diarize = diarize_speech(
File "/content/SoniTranslate/soni_translate/speech_segmentation.py", line 175, in diarize_speech
raise error
TypeError: exceptions must derive from BaseException

app_rvc.py

When running python app_rvc.py, how do I resolve this?
/tmp/gradio/6cb4020ad75bb1cb116c865ab91842f8753c7acc/Video_main.mp4
Process video... process audio... process audio... ...
Error can't create the audio file
Traceback (most recent call last):
File "/home/mohit/Projects/venv/lib/python3.10/site-packages/gradio/routes.py", line 488, in run_predict
output = await app.get_blocks().process_api(
File "/home/mohit/Projects/venv/lib/python3.10/site-packages/gradio/blocks.py", line 1434, in process_api
data = self.postprocess_data(fn_index, result["prediction"], state)
File "/home/mohit/Projects/venv/lib/python3.10/site-packages/gradio/blocks.py", line 1335, in postprocess_data
prediction_value = block.postprocess(prediction_value)
File "/home/mohit/Projects/venv/lib/python3.10/site-packages/gradio/components/file.py", line 254, in postprocess
"name": self.make_temp_copy_if_needed(y),
File "/home/mohit/Projects/venv/lib/python3.10/site-packages/gradio/components/base.py", line 226, in make_temp_copy_if_needed
temp_dir = self.hash_file(file_path)
File "/home/mohit/Projects/venv/lib/python3.10/site-packages/gradio/components/base.py", line 190, in hash_file
with open(file_path, "rb") as f:
FileNotFoundError: [Errno 2] No such file or directory: 'sub_ori.srt'

Voiceless Track Separation Error

A file is provided and the separation function works well, but afterwards another file is not found:

[INFO] >> Voiceless Track Separation...
100%|█████████████████████████████████████████| 210/210 [00:51<00:00,  4.05it/s]
[ERROR] >> Error comnand
Traceback (most recent call last):
  File "/home/bazza/src/miniconda3/envs/sonitr/lib/python3.10/site-packages/gradio/queueing.py", line 495, in call_prediction
    output = await route_utils.call_process_api(
  File "/home/bazza/src/miniconda3/envs/sonitr/lib/python3.10/site-packages/gradio/route_utils.py", line 235, in call_process_api
    output = await app.get_blocks().process_api(
  File "/home/bazza/src/miniconda3/envs/sonitr/lib/python3.10/site-packages/gradio/blocks.py", line 1627, in process_api
    result = await self.call_function(
  File "/home/bazza/src/miniconda3/envs/sonitr/lib/python3.10/site-packages/gradio/blocks.py", line 1173, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/home/bazza/src/miniconda3/envs/sonitr/lib/python3.10/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "/home/bazza/src/miniconda3/envs/sonitr/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 2144, in run_sync_in_worker_thread
    return await future
  File "/home/bazza/src/miniconda3/envs/sonitr/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 851, in run
    result = context.run(func, *args)
  File "/home/bazza/src/miniconda3/envs/sonitr/lib/python3.10/site-packages/gradio/utils.py", line 690, in wrapper
    response = f(*args, **kwargs)
  File "/home/bazza/src/sonitr/SoniTranslate/app_rvc.py", line 365, in batch_multilingual_media_conversion
    output_file = self.multilingual_media_conversion(
  File "/home/bazza/src/sonitr/SoniTranslate/app_rvc.py", line 1085, in multilingual_media_conversion
    run_command(command_volume_mix)
  File "/home/bazza/src/sonitr/SoniTranslate/soni_translate/utils.py", line 66, in run_command
    raise Exception(errors.decode())
Exception: ffmpeg version 6.1.1 Copyright (c) 2000-2023 the FFmpeg developers
  built with gcc 12.3.0 (conda-forge gcc 12.3.0-5)

ffmpeg error

How do I fix this?

Exception: ffmpeg version 6.1.1 Copyright (c) 2000-2023 the FFmpeg developers
built with clang version 17.0.6

Metadata:
encoder : Lavf60.16.100
Duration: 00:01:43.31, bitrate: 1411 kb/s
Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 44100 Hz, 2 channels, s16, 1411 kb/s
[aist#1:0/pcm_s32le @ 0000014CFD9B66C0] Guessed Channel Layout: mono
Input #1, wav, from 'audio_dub_solo.ogg':
Duration: 00:01:42.73, bitrate: 768 kb/s
Stream #1:0: Audio: pcm_s32le ([1][0][0][0] / 0x0001), 24000 Hz, 1 channels, s32, 768 kb/s
[aost#0:0 @ 0000014CFDA64CC0] Unknown encoder 'libmp3lame'
[aost#0:0 @ 0000014CFDA64CC0] Error selecting an encoder
Error opening output file audio_mix.mp3.
Error opening output files: Encoder not found
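A hedged note: the quoted ffmpeg build was compiled without the libmp3lame MP3 encoder. Reinstalling a full build, for example the conda package recommended in the Getting Started section, may resolve it:

conda install -y ffmpeg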

Need support for FastAPI in SoniTranslate

I am looking for FastAPI support for this project; I have gone through the codebase and cannot find it. Can anyone help me with that? I would be very grateful. Or can someone guide me step by step? I want to make a FastAPI service that takes a video, translates it using Coqui TTS, and then dubs the video with it.

Best Regards

help me please

Hello, I'm deploying on a computer and it seems that everything is installed. I get the link and everything works. I insert a token and a link to the video, then an error pops up. What's wrong?
Initial log: (screenshot)
End log with error: (screenshot)
Thanks a lot! Good luck)

Whisper

Whisper does a good translation of subtitles with this command:

whisper x.wav --model large-v3 --task translate --output_format srt --threads 10

It's even better than Google Translate and supports multiple languages,
so I hope you can add an option to use it.
Thank you!

[ERROR] >> [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running Transpose node. Name:'Transpose_2'

Firstly, thanks for your great project <3
I hope there will be a free AI translation option in a future update, since the OpenAI API seems a little expensive for me, but thank you for this awesome tool.
And MY ISSUE IS:
Everything works fine until this problem when I use Voiceless Track. The problem seems to be in onnxruntime-gpu, but I don't know why. I have a GeForce 960M with 4 GB VRAM.

[INFO] >> Creating final translated video...
1it [00:00,  5.05it/s][INFO] >> Avoid overlap for audio2/audio/5.1.ogg with 5.6
19it [00:01, 10.99it/s][INFO] >> Avoid overlap for audio2/audio/90.8.ogg with 91.02
29it [00:02, 10.91it/s][INFO] >> Avoid overlap for audio2/audio/160.8.ogg with 161.22000000000003
31it [00:02, 10.84it/s][INFO] >> Avoid overlap for audio2/audio/165.8.ogg with 166.54000000000005
[INFO] >> Avoid overlap for audio2/audio/170.8.ogg with 171.56000000000006
37it [00:03, 10.36it/s]
[INFO] >> Voiceless Track Separation...
2024-05-19 04:15:50.5631879 [E:onnxruntime:, sequential_executor.cc:514 onnxruntime::ExecuteKernel] Non-zero status code returned while running Transpose node. Name:'Transpose_2' Status Message: CUDA error cudaErrorNoKernelImageForDevice:no kernel image is available for execution on the device
[ERROR] >> [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running Transpose node. Name:'Transpose_2' Status Message: CUDA error cudaErrorNoKernelImageForDevice:no kernel image is available for execution on the device
[INFO] >> Done: C:\Users\PORTATIL\SoniTranslate\outputs\Fury __en.mp4
(sonitr) PS C:\Users\PORTATIL\SoniTranslate> python app_rvc.py
The torchaudio backend is switched to 'soundfile'. Note that 'sox_io' is not supported on Windows.
The torchaudio backend is switched to 'soundfile'. Note that 'sox_io' is not supported on Windows.
[INFO] >> PIPER TTS enabled
[INFO] >> Coqui XTTS enabled
[INFO] >> In this app, by using Coqui TTS (text-to-speech), you acknowledge and agree to the license.
You confirm that you have read, understood, and agreed to the Terms and Conditions specified at the following link:
https://coqui.ai/cpml.txt.
[INFO] >> Working in: cuda
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
IMPORTANT: You are using gradio version 4.19.2, however version 4.29.0 is available, please upgrade.

Sorry for my English

Document Translation (No generated SRT file)

Hey there, when translating documents and creating videobooks, I would love it if we could include the actual document, translated, or have the option to show the original on one side and the translation on the other.

But my real issue is that when I create an audiobook, I would definitely need an SRT file to be generated when translating the audio.

I know it's almost impossible because you have to define the number of characters, and since it's split into segments (let's pretend I've set the maximum characters to 200), it will be hard to generate an SRT file if the segments do not match the actual document exactly. But there has to be some way to achieve this. This would be a game changer for my work! We work in IT accessibility.

Thanks!

AI Dubbing API

Thank you for building this project! I work at a company called Sieve and this is a part of what inspired us to build our Dubbing API. It's a bit different than this as it supports voice cloning, different voice engines, and higher quality translations using other closed-source solutions but it's an example of the bounds of what this tech can do today.

I'd love to contribute our learnings to this project in some way. I think the most challenging part of the problem is how one handles audio speedups and slowdowns across languages. Different applications seem to want different tradeoffs between "sync"-ness and how drastic the speedup tends to be.

Curious if there are improvements in the queue on that vector for this project and if we can contribute in any way? Would also love feedback on what we've built as I think it's something the community would love!

[WinError 2] The system cannot find the specified file (Windows)

Hi, I just installed and ran SoniTranslate and everything works fine.
But the problem is that it only works when I pass it a YouTube URL; when I want to upload a file locally, it gives me the error "[WinError 2] The system cannot find the specified file".
I attach the details of the console: (screenshot)

The translation speed is incredible

Hi, your work is incredible.
The translator has become much faster; it now runs 10 times faster (on GPU).
Is it possible to speed up RVC processing? During processing, my video card is only loaded to 30%; could it be processed in several threads to speed it up (an option to select the number of threads would be really cool)? WhisperX uses 100% of the video card and converts audio into text in just a few seconds.

Also, as I understand it, Piper TTS does not work on Windows:
[WARNING] >> No module named 'piper'
[INFO] >> PIPER TTS disabled

During installation it says: ERROR: Cannot install piper-tts==1.1.0 and piper-tts==1.2.0 because these package versions have conflicting dependencies.
Maybe there is some solution?

There is also an excellent fast TTS, Silero: https://github.com/snakers4/silero-models
Maybe it will suit you.
Having a choice of TTS options is always good.

Thank you so much for your work!

any way to add srt file in the source?

Hi,
translating directly from Danish to English never works correctly with anything I have tried, but I can get AI-translated subtitles that are 70-80% correct and then modify them to be understandable in English.
So is there any way to either modify whatever SoniTranslate translates, or get it to take an SRT file with timing into account when generating new audio?

I would like to suggest a couple of functions

Hello, your creation is beautiful!
I would like to suggest a couple of functions.
Simple ones:
Add a checkbox to turn audio acceleration on/off.
Detailed volume settings (volume percentage) in the audio mixer output settings.
RVC settings (index_rate, rms_mix_rate, protect).
More complex ones:
Save .srt subtitle files in the original and target languages.
Editing of the translated subtitles (a field containing the translation text that can be changed and saved, after which the sound is assembled again and RVC applied).

How to increase the number of max speakers?

I tried to increase the number of speakers to at least 8, but I end up with error messages such as: "NameError: name 'model_voice_path08' is not defined. Did you mean: 'model_voice_path00'?"
I modified three folders, but this is obviously not enough. How do I get 8 speakers?
Modif.zip
Thanks

P.S.: In app_rvc.py, look at lines 307 and 1409 for the "auto" compute mode; I found this in the whisperX documentation.

Add background music back after processing is done, please!

Wow, this is the best model chain I have ever seen! But here is an improvement I'd like to ask for: can you add a function to add the background music of the input video back after all processing is done?

Issue installing Piper TTS and Coqui XTTS

So I don't know much about coding, but these are the last steps that appeared:

DEPRECATION: omegaconf 2.0.6 has a non-standard dependency specifier PyYAML>=5.1.*. pip 24.0 will enforce this behaviour change. A possible replacement is to upgrade to a newer version of omegaconf or contact the author to suggest that they release a version with a conforming dependency specifiers. Discussion can be found at pypa/pip#12063
Installing collected packages: onnxruntime-gpu
Successfully installed onnxruntime-gpu-1.17.1

(sonitr) A:\Art_intel\SoniTranslate>pip install -q piper-tts==1.2.0
ERROR: Could not find a version that satisfies the requirement piper-phonemize~=1.1.0 (from piper-tts) (from versions: none)
ERROR: No matching distribution found for piper-phonemize~=1.1.0

(sonitr) A:\Art_intel\SoniTranslate>pip install -q -r requirements_xtts.txt
DEPRECATION: omegaconf 2.0.6 has a non-standard dependency specifier PyYAML>=5.1.*. pip 24.0 will enforce this behaviour change. A possible replacement is to upgrade to a newer version of omegaconf or contact the author to suggest that they release a version with a conforming dependency specifiers. Discussion can be found at pypa/pip#12063
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
pyannote-audio 3.1.1 requires omegaconf<3.0,>=2.1, but you have omegaconf 2.0.6 which is incompatible.
pyannote-database 5.1.0 requires typer>=0.12.1, but you have typer 0.9.4 which is incompatible.

Uroman Error!

Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/gradio/queueing.py", line 495, in call_prediction
output = await route_utils.call_process_api(
File "/usr/local/lib/python3.10/dist-packages/gradio/route_utils.py", line 235, in call_process_api
output = await app.get_blocks().process_api(
File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1627, in process_api
result = await self.call_function(
File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1173, in call_function
prediction = await anyio.to_thread.run_sync(
File "/usr/local/lib/python3.10/dist-packages/anyio/to_thread.py", line 33, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
return await future
File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 807, in run
result = context.run(func, *args)
File "/usr/local/lib/python3.10/dist-packages/gradio/utils.py", line 690, in wrapper
response = f(*args, **kwargs)
File "/content/ALEPH-WEBETA/app_rvc.py", line 355, in batch_multilingual_media_conversion
return self.multilingual_media_conversion(
File "/content/ALEPH-WEBETA/app_rvc.py", line 955, in multilingual_media_conversion
self.valid_speakers = audio_segmentation_to_voice(
File "/content/ALEPH-WEBETA/soni_translate/text_to_speech.py", line 1057, in audio_segmentation_to_voice
segments_vits_tts(filtered_vits, TRANSLATE_AUDIO_TO) # wav
File "/content/ALEPH-WEBETA/soni_translate/text_to_speech.py", line 346, in segments_vits_tts
romanize_text = uromanize(text)
File "/content/ALEPH-WEBETA/soni_translate/text_to_speech.py", line 316, in uromanize
raise ValueError(f"Error {process.returncode}: {stderr.decode()}")
ValueError: Error 2: Can't open perl script "./uroman/bin/uroman.pl": No such file or directory

Status gets stuck in transcription (30%)

Status gets stuck in transcription; do you know what the problem could be?

I'm running on Ubuntu WSL. I installed all the dependencies and set my HF token, but even so, when I load the video and start dubbing it into Portuguese, it gets stuck (so far 3 hours at 30%, the transcription stage).

Problem with Doc translate (PDF and DOCX)

Hello,

I have installed SoniTranslate on Windows.

(sonitr) D:\GITHUB\Sonitranslate\SoniTranslate>python app_rvc.py --theme aliabid94/new-theme --language french
The torchaudio backend is switched to 'soundfile'. Note that 'sox_io' is not supported on Windows.
The torchaudio backend is switched to 'soundfile'. Note that 'sox_io' is not supported on Windows.
[INFO] >> PIPER TTS disabled
[INFO] >> Coqui XTTS enabled
[INFO] >> In this app, by using Coqui TTS (text-to-speech), you acknowledge and agree to the license. You confirm that you have read, understood, and agreed to the Terms and Conditions specified at the following link: https://coqui.ai/cpml.txt.
[INFO] >> Working in: cuda
Running on local URL: http://127.0.0.1:7860

DOCX DOCUMENT
I can translate a plain text document, but with a docx document SoniTranslate stops and disconnects.
The docx is being read but stops at: " de 2014 à 2017 il est élu député en 2017 dans la dixième >> audio/1.0.ogg"

PDF DOCUMENT

Exception in callback _ProactorBasePipeTransport._call_connection_lost(None) handle: <Handle _ProactorBasePipeTransport._call_connection_lost(None)>
Traceback (most recent call last):
File "C:\Users\ryzen\miniconda3\envs\sonitr\lib\asyncio\events.py", line 80, in _run
self._context.run(self._callback, *self._args)
File "C:\Users\ryzen\miniconda3\envs\sonitr\lib\asyncio\proactor_events.py", line 165, in _call_connection_lost
self._sock.shutdown(socket.SHUT_RDWR)
ConnectionResetError: [WinError 10054] Une connexion existante a dû être fermée par l’hôte distant (an existing connection was forcibly closed by the remote host)

Thanks for your help.

Selecting a speaker in the subtitle editor

Speaker detection does not always work correctly. Instead of a female voice there may be a male voice, and vice versa. Is it possible to make it so that the speaker can be assigned in the subtitle editor?
Let's say something like:

   {
     "speaker": 1,
     "start": 1.172,
     "text": "Your work is very cool."
   },
   {
     "speaker": 3,
     "start": 2.372,
     "text": "Yes, I agree too, SoniTranslate is great."
   }

So that, if necessary, you could fix it manually.

Speech too fast and out of sync

Hello and congratulations on this work! I just tested a lot of other projects and yours is clearly the most efficient :)

Unfortunately, and I do not understand why, the translated speech sometimes plays at something like 10x speed, and of course nothing can be understood ^^; then, 15 seconds later, the speech returns to a correct speed.
Here is a short extract to illustrate the problem: FinalTest

Could you give me a clue so that I can find a solution?

(For the same project, I've done transcription + translation with Whisper without any problems, but I'm missing the voice translation and video sync.)

Many thanks in advance for your reply!

Problem after the installation

Trying to run it after the installation gave me this problem:


python app_rvc.py
Traceback (most recent call last):
  File "/home/bazza/src/sonitr/SoniTranslate/app_rvc.py", line 7, in <module>
    import whisperx
  File "/home/bazza/src/miniconda3/envs/sonitr/lib/python3.10/site-packages/whisperx/__init__.py", line 1, in <module>
    from .transcribe import load_model
  File "/home/bazza/src/miniconda3/envs/sonitr/lib/python3.10/site-packages/whisperx/transcribe.py", line 10, in <module>
    from .asr import load_model
  File "/home/bazza/src/miniconda3/envs/sonitr/lib/python3.10/site-packages/whisperx/asr.py", line 13, in <module>
    from .vad import load_vad_model, merge_chunks
  File "/home/bazza/src/miniconda3/envs/sonitr/lib/python3.10/site-packages/whisperx/vad.py", line 11, in <module>
    from pyannote.audio.pipelines import VoiceActivityDetection
  File "/home/bazza/src/miniconda3/envs/sonitr/lib/python3.10/site-packages/pyannote/audio/pipelines/__init__.py", line 26, in <module>
    from .speaker_diarization import SpeakerDiarization
  File "/home/bazza/src/miniconda3/envs/sonitr/lib/python3.10/site-packages/pyannote/audio/pipelines/speaker_diarization.py", line 42, in <module>
    from pyannote.audio.pipelines.speaker_verification import PretrainedSpeakerEmbedding
  File "/home/bazza/src/miniconda3/envs/sonitr/lib/python3.10/site-packages/pyannote/audio/pipelines/speaker_verification.py", line 45, in <module>
    from speechbrain.pretrained import (
  File "/home/bazza/src/miniconda3/envs/sonitr/lib/python3.10/site-packages/speechbrain/__init__.py", line 4, in <module>
    from .core import Stage, Brain, create_experiment_directory, parse_arguments
  File "/home/bazza/src/miniconda3/envs/sonitr/lib/python3.10/site-packages/speechbrain/core.py", line 38, in <module>
    from speechbrain.utils.optimizers import rm_vector_weight_decay
  File "/home/bazza/src/miniconda3/envs/sonitr/lib/python3.10/site-packages/speechbrain/utils/__init__.py", line 11, in <module>
    from . import *  # noqa
  File "/home/bazza/src/miniconda3/envs/sonitr/lib/python3.10/site-packages/speechbrain/utils/train_logger.py", line 268, in <module>
    class ProgressSampleLogger:
  File "/home/bazza/src/miniconda3/envs/sonitr/lib/python3.10/site-packages/speechbrain/utils/train_logger.py", line 337, in ProgressSampleLogger
    "saver": _get_image_saver(),
  File "/home/bazza/src/miniconda3/envs/sonitr/lib/python3.10/site-packages/speechbrain/utils/train_logger.py", line 260, in _get_image_saver
    import torchvision
  File "/home/bazza/src/miniconda3/envs/sonitr/lib/python3.10/site-packages/torchvision/__init__.py", line 6, in <module>
    from torchvision import _meta_registrations, datasets, io, models, ops, transforms, utils
  File "/home/bazza/src/miniconda3/envs/sonitr/lib/python3.10/site-packages/torchvision/_meta_registrations.py", line 164, in <module>
    def meta_nms(dets, scores, iou_threshold):
  File "/home/bazza/src/miniconda3/envs/sonitr/lib/python3.10/site-packages/torch/library.py", line 440, in inner
    handle = entry.abstract_impl.register(func_to_register, source)
  File "/home/bazza/src/miniconda3/envs/sonitr/lib/python3.10/site-packages/torch/_library/abstract_impl.py", line 30, in register
    if torch._C._dispatch_has_kernel_for_dispatch_key(self.qualname, "Meta"):
RuntimeError: operator torchvision::nms does not exist
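A hedged note: "operator torchvision::nms does not exist" usually indicates mismatched torch and torchvision builds. Reinstalling the matched set from the Getting Started step may help:

conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia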

fairseq can't be installed

Hello, I'm using an Anaconda 3 environment, and when I try to use RVC, which uses fairseq, it can't be installed.

      running build_ext
      building 'fairseq.libbleu' extension
      error: Microsoft Visual C++ 14.0 or greater is required. Get it with "Microsoft C++ Build Tools": https://visualstudio.microsoft.com/visual-cpp-build-tools/
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for fairseq
Failed to build fairseq
ERROR: Could not build wheels for fairseq, which is required to install pyproject.toml-based projects 

Please Help!

Stuck after acceleration

[INFO] >> Apply acceleration
[INFO] >> Content in 'audio2/audio/' removed.
0it [00:00, ?it/s]

and it runs forever.
A few days ago everything was fine. How do I fix this?

Installation

Could someone write here how to install it properly? Not everyone is experienced enough to install it without instructions.

Using RVC 2 model starts the process, does everything up to 90% but then it crashes

I'm using an RVC model pth and index file, and everything works fine until the "Using RVC 2 model" step starts; it does everything up to 90% but then it crashes, and I'm not sure why. I changed all the settings and cleared the audios and audios2 folders.

I played those audios and it did a really good job. So what's going on lol

[INFO] >> audio2/audio/0.287.ogg, Tony Robbins.pth
[INFO] >> audio2/audio/2.168.ogg, Tony Robbins.pth
[INFO] >> audio2/audio/6.811.ogg, Tony Robbins.pth
[INFO] >> audio2/audio/8.052.ogg, Tony Robbins.pth
[INFO] >> audio2/audio/14.757.ogg, Tony Robbins.pth
[INFO] >> audio2/audio/18.699.ogg, Tony Robbins.pth
[INFO] >> audio2/audio/20.461.ogg, Tony Robbins.pth
[INFO] >> audio2/audio/21.641.ogg, Tony Robbins.pth
[INFO] >> audio2/audio/23.162.ogg, Tony Robbins.pth

(sonitr) C:\Tools\AI\SoniTranslate>

IndexError in text_to_speech.py when Processing Certain WAV Files

Hi, I ran into a problem when trying to process a specific audio file. The trouble comes up exactly when the code tries to handle a WAV file called "XTTS/AUTOMATIC_SPEAKER_00.wav". Below is the traceback providing more information:

[INFO] >> XTTS/AUTOMATIC_SPEAKER_00.wav
Traceback (most recent call last):
File "/home/lapo/anaconda3/envs/sonitr/lib/python3.10/site-packages/gradio/queueing.py", line 495, in call_prediction
output = await route_utils.call_process_api(
...
File "/home/lapo/SoniTranslate/soni_translate/text_to_speech.py", line 474, in create_new_files_for_vc
if filtered_speaker[0]["tts_name"] == "XTTS/AUTOMATIC.wav":
IndexError: list index out of range

It seems like the code running the text-to-speech process has a bug. Specifically, when it makes new audio files for voice conversion, it indexes into a list that is empty, which causes the IndexError.

Could you please take a look at this issue? I'm unsure if the problem is with how the audio files are named or somewhere else in the pipeline.

Thank you for your assistance on this project and the great work you've done so far.
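A hypothetical guard for the failing line, based only on the traceback above and not the project's actual fix; filtered_speaker is the list from the quoted frame in text_to_speech.py:

# Hypothetical guard (not the project's actual fix): check that the filtered
# list is non-empty before indexing element [0].
if filtered_speaker and filtered_speaker[0]["tts_name"] == "XTTS/AUTOMATIC.wav":
    ...  # original handling for the automatic XTTS speaker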

AttributeError: 'list' object has no attribute 'endswith'

After installing dependencies on Windows 10, I got this error:

AttributeError: 'list' object has no attribute 'endswith'
Traceback (most recent call last):
File "C:\SoniTranslate\soni\lib\site-packages\gradio\queueing.py", line 407, in call_prediction
output = await route_utils.call_process_api(
File "C:\SoniTranslate\soni\lib\site-packages\gradio\route_utils.py", line 226, in call_process_api
output = await app.get_blocks().process_api(
File "C:\SoniTranslate\soni\lib\site-packages\gradio\blocks.py", line 1559, in process_api
data = self.postprocess_data(fn_index, result["prediction"], state)
File "C:\SoniTranslate\soni\lib\site-packages\gradio\blocks.py", line 1447, in postprocess_data
prediction_value = block.postprocess(prediction_value)
File "C:\SoniTranslate\soni\lib\site-packages\gradio\components\file.py", line 247, in postprocess
"name": self.make_temp_copy_if_needed(y),
File "C:\SoniTranslate\soni\lib\site-packages\gradio\components\base.py", line 233, in make_temp_copy_if_needed
temp_dir = self.hash_file(file_path)
File "C:\SoniTranslate\soni\lib\site-packages\gradio\components\base.py", line 197, in hash_file
with open(file_path, "rb") as f:
FileNotFoundError: [Errno 2] No such file or directory: 'sub_ori.[]'

RuntimeError: Model has been downloaded but the SHA256 checksum does not not match

[INFO] >> Cache flushed
[INFO] >> Processing video...
[INFO] >> Process video...
[INFO] >> Process audio...
[INFO] >> Transcribing...
Traceback (most recent call last):
File "C:\Users\ded\bin\miniconda\envs\venvst\lib\site-packages\gradio\queueing.py", line 495, in call_prediction
output = await route_utils.call_process_api(
File "C:\Users\ded\bin\miniconda\envs\venvst\lib\site-packages\gradio\route_utils.py", line 235, in call_process_api
output = await app.get_blocks().process_api(
File "C:\Users\ded\bin\miniconda\envs\venvst\lib\site-packages\gradio\blocks.py", line 1627, in process_api
result = await self.call_function(
File "C:\Users\ded\bin\miniconda\envs\venvst\lib\site-packages\gradio\blocks.py", line 1173, in call_function
prediction = await anyio.to_thread.run_sync(
File "C:\Users\ded\bin\miniconda\envs\venvst\lib\site-packages\anyio\to_thread.py", line 56, in run_sync
return await get_async_backend().run_sync_in_worker_thread(
File "C:\Users\ded\bin\miniconda\envs\venvst\lib\site-packages\anyio_backends_asyncio.py", line 2144, in run_sync_in_worker_thread
return await future
File "C:\Users\ded\bin\miniconda\envs\venvst\lib\site-packages\anyio_backends_asyncio.py", line 851, in run
result = context.run(func, *args)
File "C:\Users\ded\bin\miniconda\envs\venvst\lib\site-packages\gradio\utils.py", line 690, in wrapper
response = f(*args, **kwargs)
File "C:\SoniTranslate\app_rvc.py", line 436, in multilingual_media_conversion
audio, self.result = transcribe_speech(
File "C:\SoniTranslate\soni_translate\speech_segmentation.py", line 34, in transcribe_speech
model = whisperx.load_model(
File "C:\Users\ded\bin\miniconda\envs\venvst\lib\site-packages\whisperx\asr.py", line 347, in load_model
vad_model = load_vad_model(torch.device(device), use_auth_token=None, **default_vad_options)
File "C:\Users\ded\bin\miniconda\envs\venvst\lib\site-packages\whisperx\vad.py", line 47, in load_vad_model
raise RuntimeError(
RuntimeError: Model has been downloaded but the SHA256 checksum does not not match. Please retry loading the model.
How do I solve this problem?
