zackees / transcribe-anything

Input a local file or url and this service will transcribe it using Whisper AI. Completely private and Free 🤯🤯🤯

License: MIT License

Python 98.61% Shell 1.39%

transcribe-anything's Issues

Installation Failure: PyTorch 1.12.1 with CUDA 11.6 (cu116) Not Available in Package Index

I am trying to install PyTorch 1.12.1 with CUDA 11.6 (cu116) using pip, but the installation fails because this version is not available in the PyTorch package index.

Steps to Reproduce

Run the command: pip install torch==1.12.1 --extra-index-url https://download.pytorch.org/whl/cu116

Expected Behavior

PyTorch 1.12.1 with CUDA 11.6 (cu116) should install successfully.

Actual Behavior
The installation fails with the following error message:

ERROR: Could not find a version that satisfies the requirement torch==1.12.1 (from versions: 2.0.0, 2.0.1)
ERROR: No matching distribution found for torch==1.12.1

And a subprocess.CalledProcessError is raised.

Environment
OS: Windows 10
Python Version: 3.11
Pip Version: pip 23.2.1

Unable to transcribe YouTube videos in Chinese

transcribe-anything --language Chinese --device insane https://www.youtube.com/watch?v=m7huzFiIiGo

Python 3.11.7

{
"cuda_available": true,
"cuda_devices": [
{
"device_id": 0,
"multiprocessors": 76,
"name": "NVIDIA GeForce RTX 4080",
"vram": 17170956288
}
],
"num_cuda_devices": 1
}
Running transcribe_audio on https://www.youtube.com/watch?v=m7huzFiIiGo
Exception in thread Thread-1 (_readerthread):
Traceback (most recent call last):
File "C:\Users\x\AppData\Local\Programs\Python\Python311\Lib\threading.py", line 1045, in _bootstrap_inner
self.run()
File "C:\Users\x\AppData\Local\Programs\Python\Python311\Lib\threading.py", line 982, in run
self._target(*self._args, **self._kwargs)
File "C:\Users\x\AppData\Local\Programs\Python\Python311\Lib\subprocess.py", line 1597, in readerthread
buffer.append(fh.read())
^^^^^^^^^
File "C:\Users\x\AppData\Local\Programs\Python\Python311\Lib\encodings\cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 37: character maps to <undefined>
Error: 'NoneType' object has no attribute 'strip'
Traceback (most recent call last):
File "C:\Users\x\desktop\t\venv\Lib\site-packages\transcribe_anything\cmd.py", line 188, in main
transcribe(
File "C:\Users\x\desktop\t\venv\Lib\site-packages\transcribe_anything\api.py", line 160, in transcribe
output_dir = "text" + yt_dlp.stdout.strip()
^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'strip'

while processing https://www.youtube.com/watch?v=m7huzFiIiGo
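
The first traceback above appears to come from the reader thread decoding the child process output with the Windows default code page (cp1252), after which stdout is None and the `.strip()` call fails. A minimal sketch of one possible workaround, assuming the child process writes UTF-8; `run_and_capture` is a hypothetical helper for illustration, not a function in transcribe-anything:

```python
import subprocess
import sys


def run_and_capture(cmd: list[str]) -> str:
    """Run cmd and return its stdout decoded as UTF-8 (hypothetical helper)."""
    proc = subprocess.run(
        cmd,
        capture_output=True,
        encoding="utf-8",  # do not rely on the locale code page (cp1252 on Windows)
        errors="replace",  # undecodable bytes become U+FFFD instead of raising
        check=True,
    )
    # Guard against a missing stdout before stripping.
    return (proc.stdout or "").strip()


# Example: a child process that writes two Chinese characters as UTF-8 bytes.
child_code = "import sys; sys.stdout.buffer.write('\\u4e2d\\u6587'.encode('utf-8'))"
title = run_and_capture([sys.executable, "-c", child_code])
```

Passing `encoding` explicitly keeps the result independent of the console code page, which is the part that differs between Windows and other platforms.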

install torchaudio==2.1.2' returned non-zero exit status 1.

'C:\Python312\Lib\site-packages\transcribe_anything\venv\insanely_fast_whisper\Scripts\pip install torchaudio==2.1.2' returned non-zero exit status 1.
Traceback (most recent call last):
File "C:\Python312\Lib\site-packages\transcribe_anything\_cmd.py", line 208, in main
transcribe(
File "C:\Python312\Lib\site-packages\transcribe_anything\api.py", line 251, in transcribe
run_insanely_fast_whisper(
File "C:\Python312\Lib\site-packages\transcribe_anything\insanely_fast_whisper.py", line 198, in run_insanely_fast_whisper
env = get_environment()
^^^^^^^^^^^^^^^^^
File "C:\Python312\Lib\site-packages\transcribe_anything\insanley_fast_whisper_reqs.py", line 49, in get_environment
env = isolated_environment(venv_dir, deps)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Python312\Lib\site-packages\isolated_environment\__init__.py", line 22, in isolated_environment
iso_env = IsolatedEnvironment(
^^^^^^^^^^^^^^^^^^^^
File "C:\Python312\Lib\site-packages\isolated_environment\api.py", line 145, in __init__
self.ensure_installed(requirements or Requirements([]))
File "C:\Python312\Lib\site-packages\isolated_environment\api.py", line 324, in ensure_installed
self.pip_install(
File "C:\Python312\Lib\site-packages\isolated_environment\api.py", line 208, in pip_install
_pip_install(self.env_path, package, build_options, full_isolation)
File "C:\Python312\Lib\site-packages\isolated_environment\api.py", line 119, in _pip_install
subprocess.run(cmd, env=act_env.env, shell=True, check=True)
File "C:\Python312\Lib\subprocess.py", line 571, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'C:\Python312\Lib\site-packages\transcribe_anything\venv\insanely_fast_whisper\Scripts\pip install torchaudio==2.1.2' returned non-zero exit status 1.

Use stable-ts instead of whisper

If you switch to stable-ts, it would be nice to add its SRT-generating function to transcribe-anything.

Fix video url

Apparently this video breaks transcribe-anything:

https://www.instagram.com/reel/Cqq8zq_Bg8C/?igshid=YmMyMTA2M2Y=

Investigate and fix.

Module installation fail in local Blender python

Usually, running Blender as admin fixes problems of module installation, but not in this case:

Trouble running when it tries to install torch.

Here is the error I get:

Running: pip install torch==2.1.2
ERROR: Could not find a version that satisfies the requirement torch==2.1.2 (from versions: 2.2.0)
ERROR: No matching distribution found for torch==2.1.2

I'm on macOS on an M2 Mac, with Python 3.10.10.

Not able to use transcribe-anything on google colab

from transcribe_anything.api import transcribe

transcribe(
    url_or_file="/content/drive/MyDrive/PMS/PMS_6jan_630.ts",
    output_dir="/content/drive/MyDrive/PMS/text/6jan/630/",
    device="insane",
)

When using the API, I get the following error:
/usr/lib/python3.10/json/decoder.py in raw_decode(self, s, idx)
353 obj, end = self.scan_once(s, idx)
354 except StopIteration as err:
--> 355 raise JSONDecodeError("Expecting value", s, err.value) from None
356 return obj, end

JSONDecodeError: Expecting value: line 1 column 1 (char 0)

Support for whisper.cpp?

Are there any plans for supporting ggerganov's whisper.cpp implementation of WhisperAI?

transcribe-anything's ease of use and API + whisper.cpp's performance would be fantastic.

FileExistsError

I just discovered your tool last week and had tested some videos, and it worked fine. Liked it and was nice and easy to work with. But today no video and no YouTube URL work. Didn't change anything in particular. Seems to be a problem with permissions? Did try to run as sudo and made sure write permissions are OK.

Getting this error on every file or URL:

sam@sams-iMac-2 Transcribe_Anything-test % transcribe_anything ./video.mp4
Traceback (most recent call last):
  File "/usr/local/bin/transcribe_anything", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/site-packages/transcribe_anything/cmd.py", line 54, in main
    transcribe(
  File "/usr/local/lib/python3.10/site-packages/transcribe_anything/api.py", line 55, in transcribe
    os.makedirs(output_dir, exist_ok=True)
  File "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/os.py", line 225, in makedirs
    mkdir(name, mode)
FileExistsError: [Errno 17] File exists: 'video.mp4'

I am using Python 3.10.8 on macOS 12.6.1

EDIT
Just noticed: when I set an output directory, like --output_dir "./output/", it works.
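
The traceback above shows `os.makedirs(output_dir, exist_ok=True)` being called with 'video.mp4', which is the input file itself; `exist_ok=True` only tolerates an existing directory, not an existing file. A minimal sketch of a default that cannot collide with the input, assuming a `text_<stem>` naming scheme; both `default_output_dir` and that scheme are hypothetical and not taken from the project:

```python
import os
import tempfile
from pathlib import Path


def default_output_dir(input_path: str) -> str:
    """Return a directory path that cannot collide with the input file itself."""
    p = Path(input_path)
    # "video.mp4" -> "text_video" (hypothetical naming scheme)
    return str(p.with_name("text_" + p.stem))


with tempfile.TemporaryDirectory() as tmp:
    video = Path(tmp) / "video.mp4"
    video.write_bytes(b"")

    # Reproduce the reported failure: the path already exists as a file.
    try:
        os.makedirs(video, exist_ok=True)
        collided = False
    except FileExistsError:
        collided = True

    # Deriving the directory from the stem avoids the collision.
    out_dir = default_output_dir(str(video))
    os.makedirs(out_dir, exist_ok=True)
    created = os.path.isdir(out_dir)
    out_name = Path(out_dir).name
```

This matches the observation in the EDIT: passing an explicit --output_dir sidesteps the problem because the directory name no longer equals the file name.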

Support for Whisper initial_prompt?

Does this project have support for Whisper's initial_prompt?

I wonder if automatically using any of the YouTube title, description, and/or tags could be useful for transcription accuracy since they are likely to be spoken in the audio but are not being used for context.
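
As a rough illustration of the idea above, video metadata could be condensed into a single string and passed as Whisper's `initial_prompt`. This is only a sketch: `build_initial_prompt` is a hypothetical helper, the sample metadata is invented, and the character cap is an arbitrary stand-in for Whisper's actual token-based prompt limit.

```python
def build_initial_prompt(title, description="", tags=(), max_chars=400):
    """Join title, tags and description into one prompt string (hypothetical helper)."""
    parts = [title.strip()]
    if tags:
        parts.append(", ".join(t.strip() for t in tags if t.strip()))
    if description.strip():
        # Collapse newlines and repeated whitespace in the description.
        parts.append(" ".join(description.split()))
    prompt = ". ".join(p for p in parts if p)
    # Crude character cap; the real limit is counted in tokens.
    return prompt[:max_chars].rstrip()


prompt = build_initial_prompt(
    "Rust async explained",
    "A talk about\nTokio and futures.",
    ["rust", "tokio"],
)
```

Whether this helps accuracy would need to be measured; titles and tags can also mislead the model when the words are never actually spoken.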

Output to txt?

This package is amazing, love the simplicity and redundancy in installation to ensure smooth processes. One question though, instead of printing, maybe an option to output a text file with and without timestamps would be a sweet touch? Let me know if you need help implementing this. Thanks for this great package!
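
One way the request above could be approached is to derive the text file from an SRT transcript. A minimal sketch, assuming standard SRT cues (index line, timestamp line, text lines, blank line); `srt_to_text` is a hypothetical helper, not part of the package:

```python
import re

# Matches an SRT timing line such as "00:00:00,000 --> 00:00:02,500".
_TIMESTAMP = re.compile(r"^\d{2}:\d{2}:\d{2}[,.]\d{3} --> \d{2}:\d{2}:\d{2}[,.]\d{3}")


def srt_to_text(srt: str, keep_timestamps: bool = False) -> str:
    """Convert SRT content to plain text, one cue per line (hypothetical helper)."""
    lines_out = []
    for block in re.split(r"\n\s*\n", srt.strip()):
        lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
        if lines and lines[0].isdigit():
            lines = lines[1:]  # drop the cue index
        if not lines:
            continue
        stamp = ""
        if _TIMESTAMP.match(lines[0]):
            stamp = lines[0].split(" --> ")[0]  # keep only the start time
            lines = lines[1:]
        text = " ".join(lines)
        if not text:
            continue
        lines_out.append(f"[{stamp}] {text}" if keep_timestamps and stamp else text)
    return "\n".join(lines_out)


sample = """1
00:00:00,000 --> 00:00:02,500
Hello there.

2
00:00:02,500 --> 00:00:05,000
General Kenobi.
"""
plain = srt_to_text(sample)
stamped = srt_to_text(sample, keep_timestamps=True)
```

The `keep_timestamps` flag covers both variants asked for: text only, or text prefixed with each cue's start time.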

Numpy v 2.0 does not let me run on M1 Mac

Cannot run on Mac M1 with Python 3.11

A module that was compiled using NumPy 1.x cannot be run in
NumPy 2.0.0 as it may crash. To support both 1.x and 2.x
versions of NumPy, modules must be compiled with NumPy 2.0.
Some module may need to rebuild instead e.g. with 'pybind11>=2.12'.

If you are a user of the module, the easiest solution will be to
downgrade to 'numpy<2' or try to upgrade the affected module.
We expect that some modules will need time to support NumPy 2.

I went into the created virtualenv "whisper" and downgraded numpy v2.0.0 to numpy==1.26.4 there but I still get an error:

AttributeError: `np.NaN` was removed in the NumPy 2.0 release. Use `np.nan` instead.. Did you mean: 'nan'?
Error: Failed to execute insanely-fast-whisper --file-name /var/folders/qq/3pmx9x793q53zw0qy3_52z400000gn/T/tmpvfof662m.wav --device-id mps --model-name openai/whisper-small --task transcribe --transcript-path /var/folders/qq/3pmx9x793q53zw0qy3_52z400000gn/T/tmpkau1ddcc/out.json --batch-size 4

Am I missing something?
Thank you

Add subtitle embedding into mp4

ChatGPT 4.0 code snippet:

ffmpeg -i video.mp4 -i video/en.srt -i video/spa.srt -map 0 -map 1 -map 2 -c copy -c:s mov_text -metadata:s:s:0 language=eng -metadata:s:s:1 language=spa output_video.mp4
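
For an arbitrary number of subtitle tracks, the same kind of command could be assembled programmatically. A sketch under two assumptions: the input video carries no subtitle streams of its own (otherwise the `s:s:N` indices shift), and `build_embed_command` is a hypothetical helper. It uses explicit `-map` options because ffmpeg's default stream selection keeps only one subtitle stream.

```python
def build_embed_command(video, subtitles, output):
    """Build an ffmpeg argument list that embeds SRT files as mov_text tracks.

    subtitles is a list of (path, language_code) tuples, e.g. ("video/en.srt", "eng").
    Hypothetical helper for illustration.
    """
    cmd = ["ffmpeg", "-i", video]
    for path, _lang in subtitles:
        cmd += ["-i", path]

    # Map all streams of the video (input 0), then each subtitle input in order.
    cmd += ["-map", "0"]
    for i in range(len(subtitles)):
        cmd += ["-map", str(i + 1)]

    # Copy audio/video as-is; convert subtitles to the MP4 text format.
    cmd += ["-c", "copy", "-c:s", "mov_text"]

    # Tag each output subtitle stream with its language.
    for i, (_path, lang) in enumerate(subtitles):
        cmd += [f"-metadata:s:s:{i}", f"language={lang}"]

    cmd.append(output)
    return cmd


cmd = build_embed_command(
    "video.mp4",
    [("video/en.srt", "eng"), ("video/spa.srt", "spa")],
    "output_video.mp4",
)
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`, which avoids shell quoting issues with file names.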

Question

Is the reason the project does not work on Win64 that static_ffmpeg does not support this OS?

Unable to transcribe YouTube videos

The usage example at https://pypi.org/project/transcribe-anything/ (and in this repo's README file) is:

transcribe_anything https://www.youtube.com/watch?v=dQw4w9WgXcQ

When I run that, I get:

zsh: no matches found: https://www.youtube.com/watch?v=dQw4w9WgXcQ

The same happens with any other YouTube video URL.

(I am more successful with Rumble links.)
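
The "no matches found" message comes from zsh itself: the `?` in the URL is treated as a glob character before transcribe_anything ever runs, so the URL has to be quoted. A small sketch using Python's standard `shlex` module to show the quoted form of the command:

```python
import shlex

url = "https://www.youtube.com/watch?v=dQw4w9WgXcQ"

# shlex.quote wraps the URL in single quotes because "?" is shell-special.
command = "transcribe_anything " + shlex.quote(url)

# Splitting the command again the way a POSIX shell would recovers the URL intact.
argv = shlex.split(command)
```

Typing the URL inside single quotes on the command line has the same effect. Rumble links presumably work unquoted because they contain no glob characters.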

Transcribe-Anything "insane" mode installation issue on M1 Mac

Description:

When attempting to run transcribe-anything in insane mode on my M1 Mac, I encountered an error related to the installation of intel-openmp==2024.0.2. Here are the procedural steps and the error observed.

Steps to reproduce:

1. I ran the CLI command:

transcribe-anything /Users/xxx/Downloads/Test/xxxx.m4a --hf_token hf_[mytoken] --device insane

2. I encountered the following error:

subprocess.CalledProcessError: Command 'pip install intel-openmp==2024.0.2' returned non-zero exit status 1.

I understand the intel-openmp==2024.0.2 package is part of Intel's Math Kernel Library (MKL) and is typically used for linear algebra, Fourier transform, and random number capabilities in numpy, scipy, and scikit-learn. Given MKL currently doesn't support M1 chips, I believe this is causing the error in my case.
While the transcribe-anything software does support M1 chips for CPU tasks, it appears the MKL dependency needed for insane mode isn't compatible.
Please advise on whether there's a workaround or solution on M1 chips or if compatibility is planned in future updates. Thanks in advance!
System Specs: Apple M1 MacBook
Python Version: Python 3.11
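
One conceivable workaround for the report above is to skip x86-only packages when the interpreter runs on Apple Silicon. A minimal sketch, assuming that intel-openmp is only needed on x86 machines; `filter_requirements` and the `X86_ONLY` set are hypothetical and do not reflect how transcribe-anything actually resolves its dependencies:

```python
import platform

# Assumption for illustration: packages that only ship x86 builds.
X86_ONLY = {"intel-openmp"}


def filter_requirements(requirements, machine=None):
    """Drop x86-only pins when not running on an x86 CPU (hypothetical helper)."""
    machine = (machine or platform.machine()).lower()
    is_x86 = machine in {"x86_64", "amd64", "i386", "i686", "x86"}
    kept = []
    for req in requirements:
        name = req.split("==")[0].strip().lower()
        if name in X86_ONLY and not is_x86:
            continue  # skip packages with no build for this architecture
        kept.append(req)
    return kept


deps = ["torch==2.1.2", "intel-openmp==2024.0.2"]
on_m1 = filter_requirements(deps, machine="arm64")
on_intel = filter_requirements(deps, machine="x86_64")
```

On an M1 Mac `platform.machine()` reports "arm64" for a native Python build, so the pin would be skipped there and kept on Intel machines.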
