zackees / transcribe-anything
Input a local file or url and this service will transcribe it using Whisper AI. Completely private and Free 🤯🤯🤯
License: MIT License
I am trying to install PyTorch 1.12.1 with CUDA 11.6 (cu116) using pip, but it fails because this version is not available in the PyTorch package index.
Steps to Reproduce
Run the command: pip install torch==1.12.1 --extra-index-url https://download.pytorch.org/whl/cu116
Expected Behavior
PyTorch 1.12.1 with CUDA 11.6 (cu116) should install successfully.
Actual Behavior
The installation fails with the following error message:
ERROR: Could not find a version that satisfies the requirement torch==1.12.1 (from versions: 2.0.0, 2.0.1)
ERROR: No matching distribution found for torch==1.12.1
And a subprocess.CalledProcessError is raised.
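The "from versions" list in the error above shows what the index actually offers for this interpreter and platform. One plausible cause, not confirmed in this report, is that torch 1.12.1 was never published for Python 3.11. A minimal sketch for pulling that list out of a captured pip error follows; `versions_offered` is a hypothetical helper, not part of pip or transcribe-anything.

```python
import re


def versions_offered(pip_error: str) -> list:
    """Extract the versions pip lists in a 'Could not find a version' error."""
    match = re.search(r"\(from versions: ([^)]*)\)", pip_error)
    if match is None:
        return []
    return [v.strip() for v in match.group(1).split(",") if v.strip()]


# Error text copied from the report above.
error = (
    "ERROR: Could not find a version that satisfies the requirement "
    "torch==1.12.1 (from versions: 2.0.0, 2.0.1)"
)
offered = versions_offered(error)
print(offered)
```

If the pinned version is missing from that list, the pin has to change or the interpreter has to match one the pinned release supports.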
Environment
OS: Windows 10
Python Version: 3.11
Pip Version: pip 23.2.1
Additional Context
transcribe-anything --language Chinese --device insane https://www.youtube.com/watch?v=m7huzFiIiGo
Python 3.11.7
{
"cuda_available": true,
"cuda_devices": [
{
"device_id": 0,
"multiprocessors": 76,
"name": "NVIDIA GeForce RTX 4080",
"vram": 17170956288
}
],
"num_cuda_devices": 1
}
Running transcribe_audio on https://www.youtube.com/watch?v=m7huzFiIiGo
Exception in thread Thread-1 (_readerthread):
Traceback (most recent call last):
File "C:\Users\x\AppData\Local\Programs\Python\Python311\Lib\threading.py", line 1045, in _bootstrap_inner
self.run()
File "C:\Users\x\AppData\Local\Programs\Python\Python311\Lib\threading.py", line 982, in run
self._target(*self._args, **self._kwargs)
File "C:\Users\x\AppData\Local\Programs\Python\Python311\Lib\subprocess.py", line 1597, in readerthread
buffer.append(fh.read())
^^^^^^^^^
File "C:\Users\x\AppData\Local\Programs\Python\Python311\Lib\encodings\cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 37: character maps to <undefined>
Error: 'NoneType' object has no attribute 'strip'
Traceback (most recent call last):
File "C:\Users\x\desktop\t\venv\Lib\site-packages\transcribe_anything\cmd.py", line 188, in main
transcribe(
File "C:\Users\x\desktop\t\venv\Lib\site-packages\transcribe_anything\api.py", line 160, in transcribe
output_dir = "text" + yt_dlp.stdout.strip()
^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'strip'
while processing https://www.youtube.com/watch?v=m7huzFiIiGo
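The first traceback shows the reader thread decoding the child's output with cp1252 (the Windows default here), and the second shows `stdout` ending up as `None`. A minimal sketch of one way to avoid that, assuming the child process emits UTF-8: pass an explicit `encoding` and `errors` to `subprocess.run`. The helper name `run_capture_utf8` and the stand-in child command are illustrative only; this is not the project's actual fix.

```python
import subprocess
import sys


def run_capture_utf8(cmd: list) -> str:
    """Run cmd and return its stdout decoded as UTF-8, never None."""
    completed = subprocess.run(
        cmd,
        capture_output=True,
        encoding="utf-8",   # do not rely on the locale default (cp1252 here)
        errors="replace",   # undecodable bytes become U+FFFD instead of raising
        check=True,
    )
    return completed.stdout.strip()


# Stand-in for yt-dlp: a child process that prints a Chinese title as UTF-8 bytes.
child = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.buffer.write(b'\\xe4\\xb8\\xad\\xe6\\x96\\x87')",
]
title = run_capture_utf8(child)
```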
'C:\Python312\Lib\site-packages\transcribe_anything\venv\insanely_fast_whisper\Scripts\pip install torchaudio==2.1.2' returned non-zero exit status 1.
Traceback (most recent call last):
File "C:\Python312\Lib\site-packages\transcribe_anything\_cmd.py", line 208, in main
transcribe(
File "C:\Python312\Lib\site-packages\transcribe_anything\api.py", line 251, in transcribe
run_insanely_fast_whisper(
File "C:\Python312\Lib\site-packages\transcribe_anything\insanely_fast_whisper.py", line 198, in run_insanely_fast_whisper
env = get_environment()
^^^^^^^^^^^^^^^^^
File "C:\Python312\Lib\site-packages\transcribe_anything\insanley_fast_whisper_reqs.py", line 49, in get_environment
env = isolated_environment(venv_dir, deps)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Python312\Lib\site-packages\isolated_environment\__init__.py", line 22, in isolated_environment
iso_env = IsolatedEnvironment(
^^^^^^^^^^^^^^^^^^^^
File "C:\Python312\Lib\site-packages\isolated_environment\api.py", line 145, in __init__
self.ensure_installed(requirements or Requirements([]))
File "C:\Python312\Lib\site-packages\isolated_environment\api.py", line 324, in ensure_installed
self.pip_install(
File "C:\Python312\Lib\site-packages\isolated_environment\api.py", line 208, in pip_install
_pip_install(self.env_path, package, build_options, full_isolation)
File "C:\Python312\Lib\site-packages\isolated_environment\api.py", line 119, in _pip_install
subprocess.run(cmd, env=act_env.env, shell=True, check=True)
File "C:\Python312\Lib\subprocess.py", line 571, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'C:\Python312\Lib\site-packages\transcribe_anything\venv\insanely_fast_whisper\Scripts\pip install torchaudio==2.1.2' returned non-zero exit status 1.
If you switch to stable-ts, it would be nice to add its SRT-generating function to transcribe-anything.
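For reference, SRT output itself is simple to produce from timed segments. The sketch below is not stable-ts's function; it assumes segments are dicts with `start`, `end` (seconds) and `text` keys, which is how Whisper reports them, and the helper names are hypothetical.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm timestamp used by SRT."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments: list) -> str:
    """Render numbered SRT cues, separated by blank lines."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(cues)


example = segments_to_srt([{"start": 0.0, "end": 1.5, "text": " Hello there."}])
```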
Apparently this video breaks transcribe-anything:
https://www.instagram.com/reel/Cqq8zq_Bg8C/?igshid=YmMyMTA2M2Y=
Investigate and fix.
Here is the error I get:
Running: pip install torch==2.1.2
ERROR: Could not find a version that satisfies the requirement torch==2.1.2 (from versions: 2.2.0)
ERROR: No matching distribution found for torch==2.1.2
I'm on macOS on an M2 Mac, with Python 3.10.10.
from transcribe_anything.api import transcribe
transcribe(
url_or_file="/content/drive/MyDrive/PMS/PMS_6jan_630.ts",
output_dir="/content/drive/MyDrive/PMS/text/6jan/630/",
device="insane",
)
When using the API, I get the following error:
/usr/lib/python3.10/json/decoder.py in raw_decode(self, s, idx)
353 obj, end = self.scan_once(s, idx)
354 except StopIteration as err:
--> 355 raise JSONDecodeError("Expecting value", s, err.value) from None
356 return obj, end
JSONDecodeError: Expecting value: line 1 column 1 (char 0)
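"Expecting value: line 1 column 1 (char 0)" is what `json` reports when the text it is given is empty or does not start with JSON at all, so the backend probably produced no usable output before the parse. A minimal sketch of a guard that surfaces that more clearly; `parse_transcript_json` is a hypothetical helper, not part of the transcribe-anything API.

```python
import json


def parse_transcript_json(text: str) -> dict:
    """Parse backend output, failing with a message that says what was received."""
    if not text.strip():
        raise ValueError("backend produced no output; check its stderr for the real error")
    try:
        return json.loads(text)
    except json.JSONDecodeError as err:
        raise ValueError(f"backend output is not JSON, starts with {text[:60]!r}") from err


# An empty string reproduces the exact message from the traceback above.
try:
    json.loads("")
except json.JSONDecodeError as err:
    message = str(err)

parsed = parse_transcript_json('{"text": "hello"}')
```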
Are there any plans for supporting ggerganov's whisper.cpp implementation of WhisperAI?
transcribe-anything's ease of use and API + whisper.cpp's performance would be fantastic.
I discovered your tool last week and tested it on some videos, and it worked fine. I liked it, and it was easy to work with. But today no video file and no YouTube URL works, and I didn't change anything in particular. Could it be a permissions problem? I tried running with sudo and made sure write permissions are OK.
Getting this error on every file or URL:
sam@sams-iMac-2 Transcribe_Anything-test % transcribe_anything ./video.mp4
Traceback (most recent call last):
File "/usr/local/bin/transcribe_anything", line 8, in <module>
sys.exit(main())
File "/usr/local/lib/python3.10/site-packages/transcribe_anything/cmd.py", line 54, in main
transcribe(
File "/usr/local/lib/python3.10/site-packages/transcribe_anything/api.py", line 55, in transcribe
os.makedirs(output_dir, exist_ok=True)
File "/usr/local/Cellar/[email protected]/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/os.py", line 225, in makedirs
mkdir(name, mode)
FileExistsError: [Errno 17] File exists: 'video.mp4'
I am using Python 3.10.8 on macOS 12.6.1
EDIT
Just noticed: when I set the output directory explicitly, e.g. --output_dir "./output/", it works.
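The traceback is consistent with the default output directory resolving to the input file's own path: `os.makedirs(..., exist_ok=True)` still raises `FileExistsError` when the path exists and is a regular file. A minimal sketch that reproduces this and picks a non-colliding default; `pick_output_dir` and the `_text` suffix are assumptions for illustration, not the project's actual naming.

```python
import os
import tempfile


def pick_output_dir(input_path: str) -> str:
    """Return a directory path next to input_path that is safe to create."""
    base, _ext = os.path.splitext(input_path)
    candidate = base + "_text"
    counter = 1
    # Skip any name already taken by a regular file.
    while os.path.exists(candidate) and not os.path.isdir(candidate):
        candidate = f"{base}_text_{counter}"
        counter += 1
    return candidate


with tempfile.TemporaryDirectory() as tmp:
    video = os.path.join(tmp, "video.mp4")
    open(video, "wb").close()
    try:
        os.makedirs(video, exist_ok=True)  # same call as in the traceback
        collided = False
    except FileExistsError:
        collided = True
    out_dir = pick_output_dir(video)
    os.makedirs(out_dir, exist_ok=True)
    created = os.path.isdir(out_dir)
```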
Does this project have support for Whisper's initial_prompt?
I wonder if automatically using any of the YouTube title, description, and/or tags could be useful for transcription accuracy since they are likely to be spoken in the audio but are not being used for context.
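A rough sketch of how such a prompt could be assembled from video metadata. `build_initial_prompt` and its `max_chars` cap are hypothetical; the cap is there because Whisper only conditions on a limited amount of prompt text. Whether transcribe-anything forwards `initial_prompt` to the backend is not established here.

```python
def build_initial_prompt(title: str, description: str = "", tags=(), max_chars: int = 400) -> str:
    """Join title, tags and description into one short context string."""
    parts = [title.strip()]
    cleaned_tags = [t.strip() for t in tags if t.strip()]
    if cleaned_tags:
        parts.append(", ".join(cleaned_tags))
    if description.strip():
        # Collapse newlines and repeated spaces in the description.
        parts.append(" ".join(description.split()))
    prompt = ". ".join(p for p in parts if p)
    return prompt[:max_chars].rstrip()


prompt = build_initial_prompt("My Talk", tags=["GPU", "CUDA"])
```

The resulting string could then be passed as Whisper's `initial_prompt` wherever the backend exposes it.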
This package is amazing, love the simplicity and redundancy in installation to ensure smooth processes. One question though, instead of printing, maybe an option to output a text file with and without timestamps would be a sweet touch? Let me know if you need help implementing this. Thanks for this great package!
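A minimal sketch of what that option could look like, assuming segments are dicts with `start`, `end` and `text` keys as Whisper reports them. The function names and the output file names `out.txt` and `out_timestamps.txt` are made up for illustration.

```python
from pathlib import Path


def format_transcript(segments: list, with_timestamps: bool) -> str:
    """Render one line per segment, optionally prefixed with its time range."""
    lines = []
    for seg in segments:
        text = seg["text"].strip()
        if with_timestamps:
            lines.append(f"[{seg['start']:.2f} -> {seg['end']:.2f}] {text}")
        else:
            lines.append(text)
    return "\n".join(lines) + "\n"


def write_transcripts(segments: list, output_dir: Path) -> None:
    """Write both variants next to each other in output_dir."""
    output_dir.mkdir(parents=True, exist_ok=True)
    (output_dir / "out.txt").write_text(
        format_transcript(segments, False), encoding="utf-8"
    )
    (output_dir / "out_timestamps.txt").write_text(
        format_transcript(segments, True), encoding="utf-8"
    )


stamped = format_transcript([{"start": 0.0, "end": 1.5, "text": " hi "}], True)
plain = format_transcript([{"start": 0.0, "end": 1.5, "text": " hi "}], False)
```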
Cannot run on Mac M1 with Python 3.11
A module that was compiled using NumPy 1.x cannot be run in
NumPy 2.0.0 as it may crash. To support both 1.x and 2.x
versions of NumPy, modules must be compiled with NumPy 2.0.
Some module may need to rebuild instead e.g. with 'pybind11>=2.12'.
If you are a user of the module, the easiest solution will be to
downgrade to 'numpy<2' or try to upgrade the affected module.
We expect that some modules will need time to support NumPy 2.
I went into the created virtualenv "whisper" and downgraded numpy v2.0.0 to numpy==1.26.4 there but I still get an error:
AttributeError: `np.NaN` was removed in the NumPy 2.0 release. Use `np.nan` instead.. Did you mean: 'nan'?
Error: Failed to execute insanely-fast-whisper --file-name /var/folders/qq/3pmx9x793q53zw0qy3_52z400000gn/T/tmpvfof662m.wav --device-id mps --model-name openai/whisper-small --task transcribe --transcript-path /var/folders/qq/3pmx9x793q53zw0qy3_52z400000gn/T/tmpkau1ddcc/out.json --batch-size 4
Am I missing something?
Thank you
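Since the error persisted after the downgrade, it may be that the failing command runs in a different environment from the "whisper" virtualenv that was edited; that is a guess, not something the log confirms. A small sketch for checking which NumPy major version a given interpreter sees, without importing NumPy; `installed_numpy_major` is a hypothetical helper. Run it with the Python executable of the environment that actually launches insanely-fast-whisper.

```python
from importlib import metadata
from typing import Optional


def major_of(version: str) -> int:
    """Return the leading integer of a version string such as '1.26.4'."""
    return int(version.split(".")[0])


def installed_numpy_major() -> Optional[int]:
    """Major version of NumPy in this interpreter's environment, or None if absent."""
    try:
        return major_of(metadata.version("numpy"))
    except metadata.PackageNotFoundError:
        return None


found = installed_numpy_major()
print("numpy major version here:", found)
```

A result of 2 in that environment would mean the `numpy<2` pin has not taken effect there.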
ChatGPT 4.0 code snippet:
> ffmpeg -i video.mp4 -i video/en.srt -i video/spa.srt -c copy -c:s mov_text -metadata:s:s:0 language=eng -metadata:s:s:1 language=spa output_video.mp4
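A sketch of building that command programmatically. One difference from the snippet above: it adds explicit `-map` options, because by default ffmpeg selects only one subtitle stream for the output, so the second track would likely be dropped. `build_subtitle_mux_command` is a hypothetical helper, and the per-stream metadata indexes assume the video has no subtitle streams of its own.

```python
def build_subtitle_mux_command(video: str, subtitles: dict, output: str) -> list:
    """Build an ffmpeg argv that muxes SRT files into an MP4 as mov_text tracks.

    subtitles maps a language code (e.g. "eng") to an SRT file path.
    """
    cmd = ["ffmpeg", "-i", video]
    for path in subtitles.values():
        cmd += ["-i", path]
    # Map all streams from every input: input 0 is the video, 1..n the subtitles.
    for input_index in range(len(subtitles) + 1):
        cmd += ["-map", str(input_index)]
    # Copy audio/video as-is; convert subtitles to the MP4 text format.
    cmd += ["-c", "copy", "-c:s", "mov_text"]
    for stream_index, language in enumerate(subtitles):
        cmd += [f"-metadata:s:s:{stream_index}", f"language={language}"]
    cmd.append(output)
    return cmd


cmd = build_subtitle_mux_command(
    "video.mp4",
    {"eng": "video/en.srt", "spa": "video/spa.srt"},
    "output_video.mp4",
)
print(" ".join(cmd))
```

The list can be handed to `subprocess.run(cmd, check=True)` without going through a shell.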
Is the reason the project does not work on Win64 that this OS is not supported by static_ffmpeg?
The usage example at https://pypi.org/project/transcribe-anything/ (and in this repo's README file) is:
transcribe_anything https://www.youtube.com/watch?v=dQw4w9WgXcQ
When I run that, I get:
zsh: no matches found: https://www.youtube.com/watch?v=dQw4w9WgXcQ
same with any other YouTube video URLs.
(I am more successful with Rumble links.)
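The "no matches found" message comes from zsh, not from transcribe_anything: the `?` in the URL is a glob character, so the unquoted URL is treated as a filename pattern. Quoting the URL on the command line avoids this. A small sketch using the standard library's `shlex.quote` to produce a safely quoted command string:

```python
import shlex

url = "https://www.youtube.com/watch?v=dQw4w9WgXcQ"

# shlex.quote wraps the argument in single quotes when it contains
# characters the shell would interpret, such as '?'.
command = "transcribe_anything " + shlex.quote(url)
print(command)
# → transcribe_anything 'https://www.youtube.com/watch?v=dQw4w9WgXcQ'
```

Rumble links presumably work unquoted because they contain no such characters.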
When attempting to run transcribe-anything in insane mode on my M1 Mac, I encountered an error related to the installation of intel-openmp==2024.0.2. Here are the steps and the error observed.
Steps to reproduce:
1. Run: transcribe-anything /Users/xxx/Downloads/Test/xxxx.m4a --hf_token hf_[mytoken] --device insane
2. I encountered the following error:
subprocess.CalledProcessError: Command 'pip install intel-openmp==2024.0.2' returned non-zero exit status 1.
I understand the intel-openmp==2024.0.2 package is part of Intel's Math Kernel Library (MKL) and is typically used for linear algebra, Fourier transform, and random number capabilities in numpy, scipy, and scikit-learn. Given MKL currently doesn't support M1 chips, I believe this is causing the error in my case.
While the transcribe-anything software does support M1 chips for CPU tasks, it appears the MKL dependency needed for insane mode isn't compatible.
Please advise on whether there's a workaround or solution on M1 chips or if compatibility is planned in future updates. Thanks in advance!
System Specs: Apple M1 MacBook
Python Version: Python 3.11
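One possible workaround, sketched under the assumption that the dependency list is built in Python before installation: detect Apple silicon and leave out Intel-only packages. Both helper names are hypothetical, and the requirement strings are only examples taken from errors quoted on this page.

```python
import platform


def is_apple_silicon(system: str = "", machine: str = "") -> bool:
    """True on macOS running natively on an arm64 (M1/M2) CPU."""
    system = system or platform.system()
    machine = machine or platform.machine()
    return system == "Darwin" and machine == "arm64"


def filter_requirements(requirements: list, apple_silicon: bool) -> list:
    """Drop intel-openmp pins when installing on Apple silicon."""
    if not apple_silicon:
        return list(requirements)
    return [r for r in requirements if not r.startswith("intel-openmp")]


wanted = ["torch==2.1.2", "intel-openmp==2024.0.2"]
on_m1 = filter_requirements(wanted, apple_silicon=True)
elsewhere = filter_requirements(wanted, apple_silicon=False)
```

Note that a Python build running under Rosetta reports `x86_64` from `platform.machine()`, so this check only covers native arm64 interpreters.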