
transcribe-anything

USES WHISPER AI

Over 300 ⭐'s because this app just works! This whisper front-end app is the only one to generate a speaker.json file, which partitions the conversation by who is doing the speaking.

The easiest whisper implementation to install and use. Just install with pip install transcribe-anything. GPU acceleration is automatic, using the blazingly fast insanely-fast-whisper as the backend for --device insane. This is the only tool that optionally produces a speaker.json file, representing speaker-assigned text that has been de-chunkified.

Hardware acceleration on Windows/Linux/MacOS Arm (M1, M2, +) via --device insane

Input a local file or youtube/rumble url and this tool will transcribe it using Whisper AI into subtitle files and raw text.

Uses Whisper AI, so this is a state-of-the-art transcription service - completely free. 🤯🤯🤯

Your data stays private and is not uploaded to any service.

The new version now has state-of-the-art transcription speed thanks to the new --device insane backend, and it also produces a speaker.json file.

pip install transcribe-anything
# slow cpu mode, works everywhere
transcribe-anything https://www.youtube.com/watch?v=dQw4w9WgXcQ
# insanely fast using the insanely-fast-whisper backend.
transcribe-anything https://www.youtube.com/watch?v=dQw4w9WgXcQ --device insane
# translate from any language to english
transcribe-anything https://www.youtube.com/watch?v=dQw4w9WgXcQ --device insane --task translate
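# speaker diarization (requires your own hugging face token; <YOUR_HF_TOKEN> is a placeholder, see the Speaker.json section below)
transcribe-anything https://www.youtube.com/watch?v=dQw4w9WgXcQ --device insane --hf_token <YOUR_HF_TOKEN>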

Insanely fast on cuda platforms

If you pass in --device insane on a cuda platform, this tool will use a state-of-the-art version of whisper: https://github.com/Vaibhavs10/insanely-fast-whisper, which is MUCH faster and has a pipeline for speaker identification (diarization) via the --hf_token option.

Also note that the insanely-fast-whisper (--device insane) backend included in this project has been fixed to work with Python 3.11. The upstream version is still broken on Python 3.11 as of 1/22/2024.

Speaker.json

When diarization is enabled via --hf_token (Hugging Face token), the output json will contain speaker info labeled as SPEAKER_00, SPEAKER_01, etc. For licensing reasons, you must get your own Hugging Face token if you want to enable this feature. There is also an additional step: you must agree to the user policies for pyannote.audio, located here: https://huggingface.co/pyannote/segmentation-3.0. If you don't do this, you'll see runtime exceptions from pyannote when --hf_token is used.

What's special about this app is that it also generates a speaker.json, which is a de-chunkified version of the speaker section of the output json.

speaker.json example:

[
  {
    "speaker": "SPEAKER_00",
    "timestamp": [
      0.0,
      7.44
    ],
    "text": "for that. But welcome, Zach Vorhees. Great to have you back on. Thank you, Matt. Craving me back onto your show. Man, we got a lot to talk about.",
    "reason": "beginning"
  },
  {
    "speaker": "SPEAKER_01",
    "timestamp": [
      7.44,
      33.52
    ],
    "text": "Oh, we do. 2023 was the year that OpenAI released, you know, chat GPT-4, which I think most people would say has surpassed average human intelligence, at least in test taking, perhaps not in, you know, reasoning and things like that. But it was a major year for AI. I think that most people are behind the curve on this. What's your take of what just happened in the last 12 months and what it means for the future of human cognition versus machine cognition?",
    "reason": "speaker-switch"
  },
  {
    "speaker": "SPEAKER_00",
    "timestamp": [
      33.52,
      44.08
    ],
    "text": "Yeah. Well, you know, at the beginning of 2023, we had a pretty weak AI system, which was a chat GPT 3.5 turbo was the best that we had. And then between the beginning of last",
    "reason": "speaker-switch"
  }
]

Note that speaker.json is only generated when using --device insane and not for --device cuda nor --device cpu.
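
Since speaker.json is plain json (as of 2.7.19), it's straightforward to consume programmatically. A minimal sketch, assuming the file landed in your output directory (the path below is hypothetical):

import json

# Minimal sketch: read speaker.json and print who spoke when.
# "output_dir/speaker.json" is a hypothetical path; adjust to your --output_dir.
with open("output_dir/speaker.json", encoding="utf-8") as f:
    segments = json.load(f)

for seg in segments:
    start, end = seg["timestamp"]
    print(f'{seg["speaker"]} [{start:.2f}s-{end:.2f}s]: {seg["text"]}')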

cuda vs insane

Insane mode eats up a lot of memory, and it's common to get out-of-memory errors while transcribing. For example, on a 12GB Nvidia 3060 card, out-of-memory errors are common for big content. If you experience this, pass in --batch-size 8 or smaller. Note that any arguments not recognized by transcribe-anything are passed on to the backend transcriber.
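
For example, to cap the batch size (the file name here is just a placeholder):

transcribe-anything my_long_video.mp4 --device insane --batch-size 8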

Also, please don't use distil-whisper/distil-large-v2; it produces extremely bad stuttering, and it's not entirely clear why. I've had to switch it out of production environments because it's so bad. It's also non-deterministic, so I suspect a fallback non-zero temperature is somehow being used, which produces the stuttering.

cuda uses the original AI model supplied by OpenAI. It's more stable but MUCH slower, and it won't produce a speaker.json file.

--embed: this app can optionally embed ("burn") subtitles directly into an output video.
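
For example (the file name is a placeholder; per the changelog, --embed currently only works on local mp4 files):

transcribe-anything video.mp4 --embed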

Install

This front-end app for whisper boasts the easiest install in the whisper ecosystem, thanks to isolated-environment. You can simply install it with pip, like this:

pip install transcribe-anything

GPU Acceleration

GPU acceleration is automatically enabled on Windows and Linux. Mac users are otherwise stuck with --device cpu mode; --device insane with --model mps on Mac M1+ may work, but this has been completely untested.

Usage

 transcribe-anything https://www.youtube.com/watch?v=dQw4w9WgXcQ

Will output:

Detecting language using up to the first 30 seconds. Use `--language` to specify the language
Detected language: English
[00:00.000 --> 00:27.000]  We're no strangers to love, you know the rules, and so do I
[00:27.000 --> 00:31.000]  I've built commitments while I'm thinking of
[00:31.000 --> 00:35.000]  You wouldn't get this from any other guy
[00:35.000 --> 00:40.000]  I just wanna tell you how I'm feeling
[00:40.000 --> 00:43.000]  Gotta make you understand
[00:43.000 --> 00:45.000]  Never gonna give you up
[00:45.000 --> 00:47.000]  Never gonna let you down
[00:47.000 --> 00:51.000]  Never gonna run around and desert you
[00:51.000 --> 00:53.000]  Never gonna make you cry
[00:53.000 --> 00:55.000]  Never gonna say goodbye
[00:55.000 --> 00:58.000]  Never gonna tell a lie
[00:58.000 --> 01:00.000]  And hurt you
[01:00.000 --> 01:04.000]  We've known each other for so long
[01:04.000 --> 01:09.000]  Your heart's been aching but you're too shy to say it
[01:09.000 --> 01:13.000]  Inside we both know what's been going on
[01:13.000 --> 01:17.000]  We know the game and we're gonna play it
[01:17.000 --> 01:22.000]  And if you ask me how I'm feeling
[01:22.000 --> 01:25.000]  Don't tell me you're too much to see
[01:25.000 --> 01:27.000]  Never gonna give you up
[01:27.000 --> 01:29.000]  Never gonna let you down
[01:29.000 --> 01:33.000]  Never gonna run around and desert you
[01:33.000 --> 01:35.000]  Never gonna make you cry
[01:35.000 --> 01:38.000]  Never gonna say goodbye
[01:38.000 --> 01:40.000]  Never gonna tell a lie
[01:40.000 --> 01:42.000]  And hurt you
[01:42.000 --> 01:44.000]  Never gonna give you up
[01:44.000 --> 01:46.000]  Never gonna let you down
[01:46.000 --> 01:50.000]  Never gonna run around and desert you
[01:50.000 --> 01:52.000]  Never gonna make you cry
[01:52.000 --> 01:54.000]  Never gonna say goodbye
[01:54.000 --> 01:57.000]  Never gonna tell a lie
[01:57.000 --> 01:59.000]  And hurt you
[02:08.000 --> 02:10.000]  Never gonna give
[02:12.000 --> 02:14.000]  Never gonna give
[02:16.000 --> 02:19.000]  We've known each other for so long
[02:19.000 --> 02:24.000]  Your heart's been aching but you're too shy to say it
[02:24.000 --> 02:28.000]  Inside we both know what's been going on
[02:28.000 --> 02:32.000]  We know the game and we're gonna play it
[02:32.000 --> 02:37.000]  I just wanna tell you how I'm feeling
[02:37.000 --> 02:40.000]  Gotta make you understand
[02:40.000 --> 02:42.000]  Never gonna give you up
[02:42.000 --> 02:44.000]  Never gonna let you down
[02:44.000 --> 02:48.000]  Never gonna run around and desert you
[02:48.000 --> 02:50.000]  Never gonna make you cry
[02:50.000 --> 02:53.000]  Never gonna say goodbye
[02:53.000 --> 02:55.000]  Never gonna tell a lie
[02:55.000 --> 02:57.000]  And hurt you
[02:57.000 --> 02:59.000]  Never gonna give you up
[02:59.000 --> 03:01.000]  Never gonna let you down
[03:01.000 --> 03:05.000]  Never gonna run around and desert you
[03:05.000 --> 03:08.000]  Never gonna make you cry
[03:08.000 --> 03:10.000]  Never gonna say goodbye
[03:10.000 --> 03:12.000]  Never gonna tell a lie
[03:12.000 --> 03:14.000]  And hurt you
[03:14.000 --> 03:16.000]  Never gonna give you up
[03:16.000 --> 03:23.000]  If you want, never gonna let you down Never gonna run around and desert you
[03:23.000 --> 03:28.000]  Never gonna make you hide Never gonna say goodbye
[03:28.000 --> 03:42.000]  Never gonna tell you I ain't ready

API

from transcribe_anything.api import transcribe

transcribe(
    url_or_file="https://www.youtube.com/watch?v=dQw4w9WgXcQ",
    output_dir="output_dir",
)
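
The backend can also be selected from the api. A minimal sketch, assuming the device parameter mirrors the CLI's --device flag (this usage also appears in the issues below):

from transcribe_anything.api import transcribe

# Sketch: device mirrors the CLI's --device flag ("cpu", "cuda", or "insane").
transcribe(
    url_or_file="https://www.youtube.com/watch?v=dQw4w9WgXcQ",
    output_dir="output_dir",
    device="insane",
)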

Develop

Works on Ubuntu/MacOS/Win32 (in git-bash). This will create a virtual environment:

> cd transcribe_anything
> ./install_dev.sh
# Enter the environment:
> source activate.sh

The environment is now active, and the next step will only install to the local python. If the terminal is closed, then to get back into the environment, cd transcribe_anything and execute source activate.sh.

Required: Install to current python environment

  • pip install transcribe-anything
    • The command transcribe_anything will magically become available.
  • transcribe_anything <YOUTUBE_URL>

Tech Stack

Testing

  • Every commit is tested for standard linters and a batch of unit tests.

Versions

  • 2.7.36: Fixed some ffmpeg dependencies.
  • 2.7.35: All ffmpeg commands are now static_ffmpeg commands. Fixes issue.
  • 2.7.34: Various fixes.
  • 2.7.33: Fixes linux
  • 2.7.32: Fixes mac m1 and m2.
  • 2.7.31: Adds a warning if using python 3.12, which isn't supported yet in the backend.
  • 2.7.30: adds --query-gpu-json-path
  • 2.7.29: Made json -> srt conversion more robust for --device insane; bad entries will be skipped with a warning.
  • 2.7.28: Fixes bad title fetching with weird characters.
  • 2.7.27: pytorch-audio upgrades broke this package. Upgrade to latest version to resolve.
  • 2.7.26: Add model option distil-whisper/distil-large-v2
  • 2.7.25: Windows (Linux/MacOS) bug with --device insane and python 3.11 installing wrong insanely-fast-whisper version.
  • 2.7.22: Fixes transcribe-anything on Linux.
  • 2.7.21: Tested that Mac Arm can run --device insane. Added tests to ensure this.
  • 2.7.20: Fixes wrong type being returned when speaker.json happens to be empty.
  • 2.7.19: speaker.json is now in plain json format instead of json5 format
  • 2.7.18: Fixes tests
  • 2.7.17: Fixes speaker.json nesting.
  • 2.7.16: Adds --save_hf_token
  • 2.7.15: Fixes 2.7.14 breakage.
  • 2.7.14: (Broken) Now generates speaker.json when diarization is enabled.
  • 2.7.13: Default diarization model is now pyannote/speaker-diarization-3.1
  • 2.7.12: Adds srt_swap for line breaks and improved isolated_environment usage.
  • 2.7.11: --device insane now generates a *.vtt translation file
  • 2.7.10: Better support for namespaced models. Trims text output in output json. Output json is now formatted with indents. SRT file is now printed out for --device insane
  • 2.7.9: All SRT translation errors fixed for --device insane. All tests pass.
  • 2.7.8: During error of --device insane, write out the error.json file into the destination.
  • 2.7.7: Better error messages during failure.
  • 2.7.6: Improved generation of out.txt, removes linebreaks.
  • 2.7.5: --device insane now generates better conforming srt files.
  • 2.7.3: Various fixes for the insane mode backend.
  • 2.7.0: Introduces the insanely-fast-whisper backend; enable by using --device insane
  • 2.6.0: GPU acceleration now happens automatically on Windows thanks to isolated-environment. This will also prevent interference with different versions of torch for other AI tools.
  • 2.5.0: --model large now aliases to --model large-v3. Use --model large-legacy to use original large model.
  • 2.4.0: pytorch updated to 2.1.2, gpu install script updated to same + cuda version is now 121.
  • 2.3.9: Fallback to cpu device if gpu device is not compatible.
  • 2.3.8: Fix --models arg which
  • 2.3.7: Critical fix: fixes dependency breakage with open-ai. Fixes windows use of embedded tool.
  • 2.3.6: Fixes typo in readme for installation instructions.
  • 2.3.5: Now has --embed to burn the subtitles into the video itself. Only works on local mp4 files at the moment.
  • 2.3.4: Removed out.mp3 and instead use a temporary wav file, as that is faster to process. --no-keep-audio has now been removed.
  • 2.3.3: Fix case where there are spaces in the name (happens on Windows)
  • 2.3.2: Fix windows transcoding error
  • 2.3.1: static-ffmpeg >= 2.5 now specified
  • 2.3.0: Now uses the official version of whisper ai
  • 2.2.1: "test_" is now prepended to all the different output folder names.
  • 2.2.0: Now explicitly setting a language will put the file in a folder with that language name, allowing multi-language passes without overwriting.
  • 2.1.2: yt-dlp pinned to new minimum version. Fixes downloading issues from old lib. Adds audio normalization by default.
  • 2.1.1: Updates keywords for easier pypi finding.
  • 2.1.0: Unknown args are now assumed to be for whisper and passed to it as-is. Fixes #3
  • 2.0.13: Now works with python 3.9
  • 2.0.12: Adds --device to argument parameters. This will default to CUDA if available, else CPU.
  • 2.0.11: Automatically deletes files in the out directory if they already exist.
  • 2.0.10: fixes local file issue #2
  • 2.0.9: fixes sanitization of path names for some youtube videos
  • 2.0.8: fix --output_dir not being respected.
  • 2.0.7: install_cuda.sh -> install_cuda.py
  • 2.0.6: Fixes twitter video fetching. --keep-audio -> --no-keep-audio
  • 2.0.5: Fix bad filename on trailing urls ending with /, adds --keep-audio
  • 2.0.3: GPU support is now added. Run the install_cuda.sh script to enable.
  • 2.0.2: Minor cleanup of file names (no more out.mp3.txt, it's now out.txt)
  • 2.0.1: Fixes missing dependencies and adds whisper option.
  • 2.0.0: New! Now a front end for Whisper ai!

transcribe-anything's Issues

Trouble running when it tries to install torch.

Here is the error I get:

Running: pip install torch==2.1.2
ERROR: Could not find a version that satisfies the requirement torch==2.1.2 (from versions: 2.2.0)
ERROR: No matching distribution found for torch==2.1.2

I'm on macOS on an M2 Mac, with Python version 3.10.10

FileExistsError

I just discovered your tool last week and had tested some videos, and it worked fine. Liked it and was nice and easy to work with. But today no video and no YouTube URL work. Didn't change anything in particular. Seems to be a problem with permissions? Did try to run as sudo and made sure write permissions are OK.

Getting this error on every file or URL:

sam@sams-iMac-2 Transcribe_Anything-test % transcribe_anything ./video.mp4
Traceback (most recent call last):
  File "/usr/local/bin/transcribe_anything", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/site-packages/transcribe_anything/cmd.py", line 54, in main
    transcribe(
  File "/usr/local/lib/python3.10/site-packages/transcribe_anything/api.py", line 55, in transcribe
    os.makedirs(output_dir, exist_ok=True)
  File "/usr/local/Cellar/[email protected]/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/os.py", line 225, in makedirs
    mkdir(name, mode)
FileExistsError: [Errno 17] File exists: 'video.mp4'

I am using Python 3.10.8 on macOS 12.6.1


EDIT
Just noticed, when I set the output dir like --output_dir "./output/" it works

install torchaudio==2.1.2' returned non-zero exit status 1.

'C:\Python312\Lib\site-packages\transcribe_anything\venv\insanely_fast_whisper\Scripts\pip install torchaudio==2.1.2' returned non-zero exit status 1.
Traceback (most recent call last):
  File "C:\Python312\Lib\site-packages\transcribe_anything\_cmd.py", line 208, in main
    transcribe(
  File "C:\Python312\Lib\site-packages\transcribe_anything\api.py", line 251, in transcribe
    run_insanely_fast_whisper(
  File "C:\Python312\Lib\site-packages\transcribe_anything\insanely_fast_whisper.py", line 198, in run_insanely_fast_whisper
    env = get_environment()
          ^^^^^^^^^^^^^^^^^
  File "C:\Python312\Lib\site-packages\transcribe_anything\insanley_fast_whisper_reqs.py", line 49, in get_environment
    env = isolated_environment(venv_dir, deps)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python312\Lib\site-packages\isolated_environment\__init__.py", line 22, in isolated_environment
    iso_env = IsolatedEnvironment(
              ^^^^^^^^^^^^^^^^^^^^
  File "C:\Python312\Lib\site-packages\isolated_environment\api.py", line 145, in __init__
    self.ensure_installed(requirements or Requirements([]))
  File "C:\Python312\Lib\site-packages\isolated_environment\api.py", line 324, in ensure_installed
    self.pip_install(
  File "C:\Python312\Lib\site-packages\isolated_environment\api.py", line 208, in pip_install
    _pip_install(self.env_path, package, build_options, full_isolation)
  File "C:\Python312\Lib\site-packages\isolated_environment\api.py", line 119, in _pip_install
    subprocess.run(cmd, env=act_env.env, shell=True, check=True)
  File "C:\Python312\Lib\subprocess.py", line 571, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'C:\Python312\Lib\site-packages\transcribe_anything\venv\insanely_fast_whisper\Scripts\pip install torchaudio==2.1.2' returned non-zero exit status 1.

Unable to transcribe YouTube videos in Chinese

transcribe-anything --language Chinese --device insane https://www.youtube.com/watch?v=m7huzFiIiGo

Python 3.11.7

{
  "cuda_available": true,
  "cuda_devices": [
    {
      "device_id": 0,
      "multiprocessors": 76,
      "name": "NVIDIA GeForce RTX 4080",
      "vram": 17170956288
    }
  ],
  "num_cuda_devices": 1
}
Running transcribe_audio on https://www.youtube.com/watch?v=m7huzFiIiGo
Exception in thread Thread-1 (_readerthread):
Traceback (most recent call last):
  File "C:\Users\x\AppData\Local\Programs\Python\Python311\Lib\threading.py", line 1045, in _bootstrap_inner
    self.run()
  File "C:\Users\x\AppData\Local\Programs\Python\Python311\Lib\threading.py", line 982, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\x\AppData\Local\Programs\Python\Python311\Lib\subprocess.py", line 1597, in _readerthread
    buffer.append(fh.read())
                  ^^^^^^^^^
  File "C:\Users\x\AppData\Local\Programs\Python\Python311\Lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 37: character maps to <undefined>
Error: 'NoneType' object has no attribute 'strip'
Traceback (most recent call last):
  File "C:\Users\x\desktop\t\venv\Lib\site-packages\transcribe_anything\cmd.py", line 188, in main
    transcribe(
  File "C:\Users\x\desktop\t\venv\Lib\site-packages\transcribe_anything\api.py", line 160, in transcribe
    output_dir = "text_" + yt_dlp.stdout.strip()
                           ^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'strip'

while processing https://www.youtube.com/watch?v=m7huzFiIiGo

Output to txt?

This package is amazing, love the simplicity and redundancy in installation to ensure smooth processes. One question though, instead of printing, maybe an option to output a text file with and without timestamps would be a sweet touch? Let me know if you need help implementing this. Thanks for this great package!

Support for Whisper initial_prompt?

Does this project have support for Whisper's initial_prompt?

I wonder if automatically using any of the YouTube title, description, and/or tags could be useful for transcription accuracy since they are likely to be spoken in the audio but are not being used for context.

Numpy v 2.0 does not let me run on M1 Mac

Cannot run on Mac M1 with Python 3.11

A module that was compiled using NumPy 1.x cannot be run in
NumPy 2.0.0 as it may crash. To support both 1.x and 2.x
versions of NumPy, modules must be compiled with NumPy 2.0.
Some module may need to rebuild instead e.g. with 'pybind11>=2.12'.

If you are a user of the module, the easiest solution will be to
downgrade to 'numpy<2' or try to upgrade the affected module.
We expect that some modules will need time to support NumPy 2.

I went into the created virtualenv "whisper" and downgraded numpy v2.0.0 to numpy==1.26.4 there but I still get an error:

AttributeError: `np.NaN` was removed in the NumPy 2.0 release. Use `np.nan` instead.. Did you mean: 'nan'?
Error: Failed to execute insanely-fast-whisper --file-name /var/folders/qq/3pmx9x793q53zw0qy3_52z400000gn/T/tmpvfof662m.wav --device-id mps --model-name openai/whisper-small --task transcribe --transcript-path /var/folders/qq/3pmx9x793q53zw0qy3_52z400000gn/T/tmpkau1ddcc/out.json --batch-size 4 

Am I missing something?
Thank you

Add subtitle embedding into mp4

ChatGPT 4.0 code snippet:

> ffmpeg -i video.mp4 -i video/en.srt -i video/spa.srt -c copy -c:s mov_text -metadata:s:s:0 language=eng -metadata:s:s:1 language=spa output_video.mp4

Installation Failure: PyTorch 1.12.1 with CUDA 116 Not Available in Package Index

I am trying to install PyTorch version 1.12.1 with CUDA 116 using pip but it fails because this version is not available in the PyTorch package index.

Steps to Reproduce
Run the command: pip install torch==1.12.1 --extra-index-url https://download.pytorch.org/whl/cu116
Expected Behavior
The PyTorch version 1.12.1 with CUDA 116 should be installed successfully.

Actual Behavior
The installation fails with the following error message:

ERROR: Could not find a version that satisfies the requirement torch==1.12.1 (from versions: 2.0.0, 2.0.1)
ERROR: No matching distribution found for torch==1.12.1

And a subprocess.CalledProcessError is raised.

Environment
OS: Windows 10
Python Version: 3.11
Pip Version: pip 23.2.1
Additional Context

Question

The reason that the project will not work on Win64 is that this OS is not supported in static_ffmpeg?

Not able to use transcribe-anything on google colab

from transcribe_anything.api import transcribe

transcribe(
    url_or_file="/content/drive/MyDrive/PMS/PMS_6jan_630.ts",
    output_dir="/content/drive/MyDrive/PMS/text/6jan/630/",
    device="insane",
)

On using the api I am getting the following error:
/usr/lib/python3.10/json/decoder.py in raw_decode(self, s, idx)
    353             obj, end = self.scan_once(s, idx)
    354         except StopIteration as err:
--> 355             raise JSONDecodeError("Expecting value", s, err.value) from None
    356         return obj, end

JSONDecodeError: Expecting value: line 1 column 1 (char 0)

Transcribe-Anything "insane" mode installation issue on M1 Mac

Description:

When attempting to run transcribe-anything in insane mode on my M1 Mac, I encountered an error related to the installation of intel-openmp==2024.0.2. Here are the procedural steps and the error observed.

Steps to reproduce:

  1. I ran the CLI command:
     transcribe-anything /Users/xxx/Downloads/Test/xxxx.m4a --hf_token hf_[mytoken] --device insane
  2. I encountered the following error:
     subprocess.CalledProcessError: Command 'pip install intel-openmp==2024.0.2' returned non-zero exit status 1.

I understand the intel-openmp==2024.0.2 package is part of Intel's Math Kernel Library (MKL) and is typically used for linear algebra, Fourier transform, and random number capabilities in numpy, scipy, and scikit-learn. Given MKL currently doesn't support M1 chips, I believe this is causing the error in my case.

While the transcribe-anything software does support M1 chips for CPU tasks, it appears the MKL dependency needed for insane mode isn't compatible.

Please advise on whether there's a workaround or solution on M1 chips or if compatibility is planned in future updates. Thanks in advance!

System Specs: Apple M1 MacBook
Python Version: Python 3.11
