thewh1teagle / vibe

Transcribe on your own!

Home Page: https://thewh1teagle.github.io/vibe/

License: MIT License

Rust 36.20% HTML 0.40% CSS 0.19% TypeScript 43.85% JavaScript 10.27% Svelte 7.11% PowerShell 1.61% Python 0.37%

ai cross-platform desktop openai rust transcribe whisper

vibe's Introduction

Vibe - Transcribe on your own!

⌨️ Transcribe audio / video offline using OpenAI Whisper

🔗 Download Vibe | Give it a Star ⭐ | Support the project 🤝

Screenshots

Features 🌟

🌍 Transcribe almost every language
🔒 Ultimate privacy: fully offline transcription, no data ever leaves your device
🎨 User friendly design
🎙️ Transcribe audio / video
📂 Batch transcribe multiple files!
📝 Support SRT, VTT, TXT, HTML, PDF, JSON formats
👀 Realtime preview
🌐 Translate to English from any language
🖨️ Print transcript directly to any printer
🔄 Automatic updates
🖥️ Optimized for CPU (Windows / Linux)
💻 Optimized for GPU (macOS, Windows)
🎮 Optimized for Nvidia GPUs! (see INSTALL.md#nvidia)
🎮 Optimized for AMD GPUs (Linux only)! (see INSTALL.md#amd)
🔧 Total Freedom: Customize Models Easily via Settings
⚙️ Model arguments for advanced users
⏳ Transcribe system audio
🎤 Transcribe from microphone
🖥️ CLI support: Use Vibe directly from the command line interface! (see --help)
👥 Speaker diarization (Beta)
📱 ~~iOS & Android support~~ (coming soon)
📥 Integrate custom models from your own site: Use vibe://download/?url=<model url>
📹 Choose caption length optimized for videos / reels
⚡ HTTP API with Swagger docs! (use --server and open http://<host>:3022/docs for docs)
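
For the custom-model link in the list above, the model URL travels as a query parameter. A minimal sketch of building such a link follows. Percent-encoding the url value is an assumption (the README does not say whether Vibe requires it), and the function name and example URL are hypothetical.

```python
from urllib.parse import quote


def build_vibe_download_link(model_url: str) -> str:
    """Return a vibe://download link for the given model URL.

    Assumption: the model URL is percent-encoded so that characters such as
    "?" and "&" inside it are not mistaken for part of the outer query string.
    """
    return "vibe://download/?url=" + quote(model_url, safe="")


# Hypothetical model location, used only for illustration.
print(build_vibe_download_link("https://example.com/models/ggml-custom.bin"))
```

If Vibe turns out to accept the raw URL, the quote() call can simply be dropped.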

Supported platforms 🖥️

MacOS Windows Linux

Install notes

See INSTALL.md

Contribute 🤝

PRs are welcome! You're also welcome to add translations.

We would like to express our sincere gratitude to all the contributors.

Community

Roadmap 🛣️

You can see the roadmap in Vibe-Roadmap

Add translation 🌐

Copy the en folder from desktop/src-tauri/locales to a new directory, e.g. pt-BR (use a BCP 47 language code)
Change every value in the files there to the new language and keep the keys as they are
Create a PR / issue on GitHub

In addition, you can add a translation to the Vibe website by creating new files in landing/static/locales.
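
The first step can also be scripted. This is only a sketch: the helper name is hypothetical, and it assumes each locale is a plain folder of files under the locales directory, as the steps describe.

```python
import shutil
from pathlib import Path


def start_translation(locales_dir: Path, new_code: str, source_code: str = "en") -> Path:
    """Copy the source locale folder to a new folder named after new_code.

    new_code should be a BCP 47 language code such as "pt-BR".
    The copied files still contain the source-language values; translate
    the values afterwards and leave the keys unchanged.
    """
    source = locales_dir / source_code
    target = locales_dir / new_code
    if not source.is_dir():
        raise FileNotFoundError(f"source locale not found: {source}")
    if target.exists():
        raise FileExistsError(f"locale already exists: {target}")
    shutil.copytree(source, target)
    return target
```

From the repository root, start_translation(Path("desktop/src-tauri/locales"), "pt-BR") would create the new folder ready for editing.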

Build 🛠️

See BUILDING.md

I want to know more!

Medium post

Issue report

You can open a new issue; it's recommended to check DEBUG.md first.

Credits

Thanks to tauri.app for making the best app framework I have ever seen

Thanks to wang-bin/avbuild for the prebuilt ffmpeg

Thanks to github.com/whisper.cpp for an outstanding interface to the AI model.

Thanks to openai.com for their amazing Whisper model

Thanks to github.com for their support of open source projects, providing infrastructure completely free.

And to all the amazing open source frameworks and libraries that this project uses...

vibe's Issues

[Bug]: Settings not available on Win11

What happened?

A bug happened!
Can't access settings on Win11.
Works on Win10.

Steps to reproduce

see attached clip

Inspelning.2024-04-19.152301.mp4

Windows version:
Edition Windows 11 Pro
Version 23H2
Installed on 2024-03-03
OS build 22631.3447
Experience Windows Feature Experience Pack 1000.22688.1000.0

What OS are you seeing the problem on?

Window

Relevant log output

No response

[Bug]: error and antivirus

Thank you very much for the software.

When I try to run a transcription (after choosing a language and a file), I get this message:
AVAST antivirus blocks the installation of the file

What happened?

After transcribing a 50-minute recording, the software reported a "failed to get segment" error.

Steps to reproduce

step one...
step two...

What OS are you seeing the problem on?

Windows

Relevant log output

failed to get segment

App Version: 0.0.6
Arch: x86_64
Platform: win32
Kernel Version: 10.0.22631
OS: Windows_NT
OS Version: 10.0.22631
Models: ggml-medium.bin, ivrit-ai--whisper-large-v2-tuned-ggml-model.bin
Default Mode: "C:\\Users\\1234\\AppData\\Local\\github.com.thewh1teagle.vibe\\ggml-medium.bin"

Describe the feature

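The application takes up only part of the window allotted to it. Here is a screenshot:

The transcript also takes up only part of the window, which of course results in more lines. Can this be fixed?
Thank you very much for this amazing software!
I have passed it on to friends!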

What happened?

The software makes lots of spelling mistakes in Hebrew and occasionally outputs SRT files that are not in sync.

Steps to reproduce

step one...
step two...

What OS are you seeing the problem on?

No response

Relevant log output

No response

[Feature Request]: Add size information and estimated download time for OpenAI model.

I propose adding two helpful details to the Vibe installation process:

Model size: before the installation starts, show the size of the OpenAI model and of any additional packages that need to be downloaded.
Download time estimate: during the download, show an estimate of the remaining download time.

This helps users choose when to install and feel more comfortable. Thank you for considering this feature request.
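
Both details come down to simple arithmetic once the total size is known (for example from an HTTP Content-Length header, assuming the server sends one). The sketch below is illustrative only; the function names are hypothetical and nothing here is taken from Vibe's code.

```python
from typing import Optional


def format_size(num_bytes: int) -> str:
    """Render a byte count in a human-readable unit (binary prefixes)."""
    size = float(num_bytes)
    for unit in ("B", "KiB", "MiB", "GiB"):
        if size < 1024 or unit == "GiB":
            return f"{size:.1f} {unit}"
        size /= 1024
    return f"{size:.1f} GiB"  # not reached; keeps the return type explicit


def estimate_remaining_seconds(
    total_bytes: int, downloaded_bytes: int, elapsed_seconds: float
) -> Optional[float]:
    """Estimate the remaining download time from the average rate so far.

    Returns None while no estimate is possible (nothing downloaded yet).
    """
    if downloaded_bytes <= 0 or elapsed_seconds <= 0:
        return None
    rate = downloaded_bytes / elapsed_seconds  # average bytes per second
    remaining = max(total_bytes - downloaded_bytes, 0)
    return remaining / rate
```

An average over the whole download is the simplest choice; a moving average over the last few seconds would react faster to changes in network speed.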

What happened?

On first launch the application language (not the transcription language) was Hebrew, and on subsequent launches it changed to English.

Steps to reproduce

It should stay in Hebrew, or there should be an option to choose the application language.

What OS are you seeing the problem on?

Windows 11

Relevant log output

App Version: 0.0.6
Arch: x86_64
Platform: win32
Kernel Version: 10.0.22631
OS: Windows_NT
OS Version: 10.0.22631
Models: ggml-medium.bin, ivrit-ai--whisper-large-v2-tuned-ggml-model.bin
Default Mode: "C:\\Users\\1234\\AppData\\Local\\github.com.thewh1teagle.vibe\\ivrit-ai--whisper-large-v2-tuned-ggml-model.bin"

[Bug]: Blue screen when transcribing a file

What happened?

I ran vibe.exe with an .opus file, probably about 1hr in length, with the large3 model, and the PC randomly restarted. Then the PC wouldn't boot until I unplugged/replugged all the monitor and USB cables, and it repaired some Windows update.
This may be an issue on my end. It could be faulty hardware. This is the first time my PC has done this; I haven't had this problem with WhisperGUI 0.1, Stable Diffusion, games or encoding.

I successfully recreated the issue by trying to transcribe another file.
I posted the results of the WinDBG from the MEMORY.DMP file.

Ryzen 5700G, ASUS B450 Tomahawk MAX, 2x16GB G.SKILL 3200mhz, ASUS ROG STRIX RTX 3060 OC (12GB), Seasonic Focus+ Gold 850W

VIDEO_TDR_FAILURE (116)
Attempt to reset the display driver and recover from timeout failed.

IMAGE_NAME: nvlddmkm.sys

Steps to reproduce

transcribe 1hr long audio file, GPU freezes

What OS are you seeing the problem on?

Window

Relevant log output

VIDEO_TDR_FAILURE (116)
Attempt to reset the display driver and recover from timeout failed.

IMAGE_NAME:  nvlddmkm.sys

[Bug]: Wrong transcriptions times in Hebrew

What happened?

When using SRT / VTT and the language is Hebrew, the transcription times (for each sentence) are wrong.
Maybe a bug in whisper.cpp.

Steps to reproduce

What OS are you seeing the problem on?

No response

Relevant log output

No response

[Bug]: failed to transcode mp3 file

What happened?

A bug happened!

Steps to reproduce

step one...
step two...

What OS are you seeing the problem on?

No response

Relevant log output

options: {
  "path": "/Users/slim/Desktop/13940-21.05.2024-ITEMA_23748018-2024F22805S0142-22.mp3",
  "model_path": "/Users/slim/Library/Application Support/github.com.thewh1teagle.vibe/ggml-medium.bin",
  "lang": "fr",
  "verbose": false,
  "n_threads": 4,
  "init_prompt": "",
  "temperature": 0.4
}

Caused by:
    Invalid data found when processing input

Location:
    core/src/audio/encoder.rs:175:9
App Version: 1.0.7
Commit Hash: 99ae746dc02135ad7a27ec0f9adafe016b8c96e4
Arch: aarch64
Platform: macos
Kernel Version: 14.0.0
OS: macos
OS Version: 14.0.0
Models: ggml-medium.bin
Default Model: ggml-medium.bin"

[Bug]: invalid discord link

What happened?

invalid discord link

Steps to reproduce

step one...
step two...

What OS are you seeing the problem on?

No response

Relevant log output

No response

Bug: failed to open model

What happened?

A bug happened!

Steps to reproduce

step one...
step two...

What OS are you seeing the problem on?

No response

Relevant log output

failed to open model

App Version: 0.0.6
Arch: x86_64
Platform: win32
Kernel Version: 10.0.19045
OS: Windows_NT
OS Version: 10.0.19045
Models: ggml-medium.bin
Default Mode: "C:\\Users\\הלל פישרמן\\AppData\\Local\\github.com.thewh1teagle.vibe\\ggml-medium.bin"

[Bug]: vibe: error while loading shared libraries: libavutil.so.56: cannot open shared object file: No such file or directory

What happened?

Vibe 1.0.7 crashes when starting it on Ubuntu 23.10.

Steps to reproduce

Run Ubuntu 23.10
Install Vibe deb package: vibe_1.0.7_amd64.deb
Run it
Bam: vibe: error while loading shared libraries: libavutil.so.56: cannot open shared object file: No such file or directory

What OS are you seeing the problem on?

Linux

Relevant log output

vibe: error while loading shared libraries: libavutil.so.56: cannot open shared object file: No such file or directory

Problem seems to be caused by wrong library versions expected by vibe:

Lib	Expected version	Installed version
libavcodec.so	58	60
libswresample.so	3	4
libswscale.so	5	7
libavdevice.so	58	60
libavfilter.so	7	9
libavformat.so	58	60
libavutil.so	56	58
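
One way to confirm a mismatch like the one in the table is to ask the dynamic loader for each soname the binary expects. This is a rough sketch using only Python's ctypes; the list is copied from the "Expected version" column above, and the helper name is made up for illustration.

```python
import ctypes
from typing import Iterable, List

# Sonames built from the "Expected version" column of the table above.
EXPECTED_SONAMES = [
    "libavcodec.so.58",
    "libswresample.so.3",
    "libswscale.so.5",
    "libavdevice.so.58",
    "libavfilter.so.7",
    "libavformat.so.58",
    "libavutil.so.56",
]


def find_missing_libraries(sonames: Iterable[str]) -> List[str]:
    """Return the sonames that the dynamic loader cannot load."""
    missing = []
    for name in sonames:
        try:
            ctypes.CDLL(name)  # raises OSError if the library cannot be loaded
        except OSError:
            missing.append(name)
    return missing
```

Calling find_missing_libraries(EXPECTED_SONAMES) on the affected machine should list every library the loader cannot resolve, not only the first one it trips over.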

Bug: "Vibe has stopped working"

What happened?

After selecting an .mp3 file, when I click Transcribe, I immediately get the "Vibe has stopped working" window and have to close the program.

Steps to reproduce

Clicked "Select Audio File"
Selected an .mp3 file
Clicked "Transcribe"

What OS are you seeing the problem on?

Window

Relevant log output

App Version: 1.0.5
Commit Hash: 3b4d74fa6b8f3171078df97f14add4a7463e7624
Arch: x86_64
Platform: windows
Kernel Version: 10.0.19045
OS: windows
OS Version: 10.0.19045
Models: ggml-medium.bin
Default Model: "C:\\Users\\Tom\\AppData\\Local\\github.com.thewh1teagle.vibe\\ggml-medium.bin"

[Feature Request]: Use vibe for YouTube videos

Describe the feature

Would it be possible to add the ability to transcribe from YouTube videos as in this Colab notebook?

https://github.com/Sourasky-DHLAB/Whisper/blob/main/Colab/Whisper_from_Youtube.ipynb

Bug: download model failed

What happened?

A bug happened!

Steps to reproduce

step one...
step two...

What OS are you seeing the problem on?

No response

Relevant log output

Failed to get content length from 'https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-medium.bin?download=true'
App Version: 1.0.1
Commit Hash: e27ce1b4317952a4856471a4d9349ca77aeee686
Arch: x86_64
Platform: windows
Kernel Version: 10.0.22631
OS: windows
OS Version: 10.0.22631
Models: 
Default Model: Not Found

Displaying progress percentages, and not just a progress bar

Describe the feature

I can't clearly understand what the progress is from the bar, so I prefer percentages

[Feature Request]: Speaker labels (Diarization)

Goal

Provide speaker labels along with the transcriptions (eg. Speaker1: ..., Speaker2: ...)
Do it at the same time as transcribing, in an efficient and lightweight way.

Research

https://github.com/wq2012/awesome-diarization

Possible ways:
Use c/c++ diarization libs in Rust using bindgen
Replicate pyannote-audio to Rust with tch-rs

Use ONNX runtime with ort

pykeio/ort#208

pyannote/pyannote-audio#1322

Best combination:
pyannote-segmentation-30
WespeakerVoxcelebResnet34LM

I would be happy if the software displayed the transcript immediately after decoding, and not at the end of the process.

Describe the feature

This way it will be possible to correct, edit and copy the text while working, and not only after the whole process.

Bug[ivrit]: failed to get segment

What happened?

A bug happened!

Steps to reproduce

step one...
step two...

What OS are you seeing the problem on?

No response

Relevant log output

failed to get segment

App Version: 0.0.6
Arch: x86_64
Platform: win32
Kernel Version: 10.0.19041
OS: Windows_NT
OS Version: 10.0.19041
Models: ivrit-ai--whisper-large-v2-tuned-ggml-model.bin
Default Mode: "C:\\Users\\shayh\\AppData\\Local\\github.com.thewh1teagle.vibe\\ivrit-ai--whisper-large-v2-tuned-ggml-model.bin"

Bug: Invalid data found when processing input

What happened?

A bug happened!

Steps to reproduce

step one...
step two...

What OS are you seeing the problem on?

Window

Relevant log output

"Error in desktop\\src-tauri\\src\\cmd.rs at line 71: Invalid data found when processing input"
options: ModelArgs { path: "C:\\Users\\1234\\Desktop\\short.wav", model: "C:\\Users\\1234\\AppData\\Local\\github.com.thewh1teagle.vibe\\ivrit-ai--whisper-large-v2-tuned-ggml-model.bin", lang: Some("he"), verbose: false, n_threads: Some(4), init_prompt: Some(""), temperature: Some(0.4) }

App Version: 0.0.9
Arch: x86_64
Platform: windows
Kernel Version: 10.0.22631
OS: windows
OS Version: 10.0.22631
Models: ivrit-ai--whisper-large-v2-tuned-ggml-model.bin
Default Mode: "C:\\Users\\1234\\AppData\\Local\\github.com.thewh1teagle.vibe\\ivrit-ai--whisper-large-v2-tuned-ggml-model.bin"

Bug: Incorrect "NaN" end timestamps

What happened?

Thanks for the update! But the new version 1.0.6 has this problem: The model is generating incorrect end timestamps in the output. The end timestamps are appearing as "NaN:NaN:NaN,NaN" instead of the expected time format (e.g., "0:00:10,500").
I have tried using the ggml-medium.bin and ggml-large-v3.bin models and tried a few different videos, all resulting in the same invalid end timestamps.

Here is an example:

1
0:00:00,000 --> NaN:NaN:NaN,NaN
Some text here

2
0:00:04,160 --> NaN:NaN:NaN,NaN
Some text here
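
For reference, "NaN" fields like these usually appear when a formatter receives a non-numeric or missing end time. The sketch below shows a timestamp formatter that rejects such input instead of printing it. It is not Vibe's code (the app's frontend is TypeScript); the function name is hypothetical, and hours are padded to two digits as SRT files conventionally do.

```python
import math


def format_srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm).

    Raises ValueError for NaN, infinite or negative input so that a bad
    value is caught early instead of being written into the subtitle file.
    """
    if not math.isfinite(seconds) or seconds < 0:
        raise ValueError(f"invalid timestamp: {seconds!r}")
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"
```

With a guard like this, a missing end time surfaces as an error at the point where it is produced rather than as "NaN:NaN:NaN,NaN" in the output.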

Steps to reproduce

Transcribe any video
Change the display option from Text to the SRT
The end timestamp for each line is shown as "NaN:NaN:NaN,NaN"

What OS are you seeing the problem on?

Window

Relevant log output

App Version: 1.0.6
Commit Hash: cb51db5bc8ae1d800b3f7af9faa780552808710b
Arch: x86_64
Platform: windows
Kernel Version: 10.0.22631
OS: windows
OS Version: 10.0.22631
Models: ggml-large-v3.bin, ggml-medium.bin
Default Model: "ggml-medium.bin"

Bug: the vibe application is crashing

What happened?

A bug happened! I added steps to reproduce.
please help i need to transcribe a large file

Steps to reproduce

install the app
download the new hebrew model
move it to the model folder
open any size of audio file from the app and click transcribe
Result:
the app just crashes and i get the following error message:
Problem Report for vibe
vibe quit unexpectedly.
Click Reopen to open the application again. This report will be sent
automatically to Apple.

Comments
Show Details OK Reopen

What OS are you seeing the problem on?

No response

Relevant log output

App Version: 0.0.6
Arch: x86_64
Platform: darwin
Kernel Version: 13.1.0
OS: Darwin
OS Version: 13.1.0
Models: ivrit-ai--whisper-large-v2-tuned-ggml-model.bin, ggml-medium.bin
Default Mode: ggml-medium.bin"

Bug: failed to create whisper context

What happened?

A bug happened!

Steps to reproduce

step one...
step two...

What OS are you seeing the problem on?

No response

Relevant log output

"Error in desktop\\src-tauri\\src\\main.rs at line 60: failed to open model\n\nCaused by:\n    Failed to create a new whisper context."
options: ModelArgs { path: "C:\\Users\\מנהל ומבקר\\Downloads\\AA.mp3", model: "C:\\Users\\מנהל ומבקר\\AppData\\Local\\github.com.thewh1teagle.vibe\\ivrit-ai--whisper-large-v2-tuned-ggml-model.bin", lang: Some("he"), verbose: false, n_threads: Some(4), init_prompt: Some(""), temperature: Some(0.4) }

App Version: 0.0.7
Arch: x86_64
Platform: windows
Kernel Version: 10.0.19045
OS: windows
OS Version: 10.0.19045
Models: ggml-medium.bin, ivrit-ai--whisper-large-v2-tuned-ggml-model.bin
Default Mode: "C:\\Users\\מנהל ומבקר\\AppData\\Local\\github.com.thewh1teagle.vibe\\ivrit-ai--whisper-large-v2-tuned-ggml-model.bin"

[Feature Request]: Linux release

Describe the feature

Why no Linux release if it's mentioned in the README?

Bug: Vibe does not detect ggml-large-v3.bin

What happened?

I replaced the medium model with the large model to save space. I chose the large model under model customisation in Settings, but Vibe still defaults to loading the medium model.

Steps to reproduce

Download the Large Model
Copy and Paste the Large Model
Remove the Medium Model

What OS are you seeing the problem on?

No response

Relevant log output

options: ModelArgs { path: "/Users/**Users**/Downloads/Video.mp4", model: "/Users/**Users**/Library/Application Support/github.com.thewh1teagle.vibe/ggml-medium.bin", lang: Some("en"), verbose: false, n_threads: Some(4), init_prompt: Some(""), temperature: Some(0.4) }
App Version: 1.0.2
Commit Hash: 6ba4a50a11e1b425575f2cbecfc505920b737a77
Arch: aarch64
Platform: macos
Kernel Version: 14.4.1
OS: macos
OS Version: 14.4.1
Models: ggml-large-v3.bin
Default Model: ggml-medium.bin"

[Bug]: Crash at launch on linux

What happened?

Instant crash on linux when launching.
This is what I read from the console:

vibe: error while loading shared libraries: libopenblas.so.0: cannot open shared object file: No such file or directory

You should add this requirement to the README:

sudo apt-get install libopenblas-dev

Steps to reproduce

Launch vibe on ubuntu 22

What OS are you seeing the problem on?

Linux

Relevant log output

No response

Bug: failed to open model

What happened?

A bug happened!

Steps to reproduce

step one...
step two...

What OS are you seeing the problem on?

No response

Relevant log output

failed to open model

App Version: 0.0.6
Arch: x86_64
Platform: win32
Kernel Version: 10.0.19045
OS: Windows_NT
OS Version: 10.0.19045
Models: ggml-medium.bin, ivrit-ai--whisper-large-v2-tuned-ggml-model.bin
Default Mode: "C:\\Users\\הלל פישרמן\\AppData\\Local\\github.com.thewh1teagle.vibe\\ggml-medium.bin"

Transcribe several files one by one

Describe the feature

I would like an option to select several files and transcribe them. Currently another file can be selected only after the previous one has finished running.
I would also like the export to a file to be automatic, using the name of the transcribed audio file.

[Bug]: Window bigger than monitor when opening first time

What happened?

The window is bigger than the monitor when the app is opened for the first time.
Possible solution: remove width and height from the conf and set the window to maximized on first launch.

[Bug]: Reliable format across TS and Rust

What happened?

Every platform has its own VSCode with its own settings, so every few days the format of some of the files completely changes somehow (tabs/spaces, etc.). Rust makes it easy using rustfmt, but TypeScript/JSON, etc., don't.

Steps to reproduce

What OS are you seeing the problem on?

No response

Relevant log output

Bug: hangs in the end of transcription

What happened?

When trying to transcribe with version 9 (version 8 transcribed perfectly), the application closes after a few seconds:

Steps to reproduce

step one...
step two...

What OS are you seeing the problem on?

Window

Relevant log output

App Version: 0.0.9
Arch: x86_64
Platform: windows
Kernel Version: 10.0.22631
OS: windows
OS Version: 10.0.22631
Models: ivrit-ai--whisper-large-v2-tuned-ggml-model.bin
Default Mode: "C:\\Users\\1234\\AppData\\Local\\github.com.thewh1teagle.vibe\\ivrit-ai--whisper-large-v2-tuned-ggml-model.bin"

Bug: Vibe - failed to get segment

What happened?

A bug happened!
failed to get segment

Steps to reproduce

step one...
step two...

What OS are you seeing the problem on?

Window

Relevant log output

failed to get segment

App Version: 0.0.4
Arch: x86_64
Platform: win32
Kernel Version: 10.0.19045
OS: Windows_NT
OS Version: 10.0.19045
Models: ggml-medium.bin
Default Mode: "C:\\Users\\hatzerh\\AppData\\Local\\github.com.thewh1teagle.vibe\\ggml-medium.bin"

Bug: failed to get segment

What happened?

A bug happened!

Steps to reproduce

An attempt to transcribe a 20-minute file.
Files that are a few minutes long are transcribed successfully!

What OS are you seeing the problem on?

Window

Relevant log output

"Error in desktop\\src-tauri\\src\\main.rs at line 60: failed to get segment\n\nCaused by:\n    Invalid UTF-8 detected in a string from Whisper. Index: 428, Length: 1."
options: ModelArgs { path: "C:\\Users\\user\\Downloads\\פודקאסטים\\כל מה שרצית לדעת על הכל 🐱_💻 סייברסייבר ע07פ10(2).mp3", model: "C:\\Users\\user\\AppData\\Local\\github.com.thewh1teagle.vibe\\ivrit-ai--whisper-large-v2-tuned-ggml-model_2.bin", lang: Some("he"), verbose: false, n_threads: Some(4), init_prompt: Some(""), temperature: Some(0.4) }

App Version: 0.0.7
Arch: x86_64
Platform: windows
Kernel Version: 10.0.22631
OS: windows
OS Version: 10.0.22631
Models: ggml-large.bin, ivrit-ai--whisper-large-v2-tuned-ggml-model_2.bin
Default Mode: "C:\\Users\\user\\AppData\\Local\\github.com.thewh1teagle.vibe\\ivrit-ai--whisper-large-v2-tuned-ggml-model_2.bin"
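
A possible explanation for the "Invalid UTF-8" error, offered here as an assumption rather than a confirmed diagnosis: Hebrew letters take two bytes in UTF-8, so if text arrives in pieces and a piece boundary falls inside a character, decoding each piece on its own fails. The sketch below shows the effect and one tolerant way to decode; the helper name is hypothetical and this is not Vibe's code.

```python
import codecs
from typing import Iterable


def decode_chunks(chunks: Iterable[bytes]) -> str:
    """Decode UTF-8 that arrives in arbitrary byte chunks.

    The incremental decoder buffers an incomplete trailing character until
    the next chunk arrives; bytes that are invalid even then are replaced
    with U+FFFD instead of raising an error.
    """
    decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")
    pieces = [decoder.decode(chunk) for chunk in chunks]
    pieces.append(decoder.decode(b"", final=True))  # flush any leftover bytes
    return "".join(pieces)


data = "שלום".encode("utf-8")  # 4 letters, 8 bytes
first, second = data[:3], data[3:]  # the split lands inside the second letter

try:
    first.decode("utf-8")  # strict decoding of a partial character fails
except UnicodeDecodeError as error:
    print("strict decode failed:", error.reason)

print(decode_chunks([first, second]))
```

Whether this matches what happens inside whisper.cpp for these recordings is not established by the logs; they only report the index of the offending byte.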

[Bug]: Build for osx x86 failed

What happened?

A bug happened!

Steps to reproduce

step one...
step two...

What OS are you seeing the problem on?

No response

Relevant log output

No response

Bug: Crash on loading model

What happened?

After I hit "Transcribe", the app crashes.

Since, at least from what I've noticed, there are no crash logs, here's what the cmd log shows:

log

C:\Windows\System32>[2024-05-22T01:57:23Z DEBUG vibe_desktop] Vibe App Running
[2024-05-22T01:57:24Z DEBUG vibe_desktop::setup] webview version: 125.0.2535.51
[2024-05-22T01:57:33Z DEBUG vibe::model] Transcribe called with {
      "path": "D:\\2023-11-30 11-25-58.mkv",
      "model_path": "C:\\Users\\AutumnLeaf\\AppData\\Local\\github.com.thewh1teagle.vibe\\ggml-medium.bin",
      "lang": "en",
      "verbose": false,
      "n_threads": 4,
      "init_prompt": "",
      "temperature": 0.4
    }
[2024-05-22T01:57:33Z DEBUG vibe::audio] input is D:\2023-11-30 11-25-58.mkv and output is C:\Users\AUTUMN~1\AppData\Local\Temp\.tmptp0lsV.wav
[2024-05-22T01:57:33Z DEBUG vibe::audio::encoder] decoder channel layout is 2
[2024-05-22T01:57:33Z DEBUG vibe::audio::encoder] +-----------+
    |    in     |default--[48000Hz fltp:stereo]--auto_aresample_0:default
    | (abuffer) |
    +-----------+

                                                       +---------------+
    Parsed_anull_0:default--[16000Hz s16:mono]--default|      out      |
                                                       | (abuffersink) |
                                                       +---------------+

                                                         +----------------+
    auto_aresample_0:default--[16000Hz s16:mono]--default| Parsed_anull_0 |default--[16000Hz s16:mono]--out:default
                                                         |    (anull)     |
                                                         +----------------+

                                              +------------------+
    in:default--[48000Hz fltp:stereo]--default| auto_aresample_0 |default--[16000Hz s16:mono]--Parsed_anull_0:default
                                              |   (aresample)    |
                                              +------------------+


[2024-05-22T01:57:33Z DEBUG vibe::audio] wav reader read from "C:\\Users\\AUTUMN~1\\AppData\\Local\\Temp\\.tmptp0lsV.wav"
[2024-05-22T01:57:33Z DEBUG vibe::audio] parsing C:\Users\AUTUMN~1\AppData\Local\Temp\.tmptp0lsV.wav
[2024-05-22T01:57:33Z DEBUG vibe::model] open model...

Steps to reproduce

Open Vibe
Attempt to Transcribe
Crash when trying to load model

Tried on Windows 10 first, and now on Windows 11. Also, my brother's pc (Windows 10) for some reason has no issue with Vibe.

Could it be because I've got an old cpu (i7-3770 vs my brother's Ryzen 5 2600) or motherboard or something like that that there's some conflict happening? In case it helps, I am able to run Stable Diffusion locally without issues (both Automatic 1111 and ComfyUI).

I should also note that the user shown in the Vibe logs is AUTUMN~1 instead of AutumnLeaf. Since I've seen a few past issues here caused by Hebrew user names, perhaps the tilde (which shouldn't even be there) could be a cause.

What OS are you seeing the problem on?

Window

logs

Relevant log output

App Version: 1.0.7
Commit Hash: 99ae746dc02135ad7a27ec0f9adafe016b8c96e4
Arch: x86_64
Platform: windows
Kernel Version: 10.0.22631
OS: windows
OS Version: 10.0.22631
Models: ggml-medium.bin
Default Model: "C:\\Users\\AutumnLeaf\\AppData\\Local\\github.com.thewh1teagle.vibe\\ggml-medium.bin"

Bug: failed to create whisper context

What happened?

A bug happened!

Steps to reproduce

I added a file
I clicked on "Transcribe"
In practice it threw an error right at the start

What OS are you seeing the problem on?

Window

Relevant log output

"Error in desktop\\src-tauri\\src\\main.rs at line 60: failed to open model\n\nCaused by:\n    Failed to create a new whisper context."
options: ModelArgs { path: "C:\\Users\\משתמש\\Music\\שירי יוסי גרין\\song_04.mp3", model: "C:\\Users\\משתמש\\AppData\\Local\\github.com.thewh1teagle.vibe\\ivrit-ai--whisper-large-v2-tuned-ggml-model.bin", lang: Some("auto"), verbose: false, n_threads: Some(4), init_prompt: Some(""), temperature: Some(0.4) }

App Version: 0.0.8
Arch: x86_64
Platform: windows
Kernel Version: 10.0.26100
OS: windows
OS Version: 10.0.26100
Models: ggml-medium.bin, ivrit-ai--whisper-large-v2-tuned-ggml-model.bin
Default Mode: "C:\\Users\\משתמש\\AppData\\Local\\github.com.thewh1teagle.vibe\\ivrit-ai--whisper-large-v2-tuned-ggml-model.bin"

[Bug]: GPU seems not working on M2 macbook pro

What happened?

The transcribing is extremely slow on my M2 Macbook Pro.

Steps to reproduce

I am using an M2 chip Macbook Pro. The transcribing is quite slow. When I check the GPU usage, it shows 0%.
Is this normal?

I did not find any setting related to GPU.

What OS are you seeing the problem on?

MacOS

Relevant log output

App Version: 1.0.6
Commit Hash: cb51db5bc8ae1d800b3f7af9faa780552808710b
Arch: aarch64
Platform: macos
Kernel Version: 14.3.0
OS: macos
OS Version: 14.3.0
Models: ggml-large-v2.bin
Default Model: Not Found

Bug: failed to get segment

What happened?

A bug happened!

Steps to reproduce

I transcribed a large file, at the end of the transcription, after the progress bar was full, I got this bug.

What OS are you seeing the problem on?

Window

Relevant log output

failed to get segment

App Version: 0.0.6
Arch: x86_64
Platform: win32
Kernel Version: 10.0.22631
OS: Windows_NT
OS Version: 10.0.22631
Models: ggml-large.bin, ggml-medium.bin, ivrit-ai--whisper-large-v2-tuned-ggml-model_2.bin
Default Mode: "C:\\Users\\user\\AppData\\Local\\github.com.thewh1teagle.vibe\\ivrit-ai--whisper-large-v2-tuned-ggml-model_2.bin"

Failed to get segment with ivrit model

What happened?

A bug happened!

Steps to reproduce

step one...
step two...

What OS are you seeing the problem on?

Window

Relevant log output

Error in desktop\src-tauri\src\main.rs at line 59: failed to get segment

Caused by:
    Invalid UTF-8 detected in a string from Whisper. Index: 726.

App Version: 0.0.7
Arch: x86_64
Platform: windows
Kernel Version: 10.0.22631
OS: windows
OS Version: 10.0.22631
Models: ivrit-ai--whisper-large-v2-tuned-ggml-model.bin
Default Mode: "C:\\Users\\1234\\AppData\\Local\\github.com.thewh1teagle.vibe\\ivrit-ai--whisper-large-v2-tuned-ggml-model.bin"

[Bug]: You can’t open the application “vibe” because this application is not supported on this Mac.

What happened?

You can’t open the application “vibe” because this application is not supported on this Mac.

Steps to reproduce

step one...
step two...

What OS are you seeing the problem on?

MacOS

Relevant log output

MAC 14.3 (23D56)

Bug: failed to get segment

What happened?

A bug happened!

Steps to reproduce

Transcription of a 40 minute file, ended with this bug
Continuation of this bug:
#35

What OS are you seeing the problem on?

No response

Relevant log output

"Error in desktop\\src-tauri\\src\\main.rs at line 60: failed to get segment\n\nCaused by:\n    Invalid UTF-8 detected in a string from Whisper. Index: 119, Length: 1."
options: ModelArgs { path: "C:\\Users\\user\\Downloads\\הקלטה בקו של הקהילה שלנו.wav", model: "C:\\Users\\user\\AppData\\Local\\github.com.thewh1teagle.vibe\\ivrit-ai--whisper-large-v2-tuned-ggml-model_2.bin", lang: Some("he"), verbose: false, n_threads: Some(4), init_prompt: Some(""), temperature: Some(0.4) }

App Version: 0.0.7
Arch: x86_64
Platform: windows
Kernel Version: 10.0.22631
OS: windows
OS Version: 10.0.22631
Models: ggml-large.bin, ivrit-ai--whisper-large-v2-tuned-ggml-model_2.bin
Default Mode: "C:\\Users\\user\\AppData\\Local\\github.com.thewh1teagle.vibe\\ivrit-ai--whisper-large-v2-tuned-ggml-model_2.bin"

Bug: failed to open model

What happened?

I tried to transcribe a lesson and immediately got an "unexpected error"; this is the error that came up:

options: {
"path": "C:\מהאונקי של זיידי\שואה נויגרשל\הרב מרדכי נויגרשל, חורבן יהדות אירופה - שיעור א - מבוא.mp3",
"model_path": "C:\Users\shechter 0533159559\AppData\Local\github.com.thewh1teagle.vibe\ggml-medium.bin",
"lang": "he",
"verbose": false,
"n_threads": 4,
"init_prompt": "",
"temperature": 0.4
}

Caused by:
0: failed to open model
1: Failed to create a new whisper context.

Location:
core\src\model.rs:49:6

Steps to reproduce

step one...
step two...

What OS are you seeing the problem on?

No response

Relevant log output

options: {
  "path": "C:\\מהאונקי של זיידי\\שואה נויגרשל\\הרב מרדכי נויגרשל, חורבן יהדות אירופה - שיעור א - מבוא.mp3",
  "model_path": "C:\\Users\\shechter 0533159559\\AppData\\Local\\github.com.thewh1teagle.vibe\\ggml-medium.bin",
  "lang": "he",
  "verbose": false,
  "n_threads": 4,
  "init_prompt": "",
  "temperature": 0.4
}

Caused by:
   0: failed to open model
   1: Failed to create a new whisper context.

Location:
    core\src\model.rs:49:6
App Version: 1.0.7
Commit Hash: 341bee8566855858aa9f3d1a8c860c3b281d855f
Arch: x86_64
Platform: windows
Kernel Version: 10.0.19045
OS: windows
OS Version: 10.0.19045
Models: ggml-medium.bin, ivrit-ai--whisper-large-v2-tuned-ggml-model.bin
Default Model: Not Found

[Bug]: Windows Defender detects Trojan:Script/Wacatac.B!ml

What happened?

In https://github.com/thewh1teagle/vibe/releases/download/v1.0.6/vibe_1.0.6_x64-setup.exe, downloaded from https://thewh1teagle.github.io/vibe/, Windows Defender detects Trojan:Script/Wacatac.B!ml. I'm not sure if this is a false positive, but maybe should be added to ReadMe.

Steps to reproduce

Downloaded file, executed file.

What OS are you seeing the problem on?

Window

Relevant log output

No response

Transcribe apps audio

Goal

Transcribe system audio / microphone (single or both) and preview it in realtime

Research

Possible to follow approaches in
https://github.com/CapSoftware/Cap

Useful Rust Crate
https://github.com/helmerapp/scap

Perhaps on:
macOS: https://github.com/svtlabs/screencapturekit-rs (screen capture kit)
Graphics.capture on Windows (https://github.com/NiiightmareXD/windows-capture)

macOS app which provides a way to capture system audio using ScreenCaptureKit API
https://github.com/Mnpn/Azayaka

Microsoft answer on how Audacity manages to record audio from speakers (TLDR: Windows WASAPI)
https://answers.microsoft.com/en-us/windows/forum/all/how-record-speaker-output-windows-10/251bb695-5170-4a35-a90f-42d9f6f3345a

MacOS sample
https://gist.github.com/thewh1teagle/d02415b9768fd816a780f9af6a3f2bdb

Some platforms provide virtual channels for monitoring (PulseAudio and PipeWire on Linux, WASAPI on Windows, Core Audio on macOS), though not all, and cpal does not expose them
(not sure on Core Audio actually, they might have disabled it or removed it for security reasons)

Loopback was added to cpal in
RustAudio/cpal#478 (working on Windows)

Additional questions:
How to get system audio and microphone at the same time into a single stream?
What about Linux?

TLDR

The Rust crate cpal provides a way to get an audio stream from the microphone(s).
On Windows it also provides an audio stream from the default output device (system audio).
On macOS we should use screencapturekit-rs and provide a stream that is equivalent to a cpal stream.

If two streams are used, mix them by adding both (simple addition of the sample values works).
Push them to whisper in a loop.
Mixing can introduce synchronization issues (if the streams come from two different sound cards, etc.). RtAudio handles that better and can be used through rtaudio-rs.
whisper.cpp expects single-channel (mono) audio at a 16 kHz sample rate with 16-bit samples.
Resampling is probably needed, and stereo is converted to mono by taking the mean of both channels.
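
The steps in this TLDR (mix by addition, average stereo down to mono, resample to 16 kHz, convert to 16-bit) can be sketched in plain Rust as below. This is only an illustration under assumptions: all function names are hypothetical, samples are assumed to be f32 in the range [-1.0, 1.0], stereo input is assumed to be interleaved, and the resampler is a naive linear interpolation rather than whatever a production build would use.

```rust
/// Mix two mono streams by adding samples; the sum is clamped to [-1.0, 1.0]
/// so that loud overlapping input does not overflow later conversions.
fn mix(a: &[f32], b: &[f32]) -> Vec<f32> {
    let len = a.len().max(b.len());
    (0..len)
        .map(|i| {
            // A stream that ended early contributes silence.
            let x = a.get(i).copied().unwrap_or(0.0);
            let y = b.get(i).copied().unwrap_or(0.0);
            (x + y).clamp(-1.0, 1.0)
        })
        .collect()
}

/// Convert interleaved stereo (L, R, L, R, ...) to mono by averaging each pair.
fn stereo_to_mono(interleaved: &[f32]) -> Vec<f32> {
    interleaved
        .chunks_exact(2)
        .map(|lr| (lr[0] + lr[1]) / 2.0)
        .collect()
}

/// Naive linear-interpolation resampler (assumption: good enough for a sketch,
/// no anti-aliasing filter is applied).
fn resample_linear(input: &[f32], from_hz: u32, to_hz: u32) -> Vec<f32> {
    if input.is_empty() || from_hz == to_hz {
        return input.to_vec();
    }
    let out_len = (input.len() as u64 * to_hz as u64 / from_hz as u64) as usize;
    let step = from_hz as f64 / to_hz as f64;
    let last = input.len() - 1;
    (0..out_len)
        .map(|i| {
            let pos = i as f64 * step;
            let idx = pos.floor() as usize;
            let frac = (pos - idx as f64) as f32;
            let a = input[idx.min(last)];
            let b = input[(idx + 1).min(last)];
            a + (b - a) * frac
        })
        .collect()
}

/// Convert f32 samples to signed 16-bit PCM.
fn to_i16(samples: &[f32]) -> Vec<i16> {
    samples
        .iter()
        .map(|s| (s.clamp(-1.0, 1.0) * i16::MAX as f32) as i16)
        .collect()
}
```

A chunk captured as 48 kHz stereo would go through `stereo_to_mono`, then `mix` with the other stream, then `resample_linear(&mixed, 48_000, 16_000)`, and finally `to_i16` if a 16-bit file is being written.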

Simple approach

Record from speakers/mic concurrently and write to a file every 5-10 seconds, cutting at the best silent position.
Write to a queue of paths (each item will be one or two paths).
Another task iterates over the queue, merges if needed, and transcribes each item.

https://github.com/ggerganov/whisper.cpp/tree/master/examples/stream#sliding-window-mode-with-vad
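
The queue part of this simple approach could look roughly like the sketch below, which uses only the Rust standard library. Everything here is hypothetical: the `Chunk` type, `process`, and `run_pipeline` are illustration names, and `process` only returns a description of what a real worker would merge and transcribe.

```rust
use std::path::PathBuf;
use std::sync::mpsc;
use std::thread;

/// One queue item: a recorded chunk from a single source, or from both.
#[derive(Debug)]
enum Chunk {
    Single(PathBuf),
    Pair { speakers: PathBuf, mic: PathBuf },
}

/// Hypothetical stand-in for "merge if needed, then transcribe".
fn process(chunk: &Chunk) -> String {
    match chunk {
        Chunk::Single(p) => format!("transcribe {}", p.display()),
        Chunk::Pair { speakers, mic } => format!(
            "merge {} + {} then transcribe",
            speakers.display(),
            mic.display()
        ),
    }
}

/// Spawn the consumer task, feed it chunks in order, and collect its results.
fn run_pipeline(chunks: Vec<Chunk>) -> Vec<String> {
    let (tx, rx) = mpsc::channel::<Chunk>();
    // Consumer: iterates the queue until every sender has been dropped.
    let worker = thread::spawn(move || rx.iter().map(|c| process(&c)).collect::<Vec<_>>());
    // Producer: in the real app this would be the recording loop.
    for chunk in chunks {
        tx.send(chunk).expect("worker hung up");
    }
    drop(tx); // closing the channel ends the worker's loop
    worker.join().expect("worker panicked")
}
```

A channel keeps the recorder from blocking on transcription: chunks queue up in order while the worker catches up.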

[Feature Request]: Add a short explanation of the features in the "Advanced" options

Describe the feature

Since the software aims to be as simple as possible and suitable for all end users, I recommend adding concise help text that appears when hovering over an option in the "Advanced" tab.
At the moment it is not clear enough what the "prompt" is used for, what the recommended number of "Threads" is, and so on.

[Bug]: Does not open on Sonoma (Apple Silicon)

What happened?

When I double-click the app, macOS says the file is damaged.
In the terminal I ran xattr -cr /path and then tried to open the app again, but nothing happens.

Steps to reproduce

Just try to open.

What OS are you seeing the problem on?

MacOS

Relevant log output

No response

[Bug]: file exported is always txt file

What happened?

When exporting a transcription, the file is always saved with a hardcoded .txt extension:
https://github.com/thewh1teagle/vibe/blob/main/desktop/src/components/TextArea.tsx#L16

Steps to reproduce

Export a transcription to a file.

What OS are you seeing the problem on?

No response

Relevant log output

No response

Translation problem

What happened?

In Hebrew, the button that cancels a transcription is labeled "Stop" rather than "Cancel".

Steps to reproduce

It should be corrected to: "ביטול" ("Cancel")

What OS are you seeing the problem on?

Windows

Relevant log output

No response

Running the software on an Intel GPU

Describe the feature

I have an Intel processor with an integrated GPU, Intel(R) UHD Graphics 620. It has 8 GB of RAM that is not used at all. Is there a way for the software to 1) utilize this memory and 2) run on this GPU?
