thewh1teagle / vibe

Transcribe on your own!

Home Page: https://thewh1teagle.github.io/vibe/

License: MIT License

Rust 36.20% HTML 0.40% CSS 0.19% TypeScript 43.85% JavaScript 10.27% Svelte 7.11% PowerShell 1.61% Python 0.37%
ai cross-platform desktop openai rust transcribe whisper

vibe's Introduction

Vibe - Transcribe on your own!

⌨️ Transcribe audio / video offline using OpenAI Whisper

🔗 Download Vibe | Give it a Star ⭐ | Support the project 🤝


Screenshots

Features 🌟

  • 🌍 Transcribe almost every language
  • 🔒 Ultimate privacy: fully offline transcription, no data ever leaves your device
  • 🎨 User friendly design
  • 🎙️ Transcribe audio / video
  • 📂 Batch transcribe multiple files!
  • 📝 Support SRT, VTT, TXT, HTML, PDF, JSON formats
  • 👀 Realtime preview
  • 🌐 Translate to English from any language
  • 🖨️ Print transcript directly to any printer
  • 🔄 Automatic updates
  • 🖥️ Optimized for CPU on (Windows / Linux)
  • 💻 Optimized for GPU (macOS, Windows)
  • 🎮 Optimized for Nvidia GPUs! (see INSTALL.md#nvidia)
  • 🎮 Optimized for AMD GPUs (linux only)! (see INSTALL.md#amd)
  • 🔧 Total Freedom: Customize Models Easily via Settings
  • ⚙️ Model arguments for advanced users
  • ⏳ Transcribe system audio
  • 🎤 Transcribe from microphone
  • 🖥️ CLI support: Use Vibe directly from the command line interface! (see --help)
  • 👥 Speaker diarization (Beta)
  • 📱 iOS & Android support (coming soon)
  • 📥 Integrate custom models from your own site: Use vibe://download/?url=<model url>
  • 📹 Choose caption length optimized for videos / reels
  • ⚡ HTTP API with Swagger docs! (use --server and open http://<host>:3022/docs for docs)
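
The custom-model link format in the list above can be generated programmatically. A minimal sketch, assuming the model URL should be percent-encoded before it goes into the `url` query parameter (the README does not say whether Vibe requires this); `percent_encode` and `model_download_link` are hypothetical helper names, not part of Vibe:

```rust
/// Percent-encode a string for use as a URL query value.
/// RFC 3986 unreserved characters are kept; every other byte becomes %XX.
fn percent_encode(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for byte in input.bytes() {
        match byte {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'.' | b'_' | b'~' => {
                out.push(byte as char)
            }
            _ => out.push_str(&format!("%{:02X}", byte)),
        }
    }
    out
}

/// Build a `vibe://download/?url=...` link for a model hosted at `model_url`.
/// Assumption: the app decodes the query value; this is not confirmed by the README.
fn model_download_link(model_url: &str) -> String {
    format!("vibe://download/?url={}", percent_encode(model_url))
}
```

For example, `model_download_link("https://example.com/ggml-medium.bin")` yields `vibe://download/?url=https%3A%2F%2Fexample.com%2Fggml-medium.bin`; the example.com address is a placeholder.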

Supported platforms 🖥️

MacOS Windows Linux

Install notes

See Install.md

Contribute 🤝

PRs are welcome! In addition, you're welcome to add translations.

We would like to express our sincere gratitude to all the contributors.

Community

Discord

Roadmap 🛣️

You can see the roadmap in Vibe-Roadmap

Add translation 🌐

  1. Copy the en folder in desktop/src-tauri/locales to a new directory named after the target language, e.g. pt-BR (use a BCP 47 language code)
  2. Translate every value in the copied files into the new language and keep the keys as they are
  3. Create a PR or an issue on GitHub
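
Step 1 above can be scripted. A minimal sketch using only the Rust standard library; `copy_dir` and `scaffold_locale` are hypothetical helper names, and the code only assumes the `desktop/src-tauri/locales/en` layout named in the steps:

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Recursively copy `src` into `dst`, creating `dst` if needed.
fn copy_dir(src: &Path, dst: &Path) -> io::Result<()> {
    fs::create_dir_all(dst)?;
    for entry in fs::read_dir(src)? {
        let entry = entry?;
        let target = dst.join(entry.file_name());
        if entry.file_type()?.is_dir() {
            copy_dir(&entry.path(), &target)?;
        } else {
            fs::copy(entry.path(), &target)?;
        }
    }
    Ok(())
}

/// Seed a new locale from the English files.
/// `locales_dir` is expected to point at `desktop/src-tauri/locales`;
/// `lang_code` is a BCP 47 code such as "pt-BR".
fn scaffold_locale(locales_dir: &Path, lang_code: &str) -> io::Result<()> {
    let dst = locales_dir.join(lang_code);
    if dst.exists() {
        // Refuse to overwrite an existing translation.
        return Err(io::Error::new(
            io::ErrorKind::AlreadyExists,
            "locale already exists",
        ));
    }
    copy_dir(&locales_dir.join("en"), &dst)
}
```

Calling `scaffold_locale(Path::new("desktop/src-tauri/locales"), "pt-BR")` from the repository root would leave a `pt-BR` copy whose values can then be translated by hand.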

In addition, you can translate the Vibe website by creating new files in landing/static/locales.

Build 🛠️

See BUILDING.md

I want to know more!

Medium post

Issue report

You can open a new issue; it's recommended to check DEBUG.md first.

Credits

Thanks to tauri.app for making the best app framework I've ever seen

Thanks to wang-bin/avbuild for the prebuilt ffmpeg

Thanks to github.com/whisper.cpp for an outstanding interface to the AI model.

Thanks to openai.com for their amazing Whisper model

Thanks to github.com for their support of open source projects, providing infrastructure completely free.

And to all the amazing open source frameworks and libraries which this project uses...

vibe's People

Contributors

thewh1teagle, ifan24, oleole39, newfla, josemoura212, 2bbe, chrisns, giteshubisz, eltociear, lovishchhabra, oferze

vibe's Issues

[Bug]: Settings not available on Win11

What happened?

A bug happened!
Can't access settings on Win11.
Works on Win10.

Steps to reproduce

see attached clip

Inspelning.2024-04-19.152301.mp4

Windows version:
Edition Windows 11 Pro
Version 23H2
Installed on 2024-03-03
OS build 22631.3447
Experience Windows Feature Experience Pack 1000.22688.1000.0

What OS are you seeing the problem on?

Window

Relevant log output

No response

[Bug]: error and antivirus

Thank you very much for the software.

  1. When I try to run it (after choosing a language and a file), it shows this:
    image

  2. AVAST antivirus blocks the installation of the file.

failed to get segment

What happened?

After transcribing a 50-minute recording, the software reported a "failed to get segment" error.

Steps to reproduce

  1. step one...
  2. step two...

What OS are you seeing the problem on?

Windows

Relevant log output

failed to get segment

App Version: 0.0.6
Arch: x86_64
Platform: win32
Kernel Version: 10.0.22631
OS: Windows_NT
OS Version: 10.0.22631
Models: ggml-medium.bin, ivrit-ai--whisper-large-v2-tuned-ggml-model.bin
Default Mode: "C:\\Users\\1234\\AppData\\Local\\github.com.thewh1teagle.vibe\\ggml-medium.bin"

Wider UI in main window

Describe the feature

The software occupies only part of the window allotted to it. Here is a screenshot:
image
The transcript also occupies only part of the window, which of course results in more lines. Can this be fixed?
Thank you very much for this amazing software!
I've passed it on to friends!

spelling and sync problems

What happened?

The software makes lots of spelling mistakes in Hebrew and occasionally outputs SRT files that are not in sync.

Steps to reproduce

  1. step one...
  2. step two...

What OS are you seeing the problem on?

No response

Relevant log output

No response

[Feature Request]: Title: Add size information and estimated download time for OpenAI model.

Proposing the addition of two helpful details to the installation process of Vibe:

  1. View model size: Before starting the Vibe installation, view the size of the OpenAI model and additional packages that will be required to download.

  2. Download Time Estimate: During download, provide an estimate of remaining download time.

This helps users choose when to install and feel more comfortable. Thank you for considering this feature request.
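
The two requested details come down to simple arithmetic. A minimal sketch, not Vibe's implementation: `estimate_remaining` and `format_size` are hypothetical helpers, and the estimate assumes a roughly constant average download speed:

```rust
use std::time::Duration;

/// Estimate the remaining download time from the progress so far.
/// Returns `None` when the total size is unknown (for example, the server
/// sent no content length) or when no data has arrived yet.
fn estimate_remaining(downloaded: u64, total: Option<u64>, elapsed: Duration) -> Option<Duration> {
    let total = total?;
    if downloaded == 0 || elapsed.is_zero() {
        return None;
    }
    let remaining = total.saturating_sub(downloaded);
    // Average speed so far, in bytes per second.
    let bytes_per_sec = downloaded as f64 / elapsed.as_secs_f64();
    Some(Duration::from_secs_f64(remaining as f64 / bytes_per_sec))
}

/// Human-readable size, e.g. for showing a model's size before the download starts.
fn format_size(bytes: u64) -> String {
    const MIB: f64 = 1024.0 * 1024.0;
    const GIB: f64 = MIB * 1024.0;
    let b = bytes as f64;
    if b >= GIB {
        format!("{:.2} GiB", b / GIB)
    } else {
        format!("{:.1} MiB", b / MIB)
    }
}
```

With 250 of 1000 bytes received after 10 seconds, the average speed is 25 bytes per second, so the estimate for the remaining 750 bytes is 30 seconds.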

Bug: display language

What happened?

On first launch the application's language (not the transcription language) was Hebrew; on subsequent launches it changed to English.

Steps to reproduce

It should stay in Hebrew, or there should be an option to choose the application's language.

What OS are you seeing the problem on?

Windows 11

Relevant log output

App Version: 0.0.6
Arch: x86_64
Platform: win32
Kernel Version: 10.0.22631
OS: Windows_NT
OS Version: 10.0.22631
Models: ggml-medium.bin, ivrit-ai--whisper-large-v2-tuned-ggml-model.bin
Default Mode: "C:\\Users\\1234\\AppData\\Local\\github.com.thewh1teagle.vibe\\ivrit-ai--whisper-large-v2-tuned-ggml-model.bin"

[Bug]: Blue screen when transcribing a file

What happened?

I ran vibe.exe with an .opus file probably about 1hr in length, with large3 model, the PC randomly restarted. Then PC wouldn't boot until I unplugged/replugged all the monitor and USB cables, and it repaired some windows update.
This may be an issue on my end. It could be faulty hardware. This is the first time my PC has done this, I haven't had this problem with WhisperGUI 0.1, stable diffusion, games or encoding.

I successfully recreated the issue by trying to transcribe another file.
I posted the results of the WinDBG from the MEMORY.DMP file.

Ryzen 5700G, ASUS B450 Tomahawk MAX, 2x16GB G.SKILL 3200mhz, ASUS ROG STRIX RTX 3060 OC (12GB), Seasonic Focus+ Gold 850W

VIDEO_TDR_FAILURE (116)
Attempt to reset the display driver and recover from timeout failed.

IMAGE_NAME: nvlddmkm.sys

Steps to reproduce

transcribe 1hr long audio file, GPU freezes

What OS are you seeing the problem on?

Window

Relevant log output

VIDEO_TDR_FAILURE (116)
Attempt to reset the display driver and recover from timeout failed.

IMAGE_NAME:  nvlddmkm.sys

[Bug]: Wrong transcriptions times in Hebrew

What happened?

When using SRT / VTT with the language set to Hebrew, the transcription times (for each sentence) are wrong.
This may be a bug in whisper.cpp.

Steps to reproduce

What OS are you seeing the problem on?

No response

Relevant log output

No response

[Bug]: failed to transcode mp3 file

What happened?

A bug happened!

Steps to reproduce

  1. step one...
  2. step two...

What OS are you seeing the problem on?

No response

Relevant log output

options: {
  "path": "/Users/slim/Desktop/13940-21.05.2024-ITEMA_23748018-2024F22805S0142-22.mp3",
  "model_path": "/Users/slim/Library/Application Support/github.com.thewh1teagle.vibe/ggml-medium.bin",
  "lang": "fr",
  "verbose": false,
  "n_threads": 4,
  "init_prompt": "",
  "temperature": 0.4
}

Caused by:
    Invalid data found when processing input

Location:
    core/src/audio/encoder.rs:175:9
App Version: 1.0.7
Commit Hash: 99ae746dc02135ad7a27ec0f9adafe016b8c96e4
Arch: aarch64
Platform: macos
Kernel Version: 14.0.0
OS: macos
OS Version: 14.0.0
Models: ggml-medium.bin
Default Model: ggml-medium.bin"

[Bug]: invalid discord link

What happened?

invalid discord link

Steps to reproduce

  1. step one...
  2. step two...

What OS are you seeing the problem on?

No response

Relevant log output

No response

Bug: failed to open model

What happened?

A bug happened!

Steps to reproduce

  1. step one...
  2. step two...

What OS are you seeing the problem on?

No response

Relevant log output

failed to open model

App Version: 0.0.6
Arch: x86_64
Platform: win32
Kernel Version: 10.0.19045
OS: Windows_NT
OS Version: 10.0.19045
Models: ggml-medium.bin
Default Mode: "C:\\Users\\הלל פישרמן\\AppData\\Local\\github.com.thewh1teagle.vibe\\ggml-medium.bin"

[Bug]: vibe: error while loading shared libraries: libavutil.so.56: cannot open shared object file: No such file or directory

What happened?

Vibe 1.0.7 crashes when starting it on Ubuntu 23.10.

Steps to reproduce

  1. Run Ubuntu 23.10
  2. Install Vibe deb package: vibe_1.0.7_amd64.deb
  3. Run it
  4. Bam: vibe: error while loading shared libraries: libavutil.so.56: cannot open shared object file: No such file or directory

What OS are you seeing the problem on?

Linux

Relevant log output

vibe: error while loading shared libraries: libavutil.so.56: cannot open shared object file: No such file or directory

Problem seems to be caused by wrong library versions expected by vibe:

Lib               Expected version   Installed version
libavcodec.so     58                 60
libswresample.so  3                  4
libswscale.so     5                  7
libavdevice.so    58                 60
libavfilter.so    7                  9
libavformat.so    58                 60
libavutil.so      56                 58

Bug: "Vibe has stopped working"

What happened?

After selecting an .mp3 file, when I click Transcribe, I immediately get the "Vibe has stopped working" window and have to close the program.
stopped working

Steps to reproduce

  1. Clicked "Select Audio File"
  2. Selected an .mp3 file
  3. Clicked "Transcribe"

What OS are you seeing the problem on?

Window

Relevant log output

App Version: 1.0.5
Commit Hash: 3b4d74fa6b8f3171078df97f14add4a7463e7624
Arch: x86_64
Platform: windows
Kernel Version: 10.0.19045
OS: windows
OS Version: 10.0.19045
Models: ggml-medium.bin
Default Model: "C:\\Users\\Tom\\AppData\\Local\\github.com.thewh1teagle.vibe\\ggml-medium.bin"

Bug: download model failed

What happened?

A bug happened!

Steps to reproduce

  1. step one...
  2. step two...

What OS are you seeing the problem on?

No response

Relevant log output

Failed to get content length from 'https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-medium.bin?download=true'
App Version: 1.0.1
Commit Hash: e27ce1b4317952a4856471a4d9349ca77aeee686
Arch: x86_64
Platform: windows
Kernel Version: 10.0.22631
OS: windows
OS Version: 10.0.22631
Models: 
Default Model: Not Found

[Feature Request]: Speaker labels (Diarization)

Goal

Provide speaker labels along with the transcriptions (e.g. Speaker1: ..., Speaker2: ...)
Do it at the same time as transcribing, in an efficient and lightweight way.

Research

https://github.com/wq2012/awesome-diarization

Possible ways:
Use c/c++ diarization libs in Rust using bindgen
Replicate pyannote-audio to Rust with tch-rs

Use ONNX runtime with ort

pykeio/ort#208

pyannote/pyannote-audio#1322

Best combination:
pyannote-segmentation-30
WespeakerVoxcelebResnet34LM
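
Whichever diarization backend is chosen, its speaker turns still have to be merged with the transcript segments to produce the `Speaker1: ...` output described in the goal. A minimal sketch of one way to do that, by assigning each segment to the speaker turn it overlaps most; `Span`, `overlap_ms` and `label_segments` are hypothetical names, and no diarization model is involved here:

```rust
/// A time span in milliseconds with an attached payload
/// (transcript text or a numeric speaker id).
#[derive(Debug, Clone, PartialEq)]
struct Span<T> {
    start_ms: u64,
    end_ms: u64,
    value: T,
}

/// Length of the overlap between two spans, in milliseconds (0 if disjoint).
fn overlap_ms<A, B>(a: &Span<A>, b: &Span<B>) -> u64 {
    let start = a.start_ms.max(b.start_ms);
    let end = a.end_ms.min(b.end_ms);
    end.saturating_sub(start)
}

/// Prefix each transcript segment with the speaker whose turn overlaps it most.
/// Segments that overlap no turn are labelled "Unknown".
fn label_segments(segments: &[Span<String>], turns: &[Span<u32>]) -> Vec<String> {
    segments
        .iter()
        .map(|seg| {
            let best = turns
                .iter()
                .map(|t| (overlap_ms(seg, t), t.value))
                .filter(|(ov, _)| *ov > 0)
                .max_by_key(|(ov, _)| *ov);
            match best {
                Some((_, speaker)) => format!("Speaker{}: {}", speaker, seg.value),
                None => format!("Unknown: {}", seg.value),
            }
        })
        .collect()
}
```

Maximum overlap is a deliberately simple rule; a segment that spans a speaker change is attributed to whoever spoke longer within it.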

Bug[ivrit]: failed to get segment

What happened?

A bug happened!

Steps to reproduce

  1. step one...
  2. step two...

What OS are you seeing the problem on?

No response

Relevant log output

failed to get segment

App Version: 0.0.6
Arch: x86_64
Platform: win32
Kernel Version: 10.0.19041
OS: Windows_NT
OS Version: 10.0.19041
Models: ivrit-ai--whisper-large-v2-tuned-ggml-model.bin
Default Mode: "C:\\Users\\shayh\\AppData\\Local\\github.com.thewh1teagle.vibe\\ivrit-ai--whisper-large-v2-tuned-ggml-model.bin"

Bug: Invalid data found when processing input

What happened?

A bug happened!

Steps to reproduce

  1. step one...
  2. step two...

What OS are you seeing the problem on?

Window

Relevant log output

"Error in desktop\\src-tauri\\src\\cmd.rs at line 71: Invalid data found when processing input"
options: ModelArgs { path: "C:\\Users\\1234\\Desktop\\short.wav", model: "C:\\Users\\1234\\AppData\\Local\\github.com.thewh1teagle.vibe\\ivrit-ai--whisper-large-v2-tuned-ggml-model.bin", lang: Some("he"), verbose: false, n_threads: Some(4), init_prompt: Some(""), temperature: Some(0.4) }

App Version: 0.0.9
Arch: x86_64
Platform: windows
Kernel Version: 10.0.22631
OS: windows
OS Version: 10.0.22631
Models: ivrit-ai--whisper-large-v2-tuned-ggml-model.bin
Default Mode: "C:\\Users\\1234\\AppData\\Local\\github.com.thewh1teagle.vibe\\ivrit-ai--whisper-large-v2-tuned-ggml-model.bin"

Bug: Incorrect "NaN" end timestamps

What happened?

Thanks for the update! But the new version 1.0.6 has this problem: The model is generating incorrect end timestamps in the output. The end timestamps are appearing as "NaN:NaN:NaN,NaN" instead of the expected time format (e.g., "0:00:10,500").
I have tried using the ggml-medium.bin and ggml-large-v3.bin models and tried a few different videos, all resulting in the same invalid end timestamps.

Here is an example:

1
0:00:00,000 --> NaN:NaN:NaN,NaN
Some text here

2
0:00:04,160 --> NaN:NaN:NaN,NaN
Some text here
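
For reference, the timestamp arithmetic involved is small. A minimal sketch of SRT cue formatting, not the code Vibe uses: `format_srt_timestamp` and `format_srt_cue` are hypothetical names, and falling back to the start time when the end time is missing is an assumption made only so the output stays numeric. SRT conventionally writes two-digit hours (`00:00:10,500`), whereas the report above shows one digit.

```rust
/// Format a millisecond offset as an SRT timestamp, `HH:MM:SS,mmm`.
fn format_srt_timestamp(total_ms: u64) -> String {
    let ms = total_ms % 1000;
    let total_s = total_ms / 1000;
    let s = total_s % 60;
    let m = (total_s / 60) % 60;
    let h = total_s / 3600;
    format!("{:02}:{:02}:{:02},{:03}", h, m, s, ms)
}

/// Render one SRT cue. A missing end time falls back to the start time,
/// so the cue never contains a non-numeric timestamp.
fn format_srt_cue(index: usize, start_ms: u64, end_ms: Option<u64>, text: &str) -> String {
    let end_ms = end_ms.unwrap_or(start_ms).max(start_ms);
    format!(
        "{}\n{} --> {}\n{}\n",
        index,
        format_srt_timestamp(start_ms),
        format_srt_timestamp(end_ms),
        text
    )
}
```

Taking the end time as an `Option` makes the missing-value case explicit at the call site instead of letting it surface as `NaN` in the rendered text.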

Steps to reproduce

  1. Transcribe any video
  2. Change the display option from Text to the SRT
  3. The end timestamp for each line is shown as "NaN:NaN:NaN,NaN"

What OS are you seeing the problem on?

Window

Relevant log output

App Version: 1.0.6
Commit Hash: cb51db5bc8ae1d800b3f7af9faa780552808710b
Arch: x86_64
Platform: windows
Kernel Version: 10.0.22631
OS: windows
OS Version: 10.0.22631
Models: ggml-large-v3.bin, ggml-medium.bin
Default Model: "ggml-medium.bin"

Bug: the vibe application is crashing

What happened?

A bug happened! I added steps to reproduce.
Please help, I need to transcribe a large file.

Steps to reproduce

  1. install the app
  2. download the new hebrew model
  3. move it to the model folder
  4. open any size of audio file from the app and click transcribe
    Result:
    the app just crashes and i get the following error message:
    Problem Report for vibe
    vibe quit unexpectedly.
    Click Reopen to open the application again. This report will be sent
    automatically to Apple.

Comments
Show Details OK Reopen

What OS are you seeing the problem on?

No response

Relevant log output

App Version: 0.0.6
Arch: x86_64
Platform: darwin
Kernel Version: 13.1.0
OS: Darwin
OS Version: 13.1.0
Models: ivrit-ai--whisper-large-v2-tuned-ggml-model.bin, ggml-medium.bin
Default Mode: ggml-medium.bin"

Bug: failed to create whisper context

What happened?

A bug happened!

Steps to reproduce

  1. step one...
  2. step two...

What OS are you seeing the problem on?

No response

Relevant log output

"Error in desktop\\src-tauri\\src\\main.rs at line 60: failed to open model\n\nCaused by:\n    Failed to create a new whisper context."
options: ModelArgs { path: "C:\\Users\\מנהל ומבקר\\Downloads\\AA.mp3", model: "C:\\Users\\מנהל ומבקר\\AppData\\Local\\github.com.thewh1teagle.vibe\\ivrit-ai--whisper-large-v2-tuned-ggml-model.bin", lang: Some("he"), verbose: false, n_threads: Some(4), init_prompt: Some(""), temperature: Some(0.4) }

App Version: 0.0.7
Arch: x86_64
Platform: windows
Kernel Version: 10.0.19045
OS: windows
OS Version: 10.0.19045
Models: ggml-medium.bin, ivrit-ai--whisper-large-v2-tuned-ggml-model.bin
Default Mode: "C:\\Users\\מנהל ומבקר\\AppData\\Local\\github.com.thewh1teagle.vibe\\ivrit-ai--whisper-large-v2-tuned-ggml-model.bin"

Bug: Vibe does not detect ggml-large-v3.bin

What happened?

I replaced the medium model with the large model to save space. I selected the large model under model customisation in Settings, but Vibe still defaults to loading the medium model.

Steps to reproduce

  1. Download the Large Model
  2. Copy and Paste the Large Model
  3. Remove the Medium Model

What OS are you seeing the problem on?

No response

Relevant log output

options: ModelArgs { path: "/Users/**Users**/Downloads/Video.mp4", model: "/Users/**Users**/Library/Application Support/github.com.thewh1teagle.vibe/ggml-medium.bin", lang: Some("en"), verbose: false, n_threads: Some(4), init_prompt: Some(""), temperature: Some(0.4) }
App Version: 1.0.2
Commit Hash: 6ba4a50a11e1b425575f2cbecfc505920b737a77
Arch: aarch64
Platform: macos
Kernel Version: 14.4.1
OS: macos
OS Version: 14.4.1
Models: ggml-large-v3.bin
Default Model: ggml-medium.bin"

[Bug]: Crash at launch on linux

What happened?

Instant crash on linux when launching.
This is what I read from the console:

vibe: error while loading shared libraries: libopenblas.so.0: cannot open shared object file: No such file or directory

This requirement should be added to the README:

sudo apt-get install libopenblas-dev

Steps to reproduce

Launch vibe on ubuntu 22

What OS are you seeing the problem on?

Linux

Relevant log output

No response

Bug: failed to open model

What happened?

A bug happened!

Steps to reproduce

  1. step one...
  2. step two...

What OS are you seeing the problem on?

No response

Relevant log output

failed to open model

App Version: 0.0.6
Arch: x86_64
Platform: win32
Kernel Version: 10.0.19045
OS: Windows_NT
OS Version: 10.0.19045
Models: ggml-medium.bin, ivrit-ai--whisper-large-v2-tuned-ggml-model.bin
Default Mode: "C:\\Users\\הלל פישרמן\\AppData\\Local\\github.com.thewh1teagle.vibe\\ggml-medium.bin"

Transcribe several files one by one

Describe the feature

I would appreciate an option to select several files and transcribe them. Currently another file can be selected only after the previous one has finished running.
I would also like the export to a file to be automatic, using the name of the transcribed audio file.

[Bug]: Reliable format across TS and Rust

What happened?

Every platform has its own VSCode with its own settings, so every few days the format of some of the files completely changes somehow (tabs/spaces, etc.). Rust makes it easy using rustfmt, but TypeScript/JSON, etc., don't.

Steps to reproduce

_

What OS are you seeing the problem on?

No response

Relevant log output

_

Bug: hangs in the end of transcription

What happened?

When attempting to transcribe in version 9 (in version 8 it transcribed fine), the program closes after a few seconds:
image

Steps to reproduce

  1. step one...
  2. step two...

What OS are you seeing the problem on?

Window

Relevant log output

App Version: 0.0.9
Arch: x86_64
Platform: windows
Kernel Version: 10.0.22631
OS: windows
OS Version: 10.0.22631
Models: ivrit-ai--whisper-large-v2-tuned-ggml-model.bin
Default Mode: "C:\\Users\\1234\\AppData\\Local\\github.com.thewh1teagle.vibe\\ivrit-ai--whisper-large-v2-tuned-ggml-model.bin"

Bug: Vibe - failed to get segment

What happened?

A bug happened!
failed to get segment

Steps to reproduce

  1. step one...
  2. step two...

What OS are you seeing the problem on?

Window

Relevant log output

failed to get segment

App Version: 0.0.4
Arch: x86_64
Platform: win32
Kernel Version: 10.0.19045
OS: Windows_NT
OS Version: 10.0.19045
Models: ggml-medium.bin
Default Mode: "C:\\Users\\hatzerh\\AppData\\Local\\github.com.thewh1teagle.vibe\\ggml-medium.bin"

Bug: failed to get segment

What happened?

A bug happened!

Steps to reproduce

An attempt to transcribe a 20-minute file.
Files that are only a few minutes long are transcribed successfully!

What OS are you seeing the problem on?

Window

Relevant log output

"Error in desktop\\src-tauri\\src\\main.rs at line 60: failed to get segment\n\nCaused by:\n    Invalid UTF-8 detected in a string from Whisper. Index: 428, Length: 1."
options: ModelArgs { path: "C:\\Users\\user\\Downloads\\פודקאסטים\\כל מה שרצית לדעת על הכל 🐱_💻 סייברסייבר ע07פ10(2).mp3", model: "C:\\Users\\user\\AppData\\Local\\github.com.thewh1teagle.vibe\\ivrit-ai--whisper-large-v2-tuned-ggml-model_2.bin", lang: Some("he"), verbose: false, n_threads: Some(4), init_prompt: Some(""), temperature: Some(0.4) }

App Version: 0.0.7
Arch: x86_64
Platform: windows
Kernel Version: 10.0.22631
OS: windows
OS Version: 10.0.22631
Models: ggml-large.bin, ivrit-ai--whisper-large-v2-tuned-ggml-model_2.bin
Default Mode: "C:\\Users\\user\\AppData\\Local\\github.com.thewh1teagle.vibe\\ivrit-ai--whisper-large-v2-tuned-ggml-model_2.bin"
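
The "Invalid UTF-8 detected in a string from Whisper" message means the bytes of a segment did not form valid UTF-8 at the reported index. One plausible cause, stated here as an assumption rather than a diagnosis, is a multi-byte character (every Hebrew letter takes two bytes) being split at a segment or token boundary. A minimal sketch of strict versus lossy decoding with the Rust standard library; `decode_segment_strict` and `decode_segment_lossy` are hypothetical names, not functions from Vibe or its Whisper bindings:

```rust
/// Strict decoding: fail and report where the bytes stop being valid UTF-8.
fn decode_segment_strict(bytes: &[u8]) -> Result<&str, String> {
    std::str::from_utf8(bytes).map_err(|e| {
        format!(
            "invalid UTF-8 at index {}, length {:?}",
            e.valid_up_to(),
            e.error_len()
        )
    })
}

/// Lossy decoding: invalid sequences are replaced with U+FFFD
/// instead of aborting the whole transcription.
fn decode_segment_lossy(bytes: &[u8]) -> String {
    String::from_utf8_lossy(bytes).into_owned()
}
```

Lossy decoding trades a possible replacement character in the transcript for not losing a long transcription at the very end.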

[Bug]: Build for osx x86 failed

What happened?

A bug happened!

Steps to reproduce

  1. step one...
  2. step two...

What OS are you seeing the problem on?

No response

Relevant log output

No response

Bug: Crash on loading model

What happened?

After I hit "Transcribe", the app crashes.

Since, at least from what I've noticed, there are no crash logs, here's what the cmd log shows:

log
C:\Windows\System32>[2024-05-22T01:57:23Z DEBUG vibe_desktop] Vibe App Running
[2024-05-22T01:57:24Z DEBUG vibe_desktop::setup] webview version: 125.0.2535.51
[2024-05-22T01:57:33Z DEBUG vibe::model] Transcribe called with {
      "path": "D:\\2023-11-30 11-25-58.mkv",
      "model_path": "C:\\Users\\AutumnLeaf\\AppData\\Local\\github.com.thewh1teagle.vibe\\ggml-medium.bin",
      "lang": "en",
      "verbose": false,
      "n_threads": 4,
      "init_prompt": "",
      "temperature": 0.4
    }
[2024-05-22T01:57:33Z DEBUG vibe::audio] input is D:\2023-11-30 11-25-58.mkv and output is C:\Users\AUTUMN~1\AppData\Local\Temp\.tmptp0lsV.wav
[2024-05-22T01:57:33Z DEBUG vibe::audio::encoder] decoder channel layout is 2
[2024-05-22T01:57:33Z DEBUG vibe::audio::encoder] +-----------+
    |    in     |default--[48000Hz fltp:stereo]--auto_aresample_0:default
    | (abuffer) |
    +-----------+

                                                       +---------------+
    Parsed_anull_0:default--[16000Hz s16:mono]--default|      out      |
                                                       | (abuffersink) |
                                                       +---------------+

                                                         +----------------+
    auto_aresample_0:default--[16000Hz s16:mono]--default| Parsed_anull_0 |default--[16000Hz s16:mono]--out:default
                                                         |    (anull)     |
                                                         +----------------+

                                              +------------------+
    in:default--[48000Hz fltp:stereo]--default| auto_aresample_0 |default--[16000Hz s16:mono]--Parsed_anull_0:default
                                              |   (aresample)    |
                                              +------------------+


[2024-05-22T01:57:33Z DEBUG vibe::audio] wav reader read from "C:\\Users\\AUTUMN~1\\AppData\\Local\\Temp\\.tmptp0lsV.wav"
[2024-05-22T01:57:33Z DEBUG vibe::audio] parsing C:\Users\AUTUMN~1\AppData\Local\Temp\.tmptp0lsV.wav
[2024-05-22T01:57:33Z DEBUG vibe::model] open model...

Steps to reproduce

  1. Open Vibe
  2. Attempt to Transcribe
  3. Crash when trying to load model

Tried on Windows 10 first, and now on Windows 11. Also, my brother's pc (Windows 10) for some reason has no issue with Vibe.

Could it be because I've got an old cpu (i7-3770 vs my brother's Ryzen 5 2600) or motherboard or something like that that there's some conflict happening? In case it helps, I am able to run Stable Diffusion locally without issues (both Automatic 1111 and ComfyUI).

I should also note that the user shown in Vibe's logs is AUTUMN~1 instead of AutumnLeaf. Since I've seen a few past issues here caused by Hebrew user names, perhaps the tilde (which shouldn't even be there) could be a cause.

What OS are you seeing the problem on?

Window

logs

Relevant log output

App Version: 1.0.7
Commit Hash: 99ae746dc02135ad7a27ec0f9adafe016b8c96e4
Arch: x86_64
Platform: windows
Kernel Version: 10.0.22631
OS: windows
OS Version: 10.0.22631
Models: ggml-medium.bin
Default Model: "C:\\Users\\AutumnLeaf\\AppData\\Local\\github.com.thewh1teagle.vibe\\ggml-medium.bin"

Bug: failed to create whisper context

What happened?

A bug happened!

Steps to reproduce

  1. I added a file
  2. I clicked on "Transcribe"
  3. In practice it threw an error right at the start

What OS are you seeing the problem on?

Window

Relevant log output

"Error in desktop\\src-tauri\\src\\main.rs at line 60: failed to open model\n\nCaused by:\n    Failed to create a new whisper context."
options: ModelArgs { path: "C:\\Users\\משתמש\\Music\\שירי יוסי גרין\\song_04.mp3", model: "C:\\Users\\משתמש\\AppData\\Local\\github.com.thewh1teagle.vibe\\ivrit-ai--whisper-large-v2-tuned-ggml-model.bin", lang: Some("auto"), verbose: false, n_threads: Some(4), init_prompt: Some(""), temperature: Some(0.4) }

App Version: 0.0.8
Arch: x86_64
Platform: windows
Kernel Version: 10.0.26100
OS: windows
OS Version: 10.0.26100
Models: ggml-medium.bin, ivrit-ai--whisper-large-v2-tuned-ggml-model.bin
Default Mode: "C:\\Users\\משתמש\\AppData\\Local\\github.com.thewh1teagle.vibe\\ivrit-ai--whisper-large-v2-tuned-ggml-model.bin"

[Bug]: GPU seems not working on M2 macbook pro

What happened?

The transcribing is extremely slow on my M2 Macbook Pro.

Steps to reproduce

I am using an M2 MacBook Pro. Transcription is quite slow, and when I check the GPU usage it shows 0%.
Is this normal?

image

I did not find any setting related to GPU.

What OS are you seeing the problem on?

MacOS

Relevant log output

App Version: 1.0.6
Commit Hash: cb51db5bc8ae1d800b3f7af9faa780552808710b
Arch: aarch64
Platform: macos
Kernel Version: 14.3.0
OS: macos
OS Version: 14.3.0
Models: ggml-large-v2.bin
Default Model: Not Found

Bug: failed to get segment

What happened?

A bug happened!

Steps to reproduce

I transcribed a large file, at the end of the transcription, after the progress bar was full, I got this bug.

What OS are you seeing the problem on?

Window

Relevant log output

failed to get segment

App Version: 0.0.6
Arch: x86_64
Platform: win32
Kernel Version: 10.0.22631
OS: Windows_NT
OS Version: 10.0.22631
Models: ggml-large.bin, ggml-medium.bin, ivrit-ai--whisper-large-v2-tuned-ggml-model_2.bin
Default Mode: "C:\\Users\\user\\AppData\\Local\\github.com.thewh1teagle.vibe\\ivrit-ai--whisper-large-v2-tuned-ggml-model_2.bin"

Failed to get segment with ivrit model

What happened?

A bug happened!

Steps to reproduce

  1. step one...
  2. step two...

What OS are you seeing the problem on?

Window

Relevant log output

Error in desktop\src-tauri\src\main.rs at line 59: failed to get segment

Caused by:
    Invalid UTF-8 detected in a string from Whisper. Index: 726.

App Version: 0.0.7
Arch: x86_64
Platform: windows
Kernel Version: 10.0.22631
OS: windows
OS Version: 10.0.22631
Models: ivrit-ai--whisper-large-v2-tuned-ggml-model.bin
Default Mode: "C:\\Users\\1234\\AppData\\Local\\github.com.thewh1teagle.vibe\\ivrit-ai--whisper-large-v2-tuned-ggml-model.bin"

Bug: failed to get segment

What happened?

A bug happened!

Steps to reproduce

Transcription of a 40 minute file, ended with this bug
Continuation of this bug:
#35

What OS are you seeing the problem on?

No response

Relevant log output

"Error in desktop\\src-tauri\\src\\main.rs at line 60: failed to get segment\n\nCaused by:\n    Invalid UTF-8 detected in a string from Whisper. Index: 119, Length: 1."
options: ModelArgs { path: "C:\\Users\\user\\Downloads\\הקלטה בקו של הקהילה שלנו.wav", model: "C:\\Users\\user\\AppData\\Local\\github.com.thewh1teagle.vibe\\ivrit-ai--whisper-large-v2-tuned-ggml-model_2.bin", lang: Some("he"), verbose: false, n_threads: Some(4), init_prompt: Some(""), temperature: Some(0.4) }

App Version: 0.0.7
Arch: x86_64
Platform: windows
Kernel Version: 10.0.22631
OS: windows
OS Version: 10.0.22631
Models: ggml-large.bin, ivrit-ai--whisper-large-v2-tuned-ggml-model_2.bin
Default Mode: "C:\\Users\\user\\AppData\\Local\\github.com.thewh1teagle.vibe\\ivrit-ai--whisper-large-v2-tuned-ggml-model_2.bin"

Bug: failed to open model

What happened?

I tried to transcribe a lesson and immediately got an 'unexpected error'. This is the error that came up:

options: {
"path": "C:\מהאונקי של זיידי\שואה נויגרשל\הרב מרדכי נויגרשל, חורבן יהדות אירופה - שיעור א - מבוא.mp3",
"model_path": "C:\Users\shechter 0533159559\AppData\Local\github.com.thewh1teagle.vibe\ggml-medium.bin",
"lang": "he",
"verbose": false,
"n_threads": 4,
"init_prompt": "",
"temperature": 0.4
}

Caused by:
0: failed to open model
1: Failed to create a new whisper context.

Location:
core\src\model.rs:49:6

Steps to reproduce

  1. step one...
  2. step two...

What OS are you seeing the problem on?

No response

Relevant log output

options: {
  "path": "C:\\מהאונקי של זיידי\\שואה נויגרשל\\הרב מרדכי נויגרשל, חורבן יהדות אירופה - שיעור א - מבוא.mp3",
  "model_path": "C:\\Users\\shechter 0533159559\\AppData\\Local\\github.com.thewh1teagle.vibe\\ggml-medium.bin",
  "lang": "he",
  "verbose": false,
  "n_threads": 4,
  "init_prompt": "",
  "temperature": 0.4
}

Caused by:
   0: failed to open model
   1: Failed to create a new whisper context.

Location:
    core\src\model.rs:49:6
App Version: 1.0.7
Commit Hash: 341bee8566855858aa9f3d1a8c860c3b281d855f
Arch: x86_64
Platform: windows
Kernel Version: 10.0.19045
OS: windows
OS Version: 10.0.19045
Models: ggml-medium.bin, ivrit-ai--whisper-large-v2-tuned-ggml-model.bin
Default Model: Not Found

Transcribe apps audio

Goal

Transcribe system audio, the microphone, or both, and preview the result in real time

Research

It may be possible to follow the approach used in
https://github.com/CapSoftware/Cap

Useful Rust crate
https://github.com/helmerapp/scap

Possibly, per platform:
macOS: https://github.com/svtlabs/screencapturekit-rs (ScreenCaptureKit)
Windows: Graphics.Capture (https://github.com/NiiightmareXD/windows-capture)

A macOS app that captures system audio using the ScreenCaptureKit API
https://github.com/Mnpn/Azayaka

Microsoft answer on how Audacity manages to record audio from the speakers (TL;DR: Windows WASAPI)
https://answers.microsoft.com/en-us/windows/forum/all/how-record-speaker-output-windows-10/251bb695-5170-4a35-a90f-42d9f6f3345a

macOS sample
https://gist.github.com/thewh1teagle/d02415b9768fd816a780f9af6a3f2bdb

Some platforms, though not all, provide virtual channels for monitoring (PulseAudio and PipeWire on Linux, WASAPI on Windows, Core Audio on macOS), and cpal does not expose them
(not sure about Core Audio, actually; it may have been disabled or removed for security reasons)

Loopback was added to cpal in
RustAudio/cpal#478 (working on Windows)

Additional questions:
How to get system audio + microphone at the same time into a single stream?
What about Linux?

TLDR

The Rust crate cpal provides a way to get an audio stream from the microphone(s)
On Windows it also provides an audio stream from the default output device (system audio)
On macOS we should use screencapturekit-rs and provide a stream equivalent to the cpal stream.

If two streams are used, mix them by adding them together (simple addition of the sample values works)
Push them to whisper in a loop
Mixing can introduce synchronization issues (if there are two different sound cards, etc.); RtAudio handles that better and can be used through rtaudio-rs
whisper.cpp expects single-channel (mono) audio at a 16 kHz sample rate with 16-bit samples
Resampling is probably needed, and stereo is converted to mono by taking the mean of both channels.
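
The mixing and conversion steps above can be sketched with the standard library alone. This is a minimal illustration, not Vibe's implementation: all function names are hypothetical, samples are assumed to be `f32` in the range [-1.0, 1.0], and the resampler is naive linear interpolation with no anti-aliasing filter, so a real pipeline would likely want a proper resampling library.

```rust
/// Mix two mono streams by adding samples pairwise; the shorter stream is
/// padded with silence and the sum is clamped to [-1.0, 1.0] to avoid overflow.
fn mix(a: &[f32], b: &[f32]) -> Vec<f32> {
    let len = a.len().max(b.len());
    (0..len)
        .map(|i| {
            let x = a.get(i).copied().unwrap_or(0.0);
            let y = b.get(i).copied().unwrap_or(0.0);
            (x + y).clamp(-1.0, 1.0)
        })
        .collect()
}

/// Convert interleaved stereo (L, R, L, R, ...) to mono by averaging each pair.
fn stereo_to_mono(interleaved: &[f32]) -> Vec<f32> {
    interleaved
        .chunks_exact(2)
        .map(|lr| (lr[0] + lr[1]) / 2.0)
        .collect()
}

/// Naive linear-interpolation resampler (e.g. 48000 Hz -> 16000 Hz).
fn resample_linear(input: &[f32], from_hz: u32, to_hz: u32) -> Vec<f32> {
    if input.is_empty() || from_hz == 0 || to_hz == 0 {
        return Vec::new();
    }
    let out_len = (input.len() as u64 * to_hz as u64 / from_hz as u64) as usize;
    let step = from_hz as f64 / to_hz as f64;
    let last = input.len() - 1;
    (0..out_len)
        .map(|i| {
            // Position of output sample i on the input timeline.
            let pos = i as f64 * step;
            let idx = pos as usize;
            let frac = (pos - idx as f64) as f32;
            let a = input[idx.min(last)];
            let b = input[(idx + 1).min(last)];
            a + (b - a) * frac
        })
        .collect()
}

/// Convert f32 samples in [-1.0, 1.0] to signed 16-bit samples.
fn to_i16(samples: &[f32]) -> Vec<i16> {
    samples
        .iter()
        .map(|s| (s.clamp(-1.0, 1.0) * i16::MAX as f32) as i16)
        .collect()
}
```

A typical order would be `stereo_to_mono` on each source, `resample_linear` to 16000 Hz, then `mix`, and finally `to_i16` only if the consumer needs 16-bit samples.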

Simple approach

Record from speakers/mic concurrently and write to a file every 5-10 seconds, cutting at the quietest position
Write the paths to a queue (each item will be one or two paths)
Another task iterates over the queue, merges the files if needed, and transcribes them.
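
The queue-based flow above might look roughly like this, using a standard-library channel as the queue. Everything here is a sketch under assumptions: `Chunk`, `transcribe_chunk`, and `run_pipeline` are hypothetical names, and `transcribe_chunk` is a stand-in that only counts files where a real worker would merge the recordings and call whisper.

```rust
use std::path::PathBuf;
use std::sync::mpsc;
use std::thread;

/// One queue item: a recorded file for system audio, for the microphone, or both.
#[derive(Debug, Clone, PartialEq)]
struct Chunk {
    speakers: Option<PathBuf>,
    mic: Option<PathBuf>,
}

/// Hypothetical stand-in for "merge if needed, then transcribe".
/// It only reports how many files the chunk contains.
fn transcribe_chunk(chunk: &Chunk) -> String {
    let n = chunk.speakers.iter().chain(chunk.mic.iter()).count();
    format!("transcribed {} file(s)", n)
}

/// Send chunks through a channel to a worker thread that processes them in
/// order, and collect the results once the producer side is closed.
fn run_pipeline(chunks: Vec<Chunk>) -> Vec<String> {
    let (tx, rx) = mpsc::channel::<Chunk>();

    // Consumer: iterates the queue until every sender has been dropped.
    let worker = thread::spawn(move || {
        let mut results = Vec::new();
        for chunk in rx {
            results.push(transcribe_chunk(&chunk));
        }
        results
    });

    // Producer: in a real app this would be the recorder, pushing a new
    // chunk each time it finishes writing a file.
    for chunk in chunks {
        tx.send(chunk).expect("worker thread stopped unexpectedly");
    }
    drop(tx); // close the queue so the worker loop ends

    worker.join().expect("worker thread panicked")
}
```

Keeping recording and transcription on separate threads means a slow transcription only delays the preview; it does not cause dropped audio.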

https://github.com/ggerganov/whisper.cpp/tree/master/examples/stream#sliding-window-mode-with-vad
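
For the simple approach, the "best silent position" could be approximated with a plain energy measure, without a full VAD. The sketch below is a simple heuristic and makes no claim to match what the linked whisper.cpp example does; `quietest_window` is a hypothetical helper that returns the start of the lowest-energy window in a mono `f32` buffer.

```rust
/// Return the start index of the quietest `window`-sample span in `samples`
/// (lowest mean squared amplitude). Candidate windows are checked every half
/// window; the earliest one wins on ties. Returns None if the buffer is
/// shorter than one window or the window is empty.
fn quietest_window(samples: &[f32], window: usize) -> Option<usize> {
    if window == 0 || samples.len() < window {
        return None;
    }
    let step = (window / 2).max(1);
    let mut best_start = 0;
    let mut best_energy = f32::INFINITY;
    let mut start = 0;
    while start + window <= samples.len() {
        // Mean squared amplitude of the current window.
        let energy = samples[start..start + window]
            .iter()
            .map(|s| s * s)
            .sum::<f32>()
            / window as f32;
        if energy < best_energy {
            best_energy = energy;
            best_start = start;
        }
        start += step;
    }
    Some(best_start)
}
```

The recorder could run this over the last few seconds of the buffer and cut the file at the returned index, so that words are less likely to be split between chunks.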

[Bug]: Does not open on Sonoma (Apple Silicon)

What happened?

When I double-click the app, macOS says the file is damaged.
In the terminal I ran xattr -cr /path and then tried to open it again, but nothing happens.

Steps to reproduce

Just try to open.

What OS are you seeing the problem on?

MacOS

Relevant log output

No response

Translation problem

What happened?

In Hebrew, the transcription cancellation button is labeled "Stop" and not "Cancel".

Steps to reproduce

should be corrected to: "ביטול"

What OS are you seeing the problem on?

Windows

Relevant log output

No response

Running the software on an Intel GPU

Describe the feature

I have an Intel processor with an integrated GPU, Intel(R) UHD Graphics 620. It has 8 GB of RAM that is not used at all. Is there a way for the software to 1) utilize this memory and 2) run on this GPU?
