const-me / whisper


High-performance GPGPU inference of OpenAI's Whisper automatic speech recognition (ASR) model

License: Mozilla Public License 2.0

C++ 68.27% C 21.08% HLSL 4.86% C# 5.49% Batchfile 0.10% PowerShell 0.20%

whisper's Issues

Where are the releases?

Sorry to ask, but I could not find a releases link to download the Windows executable. Is there a link to it in the README?

Add option for batch transcribing multiple files?

Hi.
Thanks for the great work.
With GPU acceleration it decodes text much faster than the official ggerganov project, which doesn't support GPU yet.
Could you please implement an option for generating multiple subtitles, so the program allows importing files in batches?
Thank you very much in advance!

Request: Include binary for whispercpp beta v1.1.0

Great work! I did not expect this program to be so fast. One thing that can make it better: to include an additional WhisperDesktop.exe with the updates in the official whispercpp beta v1.1.0. This beta improves the transcription of files that previously got stuck in the same sentence. I hope that this is a reasonable request.

Feature request: --output-file option for cli version

It would be handy for processing files in batches to have a --output-file option (like in Georgi Gerganov's tool) to specify a path+prefix for output files.

I tried it on a long-form video podcast which is about 4.5 hours long and found lots of punctuation missing

Dear Sir, I am really impressed with your application.
Today I tried to generate a txt transcript for a 4.5-hour podcast of Huberman Lab and it was really fast.
I checked the transcript and most of the punctuation is good, except towards the end of the text.
Lots of punctuation is missing there. Maybe I should split the video file into two and then try again. Thanks!

Mojibake in Debug Window

As reported in the upstream issue ggerganov/whisper.cpp#399, when using -pc output in the terminal, some characters are not displayed correctly.
So maybe there could be an option to disable the color print function?

App crash on model loading

I try to load a model and the app crashes without any message. I tried different models (tiny, small, medium, large) with the same result. I use Win10 x64. GPU: GTX 970 4 GB, CPU: i7 3930K, RAM: 28 GB. I also tested the app on a VM with the same result.

Subtitles are sometimes recognized and then repeated

Is it OK to run other applications that occupy the CPU or GPU during the conversion?
I am using the large.bin model.
Occasionally this problem arises.

Application Audio Capture

Hello! Thanks for putting together this project, performance on my (non Ti) 1080 seems remarkably good!

One thing I'd love to see is the ability to stream capture audio from an application rather than from a microphone. Something like https://github.com/bozbez/win-capture-audio ?

Cheers!

Feature request: stop transcribing button

What about a "stop transcribing" button?

Multilingual recognition in one file

Is it possible to do multilingual recognition in one recording, like in the Python version? I mean, when people in one audio file sometimes talk in different languages, the Python version with the large-v2 model decodes all languages at once, but in this C++ implementation Whisper writes [Spanish] or [Different language] etc.

Plans for a Simple C interface ?

Are there any plans for a simple C Interface, so folks can wrap your library from Go, Python etc ?

transcription does not terminate at eof

First, thank you for your great work.
In certain cases the transcription process does not terminate at the end of the file and does
not write to the txt/vtt/srt file. My test file was a 48 kHz stereo MP3 file. If I convert it to
16-bit WAV, as expected by the original whisper.cpp, it works fine.

Request: Implement llama.cpp as a windows app with D3D acceleration

In the "final words" section of your readme file you talk about the possibility of implementing GPT-2. Now that llama.cpp is mature enough, I think it would be great to have a D3D-accelerated version, if you have the time. It seems to use the same ggml.c base, with other files to support the quantized LLaMA models.

Thanks for considering it!

-ml/--max-len doesn't seem to work.

Works in ggerganov/whisper.cpp with exact same command line

I get this error:

eFullParamsFlags.TokenTimestamps flag is not supported in streaming mode
Unable to process audio: Not implemented

Possible methods to eliminate GPU usage

Hi there,
I'd first like to thank you for writing this code. It's been a godsend to use it as a baseplate, and implementing loopback support via WASAPI was certainly not as bad as it would've been with other codebases.

However, I'm trying to get the model to use less GPU. I understand it's using compute shaders but this method causes noticeable lag when using the GPU for other things such as games. Is there any method for reducing load on the GPU short of fully switching to CPU?

Thanks

Is there a problem with the GPU call logic

Excuse me, why does the application still use the system's integrated graphics card when a P100 graphics card has been installed?

voice recognition problem

Reporting a recognition problem. If a video has human voices only at the beginning and the end, and no voice in the middle, the voice at the end is not recognized correctly; it is treated as if there were no voice at all.

Linux port

Thanks for your great work!
Any chance of seeing a Linux port on the roadmap?

Performance notes on AMD (RDNA2)

Not really an issue, but I'm giving my thoughts on RDNA2 performance. In short, it's great. On my 6900 XT it runs at around 7.7x realtime for the large model and 10.5x for the medium one.

Transcription results produce very short lines of text

I'm using the latest build (1.7) and transcribing Mandarin audio. The output I've been getting from over 10 files fails to include any punctuation and each line is usually ten characters or less. Any idea why this is happening?

"Hybrid" mode is not working

Other than "GPU" mode, I could not run the other modes such as "Hybrid". Both of the other two modes throw an error about some missing DLLs. How can we fix this, and what are the main purposes of these modes? I could not find any explanation for them. By the way, very good project, and the results are great!

Feature request: transcribe with different models one after another

In my tests, ggml-medium.bin and ggml-large.bin each have their own wins and losses.
So I'd like to get two output txt files with just one click.

We could load 2 models at the same time.
Output to filename-large.txt and filename-medium.txt one after another.

Building error: shaderData-Release.inl

Feature Request: Auto Populate output path based on input

Hello

Great program, been testing it for a few days now and am loving it! One small feature request I'd like to make is if it would be possible to auto-infer + populate the output path and filename based on the input.

Big thanks on the great proj!
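
One way the requested default could be derived is sketched below with std::filesystem. The function name is hypothetical and not part of the Whisper code base; it simply keeps the folder and file name of the input and swaps the extension.

```cpp
#include <cassert>
#include <filesystem>
#include <string>

// Hypothetical helper, not part of the Whisper code base.
// Given the input media path and an output extension such as ".srt",
// return a path in the same folder with the same file name stem.
std::filesystem::path defaultOutputPath(const std::filesystem::path& input, const std::string& extension)
{
    assert(!extension.empty());
    std::filesystem::path result = input;
    result.replace_extension(extension);
    return result;
}
```

The UI could call something like this whenever the input path changes, and still let the user overwrite the suggested value.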

Doesn't support AC3 format?

Support more file types in the picker

Currently .m4a files do not show in the file picker. I guess they do not fall under "multimedia files". When I put in the path to file manually it works fine.

Thank you for your work.

running multiple concurrent threads (each with their own model+context) throws exceptions

I tried running 2 threads - each allocating its own model and context - but calling context.runFull() from both threads at the same time causes exceptions.

Is this tested/supported? My GPU has more than enough VRAM and shaders to run several instances concurrently. This would be highly useful since it would increase throughput anywhere from 2x to 20x depending on GPU.

Is it possible to add no_speech_threshold, logprob_threshold and compression_ratio_threshold to the UI and let the user change their values?

Cannot locate literal strings that are output during detection/transcription

I notice that sometimes the following strings (amongst others) will appear while it's transcribing:

[BLANK_AUDIO]
[MUSIC]
[VIDEO STARTING]
etc.

I did a search for these strings and they do not appear to be in the code base. I extended the search to my C: drive and could not find them anywhere. My search facility may have skipped some files due to format (dlls and exes)...

Can you tell me where they originate from?

Thanks!

Provide raw audio samples from .NET

Not really an issue; merely curious.
One of the MF helpers simply accepts a file path; however, providing audio samples directly would also be nice, for example when recording audio from .NET. Is it planned?

Build issues

Tried building the entire solution in VS 2019 and got errors that led me to use VS 2022 (because it needs .NET 6, presumably).

Tried using VS 2022 and got:

Build started...
1>------ Build started: Project: PerfSummary, Configuration: Debug Any CPU ------
2>------ Build started: Project: compareTraces, Configuration: Debug x64 ------
3>------ Build started: Project: OldMain, Configuration: Debug x64 ------
4>------ Build started: Project: ComputeShaders, Configuration: Debug x64 ------
5>------ Build started: Project: ComLightLib, Configuration: Debug x64 ------
2>stdafx.cpp
3>ggmlMsvc.c
3>C:\Program Files\Microsoft Visual Studio\2022\Professional\VC\Tools\MSVC\14.33.31629\include\xmmintrin.h(79,10): fatal error C1083: Cannot open include file: 'malloc.h': No such file or directory
2>C:\Users\smbik\Desktop\Git\Whisper\Tools\compareTraces\stdafx.h(3,10): fatal error C1083: Cannot open include file: 'assert.h': No such file or directory
4>compilation object save succeeded; see C:\Users\smbik\Desktop\Git\Whisper\ComputeShaders\x64\Debug\addRows.cso
4>compilation object save succeeded; see C:\Users\smbik\Desktop\Git\Whisper\ComputeShaders\x64\Debug\addRepeat64.cso
4>compilation object save succeeded; see C:\Users\smbik\Desktop\Git\Whisper\ComputeShaders\x64\Debug\addRepeatEx.cso
4>compilation object save succeeded; see C:\Users\smbik\Desktop\Git\Whisper\ComputeShaders\x64\Debug\addRepeat.cso
4>compilation object save succeeded; see C:\Users\smbik\Desktop\Git\Whisper\ComputeShaders\x64\Debug\convolutionMain2.cso
4>compilation object save succeeded; see C:\Users\smbik\Desktop\Git\Whisper\ComputeShaders\x64\Debug\convolutionPrep1.cso
4>compilation object save succeeded; see C:\Users\smbik\Desktop\Git\Whisper\ComputeShaders\x64\Debug\copyTranspose.cso
4>compilation object save succeeded; see C:\Users\smbik\Desktop\Git\Whisper\ComputeShaders\x64\Debug\addRepeatGelu64.cso
4>compilation object save succeeded; see C:\Users\smbik\Desktop\Git\Whisper\ComputeShaders\x64\Debug\convolutionPrep2.cso
4>compilation object save succeeded; see C:\Users\smbik\Desktop\Git\Whisper\ComputeShaders\x64\Debug\addRepeatGelu.cso
4>compilation object save succeeded; see C:\Users\smbik\Desktop\Git\Whisper\ComputeShaders\x64\Debug\addRepeatScale.cso
4>compilation object save succeeded; see C:\Users\smbik\Desktop\Git\Whisper\ComputeShaders\x64\Debug\addInPlace.cso
4>compilation object save succeeded; see C:\Users\smbik\Desktop\Git\Whisper\ComputeShaders\x64\Debug\add.cso
4>compilation object save succeeded; see C:\Users\smbik\Desktop\Git\Whisper\ComputeShaders\x64\Debug\copyConvert.cso
4>C:\Users\smbik\Desktop\Git\Whisper\ComputeShaders\convolutionMain.hlsl(34,4-29): warning X3557: loop doesn't seem to do anything, forcing loop to unroll
4>C:\Users\smbik\Desktop\Git\Whisper\ComputeShaders\convolutionMain.hlsl(34,4-29): warning X3557: loop doesn't seem to do anything, forcing loop to unroll
4>
4>compilation object save succeeded; see C:\Users\smbik\Desktop\Git\Whisper\ComputeShaders\x64\Debug\convolutionMain.cso
4>compilation object save succeeded; see C:\Users\smbik\Desktop\Git\Whisper\ComputeShaders\x64\Debug\convolutionMain2Fixed.cso
4>compilation object save succeeded; see C:\Users\smbik\Desktop\Git\Whisper\ComputeShaders\x64\Debug\flashAttentionCompat2.cso
4>compilation object save succeeded; see C:\Users\smbik\Desktop\Git\Whisper\ComputeShaders\x64\Debug\diagMaskInf.cso
4>compilation object save succeeded; see C:\Users\smbik\Desktop\Git\Whisper\ComputeShaders\x64\Debug\fmaRepeat1.cso
4>compilation object save succeeded; see C:\Users\smbik\Desktop\Git\Whisper\ComputeShaders\x64\Debug\flashAttentionCompat1.cso
4>compilation object save succeeded; see C:\Users\smbik\Desktop\Git\Whisper\ComputeShaders\x64\Debug\fmaRepeat164.cso
4>compilation object save succeeded; see C:\Users\smbik\Desktop\Git\Whisper\ComputeShaders\x64\Debug\fmaRepeat2.cso
3>Done building project "OldMain.vcxproj" -- FAILED.
2>Done building project "compareTraces.vcxproj" -- FAILED.
4>compilation object save succeeded; see C:\Users\smbik\Desktop\Git\Whisper\ComputeShaders\x64\Debug\flashAttention.cso
4>compilation object save succeeded; see C:\Users\smbik\Desktop\Git\Whisper\ComputeShaders\x64\Debug\mulMatByScalar.cso
4>C:\Users\smbik\Desktop\Git\Whisper\ComputeShaders\mulMatByRow64.hlsl(37,3-70): warning X3557: loop only executes for 1 iteration(s), forcing loop to unroll
4>
4>compilation object save succeeded; see C:\Users\smbik\Desktop\Git\Whisper\ComputeShaders\x64\Debug\mulMatByRow64.cso
4>compilation object save succeeded; see C:\Users\smbik\Desktop\Git\Whisper\ComputeShaders\x64\Debug\flashAttentionCompat3.cso
4>compilation object save succeeded; see C:\Users\smbik\Desktop\Git\Whisper\ComputeShaders\x64\Debug\mulMatDotMain.cso
4>compilation object save succeeded; see C:\Users\smbik\Desktop\Git\Whisper\ComputeShaders\x64\Debug\matReshapePanels.cso
4>compilation object save succeeded; see C:\Users\smbik\Desktop\Git\Whisper\ComputeShaders\x64\Debug\mulMatByRowTiledEx.cso
4>compilation object save succeeded; see C:\Users\smbik\Desktop\Git\Whisper\ComputeShaders\x64\Debug\mulMatByRow.cso
5>freeThreadedMarshaller.cpp
4>compilation object save succeeded; see C:\Users\smbik\Desktop\Git\Whisper\ComputeShaders\x64\Debug\mulMatDotReshape.cso
5>C:\Program Files (x86)\Windows Kits\10\Include\10.0.19041.0\um\winnt.h(34,10): fatal error C1083: Cannot open include file: 'ctype.h': No such file or directory
4>compilation object save succeeded; see C:\Users\smbik\Desktop\Git\Whisper\ComputeShaders\x64\Debug\normCompat.cso
4>compilation object save succeeded; see C:\Users\smbik\Desktop\Git\Whisper\ComputeShaders\x64\Debug\norm.cso
4>compilation object save succeeded; see C:\Users\smbik\Desktop\Git\Whisper\ComputeShaders\x64\Debug\scaleInPlace.cso
4>compilation object save succeeded; see C:\Users\smbik\Desktop\Git\Whisper\ComputeShaders\x64\Debug\mulMatMadMain.cso
5>Done building project "ComLightLib.vcxproj" -- FAILED.
6>------ Build started: Project: Whisper, Configuration: Debug x64 ------
4>compilation object save succeeded; see C:\Users\smbik\Desktop\Git\Whisper\ComputeShaders\x64\Debug\normFixed.cso
4>compilation object save succeeded; see C:\Users\smbik\Desktop\Git\Whisper\ComputeShaders\x64\Debug\softMax64.cso
4>C:\Users\smbik\Desktop\Git\Whisper\ComputeShaders\normFixed.hlsl(31,3-70): warning X3557: loop only executes for 1 iteration(s), forcing loop to unroll
4>C:\Users\smbik\Desktop\Git\Whisper\ComputeShaders\normFixed.hlsl(31,3-70): warning X3557: loop only executes for 1 iteration(s), forcing loop to unroll
4>
4>compilation object save succeeded; see C:\Users\smbik\Desktop\Git\Whisper\ComputeShaders\x64\Debug\normFixed64.cso
4>compilation object save succeeded; see C:\Users\smbik\Desktop\Git\Whisper\ComputeShaders\x64\Debug\softMax.cso
4>compilation object save succeeded; see C:\Users\smbik\Desktop\Git\Whisper\ComputeShaders\x64\Debug\softMaxLong.cso
4>compilation object save succeeded; see C:\Users\smbik\Desktop\Git\Whisper\ComputeShaders\x64\Debug\softMaxCompat.cso
4>compilation object save succeeded; see C:\Users\smbik\Desktop\Git\Whisper\ComputeShaders\x64\Debug\zeroMemory.cso
4>compilation object save succeeded; see C:\Users\smbik\Desktop\Git\Whisper\ComputeShaders\x64\Debug\softMaxFixed.cso
4>compilation object save succeeded; see C:\Users\smbik\Desktop\Git\Whisper\ComputeShaders\x64\Debug\mulMatByRowTiled.cso
4>compilation object save succeeded; see C:\Users\smbik\Desktop\Git\Whisper\ComputeShaders\x64\Debug\mulMatTiledEx.cso
6>stdafx.cpp
6>C:\Users\smbik\Desktop\Git\Whisper\Whisper\stdafx.h(4,10): fatal error C1083: Cannot open include file: 'assert.h': No such file or directory
6>Done building project "Whisper.vcxproj" -- FAILED.
7>------ Build started: Project: WhisperDesktop, Configuration: Debug x64 ------
8>------ Build started: Project: main, Configuration: Debug x64 ------
9>------ Build started: Project: WhisperNet, Configuration: Debug Any CPU ------
4>compilation object save succeeded; see C:\Users\smbik\Desktop\Git\Whisper\ComputeShaders\x64\Debug\mulMatTiled.cso
1>PerfSummary -> C:\Users\smbik\Desktop\Git\Whisper\Tools\PerfSummary\bin\Debug\PerfSummary.dll
4>ComputeShaders.cpp
9>WhisperNet -> C:\Users\smbik\Desktop\Git\Whisper\WhisperNet\bin\Debug\WhisperNet.dll
10>------ Build started: Project: MicrophoneCS, Configuration: Debug x64 ------
11>------ Build started: Project: TranscribeCS, Configuration: Debug x64 ------
4>ComputeShaders.vcxproj -> C:\Users\smbik\Desktop\Git\Whisper\ComputeShaders\x64\Debug\ComputeShaders.lib
8>useDiscreteGpu.c
7>stdafx.cpp
4>Done building project "ComputeShaders.vcxproj".
12>------ Build started: Project: CompressShaders, Configuration: Debug Any CPU ------
7>C:\Program Files (x86)\Windows Kits\10\Include\10.0.19041.0\um\winnt.h(34,10): fatal error C1083: Cannot open include file: 'ctype.h': No such file or directory
7>Done building project "WhisperDesktop.vcxproj" -- FAILED.
8>main.cpp
8>C:\Program Files\Microsoft Visual Studio\2022\Professional\VC\Tools\MSVC\14.33.31629\include\cstdlib(12,10): fatal error C1083: Cannot open include file: 'math.h': No such file or directory
8>miscUtils.cpp
8>C:\Program Files\Microsoft Visual Studio\2022\Professional\VC\Tools\MSVC\14.33.31629\include\yvals.h(12,10): fatal error C1083: Cannot open include file: 'crtdbg.h': No such file or directory
8>params.cpp
10>C:\Program Files\Microsoft Visual Studio\2022\Professional\MSBuild\Current\Bin\amd64\Microsoft.Common.CurrentVersion.targets(5097,5): error MSB3030: Could not copy the file "C:\Users\smbik\Desktop\Git\Whisper\x64\Debug\Whisper.dll" because it was not found.
8>C:\Program Files\Microsoft Visual Studio\2022\Professional\VC\Tools\MSVC\14.33.31629\include\cstdlib(12,10): fatal error C1083: Cannot open include file: 'math.h': No such file or directory
8>textWriter.cpp
8>C:\Program Files (x86)\Windows Kits\10\Include\10.0.19041.0\um\winnt.h(34,10): fatal error C1083: Cannot open include file: 'ctype.h': No such file or directory
8>Generating Code...
8>Done building project "main.vcxproj" -- FAILED.
10>Done building project "MicrophoneCS.csproj" -- FAILED.
11>C:\Program Files\Microsoft Visual Studio\2022\Professional\MSBuild\Current\Bin\amd64\Microsoft.Common.CurrentVersion.targets(5097,5): error MSB3030: Could not copy the file "C:\Users\smbik\Desktop\Git\Whisper\x64\Debug\Whisper.dll" because it was not found.
11>Done building project "TranscribeCS.csproj" -- FAILED.
12>CompressShaders -> C:\Users\smbik\Desktop\Git\Whisper\Tools\CompressShaders\bin\Debug\CompressShaders.dll
========== Build: 4 succeeded, 8 failed, 0 up-to-date, 0 skipped ==========

The error list:

Severity Code Description Project File Line Suppression State
Error C1083 Cannot open include file: 'malloc.h': No such file or directory OldMain C:\Program Files\Microsoft Visual Studio\2022\Professional\VC\Tools\MSVC\14.33.31629\include\xmmintrin.h 79
Error C1083 Cannot open include file: 'assert.h': No such file or directory compareTraces C:\Users\smbik\Desktop\Git\Whisper\Tools\compareTraces\stdafx.h 3
Warning X3557 loop doesn't seem to do anything, forcing loop to unroll ComputeShaders C:\Users\smbik\Desktop\Git\Whisper\ComputeShaders\convolutionMain.hlsl 34
Warning X3557 loop doesn't seem to do anything, forcing loop to unroll ComputeShaders C:\Users\smbik\Desktop\Git\Whisper\ComputeShaders\convolutionMain.hlsl 34
Warning X3557 loop only executes for 1 iteration(s), forcing loop to unroll ComputeShaders C:\Users\smbik\Desktop\Git\Whisper\ComputeShaders\mulMatByRow64.hlsl 37
Error C1083 Cannot open include file: 'ctype.h': No such file or directory ComLightLib C:\Program Files (x86)\Windows Kits\10\Include\10.0.19041.0\um\winnt.h 34
Warning X3557 loop only executes for 1 iteration(s), forcing loop to unroll ComputeShaders C:\Users\smbik\Desktop\Git\Whisper\ComputeShaders\normFixed.hlsl 31
Warning X3557 loop only executes for 1 iteration(s), forcing loop to unroll ComputeShaders C:\Users\smbik\Desktop\Git\Whisper\ComputeShaders\normFixed.hlsl 31
Error C1083 Cannot open include file: 'assert.h': No such file or directory Whisper C:\Users\smbik\Desktop\Git\Whisper\Whisper\stdafx.h 4
Error C1083 Cannot open include file: 'ctype.h': No such file or directory WhisperDesktop C:\Program Files (x86)\Windows Kits\10\Include\10.0.19041.0\um\winnt.h 34
Error C1083 Cannot open include file: 'math.h': No such file or directory main C:\Program Files\Microsoft Visual Studio\2022\Professional\VC\Tools\MSVC\14.33.31629\include\cstdlib 12
Error C1083 Cannot open include file: 'crtdbg.h': No such file or directory main C:\Program Files\Microsoft Visual Studio\2022\Professional\VC\Tools\MSVC\14.33.31629\include\yvals.h 12
Error C1083 Cannot open include file: 'math.h': No such file or directory main C:\Program Files\Microsoft Visual Studio\2022\Professional\VC\Tools\MSVC\14.33.31629\include\cstdlib 12
Error C1083 Cannot open include file: 'ctype.h': No such file or directory main C:\Program Files (x86)\Windows Kits\10\Include\10.0.19041.0\um\winnt.h 34
Error MSB3030 Could not copy the file "C:\Users\smbik\Desktop\Git\Whisper\x64\Debug\Whisper.dll" because it was not found. MicrophoneCS C:\Program Files\Microsoft Visual Studio\2022\Professional\MSBuild\Current\Bin\amd64\Microsoft.Common.CurrentVersion.targets 5097
Error MSB3030 Could not copy the file "C:\Users\smbik\Desktop\Git\Whisper\x64\Debug\Whisper.dll" because it was not found. TranscribeCS C:\Program Files\Microsoft Visual Studio\2022\Professional\MSBuild\Current\Bin\amd64\Microsoft.Common.CurrentVersion.targets 5097

I'll check the errors one by one but it seems odd it can't build out of the box...thanks!

A little advice

I would like to see the ability to run batch file tasks added.
Also, if real-time speech recognition is implemented with low latency, could we have desktop captioning? That way we could watch videos with real-time translation.

encoding errors on example program

Thank you for your contribution. This GPU-accelerated version of whisper is much faster than the pure CPU version of whisper.cpp. I had some issues when using it, though. The first one is an encoding problem. In the debug output of the desktop version, the output sometimes lacks a few characters. I think it is a conversion from UTF-8 to CP_ACP (Windows-936, GB2312-80). Similar encoding errors also happen in the CLI, where the output dictation text is almost entirely unreadable '?' characters.
The second issue is that almost every audio file that is transcribed reports the error “runFullImpl: failed to generate timestamp token - skipping one second”.
The third problem is similar to #18: it always stops working after recognizing for a period of time, and repeatedly outputs the last recognized sentence.
If you plan to track the last two problems, I can open another issue.

Adding speaker diarization as a feature

Is it possible in the future to add speaker diarization as an option, to have labels for who speaks which part of the transcription? Very good project and results btw. Thanks

Works with WINE on Linux/Mac thanks to using DirectX 11 instead of 12

For anyone interested: I have tested this library running an app under WINE 8.3 (www.winehq.org) on Linux and it works thanks to NOT requiring DirectX12 (WINE only goes up to 11 at this time). Unless there are performance improvements it might be good to keep targeting the DirectX 11 API for the time being.

The only issue was the use of CreateDecompressor in shaders.cpp, because WINE's cabinet.dll does not support those functions. I had to disable this compression/decompression pair to make things work. I'm not sure the size matters so much, so perhaps it's better to just leave out compression?

Some problem

My laptop has an i7 and an RTX 3050. When I use the software, the GPU is not working, while Intel's integrated graphics runs at full load.

Language Auto Detection

I can't make language autodetection work.

When using the CLI, it seems to default to English, rather than "auto".
(tested with Chinese audio sample).

The UI has a dropdown without "auto" being an option

Translations

Thanks for this very useful application. Do you think it's possible to add the ability to translate into other languages than English? In my case Italian. Ciao

Is it possible to batch transcribe files?

Firstly, thank you for sharing your work. I've been using whisper via command line for a couple of months now and your implementation is considerably faster.

Secondly, is it possible to transcribe files in batches? My typical use case is transcribing several video classes overnight in a batch so that they are ready for me to read in the morning.
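
Until something like this exists in the application, a small wrapper could collect the files of a folder and hand them to the command-line tool one by one. The sketch below only covers the collection step. Both helpers are hypothetical and not part of the Whisper code base, and the list of extensions is chosen by the caller.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <filesystem>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper: case-insensitive check of a file extension against
// a list of lowercase extensions such as ".mp3" or ".mp4"
bool hasMediaExtension(const fs::path& file, const std::vector<std::string>& extensions)
{
    std::string ext = file.extension().string();
    std::transform(ext.begin(), ext.end(), ext.begin(),
        [](unsigned char c) { return (char)std::tolower(c); });
    return std::find(extensions.begin(), extensions.end(), ext) != extensions.end();
}

// Hypothetical helper: list the regular files directly inside `folder`
// which have one of the given extensions, sorted by path
std::vector<fs::path> collectMediaFiles(const fs::path& folder, const std::vector<std::string>& extensions)
{
    assert(fs::is_directory(folder));
    std::vector<fs::path> result;
    for (const fs::directory_entry& entry : fs::directory_iterator(folder))
    {
        if (entry.is_regular_file() && hasMediaExtension(entry.path(), extensions))
            result.push_back(entry.path());
    }
    std::sort(result.begin(), result.end());
    return result;
}
```

Each returned path could then be passed to the CLI with the -f switch, one run per file, so an overnight batch is just a loop over this list.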

MFCreateSourceReaderFromURL failed

\Whisper\Examples\TranscribeCS\bin\x64\Release>TranscribeCS -l zh -ovtt L:\out.vtt -f "L:\1.mp3" -m L:\WhisperDesktop\models\ggml-tiny.bin
Using GPU "NVIDIA GeForce GTX 1070", feature level 12.1, effective flags Wave32 | NoReshapedMatMul
Loaded MEL filters, 62.8 kb RAM
Loaded vocabulary, 51865 strings, 771.3 kb RAM
Loaded 167 GPU tensors, 73.5388 MB VRAM
Computed CPU base frequency: 3.6 GHz
Loaded model from "L:\WhisperDesktop\models\ggml-tiny.bin" to VRAM
StreamFile
False
MFCreateSourceReaderFromURL failed
MFCreateSourceReaderFromURL failed

Audio capture cannot find devices

For some reason the Whisper Desktop application cannot find any audio capture device. I have no problem with other apps like discord, firefox, OBS, android emulators, audacity, etc... I have the correct authorizations in the privacy settings of windows to allow apps to use my microphones.
I have both a USB microphone (Samson Q2U) and a virtual cable (VB Audio Virtual Cable) but neither is displayed.

I ran the code in Visual Studio and I noticed I get the error "0xe000020b" returned by the function "MFEnumDeviceSources( attrs, &ppDevices, &count )". I don't know what this error means. I also checked that MFStartup is correctly executed without error.

I really want to do realtime audio translation. I had great success translating Japanese videos and transcribing English videos on an RX 6800 XT. It's honestly impressive.

Output timing info to text files

The transcription in subtitle format, and the one displayed in the terminal, shows timing info, e.g. [00:00:00 --> 00:00:02] words. Text files contain only line breaks. It would be great if they could show the same information that is displayed in the terminal.

Great project btw
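
For illustration, here is a rough sketch of how a text writer could produce lines in the terminal format quoted above. Both function names are hypothetical and not taken from the Whisper code base, and it assumes segment times are available in milliseconds, which may not match the units the library actually uses.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: format a non-negative time in milliseconds as HH:MM:SS.
// The millisecond unit is an assumption made for this sketch.
std::string formatTime(int64_t ms)
{
    assert(ms >= 0);
    const int64_t totalSeconds = ms / 1000;
    char buf[32];
    std::snprintf(buf, sizeof(buf), "%02lld:%02lld:%02lld",
        (long long)(totalSeconds / 3600),
        (long long)((totalSeconds / 60) % 60),
        (long long)(totalSeconds % 60));
    return buf;
}

// Hypothetical helper: one line of a text file with timing info,
// in the same shape as the terminal output: [begin --> end] text
std::string formatSegmentLine(int64_t beginMs, int64_t endMs, const std::string& text)
{
    return "[" + formatTime(beginMs) + " --> " + formatTime(endMs) + "] " + text;
}
```

With these helpers, a segment from 0 ms to 2000 ms containing "words" would be written as the example line in the request.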

Non-deterministic output regarding hyphenation

I transcribed a file and the result included hyphenation. After review of the transcription, it's apparent that hyphens indicate change in speaker. Somehow the model is able to distinguish different voices and separate the back-and-forth of conversation into statements that start with -.

I transcribed a different file, and the result did not have hyphenation.

It's unclear why I got hyphenation in some files, but not others.

Feature request: an option to save all formats

It would be nice to have an option to save all output formats at once (txt with timestamps and various subtitle formats) instead of just one at a time.

Stuck in audio capture

I was able to add a model to Whisper. On the next page I selected audio capture. However there seems to be no way to browse to a file instead. I hit back and it goes back to the model selection screen. Close exits the program.

[Edit: figured it out. It is "transcribe file". That sounds like "transcribe to file", i.e. output subtitles. I suggest rewording it to "load audio file".]

Unsafe build tags

I just updated the dependency to 1.2.0 and first time building produced this error: The package product 'whisper' cannot be used as a dependency of this target because it uses unsafe build flags.

Haven't received this on older versions, have you run into this with the latest build?

text output looping/repeating (until end)

Also mentioned here:
#23

I ran a few tests on longer video clips (e.g. 2 hours) and it mostly tends to repeat one sentence from a certain point until the end. E.g. after one hour, you see the same output repeated forever. The timestamps seem to indicate that new text was detected, but the text content is the same as before.

https://1drv.ms/u/s!AkS-A9Jqq09FgzEX78lvh7SiMAYu?e=C6f8GW

In this example, after about 2 minutes, the sentence repeats: "Jagt mich mal mit frahmen nudelholz".
The exact command that I use:
C:\dev\whisper\Whisper\x64\Release\main.exe -f C:\temp\test.wav -l de -m C:\temp\whisper\ggml-large.bin

I did different tests and cut portions out of some affected files. It turns out that it is not caused by the audio content itself: the affected area will transcribe just fine if I e.g. cut 1 minute before it, but if I leave 2 minutes before it, the problem happens.

In ContextImpl.cpp, I tried to catch "repeated text" by just copying the latest "text" to the heap as soon as it is complete (about line 740) and, before that, comparing whether the text is the same as last time. If yes, seek a little.
// lasttext is a std::string* that is initialized to nullptr before the loop
// and keeps the text of the previous segment between iterations
if (lasttext != nullptr && *lasttext == text) {
 logDebug(u8"last text repeated");
 seek += 100;
}

// remember the current text for the next comparison
delete lasttext;
lasttext = new std::string(text);

This seems to work around the issue (it still needs a lot of testing), but I am really not sure if this is the correct way to do it.
Also, a question: is it correct to do such workarounds there (there is another workaround a few lines above), or should the cause be searched for and fixed somewhere else? If so, where?
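
As a standalone illustration of the idea described in this issue, here is a minimal sketch of a repeat detector that can be tried outside ContextImpl.cpp. The class and its threshold are hypothetical and not part of the Whisper code base. Unlike the snippet above, it only reports a repeat after the same text has come back several times in a row, on the assumption that a speaker may legitimately say the same sentence twice.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not taken from ContextImpl.cpp.
class RepeatDetector
{
    std::string lastText;
    int repeats = 0;      // consecutive repeats seen after the first occurrence
    const int threshold;  // number of repeats that counts as "stuck"

public:
    explicit RepeatDetector(int maxRepeats) : threshold(maxRepeats)
    {
        assert(maxRepeats >= 1);
    }

    // Returns true when `text` has repeated `threshold` or more times in a row
    // after its first occurrence. Empty text never counts as a repeat.
    bool shouldSkip(const std::string& text)
    {
        if (!text.empty() && text == lastText)
            repeats++;
        else
        {
            repeats = 0;
            lastText = text;
        }
        return repeats >= threshold;
    }
};
```

A caller would create one detector per transcription run, call shouldSkip() with each completed segment text, and seek forward when it returns true. Whether seeking is the right reaction is exactly the open question in this issue.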

Support for multiple GPUs

My machine has a number of GPUs in it.

NVidia 1050
ATI Radeon RX 580
and the Intel UHD 630 on the CPU

I'm not sure what method your software uses to select the GPU, but it doesn't seem I can influence it at all.

Support multiple simultaneous outputs

Would be good to have support of simultaneous -otxt, -ovtt and -osrt as well as the regular text.
It is possible with the original CLI whisper.
