ccoreilly / vosk-browser
A speech recognition library running in the browser thanks to a WebAssembly build of Vosk
License: Apache License 2.0
I have downloaded the zip file from GitHub, but the examples are not working, either from local files or from the Glitch demo.
I keep getting "invalid base URL" when I try to load any of the example scripts.
Thanks for this great lib.
Can we use it with Wav2Vec or NeMo for online streaming?
Hello!
If I say "Hello" and then run the code below, I get the result "Hello".
this.recognizer.addEventListener('partialresult', this.getPartialResult);
this.recognizer.addEventListener('result', this.getResult);
Expected: the recognizer starts listening once the event listeners are added.
I am creating a push-to-talk feature in which users press a button and speak.
When the user doesn't need the microphone anymore, I thought to write it like this, but then the recognizer only pauses:
this.mediaStream.getAudioTracks().forEach((track) => {
  track.enabled = false;
});
So if the user presses my button again after a long time, the code sets track.enabled = true
and the recognizer continues recognizing the previous (not current) audio.
Tested on Vue.js
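One hedged alternative to toggling track.enabled (which leaves the audio graph running): disconnect the processing node while the button is released, so no stale audio is ever fed to the recognizer. This sketch assumes a `source`/`recognizerNode` pair as in the library's examples; the names are taken from those snippets.

```javascript
// Sketch: pause/resume recognition by disconnecting the audio graph
// instead of toggling track.enabled, so no stale audio is buffered.
// `source` is a MediaStreamAudioSourceNode and `recognizerNode` the
// processing node from the basic example.
function createPushToTalk(source, recognizerNode) {
  let listening = false;
  return {
    start() {
      if (!listening) {
        source.connect(recognizerNode); // audio flows to the recognizer again
        listening = true;
      }
    },
    stop() {
      if (listening) {
        source.disconnect(recognizerNode); // no chunks produced while paused
        listening = false;
      }
    },
    get listening() {
      return listening;
    },
  };
}
```

Here start()/stop() would be wired to the button's press and release events.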
The kaldi repo no longer has an upstream-1.8.0 branch nor a revision 75ecaef39 (thanks, git, for allowing erasing history). Right now, vosk-browser doesn't build because of these issues.
make builder is not running.
#23 [19/30] RUN curl --fail -q -L https://zlib.net/zlib-1.2.11.tar.gz | tar xz --strip-components=1
#23 sha256:466b94055ed6579c06b1fa768fc518b6b5ac4f293f9d91cbcb111eb106a2c522
#23 0.343 % Total % Received % Xferd Average Speed Time Time Time Current
#23 0.343 Dload Upload Total Spent Left Speed
0 315 0 0 0 0 0 0 --:--:-- 0:00:01 --:--:-- 0
#23 1.454 curl: (22) The requested URL returned error: 404
#23 1.457
#23 1.457 gzip: stdin: unexpected end of file
#23 1.460 tar: Child returned status 1
#23 1.460 tar: Error is not recoverable: exiting now
#23 ERROR: executor failed running [/bin/sh -c curl --fail -q -L https://zlib.net/zlib-1.2.11.tar.gz | tar xz --strip-components=1]: exit code: 2
------
> [19/30] RUN curl --fail -q -L https://zlib.net/zlib-1.2.11.tar.gz | tar xz --strip-components=1:
------
executor failed running [/bin/sh -c curl --fail -q -L https://zlib.net/zlib-1.2.11.tar.gz | tar xz --strip-components=1]: exit code: 2
make: *** [builder] Error 1
When the load button is pushed, the browser console gives the error:
Failed to sync file system: InvalidStateError: A mutation operation was attempted on a database that did not allow mutations. b28a5120-7502-42f6-9dd0-5e7a322b0752:117:21
error blob:https://ccoreilly.github.io/b28a5120-7502-42f6-9dd0-5e7a322b0752:117
handleMessage blob:https://ccoreilly.github.io/b28a5120-7502-42f6-9dd0-5e7a322b0752:166
and does not download the model. I tried both English and Catalan.
OS: Ubuntu 20.04
Hello, I am working on a way to pass an audio file to the recognizer all at once.
I took the react example and edited file-upload.tsx to send the whole file as a buffer to the AudioStreamer "_write" method.
The problem is that the recognizer's "result" event is not fired after processing.
The "partialresult" event is called with every word but misses timestamps.
Here is the implementation of the "onChange" function in file-upload.tsx:
const onChange = useCallback(
  async ({ file }: UploadChangeParam<UploadFile<any>>) => {
    if (recognizer && file.originFileObj && file.percent === 100) {
      const fileUrl = URL.createObjectURL(file.originFileObj);
      const _audioContext = audioContext ?? new AudioContext();
      const arr = await fetch(fileUrl).then((res) => res.arrayBuffer());
      _audioContext.decodeAudioData(arr, (buffer) => {
        const audioStreamer = new AudioStreamer(recognizer);
        audioStreamer._write(buffer, { objectMode: true }, () => {
          console.log('done');
        });
      });
    }
  },
  [audioContext, recognizer]
);
I have also noticed that when uploading a second file it works well: the result event is triggered and includes both files' data.
What am I missing? Is there a way to dispatch a "result" event?
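For what it's worth, one hedged workaround (not an official API of this library) is to feed a short stretch of silence after the file, so the recognizer detects an endpoint and fires its "result" event. `makeSilence`, `flushRecognizer`, and the one-second length are all assumptions for illustration.

```javascript
// Sketch of a workaround: after writing the file's buffer, feed a short
// stretch of silence so the recognizer detects an endpoint and emits
// "result". An AudioBuffer is zero-filled on creation, i.e. pure silence.
// `audioContext` and `recognizer` are the objects from the react example;
// the 1-second length is a guess and may need tuning.
function makeSilence(audioContext, seconds = 1) {
  const sampleRate = audioContext.sampleRate;
  return audioContext.createBuffer(1, Math.ceil(seconds * sampleRate), sampleRate);
}

function flushRecognizer(recognizer, audioContext) {
  recognizer.acceptWaveform(makeSilence(audioContext));
}
```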
How do I build the project and get a new demo address?
From the examples, it looks like the required sample rate is 44.1kHz or 48kHz (they both seem to generate accurate transcriptions, not sure which one is better). I tried setting 16kHz for the microphone, audio context, and recognizer, but the transcriptions were not valid at all. I thought the models work with 16kHz; is there a reason why this sample rate doesn't work? The poster of #48 mentioned having to update from 16k to 48kHz in order for the basic example to work.
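A hedged sketch of one way to remove the rate mismatch: the AudioContext constructor accepts a sampleRate option, so the context itself can be asked for 16 kHz instead of relying on the getUserMedia constraint (which browsers may ignore), and the recognizer can be given whatever rate the context actually reports. Whether the model then transcribes well is a separate question; this only makes the rates agree. `createPipeline` is a hypothetical helper.

```javascript
// Sketch: force the AudioContext to 16 kHz and tell the recognizer the
// same rate, so capture, processing, and recognition all agree.
// `model` is a loaded vosk-browser model as in the examples; whether a
// given browser honors the requested rate should be verified by reading
// audioContext.sampleRate back, which is why that value is used below.
function createPipeline(model, desiredRate = 16000) {
  const audioContext = new AudioContext({ sampleRate: desiredRate });
  // Use the rate the browser actually gave us, not the one we asked for.
  const recognizer = new model.KaldiRecognizer(audioContext.sampleRate);
  return { audioContext, recognizer };
}
```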
Does it support keyword spotting mode? I really need this...
The type for ServerMessageResult
hints at timing information being available from Vosk for words.
export interface ServerMessageResult {
  event: "result";
  recognizerId: string;
  result: {
    result: Array<{ // This is maybe where I could find word timing.
      conf: number;
      start: number;
      end: number;
      word: string;
    }>;
    text: string;
  };
}
...but the message received in the result callback doesn't have a result.result.* value.
Is there some way to get the timing info? I would do wonderful things with it.
Big fan of vosk-web. Thanks, Ciaran!
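Since another snippet in this thread calls recognizer.setWords(true) before processing, a hedged sketch of wiring that up; whether it populates result.result in the browser build is exactly what this issue is asking. `collectWordTimings` is a hypothetical helper.

```javascript
// Sketch: enable per-word output before feeding audio. setWords(true)
// appears in other snippets in this thread; with it on, the "result"
// message should carry result.result entries with start/end times.
function collectWordTimings(recognizer, onWord) {
  recognizer.setWords(true); // ask Vosk for word-level timing
  recognizer.on('result', (message) => {
    for (const w of message.result.result ?? []) {
      onWord(w); // { conf, start, end, word }
    }
  });
}
```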
When I run make:
#7 [ 4/30] RUN git clone -b vosk --single-branch https://github.com/alphacep/kaldi . && git checkout 6417ac1dece94783e80dfbac0148604685d27579
#7 sha256:d72b762a9137ae3da9126377d52f4ac1e5fb4134afc31851f4a093636254bbdc
#7 0.455 Cloning into '.'...
Updating files: 100% (8265/8265), done.
#7 15.75 fatal: reference is not a tree: 6417ac1dece94783e80dfbac0148604685d27579
#7 ERROR: executor failed running [/bin/sh -c git clone -b vosk --single-branch https://github.com/alphacep/kaldi . && git checkout 6417ac1dece94783e80dfbac0148604685d27579]: exit code: 128
I think commit 6417ac1dece94783e80dfbac0148604685d27579 was removed.
Hello!
Just wanted to say that I love what you're doing here! ❤️ The demo is amazing, and I can't wait to see how this project pans out.
A while ago, I proposed creating a FLOSS version of Otter.ai and Sonix over at Open Source Ideas:
open-source-ideas/ideas#288
I'm not sure if this is what you're envisioning for this project, but it would be interesting to have the ability to play back the audio and have the playback timed with the transcript. Clicking on a word could also toggle the playback to that point. (see Demo #6 of AblePlayer)
Additionally, the ability to edit and export the text would be helpful for people who use transcriptions (for closed captioning, research, journalism, meeting minutes, etc.)
Hi,
Thanks for this work. I am using Chrome, and the model file model.tar.gz is placed in the same folder. It never moves past the "Loading..." message!
How do you start the demo locally?
I navigated to the modern-vanilla
directory and launched python3 -m http.server
Can you share the output of the browser console?
ERROR (VoskAPI:Model():src/model.cc:122) Folder '/vosk/model_tar_gz' does not contain model files. Make sure you specified the model path properly in Model constructor. If you are not sure about relative path, use absolute path specification.
put_char @ 82049aad-16de-4cf3-9fcf-0c277f01fe02:41
I used the react code, and it never calls this piece of code. Is there a parameter to set to enable collecting results?
recognizer.on("result", (message: any) => {
  const { result } = message;
  setUtterances((utt: VoskResult[]) => [...utt, result]);
});
The demo address only works in Firefox for me. Is there any way to make it work in Microsoft Edge? @ccoreilly
I am new to javascript. I want to see how the vosk-browser script worked using the sample script.
I downloaded a Vosk model, repackaged it as tar.gz, and put it in the same folder as the script. I tried to just check for errors using a button onclick event on an HTML page. I got this in Visual Studio Code:
Setting up persistent storage at /vosk null/4ccd8af6-9ac1-407c-9f6a-436d83146d69:147
File system synced from host to runtime null/4ccd8af6-9ac1-407c-9f6a-436d83146d69:40
Am I supposed to create a folder named "vosk"? I really do not understand.
Thank you for responding.
Hi.
While running the code at https://github.com/ccoreilly/vosk-browser/tree/gh-pages using Firefox and a local server, the index.html file would not run. I inspected the source code; the relative path of the model was "vosk-browser\models\vosk-model-small-en-us-0.15.tar.gz", which I corrected to "models\vosk-model-small-en-us-0.15.tar.gz".
After I corrected the code in index.html, it ran and generated speech-to-text from the given audio file. (Direct audio through the mic still didn't work: the mic itself worked, but all it did was send audio chunks and never display the results in the textarea.)
I am unable to modify the code further. How do I modify it? Can you provide the source code?
Thanks.
I used the vosk-browser in one of my webpack projects and it throws these errors.
Module not found: Error: Can't resolve 'worker_threads' in ....
Steps I followed:
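In case it helps, a common webpack 5 workaround for Node-only modules like worker_threads being pulled in from browser bundles is to stub them out via resolve.fallback. This is a general webpack technique, not something documented for vosk-browser specifically, so treat it as an assumption.

```javascript
// webpack.config.js — sketch of a generic webpack 5 workaround.
// `worker_threads: false` tells webpack to replace the Node-only module
// with an empty module instead of failing the build.
module.exports = {
  // ...your existing configuration...
  resolve: {
    fallback: {
      worker_threads: false, // stub out the Node-only module
    },
  },
};
```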
Not sure if this is of any use, but I created a small online demo using this tool when I was experimenting with it. You can view it online at
https://captioner.richardson.co.nz/
And the source code for it is at: https://github.com/Rodeoclash/captioner
It might be possible to adapt this for an official demo if you're interested (although it is lacking a few things at the moment, i.e. it only works on video and currently the videos have no audio when playing).
Error in the example when using Google Chrome.
async function init() {
  const model = await Vosk.createModel('vosk-model-small-ru-0.15.tar.gz');
  const recognizer = new model.KaldiRecognizer();
  recognizer.on("result", (message) => {
    console.log(`Result: ${message.result.text}`);
    TTS(message.result.text);
  });
  recognizer.on("partialresult", (message) => {
    console.log(`Partial result: ${message.result.partial}`);
  });

  const mediaStream = await navigator.mediaDevices.getUserMedia({
    video: false,
    audio: {
      echoCancellation: true,
      noiseSuppression: true,
      channelCount: 1,
      sampleRate: 16000
    },
  });

  const audioContext = new AudioContext();
  const recognizerNode = audioContext.createScriptProcessor(4096, 1, 1);
  recognizerNode.onaudioprocess = (event) => {
    try {
      recognizer.acceptWaveform(event.inputBuffer);
    } catch (error) {
      console.error('acceptWaveform failed', error);
    }
  };
  const source = audioContext.createMediaStreamSource(mediaStream);
  source.connect(recognizerNode);
}

window.onload = init;
First thanks for an amazing contribution.
Second, when trying to build (npm run build), I get an error that recognizer.tsx can't find vosk-browser. I went to node_modules/vosk-browser and ran npm build, which solved the first issue but led to others.
Any ideas?
Thanks again!
Hello,
I want to understand why it does not work properly on Chrome and what I can do to change this situation or work around it.
Hi everybody,
I just saw this project and thought it was very interesting and fits quite well with a library I've just released 🙂 .
For my SEPIA Open Assistant project I've built the SEPIA Web Audio Library that can handle custom audio pipelines with AudioWorklet and Web-Worker support. There is pretty good WASM support as well since the resampler for example can use Speex via a WASM module.
The library has a module that interfaces with Vosk via the SEPIA STT-Server (a WebSocket streaming STT server). Currently I prefer to host Vosk on a Raspberry Pi 4 instead of running it on the client, but I'm pretty sure much of the code could be reused 😃 .
Let me know if this sounds interesting to you and I can help to get started!
First of all, excellent work. Vosk is great as it is, and this library makes it even better.
I am experiencing a heavy delay on transcription when pulling in a stream from webRTC (partials and fulls).
I suspect maybe it is because of the deprecated "createScriptProcessor" and "onaudioprocess" pieces, but I am unsure.
Here is how I am processing things. If you have any ideas as to why things would be delayed, please let me know. Thank you.
this.recognizeSpeech = async () => {
  console.log("starting recognizeSpeech");
  let audioContext = this.remoteAudioContext;
  let remoteStream = this.incomingAudioStream;
  //
  const recognizerNode = audioContext.createScriptProcessor(4096, 1, 1);
  const model = await createModel("./softphone/model.tar.gz");
  const recognizer = new model.KaldiRecognizer(48000);
  recognizer.setWords(true);
  recognizer.on("partialresult", function (message) {
    console.log("PARTIAL: " + message.result.partial);
  });
  recognizerNode.onaudioprocess = async (event) => {
    try {
      recognizer.acceptWaveform(event.inputBuffer);
    } catch (error) {
      console.error("acceptWaveform failed", error);
    }
  };
  this.remoteTrack.connect(recognizerNode).connect(audioContext.destination);
};
When I run the following code:
let Vosk = require("vosk-browser");
let url = "model.tar.gz";

async function init() {
  const model = await Vosk.createModel(url);
}

init();
I get this error:
this.worker.addEventListener("message", (event) => this.handleMessage(event));
^
TypeError: this.worker.addEventListener is not a function
at EventTarget.initialize (/Users/bobby/Desktop/vosk-browser/node_modules/vosk-browser/dist/vosk.js:238:25)
at new Model (/Users/bobby/Desktop/vosk-browser/node_modules/vosk-browser/dist/vosk.js:235:18)
at Object.<anonymous> (/Users/bobby/Desktop/vosk-browser/node_modules/vosk-browser/dist/vosk.js:354:27)
at Generator.next (<anonymous>)
at /Users/bobby/Desktop/vosk-browser/node_modules/vosk-browser/dist/vosk.js:28:75
at new Promise (<anonymous>)
at __awaiter (/Users/bobby/Desktop/vosk-browser/node_modules/vosk-browser/dist/vosk.js:24:16)
at Object.createModel (/Users/bobby/Desktop/vosk-browser/node_modules/vosk-browser/dist/vosk.js:353:16)
at init (/Users/bobby/Desktop/vosk-browser/index.js:5:28)
at Object.<anonymous> (/Users/bobby/Desktop/vosk-browser/index.js:7:1)
My folder structure is
|
|-- index.js
|-- model.tar.gz
|-- node_modules/
So I would think the program could load the model, but I also get the same error when I set the url to complete gibberish.
Thanks for your help
I have downloaded a Ukrainian model and repackaged it as tar.gz, but I couldn't load it in the browser. I looked into the other archives and noticed they have a file of unknown type corresponding to each folder in the archive. Is there a way to add a new language?
Thank you.
I have trained a Malayalam ASR model for Vosk, and it is available here. Can it be added to the demo website? Is there a way I can help?
The NOTICES file doesn't include all dependent software, but every piece of dependent software requires attribution. This makes it extremely difficult for anyone to put together a correct (and legally mandatory) attribution and license notice. I put this one together, which I believe includes all dependencies: https://raw.githubusercontent.com/Yahweasel/ennuicastr/master/src/vosk-browser-license.js .
Moreover, I was surprised to find GSL in the mix. GSL is under the GPL (not the LGPL), so if it's being used, then vosk-browser as a whole is licensed under the GPL. That's no problem for my use, but it should be documented somewhere. Weirdly, though, as far as I can tell, it's not actually using GSL. The kaldi patch seems to add GSL to the configure, but doesn't add any uses of GSL as far as I can tell. If it was some experiment (perhaps from the original porter of vosk?) it should just be removed, to fix this licensing snafu.
Looks like createScriptProcessor is deprecated. https://developer.mozilla.org/en-US/docs/Web/API/BaseAudioContext/createScriptProcessor
Can we get an updated version that uses a modern technique?
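A minimal sketch of what the AudioWorklet-based replacement could look like. The processor name, file name, and message shape are made up for illustration, and the raw samples still need to be handed to the recognizer on the main thread (vosk-browser's acceptWaveform takes an AudioBuffer in the examples above, so a conversion step would be needed in onSamples).

```javascript
// Sketch of the AudioWorklet replacement for createScriptProcessor.
// recognizer-processor.js (loaded via audioWorklet.addModule):
//
//   class RecognizerProcessor extends AudioWorkletProcessor {
//     process(inputs) {
//       const channel = inputs[0][0];
//       if (channel) this.port.postMessage(channel.slice());
//       return true; // keep the processor alive
//     }
//   }
//   registerProcessor('recognizer-processor', RecognizerProcessor);

// Main thread: forward each Float32Array of samples to `onSamples`.
async function attachWorklet(audioContext, mediaStream, onSamples) {
  await audioContext.audioWorklet.addModule('recognizer-processor.js');
  const node = new AudioWorkletNode(audioContext, 'recognizer-processor');
  node.port.onmessage = (event) => onSamples(event.data); // Float32Array
  audioContext.createMediaStreamSource(mediaStream).connect(node);
  return node;
}
```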
Hi, I’ve noticed Vosk waits a split second after the user is done talking before emitting a result, unlike partial results, which are fired continuously but aren’t as reliable. Since our application only responds to short, single-word commands, we’d like to reduce Vosk’s result “de-bounce” time to make our application feel more responsive. Do you have any suggestions?
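One possible workaround, under the assumption that you act on partial results yourself: match partials against your known command list and fire once the partial has been stable for a short, app-controlled interval. `makeCommandMatcher` is a hypothetical helper; this trades Vosk's own endpointing for a tunable delay.

```javascript
// Sketch: react to partial results directly and accept a command once
// the partial text has matched a known command and stayed stable for
// `quietMs` milliseconds. Tune quietMs to taste.
function makeCommandMatcher(commands, onCommand, quietMs = 250) {
  let last = '';
  let timer = null;
  return function onPartial(text) {
    if (text === last) return; // nothing new in this partial
    last = text;
    clearTimeout(timer);       // partial changed: restart the clock
    if (commands.includes(text)) {
      timer = setTimeout(() => onCommand(text), quietMs);
    }
  };
}
```

Wired up as something like recognizer.on('partialresult', (m) => onPartial(m.result.partial)).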
Hello, I am trying to reproduce the work done by arbdevml for issue #49.
I forked the repository.
I have docker and make installed.
I reproduced the described steps:
make builder
I got the following output:
#7 [ 4/30] RUN git clone -b vosk --single-branch https://github.com/alphacep/kaldi . && git checkout 6417ac1dece94783e80dfbac0148604685d27579
#7 sha256:d72b762a9137ae3da9126377d52f4ac1e5fb4134afc31851f4a093636254bbdc
#7 0.474 Cloning into '.'...
#7 19.32 fatal: reference is not a tree: 6417ac1dece94783e80dfbac0148604685d27579
#7 ERROR: executor failed running [/bin/sh -c git clone -b vosk --single-branch https://github.com/alphacep/kaldi . && git checkout 6417ac1dece94783e80dfbac0148604685d27579]: exit code: 128
------
> [ 4/30] RUN git clone -b vosk --single-branch https://github.com/alphacep/kaldi . && git checkout 6417ac1dece94783e80dfbac0148604685d27579:
------
executor failed running [/bin/sh -c git clone -b vosk --single-branch https://github.com/alphacep/kaldi . && git checkout 6417ac1dece94783e80dfbac0148604685d27579]: exit code: 128
I think the kaldi project was updated and the git hash no longer exists.
I checked the rest of the Dockerfile and saw that a clone of an Inria repository is needed, but that repository seems not to be accessible.
Can someone help me? I really want to contribute and get the voice fingerprinting feature with the spk model, which I have already experimented with on the Python distribution.
I am getting the following error in both Chrome and Firefox...
Failed to sync file system: Error: FS error
(anonymous) @ fcedf841-34f4-40cb-8bb0-17f857a1d44c:127
Promise.catch (async)
handleMessage @ fcedf841-34f4-40cb-8bb0-17f857a1d44c:126
(anonymous) @ fcedf841-34f4-40cb-8bb0-17f857a1d44c:107
fcedf841-34f4-40cb-8bb0-17f857a1d44c:127 links to the following code:
class RecognizerWorker {
  constructor() {
    this.recognizers = new Map();
    ctx.addEventListener("message", (event) => this.handleMessage(event));
  }

  handleMessage(event) {
    const message = event.data;
    if (!message) {
      return;
    }
    if (ClientMessage.isLoadMessage(message)) {
      console.debug(JSON.stringify(message));
      const { modelUrl } = message;
      if (!modelUrl) {
        ctx.postMessage({
          error: "Missing modelUrl parameter",
        });
      }
      this.load(modelUrl)
        .then((result) => {
          ctx.postMessage({ event: "load", result });
        })
        .catch((error) => { // --- IT'S THIS ERROR THAT IS CATCHING
          console.error(error);
          ctx.postMessage({ error: error.message });
        });
      return;
    }
    // ... etc
Do let me know if more details to reproduce the error are needed.
Thank you!!
Hello team!
I am testing your software using apache2 on a virtual machine running Ubuntu Server on Windows.
This is the index page; I was trying to test the microphone input.
<script type="application/javascript" src="https://cdn.jsdelivr.net/npm/[email protected]/dist/vosk.js"></script>

<script>
async function init() {
  const model = await Vosk.createModel('https://ccoreilly.github.io/vosk-browser/models/vosk-model-small-en-us-0.15.tar.gz');

  const recognizer = new model.KaldiRecognizer();
  recognizer.on("result", (message) => {
    console.log(`Result: ${message.result.text}`);
  });
  recognizer.on("partialresult", (message) => {
    console.log(`Partial result: ${message.result.partial}`);
  });

  const mediaStream = await navigator.mediaDevices.getUserMedia({
    video: false,
    audio: {
      echoCancellation: true,
      noiseSuppression: true,
      channelCount: 1,
      sampleRate: 16000
    },
  });

  const audioContext = new AudioContext();
  const recognizerNode = audioContext.createScriptProcessor(4096, 1, 1);
  recognizerNode.onaudioprocess = (event) => {
    try {
      recognizer.acceptWaveform(event.inputBuffer);
    } catch (error) {
      console.error('acceptWaveform failed', error);
    }
  };
  const source = audioContext.createMediaStreamSource(mediaStream);
  source.connect(recognizerNode);
}

window.onload = init;
</script>

Hola!
The result in the console is a repetition of the following lines:
Recognizer (id: d6562c55-8db5-4918-9c65-fc0d1f061ff2): Sending audioChunk vosk.js:333:29
Recognizer (id: d6562c55-8db5-4918-9c65-fc0d1f061ff2): process audio chunk with sampleRate 192000 94bb588b-5609-4bb8-bd34-b6f9f1c4968e:269:25
Recognizer (id: d6562c55-8db5-4918-9c65-fc0d1f061ff2): process audio chunk with sampleRate 192000 94bb588b-5609-4bb8-bd34-b6f9f1c4968e:269:25
Recognizer not ready, ignoring 94bb588b-5609-4bb8-bd34-b6f9f1c4968e:271:29
Recognizer not ready, ignoring 94bb588b-5609-4bb8-bd34-b6f9f1c4968e:271:29
Recognizer (id: d6562c55-8db5-4918-9c65-fc0d1f061ff2): Sending audioChunk vosk.js:333:29
Recognizer (id: d6562c55-8db5-4918-9c65-fc0d1f061ff2): process audio chunk with sampleRate 192000 94bb588b-5609-4bb8-bd34-b6f9f1c4968e:269:25
Recognizer not ready, ignoring 94bb588b-5609-4bb8-bd34-b6f9f1c4968e:271:29
Recognizer not ready, ignoring 94bb588b-5609-4bb8-bd34-b6f9f1c4968e:271:29
Could you please help me?
Thanks in advance
How does the examples demo work? Please give me some advice, thank you.
I am currently using Vue.js to run vosk-browser and manage to call the ASR model and Kaldi recognizer by using:
this.recognizer.on("result", (message) => {
  const result = message.result;
  this.full.textContent += result.text + " ";
})
The model is working well; however, I am trying to remove the event listener by using:
this.recognizer.removeEventListener("result", (message) => {
  const result = message.result;
  this.full.textContent += result.text + " ";
})
Is this the way of doing it?
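Probably not: removeEventListener only removes a listener when it receives the same function reference that was added, and an inline arrow function creates a new object on every call, so it never matches. A sketch of keeping the reference around (`subscribe` is a hypothetical helper; `outputEl` stands in for this.full):

```javascript
// Sketch: register a named handler and return an unsubscribe function
// that passes the *same* reference back to removeEventListener.
function subscribe(recognizer, outputEl) {
  const handleResult = (message) => {
    outputEl.textContent += message.result.text + ' ';
  };
  recognizer.addEventListener('result', handleResult);
  return () => recognizer.removeEventListener('result', handleResult);
}
```

In a Vue component you would call subscribe() in mounted() and the returned unsubscribe function in beforeUnmount().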
Hello.
First of all very big thank you for this project.
I am trying to create an example with a speaker model
to get the X-vector of the speaker (voice fingerprint).
I am using this example: https://github.com/ccoreilly/vosk-browser/blob/master/examples/words-vanilla/index.js
const model = await Vosk.createModel('vosk-model-small-en-in-0.4.tar.gz');
const speakerModel = await Vosk.createSpeakerModel('vosk-model-spk-0.4.zip');
...
const recognizer = new model.KaldiRecognizer(sampleRate, JSON.stringify(['[unk]', 'encen el llum', 'apaga el llum']));
recognizer.setSpkModel(speakerModel);
recognizer.on("result", (message) => {
  const result = message.result;
  if (result.hasOwnProperty('spk'))
    console.info("X-vector:", result.spk);
});
Speaker identification model:
https://alphacephei.com/vosk/models/vosk-model-spk-0.4.zip
Node.js example:
https://github.com/alphacep/vosk-api/blob/master/nodejs/demo/test_speaker.js
Could you offer some advice, please?
Hi Ciaran, great work, the web demo is very impressive!
Is tar.gz the correct file format that this library expects? I cannot get the basic demo above to load. Does it need something else, like an extracted version of the language library?
I spent hours trying to figure out what is wrong; it seems to have issues loading the library.
This is the error I get when trying to run it in Chrome (also tried Opera and Firefox):
cb15bbf8-7209-4922-9c12-0b9e258dbd24:127 Error: HTTP error! status: 404
at cb15bbf8-7209-4922-9c12-0b9e258dbd24:41:4212557
at Generator.next ()
at loop (cb15bbf8-7209-4922-9c12-0b9e258dbd24:41:4211624)
at cb15bbf8-7209-4922-9c12-0b9e258dbd24:41:4211805
The directory I run it in (a Linux webserver running Apache) has this file (English):
vosk-model-small-en-us-0.15.tar.gz
and I updated the index.js demo to call it:
from:
const model = await Vosk.createModel('model.tar.gz');
to
const model = await Vosk.createModel('vosk-model-small-en-us-0.15.tar.gz');
I am really excited about getting this to work, so would really appreciate your help with any basic demo anyone could run locally.
thanks!
Emerson
This is an awesome project. I was wondering if it is possible to extract the text from a file in less time than the duration of the file? When testing the hosted demo, the text appears to be extracted at about the same rate as if it came from a live source.
There's so much that is new and unfamiliar to me that I'm having a hard time understanding the code. I created a gist with my attempt to make it work. It covers the beforeUpload function on the Upload component; it extracts text, but I can't quite tell if it's any faster.
Assuming the extraction can be done faster, any idea on how to get an approximate timestamp in the audio file?
A page or script is accessing at least one of navigator.userAgent, navigator.appVersion, and navigator.platform. Starting in Chrome 101, the amount of information available in the User Agent string will be reduced.
To fix this issue, replace the usage of navigator.userAgent, navigator.appVersion, and navigator.platform with feature detection, progressive enhancement, or migrate to navigator.userAgentData.
Note that for performance reasons, only the first access to one of the properties is shown
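The warning's own suggestion can be sketched as a small helper that prefers navigator.userAgentData and falls back to the legacy field. `nav` is injected here only to keep the example self-contained; in a page you would pass the global navigator.

```javascript
// Sketch: prefer the structured navigator.userAgentData when present
// and fall back to the legacy field, instead of reading navigator
// properties unconditionally.
function getPlatform(nav) {
  if (nav.userAgentData && nav.userAgentData.platform) {
    return nav.userAgentData.platform; // modern, low-entropy API
  }
  return nav.platform || 'unknown';    // legacy fallback
}
```

Usage in a page: getPlatform(navigator).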
I have little coding experience, but I followed all the guidelines to launch the demo app from the examples/react folder. I ran npm install, npm build, and a few other commands to resolve errors for webpack 5. However, when I finally ran npm run start, the vosk-browser demo failed to launch even though no errors were detected; the page is empty.
C:\Users\CNata\Downloads\vosk-browser-master\examples\react>npm run start
> [email protected] start
> react-scripts start
(node:17120) [DEP_WEBPACK_DEV_SERVER_ON_AFTER_SETUP_MIDDLEWARE] DeprecationWarning: 'onAfterSetupMiddleware' option is deprecated. Please use the 'setupMiddlewares' option.
(Use `node --trace-deprecation ...` to show where the warning was created)
(node:17120) [DEP_WEBPACK_DEV_SERVER_ON_BEFORE_SETUP_MIDDLEWARE] DeprecationWarning: 'onBeforeSetupMiddleware' option is deprecated. Please use the 'setupMiddlewares' option.
Starting the development server...
Compiled successfully!
You can now view vosk-browser-react-demo in the browser.
Local: http://localhost:3000/vosk-browser
On Your Network: http://192.168.56.1:3000/vosk-browser
Note that the development build is not optimized.
To create a production build, use npm run build.
webpack compiled successfully
Files successfully emitted, waiting for typecheck results...
Issues checking in progress...
No issues found.
Please help me.
Using vosk-browser prompts this error:
Module not found: Can't resolve imported dependency "worker_threads"
Did you forget to install it? You can run: npm install --save worker_threads
App • WARNING • Compilation succeeded but there are warning(s). Please check the log above.
Hi,
First of all thanks for this wonderful package I really enjoy using it and find it super useful.
I'm working on a project which utilizes vosk-browser, and noticed that the KaldiRecognizer.on method only supports two types of events, result and partialresult, as part of the TS definitions.
While browsing the worker.ts code I noticed that the processAudioChunk method also handles an error. This error is then caught (worker.ts:72) and emitted back to the model.
I want to subscribe to these errors, but the current KaldiRecognizer.on interface only allows result and partialresult as input.
This is a small change and I can contribute it myself if you would allow me to do so.
vosk-browser/lib/src/interfaces.ts (line 128 in c7877a6):
export type RecognizerMessage =
  | ServerMessagePartialResult
  | ServerMessageResult
  | ServerMessageError;
Updated:
Another issue is that when I tried to reproduce an error during the processAudioChunk method execution, the result from this method is an error object { error: errorMessage }. I believe it should probably throw an error instead of returning an object, so the catch in the handleMessage method will build the correct error object. The returned error object doesn't contain an event field, so it is dispatched from the model as undefined and can't be caught by consumers of the npm module.
Line 66 in c7877a6
There is also the createModel method, which doesn't resolve the promise in case of fetch failure.
Regards,
Barak
Software Engineer @ Microsoft
I am able to get the build to complete (when using the modification made in #56), but I cannot find the output files. I run the build by running make in the vosk-browser directory. Where does the build output its files? Do the output files need to be manually extracted from the Docker container?