asticode / go-astideepspeech
Golang bindings for Mozilla's DeepSpeech speech-to-text library
License: MIT License
It's great that there is a Go binding, but it seems to target v0.1.0. We have made a lot of changes since then: v0.1.1 is available, and the current master branch has plenty of new features as well.
Do you have any plans to push updates? How can we help?
When trying to build the example program with the default models and audio I get the following errors:
# github.com/asticode/go-astideepspeech
deepspeech.cpp:30:13: error: 'Metadata' does not name a type
Metadata* sttWithMetadata(const short* aBuffer, unsigned int aBufferSize, unsigned int aSampleRate)
^~~~~~~~
deepspeech.cpp:60:5: error: 'Metadata' does not name a type
Metadata* STTWithMetadata(ModelWrapper* w, const short* aBuffer, unsigned int aBufferSize, int aSampleRate)
^~~~~~~~
deepspeech.cpp:65:36: error: 'Metadata' was not declared in this scope
double Metadata_GetProbability(Metadata* m)
^~~~~~~~
deepspeech.cpp:65:46: error: 'm' was not declared in this scope
double Metadata_GetProbability(Metadata* m)
^
deepspeech.cpp:70:30: error: 'Metadata' was not declared in this scope
int Metadata_GetNumItems(Metadata* m)
^~~~~~~~
deepspeech.cpp:70:40: error: 'm' was not declared in this scope
int Metadata_GetNumItems(Metadata* m)
^
deepspeech.cpp:75:5: error: 'MetadataItem' does not name a type
MetadataItem* Metadata_GetItems(Metadata* m)
^~~~~~~~~~~~
deepspeech.cpp:80:37: error: 'MetadataItem' was not declared in this scope
char* MetadataItem_GetCharacter(MetadataItem* mi)
^~~~~~~~~~~~
deepspeech.cpp:80:51: error: 'mi' was not declared in this scope
char* MetadataItem_GetCharacter(MetadataItem* mi)
^~
deepspeech.cpp:85:34: error: 'MetadataItem' was not declared in this scope
int MetadataItem_GetTimestep(MetadataItem* mi)
^~~~~~~~~~~~
deepspeech.cpp:85:48: error: 'mi' was not declared in this scope
int MetadataItem_GetTimestep(MetadataItem* mi)
^~
deepspeech.cpp:90:37: error: 'MetadataItem' was not declared in this scope
float MetadataItem_GetStartTime(MetadataItem* mi)
^~~~~~~~~~~~
deepspeech.cpp:90:51: error: 'mi' was not declared in this scope
float MetadataItem_GetStartTime(MetadataItem* mi)
^~
deepspeech.cpp:125:13: error: 'Metadata' does not name a type
Metadata* finishStreamWithMetadata()
^~~~~~~~
deepspeech.cpp:160:5: error: 'Metadata' does not name a type
Metadata* FinishStreamWithMetadata(StreamWrapper* sw)
^~~~~~~~
deepspeech.cpp: In function 'void FreeString(char*)':
deepspeech.cpp:167:9: error: 'DS_FreeString' was not declared in this scope
DS_FreeString(s);
^~~~~~~~~~~~~
deepspeech.cpp:167:9: note: suggested alternative: 'FreeString'
DS_FreeString(s);
^~~~~~~~~~~~~
FreeString
deepspeech.cpp: At global scope:
deepspeech.cpp:170:23: error: variable or field 'FreeMetadata' declared void
void FreeMetadata(Metadata* m)
^~~~~~~~
deepspeech.cpp:170:23: error: 'Metadata' was not declared in this scope
deepspeech.cpp:170:33: error: 'm' was not declared in this scope
void FreeMetadata(Metadata* m)
It appears that the 0.4.0 version of the DeepSpeech native header that's linked to in the readme does not define a type called Metadata; however, the latest version of the header here does.
I went through the commit history but can't find anywhere the Metadata type was removed for 0.4.0. Did you perhaps link to the wrong version of the DeepSpeech native client in the readme?
Hi, I just want to transcribe PCM data to text in real time. The PCM data is decoded by ffmpeg from a live stream, but I can't get a result successfully. Can you fix it? Here is the code: it feeds the PCM data continuously and prints the transcription every 5 seconds:
var stream *astideepspeech.Stream

// detectVoice lazily creates the model and stream, then feeds one chunk of audio.
func detectVoice(sample []byte) {
	if stream == nil {
		m, err := astideepspeech.New(model)
		if err != nil {
			fmt.Println(fmt.Sprintf("Failed creating model: %v", err))
			return
		}
		if err := m.SetBeamWidth(beamWidth); err != nil {
			fmt.Println(fmt.Sprintf("Failed setting beam width: %v", err))
			return
		}
		if err := m.EnableExternalScorer(scorer); err != nil {
			fmt.Println(fmt.Sprintf("Failed enabling external scorer: %v", err))
			return
		}
		if err := m.SetScorerAlphaBeta(alpha, beta); err != nil {
			fmt.Println(fmt.Sprintf("Failed setting scorer hyperparameters: %v", err))
			return
		}
		stream, err = m.NewStream()
		if err != nil {
			fmt.Println(fmt.Sprintf("Failed create stream: %v", err))
			return
		}
	}
	// Each byte is widened to one int16 sample here, as in the original post.
	var d []int16
	for _, v := range sample {
		d = append(d, int16(v))
	}
	stream.FeedAudioContent(d)
}

func init() {
	fmt.Println("get stt result in 5 seconds..........")
	go func() {
		var ch chan int
		ticker := time.NewTicker(time.Second * 5)
		go func() {
			// Print the intermediate transcription on every tick.
			for range ticker.C {
				if stream != nil {
					result, err := stream.IntermediateDecode()
					if err != nil {
						fmt.Println(fmt.Sprintf("Failed converting speech to text: %v", err))
						return
					}
					fmt.Println("result: ", result)
				}
			}
			ch <- 1
		}()
		<-ch
	}()
}
Hello, I am running this lib against deepspeech.so v0.9 with no problems so far :)
Feel free to close this; I just wanted to share it, as the readme says 0.8.
Thanks for this lib, it is amazing!
Mozilla's DeepSpeech has two installation methods, and one of them is pip3 install deepspeech-gpu, which uses the GPU for the transcription engine. Does this Go binding offer the same by default, or does it require a special setup?
Hi,
I followed your README.md and got the following error when installing astideepspeech in the /tmp/deepspeech directory. Please assist. Many thanks.
$ go get -u github.com/asticode/go-astideepspeech/...
go: finding github.com/cryptix/wav latest
go: finding github.com/cheekybits/is latest
github.com/asticode/go-astideepspeech
ld: library not found for -ldeepspeech
clang: error: linker command failed with exit code 1 (use -v to see invocation)
We have released 0.8; adding support for it in this binding might be a good thing. I might be able to send a PR.
We recently merged a PR that exposes new information, via new API calls: mozilla/DeepSpeech@a009361#diff-0317a0e76ece10e0dba742af310a2362
This allows access to timing information. We have not yet updated our own bindings to expose this; I can likely try to take care of that here as well.
Hello,
We will have 0.4.0 soon. Do you still maintain these bindings? We have had a streaming API for some time now, and it's quite useful.
We made some (breaking) changes to the API; this needs to be reflected here.
Hello @asticode we have a newer version available, with some API changes :)
Hello,
1.0 is close, and part of the work for that involves one painful change: we need to rename the project to Mozilla Voice STT.
It also means the library itself and the API need to be renamed: libdeepspeech.so -> libmozilla_voice_stt.so, and the DS_* API becomes STT_*.
Besides this renaming, there should be no other change. I can help and prepare a PR to update once we have completed the painful renaming on our side (CI, packages everywhere).
For clarity, it might be good if you could rename your binding as well, but we do not want to force you.
To be in sync with upstream mozilla/DeepSpeech@fa7cb1a