Comments (2)
We could make the OOV score configurable if that would be helpful for you. As you pointed out, its purpose is to more or less only allow in-vocabulary words to be emitted unless the acoustic model is VERY confident. With a strong word-level language model, this tends to work well to reduce EER. You are correct that it hurts performance on things like names. If that's a use case for you, we could also look into adding a configurable list of unigrams whose probabilities would be boosted (i.e. a set of names, etc.), or you could train a more tailored language model.
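To make the idea concrete, here is a minimal sketch of how a configurable OOV score and a boosted unigram list could interact when scoring a single word. This is not ctcdecode's API: `word_log_score`, `unigram_logprobs`, `boosts`, and the default `oov_score` value are all hypothetical names and values chosen for illustration.

```python
import math


def word_log_score(word, unigram_logprobs, oov_score=-1000.0, boosts=None):
    """Return a log score for `word` (illustrative only, not ctcdecode code).

    unigram_logprobs: dict mapping in-vocabulary words to LM log probabilities.
    oov_score: penalty used for words missing from the vocabulary (assumed value).
    boosts: optional dict mapping words (e.g. names) to a minimum log score.
    """
    boosts = boosts or {}
    # In-vocabulary words use the LM score; everything else gets the OOV penalty.
    base = unigram_logprobs.get(word, oov_score)
    # A boosted word is never scored below its configured floor.
    floor = boosts.get(word, -math.inf)
    return max(base, floor)


vocab = {"hello": -2.0}
names = {"anna": -3.0}
print(word_log_score("hello", vocab, boosts=names))  # in-vocabulary word
print(word_log_score("anna", vocab, boosts=names))   # boosted name
print(word_log_score("zyx", vocab, boosts=names))    # plain OOV word
```

With a floor like this, a name on the list competes with ordinary vocabulary words instead of being ruled out by the OOV penalty, while unknown words not on the list are still heavily penalized.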
from ctcdecode.
Hi, is there any update on the possibility of boosting a list of words as a kind of custom vocabulary that would bias the beam search? Or would you have a suggestion for a workaround to achieve this kind of result? I tried changing the log probabilities in the LM ARPA file, but the results were disappointing.
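One possible workaround, sketched here under stated assumptions, is to rescore the n-best beams after decoding rather than bias the search itself. The helper `rescore_nbest` and its `bonus` parameter are hypothetical, not part of ctcdecode, and the sketch assumes each hypothesis comes with a log score where higher is better; if your decoder reports scores with the opposite sign convention, convert them first.

```python
def rescore_nbest(hypotheses, hotwords, bonus=2.0):
    """Re-rank decoded hypotheses, rewarding those that contain hotwords.

    hypotheses: list of (text, log_score) pairs, higher score is better (assumed).
    hotwords: iterable of words to favor, matched case-insensitively.
    bonus: log-score reward added per hotword occurrence (assumed tuning knob).
    """
    hot = {w.lower() for w in hotwords}
    rescored = []
    for text, score in hypotheses:
        # Count how many whitespace-separated tokens are hotwords.
        hits = sum(1 for tok in text.lower().split() if tok in hot)
        rescored.append((text, score + bonus * hits))
    # Best hypothesis first.
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


beams = [("call and a", -4.0), ("call anna", -5.0)]
for text, score in rescore_nbest(beams, ["Anna"]):
    print(text, score)
```

This only helps when the desired word already appears somewhere in the n-best list. If the OOV penalty prunes it during the search, a larger beam width or a change inside the scorer would still be needed.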