Comments (4)
... the error message you get when using --force_bytes_output_mode off without passing the checkpoint option is not very helpful ...
I've updated the error message like so:
generate_scorer_package --lm /mnt/lm/lm.binary --vocab /mnt/lm/vocab-500000.txt --package /mnt/lm/kenlm.scorer --default_alpha 0 --default_beta 0
500000 unique words read from vocabulary file.
Doesn't look like a character based (Bytes Are All You Need) model.
--force_bytes_output_mode was not specified, using value infered from vocabulary contents: false
No --checkpoint path specified, not using bytes output mode, can't continue.
Checkpoint path must contain an alphabet.
Start by creating an alphabet for your models using coqui_stt_training.util.check_characters if needed.
python -m coqui_stt_training.util.check_characters \
--csv-files ... \
--alphabet-format | grep -v '^#' | sort -n > models/alphabet.txt
This will create an alphabet models/alphabet.txt.
Now rerun this script by giving models/ as the checkpoint path.
generate_scorer_package \
--checkpoint models/ \
...
It's already on main but won't be introduced into the stable code base before version 1.5.0.
For those who want this patch early, you'll need to build generate_scorer_package
manually since we pull the pre-built binary file from the latest release.
Lines 84 to 86 in 15bef27
Check out the docs on building binaries, or this comment I made under my logs for #2330, which introduced the reprog.
You are probably using an older example here. The --alphabet flag in generate_scorer_package.py has been replaced by the --checkpoint flag. It does not actually rely on checkpoint data; it only uses the alphabet stored in the checkpoint directory.
Please see here: https://stt.readthedocs.io/en/latest/playbook/SCORER.html
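To check this before running generate_scorer_package, a minimal sketch along these lines may help. It assumes the alphabet is stored as alphabet.txt directly inside the checkpoint directory (as in the models/alphabet.txt example in this thread); the helper name find_alphabet is hypothetical and not part of the STT tooling.

```python
from pathlib import Path
from typing import Optional


def find_alphabet(checkpoint_dir: str, filename: str = "alphabet.txt") -> Optional[Path]:
    """Return the alphabet path inside checkpoint_dir, or None if it is missing.

    Assumption: the alphabet lives directly in the directory under `filename`.
    This is a hypothetical helper, not an STT API.
    """
    candidate = Path(checkpoint_dir) / filename
    return candidate if candidate.is_file() else None


if __name__ == "__main__":
    alphabet = find_alphabet("models/")
    if alphabet is None:
        print("No alphabet.txt in models/; create one before passing --checkpoint models/")
    else:
        print(f"Found alphabet at {alphabet}")
```

If the helper returns None, the directory passed as --checkpoint will most likely not work either, so it is worth creating the alphabet first.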
Closing, as it's not an issue and @HarikalarKutusu pointed out the error in OP's command flow.
Hi, sorry for the late reply. Switching to the --checkpoint flag indeed helped me out. (I was previously ignoring that option because I didn't have any checkpoint files, and the language model itself is passed separately, so it felt like it didn't apply.)
So there is indeed no bug. Note, however, that the error message you get when using --force_bytes_output_mode off without passing the checkpoint option is not very helpful:
No --alphabet file specified, not using bytes output mode, can't continue.
How about "No alphabet file found and bytes output mode is off, can't continue. Did you pass a checkpoint directory?"
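The suggestion can be sketched roughly as follows. This only illustrates the proposed wording; it is not how generate_scorer_package is actually implemented, and the function name and parameters are hypothetical.

```python
from typing import Optional


def alphabet_error(alphabet_found: bool,
                   bytes_output_mode: bool,
                   checkpoint_given: bool) -> Optional[str]:
    """Return the proposed error message, or None when the tool could continue.

    Hypothetical helper: it mirrors the wording suggested in this comment,
    not the real generate_scorer_package source.
    """
    # Bytes output mode needs no alphabet, and a found alphabet is fine.
    if bytes_output_mode or alphabet_found:
        return None
    message = "No alphabet file found and bytes output mode is off, can't continue."
    # Only hint at the checkpoint directory when none was passed.
    if not checkpoint_given:
        message += " Did you pass a checkpoint directory?"
    return message


if __name__ == "__main__":
    print(alphabet_error(alphabet_found=False,
                         bytes_output_mode=False,
                         checkpoint_given=False))
```

Appending the hint only when no checkpoint directory was given keeps the message accurate in the case where a directory was passed but contains no alphabet.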