
Comments (4)

wasertech commented on June 11, 2024

... the error message you get when using --force_bytes_output_mode off without passing the checkpoint option is not very helpful ...

I've updated the error message like so:

generate_scorer_package --lm /mnt/lm/lm.binary --vocab /mnt/lm/vocab-500000.txt --package /mnt/lm/kenlm.scorer --default_alpha 0 --default_beta 0
500000 unique words read from vocabulary file.
Doesn't look like a character based (Bytes Are All You Need) model.
--force_bytes_output_mode was not specified, using value infered from vocabulary contents: false
No --checkpoint path specified, not using bytes output mode, can't continue.
Checkpoint path must contain an alphabet.
Start by creating an alphabet for your models using coqui_stt_training.util.check_characters if needed.

    python -m coqui_stt_training.util.check_characters \
                                --csv-files ... \
                                --alphabet-format | grep -v '^#' | sort -n > models/alphabet.txt

This will create an alphabet models/alphabet.txt.
Now rerun this script by giving models/ as the checkpoint path.

    generate_scorer_package  \
                --checkpoint models/ \
                ...
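For readers who want to see what the alphabet step amounts to, here is a minimal sketch in plain Python. It is not the implementation of coqui_stt_training.util.check_characters. It assumes the training CSVs have a `transcript` column and that the alphabet file holds one character per line. The helper names `collect_alphabet` and `write_alphabet` are hypothetical, and the sketch does not escape a literal `#`, which the alphabet format treats as a comment marker.

```python
import csv
from pathlib import Path


def collect_alphabet(csv_paths, column="transcript"):
    """Return the sorted unique characters found in one column of the CSVs.

    Assumption: each CSV has a header row containing `column`.
    """
    chars = set()
    for path in csv_paths:
        with open(path, newline="", encoding="utf-8") as handle:
            for row in csv.DictReader(handle):
                chars.update(row[column])
    # Sorted by code point, which may differ from the `sort -n` order above.
    return sorted(chars)


def write_alphabet(chars, checkpoint_dir):
    """Write one character per line to <checkpoint_dir>/alphabet.txt."""
    target = Path(checkpoint_dir) / "alphabet.txt"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text("".join(f"{c}\n" for c in chars), encoding="utf-8")
    return target
```

Calling `write_alphabet(collect_alphabet(["train.csv"]), "models")` would produce models/alphabet.txt, so that models/ can then be passed as the checkpoint path. For real use, prefer the check_characters command shown in the message.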

It's already on main but won't be introduced into the stable code base before version 1.5.0.

For those who want this patch early, you'll need to build generate_scorer_package manually since we pull the pre-built binary file from the latest release.

STT/Dockerfile.train

Lines 84 to 86 in 15bef27

# Pre-built native client tools
RUN LATEST_STABLE_RELEASE=$(curl "https://api.github.com/repos/coqui-ai/STT/releases/latest" | python -c 'import sys; import json; print(json.load(sys.stdin)["tag_name"])') \
bash -c 'curl -L https://github.com/coqui-ai/STT/releases/download/${LATEST_STABLE_RELEASE}/native_client.tflite.Linux.tar.xz | tar -xJvf -'

Check out the docs on building the binaries, or this comment I made under my logs for #2330, which introduced the reprog.


HarikalarKutusu commented on June 11, 2024

You are probably following an older example here. The --alphabet flag in generate_scorer_package.py has been replaced by the --checkpoint flag. The tool does not actually rely on checkpoint data; the checkpoint directory simply contains the alphabet file, and that is what it reads.

Please see here: https://stt.readthedocs.io/en/latest/playbook/SCORER.html
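To make the point about --checkpoint concrete, here is a small pre-flight check one could run before calling the tool. It is only a sketch and not part of STT. It assumes the alphabet lives at `alphabet.txt` inside the checkpoint directory (as the models/alphabet.txt example in this thread suggests) and that bytes output mode needs no alphabet. The function name `check_scorer_inputs` is hypothetical.

```python
from pathlib import Path


def check_scorer_inputs(checkpoint_dir, bytes_output_mode):
    """Return None if the inputs look usable, otherwise a hint string."""
    if bytes_output_mode:
        # Assumption: bytes output mode does not need an alphabet file.
        return None
    if checkpoint_dir is None:
        return (
            "No checkpoint directory given and bytes output mode is off. "
            "Did you pass a checkpoint directory?"
        )
    alphabet = Path(checkpoint_dir) / "alphabet.txt"
    if not alphabet.is_file():
        return f"No alphabet file found at {alphabet}."
    return None
```

For example, `check_scorer_inputs("models/", False)` returns None only when models/alphabet.txt exists; no trained checkpoint files are required.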


wasertech commented on June 11, 2024

Closing, as this is not a bug and @HarikalarKutusu pointed out the error in the OP's command flow.


poohsen commented on June 11, 2024

Hi, sorry for the late reply. Switching to the --checkpoint flag did indeed help. (I had been ignoring that option because I didn't have any checkpoint files, and since the language model itself is passed separately, it felt like it didn't apply.)

So there's no bug indeed. Note however that the error message you get when using --force_bytes_output_mode off without passing the checkpoint option is not very helpful:

No --alphabet file specified, not using bytes output mode, can't continue.

How about "No alphabet file found and bytes output mode is off, can't continue. Did you pass a checkpoint directory?"

