SWBD Recipe Error (espresso, closed)

freewym commented on August 15, 2024
SWBD Recipe Error

from espresso.

Comments (17)

freewym commented on August 15, 2024

Does temporarily unsetting LC_ALL help, i.e. running LC_ALL= python3 ../../scripts/spm_encode.py?

annamine commented on August 15, 2024

Still the same error, unfortunately.

freewym commented on August 15, 2024

What if you set LC_ALL= around this snippet:

LC_ALL=
cut -f 2- -d" " $text | \
  python3 ../../scripts/spm_encode.py --model=${sentencepiece_model}.model --output_format=piece | \
  paste -d" " <(cut -f 1 -d" " $text) - > $token_text
cut -f 2- -d" " $token_text > $lmdatadir/$dataset.tokens
LC_ALL=C

jinpoon commented on August 15, 2024

I had the same issue with the LibriSpeech recipe. I ran

LANG="" cut -f 2- -d" " $text | \
  python3 ../../scripts/spm_encode.py ....

and it worked for me.

freewym commented on August 15, 2024

Hmm... $LANG in my environment is en_US.UTF-8, and I don't have this problem. Maybe you can check your default $LANG value.
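
To check which locale settings and encodings Python actually picks up, something like the following could be run in the same shell as the recipe. This is only a minimal sketch: the helper name encoding_report is made up for illustration and is not part of espresso.

```python
import locale
import os
import sys


def encoding_report():
    """Collect the locale-related settings that influence Python's text I/O."""
    return {
        # Environment variables discussed in this thread (None if unset).
        "LANG": os.environ.get("LANG"),
        "LC_ALL": os.environ.get("LC_ALL"),
        "PYTHONIOENCODING": os.environ.get("PYTHONIOENCODING"),
        # Encoding that open() normally falls back to when no encoding is given.
        "preferred_encoding": locale.getpreferredencoding(False),
        # Encoding used when printing to the terminal or a pipe.
        "stdout_encoding": sys.stdout.encoding,
    }


report = encoding_report()
for name, value in report.items():
    print("{}: {}".format(name, value))
```

If preferred_encoding or stdout_encoding reports an ASCII variant, non-ASCII text is likely to trigger encoding errors somewhere in the pipeline.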

annamine commented on August 15, 2024

My $LANG is en_GB.UTF-8. I also tried running LANG="" cut -f 2- -d" " $text | \ python3 ../../scripts/spm_encode.py .... but both attempts still return the same error message.

annamine commented on August 15, 2024

I get the same error with the LibriSpeech recipe too.

freewym commented on August 15, 2024

Sorry, I am not in your environment, so it's not easy for me to debug. I just googled the error message, and all I could find were suggestions to run export LANG=en_US.UTF-8, export LC_ALL=en_US.UTF-8, or export PYTHONIOENCODING=utf-8.
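
Of these variables, PYTHONIOENCODING only overrides the encoding of Python's standard streams (stdin, stdout, stderr); it does not change the default used by open(). The sketch below, which is an illustration and not code from espresso, starts a child interpreter with the variable set and asks it which encoding its stdout uses. The helper name child_stdout_encoding is hypothetical.

```python
import os
import subprocess
import sys


def child_stdout_encoding(extra_env):
    """Run a child Python with extra environment variables set and
    return the encoding it reports for its standard output."""
    env = dict(os.environ)
    env.update(extra_env)
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        env=env,
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout.decode("ascii").strip()


# Equivalent to running `export PYTHONIOENCODING=utf-8` before the child starts.
forced = child_stdout_encoding({"PYTHONIOENCODING": "utf-8"})
print(forced)
```

Because the variable must be present in the environment of the Python process itself, it has to be exported (or set on the python3 command) before that process starts.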

marthayifiru commented on August 15, 2024

Hi,
Thanks a lot for the wonderful tool. I tried to build a model for an African language using the WSJ recipe. Language and acoustic model training finished without error after setting LANG=en_US.UTF-8, LC_ALL=en_US.UTF-8, PYTHONIOENCODING=utf-8.

I now get an error during decoding. The error message is:
UnicodeEncodeError: 'ascii' codec can't encode characters in position 11-14: ordinal not in range(128)

I tried applying various online recommendations for this problem in speech_recognize.py, but could not solve it.

Could you help?

freewym commented on August 15, 2024

which line does it happen at?

marthayifiru commented on August 15, 2024

Hi,
Thanks for your prompt reply.

The traceback points to lines 297, 293, 39 and 191 of speech_recognize.py, as shown in the following message.

loading model(s) from exp/lstm/checkpoint_best.pt:exp/lm_lstm/checkpoint_best.pt
LM fusion with Subword LM
using LM fusion with lm-weight=0.70
0%| | 0/26 [00:00<?, ?it/s]/pytorch/aten/src/ATen/native/BinaryOps.cpp:81: UserWarning: Integer division of tensors using div or / is deprecated, and in a future release div will perform true division as in Python 3. Use true_divide or floor_divide (// in Python) instead.
Traceback (most recent call last):
  File "/home/myt_002/espresso/examples/Tigrigna_E2E_ASR/../../espresso/speech_recognize.py", line 297, in <module>
    cli_main()
  File "/home/myt_002/espresso/examples/Tigrigna_E2E_ASR/../../espresso/speech_recognize.py", line 293, in cli_main
    main(args)
  File "/home/myt_002/espresso/examples/Tigrigna_E2E_ASR/../../espresso/speech_recognize.py", line 39, in main
    return _main(args, h)
  File "/home/myt_002/espresso/examples/Tigrigna_E2E_ASR/../../espresso/speech_recognize.py", line 191, in _main
    print('T-{}\t{}'.format(utt_id, detok_target_str), file=output_file)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 11-14: ordinal not in range(128)

Best regards,

freewym commented on August 15, 2024

I would first print 'T-{}\t{}'.format(utt_id, detok_target_str) to the screen to see if the string is displayed normally. If it is, then the problem may occur when it gets written out to output_file, in which case I would try adding the encoding argument at line 38, as in open(output_path, 'w', buffering=1, encoding='utf-8').
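
The effect of the encoding argument can be reproduced outside espresso. The following is a minimal sketch, not the actual speech_recognize.py code: the utterance ID, the sample Ge'ez-script transcript, the temporary file, and the helper write_line are all made up for illustration. It forces an ASCII-encoded file to reproduce the error, then writes the same line with encoding='utf-8'.

```python
import os
import tempfile

# Hypothetical utterance ID and transcript; the Ge'ez-script text stands in
# for any non-ASCII reference string.
utt_id = "utt0001"
detok_target_str = "ሰላም ዓለም"
line = "T-{}\t{}".format(utt_id, detok_target_str)


def write_line(path, encoding):
    """Write one result line to path using an explicit file encoding."""
    with open(path, "w", buffering=1, encoding=encoding) as output_file:
        print(line, file=output_file)


with tempfile.TemporaryDirectory() as tmpdir:
    output_path = os.path.join(tmpdir, "decode.txt")

    # An ASCII-encoded file reproduces the failure seen in the traceback.
    try:
        write_line(output_path, "ascii")
        ascii_failed = False
    except UnicodeEncodeError:
        ascii_failed = True

    # With an explicit UTF-8 encoding the same line is written without error.
    write_line(output_path, "utf-8")
    with open(output_path, encoding="utf-8") as f:
        round_trip = f.read()

print("ASCII write failed:", ascii_failed)
```

Passing the encoding explicitly makes the output file independent of whatever locale the decoding job happens to run under.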

annamine commented on August 15, 2024

Hi, just to give an update: I managed to run that section of code without error by changing the global paths. Thanks for your help and advice!

marthayifiru commented on August 15, 2024

Thanks a lot. Adding the encoding argument at line 38 solved the problem. I can now decode without a problem. Thanks a lot.

Do you have a recipe for multilingual training?

Best regards.

freewym commented on August 15, 2024

Cool. What do you mean by "global paths"?

freewym commented on August 15, 2024

No, I don't have a recipe for multilingual training yet.

annamine commented on August 15, 2024

By "global paths" I mean the path script: I just needed to modify it, for my environment, to use the same en_US.UTF-8 locale, which stopped the sorting error.
