Hi, I am trying to run the SWBD recipe on my local machine. I am getting errors at Sta

SWBD Recipe Error about espresso HOT 17 CLOSED

freewym commented on August 15, 2024

SWBD Recipe Error

from espresso.

Comments (17)

freewym commented on August 15, 2024

Does temporarily unsetting LC_ALL, as in LC_ALL= python3 ../../scripts/spm_encode.py, help?

from espresso.

annamine commented on August 15, 2024

still same error unfortunately

from espresso.

freewym commented on August 15, 2024

What if you set LC_ALL= around this snippet:
LC_ALL=
cut -f 2- -d" " $text | \
  python3 ../../scripts/spm_encode.py --model=${sentencepiece_model}.model --output_format=piece | \
  paste -d" " <(cut -f 1 -d" " $text) - > $token_text
cut -f 2- -d" " $token_text > $lmdatadir/$dataset.tokens
LC_ALL=C

from espresso.

jinpoon commented on August 15, 2024

I had the same issue with the LibriSpeech recipe. I ran LANG="" cut -f 2- -d" " $text | \ python3 ../../scripts/spm_encode.py .... and it worked for me.

from espresso.

freewym commented on August 15, 2024

Hmm... $LANG in my environment is en_US.UTF-8, and I don't have this problem. Maybe you can check your default $LANG value.
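
For anyone comparing environments, here is a minimal sketch of one way to check the values in question from inside Python. The helper name describe_text_encoding is made up for illustration and is not part of espresso; it only reads standard environment variables and asks the standard library which encodings it would use.

```python
import locale
import os
import sys


def describe_text_encoding():
    """Collect the settings that usually decide how Python encodes text I/O.

    Hypothetical helper for illustration; not part of espresso.
    """
    return {
        "LANG": os.environ.get("LANG"),
        "LC_ALL": os.environ.get("LC_ALL"),
        "PYTHONIOENCODING": os.environ.get("PYTHONIOENCODING"),
        # Encoding Python would pick for open() when none is given.
        "preferred_encoding": locale.getpreferredencoding(False),
        # Encoding used by print() when writing to the terminal or a pipe.
        "stdout_encoding": getattr(sys.stdout, "encoding", None),
    }


if __name__ == "__main__":
    for name, value in describe_text_encoding().items():
        print(f"{name}: {value!r}")
```

If preferred_encoding or stdout_encoding shows an ASCII-only codec, non-ASCII transcripts are likely to fail on output.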

from espresso.

annamine commented on August 15, 2024

My $LANG is en_GB.UTF-8. I also tried LANG="" cut -f 2- -d" " $text | \ python3 ../../scripts/spm_encode.py .... but both still return the same error message.

from espresso.

annamine commented on August 15, 2024

I get the same error with the LibriSpeech recipe too.

from espresso.

freewym commented on August 15, 2024

Sorry, I am not in your environment, so it's not easy for me to debug. I googled the error message, and all I could find was export LANG=en_US.UTF-8, export LC_ALL=en_US.UTF-8, or export PYTHONIOENCODING=utf-8.
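
To see whether one of these variables actually reaches the Python process, a rough sketch like the following may help. It starts a fresh interpreter with the given variables and reports the encoding that print() would use there. The function name child_stdout_encoding is an assumption for this example, not something from the recipe.

```python
import os
import subprocess
import sys


def child_stdout_encoding(extra_env):
    """Start a fresh Python process with extra_env applied and return
    the encoding its sys.stdout reports.

    Hypothetical helper for illustration; not part of espresso.
    """
    env = dict(os.environ)
    env.update(extra_env)
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # PYTHONIOENCODING overrides the locale-derived choice for stdin/stdout/stderr.
    print(child_stdout_encoding({"PYTHONIOENCODING": "utf-8"}))
```

Note that in a shell pipeline such as VAR=value cmd1 | cmd2, the assignment applies only to cmd1, so a variable meant for the Python step has to be exported or placed directly in front of python3.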

from espresso.

marthayifiru commented on August 15, 2024

Hi,
Thanks a lot for the wonderful tool. I tried to build a model for an African language using the WSJ recipe. Language and acoustic model training finished without error after setting LANG=en_US.UTF-8, LC_ALL=en_US.UTF-8, PYTHONIOENCODING=utf-8.

I now get an error during decoding. The error message is
UnicodeEncodeError: 'ascii' codec can't encode characters in position 11-14: ordinal not in range(128)

I tried applying various online recommendations for the problem in speech_recognize.py, but could not solve it.

Could you help?

from espresso.

freewym commented on August 15, 2024

Which line does it happen at?

from espresso.

marthayifiru commented on August 15, 2024

Hi,
Thanks for your prompt reply.

which line does it happen at?

The lines are 297, 293, 39 and 191 as shown in the following message.

loading model(s) from exp/lstm/checkpoint_best.pt:exp/lm_lstm/checkpoint_best.pt
LM fusion with Subword LM
using LM fusion with lm-weight=0.70
0%| | 0/26 [00:00<?, ?it/s]
/pytorch/aten/src/ATen/native/BinaryOps.cpp:81: UserWarning: Integer division of tensors using div or / is deprecated, and in a future release div will perform true division as in Python 3. Use true_divide or floor_divide (// in Python) instead.
Traceback (most recent call last):
  File "/home/myt_002/espresso/examples/Tigrigna_E2E_ASR/../../espresso/speech_recognize.py", line 297, in <module>
    cli_main()
  File "/home/myt_002/espresso/examples/Tigrigna_E2E_ASR/../../espresso/speech_recognize.py", line 293, in cli_main
    main(args)
  File "/home/myt_002/espresso/examples/Tigrigna_E2E_ASR/../../espresso/speech_recognize.py", line 39, in main
    return _main(args, h)
  File "/home/myt_002/espresso/examples/Tigrigna_E2E_ASR/../../espresso/speech_recognize.py", line 191, in _main
    print('T-{}\t{}'.format(utt_id, detok_target_str), file=output_file)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 11-14: ordinal not in range(128)

Best regards,

from espresso.

freewym commented on August 15, 2024

I would first print 'T-{}\t{}'.format(utt_id, detok_target_str) to the screen to see whether the string displays normally. If it does, the problem is probably in writing it out to output_file, so I would then try adding the encoding argument at line 38: open(output_path, 'w', buffering=1, encoding='utf-8')
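
As a small self-contained illustration of that suggestion (a sketch, not the actual code in speech_recognize.py), the following writes one line to a temporary file twice: once with an ASCII codec, which is roughly what open() falls back to when the locale is not UTF-8, and once with encoding='utf-8'. The names write_line and demo, the utterance ID, and the sample text are made up for the example.

```python
import os
import tempfile


def write_line(path, line, encoding):
    """Write one line to path using the given text encoding."""
    with open(path, "w", buffering=1, encoding=encoding) as output_file:
        print(line, file=output_file)


def demo():
    """Return (ascii_failed, utf8_round_trips) for a non-ASCII sample line."""
    # Arbitrary non-ASCII sample; the utterance ID and text are invented.
    line = "T-{}\t{}".format("utt1", "ትግርኛ")
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "decode.txt")
        try:
            write_line(path, line, encoding="ascii")
            ascii_failed = False
        except UnicodeEncodeError:
            # Same exception type as in the traceback above.
            ascii_failed = True
        # An explicit UTF-8 encoding does not depend on the locale.
        write_line(path, line, encoding="utf-8")
        with open(path, encoding="utf-8") as f:
            round_trip = f.read().rstrip("\n")
    return ascii_failed, round_trip == line


if __name__ == "__main__":
    print(demo())
```

Passing the encoding explicitly makes the output file independent of whatever LANG or LC_ALL happens to be set when decoding runs.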

from espresso.

annamine commented on August 15, 2024

Hi, just to give an update: I managed to run that section of code without error by changing the global paths. Thanks for your help and advice!

from espresso.

marthayifiru commented on August 15, 2024

Thanks a lot. Adding the encoding argument at line 38 solved the problem. I can now decode without a problem. Thanks a lot.

Do you have a recipe for multilingual training?

Best regards.

from espresso.

freewym commented on August 15, 2024

Cool. What do you mean by "global paths"?

from espresso.

freewym commented on August 15, 2024

No. I don't have one yet.

from espresso.

annamine commented on August 15, 2024

I just needed to modify the path script for my environment so that it uses the same en_US.UTF-8 locale, which stopped the sorting error.
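
For readers hitting a similar sorting error, here is a rough sketch of why the locale matters. The thread does not show which check failed, so this assumes it was a check that IDs are in byte order, which is how sort behaves under LC_ALL=C. The function names and the IDs are hypothetical.

```python
def c_locale_sorted(ids):
    """Order strings by their UTF-8 bytes, as `LC_ALL=C sort` does."""
    return sorted(ids, key=lambda s: s.encode("utf-8"))


def is_c_locale_sorted(ids):
    """Return True if ids is already in byte order."""
    ids = list(ids)
    return ids == c_locale_sorted(ids)


if __name__ == "__main__":
    # Hypothetical IDs. In byte order, uppercase letters come before lowercase.
    byte_order = ["A_1", "B_1", "a_1"]
    # A case-insensitive order, similar to what some UTF-8 locales produce.
    mixed_order = ["a_1", "A_1", "B_1"]
    print(is_c_locale_sorted(byte_order))
    print(is_c_locale_sorted(mixed_order))
```

A file sorted under one locale can therefore fail a sortedness check made under another, which is why using one locale consistently across the scripts helps.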

from espresso.
