
Comments (4)

freewym commented on September 17, 2024

next(loaded_json.items()) -> next(iter(loaded_json.items()))

Hopefully it will fix the issue (pushed)
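
For context, a quick sketch of why the iter() wrapper is needed (the dictionary contents below are made up, not data from the repo): dict.items() returns a view object, which is iterable but not itself an iterator, so next() cannot consume it directly.

    # Hypothetical JSON content, just to illustrate the behavior
    loaded_json = {"utt1": {"feat": "feats.ark:1", "text": "hello world"}}

    # next(loaded_json.items())   # TypeError: 'dict_items' object is not an iterator
    first_item = next(iter(loaded_json.items()))  # OK: ('utt1', {'feat': ..., 'text': ...})
    print(first_item)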


freewym commented on September 17, 2024

Great. Pushed the fix as well.



mohsen-goodarzi commented on September 17, 2024

UPDATED:

next(loaded_json.items()) -> next(iter(loaded_json.items()))
Hopefully it will fix the issue (pushed)

Thank you! That solved the issue, but now I'm stuck on the following one (although I don't think it's related to the one above):

    Traceback (most recent call last):
      File "../../fairseq_cli/train.py", line 510, in <module>
        cli_main()
      File "../../fairseq_cli/train.py", line 503, in cli_main
        distributed_utils.call_main(cfg, main)
      File "/mnt/IDMT-WORKSPACE/DATA-STORE/gzi/train-lvcsr/espresso/fairseq/distributed/utils.py", line 369, in call_main
        main(cfg, **kwargs)
      File "../../fairseq_cli/train.py", line 86, in main
        task = tasks.setup_task(cfg.task)
      File "/mnt/IDMT-WORKSPACE/DATA-STORE/gzi/train-lvcsr/espresso/fairseq/tasks/__init__.py", line 44, in setup_task
        return task.setup_task(cfg, **kwargs)
      File "/mnt/IDMT-WORKSPACE/DATA-STORE/gzi/train-lvcsr/espresso/espresso/tasks/speech_recognition_hybrid.py", line 423, in setup_task
        src_dataset = get_asr_dataset_from_json(data_path, split, dictionary, combine=False).src
      File "/mnt/IDMT-WORKSPACE/DATA-STORE/gzi/train-lvcsr/espresso/espresso/tasks/speech_recognition_hybrid.py", line 262, in get_asr_dataset_from_json
        text_datasets.append(AsrTextDataset(utt_ids, text, dictionary, append_eos=False))
      File "/mnt/IDMT-WORKSPACE/DATA-STORE/gzi/train-lvcsr/espresso/espresso/data/feat_text_dataset.py", line 287, in __init__
        self.read_text(utt_ids, texts, dictionary)
      File "/mnt/IDMT-WORKSPACE/DATA-STORE/gzi/train-lvcsr/espresso/espresso/data/feat_text_dataset.py", line 302, in read_text
        self.sizes = [len(tokenize_line(text)) for text in texts]
      File "/mnt/IDMT-WORKSPACE/DATA-STORE/gzi/train-lvcsr/espresso/espresso/data/feat_text_dataset.py", line 302, in <listcomp>
        self.sizes = [len(tokenize_line(text)) for text in texts]
    NameError: free variable 'tokenize_line' referenced before assignment in enclosing scope

I solved it by moving the import line out of the if block (line 296 of espresso/espresso/data/feat_text_dataset.py) so that it runs unconditionally:

        from fairseq.tokenizer import tokenize_line  # moved out of the if block so it is bound on both paths
        if dictionary is not None:
            self.sizes = [
                len(tokenize_line(dictionary.wordpiece_encode(text))) + (1 if self.append_eos else 0)
                for text in texts
            ]
        else:
            self.sizes = [len(tokenize_line(text)) for text in texts]
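
For what it's worth, here is a self-contained sketch of why the conditional import fails (a whitespace-split stand-in replaces fairseq.tokenizer.tokenize_line so it runs without fairseq installed, and the function names are made up): any binding inside the function body makes tokenize_line a local name for the whole function, and the list comprehension in the else branch closes over that local, which was never assigned on that path.

    def _tokenize_line_standin(line):
        """Stand-in for fairseq.tokenizer.tokenize_line: whitespace split."""
        return line.split()

    def read_text_broken(texts, dictionary=None):
        if dictionary is not None:
            # Binding tokenize_line only on this branch still makes it a local
            # variable of read_text_broken on *every* path (an import binds a
            # name just like an assignment does).
            tokenize_line = _tokenize_line_standin
            return [len(tokenize_line(dictionary.wordpiece_encode(t))) for t in texts]
        # The list comprehension has its own scope and closes over the
        # never-assigned local tokenize_line, so this raises the NameError
        # shown in the traceback above.
        return [len(tokenize_line(t)) for t in texts]

    def read_text_fixed(texts, dictionary=None):
        tokenize_line = _tokenize_line_standin  # bound unconditionally, as in the fix
        if dictionary is not None:
            return [len(tokenize_line(dictionary.wordpiece_encode(t))) for t in texts]
        return [len(tokenize_line(t)) for t in texts]

    texts = ["hello world", "foo bar baz"]
    try:
        read_text_broken(texts)  # dictionary=None -> else path -> NameError
    except NameError as e:
        print("broken:", e)
    print("fixed:", read_text_fixed(texts))  # -> [2, 3]

Moving the import above the if, as in the snippet above, binds the name on both paths, which is why the change works.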
