
Comments (4)

freewym commented on September 17, 2024

next(loaded_json.items()) -> next(iter(loaded_json.items()))

Hopefully it will fix the issue (pushed)
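
For context, a quick sketch of why the iter() wrapper is needed (the dictionary contents below are made up, not data from the repo): dict.items() returns a view object, which is iterable but not itself an iterator, so next() cannot consume it directly.

    # Hypothetical JSON content, just to illustrate the behavior
    loaded_json = {"utt1": {"feat": "feats.ark:1", "text": "hello world"}}

    # next(loaded_json.items())   # TypeError: 'dict_items' object is not an iterator
    first_item = next(iter(loaded_json.items()))  # OK: ('utt1', {'feat': ..., 'text': ...})
    print(first_item)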


freewym commented on September 17, 2024

Great. Pushed the fix as well.



mohsen-goodarzi commented on September 17, 2024

UPDATED:

next(loaded_json.items()) -> next(iter(loaded_json.items()))
Hopefully it will fix the issue (pushed)

Thank you! That solved the issue, but now I'm stuck on the following one (although I don't think it's related to the one above):

    Traceback (most recent call last):
      File "../../fairseq_cli/train.py", line 510, in <module>
        cli_main()
      File "../../fairseq_cli/train.py", line 503, in cli_main
        distributed_utils.call_main(cfg, main)
      File "/mnt/IDMT-WORKSPACE/DATA-STORE/gzi/train-lvcsr/espresso/fairseq/distributed/utils.py", line 369, in call_main
        main(cfg, **kwargs)
      File "../../fairseq_cli/train.py", line 86, in main
        task = tasks.setup_task(cfg.task)
      File "/mnt/IDMT-WORKSPACE/DATA-STORE/gzi/train-lvcsr/espresso/fairseq/tasks/__init__.py", line 44, in setup_task
        return task.setup_task(cfg, **kwargs)
      File "/mnt/IDMT-WORKSPACE/DATA-STORE/gzi/train-lvcsr/espresso/espresso/tasks/speech_recognition_hybrid.py", line 423, in setup_task
        src_dataset = get_asr_dataset_from_json(data_path, split, dictionary, combine=False).src
      File "/mnt/IDMT-WORKSPACE/DATA-STORE/gzi/train-lvcsr/espresso/espresso/tasks/speech_recognition_hybrid.py", line 262, in get_asr_dataset_from_json
        text_datasets.append(AsrTextDataset(utt_ids, text, dictionary, append_eos=False))
      File "/mnt/IDMT-WORKSPACE/DATA-STORE/gzi/train-lvcsr/espresso/espresso/data/feat_text_dataset.py", line 287, in __init__
        self.read_text(utt_ids, texts, dictionary)
      File "/mnt/IDMT-WORKSPACE/DATA-STORE/gzi/train-lvcsr/espresso/espresso/data/feat_text_dataset.py", line 302, in read_text
        self.sizes = [len(tokenize_line(text)) for text in texts]
      File "/mnt/IDMT-WORKSPACE/DATA-STORE/gzi/train-lvcsr/espresso/espresso/data/feat_text_dataset.py", line 302, in <listcomp>
        self.sizes = [len(tokenize_line(text)) for text in texts]
    NameError: free variable 'tokenize_line' referenced before assignment in enclosing scope

I solved it by moving the import line out of the if block (line 296 of espresso/espresso/data/feat_text_dataset.py) so that it runs unconditionally:

        from fairseq.tokenizer import tokenize_line  # moved out of the if block so it is bound on both paths
        if dictionary is not None:
            self.sizes = [
                len(tokenize_line(dictionary.wordpiece_encode(text))) + (1 if self.append_eos else 0)
                for text in texts
            ]
        else:
            self.sizes = [len(tokenize_line(text)) for text in texts]
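
For what it's worth, here is a self-contained sketch of why the conditional import fails (a whitespace-split stand-in replaces fairseq.tokenizer.tokenize_line so it runs without fairseq installed, and the function names are made up): any binding inside the function body makes tokenize_line a local name for the whole function, and the list comprehension in the else branch closes over that local, which was never assigned on that path.

    def _tokenize_line_standin(line):
        """Stand-in for fairseq.tokenizer.tokenize_line: whitespace split."""
        return line.split()

    def read_text_broken(texts, dictionary=None):
        if dictionary is not None:
            # Binding tokenize_line only on this branch still makes it a local
            # variable of read_text_broken on *every* path (an import binds a
            # name just like an assignment does).
            tokenize_line = _tokenize_line_standin
            return [len(tokenize_line(dictionary.wordpiece_encode(t))) for t in texts]
        # The list comprehension has its own scope and closes over the
        # never-assigned local tokenize_line, so this raises the NameError
        # shown in the traceback above.
        return [len(tokenize_line(t)) for t in texts]

    def read_text_fixed(texts, dictionary=None):
        tokenize_line = _tokenize_line_standin  # bound unconditionally, as in the fix
        if dictionary is not None:
            return [len(tokenize_line(dictionary.wordpiece_encode(t))) for t in texts]
        return [len(tokenize_line(t)) for t in texts]

    texts = ["hello world", "foo bar baz"]
    try:
        read_text_broken(texts)  # dictionary=None -> else path -> NameError
    except NameError as e:
        print("broken:", e)
    print("fixed:", read_text_fixed(texts))  # -> [2, 3]

Moving the import above the if, as in the snippet above, binds the name on both paths, which is why the change works.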
