
Data Error? about nmt-keras (closed)

VP007-py commented on May 29, 2024
Data Error?

from nmt-keras.

Comments (6)

VP007-py commented on May 29, 2024

Hey, apologies for not updating... The current version works perfectly fine!


lvapeab commented on May 29, 2024

Hi Vinay,

Unfortunately, I'm unable to reproduce the error. Please attach your config.py file and make sure you are working with the latest version. If you modified data_engine/prepare_data.py, please share it as well.

That being said, my guess is that somewhere in data_engine/prepare_data.py you are calling setInput where you meant to call setRawOutput. Note, however, that setRawOutput was removed in 4ba94e2, as it made no sense to keep it in the dataset for the general use case.


VP007-py commented on May 29, 2024

Once again, I get a similar error with different datasets. I did check the parallel corpora, and there are no issues with them.

```
Using TensorFlow backend.
[11/08/2020 19:49:41] Limited tf.compat.v2.summary API due to missing TensorBoard installation.
[11/08/2020 19:49:44] Running training.
[11/08/2020 19:49:44] Building Newdataset_hien dataset
[11/08/2020 19:49:45] 	Applying tokenization function: "tokenize_none".
[11/08/2020 19:49:45] Creating vocabulary for data with data_id 'target_text'.
[11/08/2020 19:49:46] 	 Total: 97033 unique words in 95000 sentences with a total of 1977052 words.
[11/08/2020 19:49:46] Creating dictionary of all words
[11/08/2020 19:49:47] Loaded "train" set outputs of data_type "text-features" with data_id "target_text" and length 95000.
[11/08/2020 19:49:47] 	Applying tokenization function: "tokenize_none".
[11/08/2020 19:49:47] Loaded "val" set outputs of data_type "text" with data_id "target_text" and length 5000.
[11/08/2020 19:49:47] 	Applying tokenization function: "tokenize_none".
[11/08/2020 19:49:47] Loaded "test" set outputs of data_type "text" with data_id "target_text" and length 2500.
[11/08/2020 19:49:47] 	Applying tokenization function: "tokenize_none".
Traceback (most recent call last):
  File "main.py", line 51, in <module>
    train_model(parameters, args.dataset)
  File "/home/pandramish.vinay/nmt-keras/nmt_keras/training.py", line 74, in train_model
    dataset = build_dataset(params)
  File "/home/pandramish.vinay/nmt-keras/data_engine/prepare_data.py", line 185, in build_dataset
    bpe_codes=params.get('BPE_CODES_PATH', None))
  File "/home/pandramish.vinay/.local/lib/python3.5/site-packages/keras_wrapper/dataset.py", line 1204, in setInput
    use_unk_class=use_unk_class)
  File "/home/pandramish.vinay/.local/lib/python3.5/site-packages/keras_wrapper/dataset.py", line 2097, in preprocessTextFeatures
    '" in order to process the type "text" data. Set "build_vocabulary" to True if you want to use the current data for building the vocabulary.')
Exception: The dataset must include a vocabulary with data_id "source_text" in order to process the type "text" data. Set "build_vocabulary" to True if you want to use the current data for building the vocabulary.
```
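Since the corpora were reported as clean, here is one way such a check could be scripted. This is a minimal sketch, not part of nmt-keras: the helper names read_lines and check_parallel_corpus are hypothetical, and it assumes plain-text, UTF-8, one-sentence-per-line source and target files. It only catches two common data problems (mismatched line counts and empty lines); it does not by itself explain the vocabulary exception above.

```python
from typing import List


def read_lines(path: str) -> List[str]:
    """Read a UTF-8 text file one sentence per line, without trailing newlines."""
    # Text mode splits only on \n, \r and \r\n, unlike str.splitlines(),
    # which also splits on characters such as U+2028.
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]


def check_parallel_corpus(src_path: str, trg_path: str) -> List[str]:
    """Hypothetical helper: report line-count mismatches and empty lines."""
    src_lines = read_lines(src_path)
    trg_lines = read_lines(trg_path)
    problems = []
    if len(src_lines) != len(trg_lines):
        problems.append(
            f"line count mismatch: {len(src_lines)} source vs {len(trg_lines)} target"
        )
    # Only the aligned prefix (the shorter of the two files) is compared line by line.
    for i, (src, trg) in enumerate(zip(src_lines, trg_lines), start=1):
        if not src.strip():
            problems.append(f"line {i}: empty source sentence")
        if not trg.strip():
            problems.append(f"line {i}: empty target sentence")
    return problems
```

For example, check_parallel_corpus("train.hi", "train.en") (hypothetical file names) returns an empty list when both files have the same number of lines and no blank sentences.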


lvapeab commented on May 29, 2024

Did you set build_vocabulary = True when building the Dataset object?
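The error message states the rule behind this question: text for a given data_id can only be turned into word indices once a vocabulary for that data_id exists, and build_vocabulary is what creates it. The toy class below is a hypothetical illustration of that rule, written from the error text alone; it is not the keras_wrapper Dataset API, and its names (ToyDataset, set_input) are invented for the example. It assumes the vocabulary is built on the training split and reused for the validation and test splits.

```python
from typing import Dict, List


class ToyDataset:
    """Hypothetical stand-in for a dataset object (NOT the keras_wrapper API).

    Mimics the rule stated in the error message: text for a data_id can only
    be encoded once a vocabulary for that data_id exists.
    """

    def __init__(self) -> None:
        self.vocabulary: Dict[str, Dict[str, int]] = {}
        # split name -> data_id -> encoded sentences
        self.data: Dict[str, Dict[str, List[List[int]]]] = {}

    def set_input(self, sentences: List[str], split: str, data_id: str,
                  build_vocabulary: bool = False) -> None:
        if build_vocabulary:
            # Index 0 is reserved for unknown words; new tokens get the next free index.
            vocab = self.vocabulary.setdefault(data_id, {"<unk>": 0})
            for sentence in sentences:
                for token in sentence.split():
                    vocab.setdefault(token, len(vocab))
        if data_id not in self.vocabulary:
            # Plain Exception, mirroring the traceback in this thread.
            raise Exception(
                f'The dataset must include a vocabulary with data_id "{data_id}" '
                'in order to process the type "text" data.'
            )
        vocab = self.vocabulary[data_id]
        encoded = [[vocab.get(tok, 0) for tok in s.split()] for s in sentences]
        self.data.setdefault(split, {})[data_id] = encoded


ds = ToyDataset()
try:
    # No vocabulary for "source_text" yet, so this raises.
    ds.set_input(["a b"], split="val", data_id="source_text")
except Exception as err:
    print(err)

ds.set_input(["a b c"], split="train", data_id="source_text", build_vocabulary=True)
ds.set_input(["a d"], split="val", data_id="source_text")
print(ds.data["val"]["source_text"])  # [[1, 0]]
```

In this sketch, processing "val" before any call with build_vocabulary=True reproduces the same kind of exception, while building the vocabulary on "train" first lets the later call succeed, with the unseen word "d" mapped to index 0.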


VP007-py commented on May 29, 2024

I did enable build_vocabulary = True in ds.setInput here, and the same error still occurs sometimes.


lvapeab commented on May 29, 2024

Sometimes it fails... but other times it works?
Weird.

Can you share your config.py file?
