
Data Error ? about nmt-keras HOT 6 CLOSED

VP007-py commented on May 29, 2024

Data Error ?

from nmt-keras.

Comments (6)

VP007-py commented on May 29, 2024

Hey, apologies for not updating. The current version works perfectly fine!


lvapeab commented on May 29, 2024

Hi Vinay,

Unfortunately, I'm unable to reproduce the error. Please, attach the config.py file and make sure you are working with the latest version. If you modified data_engine/prepare_data.py, please also share it.

That being said, my guess is that you are calling setInput where you meant to call setRawOutput somewhere in data_engine/prepare_data.py. However, note that setRawOutput was removed in 4ba94e2, as it made no sense to keep it in the dataset for the general use case.


VP007-py commented on May 29, 2024

Once again, I get a similar error for different datasets. I did check the parallel corpora and there are no issues with them.

```
Using TensorFlow backend.
[11/08/2020 19:49:41] Limited tf.compat.v2.summary API due to missing TensorBoard installation.
[11/08/2020 19:49:44] Running training.
[11/08/2020 19:49:44] Building Newdataset_hien dataset
[11/08/2020 19:49:45] 	Applying tokenization function: "tokenize_none".
[11/08/2020 19:49:45] Creating vocabulary for data with data_id 'target_text'.
[11/08/2020 19:49:46] 	 Total: 97033 unique words in 95000 sentences with a total of 1977052 words.
[11/08/2020 19:49:46] Creating dictionary of all words
[11/08/2020 19:49:47] Loaded "train" set outputs of data_type "text-features" with data_id "target_text" and length 95000.
[11/08/2020 19:49:47] 	Applying tokenization function: "tokenize_none".
[11/08/2020 19:49:47] Loaded "val" set outputs of data_type "text" with data_id "target_text" and length 5000.
[11/08/2020 19:49:47] 	Applying tokenization function: "tokenize_none".
[11/08/2020 19:49:47] Loaded "test" set outputs of data_type "text" with data_id "target_text" and length 2500.
[11/08/2020 19:49:47] 	Applying tokenization function: "tokenize_none".
Traceback (most recent call last):
  File "main.py", line 51, in <module>
    train_model(parameters, args.dataset)
  File "/home/pandramish.vinay/nmt-keras/nmt_keras/training.py", line 74, in train_model
    dataset = build_dataset(params)
  File "/home/pandramish.vinay/nmt-keras/data_engine/prepare_data.py", line 185, in build_dataset
    bpe_codes=params.get('BPE_CODES_PATH', None))
  File "/home/pandramish.vinay/.local/lib/python3.5/site-packages/keras_wrapper/dataset.py", line 1204, in setInput
    use_unk_class=use_unk_class)
  File "/home/pandramish.vinay/.local/lib/python3.5/site-packages/keras_wrapper/dataset.py", line 2097, in preprocessTextFeatures
    '" in order to process the type "text" data. Set "build_vocabulary" to True if you want to use the current data for building the vocabulary.')
Exception: The dataset must include a vocabulary with data_id "source_text" in order to process the type "text" data. Set "build_vocabulary" to True if you want to use the current data for building the vocabulary.
```
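For anyone who wants to repeat the parallel-corpus check mentioned above, here is a minimal sketch of one way to do it. It only verifies that the source and target files have the same number of lines and that no line is empty. The function name `check_parallel_corpus` and the file names in the usage comment are hypothetical and are not part of nmt-keras or keras_wrapper.

```python
def read_lines(path):
    """Read a UTF-8 text file and return its lines without trailing newlines."""
    with open(path, encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in handle]


def check_parallel_corpus(source_path, target_path):
    """Return a list of human-readable problems found in a parallel corpus.

    Hypothetical helper for illustration only; an empty list means that
    both files have the same number of lines and no empty sentences.
    """
    src_lines = read_lines(source_path)
    trg_lines = read_lines(target_path)
    problems = []

    # A parallel corpus must be aligned line by line.
    if len(src_lines) != len(trg_lines):
        problems.append(
            "line count mismatch: {} source vs {} target".format(
                len(src_lines), len(trg_lines)))

    # Report empty sentences on either side (checked up to the shorter file).
    for number, (src, trg) in enumerate(zip(src_lines, trg_lines), start=1):
        if not src.strip():
            problems.append("line {}: empty source sentence".format(number))
        if not trg.strip():
            problems.append("line {}: empty target sentence".format(number))

    return problems


# Usage (file names are placeholders, adjust to your own data):
# for problem in check_parallel_corpus("training.hi", "training.en"):
#     print(problem)
```

A clean result here does not rule out the vocabulary error in the traceback, which is raised before any source sentence is encoded; it only excludes misaligned or empty lines as a cause.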


lvapeab commented on May 29, 2024

Did you set build_vocabulary = True when building the Dataset object?
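To make the question concrete, here is a toy sketch of the kind of check the exception message describes. It is a stand-in written for illustration, not keras_wrapper's actual implementation: the class `ToyDataset` and its `set_input` method are hypothetical. The idea is that a text input can only be encoded if a vocabulary already exists for its data_id, or if the call itself is asked to build one.

```python
class ToyDataset:
    """Toy stand-in (NOT keras_wrapper.dataset.Dataset) mimicking the vocabulary check."""

    def __init__(self):
        # Maps a data_id such as "source_text" to a {word: index} dictionary.
        self.vocabulary = {}

    def set_input(self, sentences, set_name, data_id, build_vocabulary=False):
        """Encode sentences as word indices, building the vocabulary on request."""
        if build_vocabulary:
            # Build the vocabulary from the sentences passed in this call.
            words = sorted({word for s in sentences for word in s.split()})
            self.vocabulary[data_id] = {w: i for i, w in enumerate(words)}
        elif data_id not in self.vocabulary:
            # Same situation as in the traceback: nothing to encode with.
            raise ValueError(
                'No vocabulary with data_id "{}" when loading the "{}" set; '
                'pass build_vocabulary=True first.'.format(data_id, set_name))

        vocab = self.vocabulary[data_id]
        # Words missing from the vocabulary are mapped to -1 in this toy version.
        return [[vocab.get(word, -1) for word in s.split()] for s in sentences]


ds = ToyDataset()
# Building the vocabulary on the training split makes later splits work.
ds.set_input(["a b"], "train", "source_text", build_vocabulary=True)
encoded_val = ds.set_input(["b c"], "val", "source_text")
```

In this toy version, calling `set_input` for "source_text" before any call with `build_vocabulary=True` raises the error, which mirrors the situation in the log: a vocabulary is created for 'target_text', but none exists for "source_text" when setInput runs.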


VP007-py commented on May 29, 2024

I did enable build_vocabulary = True in ds.setInput here, and the same error still occurs sometimes.


lvapeab commented on May 29, 2024

Sometimes it fails... but other times it works?
Weird.

Can you share your config.py file?

