Comments (6)
Hey, apologies for not updating... the current version works perfectly fine!
from nmt-keras.
Hi Vinay,
Unfortunately, I'm unable to reproduce the error. Please attach the config.py file and make sure you are working with the latest version. If you modified data_engine/prepare_data.py, please also share it.
That being said, my guess is that you are calling setInput when you want to call setRawOutput somewhere in data_engine/prepare_data.py. However, note that setRawOutput was removed in 4ba94e2, as it made no sense to keep it in the dataset for the general use case.
Once again, I get a similar error for different datasets. I did check the parallel corpora and there are no issues with them:
Using TensorFlow backend.
[11/08/2020 19:49:41] Limited tf.compat.v2.summary API due to missing TensorBoard installation.
[11/08/2020 19:49:44] Running training.
[11/08/2020 19:49:44] Building Newdataset_hien dataset
[11/08/2020 19:49:45] Applying tokenization function: "tokenize_none".
[11/08/2020 19:49:45] Creating vocabulary for data with data_id 'target_text'.
[11/08/2020 19:49:46] Total: 97033 unique words in 95000 sentences with a total of 1977052 words.
[11/08/2020 19:49:46] Creating dictionary of all words
[11/08/2020 19:49:47] Loaded "train" set outputs of data_type "text-features" with data_id "target_text" and length 95000.
[11/08/2020 19:49:47] Applying tokenization function: "tokenize_none".
[11/08/2020 19:49:47] Loaded "val" set outputs of data_type "text" with data_id "target_text" and length 5000.
[11/08/2020 19:49:47] Applying tokenization function: "tokenize_none".
[11/08/2020 19:49:47] Loaded "test" set outputs of data_type "text" with data_id "target_text" and length 2500.
[11/08/2020 19:49:47] Applying tokenization function: "tokenize_none".
Traceback (most recent call last):
  File "main.py", line 51, in <module>
    train_model(parameters, args.dataset)
  File "/home/pandramish.vinay/nmt-keras/nmt_keras/training.py", line 74, in train_model
    dataset = build_dataset(params)
  File "/home/pandramish.vinay/nmt-keras/data_engine/prepare_data.py", line 185, in build_dataset
    bpe_codes=params.get('BPE_CODES_PATH', None))
  File "/home/pandramish.vinay/.local/lib/python3.5/site-packages/keras_wrapper/dataset.py", line 1204, in setInput
    use_unk_class=use_unk_class)
  File "/home/pandramish.vinay/.local/lib/python3.5/site-packages/keras_wrapper/dataset.py", line 2097, in preprocessTextFeatures
    '" in order to process the type "text" data. Set "build_vocabulary" to True if you want to use the current data for building the vocabulary.')
Exception: The dataset must include a vocabulary with data_id "source_text" in order to process the type "text" data. Set "build_vocabulary" to True if you want to use the current data for building the vocabulary.
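For anyone who wants to double-check a parallel corpus before building the dataset, a rough sketch of such a check might look like the following. It is an illustrative helper only, not part of nmt-keras or keras_wrapper: the function name check_parallel_corpus and the file names are hypothetical. It counts the lines on each side and reports blank lines, which are two things worth ruling out.

```python
import os
import tempfile


def check_parallel_corpus(src_path, trg_path):
    """Return (n_src, n_trg, blank_src, blank_trg) for two aligned text files.

    Hypothetical helper for illustration; it is not part of nmt-keras.
    """
    def scan(path):
        total, blank = 0, []
        with open(path, encoding="utf-8") as f:
            for i, line in enumerate(f, start=1):
                total += 1
                if not line.strip():
                    blank.append(i)  # 1-based line number of an empty sentence
        return total, blank

    n_src, blank_src = scan(src_path)
    n_trg, blank_trg = scan(trg_path)
    return n_src, n_trg, blank_src, blank_trg


# Tiny self-contained demo with throwaway files (the names are made up).
tmp_dir = tempfile.mkdtemp()
src_file = os.path.join(tmp_dir, "train.hi")
trg_file = os.path.join(tmp_dir, "train.en")
with open(src_file, "w", encoding="utf-8") as f:
    f.write("a b c\n\nd e\n")
with open(trg_file, "w", encoding="utf-8") as f:
    f.write("x y z\nu v\n")

report = check_parallel_corpus(src_file, trg_file)
print(report)
```

If the two line counts differ, or either list of blank lines is non-empty, the corpus is probably worth a closer look before training.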
Did you set build_vocabulary = True when building the Dataset object?
I did enable build_vocabulary = True in ds.setInput here, and the same error still occurs sometimes.
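To make the error message easier to reason about, here is a toy model of the check it describes. This is not keras_wrapper's code: ToyDataset and set_text are made-up names, and the behavior is only an assumption inferred from the exception text above, namely that text for a given data_id can be processed only once a vocabulary for that same data_id exists.

```python
class ToyDataset:
    """Toy stand-in, NOT keras_wrapper's Dataset: it only mimics the vocabulary check."""

    def __init__(self):
        self.vocabulary = {}  # data_id -> {word: index}

    def set_text(self, sentences, data_id, build_vocabulary=False):
        # Assumed behavior: build the vocabulary only when explicitly asked to.
        if build_vocabulary:
            words = sorted({w for s in sentences for w in s.split()})
            self.vocabulary[data_id] = {w: i for i, w in enumerate(words)}
        # Without a vocabulary for this data_id, the text cannot be indexed.
        if data_id not in self.vocabulary:
            raise Exception(
                'The dataset must include a vocabulary with data_id "%s"' % data_id)
        vocab = self.vocabulary[data_id]
        # Out-of-vocabulary words are simply dropped in this toy version.
        return [[vocab[w] for w in s.split() if w in vocab] for s in sentences]


ds = ToyDataset()

# Loading text before any vocabulary exists reproduces the kind of error above.
try:
    ds.set_text(["a b"], "source_text")
    error_message = None
except Exception as exc:
    error_message = str(exc)
print(error_message)

# Building the vocabulary on the first (training) split lets later splits reuse it.
train_ids = ds.set_text(["a b", "b c"], "source_text", build_vocabulary=True)
val_ids = ds.set_text(["c d"], "source_text")
print(train_ids, val_ids)
```

Under this assumption, the flag has to be set on the first call that loads data for a given data_id; setting it for "target_text" alone would not create a "source_text" vocabulary.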
Sometimes it fails... but other times it works? Weird.
Can you share your config.py file?