I trained a seq2seq model using CuDNN and plan to use this model on devices without GPU

Using CPU for inference with GPU-trained model, about lvapeab/nmt-keras

Comments (20)

philipcori commented on May 28, 2024

Thanks. Just want to quickly note that to fix the issue Nam was facing, which I also faced, I needed to first save the CPU model with the saveModel() function provided by multimodal-keras-wrapper. Once it was saved, I could then load this model in a machine without a GPU.

from nmt-keras.

lvapeab commented on May 28, 2024

The CuDNN implementations of RNN layers are incompatible with CPU. So you need to create a new model that uses a regular RNN layer (GRU or LSTM) and load the weights from the CuDNN layer into it. Setting USE_CUDNN=False in params when creating the model will give you the non-CuDNN model. Then you can load the weights you want in a similar way to what is done here (these can be the CuDNN weights).

PS: Checking this, I've spotted a bug when converting CuDNN<->Regular (MarcBS/keras@258fea5). So you'll probably need to update Keras (pip install -e . -U)

NamTran838P commented on May 28, 2024

I tried what you suggested and I got the following error. It seems that the updateModel function requires the epoch_X_weights.h5 file. However, from my previously trained epochs, I notice that I don't get epoch_X_weights.h5 files. How should I proceed?

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/DCNFS/users/student/ntran1/Desktop/final_grammar_check/grammar_check.py", line 16, in __init__
    self.nmt_model = self.load_model()
  File "/DCNFS/users/student/ntran1/Desktop/final_grammar_check/grammar_check.py", line 85, in load_model
    model = updateModel(model, self.params['STORE_PATH'], self.params['RELOAD'], reload_epoch = self.params['RELOAD_EPOCH'])
  File "/home/ntran1/.local/lib/python3.6/site-packages/keras_wrapper/cnn_model.py", line 276, in updateModel
    model.model.load_weights(model_path + '_weights.h5')
  File "/DCNFS/users/student/ntran1/Desktop/keras/keras/engine/saving.py", line 492, in load_wrapper
    return load_function(*args, **kwargs)
  File "/DCNFS/users/student/ntran1/Desktop/keras/keras/engine/network.py", line 1222, in load_weights
    with h5py.File(filepath, mode='r') as f:
  File "/home/ntran1/.local/lib/python3.6/site-packages/h5py/_hl/files.py", line 408, in __init__
    swmr=swmr)
  File "/home/ntran1/.local/lib/python3.6/site-packages/h5py/_hl/files.py", line 173, in make_fid
    fid = h5f.open(name, flags, fapl=fapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5f.pyx", line 88, in h5py.h5f.open
OSError: Unable to open file (unable to open file: name = '/DCNFS/users/student/ntran1/Desktop/final_grammar_check/model/epoch_2_weights.h5', errno = 2, error message = 'No such file or directory', flags = 0, o_flags = 0)

lvapeab commented on May 28, 2024

Can you please provide the full log of the error?

This error occurs because Keras falls back to loading a model stored in the old _weights.h5 format, which is deprecated. To ensure backward compatibility, the actual error is caught, so it actually occurs before this last one.
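
The "During handling of the above exception, another exception occurred" line in these tracebacks is Python's implicit exception chaining. Below is a minimal, self-contained sketch of that pattern. It is not the wrapper's actual code: `load_full_model`, `load_weights_only`, `update_model` and `root_cause` are hypothetical names standing in for a loader that falls back to an older format. Walking `__context__` recovers the first error, which is usually the one worth reading.

```python
def load_full_model(path):
    # Hypothetical stand-in for the first attempt (the one that really fails).
    raise ValueError("No OpKernel was registered to support Op 'CudnnRNN'")


def load_weights_only(path):
    # Hypothetical stand-in for the backward-compatible fallback.
    raise OSError("Unable to open file: " + path + "_weights.h5")


def update_model(path):
    try:
        return load_full_model(path)
    except Exception:
        # Raising inside an except block chains the new error to the old one.
        return load_weights_only(path)


def root_cause(exc):
    """Follow the implicit chain back to the first exception raised."""
    while exc.__context__ is not None:
        exc = exc.__context__
    return exc


try:
    update_model("model/epoch_2")
except OSError as last_error:
    first_error = root_cause(last_error)
    print("reported:", type(last_error).__name__)
    print("root cause:", type(first_error).__name__, "-", first_error)
```

In the sketch, the reported error is the missing weights file, while the root cause is the earlier failure; the same reading applies to a chained traceback printed top to bottom.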

NamTran838P commented on May 28, 2024

Here is the code snippet that I used (note that I am not using GPU when loading the model so the code fails in the else case):

 def load_model(self):

        self.params = load_parameters()
        is_transformer = self.params.get('ATTEND_ON_OUTPUT', 'transformer' in self.params['MODEL_TYPE'].lower())
        self.dataset = loadDataset("dataset/Dataset_tutorial_dataset.pkl")
        self.params_prediction = {
            'language': 'en',
            'tokenize_f': eval('self.dataset.' + 'tokenize_basic'),
            'beam_size': 2,
            'optimized_search': True,
            'model_inputs': self.params['INPUTS_IDS_MODEL'],
            'model_outputs': self.params['OUTPUTS_IDS_MODEL'],
            'dataset_inputs':  self.params['INPUTS_IDS_DATASET'],
            'dataset_outputs':  self.params['OUTPUTS_IDS_DATASET'],
            'n_parallel_loaders': 1,
            'maxlen': 50,
            'model_inputs': ['source_text', 'state_below'],
            'model_outputs': ['target_text'],
            'dataset_inputs': ['source_text', 'state_below'],
            'dataset_outputs': ['target_text'],
            'normalize': True,
            'pos_unk': True and not is_transformer,
            'heuristic': 0,
            'state_below_maxlen': -1,
            'predict_on_sets': ['test'],
            'verbose': 0,
            'length_penalty': True,
            'length_norm_factor': 1.0,
            'attend_on_output': is_transformer}

        if tf.test.is_gpu_available(): # GPU available

            self.params['INPUT_VOCABULARY_SIZE'] = self.dataset.vocabulary_len[self.params['INPUTS_IDS_DATASET'][0]]
            self.params['OUTPUT_VOCABULARY_SIZE'] = self.dataset.vocabulary_len[self.params['OUTPUTS_IDS_DATASET'][0]]
            model = loadModel('model', epoch_num)

        else: # GPU not available
           
            #self.params['INPUT_VOCABULARY_SIZE'] = self.dataset.vocabulary_len['source_text']
            #self.params['OUTPUT_VOCABULARY_SIZE'] = self.dataset.vocabulary_len['target_text']
            self.params['USE_CUDNN'] = False
            self.params['RELOAD'] = epoch_num
            self.params['STORE_PATH'] = os.getcwd() + "/dummy"
            self.params['ATTENTION_MODE'] = "add"
            self.params['N_LAYERS_ENCODER'] = 2
            self.params['N_LAYERS_DECODER'] = 2
            self.params['SOURCE_TEXT_EMBEDDING_SIZE'] = 128
            self.params['TARGET_TEXT_EMBEDDING_SIZE'] = 128
            self.params['SKIP_VECTORS_HIDDEN_SIZE'] = 128
            self.params['ATTENTION_SIZE'] = 128
            self.params['ENCODER_HIDDEN_SIZE'] = 128
            self.params['DECODER_HIDDEN_SIZE'] = 128
            self.params['ENCODER_RNN_TYPE'] = "GRU"
            self.params['DECODER_RNN_TYPE'] = "ConditionalGRU"
            self.params['METRICS'] = ['sacrebleu']
            self.params['STOP_METRIC'] = 'sacrebleu'
            self.params['DETOKENIZATION_METHOD'] = 'detokenize_basic'
            self.params['APPLY_DETOKENIZATION'] = True
            self.params['LENGTH_PENALTY'] = True
            self.params['LENGTH_NORM_FACTOR'] = 1.0
            """Now, we create a `TranslationModel` instance:"""
            model = TranslationModel(self.params,
                                         model_type='AttentionRNNEncoderDecoder', 
                                         model_name='tutorial_model',
                                         vocabularies=self.dataset.vocabulary,
                                         store_path=self.params['STORE_PATH'],
                                         verbose=True)

            model = updateModel(model, os.getcwd() + "/model", self.params['RELOAD'], reload_epoch = self.params['RELOAD_EPOCH'])
            model.setParams(self.params)

        return model

Here is the full log of the error encountered:

[12/04/2020 14:30:49] No OpKernel was registered to support Op 'CudnnRNN' used by node bidirectional_encoder_GRU_1/CudnnRNN (defined at /DCNFS/users/student/ntran1/Desktop/keras/keras/layers/cudnn_recurrent.py:297) with these attrs: [seed=87654321, dropout=0, T=DT_FLOAT, input_mode="linear_input", direction="unidirectional", rnn_mode="gru", is_training=true, seed2=0]
Registered devices: [CPU, XLA_CPU]
Registered kernels:
  <no registered kernels>

         [[bidirectional_encoder_GRU_1/CudnnRNN]]

Errors may have originated from an input operation.
Input Source operations connected to node bidirectional_encoder_GRU_1/CudnnRNN:
 bidirectional_encoder_GRU_1/ExpandDims_1 (defined at /DCNFS/users/student/ntran1/Desktop/keras/keras/layers/cudnn_recurrent.py:273)
 bidirectional_encoder_GRU_1/concat (defined at /DCNFS/users/student/ntran1/Desktop/keras/keras/layers/cudnn_recurrent.py:60)
 bidirectional_encoder_GRU_1/transpose (defined at /DCNFS/users/student/ntran1/Desktop/keras/keras/layers/cudnn_recurrent.py:271)
[12/04/2020 14:30:49] <<< Failed -> Loading model from /DCNFS/users/student/ntran1/Desktop/final_grammar_check/model/epoch_2_weights.h5' ... >>>
Traceback (most recent call last):
  File "/home/ntran1/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1356, in _do_call
    return fn(*args)
  File "/home/ntran1/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1339, in _run_fn
    self._extend_graph()
  File "/home/ntran1/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1374, in _extend_graph
    tf_session.ExtendSession(self._session)
tensorflow.python.framework.errors_impl.InvalidArgumentError: No OpKernel was registered to support Op 'CudnnRNN' used by {{node bidirectional_encoder_GRU_1/CudnnRNN}}with these attrs: [seed=87654321, dropout=0, T=DT_FLOAT, input_mode="linear_input", direction="unidirectional", rnn_mode="gru", is_training=true, seed2=0]
Registered devices: [CPU, XLA_CPU]
Registered kernels:
  <no registered kernels>

         [[bidirectional_encoder_GRU_1/CudnnRNN]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ntran1/.local/lib/python3.6/site-packages/keras_wrapper/cnn_model.py", line 269, in updateModel
    model.model.set_weights(load_model(model_path + '.h5', compile=False).get_weights())
  File "/DCNFS/users/student/ntran1/Desktop/keras/keras/engine/saving.py", line 492, in load_wrapper
    return load_function(*args, **kwargs)
  File "/DCNFS/users/student/ntran1/Desktop/keras/keras/engine/saving.py", line 584, in load_model
    model = _deserialize_model(h5dict, custom_objects, compile)
  File "/DCNFS/users/student/ntran1/Desktop/keras/keras/engine/saving.py", line 336, in _deserialize_model
    K.batch_set_value(weight_value_tuples)
  File "/DCNFS/users/student/ntran1/Desktop/keras/keras/backend/tensorflow_backend.py", line 3041, in batch_set_value
    get_session().run(assign_ops, feed_dict=feed_dict)
  File "/DCNFS/users/student/ntran1/Desktop/keras/keras/backend/tensorflow_backend.py", line 321, in get_session
    [tf.is_variable_initialized(v) for v in candidate_vars])
  File "/home/ntran1/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 950, in run
    run_metadata_ptr)
  File "/home/ntran1/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1173, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/ntran1/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1350, in _do_run
    run_metadata)
  File "/home/ntran1/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1370, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: No OpKernel was registered to support Op 'CudnnRNN' used by node bidirectional_encoder_GRU_1/CudnnRNN (defined at /DCNFS/users/student/ntran1/Desktop/keras/keras/layers/cudnn_recurrent.py:297) with these attrs: [seed=87654321, dropout=0, T=DT_FLOAT, input_mode="linear_input", direction="unidirectional", rnn_mode="gru", is_training=true, seed2=0]
Registered devices: [CPU, XLA_CPU]
Registered kernels:
  <no registered kernels>

         [[bidirectional_encoder_GRU_1/CudnnRNN]]

Errors may have originated from an input operation.
Input Source operations connected to node bidirectional_encoder_GRU_1/CudnnRNN:
 bidirectional_encoder_GRU_1/ExpandDims_1 (defined at /DCNFS/users/student/ntran1/Desktop/keras/keras/layers/cudnn_recurrent.py:273)
 bidirectional_encoder_GRU_1/concat (defined at /DCNFS/users/student/ntran1/Desktop/keras/keras/layers/cudnn_recurrent.py:60)
 bidirectional_encoder_GRU_1/transpose (defined at /DCNFS/users/student/ntran1/Desktop/keras/keras/layers/cudnn_recurrent.py:271)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/DCNFS/users/student/ntran1/Desktop/final_grammar_check/grammar_check.py", line 16, in __init__
    self.nmt_model = self.load_model()
  File "/DCNFS/users/student/ntran1/Desktop/final_grammar_check/grammar_check.py", line 85, in load_model
    model = updateModel(model, os.getcwd() + "/model", self.params['RELOAD'], reload_epoch = self.params['RELOAD_EPOCH'])
  File "/home/ntran1/.local/lib/python3.6/site-packages/keras_wrapper/cnn_model.py", line 276, in updateModel
    model.model.load_weights(model_path + '_weights.h5')
  File "/DCNFS/users/student/ntran1/Desktop/keras/keras/engine/saving.py", line 492, in load_wrapper
    return load_function(*args, **kwargs)
  File "/DCNFS/users/student/ntran1/Desktop/keras/keras/engine/network.py", line 1222, in load_weights
    with h5py.File(filepath, mode='r') as f:
  File "/home/ntran1/.local/lib/python3.6/site-packages/h5py/_hl/files.py", line 408, in __init__
    swmr=swmr)
  File "/home/ntran1/.local/lib/python3.6/site-packages/h5py/_hl/files.py", line 173, in make_fid
    fid = h5f.open(name, flags, fapl=fapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5f.pyx", line 88, in h5py.h5f.open
OSError: Unable to open file (unable to open file: name = '/DCNFS/users/student/ntran1/Desktop/final_grammar_check/model/epoch_2_weights.h5', errno = 2, error message = 'No such file or directory', flags = 0, o_flags = 0)

lvapeab commented on May 28, 2024

So, you need to create a new model and then load the weights from the trained model:

    self.params = load_parameters()
    is_transformer = ...  # Whatever
    self.params_prediction = {...}  # Whatever
    ...
    if tf.test.is_gpu_available():  # GPU available
        self.params['USE_CUDNN'] = True
    else:  # GPU not available
        self.params['USE_CUDNN'] = False

    # This creates a new instance of the model, with the param `USE_CUDNN` properly set.
    model = TranslationModel(self.params,
                             model_type='AttentionRNNEncoderDecoder',
                             model_name='tutorial_model',
                             vocabularies=self.dataset.vocabulary,
                             store_path=self.params['STORE_PATH'],
                             verbose=True)
    # Now we can load some weights (possibly trained on CuDNN).
    model = updateModel(model, self.params['STORE_PATH'], self.params['RELOAD'], reload_epoch=self.params['RELOAD_EPOCH'])
    model.setParams(self.params)
    return model

It may help to compare the summaries of the two models.

A non-cudnn model summary:

from config import load_parameters
from nmt_keras.model_zoo import TranslationModel
params = load_parameters()
params['BIDIRECTIONAL_ENCODER'] = False  # Set to False to get the RNN type displayed. 
params['USE_CUDNN'] = False
non_cudnn_model = model = TranslationModel(params, model_type='AttentionRNNEncoderDecoder')
non_cudnn_model.model.summary(line_length=250)

This shows that the encoder RNN is an LSTM layer:

Layer (type)                                                                      Output Shape                                           Param #                       Connected to                                                                       
==========================================================================================================================================================================================================================================================
source_text (InputLayer)                                                          (None, None)                                           0                                                                                                                
__________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________
source_word_embedding (Embedding)                                                 (None, None, 32)                                       0                             source_text[0][0]                                                                  
__________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________
src_embedding_batch_normalization (BatchNormalization)                            (None, None, 32)                                       128                           source_word_embedding[0][0]                                                        
__________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________
remove_mask_1 (RemoveMask)                                                        (None, None, 32)                                       0                             src_embedding_batch_normalization[0][0]                                            
__________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________
encoder_LSTM (LSTM)                                                               (None, None, 32)                                       8320                          remove_mask_1[0][0]                                                                
__________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________

And the CuDNN summary:

from config import load_parameters
from nmt_keras.model_zoo import TranslationModel
params = load_parameters()
params['USE_CUDNN'] = True
params['BIDIRECTIONAL_ENCODER'] = False  # Set to False to get the RNN type displayed. 
cudnn_model = model = TranslationModel(params, model_type='AttentionRNNEncoderDecoder')
cudnn_model.model.summary(line_length=250)

This shows that the encoder RNN is a CuDNNLSTM layer:

Layer (type)                                                                      Output Shape                                           Param #                       Connected to                                                                       
==========================================================================================================================================================================================================================================================
source_text (InputLayer)                                                          (None, None)                                           0                                                                                                                
__________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________
source_word_embedding (Embedding)                                                 (None, None, 32)                                       0                             source_text[0][0]                                                                  
__________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________
src_embedding_batch_normalization (BatchNormalization)                            (None, None, 32)                                       128                           source_word_embedding[0][0]                                                        
__________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________
remove_mask_1 (RemoveMask)                                                        (None, None, 32)                                       0                             src_embedding_batch_normalization[0][0]                                            
__________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________
encoder_LSTM (CuDNNLSTM)                                                          (None, None, 32)                                       8448                          remove_mask_1[0][0]                                                                
__________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________
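
The two summaries differ only in the parameter count of the encoder (8320 vs. 8448). The arithmetic can be sketched as follows. This is a back-of-the-envelope check, not code from nmt-keras: `lstm_param_count` is a hypothetical helper, and the input size and number of units (both 32) are read off the summaries above. The difference comes from Keras' CuDNNLSTM keeping separate input and recurrent biases.

```python
def lstm_param_count(input_dim, units, cudnn=False):
    """Trainable parameters of one LSTM layer with four gates."""
    n_gates = 4
    kernel = input_dim * units * n_gates
    recurrent_kernel = units * units * n_gates
    # Keras' CuDNNLSTM stores separate input and recurrent biases,
    # so it has twice as many bias terms as the regular LSTM.
    bias = units * n_gates * (2 if cudnn else 1)
    return kernel + recurrent_kernel + bias


print(lstm_param_count(32, 32))              # regular LSTM
print(lstm_param_count(32, 32, cudnn=True))  # CuDNNLSTM
```

Because the bias tensors have different shapes, the weights have to be converted when they are moved between the two layer types rather than copied as-is.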

Hope this helps.

NamTran838P commented on May 28, 2024

Thanks for your response. I have tried what you suggested and I still got the same error. I believe that this is because the .h5 file contains both the model structure and the weights. I found that in MarcBS' Keras saveModel function it is possible to save both the structure and the weights separately as _structure.json and _weights.h5 (Screenshot attached). However, that is part of a try-except clause so it is not possible for me to try this approach. Would it be possible to save the model as separate structure and weights files: _structure.json and _weights.h5?

lvapeab commented on May 28, 2024

I've found a minor bug that prevented the model from being updated correctly (MarcBS/multimodal_keras_wrapper@32f9a7e). You need to update the wrapper in order to solve it (pip install -U multimodal-keras-wrapper).

I've also made a notebook that does this: training a CuDNN model, saving it, creating a non-CuDNN model and loading the CuDNN model into it. Hope it helps: https://gist.github.com/lvapeab/77c74d85115766aeeb8ad45a6e9a13bc

Finally, regarding the saving function and the try/excepts, those are for backwards compatibility with older Keras versions, which saved models in separate files. Currently, we should save and load models as .h5 files.

philipcori commented on May 28, 2024

Hello, I followed the tutorial you linked. I am getting an error regarding mismatch of input shapes, and it seems it is likely because of the fields of 'params' that I pass to the updateModel() function. Is this because my parameters don't match the shape of the model when I trained it? Unfortunately I don't remember the exact parameters of the model I am trying to load... Is this necessary for calling the updateModel() function? Normally I just call loadModel() which takes care of everything.

lvapeab commented on May 28, 2024

That's because updateModel only does that: it updates an existing model.

Here are a couple of ways to retrieve the hyperparameters from a saved model:

You can load it and access the params attribute of the model. This is a dictionary with the hyperparameters. E.g.:

m = loadModel('.', 1)
m.params

If you trained the model using the scripts (main.py), which I recommend, all the hyperparameters are stored in a pkl file under the saving directory.
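
For the second option, a minimal sketch of reading such a file with the standard library is shown below. It rests on assumptions: that the pkl file is a plain pickled dictionary, and that it is called `config.pkl` (the file name and the `load_params` helper are illustrative, not confirmed by the library). The demo writes a stand-in file first so that it runs on its own. Only unpickle files you trust.

```python
import os
import pickle
import tempfile


def load_params(pkl_path):
    """Load a hyperparameter dictionary from a pickle file."""
    with open(pkl_path, "rb") as f:
        params = pickle.load(f)
    if not isinstance(params, dict):
        raise TypeError("expected a dict, got " + type(params).__name__)
    return params


# Demo with a stand-in file; a real run would point at the pkl
# stored in the model's saving directory (file name assumed).
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "config.pkl")
with open(demo_path, "wb") as f:
    pickle.dump({"USE_CUDNN": True, "BIDIRECTIONAL_ENCODER": True}, f)

params = load_params(demo_path)
params["USE_CUDNN"] = False  # switch to the CPU-compatible layers
print(sorted(params.items()))
```

Only `USE_CUDNN` is changed here; every other entry is left as it was saved.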

philipcori commented on May 28, 2024

So I retrieved the params of the original model using your first approach. Here is the code I am using to do this:

SRC_MODEL_PATH = os.path.join(os.getcwd(), 'models/persona_chat_lstm')
DST_MODEL_PATH = os.path.join(os.getcwd(), 'models/persona_chat_lstm_cpu')
epoch_choice = 5

dataset = loadDataset(os.path.join(SRC_MODEL_PATH, "dataset/Dataset_tutorial_dataset.pkl"))

src_model = loadModel(SRC_MODEL_PATH, epoch_choice)
params = src_model.params
params['USE_CUDNN'] = False
params['BIDIRECTIONAL_ENCODER'] = False  # Set to False to get the RNN type displayed. 
params['MODEL_NAME'] = 'CPU'
params['STORE_PATH'] = DST_MODEL_PATH
params['MODE'] = 'sampling'
params['RELOAD'] = epoch_choice

cpu_model = TranslationModel(params,
                             model_type=params['MODEL_TYPE'],
                             verbose=params['VERBOSE'],
                             model_name=params['MODEL_NAME'],
                             vocabularies=dataset.vocabulary,
                             store_path=params['STORE_PATH'],
                             set_optimizer=True,
                             clear_dirs=True)

cpu_model = updateModel(cpu_model, 
                        SRC_MODEL_PATH, 
                        params['RELOAD'], 
                        reload_epoch=True)

However I am still getting the following error:

[26/04/2020 12:38:14] <<< Updating model /data/home/pcori/Chatbot/models/persona_chat_lstm/epoch_5 from /data/home/pcori/Chatbot/models/persona_chat_lstm ... >>>
[26/04/2020 12:38:14] <<< Updating model from /data/home/pcori/Chatbot/models/persona_chat_lstm/epoch_5.h5 ... >>>
[26/04/2020 12:38:18] Invalid bias shape: (64,)
[26/04/2020 12:38:18] <<< Failed -> Loading model from /data/home/pcori/Chatbot/models/persona_chat_lstm/epoch_5_weights.h5' ... >>>
Traceback (most recent call last):
  File "/data/home/pcori/Chatbot/chat-env/lib/python3.6/site-packages/keras_wrapper/saving.py", line 237, in updateModel
    model.model.set_weights(load_model(model_name + '.h5', compile=False).get_weights())
  File "/data/home/pcori/Chatbot/chat-env/lib/python3.6/site-packages/keras/engine/network.py", line 524, in set_weights
    layer_weights = preprocess_weights_for_loading(layer, layer_weights)
  File "/data/home/pcori/Chatbot/chat-env/lib/python3.6/site-packages/keras/engine/saving.py", line 859, in preprocess_weights_for_loading
    weights = convert_nested_bidirectional(weights)
  File "/data/home/pcori/Chatbot/chat-env/lib/python3.6/site-packages/keras/engine/saving.py", line 802, in convert_nested_bidirectional
    original_backend)
  File "/data/home/pcori/Chatbot/chat-env/lib/python3.6/site-packages/keras/engine/saving.py", line 987, in preprocess_weights_for_loading
    weights = _convert_rnn_weights(layer, weights)
  File "/data/home/pcori/Chatbot/chat-env/lib/python3.6/site-packages/keras/engine/saving.py", line 1073, in _convert_rnn_weights
    raise ValueError('Invalid bias shape: ' + str(bias_shape))
ValueError: Invalid bias shape: (64,)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "cpu_chatbot.py", line 40, in <module>
    reload_epoch=True)
  File "/data/home/pcori/Chatbot/chat-env/lib/python3.6/site-packages/keras_wrapper/saving.py", line 244, in updateModel
    model.model.load_weights(model_name + '_weights.h5')
  File "/data/home/pcori/Chatbot/chat-env/lib/python3.6/site-packages/keras/engine/saving.py", line 492, in load_wrapper
    return load_function(*args, **kwargs)
  File "/data/home/pcori/Chatbot/chat-env/lib/python3.6/site-packages/keras/engine/network.py", line 1222, in load_weights
    with h5py.File(filepath, mode='r') as f:
  File "/data/home/pcori/Chatbot/chat-env/lib/python3.6/site-packages/h5py/_hl/files.py", line 408, in __init__
    swmr=swmr)
  File "/data/home/pcori/Chatbot/chat-env/lib/python3.6/site-packages/h5py/_hl/files.py", line 173, in make_fid
    fid = h5f.open(name, flags, fapl=fapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5f.pyx", line 88, in h5py.h5f.open
OSError: Unable to open file (unable to open file: name = '/data/home/pcori/Chatbot/models/persona_chat_lstm/epoch_5_weights.h5', errno = 2, error message = 'No such file or directory', flags = 0, o_flags = 0)

For the second approach, is this the config.pkl file? Or the epoch_x_Model_Wrapper.pkl file? How can I go about loading the params from this file? Thanks again for the help.

lvapeab commented on May 28, 2024

I think you are trying to load a bidirectional model (the saved one) into a unidirectional model (the one you're creating). So the shapes don't match.

You should not modify architectural hyperparameters, such as params['BIDIRECTIONAL_ENCODER'] when updating the model.

In the examples above, I set that to visualize the architectures. But if you are loading a bidirectional model, you need to also create a bidirectional one.

As for the pkl files, the config files are loaded and saved as part of the standard training procedure.
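
A quick comparison of the architectural hyperparameters before calling updateModel can catch the mismatch described above. The sketch below is only an illustration: `architecture_mismatches` and `ARCHITECTURE_KEYS` are hypothetical names, and the key list is a guess assembled from parameter names that appear in this thread, not an exhaustive list from nmt-keras.

```python
# Keys assumed to affect weight shapes (taken from names used in this thread).
ARCHITECTURE_KEYS = (
    "BIDIRECTIONAL_ENCODER",
    "ENCODER_RNN_TYPE",
    "DECODER_RNN_TYPE",
    "N_LAYERS_ENCODER",
    "N_LAYERS_DECODER",
    "ENCODER_HIDDEN_SIZE",
    "DECODER_HIDDEN_SIZE",
)


def architecture_mismatches(saved_params, new_params, keys=ARCHITECTURE_KEYS):
    """Return {key: (saved, new)} for every architectural key that differs."""
    return {
        key: (saved_params.get(key), new_params.get(key))
        for key in keys
        if saved_params.get(key) != new_params.get(key)
    }


saved = {"BIDIRECTIONAL_ENCODER": True, "ENCODER_HIDDEN_SIZE": 32, "USE_CUDNN": True}
new = dict(saved, USE_CUDNN=False, BIDIRECTIONAL_ENCODER=False)

mismatches = architecture_mismatches(saved, new)
print(mismatches)
```

`USE_CUDNN` is deliberately left out of the key list, since changing it is the whole point of the conversion; any other difference reported is worth fixing before loading the weights.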

philipcori commented on May 28, 2024

Oh thanks, that fixed updating the model. The last part that still doesn't work is using the model for inference. I use the following code continued from the one posted above:

cpu_model = updateModel(cpu_model, 
                        SRC_MODEL_PATH, 
                        params['RELOAD'], 
                        reload_epoch=True)


params_prediction = {
    'language': 'en',
    'tokenize_f': dataset.tokenize_basic,
    'beam_size': 6,
    'optimized_search': True,
    'n_parallel_loaders': 1,
    'maxlen': 50,
    'model_inputs': ['source_text', 'state_below'],
    'model_outputs': ['target_text'],
    'dataset_inputs': ['source_text', 'state_below'],
    'dataset_outputs': ['target_text'],
    'normalize': True,
    'pos_unk': True,
    'heuristic': 0,
    'state_below_maxlen': -1,
    'predict_on_sets': ['test'],
    'verbose': 0,
}

user_inputs = []
bot_responses = []
while True:
    user_input = input()
    if user_input == 'exit()':
        break
    user_inputs.append(user_input)
    with open(os.path.join(DST_MODEL_PATH, 'user_input.txt'), 'w') as f:
        f.write(user_input)
    dataset.setInput(os.path.join(DST_MODEL_PATH, 'user_input.txt'),
            'test',
            type='text',
            id='source_text',
            pad_on_batch=True,
            tokenization='tokenize_basic',
            fill='end',
            max_text_len=30,
            min_occ=0,
            overwrite_split=True)

    dataset.setInput(None,
                'test',
                type='ghost',
                id='state_below',
                required=False,
                overwrite_split=True)

    dataset.setRawInput(os.path.join(DST_MODEL_PATH, 'user_input.txt'),
                  'test',
                  type='file-name',
                  id='raw_source_text',
                  overwrite_split=True)

    
    vocab = dataset.vocabulary['target_text']['idx2words']
    predictions = cpu_model.predictBeamSearchNet(dataset, params_prediction)['test']

Just want to note that when I do inference using src_model rather than cpu_model it works fine, which confuses me because the error seems to point to an error with the dataset:

[27/04/2020 01:51:53]   Applying tokenization function: "tokenize_basic".
[27/04/2020 01:51:53] Loaded "test" set inputs of data_type "text" with data_id "source_text" and length 1.
[27/04/2020 01:51:53] Loaded "test" set inputs of data_type "ghost" with data_id "state_below" and length 1.
[27/04/2020 01:51:53] Loaded "test" set inputs of type "file-name" with id "raw_source_text".


[27/04/2020 01:51:53] <<< Predicting outputs of test set >>>
Traceback (most recent call last):
  File "cpu_chatbot.py", line 101, in <module>
    predictions = cpu_model.predictBeamSearchNet(dataset, params_prediction)['test']
  File "/data/home/pcori/Chatbot/chat-env/lib/python3.6/site-packages/keras_wrapper/cnn_model.py", line 1314, in predictBeamSearchNet
    X[input_id] = data[input_id]
KeyError: 'source_text'

Besides this, we are trying to use this model in the backend of a mobile application, hence trying to get it to work on a CPU. If you have any suggestions for how this library was intended to be used for this purpose they are welcome! Could the Interactive NMT branch be used for this?

lvapeab commented on May 28, 2024

You also need to tell the CPU model what the mappings are, as done here:

nmt-keras/nmt_keras/training.py

Lines 95 to 108 in 3f97677

# Define the inputs and outputs mapping from our Dataset instance to our model
inputMapping = dict()
for i, id_in in enumerate(params['INPUTS_IDS_DATASET']):
    pos_source = dataset.ids_inputs.index(id_in)
    id_dest = nmt_model.ids_inputs[i]
    inputMapping[id_dest] = pos_source
nmt_model.setInputsMapping(inputMapping)

outputMapping = dict()
for i, id_out in enumerate(params['OUTPUTS_IDS_DATASET']):
    pos_target = dataset.ids_outputs.index(id_out)
    id_dest = nmt_model.ids_outputs[i]
    outputMapping[id_dest] = pos_target
nmt_model.setOutputsMapping(outputMapping)
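
For the converted model, the same logic applies with cpu_model in place of nmt_model. The sketch below restates it as a small function and runs it on stand-in lists, so it executes without the library installed. build_mapping is a hypothetical name, and the id lists are the ones from the prediction parameters posted earlier in the thread.

```python
def build_mapping(wanted_ids, dataset_ids, model_ids):
    """Map each model id to the position of its data in the dataset.

    wanted_ids  -- ids to feed, e.g. params['INPUTS_IDS_DATASET']
    dataset_ids -- e.g. dataset.ids_inputs
    model_ids   -- e.g. cpu_model.ids_inputs
    """
    mapping = {}
    for i, data_id in enumerate(wanted_ids):
        position = dataset_ids.index(data_id)  # raises ValueError if absent
        mapping[model_ids[i]] = position
    return mapping


# Stand-in values; with the real objects you would pass dataset.ids_inputs etc.
dataset_ids_inputs = ['source_text', 'state_below']
model_ids_inputs = ['source_text', 'state_below']

input_mapping = build_mapping(['source_text', 'state_below'],
                              dataset_ids_inputs, model_ids_inputs)
print(input_mapping)  # {'source_text': 0, 'state_below': 1}
```

With the real objects, the resulting dicts would be passed to cpu_model.setInputsMapping(...) and cpu_model.setOutputsMapping(...), mirroring the training.py excerpt.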

The primary scope of this library was research, so it's not really optimized for CPU deployment/serving. I suggest you take a look at tensorflow/model-optimization to this end. Of course, contributions are welcome! :)

The interactive NMT branch implements different methods for interactive-predictive prediction and incremental training (see e.g. this paper).

philipcori commented on May 28, 2024

Great, thanks a lot for the help!

lvapeab commented on May 28, 2024

I'm closing this issue. Feel free to reopen it/open a new one if something is not clear.

NamTran838P commented on May 28, 2024

As Philip suggested, I used the following code to convert my CuDNN-GRU model to a regular GRU model. The code works well for Philip, as he uses CuDNN-LSTM. A quick Google search reveals that to convert CuDNN-GRU to a regular GRU in Keras, reset_after should be True and recurrent_activation should be sigmoid. My understanding is that this needs modifications in the library. Would it be possible for you to look into this? Thanks a lot for your help.

The links to these can be found at:
https://gist.github.com/bzamecnik/bd3786a074f8cb891bc2a397343070f1

https://stackoverflow.com/questions/57551650/unable-to-use-deep-learning-rnn-models-trained-on-gpu-instance-for-the-inferenc

The error I get:

Traceback (most recent call last):
  File "/WAVE/users/unix/nvtran/.local/lib/python3.7/site-packages/keras_wrapper/cnn_model.py", line 259, in updateModel
    model.model.set_weights(load_model(model_path + '.h5', compile=False).get_weights())
  File "/WAVE/users/unix/nvtran/keras/keras/engine/network.py", line 524, in set_weights
    layer_weights = preprocess_weights_for_loading(layer, layer_weights)
  File "/WAVE/users/unix/nvtran/keras/keras/engine/saving.py", line 859, in preprocess_weights_for_loading
    weights = convert_nested_bidirectional(weights)
  File "/WAVE/users/unix/nvtran/keras/keras/engine/saving.py", line 802, in convert_nested_bidirectional
    original_backend)
  File "/WAVE/users/unix/nvtran/keras/keras/engine/saving.py", line 987, in preprocess_weights_for_loading
    weights = _convert_rnn_weights(layer, weights)
  File "/WAVE/users/unix/nvtran/keras/keras/engine/saving.py", line 1131, in _convert_rnn_weights
    raise ValueError('%s is not compatible with %s' % types)
ValueError: CuDNNGRU is not compatible with GRU(reset_after=False)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "convert_to_cpu.py", line 35, in <module>
    reload_epoch=True)
  File "/WAVE/users/unix/nvtran/.local/lib/python3.7/site-packages/keras_wrapper/cnn_model.py", line 266, in updateModel
    model.model.load_weights(model_path + '_weights.h5')
  File "/WAVE/users/unix/nvtran/keras/keras/engine/saving.py", line 492, in load_wrapper
    return load_function(*args, **kwargs)
  File "/WAVE/users/unix/nvtran/keras/keras/engine/network.py", line 1222, in load_weights
    with h5py.File(filepath, mode='r') as f:
  File "/WAVE/apps/eb/software/h5py/2.9.0-fosscuda-2019a/lib/python3.7/site-packages/h5py-2.9.0-py3.7-linux-x86_64.egg/h5py/_hl/files.py", line 394, in __init__
    swmr=swmr)
  File "/WAVE/apps/eb/software/h5py/2.9.0-fosscuda-2019a/lib/python3.7/site-packages/h5py-2.9.0-py3.7-linux-x86_64.egg/h5py/_hl/files.py", line 170, in make_fid
    fid = h5f.open(name, flags, fapl=fapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5f.pyx", line 85, in h5py.h5f.open
OSError: Unable to open file (unable to open file: name = '/WAVE/users/unix/nvtran/grammar_check/model/epoch_3_weights.h5', errno = 2, error message = 'No such file or directory', flags = 0, o_flags = 0)

The code I use:

from keras_wrapper.cnn_model import loadModel, updateModel, saveModel
from keras_wrapper.dataset import loadDataset
from nmt_keras.model_zoo import TranslationModel
import os

SRC_MODEL_PATH = os.path.join(os.getcwd(), 'model')
DST_MODEL_PATH = os.path.join(os.getcwd(), 'model_cpu')
epoch_choice = 3

dataset = loadDataset("dataset/Dataset_tutorial_dataset.pkl")

src_model = loadModel(SRC_MODEL_PATH, epoch_choice)
params = src_model.params
params['USE_CUDNN'] = False
# params['BIDIRECTIONAL_ENCODER'] = False  # Set to False to get the RNN type displayed. 
params['MODEL_NAME'] = 'CPU'
params['STORE_PATH'] = DST_MODEL_PATH
params['MODE'] = 'sampling'
params['RELOAD'] = epoch_choice
# params['INPUT_VOCABULARY_SIZE'] = dataset.vocabulary_len[params['INPUTS_IDS_DATASET'][0]]
# params['OUTPUT_VOCABULARY_SIZE'] = dataset.vocabulary_len[params['OUTPUTS_IDS_DATASET'][0]]

cpu_model = TranslationModel(params,
                             model_type=params['MODEL_TYPE'],
                             verbose=params['VERBOSE'],
                             model_name=params['MODEL_NAME'],
                             vocabularies=dataset.vocabulary,
                             store_path=params['STORE_PATH'],
                             set_optimizer=True,
                             clear_dirs=True)

cpu_model = updateModel(cpu_model, 
                        SRC_MODEL_PATH, 
                        params['RELOAD'], 
                        reload_epoch=True)

saveModel(cpu_model, update_num=epoch_choice, path=DST_MODEL_PATH, full_path=True)

exit()

lvapeab commented on May 28, 2024

I added an option to the config.py to set reset_after to the desired value (2854b7e).

You can also set the recurrent initializer to glorot_uniform from the config (INNER_INIT option).

You'll also need to update Keras (MarcBS/keras@82ee090); I added a workaround to make the GRU/LSTM calls compatible.
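
As a rough pre-flight check, the GRU settings discussed here can be validated before attempting the weight conversion. This is a sketch under assumptions: cudnn_gru_issues is a hypothetical helper that works on a plain dict of layer settings, not on a Keras layer, and it only checks the two conditions raised in this thread. The first is confirmed by the traceback above, where Keras rejects GRU(reset_after=False) as a target for CuDNNGRU weights; the second comes from the links cited earlier.

```python
def cudnn_gru_issues(gru_config):
    """List reasons a regular GRU config may not match a CuDNNGRU.

    Hypothetical helper: gru_config is a plain dict of layer settings.
    """
    issues = []
    if not gru_config.get('reset_after', False):
        issues.append('reset_after must be True')
    if gru_config.get('recurrent_activation') != 'sigmoid':
        issues.append("recurrent_activation must be 'sigmoid'")
    return issues


print(cudnn_gru_issues({'reset_after': False,
                        'recurrent_activation': 'sigmoid'}))
# ['reset_after must be True']
print(cudnn_gru_issues({'reset_after': True,
                        'recurrent_activation': 'sigmoid'}))
# []
```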

NamTran838P commented on May 28, 2024

Unfortunately, I updated keras, multimodal_keras_wrapper and nmt-keras and still got the same error. I did set GRU_RESET_AFTER to True and INNER_INIT to "glorot_uniform" in config.py. I think the GRU_RESET_AFTER flag for some reason does not actually set the GRU's reset_after flag (in keras' saving.py script - line 1122 in the screenshot). Line 1122 should get executed but it does not actually get executed. This leads to the incompatibility complaint (terminal screenshot). Would it be possible for you to look into this?

Also, from the links I found in my previous post, it is suggested that the recurrent activation be set to "sigmoid." How do I set the recurrent activation to "sigmoid"? Thanks a lot.

lvapeab commented on May 28, 2024

Have you modified the models at model_zoo? I've been able to train a CuDNN-GRU, save it, load it as a regular GRU and continue its training without problems. Here's a notebook.

> Also, from the links I found in my previous post, it is suggested that the recurrent activation be set to "sigmoid." How do I set the recurrent activation to "sigmoid"? Thanks a lot.

The default recurrent activation is "sigmoid" (implementation). Moreover, I would think carefully before changing this activation, as it is intended to work as a gate that decides how much information passes through the unit (squashing the values to [0, 1]).
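
To make the gating role concrete, here is a minimal numeric sketch in plain Python (no Keras involved). It assumes the usual GRU state update, h = z * h_prev + (1 - z) * h_candidate, with the gate z produced by a sigmoid; the function names are illustrative only.

```python
import math


def sigmoid(x):
    """Logistic function: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_update(h_prev, h_candidate, gate_preactivation):
    """Blend the old state and the candidate state using a sigmoid gate."""
    z = sigmoid(gate_preactivation)
    return z * h_prev + (1.0 - z) * h_candidate


print(sigmoid(0.0))                 # 0.5
print(gated_update(1.0, 3.0, 0.0))  # 2.0 (equal mix of both states)
```

Because z stays between 0 and 1, the new state is always a weighted average of the old and candidate states. An unbounded activation in that position would break this property.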
