strongio / keras-bert
A simple technique to integrate BERT from tf hub to keras
I just want to run this code. Following another issue, I removed the pooling layers from the trainable weights, but I still get the same error. How can I solve this problem?
I got this error when I ran the code in Google Colaboratory:
TypeError: The following are legacy tf.layers.Layers:
<__main__.BertLayer object at 0x7fa94d2f5048>
To use keras as a framework (for instance using the Network, Model, or Sequential classes), please use the tf.keras.layers implementation instead. (Or, if writing custom layers, subclass from tf.keras.layers rather than tf.layers)
Hello, I am trying to save a model built exactly from your code example. However, I get the below error. Any advice?
---------------------------------------------------------------------------
NotImplementedError Traceback (most recent call last)
in ()
----> 1 tf.keras.models.save_model( model, 'model', overwrite=True, include_optimizer=True )
2
5 frames
/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/saving/save.py in save_model(model, filepath, overwrite, include_optimizer, save_format, signatures)
107 'or using `save_weights`.')
108 hdf5_format.save_model_to_hdf5(
--> 109 model, filepath, overwrite, include_optimizer)
110 else:
111 saved_model_save.save(model, filepath, overwrite, include_optimizer,
/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/saving/hdf5_format.py in save_model_to_hdf5(model, filepath, overwrite, include_optimizer)
91
92 try:
---> 93 model_metadata = saving_utils.model_metadata(model, include_optimizer)
94 for k, v in model_metadata.items():
95 if isinstance(v, (dict, list, tuple)):
/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/saving/saving_utils.py in model_metadata(model, include_optimizer, require_config)
158 except NotImplementedError as e:
159 if require_config:
--> 160 raise e
161
162 metadata = dict(
/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/saving/saving_utils.py in model_metadata(model, include_optimizer, require_config)
155 model_config = {'class_name': model.__class__.__name__}
156 try:
--> 157 model_config['config'] = model.get_config()
158 except NotImplementedError as e:
159 if require_config:
/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/network.py in get_config(self)
884 for layer in self.layers: # From the earliest layers on.
885 layer_class_name = layer.__class__.__name__
--> 886 layer_config = layer.get_config()
887
888 filtered_inbound_nodes = []
/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/base_layer.py in get_config(self)
578 # or that `get_config` has been overridden:
579 if len(extra_args) > 1 and hasattr(self.get_config, '_is_default'):
--> 580 raise NotImplementedError('Layers with arguments in `__init__` must '
581 'override `get_config`.')
582 # TODO(reedwm): Handle serializing self._dtype_policy.
NotImplementedError: Layers with arguments in `__init__` must override `get_config`.
ValueError Traceback (most recent call last)
in ()
1 #Training the model
----> 2 model = build_model(max_seq_length)
3
4 # Instantiate variables
5 initialize_vars(sess)
in build_model(max_seq_length)
7 bert_inputs = [in_id, in_mask, in_segment]
8
----> 9 bert_output = BertLayer(n_fine_tune_layers=3)(bert_inputs)
~\Anaconda\lib\site-packages\tensorflow\python\layers\base.py in __call__(self, inputs, *args, **kwargs)
372
373 # Actually call layer
--> 374 outputs = super(Layer, self).__call__(inputs, *args, **kwargs)
375
376 if not context.executing_eagerly():
~\Anaconda\lib\site-packages\tensorflow\python\keras\engine\base_layer.py in __call__(self, inputs, *args, **kwargs)
744 # the user has manually overwritten the build method do we need to
745 # build it.
--> 746 self.build(input_shapes)
747 # We must set self.built since user defined build functions are not
748 # constrained to set self.built.
in build(self, input_shape)
16 for var in self.bert.variables:
17 if "encoder" in var.name:
---> 18 layer_no = int(var.name.split("/")[3])
19 layer_no = inti(layer_no.split("_")[-1])
20 if layer_no >= 12 - self.n_fine_tune_layers:
ValueError: invalid literal for int() with base 10: 'encoder'
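The traceback above suggests that indexing into `var.name.split("/")` depends on how many scope prefixes precede `bert/encoder`. A minimal sketch of a position-independent alternative is to search for the `encoder/layer_N/` segment with a regular expression. The helper name `encoder_layer_number` is hypothetical and not part of the repository's code.

```python
import re

# Matches the encoder block index in names such as ".../encoder/layer_10/...".
_LAYER_RE = re.compile(r"encoder/layer_(\d+)/")


def encoder_layer_number(var_name):
    """Return the encoder block index found in var_name, or None if absent.

    Hypothetical helper: it does not rely on how many scope prefixes
    come before "bert/encoder" in the variable name.
    """
    match = _LAYER_RE.search(var_name)
    return int(match.group(1)) if match else None
```

With such a helper, the check in `build` could compare the returned number against `12 - self.n_fine_tune_layers` and skip variables for which it returns `None`.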
Hello,
I modified part of the code:
bert_outputs = self.bert(inputs=bert_inputs, signature="tokens", as_dict=True)['sequence_output']
return (max_seq_length, self.output_size)
But when I run the code, I get the following error:
"F tensorflow/core/framework/tensor_shape.cc:44] Check failed: NDIMS == dims() (2 vs. 4)Asking for tensor of 2 dimensions from a tensor of 4 dimensions
Aborted (core dumped)"
May I know the reason?
Hi. I was trying to reproduce your results before building on them for my research, to make sure that I didn't miss anything. However, when the model was built, the numbers of trainable and non-trainable parameters did not match, although the total number of parameters did. Also, your notebook reports an accuracy of around 85.56% after one epoch, but I observed only around 50.10%. The comparison of predictions from the trained model and from the loaded saved model still printed True.
Could you please help me understand why I am not able to reproduce your results?
Please find the setup details below:
Ubuntu 18.04 running docker image of tensorflow-1.14.0 with NVIDIA GPU.
TypeError Traceback (most recent call last)
in ()
----> 1 model = build_model(max_seq_length)
2
3 # Instantiate variables
4 initialize_vars(sess)
5
3 frames
in build_model(max_seq_length)
10
11 #model=tf.keras.layers(inputs=bert_inputs, outputs=pred)
---> 12 model = tf.keras.models.Model(inputs=bert_inputs, outputs=pred)
13
14 model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py in __init__(self, *args, **kwargs)
127
128 def __init__(self, *args, **kwargs):
--> 129 super(Model, self).__init__(*args, **kwargs)
130 # initializing _distribution_strategy here since it is possible to call
131 # predict on a model without compiling it.
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/network.py in __init__(self, *args, **kwargs)
165 self._init_subclassed_network(**kwargs)
166
--> 167 tf_utils.assert_no_legacy_layers(self.layers)
168
169 # Several Network methods have "no_automatic_dependency_tracking"
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/utils/tf_utils.py in assert_no_legacy_layers(layers)
397 'classes), please use the tf.keras.layers implementation instead. '
398 '(Or, if writing custom layers, subclass from tf.keras.layers rather '
--> 399 'than tf.layers)'.format(layer_str))
400
401
TypeError: The following are legacy tf.layers.Layers:
<__main__.BertLayer object at 0x7fa4e1b239b0>
To use keras as a framework (for instance using the Network, Model, or Sequential classes), please use the tf.keras.layers implementation instead. (Or, if writing custom layers, subclass from tf.keras.layers rather than tf.layers)
Can you please suggest a way to visualize BERT's attention mechanism while using your code (that is, while loading the weights from TensorFlow Hub)?
This isn't really an issue, but it would be nice to have a requirements.txt here.
I am getting the following error
RuntimeError: variable_scope module_1/ was unused but the corresponding name_scope was already taken.
Could you please help?
Thank you very much for helping us know how to do transfer learning with Bert by using Keras!
I have a small question about the shape of the output tensor from the BERT layer.
I see the following `compute_output_shape` function in the `BertLayer` class:
def compute_output_shape(self, input_shape):
    return (input_shape[0], self.output_size)
It seems the function is meant to indicate that the output has shape `(batch_size, output_size)`, but that is not true IMHO.
The input is a list of 3 tensors of shape `(batch_size, max_sequence_length)`, which means `input_shape[0]` has the value `(batch_size, max_sequence_length)`, so the output shape ends up as `((batch_size, max_sequence_length), output_size)`, not the expected `(batch_size, output_size)`.
Please correct me if I am wrong. Thanks!
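The shape arithmetic described above can be checked without TensorFlow. The sketch below assumes, as the post does, that the layer receives a list of three `(batch_size, max_sequence_length)` shapes. The standalone function names and the default `output_size=768` are illustrative assumptions, not the repository's code.

```python
def output_shape_as_written(input_shape, output_size=768):
    # Mirrors the method quoted above: input_shape[0] is itself a shape tuple.
    return (input_shape[0], output_size)


def output_shape_batch_only(input_shape, output_size=768):
    # Possible correction: take only the batch dimension of the first input.
    batch_size = input_shape[0][0]
    return (batch_size, output_size)


# Three inputs (ids, mask, segment ids), each of shape (batch_size, max_seq_length).
shapes = [(None, 256), (None, 256), (None, 256)]
print(output_shape_as_written(shapes))
print(output_shape_batch_only(shapes))
```

If the layer really is called with a list of input shapes, indexing `input_shape[0][0]` yields the `(batch_size, output_size)` result the post expects.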
Hi, I wonder if it is possible to modify the code to get embeddings at the word level instead of at the sentence level.
I don't know if I have understood this correctly. According to the BERT paper, the authors use the first vector ("[CLS]") for classification. I saw that you are using the "pooled" vector in your code. Is there a reason for that?
Thanks,
Li Sun
Greetings. When I try to run the last cell, which saves and loads the model, I get this error:
NotImplementedError: Layers with arguments in `__init__` must override `get_config`.
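The error message asks for a `get_config` override that returns every `__init__` argument, so that the layer can be rebuilt from its config. The sketch below illustrates that contract only. To stay runnable without TensorFlow, it uses a hypothetical stand-in `Layer` base class; in real code the class would subclass `tf.keras.layers.Layer`, and the stand-in's behaviour here is a simplifying assumption.

```python
class Layer:
    """Hypothetical stand-in for tf.keras.layers.Layer (keeps the sketch TF-free)."""

    def __init__(self, name=None):
        self.name = name

    def get_config(self):
        return {"name": self.name}

    @classmethod
    def from_config(cls, config):
        # Rebuild the layer by passing the config back to __init__.
        return cls(**config)


class BertLayer(Layer):
    def __init__(self, n_fine_tune_layers=3, **kwargs):
        super().__init__(**kwargs)
        self.n_fine_tune_layers = n_fine_tune_layers

    def get_config(self):
        # Start from the base config and add every extra __init__ argument.
        config = super().get_config()
        config.update({"n_fine_tune_layers": self.n_fine_tune_layers})
        return config
```

When loading a saved model that contains a custom layer, Keras also needs to be told about the class, typically through the `custom_objects` argument of `load_model`.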
Thank you very much for the article. After that, I wanted to understand BERT more deeply and found the following thing in your code.
For fine tune, you use the following line of code:
trainable_vars = self.bert.variables
trainable_vars = trainable_vars [-self.n_fine_tune_layers:]
However, self.bert.variables returns the list sorted by variable name, so the variables of transformer blocks 10 and 11 come before those of block 9. As a result, fine-tuning trains intermediate layers while the others stay completely frozen.
bert.variables returns:
[<tf.Variable 'BERT_module_1/bert/embeddings/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/embeddings/position_embeddings:0' shape=(512, 768) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/embeddings/token_type_embeddings:0' shape=(2, 768) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/embeddings/word_embeddings:0' shape=(119547, 768) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_0/attention/output/LayerNorm/beta:0' shape=(768,) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_0/attention/output/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_0/attention/output/dense/bias:0' shape=(768,) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_0/attention/output/dense/kernel:0' shape=(768, 768) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_0/attention/self/key/bias:0' shape=(768,) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_0/attention/self/key/kernel:0' shape=(768, 768) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_0/attention/self/query/bias:0' shape=(768,) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_0/attention/self/query/kernel:0' shape=(768, 768) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_0/attention/self/value/bias:0' shape=(768,) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_0/attention/self/value/kernel:0' shape=(768, 768) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_0/intermediate/dense/bias:0' shape=(3072,) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_0/intermediate/dense/kernel:0' shape=(768, 3072) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_0/output/LayerNorm/beta:0' shape=(768,) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_0/output/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_0/output/dense/bias:0' shape=(768,) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_0/output/dense/kernel:0' shape=(3072, 768) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_1/attention/output/LayerNorm/beta:0' shape=(768,) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_1/attention/output/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_1/attention/output/dense/bias:0' shape=(768,) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_1/attention/output/dense/kernel:0' shape=(768, 768) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_1/attention/self/key/bias:0' shape=(768,) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_1/attention/self/key/kernel:0' shape=(768, 768) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_1/attention/self/query/bias:0' shape=(768,) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_1/attention/self/query/kernel:0' shape=(768, 768) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_1/attention/self/value/bias:0' shape=(768,) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_1/attention/self/value/kernel:0' shape=(768, 768) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_1/intermediate/dense/bias:0' shape=(3072,) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_1/intermediate/dense/kernel:0' shape=(768, 3072) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_1/output/LayerNorm/beta:0' shape=(768,) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_1/output/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_1/output/dense/bias:0' shape=(768,) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_1/output/dense/kernel:0' shape=(3072, 768) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_10/attention/output/LayerNorm/beta:0' shape=(768,) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_10/attention/output/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_10/attention/output/dense/bias:0' shape=(768,) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_10/attention/output/dense/kernel:0' shape=(768, 768) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_10/attention/self/key/bias:0' shape=(768,) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_10/attention/self/key/kernel:0' shape=(768, 768) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_10/attention/self/query/bias:0' shape=(768,) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_10/attention/self/query/kernel:0' shape=(768, 768) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_10/attention/self/value/bias:0' shape=(768,) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_10/attention/self/value/kernel:0' shape=(768, 768) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_10/intermediate/dense/bias:0' shape=(3072,) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_10/intermediate/dense/kernel:0' shape=(768, 3072) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_10/output/LayerNorm/beta:0' shape=(768,) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_10/output/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_10/output/dense/bias:0' shape=(768,) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_10/output/dense/kernel:0' shape=(3072, 768) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_11/attention/output/LayerNorm/beta:0' shape=(768,) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_11/attention/output/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_11/attention/output/dense/bias:0' shape=(768,) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_11/attention/output/dense/kernel:0' shape=(768, 768) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_11/attention/self/key/bias:0' shape=(768,) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_11/attention/self/key/kernel:0' shape=(768, 768) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_11/attention/self/query/bias:0' shape=(768,) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_11/attention/self/query/kernel:0' shape=(768, 768) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_11/attention/self/value/bias:0' shape=(768,) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_11/attention/self/value/kernel:0' shape=(768, 768) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_11/intermediate/dense/bias:0' shape=(3072,) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_11/intermediate/dense/kernel:0' shape=(768, 3072) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_11/output/LayerNorm/beta:0' shape=(768,) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_11/output/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_11/output/dense/bias:0' shape=(768,) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_11/output/dense/kernel:0' shape=(3072, 768) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_2/attention/output/LayerNorm/beta:0' shape=(768,) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_2/attention/output/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_2/attention/output/dense/bias:0' shape=(768,) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_2/attention/output/dense/kernel:0' shape=(768, 768) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_2/attention/self/key/bias:0' shape=(768,) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_2/attention/self/key/kernel:0' shape=(768, 768) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_2/attention/self/query/bias:0' shape=(768,) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_2/attention/self/query/kernel:0' shape=(768, 768) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_2/attention/self/value/bias:0' shape=(768,) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_2/attention/self/value/kernel:0' shape=(768, 768) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_2/intermediate/dense/bias:0' shape=(3072,) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_2/intermediate/dense/kernel:0' shape=(768, 3072) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_2/output/LayerNorm/beta:0' shape=(768,) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_2/output/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_2/output/dense/bias:0' shape=(768,) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_2/output/dense/kernel:0' shape=(3072, 768) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_3/attention/output/LayerNorm/beta:0' shape=(768,) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_3/attention/output/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_3/attention/output/dense/bias:0' shape=(768,) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_3/attention/output/dense/kernel:0' shape=(768, 768) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_3/attention/self/key/bias:0' shape=(768,) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_3/attention/self/key/kernel:0' shape=(768, 768) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_3/attention/self/query/bias:0' shape=(768,) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_3/attention/self/query/kernel:0' shape=(768, 768) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_3/attention/self/value/bias:0' shape=(768,) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_3/attention/self/value/kernel:0' shape=(768, 768) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_3/intermediate/dense/bias:0' shape=(3072,) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_3/intermediate/dense/kernel:0' shape=(768, 3072) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_3/output/LayerNorm/beta:0' shape=(768,) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_3/output/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_3/output/dense/bias:0' shape=(768,) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_3/output/dense/kernel:0' shape=(3072, 768) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_4/attention/output/LayerNorm/beta:0' shape=(768,) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_4/attention/output/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_4/attention/output/dense/bias:0' shape=(768,) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_4/attention/output/dense/kernel:0' shape=(768, 768) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_4/attention/self/key/bias:0' shape=(768,) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_4/attention/self/key/kernel:0' shape=(768, 768) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_4/attention/self/query/bias:0' shape=(768,) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_4/attention/self/query/kernel:0' shape=(768, 768) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_4/attention/self/value/bias:0' shape=(768,) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_4/attention/self/value/kernel:0' shape=(768, 768) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_4/intermediate/dense/bias:0' shape=(3072,) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_4/intermediate/dense/kernel:0' shape=(768, 3072) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_4/output/LayerNorm/beta:0' shape=(768,) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_4/output/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_4/output/dense/bias:0' shape=(768,) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_4/output/dense/kernel:0' shape=(3072, 768) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_5/attention/output/LayerNorm/beta:0' shape=(768,) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_5/attention/output/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_5/attention/output/dense/bias:0' shape=(768,) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_5/attention/output/dense/kernel:0' shape=(768, 768) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_5/attention/self/key/bias:0' shape=(768,) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_5/attention/self/key/kernel:0' shape=(768, 768) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_5/attention/self/query/bias:0' shape=(768,) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_5/attention/self/query/kernel:0' shape=(768, 768) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_5/attention/self/value/bias:0' shape=(768,) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_5/attention/self/value/kernel:0' shape=(768, 768) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_5/intermediate/dense/bias:0' shape=(3072,) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_5/intermediate/dense/kernel:0' shape=(768, 3072) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_5/output/LayerNorm/beta:0' shape=(768,) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_5/output/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_5/output/dense/bias:0' shape=(768,) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_5/output/dense/kernel:0' shape=(3072, 768) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_6/attention/output/LayerNorm/beta:0' shape=(768,) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_6/attention/output/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_6/attention/output/dense/bias:0' shape=(768,) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_6/attention/output/dense/kernel:0' shape=(768, 768) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_6/attention/self/key/bias:0' shape=(768,) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_6/attention/self/key/kernel:0' shape=(768, 768) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_6/attention/self/query/bias:0' shape=(768,) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_6/attention/self/query/kernel:0' shape=(768, 768) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_6/attention/self/value/bias:0' shape=(768,) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_6/attention/self/value/kernel:0' shape=(768, 768) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_6/intermediate/dense/bias:0' shape=(3072,) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_6/intermediate/dense/kernel:0' shape=(768, 3072) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_6/output/LayerNorm/beta:0' shape=(768,) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_6/output/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_6/output/dense/bias:0' shape=(768,) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_6/output/dense/kernel:0' shape=(3072, 768) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_7/attention/output/LayerNorm/beta:0' shape=(768,) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_7/attention/output/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_7/attention/output/dense/bias:0' shape=(768,) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_7/attention/output/dense/kernel:0' shape=(768, 768) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_7/attention/self/key/bias:0' shape=(768,) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_7/attention/self/key/kernel:0' shape=(768, 768) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_7/attention/self/query/bias:0' shape=(768,) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_7/attention/self/query/kernel:0' shape=(768, 768) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_7/attention/self/value/bias:0' shape=(768,) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_7/attention/self/value/kernel:0' shape=(768, 768) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_7/intermediate/dense/bias:0' shape=(3072,) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_7/intermediate/dense/kernel:0' shape=(768, 3072) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_7/output/LayerNorm/beta:0' shape=(768,) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_7/output/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_7/output/dense/bias:0' shape=(768,) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_7/output/dense/kernel:0' shape=(3072, 768) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_8/attention/output/LayerNorm/beta:0' shape=(768,) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_8/attention/output/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_8/attention/output/dense/bias:0' shape=(768,) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_8/attention/output/dense/kernel:0' shape=(768, 768) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_8/attention/self/key/bias:0' shape=(768,) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_8/attention/self/key/kernel:0' shape=(768, 768) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_8/attention/self/query/bias:0' shape=(768,) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_8/attention/self/query/kernel:0' shape=(768, 768) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_8/attention/self/value/bias:0' shape=(768,) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_8/attention/self/value/kernel:0' shape=(768, 768) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_8/intermediate/dense/bias:0' shape=(3072,) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_8/intermediate/dense/kernel:0' shape=(768, 3072) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_8/output/LayerNorm/beta:0' shape=(768,) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_8/output/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_8/output/dense/bias:0' shape=(768,) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_8/output/dense/kernel:0' shape=(3072, 768) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_9/attention/output/LayerNorm/beta:0' shape=(768,) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_9/attention/output/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_9/attention/output/dense/bias:0' shape=(768,) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_9/attention/output/dense/kernel:0' shape=(768, 768) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_9/attention/self/key/bias:0' shape=(768,) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_9/attention/self/key/kernel:0' shape=(768, 768) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_9/attention/self/query/bias:0' shape=(768,) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_9/attention/self/query/kernel:0' shape=(768, 768) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_9/attention/self/value/bias:0' shape=(768,) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_9/attention/self/value/kernel:0' shape=(768, 768) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_9/intermediate/dense/bias:0' shape=(3072,) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_9/intermediate/dense/kernel:0' shape=(768, 3072) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_9/output/LayerNorm/beta:0' shape=(768,) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_9/output/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_9/output/dense/bias:0' shape=(768,) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/encoder/layer_9/output/dense/kernel:0' shape=(3072, 768) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/pooler/dense/bias:0' shape=(768,) dtype=float32>,
<tf.Variable 'BERT_module_1/bert/pooler/dense/kernel:0' shape=(768, 768) dtype=float32>,
<tf.Variable 'BERT_module_1/cls/predictions/output_bias:0' shape=(119547,) dtype=float32>,
<tf.Variable 'BERT_module_1/cls/predictions/transform/LayerNorm/beta:0' shape=(768,) dtype=float32>,
<tf.Variable 'BERT_module_1/cls/predictions/transform/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
<tf.Variable 'BERT_module_1/cls/predictions/transform/dense/bias:0' shape=(768,) dtype=float32>,
<tf.Variable 'BERT_module_1/cls/predictions/transform/dense/kernel:0' shape=(768, 768) dtype=float32>]
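Because the names in the listing sort lexicographically, `layer_10` and `layer_11` land before `layer_2`, so a tail slice of the list does not correspond to the last transformer blocks. One possible workaround, sketched here on plain name strings, is to select variables by their parsed block number rather than by list position. The function names are hypothetical, and the sketch assumes variable names follow the `.../encoder/layer_N/...` pattern shown above.

```python
import re


def encoder_block(var_name):
    """Return the encoder block index in var_name, or None if there is none."""
    match = re.search(r"encoder/layer_(\d+)/", var_name)
    return int(match.group(1)) if match else None


def select_top_blocks(var_names, n_fine_tune_layers):
    """Keep the names that belong to the last n_fine_tune_layers encoder blocks."""
    blocks = {encoder_block(name) for name in var_names} - {None}
    if not blocks or n_fine_tune_layers <= 0:
        return []
    cutoff = max(blocks) - n_fine_tune_layers + 1
    selected = []
    for name in var_names:
        block = encoder_block(name)
        if block is not None and block >= cutoff:
            selected.append(name)
    return selected
```

In the layer itself, the same filter would be applied to `var.name` for each variable in `self.bert.variables`. Whether the pooler variables should also be trained is a separate choice that this sketch leaves out.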
I am running the script on my machine with the following configuration
TF = 1.14
OS = Windows 10
Python = 3.7
Here is the full error
File "keras-bert.py", line 354, in <module>
main()
File "keras-bert.py", line 339, in main
batch_size=32,
File "C:\Users\Urvish\Envs\tf_1_14\lib\site-packages\tensorflow\python\keras\engine\training.py", line 780, in fit
steps_name='steps_per_epoch')
File "C:\Users\Urvish\Envs\tf_1_14\lib\site-packages\tensorflow\python\keras\engine\training_arrays.py", line 363, in model_iteration
batch_outs = f(ins_batch)
File "C:\Users\Urvish\Envs\tf_1_14\lib\site-packages\tensorflow\python\keras\backend.py", line 3292, in __call__
run_metadata=self.run_metadata)
File "C:\Users\Urvish\Envs\tf_1_14\lib\site-packages\tensorflow\python\client\session.py", line 1458, in __call__
run_metadata_ptr)
tensorflow.python.framework.errors_impl.FailedPreconditionError: 2 root error(s) found.
(0) Failed precondition: Error while reading resource variable bert_layer_module/bert/encoder/layer_9/output/LayerNorm/gamma from Container: localhost. This could mean that the variable was uninitialized. Not found: Resource localhost/bert_layer_module/bert/encoder/layer_9/output/LayerNorm/gamma/class tensorflow::Var does not exist.
[[{{node bert_layer/bert_layer_module_apply_tokens/bert/encoder/layer_9/output/LayerNorm/batchnorm/mul/ReadVariableOp}}]]
[[loss/mul/_489]]
(1) Failed precondition: Error while reading resource variable bert_layer_module/bert/encoder/layer_9/output/LayerNorm/gamma from Container: localhost. This could mean that the variable was uninitialized. Not found: Resource localhost/bert_layer_module/bert/encoder/layer_9/output/LayerNorm/gamma/class tensorflow::Var does not exist.
[[{{node bert_layer/bert_layer_module_apply_tokens/bert/encoder/layer_9/output/LayerNorm/batchnorm/mul/ReadVariableOp}}]]
0 successful operations.
0 derived errors ignored.
I've trained a model that fine-tunes a BertLayer, but I can't seem to get it to export as a SavedModel properly. Any ideas?
Hi,
I would like to use the output of a BertLayer as input for a YoonKimCNN, which is implemented in keras_text. I have already realized that mixing tf and keras imports is not a good idea. However, the YoonKimCNN is currently "only" available in keras_text, leading to the following error:
AttributeError: 'Node' object has no attribute 'output_masks'
After several changes, I currently use tensorflow 1.12.0 and keras 2.2.4.
Thanks for any suggestions in advance
Hello! Thanks for the notebook, it is really helpful! I am trying to make it work for multiclass classification but I am having some difficulties. My dataset consists of strings with multiple labels, which I one-hot encode before I train/test split them and feed them into the class 'InputExample'. It seems to work after that, but when I try to call the model later on, it gives me the following error.
"Input arrays should have the same number of samples as target arrays. Found 10251 input samples and 51255 target samples."
I suspect it has something to do with how it converts y to features since 10251 x 5 = 51255 and I have 5 classes. Is there something inherent to binary classification in your code that would raise this error?
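The arithmetic in that suspicion can be reproduced with NumPy alone. The sketch below assumes the feature-conversion helper ends with `np.array(labels).reshape(-1, 1)` (the copy of that helper quoted elsewhere on this page does); the sample and class counts are made up for illustration.

```python
import numpy as np

n_samples, n_classes = 4, 5  # illustrative sizes, not the real dataset

# One-hot labels: one row per sample, one column per class.
one_hot = np.eye(n_classes, dtype=int)[[0, 2, 4, 1]]

# reshape(-1, 1) is fine for scalar labels, but it flattens 2-D one-hot
# labels into n_samples * n_classes rows.
flattened = one_hot.reshape(-1, 1)

# Keeping one row per sample preserves the sample count.
kept = one_hot.reshape(n_samples, -1)

print(one_hot.shape, flattened.shape, kept.shape)  # (4, 5) (20, 1) (4, 5)
```

If this is what happens in your run, 10251 samples with 5 classes would become 51255 target rows, which matches the numbers in the error message. Dropping or adapting the final reshape for one-hot labels may be enough, though the rest of the pipeline (loss, output layer size) would also need to match a 5-class setup.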
I ran the code from 'keras-bert.ipynb' as it is and observed that the number of trainable parameters in my run is '22,051,329' instead of '3,147,009' in your run of the notebook. Also my accuracy is just about 0.53. Can you please help me out. Thanks!
Has anyone tried replicating this in Keras with the SQuAD dataset? It is not clear how we can prepare the input data for a custom BERT model, such as one with an LSTM on top of bert-base.
When I try to build the model, I get an error saying - 'Module' object has no attribute 'variables'
This occurs specifically in the build function of the BertLayer class when I try to access self.bert.variables.
I tried a dir(self.bert) to get all the attributes of the object and it indeed did not have an attribute called variables. These are the attributes I obtained:
['__call__', '__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', '_graph', '_impl', '_name', '_spec', '_tags', '_trainable', 'export', 'get_input_info_dict', 'get_output_info_dict', 'get_signature_names', 'variable_map']
I'm using tf version 1.13.0 with Python3.5 on Win10.
Where is the bert folder? The code is not able to access it, and I checked all your repos but could not find the code for the actual BERT. I would greatly appreciate a reply about this issue.
Thanks
The error is : ModuleNotFoundError: No module named 'bert'
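A quick way to confirm whether the `bert` package is importable at all is sketched below. The helper name `module_available` is made up for this example; the suggestion to install `bert-tensorflow` comes from a script posted elsewhere on this page, which installs that package to obtain `bert.tokenization`.

```python
import importlib.util


def module_available(name):
    """Return True if a top-level module called `name` can be located."""
    return importlib.util.find_spec(name) is not None


if not module_available("bert"):
    # Assumption: the tokenizer lives in the `bert-tensorflow` package on PyPI.
    print("No module named 'bert' found; try: pip install bert-tensorflow")
```

If the check fails inside a notebook, installing the package and restarting the runtime before importing again is usually the safest order.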
Has anyone encountered this error?
Traceback (most recent call last):
File "keras-bert.py", line 336, in <module>
main()
File "keras-bert.py", line 333, in main
model.fit([train_input_ids, train_input_masks, train_segment_ids],train_labels,validation_data=([test_input_ids, test_input_masks, test_segment_ids],test_labels,),epochs=1,batch_size=32,)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/training.py", line 643, in fit
use_multiprocessing=use_multiprocessing)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/training_arrays.py", line 664, in fit
steps_name='steps_per_epoch')
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/training_arrays.py", line 383, in model_iteration
batch_outs = f(ins_batch)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/backend.py", line 3353, in __call__
run_metadata=self.run_metadata)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/client/session.py", line 1458, in __call__
run_metadata_ptr)
tensorflow.python.framework.errors_impl.FailedPreconditionError: Error while reading resource variable bert_layer_module/bert/encoder/layer_9/attention/self/query/bias from Container: localhost. This could mean that the variable was uninitialized. Not found: Resource localhost/bert_layer_module/bert/encoder/layer_9/attention/self/query/bias/N10tensorflow3VarE does not exist.
[[{{node bert_layer/bert_layer_module_apply_tokens/bert/encoder/layer_9/attention/self/query/BiasAdd/ReadVariableOp}}]]
The code is basically the same, just some minor changes in the process to overcome other errors.
When I try to run the same code in Google Colab, it throws an error while calling the create_tokenizer_from_hub_module() function.
I would like to use BERT for multi-class, multi-task classification. For each sentence to classify (say, with a fixed number of n tokens), BERT would, if I understand it correctly, provide a 768-element vector per token, i.e., a tensor of shape (n, 768). When batches are involved, I would expect (None, n, 768). With keras-bert, I obtain ((None, n), 768). To feed this tensor to keras_text's YoonKimCNN, I have to add a further dimension, but the nested structure remains, so the final layer also has shape ((None, n), m), even though I would expect (None, m) in the end. Structure:
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_ids (InputLayer) (None, 256) 0
__________________________________________________________________________________________________
input_masks (InputLayer) (None, 256) 0
__________________________________________________________________________________________________
segment_ids (InputLayer) (None, 256) 0
__________________________________________________________________________________________________
bert_layer_1 (BertLayer) ((None, 256), 768) 110104890 input_ids[0][0]
input_masks[0][0]
segment_ids[0][0]
__________________________________________________________________________________________________
reshape_1 (Reshape) ((None, 256), 768, 1 0 bert_layer_1[0][0]
__________________________________________________________________________________________________
consume_mask_1 (ConsumeMask) ((None, 256), 768, 1 0 reshape_1[0][0]
__________________________________________________________________________________________________
conv1d_1 (Conv1D) ((None, 256), 766, 1 512 consume_mask_1[0][0]
__________________________________________________________________________________________________
conv1d_2 (Conv1D) ((None, 256), 765, 1 640 consume_mask_1[0][0]
__________________________________________________________________________________________________
conv1d_3 (Conv1D) ((None, 256), 764, 1 768 consume_mask_1[0][0]
__________________________________________________________________________________________________
global_max_pooling1d_1 (GlobalM ((None, 256), 128) 0 conv1d_1[0][0]
__________________________________________________________________________________________________
global_max_pooling1d_2 (GlobalM ((None, 256), 128) 0 conv1d_2[0][0]
__________________________________________________________________________________________________
global_max_pooling1d_3 (GlobalM ((None, 256), 128) 0 conv1d_3[0][0]
__________________________________________________________________________________________________
concatenate_1 (Concatenate) ((None, 256), 384) 0 global_max_pooling1d_1[0][0]
global_max_pooling1d_2[0][0]
global_max_pooling1d_3[0][0]
__________________________________________________________________________________________________
dropout_1 (Dropout) ((None, 256), 384) 0 concatenate_1[0][0]
__________________________________________________________________________________________________
dense_4 (Dense) ((None, 256), 256) 98560 dropout_1[0][0]
__________________________________________________________________________________________________
dense_1 (Dense) ((None, 256), 128) 49280 dropout_1[0][0]
__________________________________________________________________________________________________
dropout_4 (Dropout) ((None, 256), 256) 0 dense_4[0][0]
__________________________________________________________________________________________________
dropout_2 (Dropout) ((None, 256), 128) 0 dense_1[0][0]
__________________________________________________________________________________________________
dense_5 (Dense) ((None, 256), 128) 32896 dropout_4[0][0]
__________________________________________________________________________________________________
dense_2 (Dense) ((None, 256), 64) 8256 dropout_2[0][0]
__________________________________________________________________________________________________
dropout_5 (Dropout) ((None, 256), 128) 0 dense_5[0][0]
__________________________________________________________________________________________________
dropout_3 (Dropout) ((None, 256), 64) 0 dense_2[0][0]
__________________________________________________________________________________________________
dense_6 (Dense) ((None, 256), 25) 3225 dropout_5[0][0]
__________________________________________________________________________________________________
dense_3 (Dense) ((None, 256), 1) 65 dropout_3[0][0]
==================================================================================================
Total params: 110,299,092
Trainable params: 71,072,922
Non-trainable params: 39,226,170
__________________________________________________________________________________________________
This looks different from what we can see here. Any suggestions how to get rid of the nested structure are welcome.
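One plausible source of the nested shape is the layer's `compute_output_shape`. The sketch below assumes it is written as `(input_shape[0], output_size)`; since the layer receives a list of three inputs, `input_shape[0]` is then the whole shape tuple of the first input rather than the batch dimension. Both function names are made up for the illustration and no Keras import is needed to see the effect.

```python
def nested_output_shape(input_shape, output_size=768):
    # Assumed original form: indexes the list of input shapes once,
    # so the first element of the result is itself a shape tuple.
    return (input_shape[0], output_size)


def flat_output_shape(input_shape, output_size=768):
    # Index twice: first input, then its batch dimension.
    return (input_shape[0][0], output_size)


shapes = [(None, 256), (None, 256), (None, 256)]  # ids, masks, segment ids
print(nested_output_shape(shapes))  # ((None, 256), 768)
print(flat_output_shape(shapes))    # (None, 768)
```

For a per-token output of shape (None, n, 768), the layer would additionally have to return the module's sequence output instead of a pooled vector, and report `(input_shape[0][0], input_shape[0][1], output_size)`.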
I am having the following issue...
The model compiles and prints the following output. However, on model.fit() nothing happens, despite verbose mode being turned on.
When i look at my hardware utilisation, my GPU has memory allocated to the process however utilisation is 0-2%. On my CPU, only one core is getting worked by the process at 100% utilisation.
To test my tensorflow-gpu install, I ran the CNN example on tensorflow and got 20% GPU utilisation.
I don't think it is a preprocessing bottleneck as I load my training data into memory.
Thanks.
Code:
```python
bert_path = "https://tfhub.dev/google/bert_uncased_L-12_H-768_A-12/1"
max_seq_length = 256
corpus = MyDocs("datasets/bbc/raw", bert_path, max_seq_length)
ids = []
masks = []
segment_ids = []
for id, mask, segment, label in corpus:
    ids.append(id)
    masks.append(masks)
    segment_ids.append(segment)
X = [ids, masks, segment_ids]
labels = corpus.labels
label_encoder = OneHotEncoder()
y = label_encoder.fit_transform(np.array(labels).reshape(-1, 1)).todense()
print('Building model...')
model = build_model(bert_path, max_seq_length)
print('Training model...')
history = model.fit(X, y,
                    validation_split=0.2,
                    epochs=1,
                    batch_size=1,
                    verbose=2,
                    use_multiprocessing=True)
```
Output:
Building model...
W0709 21:57:53.871020 140194145126208 deprecation.py:506] From /usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/init_ops.py:1251: calling VarianceScaling.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
W0709 21:57:53.922768 140194145126208 deprecation.py:323] From /usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/nn_impl.py:180: add_dispatch_support.<locals>.wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
Training model...
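One thing worth checking in the posted snippet: the loop calls `masks.append(masks)`, which appends the list to itself instead of appending the per-example `mask`. Whether that explains the stall is an assumption, but a self-referential list is not a valid input array, so it seems worth ruling out first. The sketch below uses made-up `(ids, mask)` pairs to show the difference.

```python
# Made-up (ids, mask) pairs standing in for the corpus iterator.
rows = [([101, 7592, 102], [1, 1, 1]), ([101, 102, 0], [1, 1, 0])]

buggy_masks = []
for _ids, mask in rows:
    buggy_masks.append(buggy_masks)  # same pattern as masks.append(masks)

fixed_masks = []
for _ids, mask in rows:
    fixed_masks.append(mask)  # append the per-example mask

print(buggy_masks[0] is buggy_masks)  # True: the list contains itself
print(fixed_masks)                    # [[1, 1, 1], [1, 1, 0]]
```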
I have installed tensorflow-hub using pip install tensorflow-hub
I am using tensorflow==1.13.1 and python==3.7.4 (64)
Could anyone help me with this issue?
I am trying to use your keras embeddings layer wrapper to use it for WSD, however I have this error every time
Traceback (most recent call last):
File "D:/SVC/GitLab/ahmed_elsheikh_1873337_nlp19project/code/model_bert_prova.py", line 234, in <module>
model = baseline_model(output_size, visualize=True)
File "D:/SVC/GitLab/ahmed_elsheikh_1873337_nlp19project/code/model_bert_prova.py", line 61, in baseline_model
)(bert_embedding)
File "C:\Users\Sheikh\AppData\Local\Programs\Python\Python36\Lib\site-packages\tensorflow\python\keras\layers\wrappers.py", line 473, in __call__
return super(Bidirectional, self).__call__(inputs, **kwargs)
File "C:\Users\Sheikh\AppData\Local\Programs\Python\Python36\Lib\site-packages\tensorflow\python\keras\engine\base_layer.py", line 746, in __call__
self.build(input_shapes)
File "C:\Users\Sheikh\AppData\Local\Programs\Python\Python36\Lib\site-packages\tensorflow\python\keras\layers\wrappers.py", line 612, in build
self.forward_layer.build(input_shape)
File "C:\Users\Sheikh\AppData\Local\Programs\Python\Python36\Lib\site-packages\tensorflow\python\keras\utils\tf_utils.py", line 149, in wrapper
output_shape = fn(instance, input_shape)
File "C:\Users\Sheikh\AppData\Local\Programs\Python\Python36\Lib\site-packages\tensorflow\python\keras\layers\recurrent.py", line 552, in build
self.cell.build(step_input_shape)
File "C:\Users\Sheikh\AppData\Local\Programs\Python\Python36\Lib\site-packages\tensorflow\python\keras\utils\tf_utils.py", line 149, in wrapper
output_shape = fn(instance, input_shape)
File "C:\Users\Sheikh\AppData\Local\Programs\Python\Python36\Lib\site-packages\tensorflow\python\keras\layers\recurrent.py", line 1934, in build
constraint=self.kernel_constraint)
File "C:\Users\Sheikh\AppData\Local\Programs\Python\Python36\Lib\site-packages\tensorflow\python\keras\engine\base_layer.py", line 609, in add_weight
aggregation=aggregation)
File "C:\Users\Sheikh\AppData\Local\Programs\Python\Python36\Lib\site-packages\tensorflow\python\training\checkpointable\base.py", line 639, in _add_variable_with_custom_getter
**kwargs_for_getter)
File "C:\Users\Sheikh\AppData\Local\Programs\Python\Python36\Lib\site-packages\tensorflow\python\keras\engine\base_layer.py", line 1977, in make_variable
aggregation=aggregation)
File "C:\Users\Sheikh\AppData\Local\Programs\Python\Python36\Lib\site-packages\tensorflow\python\ops\variables.py", line 183, in __call__
return cls._variable_v1_call(*args, **kwargs)
File "C:\Users\Sheikh\AppData\Local\Programs\Python\Python36\Lib\site-packages\tensorflow\python\ops\variables.py", line 146, in _variable_v1_call
aggregation=aggregation)
File "C:\Users\Sheikh\AppData\Local\Programs\Python\Python36\Lib\site-packages\tensorflow\python\ops\variables.py", line 125, in <lambda>
previous_getter = lambda **kwargs: default_variable_creator(None, **kwargs)
File "C:\Users\Sheikh\AppData\Local\Programs\Python\Python36\Lib\site-packages\tensorflow\python\ops\variable_scope.py", line 2437, in default_variable_creator
import_scope=import_scope)
File "C:\Users\Sheikh\AppData\Local\Programs\Python\Python36\Lib\site-packages\tensorflow\python\ops\variables.py", line 187, in __call__
return super(VariableMetaclass, cls).__call__(*args, **kwargs)
File "C:\Users\Sheikh\AppData\Local\Programs\Python\Python36\Lib\site-packages\tensorflow\python\ops\resource_variable_ops.py", line 297, in __init__
constraint=constraint)
File "C:\Users\Sheikh\AppData\Local\Programs\Python\Python36\Lib\site-packages\tensorflow\python\ops\resource_variable_ops.py", line 409, in _init_from_args
initial_value() if init_from_fn else initial_value,
File "C:\Users\Sheikh\AppData\Local\Programs\Python\Python36\Lib\site-packages\tensorflow\python\keras\engine\base_layer.py", line 1959, in <lambda>
shape, dtype=dtype, partition_info=partition_info)
File "C:\Users\Sheikh\AppData\Local\Programs\Python\Python36\Lib\site-packages\tensorflow\python\ops\init_ops.py", line 473, in __call__
scale /= max(1., (fan_in + fan_out) / 2.)
TypeError: unsupported operand type(s) for +: 'NoneType' and 'int'
Exception ignored in: <bound method BaseSession.__del__ of <tensorflow.python.client.session.Session object at 0x000001D506B45358>>
Traceback (most recent call last):
File "C:\Users\Sheikh\AppData\Local\Programs\Python\Python36\Lib\site-packages\tensorflow\python\client\session.py", line 738, in __del__
TypeError: 'NoneType' object is not callable
```python
import tensorflow as tf
import tensorflow_hub as hub
from tensorflow.keras import backend as K
from tensorflow.keras.layers import Layer


class BertEmbeddingLayer(Layer):
    '''
    Integrate BERT Embeddings from tensorflow hub into a
    custom Keras layer.
    references:
    1. https://github.com/strongio/keras-bert
    2. https://tfhub.dev/google/bert_uncased_L-12_H-768_A-12/1
    '''

    def __init__(self, n_fine_tune_layers=10, pooling="mean",
                 bert_path="https://tfhub.dev/google/bert_uncased_L-12_H-768_A-12/1",
                 **kwargs,):
        self.n_fine_tune_layers = n_fine_tune_layers
        self.trainable = True
        self.output_size = 768
        self.pooling = pooling
        self.bert_path = bert_path
        if self.pooling not in ["first", "mean"]:
            raise NameError(
                f"Undefined pooling type (must be either first or mean, but is {self.pooling}")
        super(BertEmbeddingLayer, self).__init__(**kwargs)

    def build(self, input_shape):
        self.bert = hub.Module(self.bert_path,
                               trainable=self.trainable,
                               name=f"{self.name}_module")
        # Remove unused layers
        trainable_vars = self.bert.variables
        if self.pooling == "first":
            trainable_vars = [var for var in trainable_vars if not "/cls/" in var.name]
            trainable_layers = ["pooler/dense"]
        elif self.pooling == "mean":
            trainable_vars = [var for var in trainable_vars if not "/cls/" in var.name and not "/pooler/" in var.name]
            trainable_layers = []
        else:
            raise NameError(f"Undefined pooling type (must be either first or mean, but is {self.pooling}")
        # Select how many layers to fine tune
        for i in range(self.n_fine_tune_layers):
            trainable_layers.append(f"encoder/layer_{str(11 - i)}")
        # Update trainable vars to contain only the specified layers
        trainable_vars = [
            var
            for var in trainable_vars
            if any([l in var.name for l in trainable_layers])
        ]
        # Add to trainable weights
        for var in trainable_vars:
            self._trainable_weights.append(var)
        for var in self.bert.variables:
            if var not in self._trainable_weights:
                self._non_trainable_weights.append(var)
        super(BertEmbeddingLayer, self).build(input_shape)

    def call(self, inputs):
        inputs = [K.cast(x, dtype="int32") for x in inputs]
        input_ids, input_mask, segment_ids = inputs
        bert_inputs = dict(input_ids=input_ids,
                           input_mask=input_mask,
                           segment_ids=segment_ids
                           )
        if self.pooling == "first":
            pooled = self.bert(inputs=bert_inputs,
                               signature="tokens",
                               as_dict=True)["pooled_output"]
        elif self.pooling == "mean":
            result = self.bert(inputs=bert_inputs,
                               signature="tokens",
                               as_dict=True)["sequence_output"]

            def mul_mask(x, m):
                return x * tf.expand_dims(m, axis=-1)

            def masked_reduce_mean(x, m):
                return tf.reduce_sum(mul_mask(x, m), axis=1) / (tf.reduce_sum(m, axis=1, keepdims=True) + 1e-10)

            input_mask = tf.cast(input_mask, tf.float32)
            pooled = masked_reduce_mean(result, input_mask)
        else:
            raise NameError(f"Undefined pooling type (must be either first or mean, but is {self.pooling}")
        return pooled

    def compute_output_shape(self, input_shape):
        return input_shape[0][0], input_shape[0][1], self.output_size
```
```python
import os
import yaml
import numpy as np
from argparse import ArgumentParser
import tensorflow as tf
import tensorflow_hub as hub
from tensorflow.keras.layers import (LSTM, Add, Bidirectional, Dense, Input, TimeDistributed, Embedding)
from tensorflow.keras.preprocessing.sequence import pad_sequences
try:
    from bert.tokenization import FullTokenizer
except ModuleNotFoundError:
    os.system('pip install bert-tensorflow')
from tensorflow.keras.models import Model
from tensorflow.keras import backend as K
from tqdm import tqdm
from keras_bert import BertEmbeddingLayer
from model_utils import visualize_plot_mdl
from parsing_dataset import load_dataset
from utilities import configure_tf, initialize_logger


def parse_args():
    parser = ArgumentParser(description="WSD")
    parser.add_argument("--model_type", default='baseline', type=str,
                        help="""Choose the model: baseline: BiLSTM Model.
                        attention: Attention Stacked BiLSTM Model.
                        seq2seq: Seq2Seq Attention.""")
    return vars(parser.parse_args())


def train_model(mdl, data, epochs=1, batch_size=32):
    [train_input_ids, train_input_masks, train_segment_ids], train_labels = data
    history = mdl.fit([train_input_ids, train_input_masks, train_segment_ids],
                      train_labels, epochs=epochs, batch_size=batch_size)
    return history


def baseline_model(output_size):
    hidden_size = 128
    max_seq_len = 64
    in_id = Input(shape=(max_seq_len,), name="input_ids")
    in_mask = Input(shape=(max_seq_len,), name="input_masks")
    in_segment = Input(shape=(max_seq_len,), name="segment_ids")
    bert_inputs = [in_id, in_mask, in_segment]
    bert_embedding = BertEmbeddingLayer()(bert_inputs)
    embedding_size = 768
    bilstm = Bidirectional(LSTM(hidden_size,
                                return_sequences=True,
                                input_shape=(None, None, embedding_size)
                                ),
                           merge_mode='sum'
                           )(bert_embedding)
    output = TimeDistributed(Dense(output_size, activation="softmax"))(bilstm)
    mdl = Model(inputs=bert_inputs, outputs=output, name="Bert_BiLSTM")
    mdl.compile(loss="sparse_categorical_crossentropy",
                optimizer='adadelta', metrics=["acc"])
    return mdl


def initialize_vars(sess):
    sess.run(tf.local_variables_initializer())
    sess.run(tf.global_variables_initializer())
    sess.run(tf.tables_initializer())
    K.set_session(sess)


class PaddingInputExample(object):
    """Fake example so the num input examples is a multiple of the batch size.
    When running eval/predict on the TPU, we need to pad the number of examples
    to be a multiple of the batch size, because the TPU requires a fixed batch
    size. The alternative is to drop the last batch, which is bad because it means
    the entire output data won't be generated.
    We use this class instead of `None` because treating `None` as padding
    batches could cause silent errors.
    """


class InputExample(object):
    """A single training/test example for simple sequence classification."""

    def __init__(self, guid, text_a, text_b=None, label=None):
        """Constructs a InputExample.
        Args:
          guid: Unique id for the example.
          text_a: string. The un-tokenized text of the first sequence. For single
            sequence tasks, only this sequence must be specified.
          text_b: (Optional) string. The un-tokenized text of the second sequence.
            Only must be specified for sequence pair tasks.
          label: (Optional) string. The label of the example. This should be
            specified for train and dev examples, but not for test examples.
        """
        self.guid = guid
        self.text_a = text_a
        self.text_b = text_b
        self.label = label


def create_tokenizer_from_hub_module(bert_path="https://tfhub.dev/google/bert_uncased_L-12_H-768_A-12/1"):
    """Get the vocab file and casing info from the Hub module."""
    bert_module = hub.Module(bert_path)
    tokenization_info = bert_module(signature="tokenization_info", as_dict=True)
    vocab_file, do_lower_case = sess.run(
        [
            tokenization_info["vocab_file"],
            tokenization_info["do_lower_case"],
        ]
    )
    return FullTokenizer(vocab_file=vocab_file, do_lower_case=do_lower_case)


def convert_single_example(tokenizer, example, max_seq_length=256):
    """Converts a single `InputExample` into a single `InputFeatures`."""
    if isinstance(example, PaddingInputExample):
        input_ids = [0] * max_seq_length
        input_mask = [0] * max_seq_length
        segment_ids = [0] * max_seq_length
        label = [0] * max_seq_length
        return input_ids, input_mask, segment_ids, label
    tokens_a = tokenizer.tokenize(example.text_a)
    if len(tokens_a) > max_seq_length - 2:
        tokens_a = tokens_a[0: (max_seq_length - 2)]
    tokens = []
    segment_ids = []
    tokens.append("[CLS]")
    segment_ids.append(0)
    example.label.append(0)
    for token in tokens_a:
        tokens.append(token)
        segment_ids.append(0)
    tokens.append("[SEP]")
    segment_ids.append(0)
    example.label.append(0)
    input_ids = tokenizer.convert_tokens_to_ids(tokens)
    # The mask has 1 for real tokens and 0 for padding tokens. Only real
    # tokens are attended to.
    input_mask = [1] * len(input_ids)
    # Zero-pad up to the sequence length.
    while len(input_ids) < max_seq_length:
        input_ids.append(0)
        input_mask.append(0)
        segment_ids.append(0)
        example.label.append(0)
    assert len(input_ids) == max_seq_length
    assert len(input_mask) == max_seq_length
    assert len(segment_ids) == max_seq_length
    return input_ids, input_mask, segment_ids, example.label


def convert_examples_to_features(tokenizer, examples, max_seq_length=256):
    """Convert a set of `InputExample`s to a list of `InputFeatures`."""
    input_ids, input_masks, segment_ids, labels = [], [], [], []
    for example in tqdm(examples, desc="Converting examples to features"):
        input_id, input_mask, segment_id, label = convert_single_example(tokenizer, example, max_seq_length)
        input_ids.append(np.array(input_id))
        input_masks.append(np.array(input_mask))
        segment_ids.append(np.array(segment_id))
        labels.append(np.array(label))
    return np.array(input_ids), np.array(input_masks), np.array(segment_ids), np.array(labels).reshape(-1, 1)


def convert_text_to_examples(texts, labels):
    """Create InputExamples"""
    InputExamples = []
    for text, label in zip(texts, labels):
        InputExamples.append(
            InputExample(guid=None, text_a=" ".join(text), text_b=None, label=label)
        )
    return InputExamples


# Initialize session
sess = tf.Session()
params = parse_args()
initialize_logger()
configure_tf()
# Load our config file
config_file_path = os.path.join(os.getcwd(), "config.yaml")
config_file = open(config_file_path)
config_params = yaml.load(config_file)
# This parameter allow that train_x to be in form of words, to allow using of your keras-elmo layer
elmo = config_params["use_elmo"]
dataset = load_dataset(elmo=elmo)
vocabulary_size = dataset.get("vocabulary_size")
output_size = dataset.get("output_size")
# Parse data in Bert format
max_seq_length = 64
train_x = dataset.get("train_x")
train_text = [' '.join(x) for x in train_x]
train_text = [' '.join(t.split()[0:max_seq_length]) for t in train_text]
train_text = np.array(train_text, dtype=object)[:, np.newaxis]
# print(train_text.shape) # (37184, 1)
train_labels = dataset.get("train_y")
# Instantiate tokenizer
tokenizer = create_tokenizer_from_hub_module()
# Convert data to InputExample format
train_examples = convert_text_to_examples(train_text, train_labels)
# Extract features
(train_input_ids, train_input_masks, train_segment_ids, train_labels) = convert_examples_to_features(tokenizer, train_examples, max_seq_length=max_seq_length)
bert_inputs = [train_input_ids, train_input_masks, train_segment_ids]
data = bert_inputs, train_labels
del dataset
model = baseline_model(output_size)
# Instantiate variables
initialize_vars(sess)
history = train_model(model, data)
```
Can you please let me know how to solve it?
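A likely cause, offered as a hypothesis rather than a confirmed diagnosis: `BertEmbeddingLayer()` is created with the default `pooling="mean"`, so `call` averages over the token axis and returns one 768-vector per sentence, while `Bidirectional(LSTM(...))` expects one vector per token. The NumPy sketch below mirrors the masked mean from `call` to show the shapes involved. The `step_input_shape` helper is an assumption about how Keras RNN layers derive the per-step shape (dropping the time axis), written from memory of the library rather than copied from it.

```python
import numpy as np


def masked_reduce_mean(x, m):
    """NumPy analogue of the layer's masked mean over the token axis."""
    m = m.astype(np.float32)
    summed = (x * m[:, :, None]).sum(axis=1)
    counts = m.sum(axis=1, keepdims=True) + 1e-10
    return summed / counts


def step_input_shape(input_shape):
    # Assumption: an RNN layer drops axis 1 (time) to get the per-step shape.
    return (input_shape[0],) + tuple(input_shape[2:])


batch, seq_len, hidden = 2, 4, 768
sequence_output = np.ones((batch, seq_len, hidden), dtype=np.float32)
input_mask = np.array([[1, 1, 1, 0], [1, 1, 0, 0]])

pooled = masked_reduce_mean(sequence_output, input_mask)
print(sequence_output.shape)  # (2, 4, 768): what an LSTM needs
print(pooled.shape)           # (2, 768): what mean pooling returns

print(step_input_shape((None, 64, 768)))  # (None, 768)
print(step_input_shape((None, 768)))      # (None,)
```

With a 2-D input, the per-step shape ends in `None`, which would make the kernel initializer's `fan_in` `None` and produce the `NoneType + int` error in the traceback. If that is what is happening, returning the module's `sequence_output` directly (instead of the pooled tensor) for tagging tasks such as WSD would be the change to try; the overridden `compute_output_shape` alone does not alter the tensor that `call` returns.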
In keras-bert.ipynb, I see the following:
def convert_text_to_examples(texts, labels):
"""Create InputExamples"""
InputExamples = []
for text, label in zip(texts, labels):
InputExamples.append(
InputExample(guid=None, text_a=" ".join(text), text_b=None, label=label)
)
return InputExamples
It is believed that " ".join(text) actually splits the words into characters. This in turn causes BERT to tokenize based on character as opposed to the whole or partial word.
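Whether this happens depends on what `text` is at that point, which can be checked in plain Python. The sketch below uses a made-up sentence: joining a plain string iterates over its characters, while joining a one-element sequence that wraps the string returns the string unchanged.

```python
sentence = "hi there"  # made-up example text

# Case 1: text is a plain str, so join iterates over characters.
as_string = " ".join(sentence)

# Case 2: text is a one-element sequence holding the sentence
# (for example one row of an array with shape (n, 1)).
as_row = " ".join([sentence])

print(repr(as_string))  # 'h i   t h e r e'
print(repr(as_row))     # 'hi there'
```

So the character-splitting only occurs when each element of `texts` is a bare string; if the texts are first given an extra axis, as with `[:, np.newaxis]` in code quoted elsewhere on this page, each `text` is a one-element row and the join leaves it intact.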