keras-team / tf-keras
The TensorFlow-specific implementation of the Keras API, which was the default Keras from 2019 to 2023.
License: Apache License 2.0
TF 2.4.1 - 2.7
When using a metric that accepts complex numbers, tf.keras fit() casts y_true and y_pred to a real dtype before passing them to the metric function, discarding the imaginary part of the complex values.
It works correctly when the same function is used as a loss; the cast only occurs when it is used as a metric.
The error is caused by the cast in metrics.py at line 609. See the snippet below from that file.
def update_state(self, y_true, y_pred, sample_weight=None):
  """Accumulates metric statistics.

  `y_true` and `y_pred` should have the same shape.

  Args:
    y_true: Ground truth values. shape = `[batch_size, d0, .. dN]`.
    y_pred: The predicted values. shape = `[batch_size, d0, .. dN]`.
    sample_weight: Optional `sample_weight` acts as a
      coefficient for the metric. If a scalar is provided, then the metric is
      simply scaled by the given value. If `sample_weight` is a tensor of size
      `[batch_size]`, then the metric for each sample of the batch is rescaled
      by the corresponding element in the `sample_weight` vector. If the shape
      of `sample_weight` is `[batch_size, d0, .. dN-1]` (or can be broadcasted
      to this shape), then each metric element of `y_pred` is scaled by the
      corresponding value of `sample_weight`. (Note on `dN-1`: all metric
      functions reduce by 1 dimension, usually the last axis (-1)).

  Returns:
    Update op.
  """
  y_true = math_ops.cast(y_true, self._dtype)  # THIS IS WRONG!
  y_pred = math_ops.cast(y_pred, self._dtype)
  [y_true, y_pred], sample_weight = \
      metrics_utils.ragged_assert_compatible_and_get_flat_values(
          [y_true, y_pred], sample_weight)
  y_pred, y_true = losses_utils.squeeze_or_expand_dimensions(
      y_pred, y_true)
  ag_fn = autograph.tf_convert(self._fn, ag_ctx.control_status_ctx())
  matches = ag_fn(y_true, y_pred, **self._fn_kwargs)
  return super(MeanMetricWrapper, self).update_state(
      matches, sample_weight=sample_weight)
Setting a breakpoint at the start of this function, we see the following variables in the debugger:
y_true
<tf.Tensor 'IteratorGetNext:3' shape=(None, 256, 256, 1) dtype=complex64>
self._dtype
'float32'
y_true should remain of type complex64, but instead it is cast to float32, discarding the imaginary part and keeping only the real part.
A correct fix would check for complex data in addition to the system floatx() type and cast accordingly (i.e., if floatx() is float32, complex128 becomes complex64 and float64 becomes float32).
I searched for this pattern, tf.cast(y_true, self._dtype), and it occurs quite a bit throughout this file. From what I can tell, all cases are incorrect and will cast complex data to float data and discard the imaginary part. I also confirmed this bug exists in tf 2.7, reference https://github.com/keras-team/keras/blob/v2.7.0/keras/metrics.py#L1096-L1141
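The dtype rule proposed above can be sketched as a small helper. This is a hypothetical function, not part of Keras; it works on dtype name strings so the rule itself is easy to check, and it assumes float16 should map to complex64, since TensorFlow has no 32-bit complex type.

```python
def metric_input_dtype(input_dtype: str, floatx: str = "float32") -> str:
    """Hypothetical helper: choose the dtype a metric input should be cast to.

    Real inputs are cast to floatx (the current behavior); complex inputs are
    cast to the complex dtype whose components match floatx, so the imaginary
    part is preserved instead of being dropped.
    """
    # Assumption: float16 pairs with complex64 because there is no complex32.
    complex_for_float = {
        "float16": "complex64",
        "float32": "complex64",
        "float64": "complex128",
    }
    if input_dtype.startswith("complex"):
        return complex_for_float[floatx]
    return floatx


print(metric_input_dtype("complex128"))  # complex64
print(metric_input_dtype("float64"))     # float32
```

Inside update_state, such a helper could hypothetically be used as math_ops.cast(y_true, metric_input_dtype(y_true.dtype.name, self._dtype)), assuming self._dtype is a dtype name such as 'float32', as in the debugger output above.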
Describe the feature and the current behavior/state.
I'm a recommendation-system engineer, and I know that many Keras preprocessing layers have been added since TF 2.3.0.
Now I can use tf.keras.layers.StringLookup + tf.keras.layers.Embedding to convert a string to an embedding tensor. But this fixes both the set of strings and the size of the embedding table, so new strings and their embeddings cannot be added automatically when training on new data. As a result, I can't use Keras for online daily training, because the training data contains new user_id and item_id features.
So I want to know whether Keras can support this.
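Until such a feature exists, one common workaround for unbounded ID spaces is the hashing trick: map each ID string into a fixed number of buckets, so unseen IDs still get an embedding row, at the cost of occasional collisions. Keras provides a tf.keras.layers.Hashing layer for this; the standard-library sketch below only illustrates the idea and uses MD5, so its bucket assignments will not match that layer's. The function name and bucket count are illustrative.

```python
import hashlib


def hash_bucket(key: str, num_bins: int) -> int:
    """Map an arbitrary ID string to a bucket in [0, num_bins), deterministically.

    Python's built-in hash() is salted per process for str, so a stable digest
    is used instead; the same key always lands in the same bucket.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an unsigned integer, then reduce it.
    return int.from_bytes(digest[:8], "little") % num_bins


# A brand-new user_id needs no vocabulary update: it simply hashes to a bucket,
# which can then index a fixed-size embedding table with num_bins rows.
print(hash_bucket("user_123", 100_000))
print(hash_bucket("user_never_seen_before", 100_000))
```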
Will this change the current api? How?
Who will benefit from this feature?
I was asked to cross-post this issue here,
tensorflow/tensorflow#54197
Thank you
Please go to TF Forum for help and support:
https://discuss.tensorflow.org/tag/keras
If you open a GitHub issue, here is our policy:
It must be a bug, a feature request, or a significant problem with the documentation (for small docs fixes please send a PR instead).
The form below must be filled out.
Here's why we have that policy:
Keras developers respond to issues. We want to focus on work that benefits the whole community, e.g., fixing bugs and adding features. Support only helps individuals. GitHub also notifies thousands of people when issues are filed. We want them to see you communicating an interesting problem, rather than being redirected to Stack Overflow.
System information.
You can collect some of this information using our environment capture script:
https://github.com/tensorflow/tensorflow/tree/master/tools/tf_env_collect.sh
You can obtain the TensorFlow version with:
python -c "import tensorflow as tf; print(tf.version.GIT_VERSION, tf.version.VERSION)"
Describe the problem.
Describe the problem clearly here. Be sure to convey here why it's a bug in Keras or why the requested feature is needed.
Describe the current behavior.
Describe the expected behavior.
Standalone code to reproduce the issue.
Provide a reproducible test case that is the bare minimum necessary to generate
the problem. If possible, please share a link to Colab/Jupyter/any notebook.
Source code / logs.
Include any logs or source code that would be helpful to diagnose the problem. If including tracebacks, please include the full traceback. Large logs and files should be attached. Try to provide a reproducible test case that is the bare minimum necessary to generate the problem.
Moving user issue from: tensorflow/tensorflow#45231
Describe the problem.
When I run the example provided in the official TensorFlow Basic text classification tutorial, everything runs fine until the model is saved. But when I load the model, it gives me this error:
RuntimeError: Unable to restore a layer of class TextVectorization. Layers of class TextVectorization require that the class be provided to the model loading code, either by registering the class using @keras.utils.register_keras_serializable on the class def and including that file in your program, or by passing the class in a keras.utils.CustomObjectScope that wraps this load call.
The model should load successfully and process raw input.
Example Link: https://tensorflow.google.cn/tutorials/keras/text_classification
Describe the problem.
We have a model that consumes multiple ragged tensors in a batch. Our model runs perfectly fine on a single GPU. But the moment we introduce distributed training, its evaluation fails.
Note that training in the distributed setting proceeds smoothly; it is during evaluation that it fails. Since we cannot provide the original data and model, we are providing a minimal snippet in the following notebook that reproduces the issue. The issue can be reproduced on Colab as well as on a multi-GPU machine; we have verified both, and it persists on each.
Describe the current behavior.
Model consuming RaggedTensors fails during evaluation in a distributed setting.
Describe the expected behavior.
The model should run during evaluation without any errors when exposed to a distributed setting.
Standalone code to reproduce the issue.
Colab Notebook: https://colab.research.google.com/drive/1U9oeed5OMAH1KvN5T455kAsB2Nsh1-KF?usp=sharing.
Source code / logs.
ValueError: in user code:
/usr/local/lib/python3.7/dist-packages/keras/engine/training.py:1330 test_function *
return step_function(self, iterator)
/usr/local/lib/python3.7/dist-packages/keras/engine/training.py:1319 step_function **
data = next(iterator)
/usr/local/lib/python3.7/dist-packages/tensorflow/python/distribute/input_lib.py:693 __next__
return self.get_next()
/usr/local/lib/python3.7/dist-packages/tensorflow/python/distribute/input_lib.py:744 get_next
self, self._strategy, return_per_replica=False)
/usr/local/lib/python3.7/dist-packages/tensorflow/python/distribute/input_lib.py:611 _get_next_as_optional
iterator._iterators[i].get_next_as_list()) # pylint: disable=protected-access
/usr/local/lib/python3.7/dist-packages/tensorflow/python/distribute/input_lib.py:1990 get_next_as_list
strict=True,
/usr/local/lib/python3.7/dist-packages/tensorflow/python/util/dispatch.py:206 wrapper
return target(*args, **kwargs)
/usr/local/lib/python3.7/dist-packages/tensorflow/python/util/deprecation.py:549 new_func
return func(*args, **kwargs)
/usr/local/lib/python3.7/dist-packages/tensorflow/python/ops/control_flow_ops.py:1254 cond
return cond_v2.cond_v2(pred, true_fn, false_fn, name)
/usr/local/lib/python3.7/dist-packages/tensorflow/python/ops/cond_v2.py:95 cond_v2
op_return_value=pred)
/usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/func_graph.py:1007 func_graph_from_py_func
func_outputs = python_func(*func_args, **func_kwargs)
/usr/local/lib/python3.7/dist-packages/tensorflow/python/distribute/input_lib.py:1989 <lambda>
lambda: _dummy_tensor_fn(data.element_spec),
/usr/local/lib/python3.7/dist-packages/tensorflow/python/distribute/input_lib.py:1853 _dummy_tensor_fn
return nest.map_structure(create_dummy_tensor, value_structure)
/usr/local/lib/python3.7/dist-packages/tensorflow/python/util/nest.py:869 map_structure
structure[0], [func(*x) for x in entries],
/usr/local/lib/python3.7/dist-packages/tensorflow/python/util/nest.py:869 <listcomp>
structure[0], [func(*x) for x in entries],
/usr/local/lib/python3.7/dist-packages/tensorflow/python/distribute/input_lib.py:1849 create_dummy_tensor
dummy_tensor, (row_splits,) * spec._ragged_rank, validate=False)
/usr/local/lib/python3.7/dist-packages/tensorflow/python/util/dispatch.py:206 wrapper
return target(*args, **kwargs)
/usr/local/lib/python3.7/dist-packages/tensorflow/python/ops/ragged/ragged_tensor.py:745 from_nested_row_splits
result = cls.from_row_splits(result, splits, validate=validate)
/usr/local/lib/python3.7/dist-packages/tensorflow/python/util/dispatch.py:206 wrapper
return target(*args, **kwargs)
/usr/local/lib/python3.7/dist-packages/tensorflow/python/ops/ragged/ragged_tensor.py:454 from_row_splits
return cls._from_row_partition(values, row_partition, validate=validate)
/usr/local/lib/python3.7/dist-packages/tensorflow/python/ops/ragged/ragged_tensor.py:348 _from_row_partition
return cls(values=values, internal=True, row_partition=row_partition)
/usr/local/lib/python3.7/dist-packages/tensorflow/python/ops/ragged/ragged_tensor.py:294 __init__
values.shape.with_rank_at_least(1)
/usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/tensor_shape.py:1078 with_rank_at_least
raise ValueError("Shape %s must have rank at least %d" % (self, rank))
ValueError: Shape () must have rank at least 1
Cc: @Nilabhra
This issue is copied from tensorflow/tensorflow#34491.
The issue has been moved here for better tracking, since the Keras code has been moved to the keras-team/keras repo.
System information.
TensorFlow version (you are using):
master
Are you willing to contribute it (Yes/No) :
I need more detail
Describe the feature and the current behavior/state.
I think we need to cover the core image processing transformations with TF native ops.
Currently, a core transformation in preprocessing still relies on a NumPy/SciPy implementation:
https://github.com/keras-team/keras/blob/master/keras/preprocessing/image.py#L2622
Describe the feature clearly here. Be sure to convey here why the requested feature is needed. Any brief description about the use-case would help.
Will this change the current api? How?
Who will benefit from this feature?
Hello Keras-Team :),
I use TF 2.8 with tf.keras.
For a web app, I need to precalculate the output shapes of the layers a user is using. I have derived formulas for nearly all layers so far, but I am stuck on the convolutional layers.
For convolutional layers, it is not allowed to have dilation_rate > 1 and strides > 1 at the same time. Why is that, and why is it possible for other convolutional layers like SeparableConv or DepthwiseConv?
From my understanding, a dilation rate > 1 can be understood as a larger kernel with gaps in it. So shouldn't it also be possible to move that "larger kernel" by a given step size (which is the strides)?
So far I have come up with the following formula, which works for any convolutional layer (except transposed ones, of course) as long as either dilation_rate or strides is 1. (You don't have to dig into the formula; what would really help me is just the correct formula, but for completeness' sake I paste it here.)
/*
For a given dimension:
p - previous_shape dimension value
k - kernel_size
d - dilation rate
s - strides rate
*/
if (p < k) return "invalid";
if (d > 1) k = k + (k - 1) * (d - 1);
if (padding === "valid") {
  const kernel_poses = p - (k - 1) - s; // the theoretical number of positions we can place the kernel if strides (s) === 1
  if (s === 1) return Math.ceil(kernel_poses);
  else return Math.ceil(kernel_poses / s); // here the returned value sometimes differs from what I get when I call model.summary()
} else if (padding === "same") {
  const kernel_poses = p; // the theoretical number of positions we can place the kernel if strides (s) === 1
  if (s === 1) return Math.ceil(kernel_poses);
  else return Math.ceil(kernel_poses / s);
}
To summarize my questions and needs here:
Thx in advance <3
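For comparison, here is a sketch of the output-length rule for non-transposed convolutions as I understand Keras computes it (to the best of my knowledge it mirrors the conv_output_length helper in Keras's conv_utils module): the dilated kernel covers k + (k - 1)(d - 1) input positions, 'valid' padding leaves p - k_eff + 1 start positions, 'same' leaves p, and the stride then takes the ceiling of that count divided by s.

```python
def conv_output_length(input_length: int, kernel_size: int, padding: str,
                       stride: int = 1, dilation: int = 1) -> int:
    """Output length along one spatial dimension of a (non-transposed) conv."""
    # A dilated kernel covers k + (k - 1) * (d - 1) input positions.
    effective_k = kernel_size + (kernel_size - 1) * (dilation - 1)
    if padding == "same":
        positions = input_length
    elif padding == "valid":
        positions = input_length - effective_k + 1
        if positions <= 0:
            raise ValueError("kernel does not fit in the input ('valid' padding)")
    else:
        raise ValueError(f"unsupported padding: {padding!r}")
    # ceil(positions / stride) using integer arithmetic.
    return (positions + stride - 1) // stride


print(conv_output_length(10, 3, "valid", stride=2))    # 4
print(conv_output_length(10, 3, "same", stride=2))     # 5
print(conv_output_length(32, 3, "valid", dilation=2))  # 28
```

With p = 10, k = 3, s = 2 and 'valid' padding, the kernel can start at positions 0, 2, 4 and 6, giving 4; the snippet above computes ceil((10 - 2 - 2) / 2) = 3, because it subtracts s from the position count before dividing.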
System information.
tf.keras.utils.plot_model(model,
                          to_file='model_dir/model.png',
                          show_shapes=True,
                          show_dtype=True,
                          show_layer_names=True)
Describe the problem.
When I try to plot a model that contains layers with dictionary inputs, I get the error "Error: invalid label format" from the model_to_dot function in vis_utils.py.
This error comes from an invalid graphviz label (defined here: https://github.com/keras-team/keras/blob/master/keras/utils/vis_utils.py#L293).
My input shape is a dictionary of the form {'a': (None, 1), 'b': (None, 2)}.
If I call plot_model with show_shapes=True, the shape is added to the graphviz label.
The problem is that the braces in inputlabels and outputlabels need to be escaped so that graphviz interprets the node's label correctly (otherwise, graphviz interprets it as a nested record label: https://graphviz.org/doc/info/shapes.html#record).
I fixed the issue by adding:
inputlabels = inputlabels.replace('{', r'\{')
inputlabels = inputlabels.replace('}', r'\}')
outputlabels = outputlabels.replace('{', r'\{')
outputlabels = outputlabels.replace('}', r'\}')
before https://github.com/keras-team/keras/blob/master/keras/utils/vis_utils.py#L293, but there must be a more elegant way of fixing this.
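A slightly more general variant of that fix is a single helper that backslash-escapes every character graphviz treats specially inside record labels (braces, vertical bars and angle brackets, per the record-shape documentation linked above). The name escape_record_label is hypothetical; it is meant to be applied only to the inputlabels/outputlabels strings, not to the record separators that vis_utils inserts itself.

```python
# Characters with special meaning inside graphviz record labels.
_RECORD_SPECIAL_CHARS = "{}|<>"
_RECORD_ESCAPES = str.maketrans({c: "\\" + c for c in _RECORD_SPECIAL_CHARS})


def escape_record_label(text: str) -> str:
    """Backslash-escape graphviz record-label metacharacters in `text`."""
    return text.translate(_RECORD_ESCAPES)


shape_label = str({"a": (None, 1), "b": (None, 2)})
print(escape_record_label(shape_label))
# \{'a': (None, 1), 'b': (None, 2)\}
```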
Describe the current behavior.
Plotting the model fails.
Describe the expected behavior.
The model gets plotted correctly
(Moving an issue from the tf repo)
System information
yes, mostly based on the example from https://www.tensorflow.org/guide/keras/save_and_serialize
Linux 59a52e5448f6 5.4.104+ #1 SMP Sat Jun 5 09:50:34 PDT 2021 x86_64 x86_64 x86_64 GNU/Linux
no
google colab version
v2.6.0-0-g919f693420e 2.6.0
3.7.12 (default, Sep 10 2021, 00:21:48) [GCC 7.5.0]
no
no
11.2
Tesla K80, 11441MiB
Describe the current behavior
When restoring a Keras model with keras.models.load_model, the returned model's optimizer is in a reset state (e.g., its weights attribute is empty).
Describe the expected behavior
The original call:
reconstructed_model = tf.keras.models.load_model("my_model")
should have restored and kept the optimizer's weights.
Standalone code to reproduce the issue
import tensorflow as tf
import numpy as np

def get_model():
    # Create a simple model.
    inputs = tf.keras.Input(shape=(32,))
    outputs = tf.keras.layers.Dense(1)(inputs)
    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="mean_squared_error")
    return model

model = get_model()
# Train the model.
test_input = np.random.random((128, 32))
test_target = np.random.random((128, 1))
model.fit(test_input, test_target)
# Calling `save('my_model')` creates a SavedModel folder `my_model`.
model.save("my_model")
# It can be used to reconstruct the model identically.
reconstructed_model = tf.keras.models.load_model("my_model")
print(reconstructed_model.optimizer.weights)
output:
4/4 [==============================] - 1s 4ms/step - loss: 0.1829
INFO:tensorflow:Assets written to: my_model/assets
[]
If we additionally provide a compile=False argument, the optimizer's weights are restored:
reconstructed_model = tf.keras.models.load_model("my_model", compile=False)
for w in reconstructed_model.optimizer.weights:
    print(w.shape)
output:
(32, 1)
(1,)
(32, 1)
(1,)
However, trying to use the restored optimizer fails with an exception:
reconstructed_model.compile(reconstructed_model.optimizer, loss="mean_squared_error")
reconstructed_model.fit(test_input, test_target)
output:
---------------------------------------------------------------------------
NotImplementedError Traceback (most recent call last)
<ipython-input-3-22a4ff24818b> in <module>()
1 reconstructed_model.compile(reconstructed_model.optimizer, loss="mean_squared_error")
----> 2 reconstructed_model.fit(test_input, test_target)
9 frames
/usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/func_graph.py in wrapper(*args, **kwargs)
992 except Exception as e: # pylint:disable=broad-except
993 if hasattr(e, "ag_error_metadata"):
--> 994 raise e.ag_error_metadata.to_exception(e)
995 else:
996 raise
NotImplementedError: in user code:
/usr/local/lib/python3.7/dist-packages/keras/engine/training.py:853 train_function *
return step_function(self, iterator)
/usr/local/lib/python3.7/dist-packages/keras/engine/training.py:842 step_function **
outputs = model.distribute_strategy.run(run_step, args=(data,))
/usr/local/lib/python3.7/dist-packages/tensorflow/python/distribute/distribute_lib.py:1286 run
return self._extended.call_for_each_replica(fn, args=args, kwargs=kwargs)
/usr/local/lib/python3.7/dist-packages/tensorflow/python/distribute/distribute_lib.py:2849 call_for_each_replica
return self._call_for_each_replica(fn, args, kwargs)
/usr/local/lib/python3.7/dist-packages/tensorflow/python/distribute/distribute_lib.py:3632 _call_for_each_replica
return fn(*args, **kwargs)
/usr/local/lib/python3.7/dist-packages/keras/engine/training.py:835 run_step **
outputs = model.train_step(data)
/usr/local/lib/python3.7/dist-packages/keras/engine/training.py:791 train_step
self.optimizer.minimize(loss, self.trainable_variables, tape=tape)
/usr/local/lib/python3.7/dist-packages/keras/optimizer_v2/optimizer_v2.py:522 minimize
return self.apply_gradients(grads_and_vars, name=name)
/usr/local/lib/python3.7/dist-packages/keras/optimizer_v2/optimizer_v2.py:660 apply_gradients
apply_state)
/usr/local/lib/python3.7/dist-packages/keras/optimizer_v2/optimizer_v2.py:707 _distributed_apply
var, apply_grad_to_update_var, args=(grad,), group=False)
/usr/local/lib/python3.7/dist-packages/tensorflow/python/distribute/distribute_lib.py:2595 update
var, fn, args=args, kwargs=kwargs, group=group)
/usr/local/lib/python3.7/dist-packages/tensorflow/python/distribute/distribute_lib.py:2473 _replica_ctx_update
return replica_context.merge_call(merge_fn, args=args, kwargs=kwargs)
/usr/local/lib/python3.7/dist-packages/tensorflow/python/distribute/distribute_lib.py:3064 merge_call
return self._merge_call(merge_fn, args, kwargs)
/usr/local/lib/python3.7/dist-packages/tensorflow/python/distribute/distribute_lib.py:3071 _merge_call
return merge_fn(self._strategy, *args, **kwargs)
/usr/local/lib/python3.7/dist-packages/tensorflow/python/distribute/distribute_lib.py:2471 merge_fn **
return self.update(var, fn, merged_args, merged_kwargs, group=group)
/usr/local/lib/python3.7/dist-packages/tensorflow/python/distribute/distribute_lib.py:2592 update
return self._update(var, fn, args, kwargs, group)
/usr/local/lib/python3.7/dist-packages/tensorflow/python/distribute/distribute_lib.py:3646 _update
return self._update_non_slot(var, fn, (var,) + tuple(args), kwargs, group)
/usr/local/lib/python3.7/dist-packages/tensorflow/python/distribute/distribute_lib.py:3652 _update_non_slot
result = fn(*args, **kwargs)
/usr/local/lib/python3.7/dist-packages/keras/optimizer_v2/optimizer_v2.py:689 apply_grad_to_update_var **
update_op = self._resource_apply_dense(grad, var, **apply_kwargs)
/usr/local/lib/python3.7/dist-packages/keras/optimizer_v2/optimizer_v2.py:1241 _resource_apply_dense
raise NotImplementedError("Must be implemented in subclasses.")
NotImplementedError: Must be implemented in subclasses.
TensorFlow version (you are using): 2.7.0
Are you willing to contribute it (Yes/No): YES
I am reopening the issue tensorflow/tensorflow#33675 to assess community interest in that feature and to discuss possible implementations.
Quoting the original issue, the requested feature can be described as:
Regarding tf.keras.callbacks.ReduceLROnPlateau: The min_delta parameter is currently an absolute number which indicates when a meaningful reduction in the monitored loss has occurred. It makes no sense to use an absolute number, for two reasons:
- Every loss has a different dynamic range and hence a different definition of a meaningful reduction
- A "meaningful reduction" decreases as training progresses: the later the epoch, the smaller the expected change in loss.
For these two reasons I think that a percentage of change in the monitored loss is much more useful.
I do not have access to the code that I used anymore, but the problem that I was trying to solve was:
[...] the loss is ~1e5 at the beginning of training, while the goal is to achieve a loss as low as 10 at the end. One can easily see that min_delta=10 has very different meanings in the beginning and in the end.
I was able to solve this by implementing a custom version of ReduceLROnPlateau that accepted a relative min_delta.
Just for the record, PyTorch supports this feature through the parameters threshold (equivalent to Keras's min_delta) and threshold_mode, which specifies whether threshold should be interpreted as an absolute or a relative change.
Yes: a new parameter should be added to the initializer of ReduceLROnPlateau. This addition can be made in a backward-compatible manner with a sensible choice of default values.
Anyone who uses the ReduceLROnPlateau callback, especially people working with models whose loss varies a lot during training.
I currently have two candidate solutions:
- min_delta_mode: Literal['absolute', 'relative']: passing min_delta_mode='absolute' (the default) instructs Keras to treat min_delta as an absolute change, as in the current behavior; passing min_delta_mode='relative' instructs Keras to treat min_delta as a relative change.
- min_delta_rel: Optional[float]: the user must pass either min_delta or min_delta_rel (but not both); passing min_delta keeps the current behavior, while passing min_delta_rel enables the new behavior.
Note that both candidates are equivalent; it's just a matter of choosing the best interface. Supposing that we choose option 1, the ReduceLROnPlateau._reset method would be changed so that self.monitor_op is defined depending on self.mode and self.min_delta_mode, according to the following table:
| mode | min_delta_mode | monitor_op |
|---|---|---|
| 'min' | 'absolute' | lambda current, best: np.less(current, best - self.min_delta) |
| 'max' | 'absolute' | lambda current, best: np.greater(current, best + self.min_delta) |
| 'min' | 'relative' | lambda current, best: np.less(current, (1 - self.min_delta)*best) |
| 'max' | 'relative' | lambda current, best: np.greater(current, (1 + self.min_delta)*best) |
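The monitor_op table can be sketched as a small factory function. This is a hypothetical illustration of option 1 (min_delta_mode is the proposed parameter, not an existing Keras argument); it uses plain comparisons, which behave like np.less/np.greater for scalar metric values.

```python
def make_monitor_op(mode: str, min_delta: float, min_delta_mode: str = "absolute"):
    """Return monitor_op(current, best) -> bool, following the proposed table."""
    if (mode, min_delta_mode) == ("min", "absolute"):
        return lambda current, best: current < best - min_delta
    if (mode, min_delta_mode) == ("max", "absolute"):
        return lambda current, best: current > best + min_delta
    if (mode, min_delta_mode) == ("min", "relative"):
        return lambda current, best: current < (1 - min_delta) * best
    if (mode, min_delta_mode) == ("max", "relative"):
        return lambda current, best: current > (1 + min_delta) * best
    raise ValueError(f"unsupported combination: {mode!r}, {min_delta_mode!r}")


# With best = 1e5, a relative min_delta of 1% requires a drop of 1000,
# while with best = 100 it only requires a drop of 1.
improved = make_monitor_op("min", 0.01, "relative")
print(improved(98_000.0, 1e5))  # True
print(improved(99.5, 100.0))    # False
```

As with PyTorch's relative threshold mode, the relative rows assume the monitored value is positive; for a metric that can be negative, (1 - min_delta) * best moves toward zero rather than below best, so the comparison would need adjusting.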
System information.
import tensorflow as tf
cat = ["Paris", "Singapore", "Auckland"]
str_lookup_layer = tf.keras.layers.StringLookup()
str_lookup_layer.adapt(cat)
lookup_and_embed = tf.keras.Sequential([
str_lookup_layer,
tf.keras.layers.Embedding(input_dim=str_lookup_layer.vocabulary_size(),
output_dim=2)
])
lookup_and_embed(tf.constant([["Paris"], ["Singapore"], ["Auckland"]])) # ERROR!
This code is available in this gist.
Describe the problem.
Since TF 2.8.0, using a tf.keras.layers.StringLookup layer as the first layer in a Sequential model raises an exception when calling the model: UnimplementedError: Exception encountered when calling layer "sequential" (type Sequential). Cast string to int64 is not supported [Op:Cast]. The problem did not exist in TF 2.7.1.
Full stacktrace:
---------------------------------------------------------------------------
UnimplementedError Traceback (most recent call last)
<ipython-input-2-ee4b4b94a15e> in <module>()
7 output_dim=2)
8 ])
----> 9 lookup_and_embed(tf.constant([["Paris"], ["Singapore"], ["Auckland"]]))
1 frames
/usr/local/lib/python3.7/dist-packages/keras/utils/traceback_utils.py in error_handler(*args, **kwargs)
65 except Exception as e: # pylint: disable=broad-except
66 filtered_tb = _process_traceback_frames(e.__traceback__)
---> 67 raise e.with_traceback(filtered_tb) from None
68 finally:
69 del filtered_tb
/usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/ops.py in raise_from_not_ok_status(e, name)
7184 def raise_from_not_ok_status(e, name):
7185 e.message += (" name: " + name if name is not None else "")
-> 7186 raise core._status_to_exception(e) from None # pylint: disable=protected-access
7187
7188
UnimplementedError: Exception encountered when calling layer "sequential" (type Sequential).
Cast string to int64 is not supported [Op:Cast]
Call arguments received:
• inputs=tf.Tensor(shape=(3, 1), dtype=string)
• training=None
• mask=None
Describe the expected behavior.
In TF 2.7.1, the code works and gives an output similar to this one:
<tf.Tensor: shape=(3, 2), dtype=float32, numpy=
array([[-0.02887753, -0.01268407],
[ 0.04601531, -0.02668235],
[ 0.03409723, -0.03205377]], dtype=float32)>
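A workaround that, to my understanding, avoids the problematic cast is to declare the string input explicitly with tf.keras.Input(dtype=tf.string), so that the model's input dtype is declared rather than inferred from the first layer. This is a sketch and has not been verified against every TF release.

```python
import tensorflow as tf

cat = ["Paris", "Singapore", "Auckland"]
str_lookup_layer = tf.keras.layers.StringLookup()
str_lookup_layer.adapt(cat)

lookup_and_embed = tf.keras.Sequential([
    # Explicit string input: the model's input dtype is declared, not inferred.
    tf.keras.Input(shape=(1,), dtype=tf.string),
    str_lookup_layer,
    tf.keras.layers.Embedding(input_dim=str_lookup_layer.vocabulary_size(),
                              output_dim=2),
])

embedded = lookup_and_embed(tf.constant([["Paris"], ["Singapore"], ["Auckland"]]))
print(embedded.shape)  # (3, 1, 2): one 2-d vector per input string
```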
I have written a custom Keras CNN-based GAN for synthesizing tabular datasets. The code works fine with reasonable batch sizes (generally 64 to 1024). However, users are allowed to specify a batch size, and when they use large ones, I try to handle this by catching ResourceExhaustedError and stepping the batch size down. I found that doing this eventually leads to the Check failed error in the post title, which I can't catch; the process just dies. This occurs in the following environments:
Windows
Tensorflow 2.6.0 (pip install)
Cuda 11.3
Titan RTX 24GB Founders Edition card
Driver: 465.89
AWS Linux (RHEL7)
Tensorflow 2.6.1 (pip install)
Cuda 11.3 and now 11.5
V100
Driver: 495.29.05
Batch shape is (10240, 328) so at least 2 samples (per related post).
train_step function:
@tf.function()
def _train_step(self, real_data):
    noise = tf.random.normal([self._batch_size, self._noise_dim])
    with tf.GradientTape() as gen_tape, tf.GradientTape() as disc_tape:
        synthetic_data = self._generator(noise, training=True)
        real_data_pred = self._discriminator(real_data, training=True)
        synth_data_pred = self._discriminator(synthetic_data, training=True)
        gen_loss = self.generator_loss(synth_data_pred)
        disc_loss = self.discriminator_loss(real_data_pred, synth_data_pred)
    gradients_of_generator = gen_tape.gradient(gen_loss, self._generator.trainable_variables)
    gradients_of_discriminator = disc_tape.gradient(disc_loss, self._discriminator.trainable_variables)
    self.generator_optimizer.apply_gradients(zip(gradients_of_generator, self._generator.trainable_variables))
    self.discriminator_optimizer.apply_gradients(zip(gradients_of_discriminator,
                                                     self._discriminator.trainable_variables))
This also happens both with and without mixed precision.
Any help would be greatly appreciated.
System information.
Describe the current behavior
When restoring the model from its config, I get:
ValueError: Got 0 inputs for equation "bmhwf,bmoh->bmowf", expecting 2
However, if the tf.einsum op is wrapped in a Keras Lambda layer, it works (the model can be dumped to a config and restored).
Describe the expected behavior
Should be able to restore the model from config.
Standalone code to reproduce the issue
https://colab.research.google.com/drive/10X2dDb_EGLL64w-MyMU4g9dSrHLw9PvI?usp=sharing
import tensorflow as tf
from tensorflow import keras
x1 = keras.Input(shape=(2, 4, 4, 1))
x2 = keras.Input(shape=(2, 2, 4))
x = tf.einsum('bmhwf,bmoh->bmowf', x1, x2)
model = keras.Model(inputs=[x1, x2], outputs=x)
model = tf.keras.Model.from_config(model.get_config())
Source code / logs.
Log from colab
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-1-5a94aa47793c> in <module>()
7 x = tf.einsum('bmhwf,bmoh->bmowf', x1, x2)
8 model = keras.Model(inputs=[x1, x2], outputs=x)
----> 9 model = tf.keras.Model.from_config(model.get_config())
4 frames
/usr/local/lib/python3.7/dist-packages/keras/engine/training.py in from_config(cls, config, custom_objects)
2446 with generic_utils.SharedObjectLoadingScope():
2447 input_tensors, output_tensors, created_layers = (
-> 2448 functional.reconstruct_from_config(config, custom_objects))
2449 # Initialize a model belonging to `cls`, which can be user-defined or
2450 # `Functional`.
/usr/local/lib/python3.7/dist-packages/keras/engine/functional.py in reconstruct_from_config(config, custom_objects, created_layers)
1336 while layer_nodes:
1337 node_data = layer_nodes[0]
-> 1338 if process_node(layer, node_data):
1339 layer_nodes.pop(0)
1340 else:
/usr/local/lib/python3.7/dist-packages/keras/engine/functional.py in process_node(layer, node_data)
1280 input_tensors = (
1281 base_layer_utils.unnest_if_single_tensor(input_tensors))
-> 1282 output_tensors = layer(input_tensors, **kwargs)
1283
1284 # Update node index map.
/usr/local/lib/python3.7/dist-packages/keras/utils/traceback_utils.py in error_handler(*args, **kwargs)
65 except Exception as e: # pylint: disable=broad-except
66 filtered_tb = _process_traceback_frames(e.__traceback__)
---> 67 raise e.with_traceback(filtered_tb) from None
68 finally:
69 del filtered_tb
/usr/local/lib/python3.7/dist-packages/tensorflow/python/ops/special_math_ops.py in _einsum_v2_parse_and_resolve_equation(equation, input_shapes)
1279 if len(input_shapes) != len(input_labels):
1280 raise ValueError('Got {} inputs for equation "{}", expecting {}'.format(
-> 1281 len(input_shapes), equation, len(input_labels)))
1282
1283 # Special case: if there are no '->', then we create output subscripts from
ValueError: Exception encountered when calling layer "tf.einsum" (type TFOpLambda).
Got 0 inputs for equation "bmhwf,bmoh->bmowf", expecting 2
Call arguments received:
• equation='bmhwf,bmoh->bmowf'
• inputs=<class 'inspect._empty'>
• kwargs=<class 'inspect._empty'>
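Besides the Lambda workaround, another option is a small custom layer that stores the einsum equation in its config. This is a swapped-in technique rather than the reporter's code, and EinsumLayer is a hypothetical name; the class has to be passed through custom_objects when the model is rebuilt.

```python
import tensorflow as tf
from tensorflow import keras


class EinsumLayer(keras.layers.Layer):
    """Hypothetical layer wrapping tf.einsum so the equation survives get_config()."""

    def __init__(self, equation, **kwargs):
        super().__init__(**kwargs)
        self.equation = equation

    def call(self, inputs):
        a, b = inputs
        return tf.einsum(self.equation, a, b)

    def get_config(self):
        config = super().get_config()
        config.update({"equation": self.equation})
        return config


x1 = keras.Input(shape=(2, 4, 4, 1))
x2 = keras.Input(shape=(2, 2, 4))
x = EinsumLayer("bmhwf,bmoh->bmowf")([x1, x2])
model = keras.Model(inputs=[x1, x2], outputs=x)

# The custom class must be supplied when rebuilding from the config.
restored = keras.Model.from_config(
    model.get_config(), custom_objects={"EinsumLayer": EinsumLayer})
print(restored.outputs[0].shape)  # (None, 2, 2, 4, 1)
```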
System information.
Describe the problem.
My network model works well without specifying class_weight in model.fit().
However, when I specify class_weight in model.fit(), Keras/TensorFlow fails with the following error no matter what weight values I give:
File "/opt/local/lib/python3.7/site-packages/tensorflow/python/eager/execute.py", line 55, in quick_execute
inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Graph execution error:
indices[9] = 16 is not in [0, 10)
[[{{node GatherV2}}]]
[[IteratorGetNext]] [Op:__inference_train_function_43205]
Keras/TensorFlow fails with the above error even when I give all classes an equal weight of 1.0 (which is equivalent to no class weights), as follows (I have 10 classes):
class_weights_dict = {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0}
history = model.fit(train_input,
train_true_labels,
class_weight=class_weights_dict,
validation_split=validation_split,
shuffle=True,
epochs=epochs,
batch_size=batch_size)
I also verified that my true-label array train_true_labels contains only the integers 0-9:
values = np.unique(train_true_labels)
print(values)
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
However, when I do not specify class_weight in model.fit(), training works just fine.
So it looks like I simply cannot use class_weight in training. But my classes are highly imbalanced, and without class weights the trained model would be useless.
I would greatly appreciate any solution for this issue.
Thank you very much!
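A possible workaround while the class_weight path fails is to expand the class weights into per-sample weights yourself and pass them through the sample_weight argument of model.fit instead. The sketch below assumes integer class labels shaped (N,) or (N, 1); class_weight_to_sample_weight is a hypothetical helper, not a Keras API.

```python
import numpy as np


def class_weight_to_sample_weight(labels, class_weight):
    """Map each integer label to its class weight (hypothetical helper)."""
    # Accept labels shaped (N,) or (N, 1); one-hot labels are NOT handled here.
    labels = np.asarray(labels).reshape(-1).astype(np.int64)
    unknown = set(np.unique(labels).tolist()) - set(class_weight)
    if unknown:
        raise ValueError(f"labels without a class weight: {sorted(unknown)}")
    # Lookup table indexed by class id.
    table = np.zeros(max(class_weight) + 1, dtype=np.float32)
    for cls, weight in class_weight.items():
        table[cls] = weight
    return table[labels]


# Usage with the names from the question (not executed here):
# sample_weight = class_weight_to_sample_weight(train_true_labels, class_weights_dict)
# model.fit(train_input, train_true_labels, sample_weight=sample_weight, ...)
```

The explicit check for unknown labels is deliberate: an out-of-range label (like the 16 in the error message) fails loudly on the NumPy side instead of inside the training graph.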
Could anyone please teach me how to convert the Functional API code in this official TensorFlow tutorial to Model subclassing? I suppose an elegant chunk of code would combine the Normalization layer with the remaining layers. I went through the tutorial and tried to reproduce the process for my purpose. I have only 1 predictor (age) and 1 target variable (se); both are continuous variables.
Here's how I refactored the code:
from tensorflow.data import Dataset
from tensorflow.keras import Input
from tensorflow.keras.backend import clear_session
from tensorflow.keras.layers import Normalization
def df_to_dataset(dataframe, batch_size, shuffle=True):
dataframe = dataframe.copy()
labels = dataframe.pop("se")
ds = Dataset.from_tensor_slices((dict(dataframe), labels))
if shuffle:
ds = ds.shuffle(buffer_size=len(dataframe))
ds = ds.batch(batch_size)
ds = ds.prefetch(batch_size)
return ds
def get_normalization_layer(name, dataset):
normalizer = Normalization(axis=None)
feature_ds = dataset.map(lambda x, y: x[name])
normalizer.adapt(feature_ds)
return normalizer
test_ds = df_to_dataset(test, batch_size=5, shuffle=False)
[(train_features, label_batch)] = test_ds.take(1)
print(f"Every Feature: {list(train_features.keys())}")
print(f"A batch of ages: {train_features['age']}")
print(f"A batch of targets: {label_batch}")
batch_size = 128
train_ds = df_to_dataset(train, batch_size=batch_size)
val_ds = df_to_dataset(val, batch_size=batch_size, shuffle=False)
test_ds = df_to_dataset(test, batch_size=batch_size, shuffle=False)
all_inputs = []
encoded_features = []
# Numeric features
clear_session()
for header in ["age"]:
numeric_col = Input(shape=(1, ), name=header)
normalization_layer = get_normalization_layer(header, train_ds)
encoded_numeric_col = normalization_layer(numeric_col)
all_inputs.append(numeric_col)
encoded_features.append(encoded_numeric_col)
Below is the Functional API part, which works as expected, as in the tutorial:
import tensorflow as tf
from tensorflow import math
from tensorflow.keras import Model, Input
from tensorflow.keras.backend import clear_session
from tensorflow.keras.losses import Loss
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.layers import Dense, Normalization, Concatenate, Dropout
from tensorflow.keras.callbacks import ReduceLROnPlateau, EarlyStopping
class QuantileLoss(Loss):
def __init__(self, quantiles):
super().__init__()
self.quantiles = tf.convert_to_tensor(quantiles)
def call(self, y_true, y_pred):
y_true = tf.convert_to_tensor(y_true)
y_pred = tf.convert_to_tensor(y_pred)
errors = math.subtract(y_true, y_pred)
loss = math.reduce_mean(
math.maximum(
math.multiply(self.quantiles, errors),
math.multiply(
math.subtract(
self.quantiles, 1
),
errors
)
),
axis=-1
)
return loss
clear_session()
earlystopping = EarlyStopping(patience=10)
lr_schedule = ReduceLROnPlateau(
patience=5,
monitor="val_loss",
verbose=1
)
callbacks = [lr_schedule, earlystopping]
quantiles = [0.021, 0.157, 0.5, 0.841, 0.977, 0.998]
all_features = Concatenate()(encoded_features)
print(all_features.shape)
x = Dense(256, activation="selu")(all_features)
x = Dropout(0.3)(x)
x = Dense(64, activation="selu")(x)
output = Dense(len(quantiles))(x)
model = Model(all_inputs, output)
quantile_loss = QuantileLoss(quantiles)
model.compile(optimizer=Adam(learning_rate=0.001), loss=quantile_loss)
history = model.fit(train_ds, validation_data=val_ds, epochs=100, callbacks=callbacks)
I was trying to refactor the Functional implementation to make the code look more elegant.
class QuantileRegressor(Model):
def __init__(self, quantiles, hidden_units):
super().__init__()
self.quantiles = quantiles
self.concatenate = Concatenate()
self.normalizer = Normalization(axis=None)
self.hidden_dense = Dense(hidden_units, activation="selu")
self.dropout = Dropout(0.3)
self.output_dense = Dense(len(quantiles))
def call(self, inputs):
self.normalizer.adapt(inputs["age"])
# The line above gave me an error!
# Is it a good idea to place encoded_features, all_inputs here?
# inputs here seemed to be a dictionary.
return None
earlystopping = EarlyStopping(patience=10)
lr_schedule = ReduceLROnPlateau(
patience=5,
monitor="val_loss",
verbose=1
)
callbacks = [lr_schedule, earlystopping]
quantiles = [0.021, 0.157, 0.5, 0.841, 0.977, 0.998]
hidden_units = 256
clear_session()
model = QuantileRegressor(quantiles, hidden_units)
quantile_loss = QuantileLoss(quantiles)
model.compile(optimizer=Adam(learning_rate=0.001), loss=quantile_loss)
history = model.fit(train_ds, validation_data=val_ds, epochs=10, callbacks=callbacks)
Calling self.normalizer.adapt(inputs["age"]) in the call method resulted in:
RuntimeError: in user code:
/usr/local/lib/python3.7/dist-packages/keras/engine/training.py:853 train_function *
return step_function(self, iterator)
<ipython-input-24-3f4f9f8eec72>:18 call *
self.normalizer.adapt(inputs["age"])
/usr/local/lib/python3.7/dist-packages/keras/engine/base_preprocessing_layer.py:230 adapt **
_disallow_inside_tf_function('adapt')
/usr/local/lib/python3.7/dist-packages/keras/engine/base_preprocessing_layer.py:591 _disallow_inside_tf_function
raise RuntimeError(error_msg)
RuntimeError: Detected a call to `PreprocessingLayer.adapt` inside a `tf.function`. `PreprocessingLayer.adapt is a high-level endpoint that manages its own `tf.function`. Please move the call to `PreprocessingLayer.adapt` outside of all enclosing `tf.function`s. Note that you can call a `PreprocessingLayer` directly on `Tensor`s inside a `tf.function` like: `layer(x)`, or update its state like: `layer.update_state(x)`.
That's why I am asking: what is the standard or preferred way to adapt a Normalization layer when using Model subclassing?
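A minimal sketch of one common pattern, assuming TF 2.x with tf.keras: create the Normalization layer in __init__, only apply it inside call, and run adapt() eagerly on the training feature before compile()/fit(). This follows the error message's own advice to move adapt outside any tf.function. The layer sizes, the quantile subset, and the synthetic ages array are placeholders, not the tutorial's code.

```python
import numpy as np
import tensorflow as tf


class QuantileRegressor(tf.keras.Model):
    def __init__(self, quantiles, hidden_units, **kwargs):
        super().__init__(**kwargs)
        self.normalizer = tf.keras.layers.Normalization(axis=None)
        self.hidden_dense = tf.keras.layers.Dense(hidden_units, activation="selu")
        self.dropout = tf.keras.layers.Dropout(0.3)
        self.output_dense = tf.keras.layers.Dense(len(quantiles))

    def call(self, inputs, training=False):
        # Apply the already-adapted layer; adapt() itself must not run in here.
        age = tf.reshape(inputs["age"], (-1, 1))
        x = self.normalizer(age)
        x = self.hidden_dense(x)
        x = self.dropout(x, training=training)
        return self.output_dense(x)


# Placeholder data standing in for the "age" column.
ages = np.random.default_rng(0).uniform(20.0, 80.0, size=(64, 1)).astype("float32")

model = QuantileRegressor(quantiles=[0.021, 0.5, 0.977], hidden_units=16)
# Adapt eagerly, outside any tf.function. With the question's tf.data pipeline,
# model.normalizer.adapt(train_ds.map(lambda x, y: x["age"])) should work similarly.
model.normalizer.adapt(ages)
preds = model({"age": ages})
```

After adapt(), the model can be compiled and fitted as before; the Normalization layer keeps its learned mean and variance as non-trainable state.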
System information
Describe the current behavior
When implementing custom prediction logic for Keras models using predict_step as explained here, saving and restoring the Keras model in the SavedModel format ignores the custom prediction logic. Unfortunately, the code fails silently and doesn't inform the user that this is not supported, which could lead to detrimental bugs.
The issue is explained in detail with a minimal example in this colab notebook.
I know I can save a custom serving function using
class MyModel(tf.keras.Model):
@tf.function(input_signature=[tf.TensorSpec(shape=[None], dtype=tf.string)])
def serve(self, data):
...
as described here.
But I feel the current behaviour breaks user expectations: the SavedModel format is now the default saving format, yet it doesn't support all of the features and can fail silently, resulting in unexpected behaviour.
This makes it necessary for users to break the abstraction and start using low level TF APIs instead, which I think doesn't fit well with the progressive disclosure of complexity that Keras tends to strive for.
Describe the expected behavior
Keras models should preserve custom predict_step logic when saving and restoring models.
Standalone code to reproduce the issue
import tensorflow as tf
import numpy as np
class FullyConnectedModel(tf.keras.Model):
def __init__(self, **kwargs):
super().__init__(**kwargs)
self.dense = tf.keras.layers.Dense(10)
def predict_step(self, data):
logits = self(data, training=False)
return tf.argmax(logits, axis=-1)
def call(self, inputs):
return self.dense(inputs)
x, y = np.random.uniform(size=(128, 20)).astype(np.float32), np.random.randint(0, 10, size=(128))
model = FullyConnectedModel()
model.compile(optimizer="sgd", loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))
model.fit(x, y, epochs=2, batch_size=32)
model.save("/tmp/model", save_traces=True)
reloaded_model = tf.keras.models.load_model("/tmp/model")
y_pred = model.predict(x)
reloaded_y_pred = reloaded_model.predict(x)
np.testing.assert_allclose(reloaded_y_pred, y_pred)
See this notebook for more information.
Also check out tensorflow/tensorflow#48149, which was originally posted to TF before the move to keras-team/keras.
Cross post: tensorflow/tensorflow#50721
System information
Describe the current behavior
The logic to test whether a structure is nested is wrong. Example of a failing case: {"a": {"b": 1}}. I propose changing:
if (isinstance(self._nested_inputs, (dict, list, tuple)) and
len(self._nested_inputs) != len(self.inputs)):
to:
if max([len(path) for path in nest.yield_flat_paths(
self._nested_inputs)]) > 1:
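To make the difference between the two checks concrete, here is a pure-Python sketch of both. flat_paths is a hypothetical, simplified stand-in for TensorFlow's internal nest.yield_flat_paths (it visits dict keys in sorted order, as tf.nest does), and strings stand in for input tensors.

```python
def flat_paths(structure, prefix=()):
    """Yield (path, leaf) pairs for nested dicts/lists/tuples (simplified stand-in)."""
    if isinstance(structure, dict):
        for key in sorted(structure):
            yield from flat_paths(structure[key], prefix + (key,))
    elif isinstance(structure, (list, tuple)):
        for index, item in enumerate(structure):
            yield from flat_paths(item, prefix + (index,))
    else:
        yield prefix, structure


def is_nested_current(nested_inputs):
    # Current heuristic: compare the top-level length with the number of leaves.
    flat = [leaf for _, leaf in flat_paths(nested_inputs)]
    return isinstance(nested_inputs, (dict, list, tuple)) and len(nested_inputs) != len(flat)


def is_nested_proposed(nested_inputs):
    # Proposed check: any leaf reached through more than one key/index is nested.
    return max((len(path) for path, _ in flat_paths(nested_inputs)), default=0) > 1


two_leaves = {"input": {"sub_input1": "a", "sub_input2": "b"}}
one_leaf = {"input": {"sub_input": "a"}}
print(is_nested_current(one_leaf), is_nested_proposed(one_leaf))
```

With a single nested leaf, the top-level length (1) equals the number of leaves (1), so the length comparison misses the nesting, which matches the failing case in the reproduction below.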
Standalone code to reproduce the issue
import tensorflow as tf
import numpy as np
input_tensor_shape = [16]
random_tensor = np.random.random([1]+input_tensor_shape)
def sequential():
layers = [tf.keras.layers.InputLayer(input_shape=input_tensor_shape),
tf.keras.layers.Dense(8)]
return tf.keras.Sequential(layers=layers)
network = sequential()
network2 = sequential()
nested_input = {'input': {'sub_input1': network.input,
'sub_input2': network2.input}}
model = tf.keras.Model(inputs=nested_input, outputs=network.output)
input = {'input': {'sub_input1': random_tensor,
'sub_input2': random_tensor}}
# Works
model(input)
fail_nested_input = {'input': {'sub_input': network.input}}
fail_model = tf.keras.Model(inputs=fail_nested_input, outputs=network.output)
input = {'input': {'sub_input': random_tensor}}
# Fails
fail_model(input)
System information.
import tensorflow as tf
import numpy as np
num = 10
embedding_size = 5
window_size = 2
emb = tf.keras.layers.Embedding(
num, embedding_size, input_length=1, mask_zero=True
)
td = tf.keras.layers.TimeDistributed(emb)
inp = tf.constant(
np.array([
[0,1],[3,0],[4,0]
])
)
inp = tf.keras.layers.Reshape((window_size, 1))(inp)
print(inp)
out = td(inp)
print(out.shape,out._keras_mask)
out2 = tf.keras.layers.Reshape((window_size,embedding_size ))(out)
print(out2.shape)
#The following throws an error, because Reshape dropped the mask from the previous tensor:
print(out2._keras_mask)
Error from last line:
AttributeError: 'tensorflow.python.framework.ops.EagerTensor' object has no attribute '_keras_mask'
Describe the problem.
Applying the Reshape layer to a tensor that has a mask (e.g., from the Embedding Layer) gets rid of the mask, instead of reshaping it.
This is a problem because, to reuse an Embedding layer for different outputs, I need to apply Reshape after certain transformations and before passing the result to other layers (e.g., LSTM). Instead of also reshaping the tensor's mask (as a user would expect), Reshape apparently discards the mask entirely, so I cannot use it.
To resolve this I would have to write a custom version of the Reshape layer to replace it.
Describe the current behavior.
Reshape layer applied to a tensor with a mask discards the mask instead of reshaping it.
Describe the expected behavior.
Reshape layer should not discard the mask of a tensor when reshaping it, but instead should correspondingly reshape the mask as well (so the mask now correctly applies to the reshaped tensor and can be passed to subsequent layers like LSTM).
I.e., in the code snippet above, for 'out', I have a tensor with shape (3, 2, 1, 5) and mask =
tf.Tensor(
[[[False]
[ True]]
[[ True]
[False]]
[[ True]
[False]]], shape=(3, 2, 1), dtype=bool)
After applying Reshape as above, the output has shape (3, 2, 5). I would expect the mask to be reshaped correspondingly, like:
new_mask = tf.keras.layers.Reshape((window_size,))(out._keras_mask)
so that the new mask has shape (3, 2):
<tf.Tensor: shape=(3, 2), dtype=bool, numpy=
array([[False, True],
[ True, False],
[ True, False]])>
Standalone code to reproduce the issue.
See the code block at the beginning.
Source code / logs.
N/A
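A minimal sketch of the custom Reshape mentioned above, assuming the mask covers every axis except the last (feature) axis, as in the Embedding/TimeDistributed example, so the mask's target shape is target_shape without its last entry. MaskedReshape is a hypothetical name; it would be used in place of tf.keras.layers.Reshape((window_size, embedding_size)).

```python
import tensorflow as tf


class MaskedReshape(tf.keras.layers.Reshape):
    """Hypothetical Reshape variant that also reshapes an incoming mask.

    Assumes the mask has the input's shape minus the last (feature) axis.
    """

    def __init__(self, target_shape, **kwargs):
        super().__init__(target_shape, **kwargs)
        # Without this, Keras does not propagate masks through the layer.
        self.supports_masking = True

    def compute_mask(self, inputs, mask=None):
        if mask is None:
            return None
        # Keep the batch axis (-1) and drop the feature axis from target_shape.
        mask_shape = (-1,) + tuple(self.target_shape[:-1])
        return tf.reshape(mask, mask_shape)


mask = tf.constant([[[False], [True]], [[True], [False]], [[True], [False]]])
layer = MaskedReshape((2, 5))
new_mask = layer.compute_mask(tf.zeros((3, 2, 1, 5)), mask)
out = layer(tf.zeros((3, 2, 1, 5)))
```

The assumption about the mask layout is the main design limitation: a Reshape that also merges or splits the feature axis would need a different mask rule.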
System information.
Describe the problem.
MobileNetV3 models can't compute the output shapes of their intermediate layers, because some functions (activations like hard_swish, I suppose) are not wrapped in layers.
Describe the current behavior.
An exception is raised when compute_output_shape is executed.
Describe the expected behavior.
Just like ALL other models in keras.applications, MobileNetV3* models should be able to compute their output shapes.
Standalone code to reproduce the issue.
Source code / logs.
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/keras/engine/base_layer.py in compute_output_shape(self, input_shape)
782 try:
--> 783 outputs = self(inputs, training=False)
784 except TypeError as e:
9 frames
/usr/local/lib/python3.7/dist-packages/keras/engine/base_layer.py in __call__(self, *args, **kwargs)
976 return self._functional_construction_call(inputs, args, kwargs,
--> 977 input_list)
978
/usr/local/lib/python3.7/dist-packages/keras/engine/base_layer.py in _functional_construction_call(self, inputs, args, kwargs, input_list)
1114 outputs = self._keras_tensor_symbolic_call(
-> 1115 inputs, input_masks, args, kwargs)
1116
/usr/local/lib/python3.7/dist-packages/keras/engine/base_layer.py in _keras_tensor_symbolic_call(self, inputs, input_masks, args, kwargs)
847 else:
--> 848 return self._infer_output_signature(inputs, args, kwargs, input_masks)
849
/usr/local/lib/python3.7/dist-packages/keras/engine/base_layer.py in _infer_output_signature(self, inputs, args, kwargs, input_masks)
887 inputs = self._maybe_cast_inputs(inputs)
--> 888 outputs = call_fn(inputs, *args, **kwargs)
889
/usr/local/lib/python3.7/dist-packages/keras/layers/core.py in _call_wrapper(*args, **kwargs)
1349 def _call_wrapper(*args, **kwargs):
-> 1350 return self._call_wrapper(*args, **kwargs)
1351 self.call = tf.__internal__.decorator.make_decorator(function, _call_wrapper)
/usr/local/lib/python3.7/dist-packages/keras/layers/core.py in _call_wrapper(self, *args, **kwargs)
1381 kwargs.pop('name', None)
-> 1382 result = self.function(*args, **kwargs)
1383 self._check_variables(created_variables, tape.watched_variables())
/usr/local/lib/python3.7/dist-packages/tensorflow/python/util/dispatch.py in wrapper(*args, **kwargs)
205 try:
--> 206 return target(*args, **kwargs)
207 except (TypeError, ValueError):
TypeError: _add_dispatch() missing 1 required positional argument: 'y'
The above exception was the direct cause of the following exception:
NotImplementedError Traceback (most recent call last)
<ipython-input-3-fe10d8214bfb> in <module>()
1 base_model = mobilenet_v3.MobileNetV3Large(include_top=False, weights=None)
----> 2 base_model.compute_output_shape(input_shape=[224, 224, 3])
/usr/local/lib/python3.7/dist-packages/keras/engine/functional.py in compute_output_shape(self, input_shape)
468 layer_input_shapes = tf_utils.convert_shapes(
469 layer_input_shapes, to_tuples=True)
--> 470 layer_output_shapes = layer.compute_output_shape(layer_input_shapes)
471 # Convert back to TensorShapes.
472 layer_output_shapes = tf_utils.convert_shapes(
/usr/local/lib/python3.7/dist-packages/keras/engine/base_layer.py in compute_output_shape(self, input_shape)
787 'layer\'s output. Please implement the '
788 '`compute_output_shape` method on your layer (%s).' %
--> 789 self.__class__.__name__) from e
790 return tf.nest.map_structure(lambda t: t.shape, outputs)
791 raise NotImplementedError(
NotImplementedError: We could not automatically infer the static shape of the layer's output. Please implement the `compute_output_shape` method on your layer (TFOpLambda).
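Until compute_output_shape handles these TFOpLambda layers, a possible workaround, assuming you only need concrete shapes for a fixed input size, is to run one dummy batch through the model and read the shape of the result. This costs a single forward pass; weights=None avoids downloading pretrained weights.

```python
import tensorflow as tf

base_model = tf.keras.applications.MobileNetV3Large(
    input_shape=(224, 224, 3), include_top=False, weights=None
)
# One forward pass on zeros yields concrete shapes without compute_output_shape.
dummy = tf.zeros((1, 224, 224, 3))
features = base_model(dummy, training=False)
print(features.shape)
```

The same trick works for intermediate layers by building a tf.keras.Model from base_model.input to the layer outputs of interest and calling it on the dummy batch.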
System information.
Describe the problem.
Keras model does not converge after saving and loading.
Describe the current behavior.
After calling model.save(...) and model = tf.keras.models.load_model(...), the model failed to converge.
Describe the expected behavior.
Adding model.save(...) and model = tf.keras.models.load_model(...) should not affect the training process.
Standalone code to reproduce the issue.
import tensorflow as tf
mnist = tf.keras.datasets.mnist
(x_train, y_train),(x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0
model = tf.keras.models.Sequential([
tf.keras.layers.Flatten(input_shape=(28, 28)),
tf.keras.layers.Dense(128, activation='relu'),
tf.keras.layers.Dropout(0.2),
tf.keras.layers.Dense(10, activation='softmax')
])
model.compile(optimizer='adam',
loss='sparse_categorical_crossentropy',
metrics=['accuracy'])
########### added lines ###########
model.save("/tmp/mnist_model")
model = tf.keras.models.load_model('/tmp/mnist_model')
################################
model.fit(x_train, y_train, epochs=5)
model.evaluate(x_test, y_test)
Source code / logs.
After adding saving and loading, the model does not converge:
2022-02-08 09:03:19.403489: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-02-08 09:03:19.698194: W tensorflow/python/util/util.cc:368] Sets are not currently considered sequences, but this may change in the future, so consider avoiding using them.
Epoch 1/5
1875/1875 [==============================] - 6s 3ms/step - loss: 0.2983 - accuracy: 0.1004
Epoch 2/5
1875/1875 [==============================] - 6s 3ms/step - loss: 0.1451 - accuracy: 0.0992
Epoch 3/5
1875/1875 [==============================] - 6s 3ms/step - loss: 0.1069 - accuracy: 0.0990
Epoch 4/5
1875/1875 [==============================] - 6s 3ms/step - loss: 0.0881 - accuracy: 0.0991
Epoch 5/5
1875/1875 [==============================] - 6s 3ms/step - loss: 0.0754 - accuracy: 0.0989
313/313 [==============================] - 1s 2ms/step - loss: 0.0775 - accuracy: 0.0992
After removing the saving and loading code, the model converges as expected:
2022-02-08 09:05:18.683461: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
Epoch 1/5
1875/1875 [==============================] - 6s 3ms/step - loss: 0.3010 - accuracy: 0.9133
Epoch 2/5
1875/1875 [==============================] - 6s 3ms/step - loss: 0.1452 - accuracy: 0.9574
Epoch 3/5
1875/1875 [==============================] - 6s 3ms/step - loss: 0.1099 - accuracy: 0.9666
Epoch 4/5
1875/1875 [==============================] - 6s 3ms/step - loss: 0.0878 - accuracy: 0.9729
Epoch 5/5
1875/1875 [==============================] - 6s 3ms/step - loss: 0.0746 - accuracy: 0.9766
313/313 [==============================] - 1s 2ms/step - loss: 0.0744 - accuracy: 0.9770
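In the two logs above the training losses are nearly identical (0.0754 vs 0.0746 after five epochs), which suggests the weights do train after the round trip and that mainly the reported accuracy goes wrong. A hedged mitigation, assuming the culprit is how the plain "accuracy" string is resolved after reloading, is to name the metric class explicitly and recompile the reloaded model. The sketch uses random MNIST-shaped data instead of downloading MNIST, and compile_model is a hypothetical helper; it checks the logged metric against accuracy computed by hand.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf


def compile_model(model):
    # Explicit loss/metric objects instead of the ambiguous string "accuracy".
    model.compile(
        optimizer="adam",
        loss=tf.keras.losses.SparseCategoricalCrossentropy(),
        metrics=[tf.keras.metrics.SparseCategoricalAccuracy(name="acc")],
    )
    return model


model = compile_model(tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10, activation="softmax"),
]))

path = os.path.join(tempfile.mkdtemp(), "mnist_model.keras")
model.save(path)
# Load without the saved compile state, then recompile with explicit objects.
model = compile_model(tf.keras.models.load_model(path, compile=False))

# Random stand-in data with MNIST shapes.
rng = np.random.default_rng(0)
x = rng.random((256, 28, 28), dtype=np.float32)
y = rng.integers(0, 10, size=256)
model.fit(x, y, epochs=1, batch_size=32, verbose=0)

reported = model.evaluate(x, y, verbose=0, return_dict=True)["acc"]
manual = float(np.mean(np.argmax(model.predict(x, verbose=0), axis=-1) == y))
print(reported, manual)
```

Recompiling resets the optimizer state, which costs nothing here because the model was saved before any training.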
I want to check the mse loss function in Keras by computing it myself. However, the two values differ. The definition of MSE is here: https://en.wikipedia.org/wiki/Mean_squared_error
The test code is below:
from keras.datasets import boston_housing
import numpy as np
(train_data, train_targets), (test_data, test_targets) = boston_housing.load_data()
x_train = train_data.astype(np.float32)
from keras import models
from keras import layers
model = models.Sequential()
model.add(layers.Dense(64, activation='relu', input_shape=(13,)))
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(1))
model.compile(optimizer='rmsprop',loss='mse', metrics=['mae'])
y_train = train_targets.astype(np.float32)
# y_test = test_targets.astype(np.float32)
model.fit(x_train,y_train,epochs=1,batch_size=404)
print(np.mean((y_train - model.predict(x_train).ravel()) ** 2))
Keras reports a loss of around 816. However, computing MSE from its definition gives around 704. Why are the results different?
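One likely factor: with batch_size=404 (the whole training set), the single epoch is a single training step. The loss that fit() logs is computed in the forward pass of that step, before the optimizer applies its update, whereas model.predict() runs afterwards with the updated weights. A NumPy sketch of the same effect on a toy linear model; plain gradient descent stands in for RMSprop, and the data and learning rate are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(404, 13))
true_w = rng.normal(size=13)
y = x @ true_w + rng.normal(scale=0.1, size=404)


def mse(w):
    return float(np.mean((y - x @ w) ** 2))


w = np.zeros(13)
learning_rate = 0.1

# "Logged" loss: computed in the forward pass of the single full-batch step.
logged_loss = mse(w)
grad = -2.0 * x.T @ (y - x @ w) / len(y)  # gradient of the MSE w.r.t. w
w = w - learning_rate * grad               # the update happens after the forward pass

# What the manual check measures: the loss of the updated weights.
post_update_loss = mse(w)
print(logged_loss, post_update_loss)
```

Evaluating with model.evaluate(x_train, y_train) right after fit() should match the manual computation, since both use the post-update weights.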
System information
Describe the feature and the current behavior/state.
Currently, TensorFlow throws an error if we pass temporal sample_weights to a model that is fitting inputs/outputs in the form of ragged tensors. Example:
#input in general has shape (N_inputs, variable length, N_input_channels)
X = [[[4.,3,2],[2,1,3],[-1,2,1]],
[[1,2,3],[3,2,4]]]
X = tf.ragged.constant(X, ragged_rank=1, dtype=tf.float64)
#output in general has shape (N_inputs, variable but same as corresponding input, N_classification_classes)
Y = [[[0,0,1],[0,1,0],[1,0,0]],
[[0,0,1],[1,0,0]]]
Y = tf.ragged.constant(Y, ragged_rank=1)
#Documentation says for temporal data we can pass 2D array with shape (samples, sequence_length)
weights = [[100,1,1],
[100,1]]
weights = np.array(weights)
model = SimpleModel(width=16, in_features=3, out_features=3)
model.compile(loss='categorical_crossentropy', optimizer='adam')
model.fit(X,Y) #works fine
model.fit(X,Y, sample_weight=weights) #throws error
The error thrown is ValueError: Failed to convert a NumPy array to a Tensor (Unsupported object type list). If we do the equivalent operation with non-ragged tensors:
#input in general has shape (N_inputs, 2, N_input_channels)
X = [[[4.,3,2],[2,1,3]],
[[1,2,3],[3,2,4]]]
X = tf.constant(X, dtype=tf.float64)
#output in general has shape (N_inputs, 2, N_classification_classes)
Y = [[[0,0,1],[0,1,0]],
[[0,0,1],[1,0,0]]]
Y = tf.constant(Y)
#Documentation says for temporal data we can pass 2D array with shape (samples, sequence_length)
weights = [[100,1],
[100,1]]
weights=np.array(weights)
model = SimpleModel(width=16, in_features=3, out_features=3)
model.compile(loss='categorical_crossentropy', optimizer='adam')
model.fit(X,Y) #works fine
model.fit(X,Y, sample_weight=weights) #also works fine
everything works fine. The desired feature would allow passing sample_weights for ragged tensors in the same way as for non-ragged tensors.
Will this change the current api? How?
This would change the tf.keras.Model.fit API so that ragged sample_weights are supported.
Who will benefit with this feature?
People working with variable-length data. This occurs in areas like computer vision and applications of deep learning to particle physics. This feature would allow people working with ragged tensors to deal with underrepresented classes in temporal data via reweighting.
Any Other info.
Definition of SimpleLayer and SimpleModel used above
class SimpleLayer(tf.keras.layers.Layer):
"""Just dummy layer to illustrate sample_weight for layer"""
def __init__(self, in_features, out_features, n):
super(SimpleLayer, self).__init__()
self.out_features = out_features
self.in_features = in_features
self.Gamma = self.add_weight(name='Gamma'+str(n),
shape=(in_features, out_features),
initializer='glorot_normal', trainable=True)
def call(self, inputs):
#uses ragged map_flat_values for Ragged tensors to handle
#variable number of jet
xG = tf.ragged.map_flat_values(tf.matmul, inputs, self.Gamma)
return xG
class SimpleModel(tf.keras.Model):
"""Composes SimpleLayer above to create simple network for ragged tensors"""
def __init__(self, width, in_features, out_features, Sigma=tf.nn.leaky_relu):
super(SimpleModel, self).__init__()
self.in_features = in_features
self.out_features = out_features
self.width = width
self.first_layer = SimpleLayer(self.in_features, self.width, 0)
self.hidden = SimpleLayer(self.width, self.width, 1)
self.last_layer = SimpleLayer(self.width, self.out_features, 2)
self.Sigma = Sigma
def call(self, inputs):
#use map_flat_values to apply activation to ragged tensor
x = tf.ragged.map_flat_values(self.Sigma, self.first_layer(inputs))
x = tf.ragged.map_flat_values(self.Sigma, self.hidden(x))
x = tf.ragged.map_flat_values(tf.nn.softmax, self.last_layer(x))
return x
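Until ragged sample_weight is supported, one possible workaround (a sketch, not a Keras feature) is to pad everything to dense tensors with .to_tensor() and give the padded time steps weight 0, so they add nothing to the weighted loss. It assumes the ragged_rank=1 layout above and swaps SimpleModel for a plain Dense stack only to keep the example short. With the default sum-over-batch-size reduction, padded steps may still count in the denominator, so the loss scale can differ slightly from a truly ragged computation.

```python
import numpy as np
import tensorflow as tf

X = tf.ragged.constant([[[4., 3, 2], [2, 1, 3], [-1, 2, 1]],
                        [[1, 2, 3], [3, 2, 4]]], ragged_rank=1, dtype=tf.float32)
Y = tf.ragged.constant([[[0, 0, 1], [0, 1, 0], [1, 0, 0]],
                        [[0, 0, 1], [1, 0, 0]]], ragged_rank=1, dtype=tf.float32)
weights = tf.ragged.constant([[100., 1, 1], [100., 1]])

# Pad to (batch, max_len, ...); padded steps get zeros in X/Y and weight 0.
X_dense = X.to_tensor()
Y_dense = Y.to_tensor()
w_dense = weights.to_tensor(default_value=0.0)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(None, 3)),
    tf.keras.layers.Dense(16, activation=tf.nn.leaky_relu),
    tf.keras.layers.Dense(3, activation="softmax"),
])
model.compile(loss="categorical_crossentropy", optimizer="adam")
history = model.fit(X_dense, Y_dense, sample_weight=w_dense, epochs=2, verbose=0)
```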
System information.
TensorFlow version: 2.8.0-rc1
Describe the feature and the current behavior/state.
https://www.tensorflow.org/api_docs/python/tf/keras/callbacks/TensorBoard says:
histogram_freq: frequency (in epochs) at which to compute activation and weight histograms for the layers of the model. If set to 0, histograms won't be computed. Validation data (or split) must be specified for histogram visualizations.
But I am fairly certain that activation histograms are not written; see also tensorflow/tensorflow#39755 and tensorflow/tensorflow#42027.
Solutions:
Will this change the current api? How?
1 will not change the API; 2 will change the docs.
Who will benefit from this feature?
Everyone
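As a stop-gap, activation histograms can be written by hand. The sketch below is a hypothetical custom callback (ActivationHistograms is not part of Keras): it runs a fixed sample batch through a probe model that exposes every layer's output and logs the results with tf.summary.histogram. It assumes a Sequential or functional model built with an explicit Input.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf


class ActivationHistograms(tf.keras.callbacks.Callback):
    """Hypothetical callback: log per-layer activation histograms every `freq` epochs."""

    def __init__(self, log_dir, sample_batch, freq=1):
        super().__init__()
        self.sample_batch = sample_batch
        self.freq = freq
        self.writer = tf.summary.create_file_writer(log_dir)
        self.probe = None

    def on_train_begin(self, logs=None):
        # Probe model sharing the trained layers and returning every layer's output.
        self.probe = tf.keras.Model(self.model.inputs,
                                    [layer.output for layer in self.model.layers])

    def on_epoch_end(self, epoch, logs=None):
        if epoch % self.freq != 0:
            return
        activations = self.probe(self.sample_batch, training=False)
        with self.writer.as_default():
            for layer, act in zip(self.model.layers, activations):
                tf.summary.histogram(f"{layer.name}/activations", act, step=epoch)
        self.writer.flush()


log_dir = tempfile.mkdtemp()
x = np.random.default_rng(0).random((32, 4)).astype("float32")
y = x.sum(axis=1, keepdims=True)
model = tf.keras.Sequential([tf.keras.Input(shape=(4,)),
                             tf.keras.layers.Dense(8, activation="relu"),
                             tf.keras.layers.Dense(1)])
model.compile(optimizer="adam", loss="mse")
model.fit(x, y, epochs=2, verbose=0, callbacks=[ActivationHistograms(log_dir, x[:8])])
```

Using a fixed sample batch keeps the histograms comparable across epochs; pointing log_dir at the TensorBoard callback's directory should show them next to the weight histograms.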
System information.
Describe the problem.
According to https://www.tensorflow.org/api_docs/python/tf/keras/callbacks/TensorBoard, the value update_freq=N should collect logs every N batches. However, no logs are generated after batches, only after each epoch.
Describe the current behavior.
No logs are generated after N training batches.
Describe the expected behavior.
Logs should be generated after every N training batches.
Details
The feature for collecting batch_* summaries was removed in keras-team/keras@7d06227; see how write_scalar_summaries was removed from keras/engine/training.py.
It should be enough to just revert the given commit.
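Independently of the TensorBoard callback, a custom callback can write whatever appears in the training logs every N batches with tf.summary.scalar. BatchScalarLogger and the batch_ prefix are illustrative choices for this sketch, not the removed Keras implementation.

```python
import tempfile

import numpy as np
import tensorflow as tf


class BatchScalarLogger(tf.keras.callbacks.Callback):
    """Hypothetical callback: write training logs every `every_n_batches` batches."""

    def __init__(self, log_dir, every_n_batches):
        super().__init__()
        self.writer = tf.summary.create_file_writer(log_dir)
        self.every_n_batches = every_n_batches
        self.global_step = 0       # counts batches across epochs
        self.written_steps = []    # steps at which summaries were written

    def on_train_batch_end(self, batch, logs=None):
        self.global_step += 1
        if self.global_step % self.every_n_batches != 0:
            return
        with self.writer.as_default():
            for name, value in (logs or {}).items():
                tf.summary.scalar(f"batch_{name}", value, step=self.global_step)
        self.written_steps.append(self.global_step)

    def on_train_end(self, logs=None):
        self.writer.flush()


rng = np.random.default_rng(0)
x = rng.random((64, 4)).astype("float32")
y = x.sum(axis=1, keepdims=True)
model = tf.keras.Sequential([tf.keras.Input(shape=(4,)), tf.keras.layers.Dense(1)])
model.compile(optimizer="sgd", loss="mse")
logger = BatchScalarLogger(tempfile.mkdtemp(), every_n_batches=2)
# 64 samples / batch_size 8 = 8 batches, so summaries are written at steps 2, 4, 6, 8.
model.fit(x, y, batch_size=8, epochs=1, verbose=0, callbacks=[logger])
```

Writing on a global step rather than the per-epoch batch index keeps the TensorBoard x-axis monotonic across epochs.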
Hello,
When I create a Keras model, the layer names differ from the names I gave the TensorFlow operators. Moreover, naming in TensorFlow itself does not seem to work as expected.
I have example code to reproduce the issue. I installed TensorFlow 2.7 and Keras 2.7 on a Windows 10 machine (version 21H1, build 19043.1348). I expected the operators/layers to be named "Space2Depth", "Multiplication", and "Depth2Space", but that is not the case.
Can you have a look at this issue? I also opened an issue in the TensorFlow GitHub repository: tensorflow/tensorflow#53045
Thank you very much
import tensorflow as tf
def sample_network(input_layer):
s2d = tf.nn.space_to_depth(input_layer, block_size=2, name="Space2Depth")
mul = tf.multiply(s2d, 10.0, name="Multiplication")
d2s = tf.nn.depth_to_space(mul, block_size=2, name="Depth2Space")
print(s2d.name, d2s.name, mul.name)
return d2s
if __name__ == "__main__":
input_net = tf.keras.Input(shape=(64, 64, 3), dtype=tf.float32, name="inputLayer")
output = sample_network(input_net)
model = tf.keras.Model(inputs=input_net, outputs=output)
for layer in model.layers:
print("keras:", layer.name)
It prints out:
tf.nn.space_to_depth/SpaceToDepth:0 tf.nn.depth_to_space/DepthToSpace:0 tf.math.multiply/Mul:0
keras: inputLayer
keras: tf.nn.space_to_depth
keras: tf.math.multiply
keras: tf.nn.depth_to_space
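One workaround sketch, assuming you want the names to appear as Keras layer names: wrap each op in a tf.keras.layers.Lambda layer and pass name= to the layer instead of to the TF op. As the output above shows, the op's own name= argument does not end up in the Keras layer name.

```python
import tensorflow as tf


def sample_network(input_layer):
    # Name the Keras layers; the wrapped TF ops no longer need their own names.
    s2d = tf.keras.layers.Lambda(
        lambda t: tf.nn.space_to_depth(t, block_size=2), name="Space2Depth")(input_layer)
    mul = tf.keras.layers.Lambda(lambda t: t * 10.0, name="Multiplication")(s2d)
    d2s = tf.keras.layers.Lambda(
        lambda t: tf.nn.depth_to_space(t, block_size=2), name="Depth2Space")(mul)
    return d2s


input_net = tf.keras.Input(shape=(64, 64, 3), dtype=tf.float32, name="inputLayer")
model = tf.keras.Model(inputs=input_net, outputs=sample_network(input_net))
layer_names = [layer.name for layer in model.layers]
print(layer_names)
```

Lambda layers that capture Python lambdas can be harder to serialize than built-in layers, so a small custom Layer subclass may be preferable if the model has to be saved and reloaded.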
System information.
Describe the problem.
When I create a simple model with a dummy Concatenate layer (i.e., the concatenation receives a single element), I can save it successfully, but loading it afterwards fails.
Describe the current behavior.
Loading a trained model fails.
Describe the expected behavior.
The model loading should finish without errors.
Standalone code to reproduce the issue.
import tensorflow as tf
if __name__ == "__main__":
input_layer = tf.keras.Input(shape=[100])
dense_layer = tf.keras.layers.Dense(1)(input_layer)
concatenate_layer = tf.keras.layers.Concatenate()([dense_layer])
model = tf.keras.Model([input_layer], [concatenate_layer])
model.compile(optimizer="adam", loss="mean_absolute_error")
model.save("model.h5")
loaded_model = tf.keras.models.load_model("model.h5")
Source code / logs.
Full traceback:
Traceback (most recent call last):
File "/Users/stefan/workspace/tierra/bug.py", line 10, in <module>
loaded_model = tf.keras.models.load_model("model.h5")
File "/Users/stefan/workspace/tierra/.env/lib/python3.9/site-packages/keras/saving/save.py", line 200, in load_model
return hdf5_format.load_model_from_hdf5(filepath, custom_objects,
File "/Users/stefan/workspace/tierra/.env/lib/python3.9/site-packages/keras/saving/hdf5_format.py", line 180, in load_model_from_hdf5
model = model_config_lib.model_from_config(model_config,
File "/Users/stefan/workspace/tierra/.env/lib/python3.9/site-packages/keras/saving/model_config.py", line 52, in model_from_config
return deserialize(config, custom_objects=custom_objects)
File "/Users/stefan/workspace/tierra/.env/lib/python3.9/site-packages/keras/layers/serialization.py", line 208, in deserialize
return generic_utils.deserialize_keras_object(
File "/Users/stefan/workspace/tierra/.env/lib/python3.9/site-packages/keras/utils/generic_utils.py", line 674, in deserialize_keras_object
deserialized_obj = cls.from_config(
File "/Users/stefan/workspace/tierra/.env/lib/python3.9/site-packages/keras/engine/functional.py", line 662, in from_config
input_tensors, output_tensors, created_layers = reconstruct_from_config(
File "/Users/stefan/workspace/tierra/.env/lib/python3.9/site-packages/keras/engine/functional.py", line 1283, in reconstruct_from_config
process_node(layer, node_data)
File "/Users/stefan/workspace/tierra/.env/lib/python3.9/site-packages/keras/engine/functional.py", line 1231, in process_node
output_tensors = layer(input_tensors, **kwargs)
File "/Users/stefan/workspace/tierra/.env/lib/python3.9/site-packages/keras/engine/base_layer.py", line 976, in __call__
return self._functional_construction_call(inputs, args, kwargs,
File "/Users/stefan/workspace/tierra/.env/lib/python3.9/site-packages/keras/engine/base_layer.py", line 1114, in _functional_construction_call
outputs = self._keras_tensor_symbolic_call(
File "/Users/stefan/workspace/tierra/.env/lib/python3.9/site-packages/keras/engine/base_layer.py", line 848, in _keras_tensor_symbolic_call
return self._infer_output_signature(inputs, args, kwargs, input_masks)
File "/Users/stefan/workspace/tierra/.env/lib/python3.9/site-packages/keras/engine/base_layer.py", line 886, in _infer_output_signature
self._maybe_build(inputs)
File "/Users/stefan/workspace/tierra/.env/lib/python3.9/site-packages/keras/engine/base_layer.py", line 2659, in _maybe_build
self.build(input_shapes) # pylint:disable=not-callable
File "/Users/stefan/workspace/tierra/.env/lib/python3.9/site-packages/keras/utils/tf_utils.py", line 259, in wrapper
output_shape = fn(instance, input_shape)
File "/Users/stefan/workspace/tierra/.env/lib/python3.9/site-packages/keras/layers/merge.py", line 489, in build
raise ValueError('A `Concatenate` layer should be called '
ValueError: A `Concatenate` layer should be called on a list of at least 1 input.
System information
Description of the problem
To solve a binary classification problem, I have a Keras model that processes categorical input (as well as numeric input).
I need to save (model.save) and load (tf.keras.models.load_model) the model multiple times, training it in between.
I expect the model to consume constant disk space and constant RAM every time I load it, since the architecture does not change (only the parameter values do).
This does not happen when the model contains an IntegerLookup layer followed by a CategoryEncoding layer.
The issue can be reproduced without training the model at all.
Here is a minimal code example that creates a model and saves it to disk:
import tensorflow as tf
import numpy as np
input_layer = tf.keras.Input(shape=(1,), dtype="int32")
index = tf.keras.layers.IntegerLookup(max_values=2)
index.adapt(np.array(range(2)))
encoder = tf.keras.layers.CategoryEncoding(max_tokens=index.vocab_size())
encoded_layer = encoder(index(input_layer))
output_layer = tf.keras.layers.Dense(2, activation=tf.keras.activations.softmax)(encoded_layer)
model = tf.keras.Model(input_layer, output_layer)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.save("model")
Every time I execute
model = tf.keras.models.load_model("model")
model.save("model")
the space that the model consumes on disk increases by approx. 8 kB.
Even worse: when I load the model, RAM usage increases by approx. 9 MB in each iteration.
So after 100 iterations, the model needs approx. 1 MB on disk and 950 MB of RAM (which is problematic).
This also happens if I start a new Python process in each iteration.
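To quantify the growth per round trip, a small helper can record the on-disk size of the SavedModel directory after each load/save cycle. This is a minimal sketch: dir_size_bytes and track_growth are hypothetical helper names, and the cycle passed in is whatever round trip you want to measure (for example, the load_model/save pair shown above).

```python
import os


def dir_size_bytes(path):
    """Return the total size in bytes of all files under `path`.

    A SavedModel is a directory, so every file inside it is counted.
    """
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


def track_growth(path, cycle, iterations):
    """Call `cycle()` `iterations` times; return the size of `path` after each call."""
    sizes = []
    for _ in range(iterations):
        cycle()
        sizes.append(dir_size_bytes(path))
    return sizes


# Example (assumes TensorFlow and the "model" directory saved above):
#
# def load_and_save():
#     model = tf.keras.models.load_model("model")
#     model.save("model")
#
# sizes = track_growth("model", load_and_save, iterations=10)
# print([b - a for a, b in zip(sizes, sizes[1:])])  # per-cycle growth in bytes
```

If the architecture is unchanged, the per-cycle differences should stay close to zero; the report observes roughly 8 kB per cycle instead.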
In my application, the memory consumption grows even faster because the model has several input layers and also several inner layers.
This makes the model unusable after some iterations because I cannot load it anymore.
Additionally, the load and save cycles get slower with each repetition.
So far, I have reproduced this issue with TensorFlow 2.6 and 2.7 on Python 3.7, 3.8, and 3.9. The behavior is identical.
I originally posted this problem as a tensorflow issue.
System information.
Describe the problem.
In my real use case, I train the model on a tf.data.Dataset instance (built with tensorflow_datasets).
The main difference from the standard keras.Model.fit + Dataset examples is that the dataset length is unknown and variable.
In my case, the length varies by about ±20% because I apply random augmentations and filter some of them out. See the linked Colab for what I mean.
As a result, when the first epoch finishes (the dataset raises OutOfRangeError), Keras remembers the current step count, and if the same dataset is shorter in a later epoch, training stops altogether.
Describe the current behavior.
The model stops training if the dataset iterator in the second (third, etc.) epoch is shorter than in the first one.
Describe the expected behavior.
The model should not stop training. Keras may print a warning, but it should keep training.
Standalone code to reproduce the issue.
https://colab.research.google.com/drive/1fY4v9WBRxfsywDyKKidu-lmFpaPdAn9D?usp=sharing
Source code / logs.
model.fit(dataset, epochs=15)
# Epoch 1/15
# 819/819 [==============================] - 2s 1ms/step - loss: 1.3987
# Epoch 2/15
# 819/819 [==============================] - 1s 1ms/step - loss: 1.0563
# Epoch 3/15
# 819/819 [==============================] - 1s 1ms/step - loss: 1.0262
# Epoch 4/15
# 819/819 [==============================] - 1s 1ms/step - loss: 1.0156
# Epoch 5/15
# 782/819 [===========================>..] - ETA: 0s - loss: 1.0146WARNING:tensorflow:Your input ran out of data; interrupting training. Make sure that your dataset or generator can generate at least `steps_per_epoch * epochs` batches (in this case, 12285 batches). You may need to use the repeat() function when building your dataset.
# 819/819 [==============================] - 1s 1ms/step - loss: 1.0161
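A commonly used workaround (it sidesteps the behavior rather than fixing it) is to make the dataset infinite with repeat() and set the epoch length explicitly with steps_per_epoch, so Keras never relies on a step count inferred from the first pass. The sketch below uses a hypothetical make_dataset() that mimics the random filtering on toy data with a toy model; 20 steps per epoch is an arbitrary choice and should roughly match the expected number of batches.

```python
import tensorflow as tf


def make_dataset():
    # Hypothetical stand-in for the augmented pipeline: random filtering keeps
    # roughly 80% of the 1000 samples, so the number of batches varies per pass.
    ds = tf.data.Dataset.range(1000)
    ds = ds.filter(lambda i: tf.random.uniform([]) > 0.2)
    ds = ds.map(lambda i: (tf.reshape(tf.cast(i, tf.float32) / 1000.0, [1]),
                           tf.reshape(tf.cast(i % 2, tf.float32), [1])))
    return ds.batch(32)


model = tf.keras.Sequential([tf.keras.Input(shape=(1,)), tf.keras.layers.Dense(1)])
model.compile(optimizer="sgd", loss="mse")

# repeat() makes the stream endless; steps_per_epoch then defines an "epoch"
# as a fixed number of batches, so a shorter pass can no longer end training.
history = model.fit(make_dataset().repeat(), steps_per_epoch=20, epochs=3, verbose=0)
```

The trade-off is that an "epoch" no longer corresponds exactly to one pass over the filtered data.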
System information.
Describe the problem.
Note that I previously reported this issue here for TF 2.0. Back then, the TensorFlow team suggested a solution that worked under 2.0 but no longer works.
Here is the problem: using the functional API, one can build an intermediate model starting and ending at any of the original model's layers. However, this does not work when layers are encapsulated in an inner model (let's say, some tf.keras.Sequential). The graph will differ due to the additional Input layer, but the computations should be the same. Yet when trying to build an intermediate model of a nested model up to an inner layer, a "Graph disconnected" error is thrown (see below). Previously, one could circumvent this by building to final_model.get_layer("inner_model").get_layer("id_1").get_output_at(1) instead of final_model.get_layer("inner_model").get_layer("id_1").output (see the full example below).
Standalone code to reproduce the issue.
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'  # must be set before TensorFlow is imported to take effect
import tensorflow as tf
# NOT NESTED
inp = tf.keras.Input((4,))
y = tf.keras.layers.Dense(4, name="od_1")(inp)
y = tf.keras.layers.Dense(2, name="od_2")(y)
y = tf.keras.layers.Dense(4, name="id_1")(y)
y = tf.keras.layers.Dense(10, name="od_3")(y)
y = tf.keras.layers.Dense(10, name="od_4")(y)
final_model = tf.keras.Model(inputs=[inp], outputs=[y])
final_model.summary()
sub_model = tf.keras.Model(inputs=[final_model.input], outputs=[final_model.get_layer("id_1").output])
sub_model.summary()
# NESTED
inp_1 = tf.keras.Input(shape=(2,))
x = tf.keras.layers.Dense(4, name="id_1")(inp_1)
inner_model = tf.keras.Model(inputs=[inp_1], outputs=[x], name="inner_model")
inp_outer = tf.keras.Input((4,))
y = tf.keras.layers.Dense(4, name="od_1")(inp_outer)
y = tf.keras.layers.Dense(2, name="od_2")(y)
y = inner_model(y)
y = tf.keras.layers.Dense(10, name="od_3")(y)
y = tf.keras.layers.Dense(10, name="od_4")(y)
final_model = tf.keras.Model(inputs=[inp_outer], outputs=[y])
final_model.summary()
sub_model = tf.keras.Model(inputs=[final_model.input], outputs=[final_model.get_layer("inner_model").get_layer("id_1").output])
previously_working_sub_model = tf.keras.Model(
inputs=[final_model.input],
outputs=[final_model.get_layer("inner_model").get_layer("id_1").get_output_at(1)])
The previously_working_sub_model definition throws ValueError: Asked to get output at node 1, but the layer has only 1 inbound nodes.
The sub_model definition, run on its own, throws ValueError: Graph disconnected: cannot obtain value for tensor KerasTensor(type_spec=TensorSpec(shape=(None, 2), dtype=tf.float32, name='input_2'), name='input_2', description="created by layer 'input_2'") at layer "id_1". The following previous layers were accessed without issue: []
Expected behavior.
To allow for accessing intermediate activations, it is crucial to be able to build intermediate models to (and preferably from) anywhere within the model.
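Until the nested case is supported directly, one workaround is to call the inner model's layers again on the outer-graph tensor that feeds the inner model. Called layers share their weights, so the resulting sub-model computes the same activations as final_model. This is a sketch that assumes the inner model is a simple chain of single-input layers; reapply_inner_layers is a hypothetical helper, not a Keras API.

```python
import numpy as np
import tensorflow as tf

# Nested model from the report.
inp_1 = tf.keras.Input(shape=(2,))
x = tf.keras.layers.Dense(4, name="id_1")(inp_1)
inner_model = tf.keras.Model(inputs=[inp_1], outputs=[x], name="inner_model")

inp_outer = tf.keras.Input((4,))
y = tf.keras.layers.Dense(4, name="od_1")(inp_outer)
y = tf.keras.layers.Dense(2, name="od_2")(y)
y = inner_model(y)
y = tf.keras.layers.Dense(10, name="od_3")(y)
y = tf.keras.layers.Dense(10, name="od_4")(y)
final_model = tf.keras.Model(inputs=[inp_outer], outputs=[y])


def reapply_inner_layers(outer_tensor, inner, layer_name):
    """Call `inner`'s layers in order on an outer-graph tensor, up to `layer_name`.

    Assumes `inner` is a linear chain. Layers are reused, so weights are shared.
    """
    x = outer_tensor
    for layer in inner.layers:
        if isinstance(layer, tf.keras.layers.InputLayer):
            continue  # the inner Input has no counterpart in the outer graph
        x = layer(x)
        if layer.name == layer_name:
            return x
    raise ValueError(f"No layer named {layer_name!r} in {inner.name!r}")


id_1_out = reapply_inner_layers(
    final_model.get_layer("od_2").output, final_model.get_layer("inner_model"), "id_1")
sub_model = tf.keras.Model(inputs=final_model.input, outputs=id_1_out)
```

Because id_1 is called a second time, it gains an extra inbound node; the original final_model is unaffected.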
System information.
TensorFlow version (you are using): 2.4.1 (also re-produced in 2.6)
Are you willing to contribute it (Yes/No) : No
Describe the feature and the current behavior/state.
When used with padding, Tokenizer.sequences_to_texts() converts padding tokens to oov_token when oov_token is not None. This does not happen when oov_token = None, in which case sequences_to_texts() skips padding integers as well as OOV integers.
This behaviour is perhaps expected, since the padding value is not part of the vocabulary.
However, I think it would make more sense if sequences_to_texts() took an optional padding_value argument and did not decode these integers back to oov_token.
To reproduce:
import tensorflow as tf
vocab_size = 5
seq_len = 5
text = "hello world test"
oov_token = "<OOV>"
tokenizer = tf.keras.preprocessing.text.Tokenizer(num_words = vocab_size, oov_token = oov_token)
tokenizer.fit_on_texts([text])
tokenized = tokenizer.texts_to_sequences([text])
padded = tf.keras.preprocessing.sequence.pad_sequences(tokenized, maxlen = seq_len, value = 0)
print('Non padded tokenization result:', tokenized)
print("Non padded de-tokenization result:", tokenizer.sequences_to_texts(tokenized))
print("\n")
print('Padded tokenization result:', padded)
print("Padded de-tokenization result:", tokenizer.sequences_to_texts(padded))
Non padded tokenization result: [[2, 3, 4]]
Non padded de-tokenization result: ['hello world test']
Padded tokenization result: [[0 0 2 3 4]]
Padded de-tokenization result: ['<OOV> <OOV> hello world test']
What it will de-tokenize to with this feature implemented:
Feature implemented padded de-tokenization result: ['hello world test']
Will this change the current api? How?
The tf.keras.preprocessing.text.Tokenizer.sequences_to_texts() function will take an optional padding_value argument, which is None by default.
Who will benefit from this feature?
Those who use tf.keras.preprocessing.text.Tokenizer to tokenize strings with padding and then de-tokenize padded sequences back to words.
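Until something like padding_value exists, the same result can be approximated on the caller's side by stripping the padding integers before decoding. The helpers below are hypothetical (they are not part of Tokenizer) and assume padding uses 0, which Tokenizer never assigns to a word because its word indices start at 1.

```python
def strip_padding(sequences, padding_value=0):
    """Drop every occurrence of `padding_value` from each sequence."""
    return [[int(token) for token in seq if int(token) != padding_value]
            for seq in sequences]


def sequences_to_texts_without_padding(tokenizer, sequences, padding_value=0):
    """Hypothetical helper approximating the proposed padding_value behavior."""
    return tokenizer.sequences_to_texts(strip_padding(sequences, padding_value))
```

With the tokenizer and padded array from the reproduction above, sequences_to_texts_without_padding(tokenizer, padded) decodes [[2, 3, 4]], the same input as the non-padded case, and therefore yields ['hello world test'].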
Hi there,
System information.
I observed the behavior both in a Colab notebook (TF v2.8.0-0-g3f878cff5b6 2.8.0) and in a custom Docker image (Ubuntu 20.04, TF v2.5.1-97-g957590ea15c 2.5.2). The issue can easily be reproduced by using validation sets of different sizes and I made an example notebook (see below).
Describe the problem.
The time/step reported when calling fit is not the time per training step. The wording can be misleading, for instance when designing a training scheme around time constraints.
Describe the current behavior.
Currently, the time for a full epoch (including validation) divided by the number of training steps is reported. If validation takes a significant amount of time, the actual time per training step can be much smaller than reported.
Describe the expected behavior.
I feel that reporting the time per training step (excluding validation) would be more informative.
For instance for the following output (from the Colab notebook linked below):
Epoch 1/3
5/5 [==============================] - 2s 613ms/step - loss: 0.3212 - val_loss: 0.1224
I would expect that using a single step instead of 5 would make each epoch take around 613 ms. However, each epoch would still take about 2 s, since most of that time is spent on the validation set.
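In the meantime, training-only step time can be measured with a custom callback that times each training batch. The sketch below (TrainStepTimer is a hypothetical name) records wall-clock time between on_train_batch_begin and on_train_batch_end; validation runs after the last training batch of an epoch, so it is excluded. Treat the numbers as approximate: the first step includes graph tracing, and with steps_per_execution > 1 each batch hook covers several steps.

```python
import time

import numpy as np
import tensorflow as tf


class TrainStepTimer(tf.keras.callbacks.Callback):
    """Records the mean wall-clock time of training batches for each epoch."""

    def __init__(self):
        super().__init__()
        self.mean_step_seconds = []

    def on_epoch_begin(self, epoch, logs=None):
        self._step_times = []

    def on_train_batch_begin(self, batch, logs=None):
        self._start = time.perf_counter()

    def on_train_batch_end(self, batch, logs=None):
        self._step_times.append(time.perf_counter() - self._start)

    def on_epoch_end(self, epoch, logs=None):
        # Validation has already run at this point but was never timed above.
        self.mean_step_seconds.append(float(np.mean(self._step_times)))
        print(f"epoch {epoch + 1}: {1000 * self.mean_step_seconds[-1]:.1f} ms/train step")
```

Usage: pass callbacks=[TrainStepTimer()] to model.fit alongside the validation data and compare its output with the progress bar's ms/step.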
Standalone code to reproduce the issue.
See this colab notebook:
https://colab.research.google.com/drive/1YGWstYcbFwkPY4ezZ2C-krDPahS36nnm?usp=sharing
Cheers.
System information.
Describe the problem.
When I run the Keras unit tests, one test fails: //keras/distribute:minimize_loss_test_gpu
Describe the current behavior.
......
//keras/optimizer_v2:rmsprop_test_gpu PASSED in 25.5s
//keras/tests:saver_test_gpu PASSED in 9.1s
//keras/utils:multi_gpu_utils_test_gpu PASSED in 16.3s
//keras/distribute:minimize_loss_test_gpu FAILED in 360.7s
/root/.cache/bazel/_bazel_root/0b555d6a82cf650cedde1ae5c5212680/execroot/org_keras/bazel-out/k8-opt/testlogs/keras/distribute/minimize_loss_test_gpu/test.log
Executed 72 out of 72 tests: 71 tests pass and 1 fails locally.
Describe the expected behavior.
All tests should pass. Am I missing something when launching the tests or setting up the environment?
Standalone code to reproduce the issue.
Run the following in a container from tensorflow/tensorflow:2.8.0-gpu
set -eux
pip3 uninstall -y keras
git clone -b r2.8 https://github.com/keras-team/keras.git
cd keras
sed -i "s/tf-nightly/#tf-nightly/g" requirements.txt
pip3 install -r requirements.txt
TF_TESTS_PER_GPU=1
N_BUILD_JOBS=$(grep -c ^processor /proc/cpuinfo)
N_TEST_JOBS=1
bazel test \
--jobs=${N_BUILD_JOBS} \
--local_test_jobs=${N_TEST_JOBS} \
--test_output=errors \
--test_sharding_strategy=disabled \
--test_timeout 300,450,1200,3600 \
--keep_going \
--define=use_fast_cpp_protos=false \
--build_tests_only \
--build_tag_filters=-no_oss \
--test_tag_filters=gpu,-no_oss,-oss_serial,-no_rocm,-no-gpu,-benchmark-test,-v1only \
keras/...
Source code / logs.
Part of the /root/.cache/bazel/_bazel_root/0b555d6a82cf650cedde1ae5c5212680/execroot/org_keras/bazel-out/k8-opt/testlogs/keras/distribute/minimize_loss_test_gpu/test.log.
[ RUN ] MinimizeLossStepTest.testRunStepsWithOutputContext_test_distribution_Mirrored2GPUsNoMergeCall_optimizerfn_AdagradV1_mode_graph_istpu_False
2022-02-09 05:30:19.457428: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /device:GPU:0 with 15401 MB memory: -> device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:86:00.0, compute capability: 6.0
2022-02-09 05:30:19.457729: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /device:GPU:1 with 15401 MB memory: -> device: 1, name: Tesla P100-PCIE-16GB, pci bus id: 0000:af:00.0, compute capability: 6.0
INFO:tensorflow:Using MirroredStrategy with devices ('/replica:0/task:0/device:GPU:0', '/replica:0/task:0/device:GPU:1')
I0209 05:30:19.461880 139974298908480 mirrored_strategy.py:374] Using MirroredStrategy with devices ('/replica:0/task:0/device:GPU:0', '/replica:0/task:0/device:GPU:1')
2022-02-09 05:30:19.557960: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 15401 MB memory: -> device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:86:00.0, compute capability: 6.0
2022-02-09 05:30:19.558183: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 15401 MB memory: -> device: 1, name: Tesla P100-PCIE-16GB, pci bus id: 0000:af:00.0, compute capability: 6.0
2022-02-09 05:30:19.582256: W tensorflow/core/grappler/utils/graph_view.cc:836] No registered 'MultiDeviceIteratorFromStringHandle' OpKernel for GPU devices compatible with node {{node MultiDeviceIteratorFromStringHandle}}
. Registered: device='CPU'
2022-02-09 05:30:19.583421: W tensorflow/core/grappler/utils/graph_view.cc:836] No registered 'MultiDeviceIteratorGetNextFromShard' OpKernel for GPU devices compatible with node {{node MultiDeviceIteratorGetNextFromShard}}
. Registered: device='CPU'
2022-02-09 05:30:19.595629: W tensorflow/core/grappler/utils/graph_view.cc:836] No registered 'MultiDeviceIteratorFromStringHandle' OpKernel for GPU devices compatible with node {{node MultiDeviceIteratorFromStringHandle}}
. Registered: device='CPU'
2022-02-09 05:30:19.596083: W tensorflow/core/grappler/utils/graph_view.cc:836] No registered 'MultiDeviceIteratorGetNextFromShard' OpKernel for GPU devices compatible with node {{node MultiDeviceIteratorGetNextFromShard}}
. Registered: device='CPU'
INFO:tensorflow:Collective all_reduce tensors: 2 all_reduces, num_devices = 2, group_size = 2, implementation = CommunicationImplementation.NCCL, num_packs = 1
I0209 05:30:19.736172 139974298908480 cross_device_ops.py:1152] Collective all_reduce tensors: 2 all_reduces, num_devices = 2, group_size = 2, implementation = CommunicationImplementation.NCCL, num_packs = 1
INFO:tensorflow:Collective all_reduce tensors: 1 all_reduces, num_devices = 2, group_size = 2, implementation = CommunicationImplementation.RING, num_packs = 1
I0209 05:30:19.804530 139974298908480 cross_device_ops.py:1152] Collective all_reduce tensors: 1 all_reduces, num_devices = 2, group_size = 2, implementation = CommunicationImplementation.RING, num_packs = 1
INFO:tensorflow:Collective all_reduce tensors: 1 all_reduces, num_devices = 2, group_size = 2, implementation = CommunicationImplementation.RING, num_packs = 1
I0209 05:30:19.831649 139974298908480 cross_device_ops.py:1152] Collective all_reduce tensors: 1 all_reduces, num_devices = 2, group_size = 2, implementation = CommunicationImplementation.RING, num_packs = 1
INFO:tensorflow:Collective all_reduce tensors: 1 all_reduces, num_devices = 2, group_size = 2, implementation = CommunicationImplementation.RING, num_packs = 1
I0209 05:30:19.884937 139974298908480 cross_device_ops.py:1152] Collective all_reduce tensors: 1 all_reduces, num_devices = 2, group_size = 2, implementation = CommunicationImplementation.RING, num_packs = 1
2022-02-09 05:30:20.068069: E tensorflow/core/common_runtime/base_collective_executor.cc:249] BaseCollectiveExecutor::StartAbort INTERNAL: NCCL: unhandled cuda error. Set NCCL_DEBUG=WARN for detail.
2022-02-09 05:30:20.068157: W tensorflow/core/nccl/nccl_manager.cc:858] NcclManager already aborted, ignoring subsequent StartAbort with CANCELLED: op cancelled
INFO:tensorflow:time(__main__.MinimizeLossStepTest.testRunStepsWithOutputContext_test_distribution_Mirrored2GPUsNoMergeCall_optimizerfn_AdagradV1_mode_graph_istpu_False): 0.62s
I0209 05:30:20.068961 139974298908480 test_util.py:2373] time(__main__.MinimizeLossStepTest.testRunStepsWithOutputContext_test_distribution_Mirrored2GPUsNoMergeCall_optimizerfn_AdagradV1_mode_graph_istpu_False): 0.62s
[ FAILED ] MinimizeLossStepTest.testRunStepsWithOutputContext_test_distribution_Mirrored2GPUsNoMergeCall_optimizerfn_AdagradV1_mode_graph_istpu_False
System information.
Describe the problem.
tf.keras.losses.binary_crossentropy behaves inconsistently when broadcasting is used.
Describe the current behavior.
The following example works:
import tensorflow as tf
y = tf.random.uniform((10, 1))
tf.keras.losses.binary_crossentropy(0.5, y)
Whereas this one fails:
tf.keras.losses.binary_crossentropy(0.5, y, from_logits=True)
The reason is that the latter internally calls tf.nn.sigmoid_cross_entropy_with_logits, which does not support broadcasting.
Describe the expected behavior.
I would expect seamless broadcasting in both cases.
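As a workaround, the target can be broadcast to the prediction's shape before calling the loss, so both code paths receive same-shaped tensors. This sketch only sidesteps the shape check; it does not change the library behavior described above.

```python
import numpy as np
import tensorflow as tf

y = tf.random.uniform((10, 1))

# Expand the scalar target to the prediction's shape explicitly.
y_true = tf.broadcast_to(tf.constant(0.5, dtype=y.dtype), tf.shape(y))

loss_probs = tf.keras.losses.binary_crossentropy(y_true, y)
loss_logits = tf.keras.losses.binary_crossentropy(y_true, y, from_logits=True)
```

Both calls return one loss value per row (shape (10,)), since the loss is averaged over the last axis.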
System information.
Describe the problem.
The random values produced by GlorotUniform differ between the V1 and V2 APIs when the same operation seed is used; as a consequence, it is not possible to reproduce in V2 the exact results obtained with V1.
Describe the current behavior.
Setting the operation seed returns different tensors from GlorotUniform depending on whether it is imported from tf.compat.v1 or from tensorflow.keras.initializers.
Describe the expected behavior.
Both tensors should be equal.
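A minimal comparison can be sketched as follows, assuming tf.compat.v1.glorot_uniform_initializer as the V1 entry point and an arbitrary seed of 42 (this is not the reporter's Colab). Both initializers sample from U(-limit, limit) with limit = sqrt(6 / (fan_in + fan_out)), so the values always fall in the same range; according to the report, the individual values nevertheless differ.

```python
import numpy as np
import tensorflow as tf

SEED = 42  # arbitrary operation-level seed
shape = (3, 3)

v1_values = np.asarray(tf.compat.v1.glorot_uniform_initializer(seed=SEED)(shape))
v2_values = np.asarray(tf.keras.initializers.GlorotUniform(seed=SEED)(shape))

# For a 3x3 kernel, limit = sqrt(6 / (3 + 3)) = 1.
print("identical:", np.array_equal(v1_values, v2_values))
```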
Standalone code to reproduce the issue.
Colab link
Source code / logs.
NA
P.S: Submitting bug here in the Keras repo due to comments in the TF repo, specifically: comment in issue 52294 in TF
Hello,
I created weights in build():
def build(self, input_shape):
super(PositionalEmbedding, self).build(input_shape)
print(input_shape)
self.position = self.add_weight(
name="position",
shape=(1, input_shape[1], input_shape[2], self.units),
initializer=TruncatedNormal(stddev=0.02),
trainable=True,
)
The input's shape: (64, 7, 25, 81) # (batches, timesteps, patches, features)
The input_shape received by build(): (None, None, 25, 81) # (batches, timesteps, patches, features)
ValueError: in user code:
File "/Users/martin/miniforge3/lib/python3.9/site-packages/keras/engine/training.py", line 1021, in train_function *
return step_function(self, iterator)
File "/Users/martin/miniforge3/lib/python3.9/site-packages/keras/engine/training.py", line 1010, in step_function **
outputs = model.distribute_strategy.run(run_step, args=(data,))
File "/Users/martin/miniforge3/lib/python3.9/site-packages/keras/engine/training.py", line 1000, in run_step **
outputs = model.train_step(data)
File "/var/folders/7v/fqqcktvs23qc8fwgftjpz_gh0000gn/T/ipykernel_6464/643090246.py", line 97, in train_step
y_pred, _ = self([inputs, targets_inputs], training=True)
File "/Users/martin/miniforge3/lib/python3.9/site-packages/keras/utils/traceback_utils.py", line 67, in error_handler
raise e.with_traceback(filtered_tb) from None
ValueError: Exception encountered when calling layer "transformer_111" (type Transformer).
in user code:
File "/var/folders/7v/fqqcktvs23qc8fwgftjpz_gh0000gn/T/ipykernel_6464/643090246.py", line 52, in call *
x_e = self.pos_embs_0(x_e, training=training)
File "/Users/martin/miniforge3/lib/python3.9/site-packages/keras/utils/traceback_utils.py", line 67, in error_handler **
raise e.with_traceback(filtered_tb) from None
File "/var/folders/7v/fqqcktvs23qc8fwgftjpz_gh0000gn/T/ipykernel_6464/3588400932.py", line 14, in build
self.position = self.add_weight(
ValueError: Can't convert Python sequence with mixed types to Tensor.
Call arguments received:
• inputs=['tf.Tensor(shape=(None, None, 25, 81), dtype=float32)']
• training=True
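The error arises because input_shape[1] (the number of timesteps) is None when build() is called, and add_weight needs every dimension of shape to be a concrete integer. One way around this is to size the weight for a chosen maximum number of timesteps and slice it to the actual length in call(). This is a sketch: max_timesteps is a hypothetical parameter, and since the original call() is not shown, it assumes the embedding is added to the input, which requires units to equal the feature dimension (81 here).

```python
import tensorflow as tf


class PositionalEmbedding(tf.keras.layers.Layer):
    """Sketch: a learned positional table sized by max_timesteps, not input_shape[1]."""

    def __init__(self, units, max_timesteps, **kwargs):
        super().__init__(**kwargs)
        self.units = units
        self.max_timesteps = max_timesteps  # hypothetical upper bound you choose

    def build(self, input_shape):
        # Every dimension passed to add_weight must be a concrete integer.
        self.position = self.add_weight(
            name="position",
            shape=(1, self.max_timesteps, int(input_shape[2]), self.units),
            initializer=tf.keras.initializers.TruncatedNormal(stddev=0.02),
            trainable=True,
        )
        super().build(input_shape)

    def call(self, inputs):
        # Slice the table to the actual number of timesteps at run time.
        timesteps = tf.shape(inputs)[1]
        return inputs + self.position[:, :timesteps]
```

Sequences longer than max_timesteps would not fit the table, so the bound should cover the longest input you expect.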
Describe the problem.
I have implemented a recurrent cell which is to be wrapped within a tf.keras.layers.RNN.
The cell has a state whose data type is not tf.float32 but tf.complex64. However, each time layer.reset_states() is invoked, the data type of the state is changed to tf.float32. As a result, a ValueError is thrown during the initial symbolic call. See the attached stack trace.
Describe the current behavior.
The program crashes at the construction of the RNN layer. See the attached stack trace.
I assume one reason for this issue is lines 933-934 of the function reset_states in class RNN in the file keras/layers/recurrent.py:
flat_states_variables = tf.nest.map_structure(
backend.variable, flat_init_state_values)
Here, the initial state values are stored in flat_init_state_values, and backend.variable is called on each of the states. However, no dtype argument is passed to backend.variable, so it defaults to tf.float32 for all states.
I would recommend the following patch, which solves the issue for me:
flat_states_variables = tf.nest.map_structure(
lambda var: backend.variable(var, var.dtype), flat_init_state_values)
I also tried running the example after replacing the files affected by the latest commit regarding mixed precision. Unfortunately, it did not solve the issue for me.
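The idea behind the proposed patch is to create each state variable with the dtype of its own initial value instead of the float32 default. A standalone sketch of that idea using tf.Variable directly (make_state_variables is a hypothetical helper, not the code in recurrent.py):

```python
import tensorflow as tf


def make_state_variables(initial_states):
    """Create one non-trainable variable per initial state, keeping its dtype.

    Passing dtype explicitly avoids a float32 default that cannot hold
    complex64 values.
    """
    flat_states = tf.nest.flatten(initial_states)
    flat_vars = [
        tf.Variable(state, dtype=tf.as_dtype(state.dtype), trainable=False)
        for state in flat_states
    ]
    return tf.nest.pack_sequence_as(initial_states, flat_vars)


# One complex and one real state, as a cell like RecurrentCell might produce.
states = [tf.zeros((32, 5), dtype=tf.complex64), tf.zeros((32, 3))]
state_vars = make_state_variables(states)
print([v.dtype for v in state_vars])
```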
Standalone code to reproduce the issue.
Currently, the example fails at the construction of the RNN layer.
import tensorflow as tf
class RecurrentCell(tf.keras.layers.Layer):
def __init__(self, state_size):
super(RecurrentCell, self).__init__()
self.state_size = state_size
def build(self, input_shape):
super(RecurrentCell, self).build(input_shape)
def get_initial_state(self, inputs=None, batch_size=None, dtype=None):
# explicit initialization with tf.complex64
return tf.zeros((self.state_size, ), dtype=tf.complex64)
@tf.function
def call(self, inputs, states):
# toy example
x = inputs
xfd = tf.signal.rfft(x)[..., :self.state_size]
yfd = tf.multiply(xfd, states)
return tf.signal.irfft(yfd), states
recCell = RecurrentCell(state_size=5)
inp = tf.keras.Input(shape=(None, 8),
batch_size=32)
out = tf.keras.layers.RNN(recCell, # crashes
return_sequences=True,
stateful=True,
return_state=False)(inp)
model = tf.keras.Model(inputs=[inp], outputs=[out])
y = model.predict(tf.random.normal((32, 16, 8)))
Source code / Logs
Stacktrace:
stacktrace.txt
System information.
pip install / google-colab: pre-installed (presumably also from binary)
Problem:
Models using a custom metric cannot be loaded after being saved to disk.
The Keras serialization guide seems to indicate that all custom layers/objects are saved to the TensorFlow SavedModel format (emphasis mine):
SavedModel is the more comprehensive save format that saves the model architecture, weights, and the traced Tensorflow subgraphs of the call functions. This enables Keras to restore both built-in layers as well as custom objects.
That seems to be correct for many custom objects, but I can't get it to work with custom metrics. If I try to load such a model without supplying the metric as a custom object, I get the following error:
ValueError: Unable to restore custom object of type _tf_keras_metric. Please make sure that any custom layers are included in the `custom_objects` arg when calling `load_model()` and make sure that all layers implement `get_config` and `from_config`.
Further down in the docs there's a brief statement about the SavedModel limitations, but it doesn't mention custom metrics.
Describe the current behavior.
Keras throws a ValueError when loading a model with custom metrics, if not specified as custom object.
I know I can circumvent this by passing custom_objects to the load function, but to me the real issue is that the documentation does not make clear which kinds of custom objects (or under which conditions) can actually be loaded from traces and which cannot. If anyone can shed some light on that, please do so!
Describe the expected behavior.
Either this is a bug that must be fixed, or the documentation is not correct. A full breakdown of which custom objects (or in what conditions) can be loaded from traces and which not would be helpful.
Contributing.
Standalone code to reproduce the issue.
https://colab.research.google.com/drive/1XK4HJq52Zhf-ekLNOKGoM3Gk9JJBColL?usp=sharing
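For reference, the custom_objects workaround requires the metric class to be reconstructable from its config. A minimal sketch of such a metric, assuming a hypothetical ScaledMAE built on tf.keras.metrics.Mean:

```python
import tensorflow as tf


class ScaledMAE(tf.keras.metrics.Mean):
    """Hypothetical custom metric: mean absolute error multiplied by `scale`."""

    def __init__(self, scale=1.0, name="scaled_mae", **kwargs):
        super().__init__(name=name, **kwargs)
        self.scale = scale

    def update_state(self, y_true, y_pred, sample_weight=None):
        y_true = tf.cast(y_true, self.dtype)
        y_pred = tf.cast(y_pred, self.dtype)
        values = self.scale * tf.abs(y_true - y_pred)
        return super().update_state(values, sample_weight=sample_weight)

    def get_config(self):
        # Include constructor arguments so from_config (used when loading
        # with custom_objects) can rebuild an equivalent metric.
        config = super().get_config()
        config["scale"] = self.scale
        return config
```

A model compiled with this metric can then be loaded with tf.keras.models.load_model(path, custom_objects={"ScaledMAE": ScaledMAE}). This does not answer the documentation question of which custom objects can be restored from traces alone.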
Describe the problem.
Moving this issue from tensorflow/tensorflow#45197
Describe the current behavior.
Because I train my model with data from HDF5 files, I used model.fit(…, shuffle='batch').
https://www.tensorflow.org/api_docs/python/tf/keras/Sequential:
"shuffle | Boolean (whether to shuffle the training data before each epoch) or str (for 'batch'). This argument is ignored when x is a generator. 'batch' is a special option for dealing with the limitations of HDF5 data; it shuffles in batch-sized chunks. Has no effect when steps_per_epoch is not None."
Before I upgraded my hardware, I used TensorFlow 1.x as the backend for Keras 2.x, and my model used shuffle='batch' without any problems. Now that I have a new machine, I need to port my code. However, the new code no longer works.
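To illustrate what 'batch' shuffling means, here is a simplified NumPy sketch: the sample indices are cut into batch-sized chunks, and only the order of the chunks is shuffled, while samples inside a chunk keep their stored order. This is not Keras's implementation (batch_shuffle_indices is hypothetical). Note that the script below reads the HDF5 arrays fully into memory with [...], so plain shuffle=True is also an option there.

```python
import numpy as np


def batch_shuffle_indices(num_samples, batch_size, rng):
    """Return a permutation of range(num_samples) shuffled in batch-sized chunks."""
    indices = np.arange(num_samples)
    # Consecutive chunks of batch_size indices; the last one may be shorter.
    chunks = [indices[start:start + batch_size]
              for start in range(0, num_samples, batch_size)]
    order = rng.permutation(len(chunks))  # shuffle chunk order only
    return np.concatenate([chunks[i] for i in order])


rng = np.random.default_rng(0)
print(batch_shuffle_indices(10, 3, rng))
```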
Describe the expected behavior.
I used the MNIST dataset to show what happened. Code adapted from (https://www.machinecurve.com/index.php/2020/04/13/how-to-use-h5py-and-keras-to-train-with-data-from-hdf5-files/)
Standalone code to reproduce the issue.
import h5py
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten, Conv2D
from tensorflow.keras.losses import sparse_categorical_crossentropy
from tensorflow.keras.optimizers import Adam
# Model configuration
batch_size = 50
img_width, img_height, img_num_channels = 28, 28, 1
loss_function = sparse_categorical_crossentropy
no_classes = 10
no_epochs = 25
optimizer = Adam()
validation_split = 0.2
verbosity = 1
# Load MNIST data
f = h5py.File('train.hdf5', 'r')
input_train = f['image'][...]
label_train = f['label'][...]
f.close()
f = h5py.File('test.hdf5', 'r')
input_test = f['image'][...]
label_test = f['label'][...]
f.close()
# Reshape data
input_train = input_train.reshape((len(input_train), img_width, img_height, img_num_channels))
input_test = input_test.reshape((len(input_test), img_width, img_height, img_num_channels))
# Determine shape of the data
input_shape = (img_width, img_height, img_num_channels)
# Create the model
model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=input_shape))
model.add(Conv2D(64, kernel_size=(3, 3), activation='relu'))
model.add(Conv2D(128, kernel_size=(3, 3), activation='relu'))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dense(no_classes, activation='softmax'))
# Display a model summary
model.summary()
# Compile the model
model.compile(loss=loss_function,
optimizer=optimizer,
metrics=['accuracy'])
# Fit data to model
history = model.fit(input_train, label_train,
batch_size=batch_size,
epochs=no_epochs,
verbose=verbosity,shuffle='batch',
validation_split=validation_split)
# Generate generalization metrics
score = model.evaluate(input_test, label_test, verbose=0)
print(f'Test loss: {score[0]} / Test accuracy: {score[1]}')
Other info / logs.
The output looks like this:
Model: "sequential_3"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d_9 (Conv2D) (None, 26, 26, 32) 320
_________________________________________________________________
conv2d_10 (Conv2D) (None, 24, 24, 64) 18496
_________________________________________________________________
conv2d_11 (Conv2D) (None, 22, 22, 128) 73856
_________________________________________________________________
flatten_3 (Flatten) (None, 61952) 0
_________________________________________________________________
dense_6 (Dense) (None, 128) 7929984
_________________________________________________________________
dense_7 (Dense) (None, 10) 1290
=================================================================
Total params: 8,023,946
Trainable params: 8,023,946
Non-trainable params: 0
_________________________________________________________________
Epoch 1/25
960/960 [==============================] - 3s 3ms/step - loss: 6.4279 - accuracy: 0.1099 - val_loss: 2.3022 - val_accuracy: 0.1060
Epoch 2/25
960/960 [==============================] - 3s 3ms/step - loss: 2.3012 - accuracy: 0.1141 - val_loss: 2.3012 - val_accuracy: 0.1060
Epoch 3/25
960/960 [==============================] - 3s 3ms/step - loss: 2.3011 - accuracy: 0.1149 - val_loss: 2.3020 - val_accuracy: 0.1060
Epoch 4/25
960/960 [==============================] - 3s 3ms/step - loss: 2.3010 - accuracy: 0.1142 - val_loss: 2.3021 - val_accuracy: 0.1060
Epoch 5/25
960/960 [==============================] - 3s 3ms/step - loss: 2.3010 - accuracy: 0.1141 - val_loss: 2.3019 - val_accuracy: 0.1060
Epoch 6/25
960/960 [==============================] - 3s 3ms/step - loss: 2.3010 - accuracy: 0.1162 - val_loss: 2.3019 - val_accuracy: 0.1060
Epoch 7/25
960/960 [==============================] - 3s 3ms/step - loss: 2.3012 - accuracy: 0.1139 - val_loss: 2.3020 - val_accuracy: 0.1060
Epoch 8/25
960/960 [==============================] - 3s 3ms/step - loss: 2.3013 - accuracy: 0.1128 - val_loss: 2.3025 - val_accuracy: 0.1060
Epoch 9/25
960/960 [==============================] - 3s 3ms/step - loss: 2.3013 - accuracy: 0.1131 - val_loss: 2.3020 - val_accuracy: 0.1060
Epoch 10/25
960/960 [==============================] - 3s 3ms/step - loss: 2.3011 - accuracy: 0.1156 - val_loss: 2.3021 - val_accuracy: 0.1060
Epoch 11/25
960/960 [==============================] - 3s 3ms/step - loss: 2.3013 - accuracy: 0.1127 - val_loss: 2.3022 - val_accuracy: 0.1060
Epoch 12/25
960/960 [==============================] - 3s 3ms/step - loss: 2.3010 - accuracy: 0.1143 - val_loss: 2.3024 - val_accuracy: 0.1060
Epoch 13/25
960/960 [==============================] - 3s 3ms/step - loss: 2.3012 - accuracy: 0.1131 - val_loss: 2.3025 - val_accuracy: 0.1060
Epoch 14/25
960/960 [==============================] - 3s 3ms/step - loss: 2.3009 - accuracy: 0.1148 - val_loss: 2.3019 - val_accuracy: 0.1060
Epoch 15/25
960/960 [==============================] - 3s 3ms/step - loss: 2.3012 - accuracy: 0.1152 - val_loss: 2.3019 - val_accuracy: 0.1060
Epoch 16/25
960/960 [==============================] - 3s 3ms/step - loss: 2.3009 - accuracy: 0.1149 - val_loss: 2.3020 - val_accuracy: 0.1060
Epoch 17/25
960/960 [==============================] - 3s 3ms/step - loss: 2.3009 - accuracy: 0.1143 - val_loss: 2.3020 - val_accuracy: 0.1060
Epoch 18/25
960/960 [==============================] - 3s 3ms/step - loss: 2.3014 - accuracy: 0.1125 - val_loss: 2.3022 - val_accuracy: 0.1060
Epoch 19/25
960/960 [==============================] - 3s 3ms/step - loss: 2.3011 - accuracy: 0.1147 - val_loss: 2.3019 - val_accuracy: 0.1060
Epoch 20/25
960/960 [==============================] - 3s 3ms/step - loss: 2.3011 - accuracy: 0.1144 - val_loss: 2.3020 - val_accuracy: 0.1060
Epoch 21/25
960/960 [==============================] - 3s 3ms/step - loss: 2.3013 - accuracy: 0.1128 - val_loss: 2.3022 - val_accuracy: 0.1060
Epoch 22/25
960/960 [==============================] - 3s 3ms/step - loss: 2.3012 - accuracy: 0.1122 - val_loss: 2.3024 - val_accuracy: 0.1060
Epoch 23/25
960/960 [==============================] - 3s 3ms/step - loss: 2.3003 - accuracy: 0.1163 - val_loss: 2.3021 - val_accuracy: 0.1060
Epoch 24/25
960/960 [==============================] - 3s 3ms/step - loss: 2.3011 - accuracy: 0.1151 - val_loss: 2.3021 - val_accuracy: 0.1060
Epoch 25/25
960/960 [==============================] - 3s 3ms/step - loss: 2.3012 - accuracy: 0.1131 - val_loss: 2.3021 - val_accuracy: 0.1060
Test loss: 2.3010358810424805 / Test accuracy: 0.11349999904632568
If I change shuffle='batch' to shuffle=True or shuffle=False, I get convergent results like this:
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d (Conv2D) (None, 26, 26, 32) 320
_________________________________________________________________
conv2d_1 (Conv2D) (None, 24, 24, 64) 18496
_________________________________________________________________
conv2d_2 (Conv2D) (None, 22, 22, 128) 73856
_________________________________________________________________
flatten (Flatten) (None, 61952) 0
_________________________________________________________________
dense (Dense) (None, 128) 7929984
_________________________________________________________________
dense_1 (Dense) (None, 10) 1290
=================================================================
Total params: 8,023,946
Trainable params: 8,023,946
Non-trainable params: 0
_________________________________________________________________
Epoch 1/25
960/960 [==============================] - 5s 3ms/step - loss: 2.3020 - accuracy: 0.9032 - val_loss: 0.0738 - val_accuracy: 0.9786
Epoch 2/25
960/960 [==============================] - 3s 3ms/step - loss: 0.0502 - accuracy: 0.9853 - val_loss: 0.0621 - val_accuracy: 0.9824
Epoch 3/25
960/960 [==============================] - 3s 3ms/step - loss: 0.0333 - accuracy: 0.9896 - val_loss: 0.0811 - val_accuracy: 0.9792
Epoch 4/25
960/960 [==============================] - 3s 3ms/step - loss: 0.0216 - accuracy: 0.9936 - val_loss: 0.0851 - val_accuracy: 0.9805
Epoch 5/25
960/960 [==============================] - 3s 3ms/step - loss: 0.0244 - accuracy: 0.9922 - val_loss: 0.0757 - val_accuracy: 0.9832
Epoch 6/25
960/960 [==============================] - 3s 3ms/step - loss: 0.0139 - accuracy: 0.9956 - val_loss: 0.1344 - val_accuracy: 0.9752
Epoch 7/25
960/960 [==============================] - 3s 3ms/step - loss: 0.0202 - accuracy: 0.9935 - val_loss: 0.1379 - val_accuracy: 0.9779
Epoch 8/25
960/960 [==============================] - 3s 3ms/step - loss: 0.0141 - accuracy: 0.9956 - val_loss: 0.0919 - val_accuracy: 0.9818
Epoch 9/25
960/960 [==============================] - 3s 3ms/step - loss: 0.0125 - accuracy: 0.9962 - val_loss: 0.1184 - val_accuracy: 0.9811
Epoch 10/25
960/960 [==============================] - 3s 3ms/step - loss: 0.0154 - accuracy: 0.9956 - val_loss: 0.1157 - val_accuracy: 0.9832
Epoch 11/25
960/960 [==============================] - 3s 3ms/step - loss: 0.0176 - accuracy: 0.9952 - val_loss: 0.1221 - val_accuracy: 0.9803
Epoch 12/25
960/960 [==============================] - 3s 3ms/step - loss: 0.0101 - accuracy: 0.9976 - val_loss: 0.1170 - val_accuracy: 0.9822
Epoch 13/25
960/960 [==============================] - 3s 3ms/step - loss: 0.0124 - accuracy: 0.9969 - val_loss: 0.1216 - val_accuracy: 0.9846
Epoch 14/25
960/960 [==============================] - 3s 3ms/step - loss: 0.0094 - accuracy: 0.9974 - val_loss: 0.1048 - val_accuracy: 0.9848
Epoch 15/25
960/960 [==============================] - 3s 3ms/step - loss: 0.0067 - accuracy: 0.9982 - val_loss: 0.1130 - val_accuracy: 0.9835
Epoch 16/25
960/960 [==============================] - 3s 3ms/step - loss: 0.0122 - accuracy: 0.9974 - val_loss: 0.1463 - val_accuracy: 0.9835
Epoch 17/25
960/960 [==============================] - 3s 3ms/step - loss: 0.0091 - accuracy: 0.9976 - val_loss: 0.1685 - val_accuracy: 0.9833
Epoch 18/25
960/960 [==============================] - 3s 3ms/step - loss: 0.0110 - accuracy: 0.9977 - val_loss: 0.1224 - val_accuracy: 0.9840
Epoch 19/25
960/960 [==============================] - 3s 3ms/step - loss: 0.0036 - accuracy: 0.9989 - val_loss: 0.1733 - val_accuracy: 0.9838
Epoch 20/25
960/960 [==============================] - 3s 3ms/step - loss: 0.0109 - accuracy: 0.9978 - val_loss: 0.1539 - val_accuracy: 0.9859
Epoch 21/25
960/960 [==============================] - 3s 3ms/step - loss: 0.0074 - accuracy: 0.9982 - val_loss: 0.1791 - val_accuracy: 0.9826
Epoch 22/25
960/960 [==============================] - 3s 3ms/step - loss: 0.0085 - accuracy: 0.9986 - val_loss: 0.2264 - val_accuracy: 0.9830
Epoch 23/25
960/960 [==============================] - 3s 3ms/step - loss: 0.0124 - accuracy: 0.9979 - val_loss: 0.1722 - val_accuracy: 0.9840
Epoch 24/25
960/960 [==============================] - 3s 3ms/step - loss: 0.0089 - accuracy: 0.9984 - val_loss: 0.1472 - val_accuracy: 0.9851
Epoch 25/25
960/960 [==============================] - 3s 3ms/step - loss: 0.0048 - accuracy: 0.9988 - val_loss: 0.2005 - val_accuracy: 0.9847
Test loss: 0.18761441111564636 / Test accuracy: 0.9829999804496765
Disclaimer: I'm an engineer by training, not a statistician, so I can get this wrong. I don't know that the general thrust has technical gaps, but I could be off in some of the details. If I am, please go talk to your friendly neighborhood stats professor and get their thoughts on the idea.
Background:
I hang out (and interact, and have for years, and learn tons) at CrossValidated, a Stack Exchange site.
One very substantial family of threads there is about why accuracy is not ideal in many settings. Many of the folks engaged in those discussions are fantastic PhDs in academia and industry who have been teaching or working for decades, so they are a very important source of technical wisdom.
Here are some of the threads there:
They like these things called "strictly proper score functions" or "strictly proper scoring rules".
Here are references on strictly proper scoring rules:
When I go to the Keras losses and metrics pages, I don't see those scoring rules explicitly, and I think that's a miss. Some may be in there, but if so, I missed them.
Current losses from documentation:
Current metrics from documentation:
Recommendation/Suggestion:
I think you should add the following "strictly proper scoring rules" to Keras, because they make it easier for new users (and their pointy-haired bosses) to use technically exemplary approaches in their problem solving.
Some rules to consider:
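For reference, here is a minimal NumPy sketch of two widely cited strictly proper scoring rules for multiclass probabilistic predictions: the (multiclass) Brier score and the logarithmic score. The function names `brier_score` and `log_score` are illustrative, not Keras APIs. The logarithmic score is the same quantity as categorical cross-entropy, and the Brier score as written here equals the number of classes times the mean squared error against one-hot targets, so Keras already computes close relatives of both; the sketch only makes the definitions explicit.

```python
import numpy as np


def brier_score(y_true, y_prob):
    """Multiclass Brier score: mean over samples of the squared distance
    between the predicted probability vector and the one-hot target.
    Lower is better; 0 means a perfect, fully confident prediction."""
    y_true = np.asarray(y_true)
    y_prob = np.asarray(y_prob, dtype=float)
    one_hot = np.eye(y_prob.shape[1])[y_true]
    return float(np.mean(np.sum((y_prob - one_hot) ** 2, axis=1)))


def log_score(y_true, y_prob, eps=1e-12):
    """Logarithmic score: negative log-probability assigned to the true
    class, averaged over samples. `eps` guards against log(0)."""
    y_true = np.asarray(y_true)
    y_prob = np.asarray(y_prob, dtype=float)
    p_true = y_prob[np.arange(len(y_true)), y_true]
    return float(-np.mean(np.log(np.clip(p_true, eps, 1.0))))


labels = [0, 2]                      # integer class labels
probs = [[0.7, 0.2, 0.1],            # predicted class probabilities
         [0.1, 0.3, 0.6]]
print(brier_score(labels, probs))    # ≈ 0.2
print(log_score(labels, probs))      # ≈ 0.434
```

Both rules reward honest probabilities: a model minimizes its expected score only by reporting its true belief, which is the property accuracy lacks.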
Describe the problem.
Currently, when we save weights with the ModelCheckpoint callback during training, we do not get the list of checkpoint files correctly from the TensorFlow API tf.train.get_checkpoint_state(ckpt_folder).all_model_checkpoint_paths. I am raising this issue in Keras because the checkpoint proto is written incorrectly.
Describe the current behavior.
Currently, tf.train.get_checkpoint_state(ckpt_folder).all_model_checkpoint_paths only returns the last checkpoint saved instead of all the checkpoints.
Describe the expected behavior.
tf.train.get_checkpoint_state(ckpt_folder).all_model_checkpoint_paths should return all the checkpoints saved.
Standalone code to reproduce the issue.
import os
import shutil

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers


def get_uncompiled_model():
    inputs = keras.Input(shape=(784,), name="digits")
    x = layers.Dense(64, activation="relu", name="dense_1")(inputs)
    x = layers.Dense(64, activation="relu", name="dense_2")(x)
    outputs = layers.Dense(10, activation="softmax", name="predictions")(x)
    model = keras.Model(inputs=inputs, outputs=outputs)
    return model


def get_compiled_model():
    model = get_uncompiled_model()
    model.compile(
        optimizer="rmsprop",
        loss="sparse_categorical_crossentropy",
        metrics=["sparse_categorical_accuracy"],
    )
    return model


(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
# Preprocess the data (these are NumPy arrays)
x_train = x_train.reshape(60000, 784).astype("float32") / 255
x_test = x_test.reshape(10000, 784).astype("float32") / 255
y_train = y_train.astype("float32")
y_test = y_test.astype("float32")
# Reserve 10,000 samples for validation
x_val = x_train[-10000:]
y_val = y_train[-10000:]
x_train = x_train[:-10000]
y_train = y_train[:-10000]

model = get_compiled_model()

# Start from an empty checkpoint folder.
ckpt_folder = os.path.join(os.getcwd(), 'ckpt')
if os.path.exists(ckpt_folder):
    shutil.rmtree(ckpt_folder)
ckpt_path = os.path.join(ckpt_folder, 'mymodel_{epoch}')

callbacks = [
    keras.callbacks.ModelCheckpoint(
        # Path where to save the weights. The file name includes the
        # current epoch, so every epoch gets its own checkpoint.
        filepath=ckpt_path,
        # Save after every epoch, not only when `val_loss` improves.
        save_best_only=False,
        save_weights_only=True,
        verbose=1,
    )
]

model.fit(
    x_train, y_train, epochs=3, batch_size=1, callbacks=callbacks,
    validation_split=0.2, steps_per_epoch=1,
)

# Expected: one path per epoch. Observed: only the last checkpoint.
ckpts = tf.train.get_checkpoint_state(ckpt_folder).all_model_checkpoint_paths
print(ckpts)
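Until the checkpoint state records every save, one possible workaround is to enumerate the checkpoints from the files on disk. This sketch assumes what the repro above produces: weights saved in the TensorFlow checkpoint format (no .h5 suffix), which writes one `<prefix>.index` file per save, and the `mymodel_{epoch}` naming. The helper name `list_checkpoint_prefixes` is hypothetical, not a Keras or TensorFlow API.

```python
import glob
import os
import re


def list_checkpoint_prefixes(ckpt_folder, basename="mymodel"):
    """Return checkpoint prefixes such as '<ckpt_folder>/mymodel_3',
    sorted numerically by epoch, found by scanning for '.index' files.

    Assumes TF-format weight checkpoints named '<basename>_<epoch>'.
    """
    pattern = os.path.join(ckpt_folder, f"{basename}_*.index")
    found = []
    for index_file in glob.glob(pattern):
        prefix = index_file[: -len(".index")]
        # Keep only prefixes that end in an integer epoch number.
        match = re.search(rf"{re.escape(basename)}_(\d+)$", prefix)
        if match:
            found.append((int(match.group(1)), prefix))
    # Sort by the integer epoch so that epoch 10 comes after epoch 2.
    return [prefix for _, prefix in sorted(found)]
```

With the repro above, `list_checkpoint_prefixes(ckpt_folder)` would be expected to return one prefix per epoch, and each prefix can be passed to `model.load_weights`. This only reads the file system, so it does not fix the underlying checkpoint proto.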
System information.
Describe the problem.
Both the serializer and deserializer construct the temp_dir path using the hard coded prefix "ram://". This works for Unix-based systems, but the prefix is not valid on Windows.
Describe the current behavior.
This produces the following error, which stems from the fact that f (L75) is invalid because dest_path (L74) is not a valid memory address.
Traceback (most recent call last):
File "C:\Users\Hope\Anaconda3\envs\Association\lib\site-packages\keras\saving\pickle_utils.py", line 77, in serialize_model_as_bytecode
info.size = f.size()
File "C:\Users\Hope\Anaconda3\envs\Association\lib\site-packages\tensorflow\python\lib\io\file_io.py", line 99, in size
return stat(self.__name).length
File "C:\Users\Hope\Anaconda3\envs\Association\lib\site-packages\tensorflow\python\lib\io\file_io.py", line 910, in stat
return stat_v2(filename)
File "C:\Users\Hope\Anaconda3\envs\Association\lib\site-packages\tensorflow\python\lib\io\file_io.py", line 926, in stat_v2
return _pywrap_file_io.Stat(compat.path_to_str(path))
tensorflow.python.framework.errors_impl.NotFoundError
Suggested fix.
Implement something like the _get_temp_folder() method from scikeras
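A minimal sketch of that suggestion, assuming the only requirement is a scratch location that works on every platform: fall back to a real temporary directory on Windows and keep a unique "ram://" path elsewhere. The function name `get_temp_folder` is hypothetical and this is not scikeras's actual code. On Windows the caller is also responsible for deleting the directory afterwards (for example with shutil.rmtree).

```python
import os
import tempfile
import uuid


def get_temp_folder():
    """Return a scratch location for a serialize/deserialize round trip.

    On Windows (os.name == "nt") the in-memory "ram://" filesystem is
    avoided and a real temporary directory on disk is created instead.
    Elsewhere a unique "ram://" path is returned.
    """
    if os.name == "nt":
        # Real directory on disk; the caller must clean it up.
        return tempfile.mkdtemp()
    # Unique in-memory path so concurrent calls do not collide.
    return f"ram://{uuid.uuid4().hex}"
```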
Originally I posted this bug #54753 on tensorflow/tensorflow and was advised to repost it here.
System information
Describe the problem
We save a quantization-aware Keras model in the .pb (SavedModel) format using model.save(). This operation fails with ValueError: __inference_conv2d_transpose_layer_call_fn_4530 when our model contains a Conv2DTranspose layer.
Saving with tf.keras.models.save_model() fails too.
We tried two ways to make the model quantization aware: tfmot.quantization.keras.quantize_model(), and annotating the model with tf.keras.models.clone_model() and then applying quantization using tfmot.quantization.keras.quantize_apply().
Our current workaround is to not annotate Conv2DTranspose, but this prevents us from having a fully quantization-aware model. Saving the same model as .h5 works (unfortunately, this workaround is not suitable for us because our technical requirement is to save a .pb model).
Describe the expected behavior
model.save() saves a QAT model with a Conv2DTranspose layer in the .pb format successfully.
Standalone code to reproduce the issue
Here are the Colabs to reproduce the issue using a very simple model with a Conv2DTranspose layer and the two ways to make a model quantization aware mentioned above:
- Colab with tf2.7.0
- Colab with tf2.8.0
Other info / logs
Similar issue #868
Traceback
ValueError Traceback (most recent call last)
<ipython-input-7-dc1f93a93afb> in <module>()
2 annotated_model = tf.keras.models.clone_model(base_model, clone_function=apply_quantization)
3 q_aware_model = tfmot.quantization.keras.quantize_apply(annotated_model)
----> 4 q_aware_model.save('/output_folder/q_aware_model') # save keras model as .pb, fails
1 frames
/usr/local/lib/python3.7/dist-packages/keras/utils/traceback_utils.py in error_handler(*args, **kwargs)
65 except Exception as e: # pylint: disable=broad-except
66 filtered_tb = _process_traceback_frames(e.__traceback__)
---> 67 raise e.with_traceback(filtered_tb) from None
68 finally:
69 del filtered_tb
/usr/local/lib/python3.7/dist-packages/tensorflow/python/saved_model/save.py in map_resources(self)
402 if capture_constant_value is None:
403 raise ValueError(
--> 404 f"Unable to save function {concrete_function.name} because it "
405 f"captures graph tensor {capture} from a parent function which "
406 "cannot be converted to a constant with `tf.get_static_value`.")
ValueError: Unable to save function b'__inference_conv2d_transpose_layer_call_fn_4530' because it captures graph tensor
Tensor("model/quant_conv2d_transpose/transpose_1:0", shape=(3, 3, 16, 16), dtype=float32) from a parent function which
cannot be converted to a constant with `tf.get_static_value`.
A large 3D U-Net model configured with mixed precision fails with "No algorithm worked!" (see the full a10g.log attached) when running inference on an NVIDIA A10G 20GB GPU (compute capability 8.6).
Using the tensorflow/tensorflow:nightly-gpu Docker image, the error points to an out-of-memory issue (see the full log a10g_tf_nightly.log attached):
No algorithm worked! Error messages:
Profiling failure on CUDNN engine 1#TC: RESOURCE_EXHAUSTED: Allocating 4718624784 bytes exceeds the memory limit of 4294967296 bytes.
Profiling failure on CUDNN engine 1: RESOURCE_EXHAUSTED: Allocating 4718624784 bytes exceeds the memory limit of 4294967296 bytes.
[[{{node model/conv3d_transpose_3/conv3d_transpose}}]] [Op:__inference_predict_function_1150]
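The two byte counts in this error are easier to compare in binary units. The short sketch below only does that arithmetic. Reading the 4 GiB figure as a per-allocation or workspace cap rather than the card's total memory is an inference from the numbers, not something the log states.

```python
# Byte counts copied from the cuDNN profiling error above.
requested_bytes = 4718624784
limit_bytes = 4294967296

GIB = 2 ** 30
MIB = 2 ** 20

print(f"requested: {requested_bytes / GIB:.2f} GiB")  # 4.39 GiB
print(f"limit:     {limit_bytes / GIB:.2f} GiB")      # 4.00 GiB, exactly 2**32 bytes
print(f"overshoot: {(requested_bytes - limit_bytes) / MIB:.0f} MiB")  # 404 MiB
```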
I'm able to work around the issue by using full precision instead (i.e. by setting mixed_precision.set_global_policy("float32")).
The same model configured with mixed precision works fine on the previous-generation Tesla T4 GPU (compute capability 7.5), which has even less GPU memory, 16GB (see the full t4_tesla.log attached).
System information
- Docker image tensorflow:latest-gpu (sha256@fc5eb0604722c7bef7b499bb007b3050c4beec5859c2e0d4409d2cca5c14d442)
- nvidia-smi outputs for both GPU types are provided in the attachments.
Describe the expected behavior
Mixed precision mode should not exhaust all GPU memory on the newest generation of NVIDIA A10G.
Standalone code to reproduce the issue
Steps to reproduce:
Start an instance with an A10G GPU.
Start an interactive Docker container and pass in test.py (copied from Colab):
$ docker run --gpus all -v /path/to/test.py:/srv/test.py -it tensorflow/tensorflow:latest-gpu /bin/bash
python /srv/test.py
Other info / logs
a10g.log
a10g_tf_nightly.log
t4_tesla.log
It seems that in at least one case the sample_weight argument of a Metric object wasn't being tested. See: keras-team/keras#15997 keras-team/keras#15939
We should add coverage for sample_weight to the tests for all metrics in a systematic manner. Right now we seem to have generic tests for sample weights, but no systematic tests for each specific metric.
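One way such systematic coverage could look, sketched in NumPy only: a single checker that runs the same set of sample_weight cases (all ones, a 0/1 mask, random positive weights) against any metric and compares it with a weighted-mean reference. This assumes the metric reduces per-sample values as sum(w * v) / sum(w), which holds for mean-style metrics but not necessarily for every metric. In a real test suite, `metric_fn` would wrap a Keras metric's `update_state(..., sample_weight=w)` and `result()`; here `check_sample_weight` and `weighted_mae` are hypothetical stand-ins.

```python
import numpy as np


def reference_weighted_mean(per_sample_values, sample_weight):
    # Weighted mean of per-sample values: sum(w * v) / sum(w).
    return float(np.sum(per_sample_values * sample_weight) / np.sum(sample_weight))


def check_sample_weight(metric_fn, per_sample_fn, seed=0):
    """Run the same sample_weight cases against one metric.

    metric_fn(y_true, y_pred, sample_weight) is the implementation under
    test; per_sample_fn(y_true, y_pred) returns one value per sample.
    """
    rng = np.random.default_rng(seed)
    y_true = rng.normal(size=(8, 4))
    y_pred = rng.normal(size=(8, 4))
    per_sample = per_sample_fn(y_true, y_pred)
    cases = {
        "ones": np.ones(8),
        "mask_second_half": np.array([1, 1, 1, 1, 0, 0, 0, 0], dtype=float),
        "random_positive": rng.uniform(0.1, 2.0, size=8),
    }
    for case, w in cases.items():
        expected = reference_weighted_mean(per_sample, w)
        got = metric_fn(y_true, y_pred, sample_weight=w)
        assert np.isclose(got, expected), f"case {case!r}: {got} != {expected}"


# Stand-in implementation under test (hypothetical, NumPy only).
def weighted_mae(y_true, y_pred, sample_weight):
    per_sample = np.abs(y_true - y_pred).mean(axis=-1)
    return float(np.average(per_sample, weights=sample_weight))


check_sample_weight(weighted_mae, lambda y, p: np.abs(y - p).mean(axis=-1))
```

A metric that silently ignores sample_weight passes the all-ones case but fails the masked case, which is exactly the kind of gap the linked issues describe.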
The loss scale optimizer currently does not reduce the loss scale below 1.
My model is stuck not learning for the first ~1M gradient steps with the loss scale at its lower bound of 1.
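To make the behavior concrete, here is a minimal pure-Python sketch of dynamic loss scaling: halve the scale whenever gradients are non-finite, double it after a run of finite steps, and clamp at a floor. The defaults mirror the documented LossScaleOptimizer defaults (initial scale 2**15, growth every 2000 finite steps). The class name and the `min_scale` parameter are hypothetical, not an existing Keras option; setting `min_scale` below 1 shows the behavior this issue asks for.

```python
class DynamicLossScale:
    """Sketch of dynamic loss scaling with a configurable floor.

    `min_scale` is a hypothetical knob: the current Keras behavior
    corresponds to min_scale=1.0.
    """

    def __init__(self, initial_scale=2.0 ** 15, growth_steps=2000, min_scale=1.0):
        self.scale = initial_scale
        self.growth_steps = growth_steps
        self.min_scale = min_scale
        self._good_steps = 0

    def update(self, grads_are_finite):
        """Update the scale after one training step and return it."""
        if grads_are_finite:
            self._good_steps += 1
            if self._good_steps >= self.growth_steps:
                # A long run of finite gradients: try a larger scale.
                self.scale *= 2.0
                self._good_steps = 0
        else:
            # Overflow: shrink the scale, but never below the floor.
            self.scale = max(self.scale / 2.0, self.min_scale)
            self._good_steps = 0
        return self.scale
```

With min_scale=1.0, repeated overflows leave the scale pinned at 1, matching the behavior reported above; a fractional floor lets the scale keep shrinking until gradients become finite.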
System information.
Describe the problem.
The TensorFlow profiler crashes when a model uses a string categorical layer, which makes it impossible to profile models with those layers.
Maybe the same issue
System information.
tf.cast(tf.keras.Input(3, 3, sparse=True, dtype=tf.bool), tf.int32).shape
Describe the current behavior.
The problem is that the shape of the symbolic sparse tensor is lost (set to None) after casting:
>>> a.shape
TensorShape([3, 3])
>>> b.shape
TensorShape([None, None])
Describe the expected behavior.
The shape should be preserved after casting.
Standalone code to reproduce the issue.
import tensorflow as tf
a = tf.keras.Input(3, 3, dtype=tf.bool, sparse=True)
b = tf.cast(a, dtype=tf.int32)
assert a.shape == b.shape, \
    f"a.shape ({a.shape}) is different from b.shape ({b.shape})"
Source code / logs.
NA
Describe the problem.
The docs for image_dataset_from_directory say the following about the directory argument:
Directory where the data is located. If labels is "inferred", it should contain subdirectories, each containing images for a class. Otherwise, the directory structure is ignored.
This means that when labels is a list/tuple, we should ignore the directory structure (this makes sense, as the directory structure would only be used to generate labels).
Describe the current behavior.
However, this is not what happens. Instead, see the following code snippet from dataset_utils.py:
if labels is None:
    # in the no-label case, index from the parent directory down.
    subdirs = ['']
    class_names = subdirs
else:
    subdirs = []
    for subdir in sorted(tf.io.gfile.listdir(directory)):
We only ignore the subdirectory structure if labels is None, instead of whenever labels != 'inferred'. This means that when labels is a list/tuple, we expect a subdirectory structure (even when none exists), causing image_dataset_from_directory to fail in this case.
Describe the expected behavior.
We should ignore the subdirectory structure if labels is anything other than 'inferred' (i.e. make the code match what the documentation says should happen). This should be a one-line change, and I'd be happy to make a PR.
However, the existence of this issue suggests that the use case where labels is a list/tuple is not unit tested, so it would probably be good to write a test. I would love a suggestion from someone more familiar with the codebase about how best to do this.
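The proposed change can be sketched as a standalone function, together with the kind of unit test the issue asks for. `collect_subdirs` is a hypothetical stand-in for the subdirectory logic in dataset_utils.py, not the real Keras code, and it uses os.listdir instead of tf.io.gfile so it runs without TensorFlow. The isinstance check avoids an element-wise comparison if labels is ever passed as an array.

```python
import os
import tempfile


def collect_subdirs(directory, labels):
    """Hypothetical stand-in for the subdirectory logic.

    Subdirectories are treated as classes only when labels == "inferred";
    for labels=None or an explicit list/tuple, the directory is indexed
    from its root and its structure is ignored, as the docs describe.
    """
    labels_inferred = isinstance(labels, str) and labels == "inferred"
    if not labels_inferred:
        return [""]  # index from the parent directory down
    return sorted(
        name for name in os.listdir(directory)
        if os.path.isdir(os.path.join(directory, name))
    )


def test_explicit_labels_ignore_directory_structure():
    with tempfile.TemporaryDirectory() as root:
        # Flat directory of images plus one unrelated subdirectory.
        for i in range(3):
            open(os.path.join(root, f"img_{i}.jpg"), "wb").close()
        os.mkdir(os.path.join(root, "unrelated"))
        assert collect_subdirs(root, labels=[0, 1, 0]) == [""]
        assert collect_subdirs(root, labels=None) == [""]
        assert collect_subdirs(root, labels="inferred") == ["unrelated"]


test_explicit_labels_ignore_directory_structure()
```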
System information.
TensorFlow version (you are using):
Are you willing to contribute it (Yes/No) :
Just if we have a clear path on what will be accepted in the repo
Describe the feature and the current behavior/state.
I want to use git-bisect on different Keras commits to execute third-party library tests.
This is hard to achieve currently, as Keras doesn't support editable installs (e.g. pip install -e), so we need to build and install the wheel on every single commit.
I want to move between multiple Keras commits to execute third-party library tests.
To achieve this, we need something like an editable Keras install (pip install -e).
Will this change the current api? How?
No
Who will benefit from this feature?
All the developers and third-party libraries that need to run tests against multiple Keras commits
/cc @qlzh727