oarriaga / stn.keras

Implementation of spatial transformer networks (STNs) in Keras 2 with TensorFlow as the backend.

License: MIT License


stn.keras's Introduction

For a TF-2.0 rewrite visit:

Spatial transformer networks

Implementation of spatial transformer networks in Keras 2 using the TensorFlow backend.


stn.keras's People

Contributors

chensong1995, d3dave, fishhf, oarriaga


stn.keras's Issues

ValueError: The channel dimension of the inputs should be defined. Found `None`.

Hi, thank you for sharing your code.

When I run it in Google Colab (Keras 2.4.3), I get the following error. Any idea how to fix it?

Thank you.


ValueError Traceback (most recent call last)
in ()
----> 1 model = STN()
2 model.compile(loss='categorical_crossentropy', optimizer='adam')
3 model.summary()

5 frames
/content/STN.keras/src/models/STN.py in STN(input_shape, sampling_size, num_classes)
23 locnet = Dense(6, weights=weights)(locnet)
24 x = BilinearInterpolation(sampling_size)([image, locnet])
---> 25 x = Conv2D(32, (3, 3), padding='same')(x)
26 x = Activation('relu')(x)
27 x = MaxPool2D(pool_size=(2, 2))(x)

/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/base_layer.py in call(self, *args, **kwargs)
924 if _in_functional_construction_mode(self, inputs, args, kwargs, input_list):
925 return self._functional_construction_call(inputs, args, kwargs,
--> 926 input_list)
927
928 # Maintains info about the Layer.call stack.

/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/base_layer.py in _functional_construction_call(self, inputs, args, kwargs, input_list)
1096 # Build layer if applicable (if the build method has been
1097 # overridden).
-> 1098 self._maybe_build(inputs)
1099 cast_inputs = self._maybe_cast_inputs(inputs, input_list)
1100

/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/base_layer.py in _maybe_build(self, inputs)
2641 # operations.
2642 with tf_utils.maybe_init_scope(self):
-> 2643 self.build(input_shapes) # pylint:disable=not-callable
2644 # We must set also ensure that the layer is marked as built, and the build
2645 # shape is stored since user defined build functions may not be calling

/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/layers/convolutional.py in build(self, input_shape)
185 def build(self, input_shape):
186 input_shape = tensor_shape.TensorShape(input_shape)
--> 187 input_channel = self._get_input_channel(input_shape)
188 if input_channel % self.groups != 0:
189 raise ValueError(

/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/layers/convolutional.py in _get_input_channel(self, input_shape)
357 channel_axis = self._get_channel_axis()
358 if input_shape.dims[channel_axis].value is None:
--> 359 raise ValueError('The channel dimension of the inputs '
360 'should be defined. Found None.')
361 return int(input_shape[channel_axis])

ValueError: The channel dimension of the inputs should be defined. Found None.
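
A likely cause (my assumption, not a confirmed diagnosis) is that the custom BilinearInterpolation layer does not propagate a static channel count, so the Conv2D that follows it sees None under newer tf.keras versions. One possible workaround is to pin the static shape at the end of the layer's call(); a minimal sketch, with method and attribute names mirroring the repo but not guaranteed to match exactly:

def call(self, tensors, mask=None):
    image, affine_transformation = tensors
    output = self._transform(image, affine_transformation, self.output_size)
    # Pin the static shape so downstream layers see a defined channel count.
    num_channels = image.shape[-1]
    output.set_shape([None, self.output_size[0],
                      self.output_size[1], num_channels])
    return output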

Localization network

Hello,
How can I use the localization network to predict the affine transformation parameters for images?
thanks
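
One approach (a sketch, assuming the Dense(6) layer that regresses the affine parameters can be located by name; 'dense_1' below is hypothetical, so check model.summary() for the real name) is to build a second Model that shares the STN's layers and outputs those six values:

from keras.models import Model
from src.models import STN  # as in this repo

stn_model = STN()
theta_model = Model(stn_model.input,
                    stn_model.get_layer('dense_1').output)
# images: a batch shaped like the model input
theta = theta_model.predict(images)  # shape (batch, 6): affine parameters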

Wrong output when the input image's width and height differ

Hi,

The transformed image (the output of the STN) is correct when the width and height of the input image are the same. For an input image of size 512x256, however, the transformed image is duplicated twice, although it seems to work fine for an input of size 256x512. Below is the output of the STN for an image of size 512x256 when applying a transformation to a fixed image; the expected output should be similar to the moving image.

[image: STN output for the 512x256 input, showing the duplicated result]
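
A plausible cause (an assumption, not a confirmed diagnosis) is a height/width mix-up somewhere in the sampling-grid construction, which is invisible for square inputs. One way to check is to trace the grid for a non-square size; the sketch below shows a correctly ordered normalized grid in TensorFlow, and swapping height and width in any of the reshapes reproduces exactly this kind of tiling:

import tensorflow as tf

height, width = 256, 512
x = tf.linspace(-1.0, 1.0, width)
y = tf.linspace(-1.0, 1.0, height)
x_grid, y_grid = tf.meshgrid(x, y)         # both (height, width)
ones = tf.ones_like(x_grid)
grid = tf.stack([tf.reshape(x_grid, [-1]),
                 tf.reshape(y_grid, [-1]),
                 tf.reshape(ones, [-1])])  # (3, height * width)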

STN example not working

Traceback (most recent call last):
File "train.py", line 16, in
model = STN()
File "/home/vision/STN/STN.py", line 29, in STN
x = Conv2D(32, (3, 3), padding='same')(interpolated_image)
File "/home/vision/.virtualenvs/dl4cv/lib/python3.5/site-packages/tensorflow_core/python/keras/engine/base_layer.py", line 812, in call
self.name)
File "/home/vision/.virtualenvs/dl4cv/lib/python3.5/site-packages/tensorflow_core/python/keras/engine/input_spec.py", line 155, in assert_input_compatibility
' input tensors. Inputs received: ' + str(inputs))
ValueError: Layer conv2d_2 expects 1 inputs, but it received 4 input tensors. Inputs received: [<tf.Tensor 'bilinear_interpolation/Placeholder:0' shape= dtype=float32>, <tf.Tensor 'bilinear_interpolation/Placeholder_1:0' shape=(30,) dtype=float32>, <tf.Tensor 'bilinear_interpolation/Placeholder_2:0' shape=(30,) dtype=float32>, <tf.Tensor 'bilinear_interpolation/Placeholder_3:0' shape=(1,) dtype=float32>]

How to use this code to recognize MNIST of size 28x28?

Hi, thanks for providing the code.
I have run the code and the result worked out well, but when I try to adapt it to recognize MNIST at its original size of 28x28 I run into some problems. I'd really appreciate it if anyone could help.
Here is my code.

import keras.backend as K
from keras.datasets import mnist
from keras.optimizers import Adam
from src.models import STN
import matplotlib.pyplot as plt
import keras as k

num_classes = 10
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train[:50000]
y_train = y_train[:50000]
x_train = x_train.reshape((x_train.shape[0], 28, 28, 1))
x_test = x_test.reshape((x_test.shape[0], 28, 28, 1))
y_test = k.utils.to_categorical(y_test, num_classes)
y_train = k.utils.to_categorical(y_train, num_classes)

model = STN(input_shape=(28, 28, 1), sampling_size=(14, 14))
model.compile(loss='categorical_crossentropy', optimizer=Adam())
input_image = model.input
output_STN = model.get_layer('bilinear_interpolation_1').output
STN_function = K.function([input_image], [output_STN])

num_epochs = 3
batch_size = 10
model.fit(x_train, y_train, batch_size=batch_size, epochs=num_epochs)
image_result = STN_function([x_train[:10]])
for i in range(2):
    plt.imshow(x_train[i].reshape(28, 28), cmap='gray')
    plt.show()
    image = K.np.squeeze(image_result[0][i])
    plt.imshow(image, cmap='gray')
    plt.show()

And here is the result: instead of the transformed digits I only get solid black.
[output images: input digit and all-black STN output]

What do I need to do when I switch to other types of datasets? The loss is stuck at around 2.3 after 3 epochs of training.
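
A common cause of both symptoms (all-black sampled output and the loss pinned near 2.3, which is ln(10), i.e. a uniform guess over 10 classes) is feeding raw uint8 pixels; whether that is the problem in the code above is my assumption. Scaling the images to [0, 1] before training is a cheap thing to try:

x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0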

Loading trained model with custom layers

Hi @oarriaga

Could you elaborate on the steps for loading the trained model? I am able to train the model using your code, but loading the trained model gives an error saying the SpatialTransformer layer is not found.

Thanks
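
The usual remedy for "unknown layer" errors with custom Keras layers is to pass the class via custom_objects when loading. A minimal sketch; the filename, import path, and class name below are assumptions, so substitute whichever custom layer your model was actually trained with (the error message mentions SpatialTransformer):

from keras.models import load_model
from src.models import BilinearInterpolation  # hypothetical import path

model = load_model('trained_stn.h5',  # hypothetical filename
                   custom_objects={'BilinearInterpolation': BilinearInterpolation})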

Localization network architecture

Can the localization network architecture be arbitrary? For example, can it be just a stack of conv layers, with no dense layer, as long as it takes the input image and outputs the 6 values representing the affine transformation?
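
In principle yes: the sampler only needs six numbers per image. A minimal sketch of a dense-free localization net (the input shape is an assumption); note that initializing the last layer to output the identity transform, mirroring the weights=[W, b] trick used elsewhere in this repo, still matters for stable training:

import numpy as np
from keras.layers import Conv2D, GlobalAveragePooling2D, Input, MaxPool2D
from keras.models import Model

inputs = Input((60, 60, 1))
x = Conv2D(16, (3, 3), activation='relu')(inputs)
x = MaxPool2D()(x)
x = Conv2D(32, (3, 3), activation='relu')(x)
head = Conv2D(6, (1, 1))                   # one 6-vector per spatial location
theta = GlobalAveragePooling2D()(head(x))  # averaged to six affine parameters
locnet = Model(inputs, theta)

# Zero the kernel and set the bias to the identity transform.
kernel, bias = head.get_weights()
head.set_weights([np.zeros_like(kernel),
                  np.array([1, 0, 0, 0, 1, 0], dtype='float32')])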

Crashed when adding SpatialTransformer layer

Thanks for sharing.
I tried to run your notebook with TensorFlow 1.0 but got the following error:

TypeError Traceback (most recent call last)
in ()
3 model.add(SpatialTransformer(localization_net=locnet,
4 input_shape=input_shape,
----> 5 output_size=(30,30)))
6
7 model.add(Convolution2D(32, 3, 3, border_mode='same'))

/home/deepws/anaconda2/lib/python2.7/site-packages/keras/models.pyc in add(self, layer)
297 else:
298 input_dtype = None
--> 299 layer.create_input_layer(batch_input_shape, input_dtype)
300
301 if len(layer.inbound_nodes) != 1:

/home/deepws/anaconda2/lib/python2.7/site-packages/keras/engine/topology.pyc in create_input_layer(self, batch_input_shape, input_dtype, name)
399 # and create the node connecting the current layer
400 # to the input layer we just created.
--> 401 self(x)
402
403 def add_weight(self, shape, initializer, name=None,

/home/deepws/anaconda2/lib/python2.7/site-packages/keras/engine/topology.pyc in call(self, x, mask)
570 if inbound_layers:
571 # This will call layer.build() if necessary.
--> 572 self.add_inbound_node(inbound_layers, node_indices, tensor_indices)
573 # Outputs were already computed when calling self.add_inbound_node.
574 outputs = self.inbound_nodes[-1].output_tensors

/home/deepws/anaconda2/lib/python2.7/site-packages/keras/engine/topology.pyc in add_inbound_node(self, inbound_layers, node_indices, tensor_indices)
633 # creating the node automatically updates self.inbound_nodes
634 # as well as outbound_nodes on inbound layers.
--> 635 Node.create_node(self, inbound_layers, node_indices, tensor_indices)
636
637 def get_output_shape_for(self, input_shape):

/home/deepws/anaconda2/lib/python2.7/site-packages/keras/engine/topology.pyc in create_node(cls, outbound_layer, inbound_layers, node_indices, tensor_indices)
164
165 if len(input_tensors) == 1:
--> 166 output_tensors = to_list(outbound_layer.call(input_tensors[0], mask=input_masks[0]))
167 output_masks = to_list(outbound_layer.compute_mask(input_tensors[0], input_masks[0]))
168 # TODO: try to auto-infer shape

/home/deepws/workspace/spatial_transformer_networks/src/spatial_transformer.py in call(self, X, mask)
46 def call(self, X, mask=None):
47 affine_transformation = self.locnet.call(X)
---> 48 output = self._transform(affine_transformation, X, self.output_size)
49 return output
50

/home/deepws/workspace/spatial_transformer_networks/src/spatial_transformer.py in _transform(self, affine_transformation, input_shape, output_size)
145 output_height = output_size[0]
146 output_width = output_size[1]
--> 147 indices_grid = self._meshgrid(output_height, output_width)
148 indices_grid = tf.expand_dims(indices_grid, 0)
149 indices_grid = tf.reshape(indices_grid, [-1]) # flatten?

/home/deepws/workspace/spatial_transformer_networks/src/spatial_transformer.py in _meshgrid(self, height, width)
127 y_coordinates = tf.reshape(y_coordinates, shape=(1, -1))
128 ones = tf.ones_like(x_coordinates)
--> 129 indices_grid = tf.concat(0, [x_coordinates, y_coordinates, ones])
130 return indices_grid
131

/home/deepws/anaconda2/lib/python2.7/site-packages/tensorflow/python/ops/array_ops.pyc in concat(values, axis, name)
1045 ops.convert_to_tensor(axis,
1046 name="concat_dim",
-> 1047 dtype=dtypes.int32).get_shape(
1048 ).assert_is_compatible_with(tensor_shape.scalar())
1049 return identity(values[0], name=scope)

/home/deepws/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/ops.pyc in convert_to_tensor(value, dtype, name, preferred_dtype)
649 name=name,
650 preferred_dtype=preferred_dtype,
--> 651 as_ref=False)
652
653

/home/deepws/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/ops.pyc in internal_convert_to_tensor(value, dtype, name, as_ref, preferred_dtype)
714
715 if ret is None:
--> 716 ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
717
718 if ret is NotImplemented:

/home/deepws/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/constant_op.pyc in _constant_tensor_conversion_function(v, dtype, name, as_ref)
174 as_ref=False):
175 _ = as_ref
--> 176 return constant(v, dtype=dtype, name=name)
177
178

/home/deepws/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/constant_op.pyc in constant(value, dtype, shape, name, verify_shape)
163 tensor_value = attr_value_pb2.AttrValue()
164 tensor_value.tensor.CopyFrom(
--> 165 tensor_util.make_tensor_proto(value, dtype=dtype, shape=shape, verify_shape=verify_shape))
166 dtype_value = attr_value_pb2.AttrValue(type=tensor_value.tensor.dtype)
167 const_tensor = g.create_op(

/home/deepws/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/tensor_util.pyc in make_tensor_proto(values, dtype, shape, verify_shape)
365 nparray = np.empty(shape, dtype=np_dt)
366 else:
--> 367 _AssertCompatible(values, dtype)
368 nparray = np.array(values, dtype=np_dt)
369 # check to them.

/home/deepws/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/tensor_util.pyc in _AssertCompatible(values, dtype)
300 else:
301 raise TypeError("Expected %s, got %s of type '%s' instead." %
--> 302 (dtype.name, repr(mismatch), type(mismatch).name))
303
304

TypeError: Expected int32, got list containing Tensors of type '_Message' instead.
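
This particular TypeError comes from the tf.concat signature change: before TF 1.0 the axis came first, tf.concat(axis, values); from TF 1.0 on it is tf.concat(values, axis). Updating the line flagged in the traceback should clear it:

indices_grid = tf.concat([x_coordinates, y_coordinates, ones], 0)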

projective transformation

Dear Oarriaga,

Your implementation of the STN is awesome, thanks a lot for doing this.

In STN.keras/src/models/STN.py, lines 23-24:

 locnet = Dense(6, weights=weights)(locnet)
 x = BilinearInterpolation(sampling_size)([image, locnet])

the locnet is set to output 6 parameters, i.e. an affine transformation, but I want to do a projective transformation, which requires 9 (actually 8) parameters. Could I just change the 6 to 9, or do I need to rewrite the whole BilinearInterpolation function?

The BilinearInterpolation function is a little long; it may take me some time to fully understand it.

Thank you very much for your help.

Best Wishes,

Alex
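
For what it's worth, changing 6 to 9 alone is unlikely to be enough (this is my reading of standard STN implementations, not a statement about this repo's exact internals): the affine path reshapes the parameters to a 2x3 matrix and multiplies the homogeneous grid directly, whereas a projective transform needs a 3x3 homography followed by a perspective divide. Roughly, with grid the (batch, 3, H*W) homogeneous sampling grid:

import tensorflow as tf

def projective_sample_coords(theta, grid):
    homography = tf.reshape(theta, [-1, 3, 3])         # theta: (batch, 9)
    transformed = tf.matmul(homography, grid)          # (batch, 3, H*W)
    x_s = transformed[:, 0, :] / transformed[:, 2, :]  # perspective divide
    y_s = transformed[:, 1, :] / transformed[:, 2, :]
    return x_s, y_s

The bilinear sampling itself can stay as it is; only the mapping from parameters to sample coordinates changes.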

Output of spatial transformer network is in plain black color

I tried using the SpatialTransformer layer from https://github.com/johnwangMK/spatial_transformer_networks/blob/master/src/spatial_transformer.py (I edited num_channels on line 137 inside _transform from tf.shape(input_shape)[3] to 3 in my code) in my license plate recognition project. But the model didn't converge; I returned the output of the STN along with the model outputs, and it is plain black, nothing else.

def locnet(self):
    b = np.zeros((2, 3), dtype='float32')
    b[0, 0] = 1
    b[1, 1] = 1
    W = np.zeros((64, 6), dtype='float32')
    weights = [W, b.flatten()]
    locnet = Sequential()

    locnet.add(Conv2D(16, (7, 7), padding='valid', input_shape=(48,188, 3),kernel_initializer='glorot_normal'))
    locnet.add(MaxPool2D(pool_size=(2, 2)))
    locnet.add(Conv2D(32, (5, 5), padding='valid'))
    locnet.add(MaxPool2D(pool_size=(2, 2)))
    locnet.add(Conv2D(64, (3, 3), padding='valid'))
    locnet.add(MaxPool2D(pool_size=(2, 2)))

    locnet.add(Flatten())
    locnet.add(Dense(128))
    locnet.add(Activation('relu'))
    locnet.add(Dense(64))
    locnet.add(Activation('relu'))
    locnet.add(Dense(6, weights=weights))

    return locnet

This is my locnet function and I call the STN as:

self.input_shape = (48, 188, 3)

def _build(self):
    inputs = Input(self.input_shape)
    stn = SpatialTransformer(localization_net=self.locnet(),
                             output_size=(24, 94))(inputs)
    # ... followed by the licence plate recognition model, which works well

After using this, the model isn't converging and when I try to plot the output of stn as:

image = np.squeeze(stn_out)[0]
image = cv2.resize(image, (100, 200))  # resize returns the result; assign it
cv2.imshow("frame", image)
cv2.waitKey(0)

it just gives plain black color.

Not only that, the output of the license plate model is the same for every image. So I'm guessing this is definitely because of some mistake I'm making in the SpatialTransformer, but I don't know what.

TensorFlow version: 1.15.2

Edit: I found the problem, but don't really know how to fix it. In the interpolate function within the SpatialTransformer, while calculating area_a, area_b, area_c and area_d, my values come out as
area_a = -area_b and area_c = -area_d. If anyone has any idea why this is happening or how to fix it, it'd be really helpful.
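
One mathematically consistent reading of that symptom (an assumption about the cause, not a verified fix): the corner weights are products of coordinate differences, so if the regressed transform pushes the sampling coordinates outside the image and clipping collapses floor(y) and ceil(y) onto the same value, the weight pairs come out with opposite signs, which is exactly what is reported. That again points at an exploding localization output, e.g. from unnormalized input images:

import numpy as np

x, y = 5.3, 7.6
x0, x1 = np.floor(x), np.floor(x) + 1.0
y0 = y1 = 7.0                 # what border clipping can produce
area_a = (x1 - x) * (y1 - y)  # -0.42
area_b = (x1 - x) * (y - y0)  #  0.42  -> area_a == -area_b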

inverse theta vector

Hello,
when I try to perform different, specific transformations with the SpatialTransformer, I have to set the corresponding inverse transformation matrix as the bias vector in the localization net. Do you know why the inverse transformation has to be set? Thanks.
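
This is a general property of STNs rather than something specific to this implementation: the grid generator performs backward warping. Theta maps each output pixel's coordinates to the input location that gets sampled,

x_in = theta[0] * x_out + theta[1] * y_out + theta[2]
y_in = theta[3] * x_out + theta[4] * y_out + theta[5]

so producing an output transformed by T means sampling the input along T^-1, which is why the bias has to hold the inverse matrix.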

Problem with Bilinear Interpolation

Hello,
I have tried your code on an image of size 48x48. I want to use the _transform function of the BilinearInterpolation class, whose definition is _transform(self, X, affine_transformation, output_size), but I am not getting what the value of affine_transformation should be: in STN.py the BilinearInterpolation layer is only initialized, and I was unable to find the _transform call. Please help me understand the code.
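
For a quick sanity check (a sketch, assuming _transform expects one flattened 2x3 matrix per batch element, as is standard for affine STNs): the identity parameters below should reproduce the input at the sampling size, and during training these values come from the locnet's Dense(6) output:

import numpy as np

batch_size = 4
identity_theta = np.tile(np.array([1, 0, 0, 0, 1, 0], dtype='float32'),
                         (batch_size, 1))  # shape (batch_size, 6)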
