
emgarr / kerod

DETR - Faster RCNN implementation in tensorflow 2

Home Page: https://emgarr.github.io/kerod/

License: MIT License

Languages: Makefile 0.02%, Python 82.24%, Jupyter Notebook 17.74%
Topics: coco, computer-vision, detection, detections, detr, faster-rcnn, feature-pyramid-network, instance-segmentation, object-detection, tensorflow, tensorflow2, transformer

kerod's Introduction

Hi there 👋

Who am I

Computer Vision Research Engineer with 6+ years of experience in deep learning, from research to the implementation and design of R&D infrastructure. You can check out my LinkedIn.

kerod's People

Contributors

emgarr


kerod's Issues

Another error when training DETR

Describe the bug
I got the following error; it seems something is wrong with the bipartite matching loss.
Do you have any idea?

To Reproduce
Run this notebook on Colab after fixing the tfa version and the number of GPUs:
https://colab.research.google.com/github/Emgarr/kerod/blob/master/notebooks/smca_coco_training_multi_gpu.ipynb


Screenshots

Epoch 1/50
WARNING:tensorflow:Using a while_loop for converting EagerPyFunc
(the warning above is repeated 24 times in total)
    176/Unknown - 3724s 21s/step - loss: 28.8533 - giou_last_layer: 1.9511 - l1_last_layer: 2.2543 - focal_loss_last_layer: 0.7390 - sparse_categorical_accuracy: 0.9078 - object_recall: 0.0000e+00
---------------------------------------------------------------------------
InvalidArgumentError                      Traceback (most recent call last)
<ipython-input-4-ee0ddafd652f> in <module>()
     28 ]
     29 
---> 30 history = model.fit(ds_train, validation_data=ds_val, epochs=50, callbacks=callbacks)
     31 model.save('detr_an_awesome_model')
     32 model.save_weights('detr_an_awesome_model.h5')

6 frames
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, validation_batch_size, validation_freq, max_queue_size, workers, use_multiprocessing)
   1098                 _r=1):
   1099               callbacks.on_train_batch_begin(step)
-> 1100               tmp_logs = self.train_function(iterator)
   1101               if data_handler.should_sync:
   1102                 context.async_wait()

/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/def_function.py in __call__(self, *args, **kwds)
    826     tracing_count = self.experimental_get_tracing_count()
    827     with trace.Trace(self._name) as tm:
--> 828       result = self._call(*args, **kwds)
    829       compiler = "xla" if self._experimental_compile else "nonXla"
    830       new_tracing_count = self.experimental_get_tracing_count()

/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/def_function.py in _call(self, *args, **kwds)
    853       # In this case we have created variables on the first call, so we run the
    854       # defunned version which is guaranteed to never create variables.
--> 855       return self._stateless_fn(*args, **kwds)  # pylint: disable=not-callable
    856     elif self._stateful_fn is not None:
    857       # Release the lock early so that multiple threads can perform the call

/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py in __call__(self, *args, **kwargs)
   2941        filtered_flat_args) = self._maybe_define_function(args, kwargs)
   2942     return graph_function._call_flat(
-> 2943         filtered_flat_args, captured_inputs=graph_function.captured_inputs)  # pylint: disable=protected-access
   2944 
   2945   @property

/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py in _call_flat(self, args, captured_inputs, cancellation_manager)
   1917       # No tape is watching; skip to running the function.
   1918       return self._build_call_outputs(self._inference_function.call(
-> 1919           ctx, args, cancellation_manager=cancellation_manager))
   1920     forward_backward = self._select_forward_and_backward_functions(
   1921         args,

/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py in call(self, ctx, args, cancellation_manager)
    558               inputs=args,
    559               attrs=attrs,
--> 560               ctx=ctx)
    561         else:
    562           outputs = execute.execute_with_cancellation(

/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/execute.py in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
     58     ctx.ensure_initialized()
     59     tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
---> 60                                         inputs, attrs, num_outputs)
     61   except core._NotOkStatusException as e:
     62     if name is not None:

InvalidArgumentError:  ValueError: matrix contains invalid numeric entries
Traceback (most recent call last):

  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/script_ops.py", line 247, in __call__
    return func(device, token, args)

  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/script_ops.py", line 135, in __call__
    ret = self._func(*args)

  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/autograph/impl/api.py", line 620, in wrapper
    return func(*args, **kwargs)

  File "/usr/local/lib/python3.6/dist-packages/kerod/core/matcher.py", line 181, in <lambda>
    return tf.py_function(lambda c: linear_sum_assignment(c), [cost_matrix],

  File "/usr/local/lib/python3.6/dist-packages/scipy/optimize/_lsap.py", line 93, in linear_sum_assignment
    raise ValueError("matrix contains invalid numeric entries")

ValueError: matrix contains invalid numeric entries


	 [[{{node loop_body/EagerPyFunc/pfor/while/body/_1/loop_body/EagerPyFunc/pfor/while/EagerPyFunc}}]] [Op:__inference_train_function_69153]

Function call stack:
train_function

Desktop (please complete the following information):
Colab notebook

Additional context
Sorry for asking so many questions!
Thanks in advance.
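The traceback above ends in scipy's `linear_sum_assignment` rejecting a cost matrix with non-finite entries (often a NaN produced upstream, e.g. by a degenerate box in the GIoU term). As a minimal sketch of how to localize or work around this, one can sanitize the matrix before handing it to the solver. The function name and penalty value below are illustrative assumptions, not part of kerod:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def safe_linear_sum_assignment(cost_matrix):
    """Run the Hungarian matcher after replacing non-finite cost entries."""
    cost = np.asarray(cost_matrix, dtype=np.float64)
    if not np.isfinite(cost).all():
        # A large finite penalty keeps the assignment solvable while making
        # the broken pairs effectively unselectable; logging here would help
        # trace where the NaN/Inf originates in the loss computation.
        cost = np.nan_to_num(cost, nan=1e6, posinf=1e6, neginf=-1e6)
    return linear_sum_assignment(cost)

# The NaN entry is penalized, so the solver picks the off-diagonal pairing.
rows, cols = safe_linear_sum_assignment([[np.nan, 1.0], [1.0, 2.0]])
```

This masks the symptom rather than fixing the root cause; the real fix is finding which loss term emits the NaN.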

Error on colab notebook

Describe the bug
The Colab notebook stops with the following error.

To Reproduce
Steps to reproduce the behavior:
Just run all cells of this notebook:
https://colab.research.google.com/github/Emgarr/kerod/blob/master/notebooks/smca_coco_training_multi_gpu.ipynb


Screenshots

INFO:tensorflow:Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0',)
Epoch 1/50
WARNING:tensorflow:Using a while_loop for converting EagerPyFunc
(the warning above is repeated 12 times in total)
INFO:tensorflow:Error reported to Coordinator: minimize() got an unexpected keyword argument 'tape'
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/coordinator.py", line 297, in stop_on_exception
    yield
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/distribute/mirrored_run.py", line 323, in run
    self.main_result = self.main_fn(*self.main_args, **self.main_kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/autograph/impl/api.py", line 667, in wrapper
    return converted_call(f, args, kwargs, options=options)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/autograph/impl/api.py", line 396, in converted_call
    return _call_unconverted(f, args, kwargs, options)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/autograph/impl/api.py", line 478, in _call_unconverted
    return f(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py", line 788, in run_step
    outputs = model.train_step(data)
  File "/usr/local/lib/python3.6/dist-packages/kerod/model/smca_detr.py", line 321, in train_step
    self.optimizer.minimize(loss, self.trainable_variables, tape=tape)
TypeError: minimize() got an unexpected keyword argument 'tape'
(the same error and traceback are reported a second time by the Coordinator)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-7-3de9ded0293d> in <module>()
     28 ]
     29 
---> 30 history = model.fit(ds_train, validation_data=ds_val, epochs=50, callbacks=callbacks)
     31 model.save('detr_an_awesome_model')
     32 model.save_weights('detr_an_awesome_model.h5')

9 frames
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/func_graph.py in wrapper(*args, **kwargs)
    975           except Exception as e:  # pylint:disable=broad-except
    976             if hasattr(e, "ag_error_metadata"):
--> 977               raise e.ag_error_metadata.to_exception(e)
    978             else:
    979               raise

TypeError: in user code:

    /usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py:805 train_function  *
        return step_function(self, iterator)
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py:795 step_function  **
        outputs = model.distribute_strategy.run(run_step, args=(data,))
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/distribute/distribute_lib.py:1259 run
        return self._extended.call_for_each_replica(fn, args=args, kwargs=kwargs)
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/distribute/distribute_lib.py:2730 call_for_each_replica
        return self._call_for_each_replica(fn, args, kwargs)
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/distribute/mirrored_strategy.py:629 _call_for_each_replica
        self._container_strategy(), fn, args, kwargs)
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/distribute/mirrored_run.py:93 call_for_each_replica
        return _call_for_each_replica(strategy, fn, args, kwargs)
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/distribute/mirrored_run.py:234 _call_for_each_replica
        coord.join(threads)
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/training/coordinator.py:389 join
        six.reraise(*self._exc_info_to_raise)
    /usr/local/lib/python3.6/dist-packages/six.py:703 reraise
        raise value
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/training/coordinator.py:297 stop_on_exception
        yield
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/distribute/mirrored_run.py:323 run
        self.main_result = self.main_fn(*self.main_args, **self.main_kwargs)
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py:788 run_step  **
        outputs = model.train_step(data)
    /usr/local/lib/python3.6/dist-packages/kerod/model/smca_detr.py:321 train_step
        self.optimizer.minimize(loss, self.trainable_variables, tape=tape)

    TypeError: minimize() got an unexpected keyword argument 'tape'

Desktop (please complete the following information):
Colab notebook

Additional context
Thanks for sharing this great work. I was surprised to find an SMCA implementation!
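The `TypeError` above arises because the `tape=` keyword of `Optimizer.minimize` only exists in newer TensorFlow 2.x releases, while the Colab runtime here is on an older version. A version-independent sketch of the same update, using a toy variable and loss as stand-ins for kerod's model, would be:

```python
import tensorflow as tf

w = tf.Variable(2.0)
optimizer = tf.keras.optimizers.SGD(learning_rate=0.1)

with tf.GradientTape() as tape:
    loss = tf.square(w)  # stand-in for the DETR training loss

# Equivalent to optimizer.minimize(loss, [w], tape=tape), but works on
# all TF 2.x versions: compute gradients explicitly, then apply them.
grads = tape.gradient(loss, [w])
optimizer.apply_gradients(zip(grads, [w]))
```

The alternative fix is simply pinning the TensorFlow version the notebook was written against.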

loss didn't decrease

Describe the bug
After fixing the learning rate, I ran the DETR training, but the loss doesn't seem to decrease at all.
I know DETR converges slowly, but is this loss behavior expected?

To Reproduce
Run this notebook:
https://colab.research.google.com/github/Emgarr/kerod/blob/master/notebooks/detr_coco_training_multi_gpu.ipynb


Screenshots

Epoch 1/300
WARNING:tensorflow:Using a while_loop for converting EagerPyFunc
(the warning above is repeated 24 times in total)
  34458/Unknown - 16564s 479ms/step - loss: 31.4098 - giou_last_layer: 1.7223 - l1_last_layer: 1.3152 - scc_last_layer: 2.1834 - sparse_categorical_accuracy: 0.5316 - object_recall: 5.6169e-04WARNING:tensorflow:Using a while_loop for converting EagerPyFunc
(the warning is repeated 11 more times)
34458/34458 [==============================] - 16938s 490ms/step - loss: 31.4098 - giou_last_layer: 1.7223 - l1_last_layer: 1.3152 - scc_last_layer: 2.1834 - sparse_categorical_accuracy: 0.5316 - object_recall: 5.6169e-04 - val_loss: 31.0949 - val_giou_last_layer: 1.7153 - val_l1_last_layer: 1.2824 - val_scc_last_layer: 2.1859 - val_sparse_categorical_accuracy: 0.5282 - val_object_recall: 0.0000e+00
Epoch 2/300
34458/34458 [==============================] - 16649s 483ms/step - loss: 31.5205 - giou_last_layer: 1.7350 - l1_last_layer: 1.3259 - scc_last_layer: 2.1788 - sparse_categorical_accuracy: 0.5319 - object_recall: 0.0000e+00 - val_loss: 31.5231 - val_giou_last_layer: 1.7450 - val_l1_last_layer: 1.2715 - val_scc_last_layer: 2.2023 - val_sparse_categorical_accuracy: 0.5282 - val_object_recall: 0.0000e+00
Epoch 3/300
34458/34458 [==============================] - 15912s 462ms/step - loss: 31.5544 - giou_last_layer: 1.7355 - l1_last_layer: 1.3301 - scc_last_layer: 2.1814 - sparse_categorical_accuracy: 0.5319 - object_recall: 0.0000e+00 - val_loss: 31.5587 - val_giou_last_layer: 1.7398 - val_l1_last_layer: 1.2982 - val_scc_last_layer: 2.1964 - val_sparse_categorical_accuracy: 0.5282 - val_object_recall: 0.0000e+00
Epoch 4/300
34458/34458 [==============================] - 15974s 463ms/step - loss: 31.5491 - giou_last_layer: 1.7391 - l1_last_layer: 1.3330 - scc_last_layer: 2.1796 - sparse_categorical_accuracy: 0.5319 - object_recall: 0.0000e+00 - val_loss: 31.4192 - val_giou_last_layer: 1.7525 - val_l1_last_layer: 1.3120 - val_scc_last_layer: 2.1949 - val_sparse_categorical_accuracy: 0.5282 - val_object_recall: 0.0000e+00
Epoch 5/300
34458/34458 [==============================] - 16581s 481ms/step - loss: 31.4819 - giou_last_layer: 1.7322 - l1_last_layer: 1.3308 - scc_last_layer: 2.1796 - sparse_categorical_accuracy: 0.5319 - object_recall: 0.0000e+00 - val_loss: 31.6360 - val_giou_last_layer: 1.7783 - val_l1_last_layer: 1.3163 - val_scc_last_layer: 2.1977 - val_sparse_categorical_accuracy: 0.5282 - val_object_recall: 0.0000e+00
Epoch 6/300
 1580/34458 [>.............................] - ETA: 4:21:11 - loss: 31.5871 - giou_last_layer: 1.7425 - l1_last_layer: 1.3287 - scc_last_layer: 2.1863 - sparse_categorical_accuracy: 0.5323 - object_recall: 0.0000e+00

Desktop (please complete the following information):
Colab notebook


Custom dataset with "from_generator"

Hi!

First off, thanks for the great work with a TF2 compatible DETR code!

I've been working on object detection using EfficientDet and my own dataset. The dataset consists of synthetic data from a generator function, something along the lines of:

def generator():
    while True:
        # Do work

        # yield the image, the label, and the bounding box
        yield (image, ([label], [[x1, y1, x2, y2]]))

My initial tensorflow dataset object is then made from:

output_shapes = (tf.TensorShape([512, 512, 3]),
                 (tf.TensorShape([None]),
                  tf.TensorShape([None, 4])))

ds = tf.data.Dataset.from_generator(generator=generator,
                                    output_types=(tf.float32, (tf.int32, tf.float32)),
                                    output_shapes=output_shapes)

However, the output of the tensorflow_datasets loaders is in the form of a dict.
Is there any way to get a custom generator to work with your current code?

Thanks!!
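One way to bridge the gap is to map the tuple-yielding dataset into a tensorflow_datasets-style nested dict. The key names below (`image`, `objects`, `bbox`, `label`) follow tfds' COCO layout and are an assumption; the actual keys kerod's input pipeline expects should be checked in its code:

```python
import tensorflow as tf

def to_tfds_dict(image, targets):
    """Repack (image, ([labels], [boxes])) into a tfds-style nested dict.
    Key names assume the tfds COCO layout; verify against kerod's pipeline."""
    labels, boxes = targets
    return {
        'image': image,
        'objects': {
            'bbox': boxes,    # [N, 4] boxes
            'label': labels,  # [N] class ids
        },
    }

# Applied to the from_generator dataset above: ds = ds.map(to_tfds_dict)
# Tiny self-contained demo with one dummy element:
demo = tf.data.Dataset.from_tensors(
    (tf.zeros([4, 4, 3]), (tf.constant([1]), tf.constant([[0.0, 0.0, 1.0, 1.0]]))))
element = next(iter(demo.map(to_tfds_dict)))
```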

Is there any way to run this repo on TPU?

Is your feature request related to a problem? Please describe.
I saw a notebook with TPU in its name, but now I can't find it.
Is there any way to train DETR on TPU?
Thanks,
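For reference, the standard TF 2.x TPU setup on Colab looks like the sketch below, with a fallback so the same script also runs on CPU/GPU. Whether kerod's training loop is actually TPU-compatible is a separate question this snippet does not answer; in particular, its matcher goes through `tf.py_function`, which does not execute on TPU:

```python
import tensorflow as tf

try:
    # On Colab this auto-detects the TPU runtime; without one it raises.
    resolver = tf.distribute.cluster_resolver.TPUClusterResolver()
    tf.config.experimental_connect_to_cluster(resolver)
    tf.tpu.experimental.initialize_tpu_system(resolver)
    strategy = tf.distribute.TPUStrategy(resolver)
except (ValueError, tf.errors.NotFoundError):
    strategy = tf.distribute.get_strategy()  # default CPU/GPU strategy

# The model would then be built under the strategy scope, e.g.:
# with strategy.scope():
#     model = ...  # build the kerod model here
```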
