
emgarr / kerod

DETR - Faster RCNN implementation in tensorflow 2

Home Page: https://emgarr.github.io/kerod/

License: MIT License

Languages: Makefile 0.02%, Python 82.24%, Jupyter Notebook 17.74%
Topics: coco, computer-vision, detection, detections, detr, faster-rcnn, feature-pyramid-network, instance-segmentation, object-detection, tensorflow, tensorflow2, transformer

kerod's Introduction

Hi there 👋

Who am I

Computer Vision Research Engineer with 6+ years of experience in deep learning, from research to the implementation and design of R&D infrastructure. You can check out my LinkedIn.

kerod's People

Contributors

emgarr


kerod's Issues

Another error when training DETR

Describe the bug
I got the following error; it seems something is wrong with the bipartite matching loss.
Do you have any idea?

To Reproduce
Run this notebook on Colab after fixing the tfa version and the number of GPUs:
https://colab.research.google.com/github/Emgarr/kerod/blob/master/notebooks/smca_coco_training_multi_gpu.ipynb


Screenshots

Epoch 1/50
WARNING:tensorflow:Using a while_loop for converting EagerPyFunc
(the warning above is repeated 24 times in total)
    176/Unknown - 3724s 21s/step - loss: 28.8533 - giou_last_layer: 1.9511 - l1_last_layer: 2.2543 - focal_loss_last_layer: 0.7390 - sparse_categorical_accuracy: 0.9078 - object_recall: 0.0000e+00
---------------------------------------------------------------------------
InvalidArgumentError                      Traceback (most recent call last)
<ipython-input-4-ee0ddafd652f> in <module>()
     28 ]
     29 
---> 30 history = model.fit(ds_train, validation_data=ds_val, epochs=50, callbacks=callbacks)
     31 model.save('detr_an_awesome_model')
     32 model.save_weights('detr_an_awesome_model.h5')

6 frames
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, validation_batch_size, validation_freq, max_queue_size, workers, use_multiprocessing)
   1098                 _r=1):
   1099               callbacks.on_train_batch_begin(step)
-> 1100               tmp_logs = self.train_function(iterator)
   1101               if data_handler.should_sync:
   1102                 context.async_wait()

/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/def_function.py in __call__(self, *args, **kwds)
    826     tracing_count = self.experimental_get_tracing_count()
    827     with trace.Trace(self._name) as tm:
--> 828       result = self._call(*args, **kwds)
    829       compiler = "xla" if self._experimental_compile else "nonXla"
    830       new_tracing_count = self.experimental_get_tracing_count()

/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/def_function.py in _call(self, *args, **kwds)
    853       # In this case we have created variables on the first call, so we run the
    854       # defunned version which is guaranteed to never create variables.
--> 855       return self._stateless_fn(*args, **kwds)  # pylint: disable=not-callable
    856     elif self._stateful_fn is not None:
    857       # Release the lock early so that multiple threads can perform the call

/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py in __call__(self, *args, **kwargs)
   2941        filtered_flat_args) = self._maybe_define_function(args, kwargs)
   2942     return graph_function._call_flat(
-> 2943         filtered_flat_args, captured_inputs=graph_function.captured_inputs)  # pylint: disable=protected-access
   2944 
   2945   @property

/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py in _call_flat(self, args, captured_inputs, cancellation_manager)
   1917       # No tape is watching; skip to running the function.
   1918       return self._build_call_outputs(self._inference_function.call(
-> 1919           ctx, args, cancellation_manager=cancellation_manager))
   1920     forward_backward = self._select_forward_and_backward_functions(
   1921         args,

/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py in call(self, ctx, args, cancellation_manager)
    558               inputs=args,
    559               attrs=attrs,
--> 560               ctx=ctx)
    561         else:
    562           outputs = execute.execute_with_cancellation(

/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/execute.py in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
     58     ctx.ensure_initialized()
     59     tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
---> 60                                         inputs, attrs, num_outputs)
     61   except core._NotOkStatusException as e:
     62     if name is not None:

InvalidArgumentError:  ValueError: matrix contains invalid numeric entries
Traceback (most recent call last):

  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/script_ops.py", line 247, in __call__
    return func(device, token, args)

  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/script_ops.py", line 135, in __call__
    ret = self._func(*args)

  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/autograph/impl/api.py", line 620, in wrapper
    return func(*args, **kwargs)

  File "/usr/local/lib/python3.6/dist-packages/kerod/core/matcher.py", line 181, in <lambda>
    return tf.py_function(lambda c: linear_sum_assignment(c), [cost_matrix],

  File "/usr/local/lib/python3.6/dist-packages/scipy/optimize/_lsap.py", line 93, in linear_sum_assignment
    raise ValueError("matrix contains invalid numeric entries")

ValueError: matrix contains invalid numeric entries


	 [[{{node loop_body/EagerPyFunc/pfor/while/body/_1/loop_body/EagerPyFunc/pfor/while/EagerPyFunc}}]] [Op:__inference_train_function_69153]

Function call stack:
train_function

Desktop (please complete the following information):
Colab notebook

Additional context
Sorry for asking so many questions!
Thanks in advance.
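The traceback above ends in scipy's `linear_sum_assignment` rejecting a cost matrix with non-finite entries (often a NaN produced upstream, e.g. by a degenerate box in the GIoU term). As a minimal sketch of how to localize or work around this, one can sanitize the matrix before handing it to the solver. The function name and penalty value below are illustrative assumptions, not part of kerod:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def safe_linear_sum_assignment(cost_matrix):
    """Run the Hungarian matcher after replacing non-finite cost entries."""
    cost = np.asarray(cost_matrix, dtype=np.float64)
    if not np.isfinite(cost).all():
        # A large finite penalty keeps the assignment solvable while making
        # the broken pairs effectively unselectable; logging here would help
        # trace where the NaN/Inf originates in the loss computation.
        cost = np.nan_to_num(cost, nan=1e6, posinf=1e6, neginf=-1e6)
    return linear_sum_assignment(cost)

# The NaN entry is penalized, so the solver picks the off-diagonal pairing.
rows, cols = safe_linear_sum_assignment([[np.nan, 1.0], [1.0, 2.0]])
```

This masks the symptom rather than fixing the root cause; the real fix is finding which loss term emits the NaN.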

Error on colab notebook

Describe the bug
The Colab notebook stops with the following error.

To Reproduce
Steps to reproduce the behavior:
Just run all cells of this notebook:
https://colab.research.google.com/github/Emgarr/kerod/blob/master/notebooks/smca_coco_training_multi_gpu.ipynb


Screenshots

INFO:tensorflow:Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0',)
Epoch 1/50
WARNING:tensorflow:Using a while_loop for converting EagerPyFunc
(the warning above is repeated 12 times in total)
INFO:tensorflow:Error reported to Coordinator: minimize() got an unexpected keyword argument 'tape'
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/coordinator.py", line 297, in stop_on_exception
    yield
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/distribute/mirrored_run.py", line 323, in run
    self.main_result = self.main_fn(*self.main_args, **self.main_kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/autograph/impl/api.py", line 667, in wrapper
    return converted_call(f, args, kwargs, options=options)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/autograph/impl/api.py", line 396, in converted_call
    return _call_unconverted(f, args, kwargs, options)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/autograph/impl/api.py", line 478, in _call_unconverted
    return f(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py", line 788, in run_step
    outputs = model.train_step(data)
  File "/usr/local/lib/python3.6/dist-packages/kerod/model/smca_detr.py", line 321, in train_step
    self.optimizer.minimize(loss, self.trainable_variables, tape=tape)
TypeError: minimize() got an unexpected keyword argument 'tape'
(the same error and traceback are reported a second time by the Coordinator)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-7-3de9ded0293d> in <module>()
     28 ]
     29 
---> 30 history = model.fit(ds_train, validation_data=ds_val, epochs=50, callbacks=callbacks)
     31 model.save('detr_an_awesome_model')
     32 model.save_weights('detr_an_awesome_model.h5')

9 frames
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/func_graph.py in wrapper(*args, **kwargs)
    975           except Exception as e:  # pylint:disable=broad-except
    976             if hasattr(e, "ag_error_metadata"):
--> 977               raise e.ag_error_metadata.to_exception(e)
    978             else:
    979               raise

TypeError: in user code:

    /usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py:805 train_function  *
        return step_function(self, iterator)
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py:795 step_function  **
        outputs = model.distribute_strategy.run(run_step, args=(data,))
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/distribute/distribute_lib.py:1259 run
        return self._extended.call_for_each_replica(fn, args=args, kwargs=kwargs)
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/distribute/distribute_lib.py:2730 call_for_each_replica
        return self._call_for_each_replica(fn, args, kwargs)
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/distribute/mirrored_strategy.py:629 _call_for_each_replica
        self._container_strategy(), fn, args, kwargs)
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/distribute/mirrored_run.py:93 call_for_each_replica
        return _call_for_each_replica(strategy, fn, args, kwargs)
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/distribute/mirrored_run.py:234 _call_for_each_replica
        coord.join(threads)
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/training/coordinator.py:389 join
        six.reraise(*self._exc_info_to_raise)
    /usr/local/lib/python3.6/dist-packages/six.py:703 reraise
        raise value
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/training/coordinator.py:297 stop_on_exception
        yield
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/distribute/mirrored_run.py:323 run
        self.main_result = self.main_fn(*self.main_args, **self.main_kwargs)
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py:788 run_step  **
        outputs = model.train_step(data)
    /usr/local/lib/python3.6/dist-packages/kerod/model/smca_detr.py:321 train_step
        self.optimizer.minimize(loss, self.trainable_variables, tape=tape)

    TypeError: minimize() got an unexpected keyword argument 'tape'

Desktop (please complete the following information):
Colab notebook

Additional context
Thanks for sharing this great work. I was surprised to find an SMCA implementation!
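The `TypeError` above arises because the `tape=` keyword of `Optimizer.minimize` only exists in newer TensorFlow 2.x releases, while the Colab runtime here is on an older version. A version-independent sketch of the same update, using a toy variable and loss as stand-ins for kerod's model, would be:

```python
import tensorflow as tf

w = tf.Variable(2.0)
optimizer = tf.keras.optimizers.SGD(learning_rate=0.1)

with tf.GradientTape() as tape:
    loss = tf.square(w)  # stand-in for the DETR training loss

# Equivalent to optimizer.minimize(loss, [w], tape=tape), but works on
# all TF 2.x versions: compute gradients explicitly, then apply them.
grads = tape.gradient(loss, [w])
optimizer.apply_gradients(zip(grads, [w]))
```

The alternative fix is simply pinning the TensorFlow version the notebook was written against.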

loss didn't decrease

Describe the bug
After fixing the learning rate, I ran the DETR training, but the loss doesn't seem to decrease at all.
I know DETR converges slowly, but is this loss behavior expected?

To Reproduce
Run this notebook:
https://colab.research.google.com/github/Emgarr/kerod/blob/master/notebooks/detr_coco_training_multi_gpu.ipynb


Screenshots

Epoch 1/300
WARNING:tensorflow:Using a while_loop for converting EagerPyFunc
(the warning above is repeated 24 times in total)
  34458/Unknown - 16564s 479ms/step - loss: 31.4098 - giou_last_layer: 1.7223 - l1_last_layer: 1.3152 - scc_last_layer: 2.1834 - sparse_categorical_accuracy: 0.5316 - object_recall: 5.6169e-04WARNING:tensorflow:Using a while_loop for converting EagerPyFunc
(the warning is repeated 11 more times)
34458/34458 [==============================] - 16938s 490ms/step - loss: 31.4098 - giou_last_layer: 1.7223 - l1_last_layer: 1.3152 - scc_last_layer: 2.1834 - sparse_categorical_accuracy: 0.5316 - object_recall: 5.6169e-04 - val_loss: 31.0949 - val_giou_last_layer: 1.7153 - val_l1_last_layer: 1.2824 - val_scc_last_layer: 2.1859 - val_sparse_categorical_accuracy: 0.5282 - val_object_recall: 0.0000e+00
Epoch 2/300
34458/34458 [==============================] - 16649s 483ms/step - loss: 31.5205 - giou_last_layer: 1.7350 - l1_last_layer: 1.3259 - scc_last_layer: 2.1788 - sparse_categorical_accuracy: 0.5319 - object_recall: 0.0000e+00 - val_loss: 31.5231 - val_giou_last_layer: 1.7450 - val_l1_last_layer: 1.2715 - val_scc_last_layer: 2.2023 - val_sparse_categorical_accuracy: 0.5282 - val_object_recall: 0.0000e+00
Epoch 3/300
34458/34458 [==============================] - 15912s 462ms/step - loss: 31.5544 - giou_last_layer: 1.7355 - l1_last_layer: 1.3301 - scc_last_layer: 2.1814 - sparse_categorical_accuracy: 0.5319 - object_recall: 0.0000e+00 - val_loss: 31.5587 - val_giou_last_layer: 1.7398 - val_l1_last_layer: 1.2982 - val_scc_last_layer: 2.1964 - val_sparse_categorical_accuracy: 0.5282 - val_object_recall: 0.0000e+00
Epoch 4/300
34458/34458 [==============================] - 15974s 463ms/step - loss: 31.5491 - giou_last_layer: 1.7391 - l1_last_layer: 1.3330 - scc_last_layer: 2.1796 - sparse_categorical_accuracy: 0.5319 - object_recall: 0.0000e+00 - val_loss: 31.4192 - val_giou_last_layer: 1.7525 - val_l1_last_layer: 1.3120 - val_scc_last_layer: 2.1949 - val_sparse_categorical_accuracy: 0.5282 - val_object_recall: 0.0000e+00
Epoch 5/300
34458/34458 [==============================] - 16581s 481ms/step - loss: 31.4819 - giou_last_layer: 1.7322 - l1_last_layer: 1.3308 - scc_last_layer: 2.1796 - sparse_categorical_accuracy: 0.5319 - object_recall: 0.0000e+00 - val_loss: 31.6360 - val_giou_last_layer: 1.7783 - val_l1_last_layer: 1.3163 - val_scc_last_layer: 2.1977 - val_sparse_categorical_accuracy: 0.5282 - val_object_recall: 0.0000e+00
Epoch 6/300
 1580/34458 [>.............................] - ETA: 4:21:11 - loss: 31.5871 - giou_last_layer: 1.7425 - l1_last_layer: 1.3287 - scc_last_layer: 2.1863 - sparse_categorical_accuracy: 0.5323 - object_recall: 0.0000e+00

Desktop (please complete the following information):
Colab notebook


Custom dataset with "from_generator"

Hi!

First off, thanks for the great work with a TF2 compatible DETR code!

I've been working on object detection using EfficientDet and my own dataset. The dataset consists of synthetic data from a generator function, something along the lines of:

def generator():
    while True:
        # Do work

        # yield the image, the label, and the bounding box
        yield (image, ([label], [[x1, y1, x2, y2]]))

My initial tensorflow dataset object is then made from:

output_shapes = (tf.TensorShape([512, 512, 3]),
                 (tf.TensorShape([None]),
                  tf.TensorShape([None, 4])))

ds = tf.data.Dataset.from_generator(generator=generator,
                                    output_types=(tf.float32, (tf.int32, tf.float32)),
                                    output_shapes=output_shapes)

However, the output of the tensorflow_datasets loaders is in the form of a dict.
Is there any way to get a custom generator to work with your current code?

Thanks!!
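One way to bridge the gap is to map the tuple-yielding dataset into a tensorflow_datasets-style nested dict. The key names below (`image`, `objects`, `bbox`, `label`) follow tfds' COCO layout and are an assumption; the actual keys kerod's input pipeline expects should be checked in its code:

```python
import tensorflow as tf

def to_tfds_dict(image, targets):
    """Repack (image, ([labels], [boxes])) into a tfds-style nested dict.
    Key names assume the tfds COCO layout; verify against kerod's pipeline."""
    labels, boxes = targets
    return {
        'image': image,
        'objects': {
            'bbox': boxes,    # [N, 4] boxes
            'label': labels,  # [N] class ids
        },
    }

# Applied to the from_generator dataset above: ds = ds.map(to_tfds_dict)
# Tiny self-contained demo with one dummy element:
demo = tf.data.Dataset.from_tensors(
    (tf.zeros([4, 4, 3]), (tf.constant([1]), tf.constant([[0.0, 0.0, 1.0, 1.0]]))))
element = next(iter(demo.map(to_tfds_dict)))
```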

Is there any way to run this repo on TPU?

Is your feature request related to a problem? Please describe.
I saw a notebook with TPU in its name, but now I can't find it.
Is there any way to train DETR on TPU?
Thanks,
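For reference, the standard TF 2.x TPU setup on Colab looks like the sketch below, with a fallback so the same script also runs on CPU/GPU. Whether kerod's training loop is actually TPU-compatible is a separate question this snippet does not answer; in particular, its matcher goes through `tf.py_function`, which does not execute on TPU:

```python
import tensorflow as tf

try:
    # On Colab this auto-detects the TPU runtime; without one it raises.
    resolver = tf.distribute.cluster_resolver.TPUClusterResolver()
    tf.config.experimental_connect_to_cluster(resolver)
    tf.tpu.experimental.initialize_tpu_system(resolver)
    strategy = tf.distribute.TPUStrategy(resolver)
except (ValueError, tf.errors.NotFoundError):
    strategy = tf.distribute.get_strategy()  # default CPU/GPU strategy

# The model would then be built under the strategy scope, e.g.:
# with strategy.scope():
#     model = ...  # build the kerod model here
```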
