idea-research / dab-detr

[ICLR 2022] Official implementation of the paper "DAB-DETR: Dynamic Anchor Boxes are Better Queries for DETR"

License: Apache License 2.0

Languages: Python 18.97%, Shell 0.06%, C++ 0.30%, CUDA 2.99%, Jupyter Notebook 77.68%
Topics: dab-detr, detection, detr, transformer

dab-detr's People

Contributors

developer0hye, rentainhe, seungyonglee0802, slongliu, whatchang, xu-justin


dab-detr's Issues

ONNX model generation

Could you please convert your model to ONNX? I want to test it with TensorRT for inference.
I am trying to convert it to ONNX myself but get the following error:
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0, and cpu! (when checking argument for argument index in method wrapper__index_select)
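For reference, a hedged sketch of a typical export recipe (not an official path from this repo): keeping the model and the dummy input on one device before tracing avoids this kind of cuda:0 / cpu mismatch. Names and shapes below are assumptions, and DAB-DETR's forward may expect a NestedTensor, so a thin wrapper module may be needed on top.

import torch

# Hypothetical export sketch -- `model` is assumed to be a built DAB-DETR instance.
device = torch.device("cpu")                         # export on a single device
model = model.to(device).eval()
dummy = torch.randn(1, 3, 800, 800, device=device)   # assumed input shape

torch.onnx.export(
    model, dummy, "dab_detr.onnx",
    opset_version=13,                                # assumption; match your TensorRT
    input_names=["images"],
    output_names=["logits", "boxes"],                # hypothetical output names
    dynamic_axes={"images": {0: "batch"}},
)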

num_patterns in DAB-Deformable-DETR

Thanks for the great contribution to the literature.

I have a quick question: is there a specific reason for not using pattern embedding in DAB-Deformable-DETR? Is pattern embedding not applicable to the deformable structure? Perhaps it would not increase the performance, but I haven't seen any ablation study related to it.

Thanks in advance

PostProcess

During post-processing, is it possible for multiple labels from the same query to appear in the top-k? How should this situation be handled?
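For context, a minimal sketch (toy shapes) of the DETR-style top-k selection used in post-processing: the top-k runs over the flattened (num_queries × num_classes) scores, so one query can indeed contribute several (box, label) pairs when it has several high-scoring classes.

import torch

out_logits = torch.randn(2, 300, 91)               # (batch, num_queries, num_classes)
prob = out_logits.sigmoid()
scores, topk_indexes = torch.topk(prob.view(prob.shape[0], -1), k=100, dim=1)
query_ids = torch.div(topk_indexes, out_logits.shape[2], rounding_mode='floor')
labels = topk_indexes % out_logits.shape[2]        # the same query id may repeat here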

Temperature tuning and positional embeddings for DAB-DETR

In the paper, there is a section saying that the optimal temperature for the positional embedding in your model is 20. However, this line in gen_sineembed_for_position indicates that a value of 10000 is used for the temperature. Is there a part I missed when trying to understand the code?

Besides, the paper also says that only the x and y coordinates are used to generate the positional embedding for the cross-attention, but this line, despite the comment num_queries x batch_size x 2, actually operates on a num_queries x batch_size x 4 tensor if you print its shape. Does this perform better than using only x and y, or is the performance similar?
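For reference, a simplified sketch (names and shapes are assumptions, condensed from the shape of gen_sineembed_for_position) showing where the temperature enters the sinusoidal embedding:

import math
import torch

def sine_embed(coord, num_pos_feats=128, temperature=10000):
    # coord: (...,) tensor of normalized coordinates in [0, 1]
    dim_t = torch.arange(num_pos_feats, dtype=torch.float32)
    dim_t = temperature ** (2 * (dim_t // 2) / num_pos_feats)
    pos = coord[..., None] * 2 * math.pi / dim_t     # (..., num_pos_feats)
    # interleave sin/cos over channel pairs, then flatten back to num_pos_feats
    return torch.stack((pos[..., 0::2].sin(), pos[..., 1::2].cos()), dim=-1).flatten(-2)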

Questions about some configurations in DAB-Deformable-DETR

Thanks for your great work. I have some questions about the implementation of DAB-Deformable DETR.

  1. In DAB-DETR the position embedding is sinehw, while DAB-Deformable-DETR uses the original sine. Is there a reason for this difference?
  2. I found that the configuration uses a larger dim_feedforward=2048. How does it perform with 1024?
  3. Have you experimented with the two-stage setting of Deformable-DETR? Could you share the results?

About plot_logs

Hello, thanks for your great work!

When I finish training and get log.txt, I want to visualize it using plot_logs, as follows:

[screenshot of the plot_logs call omitted]

But I get an ERROR on this line:

https://github.com/IDEA-opensource/DAB-DETR/blob/11d1948565b71e6622d01eccca0b9f08022cdc82/util/plot_utils.py#L65

Traceback (most recent call last):
  File "H:/yjs/code/DAB-DETR-main/tmp.py", line 53, in <module>
    fig, axs = plot_logs(log_path)
  File "H:\yjs\code\DAB-DETR-main\util\plot_utils.py", line 65, in plot_logs
    df.interpolate().ewm(com=ewm_col).mean().plot(
  File "H:\Anaconda\lib\site-packages\pandas\util\_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)
  File "H:\Anaconda\lib\site-packages\pandas\core\frame.py", line 10712, in interpolate
    return super().interpolate(
  File "H:\Anaconda\lib\site-packages\pandas\core\generic.py", line 6899, in interpolate
    new_data = obj._mgr.interpolate(
  File "H:\Anaconda\lib\site-packages\pandas\core\internals\managers.py", line 377, in interpolate
    return self.apply("interpolate", **kwargs)
  File "H:\Anaconda\lib\site-packages\pandas\core\internals\managers.py", line 327, in apply
    applied = getattr(b, f)(**kwargs)
  File "H:\Anaconda\lib\site-packages\pandas\core\internals\blocks.py", line 1369, in interpolate
    new_values = values.fillna(value=fill_value, method=method, limit=limit)
  File "H:\Anaconda\lib\site-packages\pandas\core\arrays\_mixins.py", line 218, in fillna
    value, method = validate_fillna_kwargs(
  File "H:\Anaconda\lib\site-packages\pandas\util\_validators.py", line 372, in validate_fillna_kwargs
    method = clean_fill_method(method)
  File "H:\Anaconda\lib\site-packages\pandas\core\missing.py", line 120, in clean_fill_method
    raise ValueError(f"Invalid fill method. Expecting {expecting}. Got {method}")
ValueError: Invalid fill method. Expecting pad (ffill) or backfill (bfill). Got linear

My pandas version is 1.3.5.

I don't know whether I am using it the wrong way or whether it is a bug in pandas; how can I fix it?
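A hedged workaround (not an official fix): pandas >= 1.3 rejects interpolate() on object-dtype columns, and log.txt contains list- and string-valued fields, so selecting only the numeric columns first may avoid this error. The path and smoothing parameter below are placeholders.

import pandas as pd
from pathlib import Path

df = pd.read_json(Path("log.txt"), lines=True)   # hypothetical path to the log
df = df.select_dtypes(include="number")          # drop list-/string-valued fields
smoothed = df.interpolate().ewm(com=0).mean()    # com=0 as a stand-in for ewm_col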

nvcc fatal : Unsupported gpu architecture 'compute_86'

When I set up the deformable multi-head attention module, it reports the error below:

nvcc fatal : Unsupported gpu architecture 'compute_86'
Traceback (most recent call last):
  File "/data/queenie_liu/anaconda3/envs/mmTrans/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1723, in _run_ninja_build
    env=env)
  File "/data/queenie_liu/anaconda3/envs/mmTrans/lib/python3.7/subprocess.py", line 512, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "setup.py", line 72, in <module>
    cmdclass={"build_ext": torch.utils.cpp_extension.BuildExtension},
  File "/data/queenie_liu/anaconda3/envs/mmTrans/lib/python3.7/site-packages/setuptools/__init__.py", line 153, in setup
    return distutils.core.setup(**attrs)
  File "/data/queenie_liu/anaconda3/envs/mmTrans/lib/python3.7/distutils/core.py", line 148, in setup
    dist.run_commands()
  File "/data/queenie_liu/anaconda3/envs/mmTrans/lib/python3.7/distutils/dist.py", line 966, in run_commands
    self.run_command(cmd)
  File "/data/queenie_liu/anaconda3/envs/mmTrans/lib/python3.7/distutils/dist.py", line 985, in run_command
    cmd_obj.run()
  File "/data/queenie_liu/anaconda3/envs/mmTrans/lib/python3.7/distutils/command/build.py", line 135, in run
    self.run_command(cmd_name)
  File "/data/queenie_liu/anaconda3/envs/mmTrans/lib/python3.7/distutils/cmd.py", line 313, in run_command
    self.distribution.run_command(command)
  File "/data/queenie_liu/anaconda3/envs/mmTrans/lib/python3.7/distutils/dist.py", line 985, in run_command
    cmd_obj.run()
  File "/data/queenie_liu/anaconda3/envs/mmTrans/lib/python3.7/site-packages/setuptools/command/build_ext.py", line 79, in run
    _build_ext.run(self)
  File "/data/queenie_liu/anaconda3/envs/mmTrans/lib/python3.7/distutils/command/build_ext.py", line 340, in run
    self.build_extensions()
  File "/data/queenie_liu/anaconda3/envs/mmTrans/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 735, in build_extensions
    build_ext.build_extensions(self)
  File "/data/queenie_liu/anaconda3/envs/mmTrans/lib/python3.7/distutils/command/build_ext.py", line 449, in build_extensions
    self._build_extensions_serial()
  File "/data/queenie_liu/anaconda3/envs/mmTrans/lib/python3.7/distutils/command/build_ext.py", line 474, in _build_extensions_serial
    self.build_extension(ext)
  File "/data/queenie_liu/anaconda3/envs/mmTrans/lib/python3.7/site-packages/setuptools/command/build_ext.py", line 202, in build_extension
    _build_ext.build_extension(self, ext)
  File "/data/queenie_liu/anaconda3/envs/mmTrans/lib/python3.7/distutils/command/build_ext.py", line 534, in build_extension
    depends=ext.depends)
  File "/data/queenie_liu/anaconda3/envs/mmTrans/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 565, in unix_wrap_ninja_compile
    with_cuda=with_cuda)
  File "/data/queenie_liu/anaconda3/envs/mmTrans/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1404, in _write_ninja_file_and_compile_objects
    error_prefix='Error compiling objects for extension')
  File "/data/queenie_liu/anaconda3/envs/mmTrans/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1733, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error compiling objects for extension
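A hedged workaround: 'compute_86' is the Ampere architecture, which nvcc from CUDA 10.x does not know, so either upgrade to a CUDA 11.x toolkit or restrict the architectures PyTorch builds for. TORCH_CUDA_ARCH_LIST is read by torch.utils.cpp_extension at build time; the values below are examples, not a recommendation for your exact GPU.

import os

# set before running `python setup.py build install` (e.g. from a wrapper script)
os.environ["TORCH_CUDA_ARCH_LIST"] = "7.0;7.5"   # pick archs your nvcc supports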

Issue in DETRsegm

First of all, thank you very much for the great work.

My question is..

  1. A NotImplementedError is raised when executing "self.detr.input_proj(src)" in the DETRsegm class.

  2. "self.detr.transformer(src_proj, mask, self.detr.query_embed.weight, pos[-1])"

In the code above, self.detr.query_embed is a class attribute that is only initialized when use_dab=False.
I'm just curious whether this is the intended implementation.

Compiling CUDA operators error

When I try to compile the CUDA operators I get these errors:

C:/Users/Ali/DAB-DETR-main/models/dab_deformable_detr/ops/src\cuda/ms_deform_im2col_cuda.cuh(97): error: identifier "__floorf" is undefined in device code

C:/Users/Ali/DAB-DETR-main/models/dab_deformable_detr/ops/src\cuda/ms_deform_im2col_cuda.cuh(98): error: calling a host function("__floorf") from a device function("ms_deform_attn_col2im_bilinear ") is not allowed

C:/Users/Ali/DAB-DETR-main/models/dab_deformable_detr/ops/src\cuda/ms_deform_im2col_cuda.cuh(98): error: identifier "__floorf" is undefined in device code

C:/Users/Ali/DAB-DETR-main/models/dab_deformable_detr/ops/src\cuda/ms_deform_im2col_cuda.cuh(172): error: calling a host function("__floorf") from a device function("ms_deform_attn_col2im_bilinear_gm ") is not allowed

C:/Users/Ali/DAB-DETR-main/models/dab_deformable_detr/ops/src\cuda/ms_deform_im2col_cuda.cuh(172): error: identifier "__floorf" is undefined in device code

C:/Users/Ali/DAB-DETR-main/models/dab_deformable_detr/ops/src\cuda/ms_deform_im2col_cuda.cuh(173): error: calling a host function("__floorf") from a device function("ms_deform_attn_col2im_bilinear_gm ") is not allowed

C:/Users/Ali/DAB-DETR-main/models/dab_deformable_detr/ops/src\cuda/ms_deform_im2col_cuda.cuh(173): error: identifier "__floorf" is undefined in device code

12 errors detected in the compilation of "C:/Users/Ali/AppData/Local/Temp/tmpxft_00002e2c_00000000-7_ms_deform_attn_cuda.cpp1.ii".
error: command 'C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\bin\nvcc.exe' failed with exit code 1

Any suggestions?

Some questions about reproducing DAB-DETR

Hi, I'd like to reproduce DAB-DETR, and I have two questions about some technical details of DAB-DETR.

i. How do you initialize the learnable anchor boxes (results in Table 2)? Why not use the setting from Table 8 (random initialization, fixed in the first decoder layer) as the default?

ii. I am confused about the modulated positional attention in Section 4.4. Is it an improvement over the "conditional cross-attention" in Conditional DETR (splitting cross-attention into content and spatial dot-products)? Does the proposed modulated positional attention add the reference w into the spatial dot-products?

Some questions about reference points

Thank you for your excellent work! After reading the DAB-DETR decoder code, I have some questions about the reference points.
In the decoder there is "reference_points = new_reference_points.detach()". I am confused about why the detach operator is used: in my understanding, with detach the gradient won't be backpropagated to the reference embedding; it is cut off.
Looking forward to your reply! Thank you!
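For reference, a self-contained sketch (shapes and names are assumptions) of the refinement step in question: detach() stops gradients from later decoder layers flowing back through earlier layers' box predictions, while each layer's own boxes are still supervised by its auxiliary loss.

import torch

def inverse_sigmoid(x, eps=1e-5):
    x = x.clamp(min=eps, max=1 - eps)
    return torch.log(x / (1 - x))

reference_points = torch.rand(300, 2, 4)          # assumed (queries, batch, xywh)
delta = torch.randn(300, 2, 4) * 0.01             # stand-in for the bbox head output
new_reference_points = (delta + inverse_sigmoid(reference_points)).sigmoid()
reference_points = new_reference_points.detach()  # refined boxes flow forward,
                                                  # gradients do not flow back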

Bad testing results

I ran the model with the default settings and got bad results.

Any suggestions?

[screenshots of the evaluation results omitted]

Can I train dab-deformable-detr with multiple GPUs on one machine?

I've tried to train dab-deformable-detr with multiple GPUs on one Ubuntu server using 'torch.nn.Parallel', but a runtime error was raised: "Expected tensor for argument #1 'input' to have the same device as tensor for argument #2 'weight'; but device 0 does not equal 1 (while checking arguments for cudnn_convolution)". Is this a bug? Or is there another way to train dab-deformable-detr with multiple GPUs?
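For what it's worth, a hedged sketch: DETR-style codebases are normally launched with one process per GPU and DistributedDataParallel rather than torch.nn.DataParallel, and using the latter on a model that pins tensors to one device is a common source of this cross-device error. The helper below assumes a torchrun-style launcher that sets LOCAL_RANK.

import os
import torch
import torch.distributed as dist

def setup_ddp(model: torch.nn.Module) -> torch.nn.Module:
    local_rank = int(os.environ["LOCAL_RANK"])   # set by torchrun / the launcher
    dist.init_process_group(backend="nccl")
    torch.cuda.set_device(local_rank)
    model = model.cuda(local_rank)
    return torch.nn.parallel.DistributedDataParallel(model, device_ids=[local_rank])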

Some questions about modulated HW attention

I have some doubts about the modulated HW attention part of DAB-DETR. In line 238 of transformer.py, query_sine_embed keeps only the x and y embeddings. Shouldn't lines 243 and 244, which implement the modulated HW attention, operate on the w and h embeddings? Why do they still operate on the x and y embeddings? It's a bit confusing.

Is there something wrong with the code in lines 390-392 of deformable_transformer.py?

The original code is :

https://github.com/IDEA-opensource/DAB-DETR/blob/309f6ad92af7a62d7732c1bdf1e0c7a69a7bdaef/models/dab_deformable_detr/deformable_transformer.py#L390
https://github.com/IDEA-opensource/DAB-DETR/blob/309f6ad92af7a62d7732c1bdf1e0c7a69a7bdaef/models/dab_deformable_detr/deformable_transformer.py#L391
https://github.com/IDEA-opensource/DAB-DETR/blob/309f6ad92af7a62d7732c1bdf1e0c7a69a7bdaef/models/dab_deformable_detr/deformable_transformer.py#L392

I think the code "reference_points[:, :, None]" should be "reference_points[:, None, :]", because the last dimension of "torch.cat([src_valid_ratios, src_valid_ratios], -1)[:, None]" has length 4 and means [w_ratio, h_ratio, w_ratio, h_ratio] for [x, y, w, h], so the last dimension of "reference_points" should be [x, y, w, h].
However, in the case of "reference_points[:, :, None]", the last dimension will be [x, x, x, x], [y, y, y, y], [w, w, w, w], or [h, h, h, h] after broadcasting. Actually, in this case the last dimension of "reference_points_input" has length 4, but reference_points_input[..., 0] = reference_points_input[..., 2] and reference_points_input[..., 1] = reference_points_input[..., 3]. That means the result is [x*w_ratio, x*h_ratio, x*w_ratio, x*h_ratio].
But what we want should be [x*w_ratio, y*h_ratio, w*w_ratio, h*h_ratio].

Is there something wrong with my understanding?
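One way to check this is a quick shape experiment with toy sizes (a sketch, not the repo's code):

import torch

bs, nq, nlvl = 2, 300, 4
reference_points = torch.rand(bs, nq, 4)                  # (x, y, w, h) per query
src_valid_ratios = torch.rand(bs, nlvl, 2)                # (w_ratio, h_ratio) per level
ratios = torch.cat([src_valid_ratios, src_valid_ratios], -1)[:, None]  # (bs, 1, nlvl, 4)

out = reference_points[:, :, None] * ratios               # (bs, nq, 1, 4) * (bs, 1, nlvl, 4)
print(out.shape)                                          # torch.Size([2, 300, 4, 4])
# reference_points[:, None, :] has shape (bs, 1, nq, 4), which only broadcasts
# against (bs, 1, nlvl, 4) when nq happens to equal nlvl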

cat(y, x) in gen_sineembed_for_position()

Hi, first of all, thanks for your great work on improving DETR-like detectors!
I have a question about the cat() operation used to build the query pos embedding in gen_sineembed_for_position() in models/DAB_DETR/transformer.py, lines 51 and 61. Why is it

pos = torch.cat((pos_y, pos_x)...)

Shouldn't x come before y?

Run test.py error

I am using an 8-GPU V100 node, and the environment is below:

Python 3.8.5 (default, Sep 4 2020, 07:30:14)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.__version__
'1.9.0+cu102'

Error info: CUDA out of memory

test.py

* True check_forward_equal_with_pytorch_double: max_abs_err 8.67e-19 max_rel_err 2.35e-16
* True check_forward_equal_with_pytorch_float: max_abs_err 4.66e-10 max_rel_err 1.13e-07
* True check_gradient_numerical(D=30)
* True check_gradient_numerical(D=32)
* True check_gradient_numerical(D=64)
* True check_gradient_numerical(D=71)
* True check_gradient_numerical(D=1025)
Traceback (most recent call last):
  File "test.py", line 86, in <module>
    check_gradient_numerical(channels, True, True, True)
  File "test.py", line 76, in check_gradient_numerical
    gradok = gradcheck(func, (value.double(), shapes, level_start_index, sampling_locations.double(), attention_weights.double(), im2col_step))
  File "/root/anaconda3/envs/www_update/lib/python3.8/site-packages/torch/autograd/gradcheck.py", line 1245, in gradcheck
    return _gradcheck_helper(**args)
  File "/root/anaconda3/envs/www_update/lib/python3.8/site-packages/torch/autograd/gradcheck.py", line 1258, in _gradcheck_helper
    _gradcheck_real_imag(gradcheck_fn, func, func_out, tupled_inputs, outputs, eps,
  File "/root/anaconda3/envs/www_update/lib/python3.8/site-packages/torch/autograd/gradcheck.py", line 930, in _gradcheck_real_imag
    gradcheck_fn(func, func_out, tupled_inputs, outputs, eps,
  File "/root/anaconda3/envs/www_update/lib/python3.8/site-packages/torch/autograd/gradcheck.py", line 974, in _slow_gradcheck
    analytical = _check_analytical_jacobian_attributes(tupled_inputs, o, nondet_tol, check_grad_dtypes)
  File "/root/anaconda3/envs/www_update/lib/python3.8/site-packages/torch/autograd/gradcheck.py", line 520, in _check_analytical_jacobian_attributes
    jacobians1, types_ok, sizes_ok = _stack_and_check_tensors(vjps1, inputs, output_numel)
  File "/root/anaconda3/envs/www_update/lib/python3.8/site-packages/torch/autograd/gradcheck.py", line 461, in _stack_and_check_tensors
    out_jacobians = _allocate_jacobians_with_inputs(inputs, numel_outputs)
  File "/root/anaconda3/envs/www_update/lib/python3.8/site-packages/torch/autograd/gradcheck.py", line 31, in _allocate_jacobians_with_inputs
    out.append(t.new_zeros((t.numel(), numel_output), layout=torch.strided))
RuntimeError: CUDA out of memory. Tried to allocate 7.50 GiB (GPU 0; 31.75 GiB total capacity; 22.52 GiB already allocated; 6.90 GiB free; 23.52 GiB reserved in total by PyTorch)

Fine-Tuning on own dataset

Hi there - I'm trying to fine-tune on my own dataset (2 classes), and I'd like to know which params from the trained model I should remove besides these:
del checkpoint['model']['class_embed.0.weight']
del checkpoint['model']['class_embed.0.bias']
del checkpoint['model']['class_embed.1.weight']
del checkpoint['model']['class_embed.1.bias']
del checkpoint['model']['class_embed.2.weight']
del checkpoint['model']['class_embed.2.bias']
del checkpoint['model']['class_embed.3.weight']
del checkpoint['model']['class_embed.3.bias']
del checkpoint['model']['class_embed.4.weight']
del checkpoint['model']['class_embed.4.bias']
del checkpoint['model']['class_embed.5.weight']
del checkpoint['model']['class_embed.5.bias']

There's something else I need to change, as I'm receiving the following error:

    exp_avg.mul_(beta1).add_(grad, alpha=1 - beta1)
RuntimeError: The size of tensor a (91) must match the size of tensor b (2) at non-singleton dimension 0

Any hint? Thanks!
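A hedged sketch of the usual DETR fine-tuning preparation (paths hypothetical, `model` assumed already built with 2 classes). The exp_avg line in the traceback suggests the optimizer state, still shaped for 91 classes, was restored as well; skipping it may be the missing piece:

import torch

ckpt = torch.load("checkpoint.pth", map_location="cpu")   # hypothetical path
for k in list(ckpt["model"].keys()):
    if k.startswith("class_embed."):       # heads sized for the old class count
        del ckpt["model"][k]
model.load_state_dict(ckpt["model"], strict=False)
# do not restore ckpt["optimizer"] / ckpt["lr_scheduler"] after changing shapes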

Has the 'look forward' strategy been used in DAB-DETR?

Thanks for your work!

It seems that the 'look forward twice' strategy mentioned in DINO has already been implemented in DAB-DETR, and even in Deformable-DETR, because I notice the iterative regression operation is implemented in both dab_deformable_detr.py and deformable_transformer.py.

Is there anything wrong with my understanding?

Looking forward to your reply!

Question about two_stage

Thank you for your excellent work,
What I want to know is: have you used the two-stage strategy when training DAB-Deformable-DETR? Does it give a performance boost for DAB-Deformable-DETR?

Attention weights visualization

Thank you very much for your work. How can I visualize the self-attention part, for example the encoder outputs or the cross-attention inputs?
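One hedged way to grab attention maps is with forward hooks (a sketch; the exact attention module classes in this repo are an assumption, so inspect model.named_modules() to pick the right ones):

import torch

attn_maps = {}

def save_attn(name):
    def hook(module, inputs, output):
        # nn.MultiheadAttention returns (attn_output, attn_weights)
        if isinstance(output, tuple) and len(output) > 1 and output[1] is not None:
            attn_maps[name] = output[1].detach().cpu()
    return hook

for name, module in model.named_modules():     # `model` assumed already built
    if isinstance(module, torch.nn.MultiheadAttention):
        module.register_forward_hook(save_attn(name))
# run one forward pass, then visualize the attn_maps entries, e.g. with matplotlib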

Understanding the role of refpoint_embed

I am having some trouble understanding the role of refpoint_embed in the DABDeformableDETR module, particularly what the 4 values represent. Do they represent x, y, w, h, or do they correspond to the 4 levels of the input feature maps? From line 391 in models/dab_deformable_detr/deformable_transformer.py (link), the four values seem to be multiplied with valid ratios from the four levels and broadcast along the xywh dimension. Also, when they are fed into deformable attention, the dimension order indicates that the 4 values correspond to the 4 levels. On the other hand, when random_refpoints_xy is used, the first two values seem to represent x and y instead? It's a bit confusing.

Architecture Changes From DINO-DETR

Hello,

Thanks for the great work.

I wonder whether the architecture changes, such as the two-stage structure of DAB-Deformable-DETR (mixed query selection) implemented in the DINO-DETR paper, will also be released in this repo, or only on the DINO-DETR page?

An additional question: will the DINO-DETR GitHub repo be based upon this repo, or will there be some drastic changes? If we start from this repo, will it be easy to port the DINO training and model modifications into it when the DINO page is released?

Thank you so much for your responses in advance.

Swin Transformer Pre-trained Weights

Dear Shilong,

Thanks for the amazing work again! Your code supports the Swin Transformer as a backbone, but I could not find its ImageNet pre-trained weights. Could you kindly provide them?

Thank you in advance!

Question about modulated attention

When you visualize Figure 7 (positional attention maps with different temperatures), why don't the figures show any periodicity?
Since sinusoidal embeddings are built from sin/cos functions, shouldn't they have some repeated shapes?

Training problems with AP and AR

Sorry to bother you! During training, I had a problem.

Training parameters:

batch_size = 1,
epochs = 50,
lr_drop = 40,
modelname = 'dab_detr',
num_workers = 6,

Dataset(tiny-coco):

10% of the entire coco2017

Problem:

So far, I've trained 20 epochs, but the AP and AR results are the same in every epoch:

IoU metric: bbox
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.001
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.003

And the average statistics for the 20th epoch are:

class_error: 88.89  loss: 16.2750 (16.8940)  loss_ce: 0.8517 (0.9079)  loss_bbox: 0.7610 (0.8277)  loss_giou: 1.1219 (1.0805)  loss_ce_0: 0.8508 (0.9079)  loss_bbox_0: 0.7550 (0.8284)  loss_giou_0: 1.1278 (1.0790)  loss_ce_1: 0.8519 (0.9078)  loss_bbox_1: 0.7502 (0.8281)  loss_giou_1: 1.1264 (1.0795)  loss_ce_2: 0.8520 (0.9078)  loss_bbox_2: 0.7614 (0.8283)  loss_giou_2: 1.1250 (1.0794)  loss_ce_3: 0.8514 (0.9080)  loss_bbox_3: 0.7613 (0.8279)  loss_giou_3: 1.1236 (1.0799)  loss_ce_4: 0.8517 (0.9079)  loss_bbox_4: 0.7611 (0.8278)  loss_giou_4: 1.1225 (1.0802)  loss_ce_unscaled: 0.8517 (0.9079)  class_error_unscaled: 80.0000 (77.4109)  loss_bbox_unscaled: 0.1522 (0.1655)  loss_giou_unscaled: 0.5610 (0.5403)  loss_xy_unscaled: 0.0560 (0.0604)  loss_hw_unscaled: 0.0998 (0.1051)  cardinality_error_unscaled: 293.0000 (293.0660)  loss_ce_0_unscaled: 0.8508 (0.9079)  loss_bbox_0_unscaled: 0.1510 (0.1657)  loss_giou_0_unscaled: 0.5639 (0.5395)  loss_xy_0_unscaled: 0.0561 (0.0605)  loss_hw_0_unscaled: 0.1034 (0.1052)  cardinality_error_0_unscaled: 293.0000 (293.0660)  loss_ce_1_unscaled: 0.8519 (0.9078)  loss_bbox_1_unscaled: 0.1500 (0.1656)  loss_giou_1_unscaled: 0.5632 (0.5397)  loss_xy_1_unscaled: 0.0560 (0.0605)  loss_hw_1_unscaled: 0.1029 (0.1051)  cardinality_error_1_unscaled: 293.0000 (293.0660)  loss_ce_2_unscaled: 0.8520 (0.9078)  loss_bbox_2_unscaled: 0.1523 (0.1657)  loss_giou_2_unscaled: 0.5625 (0.5397)  loss_xy_2_unscaled: 0.0560 (0.0605)  loss_hw_2_unscaled: 0.1022 (0.1052)  cardinality_error_2_unscaled: 293.0000 (293.0660)  loss_ce_3_unscaled: 0.8514 (0.9080)  loss_bbox_3_unscaled: 0.1523 (0.1656)  loss_giou_3_unscaled: 0.5618 (0.5400)  loss_xy_3_unscaled: 0.0560 (0.0604)  loss_hw_3_unscaled: 0.1014 (0.1052)  cardinality_error_3_unscaled: 293.0000 (293.0660)  loss_ce_4_unscaled: 0.8517 (0.9079)  loss_bbox_4_unscaled: 0.1522 (0.1656)  loss_giou_4_unscaled: 0.5612 (0.5401)  loss_xy_4_unscaled: 0.0560 (0.0604)  loss_hw_4_unscaled: 0.1006 (0.1052)  cardinality_error_4_unscaled: 293.0000 (293.0660)

So what could be the problem? Dataset? Training parameters? Or something else?
Thank you!

'self.query_scale' before each transformer encoder layer

Thanks for your great work. I notice a difference between DAB-DETR and Conditional DETR: an MLP defined as 'self.query_scale' is applied before each transformer encoder layer. Is this operation described in the paper, or is there another reference that explains its effect?

The dropout rate

Thanks for your great work.
However, after reading the code, I'm confused about the dropout setting.

The dropout rates used in the encoder and decoder layers of DAB-DETR and DAB-Deformable-DETR default to 0.0:
https://github.com/IDEA-opensource/DAB-DETR/blob/2a096e2d59fc804b20dd6da78b504654647107c7/main.py#L79-L80
https://github.com/IDEA-opensource/DAB-DETR/blob/2a096e2d59fc804b20dd6da78b504654647107c7/models/DAB_DETR/transformer.py#L459-L462
https://github.com/IDEA-opensource/DAB-DETR/blob/2a096e2d59fc804b20dd6da78b504654647107c7/models/dab_deformable_detr/deformable_transformer.py#L449-L456
DETR and Deformable-DETR use 0.1 by default; does this mean that training DAB-DETR or DAB-Deformable-DETR without dropout gives better performance?

How to calculate flops

Hi! Thanks for your excellent work. I'm wondering how to evaluate the FLOPs of the DAB-DETR model.
I can't directly use the DETR script, which raises an AssertionError in jit_handles.py:
facebookresearch/detr#110
Could you please share your Python script?
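For what it's worth, a hedged sketch using fvcore rather than the authors' script, assuming the model accepts a DETR-style list of image tensors; custom deformable-attention ops that fvcore cannot trace are reported as unsupported and skipped, so treat the number as a lower bound.

import torch
from fvcore.nn import FlopCountAnalysis

model.eval()                              # `model` assumed already built
images = [torch.rand(3, 800, 1066)]       # assumed input: a list of CHW tensors
flops = FlopCountAnalysis(model, (images,))
print(flops.total() / 1e9, "GFLOPs")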

CUDA out of memory on test.py

I'm following the installation guide. When running test.py in step 4, I got "RuntimeError: CUDA out of memory". Is it okay to proceed (using a smaller batch for training or inference), or will it have any effect on performance?

* True check_forward_equal_with_pytorch_double: max_abs_err 8.67e-19 max_rel_err 2.35e-16
* True check_forward_equal_with_pytorch_float: max_abs_err 4.66e-10 max_rel_err 1.13e-07
* True check_gradient_numerical(D=30)
* True check_gradient_numerical(D=32)
* True check_gradient_numerical(D=64)
* True check_gradient_numerical(D=71)
* True check_gradient_numerical(D=1025)
Traceback (most recent call last):
  File "/home/azureuser/WilliamJustin/DAB-DETR/models/dab_deformable_detr/ops/test.py", line 86, in <module>
    check_gradient_numerical(channels, True, True, True)
  File "/home/azureuser/WilliamJustin/DAB-DETR/models/dab_deformable_detr/ops/test.py", line 76, in check_gradient_numerical
    gradok = gradcheck(func, (value.double(), shapes, level_start_index, sampling_locations.double(), attention_weights.double(), im2col_step))
  File "/home/azureuser/miniconda3/envs/jstnxu-DAB-DETR/lib/python3.9/site-packages/torch/autograd/gradcheck.py", line 1400, in gradcheck
    return _gradcheck_helper(**args)
  File "/home/azureuser/miniconda3/envs/jstnxu-DAB-DETR/lib/python3.9/site-packages/torch/autograd/gradcheck.py", line 1414, in _gradcheck_helper
    _gradcheck_real_imag(gradcheck_fn, func, func_out, tupled_inputs, outputs, eps,
  File "/home/azureuser/miniconda3/envs/jstnxu-DAB-DETR/lib/python3.9/site-packages/torch/autograd/gradcheck.py", line 1061, in _gradcheck_real_imag
    gradcheck_fn(func, func_out, tupled_inputs, outputs, eps,
  File "/home/azureuser/miniconda3/envs/jstnxu-DAB-DETR/lib/python3.9/site-packages/torch/autograd/gradcheck.py", line 1097, in _slow_gradcheck
    numerical = _transpose(_get_numerical_jacobian(func, tupled_inputs, outputs, eps=eps, is_forward_ad=use_forward_ad))
  File "/home/azureuser/miniconda3/envs/jstnxu-DAB-DETR/lib/python3.9/site-packages/torch/autograd/gradcheck.py", line 146, in _get_numerical_jacobian
    jacobians += [get_numerical_jacobian_wrt_specific_input(fn, inp_idx, inputs, outputs, eps,
  File "/home/azureuser/miniconda3/envs/jstnxu-DAB-DETR/lib/python3.9/site-packages/torch/autograd/gradcheck.py", line 290, in get_numerical_jacobian_wrt_specific_input
    return _combine_jacobian_cols(jacobian_cols, outputs, input, input.numel())
  File "/home/azureuser/miniconda3/envs/jstnxu-DAB-DETR/lib/python3.9/site-packages/torch/autograd/gradcheck.py", line 230, in _combine_jacobian_cols
    jacobians = _allocate_jacobians_with_outputs(outputs, numel, dtype=input.dtype if input.dtype.is_complex else None)
  File "/home/azureuser/miniconda3/envs/jstnxu-DAB-DETR/lib/python3.9/site-packages/torch/autograd/gradcheck.py", line 45, in _allocate_jacobians_with_outputs
    out.append(t.new_zeros((numel_input, t.numel()), **options))
RuntimeError: CUDA out of memory. Tried to allocate 7.50 GiB (GPU 0; 15.75 GiB total capacity; 7.50 GiB already allocated; 7.30 GiB free; 7.50 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

About parameter bbox_embed_diff_each_layer in DABDETR.py.

models/DAB_DETR/DABDETR.py line 85 should read
"bbox_embed_diff_each_layer: don't share weights of prediction heads. Default is False (shared weights)."
rather than the original "Default for True", so that it is consistent with line 72.
Am I right?

A question about attention weight calculation

Hi~ Thanks for your excellent work! I'm confused about an operation in the attention weight calculation.

In the implementation of the attention there is a small modification which I have not found in the paper.

The code is:

# previous choice of Conditional DETR and nn.MultiheadAttention
attn_output_weights = softmax(attn_output_weights, dim=-1)

# DAB-DETR modified this line:
attn_output_weights = softmax(attn_output_weights - attn_output_weights.max(dim=-1, keepdim=True)[0], dim=-1)

Does this procedure refer to some previous studies that I have not read?
Will doing this improve the performance?
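One thing that can be said with certainty (a standard identity, not specific to this repo): softmax is invariant to subtracting a per-row constant, so the modified line produces exactly the same weights and only improves numerical stability of the exponentials.

import torch

x = torch.randn(4, 8) * 50
a = torch.softmax(x, dim=-1)
b = torch.softmax(x - x.max(dim=-1, keepdim=True)[0], dim=-1)
print(torch.allclose(a, b))   # True: same output, but b cannot overflow in exp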

Visualizing results

Thanks for your great work.
After testing the code, how can I plot the image results with bounding boxes?
Is there a command or script for this?
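Not a command from this repo, but a minimal matplotlib sketch (assuming you already have post-processed absolute xyxy boxes and per-box scores) for drawing predictions:

import matplotlib.pyplot as plt
import matplotlib.patches as patches

def plot_boxes(image, boxes, scores, threshold=0.5):
    # image: HxWx3 array; boxes: Nx4 in (x0, y0, x1, y1); scores: N
    fig, ax = plt.subplots()
    ax.imshow(image)
    for (x0, y0, x1, y1), s in zip(boxes, scores):
        if s < threshold:
            continue
        ax.add_patch(patches.Rectangle((x0, y0), x1 - x0, y1 - y0,
                                       fill=False, edgecolor="red", linewidth=2))
        ax.text(x0, y0, f"{s:.2f}", color="white", backgroundcolor="red")
    plt.show()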
