idea-research / dab-detr

[ICLR 2022] Official implementation of the paper "DAB-DETR: Dynamic Anchor Boxes are Better Queries for DETR"

License: Apache License 2.0

Languages: Python 18.97%, Shell 0.06%, C++ 0.30%, CUDA 2.99%, Jupyter Notebook 77.68%
Topics: dab-detr, detection, detr, transformer

dab-detr's People

Contributors

developer0hye, rentainhe, seungyonglee0802, slongliu, whatchang, xu-justin


dab-detr's Issues

ONNX model generation

Could you please convert your model to ONNX? I want to test it with TensorRT for inference.
I am trying to convert it to ONNX myself but get the following error:
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0, and cpu! (when checking argument for argument index in method wrapper__index_select)
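For reference, a hedged sketch of a typical export recipe (not an official path from this repo): keeping the model and the dummy input on one device before tracing avoids this kind of cuda:0 / cpu mismatch. Names and shapes below are assumptions, and DAB-DETR's forward may expect a NestedTensor, so a thin wrapper module may be needed on top.

import torch

# Hypothetical export sketch -- `model` is assumed to be a built DAB-DETR instance.
device = torch.device("cpu")                         # export on a single device
model = model.to(device).eval()
dummy = torch.randn(1, 3, 800, 800, device=device)   # assumed input shape

torch.onnx.export(
    model, dummy, "dab_detr.onnx",
    opset_version=13,                                # assumption; match your TensorRT
    input_names=["images"],
    output_names=["logits", "boxes"],                # hypothetical output names
    dynamic_axes={"images": {0: "batch"}},
)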

num_patterns in DAB-Deformable-DETR

Thanks for the great contribution to the literature.

I have a quick question: is there a specific reason for not using pattern embedding in DAB-Deformable-DETR? Is pattern embedding not applicable to the deformable structure? Perhaps it would not increase the performance, but I haven't seen any ablation study related to it.

Thanks in advance

PostProcess

During post-processing, is it possible for multiple labels from the same query to appear in the top-k? How should this situation be handled?
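For context, a minimal sketch (toy shapes) of the DETR-style top-k selection used in post-processing: the top-k runs over the flattened (num_queries × num_classes) scores, so one query can indeed contribute several (box, label) pairs when it has several high-scoring classes.

import torch

out_logits = torch.randn(2, 300, 91)               # (batch, num_queries, num_classes)
prob = out_logits.sigmoid()
scores, topk_indexes = torch.topk(prob.view(prob.shape[0], -1), k=100, dim=1)
query_ids = torch.div(topk_indexes, out_logits.shape[2], rounding_mode='floor')
labels = topk_indexes % out_logits.shape[2]        # the same query id may repeat here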

Temperature tuning and positional embeddings for DAB-DETR

In the paper, there is a section saying that the optimal temperature for the positional embedding in your model is 20. However, this line in gen_sineembed_for_position indicates that a value of 10000 is used for the temperature. Is there a part I missed when trying to understand the code?

Besides, the paper also says that only the x and y coordinates are used to generate the positional embedding for the cross-attention, but this line, despite the comment num_queries x batch_size x 2, actually operates on a num_queries x batch_size x 4 tensor if you print its shape. Does this perform better than using only x and y, or is the performance similar?
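For reference, a simplified sketch (names and shapes are assumptions, condensed from the shape of gen_sineembed_for_position) showing where the temperature enters the sinusoidal embedding:

import math
import torch

def sine_embed(coord, num_pos_feats=128, temperature=10000):
    # coord: (...,) tensor of normalized coordinates in [0, 1]
    dim_t = torch.arange(num_pos_feats, dtype=torch.float32)
    dim_t = temperature ** (2 * (dim_t // 2) / num_pos_feats)
    pos = coord[..., None] * 2 * math.pi / dim_t     # (..., num_pos_feats)
    # interleave sin/cos over channel pairs, then flatten back to num_pos_feats
    return torch.stack((pos[..., 0::2].sin(), pos[..., 1::2].cos()), dim=-1).flatten(-2)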

Questions about some configurations in DAB-Deformable-DETR

Thanks for your great work. I have some questions about the implementation of DAB-Deformable DETR.

  1. In DAB-DETR the position embedding is sinehw, while DAB-Deformable-DETR uses the original sine. Is there a reason for this difference?
  2. I found that the configuration uses a larger dim_feedforward=2048. How does it perform with 1024?
  3. Have you experimented with the two-stage setting of Deformable-DETR? Could you share the results?

About plot_logs

Hello, thanks for your great work!

When I finish training and get log.txt, I want to visualize it using plot_logs, as follows:

[screenshot of the plot_logs call omitted]

But I get an ERROR on this line:

https://github.com/IDEA-opensource/DAB-DETR/blob/11d1948565b71e6622d01eccca0b9f08022cdc82/util/plot_utils.py#L65

Traceback (most recent call last):
  File "H:/yjs/code/DAB-DETR-main/tmp.py", line 53, in <module>
    fig, axs = plot_logs(log_path)
  File "H:\yjs\code\DAB-DETR-main\util\plot_utils.py", line 65, in plot_logs
    df.interpolate().ewm(com=ewm_col).mean().plot(
  File "H:\Anaconda\lib\site-packages\pandas\util\_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)
  File "H:\Anaconda\lib\site-packages\pandas\core\frame.py", line 10712, in interpolate
    return super().interpolate(
  File "H:\Anaconda\lib\site-packages\pandas\core\generic.py", line 6899, in interpolate
    new_data = obj._mgr.interpolate(
  File "H:\Anaconda\lib\site-packages\pandas\core\internals\managers.py", line 377, in interpolate
    return self.apply("interpolate", **kwargs)
  File "H:\Anaconda\lib\site-packages\pandas\core\internals\managers.py", line 327, in apply
    applied = getattr(b, f)(**kwargs)
  File "H:\Anaconda\lib\site-packages\pandas\core\internals\blocks.py", line 1369, in interpolate
    new_values = values.fillna(value=fill_value, method=method, limit=limit)
  File "H:\Anaconda\lib\site-packages\pandas\core\arrays\_mixins.py", line 218, in fillna
    value, method = validate_fillna_kwargs(
  File "H:\Anaconda\lib\site-packages\pandas\util\_validators.py", line 372, in validate_fillna_kwargs
    method = clean_fill_method(method)
  File "H:\Anaconda\lib\site-packages\pandas\core\missing.py", line 120, in clean_fill_method
    raise ValueError(f"Invalid fill method. Expecting {expecting}. Got {method}")
ValueError: Invalid fill method. Expecting pad (ffill) or backfill (bfill). Got linear

My pandas version is 1.3.5.

I don't know whether I am using it the wrong way or whether it is a bug in pandas; how can I fix it?
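A hedged workaround (not an official fix): pandas >= 1.3 rejects interpolate() on object-dtype columns, and log.txt contains list- and string-valued fields, so selecting only the numeric columns first may avoid this error. The path and smoothing parameter below are placeholders.

import pandas as pd
from pathlib import Path

df = pd.read_json(Path("log.txt"), lines=True)   # hypothetical path to the log
df = df.select_dtypes(include="number")          # drop list-/string-valued fields
smoothed = df.interpolate().ewm(com=0).mean()    # com=0 as a stand-in for ewm_col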

nvcc fatal : Unsupported gpu architecture 'compute_86'

When I set up the deformable multi-head attention module, it reports the error below:

nvcc fatal : Unsupported gpu architecture 'compute_86'
Traceback (most recent call last):
  File "/data/queenie_liu/anaconda3/envs/mmTrans/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1723, in _run_ninja_build
    env=env)
  File "/data/queenie_liu/anaconda3/envs/mmTrans/lib/python3.7/subprocess.py", line 512, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "setup.py", line 72, in <module>
    cmdclass={"build_ext": torch.utils.cpp_extension.BuildExtension},
  File "/data/queenie_liu/anaconda3/envs/mmTrans/lib/python3.7/site-packages/setuptools/__init__.py", line 153, in setup
    return distutils.core.setup(**attrs)
  File "/data/queenie_liu/anaconda3/envs/mmTrans/lib/python3.7/distutils/core.py", line 148, in setup
    dist.run_commands()
  File "/data/queenie_liu/anaconda3/envs/mmTrans/lib/python3.7/distutils/dist.py", line 966, in run_commands
    self.run_command(cmd)
  File "/data/queenie_liu/anaconda3/envs/mmTrans/lib/python3.7/distutils/dist.py", line 985, in run_command
    cmd_obj.run()
  File "/data/queenie_liu/anaconda3/envs/mmTrans/lib/python3.7/distutils/command/build.py", line 135, in run
    self.run_command(cmd_name)
  File "/data/queenie_liu/anaconda3/envs/mmTrans/lib/python3.7/distutils/cmd.py", line 313, in run_command
    self.distribution.run_command(command)
  File "/data/queenie_liu/anaconda3/envs/mmTrans/lib/python3.7/distutils/dist.py", line 985, in run_command
    cmd_obj.run()
  File "/data/queenie_liu/anaconda3/envs/mmTrans/lib/python3.7/site-packages/setuptools/command/build_ext.py", line 79, in run
    _build_ext.run(self)
  File "/data/queenie_liu/anaconda3/envs/mmTrans/lib/python3.7/distutils/command/build_ext.py", line 340, in run
    self.build_extensions()
  File "/data/queenie_liu/anaconda3/envs/mmTrans/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 735, in build_extensions
    build_ext.build_extensions(self)
  File "/data/queenie_liu/anaconda3/envs/mmTrans/lib/python3.7/distutils/command/build_ext.py", line 449, in build_extensions
    self._build_extensions_serial()
  File "/data/queenie_liu/anaconda3/envs/mmTrans/lib/python3.7/distutils/command/build_ext.py", line 474, in _build_extensions_serial
    self.build_extension(ext)
  File "/data/queenie_liu/anaconda3/envs/mmTrans/lib/python3.7/site-packages/setuptools/command/build_ext.py", line 202, in build_extension
    _build_ext.build_extension(self, ext)
  File "/data/queenie_liu/anaconda3/envs/mmTrans/lib/python3.7/distutils/command/build_ext.py", line 534, in build_extension
    depends=ext.depends)
  File "/data/queenie_liu/anaconda3/envs/mmTrans/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 565, in unix_wrap_ninja_compile
    with_cuda=with_cuda)
  File "/data/queenie_liu/anaconda3/envs/mmTrans/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1404, in _write_ninja_file_and_compile_objects
    error_prefix='Error compiling objects for extension')
  File "/data/queenie_liu/anaconda3/envs/mmTrans/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1733, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error compiling objects for extension
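A hedged workaround: 'compute_86' is the Ampere architecture, which nvcc from CUDA 10.x does not know, so either upgrade to a CUDA 11.x toolkit or restrict the architectures PyTorch builds for. TORCH_CUDA_ARCH_LIST is read by torch.utils.cpp_extension at build time; the values below are examples, not a recommendation for your exact GPU.

import os

# set before running `python setup.py build install` (e.g. from a wrapper script)
os.environ["TORCH_CUDA_ARCH_LIST"] = "7.0;7.5"   # pick archs your nvcc supports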

Issue in DETRsegm

First of all, thank you very much for the great work.

My question is..

  1. A NotImplementedError is raised when executing "self.detr.input_proj(src)" in the DETRsegm class.

  2. "self.detr.transformer(src_proj, mask, self.detr.query_embed.weight, pos[-1])"

In the code above, self.detr.query_embed is a class attribute that is only initialized when use_dab=False.
I'm just curious whether this is the intended implementation.

Compiling CUDA operators error

When I try to compile the CUDA operators I get these errors:

C:/Users/Ali/DAB-DETR-main/models/dab_deformable_detr/ops/src\cuda/ms_deform_im2col_cuda.cuh(97): error: identifier "__floorf" is undefined in device code

C:/Users/Ali/DAB-DETR-main/models/dab_deformable_detr/ops/src\cuda/ms_deform_im2col_cuda.cuh(98): error: calling a host function("__floorf") from a device function("ms_deform_attn_col2im_bilinear ") is not allowed

C:/Users/Ali/DAB-DETR-main/models/dab_deformable_detr/ops/src\cuda/ms_deform_im2col_cuda.cuh(98): error: identifier "__floorf" is undefined in device code

C:/Users/Ali/DAB-DETR-main/models/dab_deformable_detr/ops/src\cuda/ms_deform_im2col_cuda.cuh(172): error: calling a host function("__floorf") from a device function("ms_deform_attn_col2im_bilinear_gm ") is not allowed

C:/Users/Ali/DAB-DETR-main/models/dab_deformable_detr/ops/src\cuda/ms_deform_im2col_cuda.cuh(172): error: identifier "__floorf" is undefined in device code

C:/Users/Ali/DAB-DETR-main/models/dab_deformable_detr/ops/src\cuda/ms_deform_im2col_cuda.cuh(173): error: calling a host function("__floorf") from a device function("ms_deform_attn_col2im_bilinear_gm ") is not allowed

C:/Users/Ali/DAB-DETR-main/models/dab_deformable_detr/ops/src\cuda/ms_deform_im2col_cuda.cuh(173): error: identifier "__floorf" is undefined in device code

12 errors detected in the compilation of "C:/Users/Ali/AppData/Local/Temp/tmpxft_00002e2c_00000000-7_ms_deform_attn_cuda.cpp1.ii".
error: command 'C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\bin\nvcc.exe' failed with exit code 1

Any suggestions?

Some questions about reproducing DAB-DETR

Hi, I'd like to reproduce DAB-DETR, and I have two questions about some technical details of DAB-DETR.

i. How do you initialize the learnable anchor boxes (results in Table 2)? Why not use the setting from Table 8 (random initialization, fixed in the first decoder layer) as the default?

ii. I am confused about the modulated positional attention in Section 4.4. Is it an improvement over the "conditional cross-attention" in Conditional DETR (splitting cross-attention into content and spatial dot-products)? Does the proposed modulated positional attention add the reference w into the spatial dot-products?

Some questions about reference points

Thank you for your excellent work! After reading the DAB-DETR decoder code, I have some questions about the reference points.
In the decoder there is "reference_points = new_reference_points.detach()". I am confused about why the detach operator is used: in my understanding, with detach the gradient won't be backpropagated to the reference embedding; it is cut off.
Looking forward to your reply! Thank you!
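For reference, a self-contained sketch (shapes and names are assumptions) of the refinement step in question: detach() stops gradients from later decoder layers flowing back through earlier layers' box predictions, while each layer's own boxes are still supervised by its auxiliary loss.

import torch

def inverse_sigmoid(x, eps=1e-5):
    x = x.clamp(min=eps, max=1 - eps)
    return torch.log(x / (1 - x))

reference_points = torch.rand(300, 2, 4)          # assumed (queries, batch, xywh)
delta = torch.randn(300, 2, 4) * 0.01             # stand-in for the bbox head output
new_reference_points = (delta + inverse_sigmoid(reference_points)).sigmoid()
reference_points = new_reference_points.detach()  # refined boxes flow forward,
                                                  # gradients do not flow back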

Bad testing results

I ran the model with the default settings and got bad results.

Any suggestions?

[screenshots of the evaluation results omitted]

Can I train dab-deformable-detr with multiple GPUs on one machine?

I've tried to train dab-deformable-detr with multiple GPUs on one Ubuntu server using 'torch.nn.Parallel', but a runtime error was raised: "Expected tensor for argument #1 'input' to have the same device as tensor for argument #2 'weight'; but device 0 does not equal 1 (while checking arguments for cudnn_convolution)". Is this a bug? Or is there another way to train dab-deformable-detr with multiple GPUs?
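For what it's worth, a hedged sketch: DETR-style codebases are normally launched with one process per GPU and DistributedDataParallel rather than torch.nn.DataParallel, and using the latter on a model that pins tensors to one device is a common source of this cross-device error. The helper below assumes a torchrun-style launcher that sets LOCAL_RANK.

import os
import torch
import torch.distributed as dist

def setup_ddp(model: torch.nn.Module) -> torch.nn.Module:
    local_rank = int(os.environ["LOCAL_RANK"])   # set by torchrun / the launcher
    dist.init_process_group(backend="nccl")
    torch.cuda.set_device(local_rank)
    model = model.cuda(local_rank)
    return torch.nn.parallel.DistributedDataParallel(model, device_ids=[local_rank])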

Some questions about modulated HW attention

I have some doubts about the modulated HW attention part of DAB-DETR. In line 238 of transformer.py, query_sine_embed keeps only the x and y embeddings. Shouldn't lines 243 and 244, which implement the modulated HW attention, operate on the w and h embeddings? Why do they still operate on the x and y embeddings? It's a bit confusing.

Is there something wrong with the code in lines 390-392 of deformable_transformer.py?

The original code is :

https://github.com/IDEA-opensource/DAB-DETR/blob/309f6ad92af7a62d7732c1bdf1e0c7a69a7bdaef/models/dab_deformable_detr/deformable_transformer.py#L390
https://github.com/IDEA-opensource/DAB-DETR/blob/309f6ad92af7a62d7732c1bdf1e0c7a69a7bdaef/models/dab_deformable_detr/deformable_transformer.py#L391
https://github.com/IDEA-opensource/DAB-DETR/blob/309f6ad92af7a62d7732c1bdf1e0c7a69a7bdaef/models/dab_deformable_detr/deformable_transformer.py#L392

I think the code "reference_points[:, :, None]" should be "reference_points[:, None, :]", because the last dimension of "torch.cat([src_valid_ratios, src_valid_ratios], -1)[:, None]" has length 4 and means [w_ratio, h_ratio, w_ratio, h_ratio] for [x, y, w, h], so the last dimension of "reference_points" should be [x, y, w, h].
However, in the case of "reference_points[:, :, None]", the last dimension will be [x, x, x, x], [y, y, y, y], [w, w, w, w], or [h, h, h, h] after broadcasting. Actually, in this case the last dimension of "reference_points_input" has length 4, but reference_points_input[..., 0] = reference_points_input[..., 2] and reference_points_input[..., 1] = reference_points_input[..., 3]. That means the result is [x*w_ratio, x*h_ratio, x*w_ratio, x*h_ratio].
But what we want should be [x*w_ratio, y*h_ratio, w*w_ratio, h*h_ratio].

Is there something wrong with my understanding?
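One way to check this is a quick shape experiment with toy sizes (a sketch, not the repo's code):

import torch

bs, nq, nlvl = 2, 300, 4
reference_points = torch.rand(bs, nq, 4)                  # (x, y, w, h) per query
src_valid_ratios = torch.rand(bs, nlvl, 2)                # (w_ratio, h_ratio) per level
ratios = torch.cat([src_valid_ratios, src_valid_ratios], -1)[:, None]  # (bs, 1, nlvl, 4)

out = reference_points[:, :, None] * ratios               # (bs, nq, 1, 4) * (bs, 1, nlvl, 4)
print(out.shape)                                          # torch.Size([2, 300, 4, 4])
# reference_points[:, None, :] has shape (bs, 1, nq, 4), which only broadcasts
# against (bs, 1, nlvl, 4) when nq happens to equal nlvl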

cat(y, x) in gen_sineembed_for_position()

Hi, first of all, thanks for your great work on improving DETR-like detectors!
I have a question about the cat() operation used to build the query pos embedding in gen_sineembed_for_position() in models/DAB_DETR/transformer.py, lines 51 and 61. Why is it

pos = torch.cat((pos_y, pos_x)...)

Shouldn't x come before y?

Run test.py error

I am using an 8-GPU V100 node, and the environment is below:

Python 3.8.5 (default, Sep 4 2020, 07:30:14)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.__version__
'1.9.0+cu102'

Error info: CUDA out of memory

test.py

* True check_forward_equal_with_pytorch_double: max_abs_err 8.67e-19 max_rel_err 2.35e-16
* True check_forward_equal_with_pytorch_float: max_abs_err 4.66e-10 max_rel_err 1.13e-07
* True check_gradient_numerical(D=30)
* True check_gradient_numerical(D=32)
* True check_gradient_numerical(D=64)
* True check_gradient_numerical(D=71)
* True check_gradient_numerical(D=1025)
Traceback (most recent call last):
  File "test.py", line 86, in <module>
    check_gradient_numerical(channels, True, True, True)
  File "test.py", line 76, in check_gradient_numerical
    gradok = gradcheck(func, (value.double(), shapes, level_start_index, sampling_locations.double(), attention_weights.double(), im2col_step))
  File "/root/anaconda3/envs/www_update/lib/python3.8/site-packages/torch/autograd/gradcheck.py", line 1245, in gradcheck
    return _gradcheck_helper(**args)
  File "/root/anaconda3/envs/www_update/lib/python3.8/site-packages/torch/autograd/gradcheck.py", line 1258, in _gradcheck_helper
    _gradcheck_real_imag(gradcheck_fn, func, func_out, tupled_inputs, outputs, eps,
  File "/root/anaconda3/envs/www_update/lib/python3.8/site-packages/torch/autograd/gradcheck.py", line 930, in _gradcheck_real_imag
    gradcheck_fn(func, func_out, tupled_inputs, outputs, eps,
  File "/root/anaconda3/envs/www_update/lib/python3.8/site-packages/torch/autograd/gradcheck.py", line 974, in _slow_gradcheck
    analytical = _check_analytical_jacobian_attributes(tupled_inputs, o, nondet_tol, check_grad_dtypes)
  File "/root/anaconda3/envs/www_update/lib/python3.8/site-packages/torch/autograd/gradcheck.py", line 520, in _check_analytical_jacobian_attributes
    jacobians1, types_ok, sizes_ok = _stack_and_check_tensors(vjps1, inputs, output_numel)
  File "/root/anaconda3/envs/www_update/lib/python3.8/site-packages/torch/autograd/gradcheck.py", line 461, in _stack_and_check_tensors
    out_jacobians = _allocate_jacobians_with_inputs(inputs, numel_outputs)
  File "/root/anaconda3/envs/www_update/lib/python3.8/site-packages/torch/autograd/gradcheck.py", line 31, in _allocate_jacobians_with_inputs
    out.append(t.new_zeros((t.numel(), numel_output), layout=torch.strided))
RuntimeError: CUDA out of memory. Tried to allocate 7.50 GiB (GPU 0; 31.75 GiB total capacity; 22.52 GiB already allocated; 6.90 GiB free; 23.52 GiB reserved in total by PyTorch)

Fine-Tuning on own dataset

Hi there - I'm trying to fine-tune on my own dataset (2 classes), and I'd like to know which params from the trained model I should remove besides these:
del checkpoint['model']['class_embed.0.weight']
del checkpoint['model']['class_embed.0.bias']
del checkpoint['model']['class_embed.1.weight']
del checkpoint['model']['class_embed.1.bias']
del checkpoint['model']['class_embed.2.weight']
del checkpoint['model']['class_embed.2.bias']
del checkpoint['model']['class_embed.3.weight']
del checkpoint['model']['class_embed.3.bias']
del checkpoint['model']['class_embed.4.weight']
del checkpoint['model']['class_embed.4.bias']
del checkpoint['model']['class_embed.5.weight']
del checkpoint['model']['class_embed.5.bias']

There's something else I need to change, as I'm receiving the following error:

    exp_avg.mul_(beta1).add_(grad, alpha=1 - beta1)
RuntimeError: The size of tensor a (91) must match the size of tensor b (2) at non-singleton dimension 0

Any hint? Thanks!
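A hedged sketch of the usual DETR fine-tuning preparation (paths hypothetical, `model` assumed already built with 2 classes). The exp_avg line in the traceback suggests the optimizer state, still shaped for 91 classes, was restored as well; skipping it may be the missing piece:

import torch

ckpt = torch.load("checkpoint.pth", map_location="cpu")   # hypothetical path
for k in list(ckpt["model"].keys()):
    if k.startswith("class_embed."):       # heads sized for the old class count
        del ckpt["model"][k]
model.load_state_dict(ckpt["model"], strict=False)
# do not restore ckpt["optimizer"] / ckpt["lr_scheduler"] after changing shapes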

Has the 'look forward' strategy been used in DAB-DETR?

Thanks for your work!

It seems that the 'look forward twice' strategy mentioned in DINO has already been implemented in DAB-DETR, and even in Deformable-DETR, because I notice the iterative regression operation is implemented in both dab_deformable_detr.py and deformable_transformer.py.

Is there anything wrong with my understanding?

Looking forward to your reply!

Question about two_stage

Thank you for your excellent work,
What I want to know is: have you used the two-stage strategy when training DAB-Deformable-DETR? Does it give a performance boost for DAB-Deformable-DETR?

Attention weights visualization

Thank you very much for your work. How can I visualize the self-attention part, for example the encoder outputs or the cross-attention inputs?
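One hedged way to grab attention maps is with forward hooks (a sketch; the exact attention module classes in this repo are an assumption, so inspect model.named_modules() to pick the right ones):

import torch

attn_maps = {}

def save_attn(name):
    def hook(module, inputs, output):
        # nn.MultiheadAttention returns (attn_output, attn_weights)
        if isinstance(output, tuple) and len(output) > 1 and output[1] is not None:
            attn_maps[name] = output[1].detach().cpu()
    return hook

for name, module in model.named_modules():     # `model` assumed already built
    if isinstance(module, torch.nn.MultiheadAttention):
        module.register_forward_hook(save_attn(name))
# run one forward pass, then visualize the attn_maps entries, e.g. with matplotlib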

Understanding the role of refpoint_embed

I am having some trouble understanding the role of refpoint_embed in the DABDeformableDETR module, particularly what the 4 values represent. Do they represent x, y, w, h, or do they correspond to the 4 levels of the input feature maps? From line 391 in models/dab_deformable_detr/deformable_transformer.py (link), the four values seem to be multiplied with valid ratios from the four levels and broadcast along the xywh dimension. Also, when they are fed into deformable attention, the dimension order indicates that the 4 values correspond to the 4 levels. On the other hand, when random_refpoints_xy is used, the first two values seem to represent x and y instead? It's a bit confusing.

Architecture Changes From DINO-DETR

Hello,

Thanks for the great work.

I wonder whether the architecture changes, such as the two-stage structure of DAB-Deformable-DETR (mixed query selection) implemented in the DINO-DETR paper, will also be released in this repo, or only on the DINO-DETR page?

An additional question: will the DINO-DETR GitHub repo be based upon this repo, or will there be some drastic changes? If we start from this repo, will it be easy to port the DINO training and model modifications into it when the DINO page is released?

Thank you so much for your responses in advance.

Swin Transformer Pre-trained Weights

Dear Shilong,

Thanks for the amazing work again! Your code supports the Swin Transformer as a backbone, but I could not find its ImageNet pre-trained weights. Could you kindly provide them?

Thank you in advance!

Question about modulated attention

When you visualize Figure 7 (positional attention maps with different temperatures), why don't the figures show any periodicity?
Since sinusoidal embeddings are built from sin/cos functions, shouldn't they have some repeated shapes?

Training problems with AP and AR

Sorry to bother you! During training, I had a problem.

Training parameters:

batch_size = 1,
epochs = 50,
lr_drop = 40,
modelname = 'dab_detr',
num_workers = 6,

Dataset(tiny-coco):

10% of the entire coco2017

Problem:

So far, I've trained 20 epochs, but the AP and AR results are the same in every epoch:

IoU metric: bbox
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.001
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.003

And the average statistics for the 20th epoch are:

class_error: 88.89  loss: 16.2750 (16.8940)  loss_ce: 0.8517 (0.9079)  loss_bbox: 0.7610 (0.8277)  loss_giou: 1.1219 (1.0805)  loss_ce_0: 0.8508 (0.9079)  loss_bbox_0: 0.7550 (0.8284)  loss_giou_0: 1.1278 (1.0790)  loss_ce_1: 0.8519 (0.9078)  loss_bbox_1: 0.7502 (0.8281)  loss_giou_1: 1.1264 (1.0795)  loss_ce_2: 0.8520 (0.9078)  loss_bbox_2: 0.7614 (0.8283)  loss_giou_2: 1.1250 (1.0794)  loss_ce_3: 0.8514 (0.9080)  loss_bbox_3: 0.7613 (0.8279)  loss_giou_3: 1.1236 (1.0799)  loss_ce_4: 0.8517 (0.9079)  loss_bbox_4: 0.7611 (0.8278)  loss_giou_4: 1.1225 (1.0802)  loss_ce_unscaled: 0.8517 (0.9079)  class_error_unscaled: 80.0000 (77.4109)  loss_bbox_unscaled: 0.1522 (0.1655)  loss_giou_unscaled: 0.5610 (0.5403)  loss_xy_unscaled: 0.0560 (0.0604)  loss_hw_unscaled: 0.0998 (0.1051)  cardinality_error_unscaled: 293.0000 (293.0660)  loss_ce_0_unscaled: 0.8508 (0.9079)  loss_bbox_0_unscaled: 0.1510 (0.1657)  loss_giou_0_unscaled: 0.5639 (0.5395)  loss_xy_0_unscaled: 0.0561 (0.0605)  loss_hw_0_unscaled: 0.1034 (0.1052)  cardinality_error_0_unscaled: 293.0000 (293.0660)  loss_ce_1_unscaled: 0.8519 (0.9078)  loss_bbox_1_unscaled: 0.1500 (0.1656)  loss_giou_1_unscaled: 0.5632 (0.5397)  loss_xy_1_unscaled: 0.0560 (0.0605)  loss_hw_1_unscaled: 0.1029 (0.1051)  cardinality_error_1_unscaled: 293.0000 (293.0660)  loss_ce_2_unscaled: 0.8520 (0.9078)  loss_bbox_2_unscaled: 0.1523 (0.1657)  loss_giou_2_unscaled: 0.5625 (0.5397)  loss_xy_2_unscaled: 0.0560 (0.0605)  loss_hw_2_unscaled: 0.1022 (0.1052)  cardinality_error_2_unscaled: 293.0000 (293.0660)  loss_ce_3_unscaled: 0.8514 (0.9080)  loss_bbox_3_unscaled: 0.1523 (0.1656)  loss_giou_3_unscaled: 0.5618 (0.5400)  loss_xy_3_unscaled: 0.0560 (0.0604)  loss_hw_3_unscaled: 0.1014 (0.1052)  cardinality_error_3_unscaled: 293.0000 (293.0660)  loss_ce_4_unscaled: 0.8517 (0.9079)  loss_bbox_4_unscaled: 0.1522 (0.1656)  loss_giou_4_unscaled: 0.5612 (0.5401)  loss_xy_4_unscaled: 0.0560 (0.0604)  loss_hw_4_unscaled: 0.1006 (0.1052)  cardinality_error_4_unscaled: 293.0000 (293.0660)

So what could be the problem? Dataset? Training parameters? Or something else?
Thank you!

'self.query_scale' before each transformer encoder layer

Thanks for your great work. I notice a difference between DAB-DETR and Conditional DETR: an MLP defined as 'self.query_scale' is applied before each transformer encoder layer. Is this operation described in the paper, or is there another reference that explains its effect?

The dropout rate

Thanks for your great work.
However, after reading the code, I'm confused about the dropout setting.

The dropout rates used in the encoder and decoder layers of DAB-DETR and DAB-Deformable-DETR default to 0.0:
https://github.com/IDEA-opensource/DAB-DETR/blob/2a096e2d59fc804b20dd6da78b504654647107c7/main.py#L79-L80
https://github.com/IDEA-opensource/DAB-DETR/blob/2a096e2d59fc804b20dd6da78b504654647107c7/models/DAB_DETR/transformer.py#L459-L462
https://github.com/IDEA-opensource/DAB-DETR/blob/2a096e2d59fc804b20dd6da78b504654647107c7/models/dab_deformable_detr/deformable_transformer.py#L449-L456
DETR and Deformable-DETR use 0.1 by default; does this mean that training DAB-DETR or DAB-Deformable-DETR without dropout gives better performance?

How to calculate flops

Hi! Thanks for your excellent work. I'm wondering how to evaluate the FLOPs of the DAB-DETR model.
I can't directly use the DETR script, which raises an AssertionError in jit_handles.py:
facebookresearch/detr#110
Could you please share your Python script?
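For what it's worth, a hedged sketch using fvcore rather than the authors' script, assuming the model accepts a DETR-style list of image tensors; custom deformable-attention ops that fvcore cannot trace are reported as unsupported and skipped, so treat the number as a lower bound.

import torch
from fvcore.nn import FlopCountAnalysis

model.eval()                              # `model` assumed already built
images = [torch.rand(3, 800, 1066)]       # assumed input: a list of CHW tensors
flops = FlopCountAnalysis(model, (images,))
print(flops.total() / 1e9, "GFLOPs")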

CUDA out of memory on test.py

I'm following the installation guide. When running test.py in step 4, I got "RuntimeError: CUDA out of memory". Is it okay to proceed (using a smaller batch for training or inference), or will it have any effect on performance?

* True check_forward_equal_with_pytorch_double: max_abs_err 8.67e-19 max_rel_err 2.35e-16
* True check_forward_equal_with_pytorch_float: max_abs_err 4.66e-10 max_rel_err 1.13e-07
* True check_gradient_numerical(D=30)
* True check_gradient_numerical(D=32)
* True check_gradient_numerical(D=64)
* True check_gradient_numerical(D=71)
* True check_gradient_numerical(D=1025)
Traceback (most recent call last):
  File "/home/azureuser/WilliamJustin/DAB-DETR/models/dab_deformable_detr/ops/test.py", line 86, in <module>
    check_gradient_numerical(channels, True, True, True)
  File "/home/azureuser/WilliamJustin/DAB-DETR/models/dab_deformable_detr/ops/test.py", line 76, in check_gradient_numerical
    gradok = gradcheck(func, (value.double(), shapes, level_start_index, sampling_locations.double(), attention_weights.double(), im2col_step))
  File "/home/azureuser/miniconda3/envs/jstnxu-DAB-DETR/lib/python3.9/site-packages/torch/autograd/gradcheck.py", line 1400, in gradcheck
    return _gradcheck_helper(**args)
  File "/home/azureuser/miniconda3/envs/jstnxu-DAB-DETR/lib/python3.9/site-packages/torch/autograd/gradcheck.py", line 1414, in _gradcheck_helper
    _gradcheck_real_imag(gradcheck_fn, func, func_out, tupled_inputs, outputs, eps,
  File "/home/azureuser/miniconda3/envs/jstnxu-DAB-DETR/lib/python3.9/site-packages/torch/autograd/gradcheck.py", line 1061, in _gradcheck_real_imag
    gradcheck_fn(func, func_out, tupled_inputs, outputs, eps,
  File "/home/azureuser/miniconda3/envs/jstnxu-DAB-DETR/lib/python3.9/site-packages/torch/autograd/gradcheck.py", line 1097, in _slow_gradcheck
    numerical = _transpose(_get_numerical_jacobian(func, tupled_inputs, outputs, eps=eps, is_forward_ad=use_forward_ad))
  File "/home/azureuser/miniconda3/envs/jstnxu-DAB-DETR/lib/python3.9/site-packages/torch/autograd/gradcheck.py", line 146, in _get_numerical_jacobian
    jacobians += [get_numerical_jacobian_wrt_specific_input(fn, inp_idx, inputs, outputs, eps,
  File "/home/azureuser/miniconda3/envs/jstnxu-DAB-DETR/lib/python3.9/site-packages/torch/autograd/gradcheck.py", line 290, in get_numerical_jacobian_wrt_specific_input
    return _combine_jacobian_cols(jacobian_cols, outputs, input, input.numel())
  File "/home/azureuser/miniconda3/envs/jstnxu-DAB-DETR/lib/python3.9/site-packages/torch/autograd/gradcheck.py", line 230, in _combine_jacobian_cols
    jacobians = _allocate_jacobians_with_outputs(outputs, numel, dtype=input.dtype if input.dtype.is_complex else None)
  File "/home/azureuser/miniconda3/envs/jstnxu-DAB-DETR/lib/python3.9/site-packages/torch/autograd/gradcheck.py", line 45, in _allocate_jacobians_with_outputs
    out.append(t.new_zeros((numel_input, t.numel()), **options))
RuntimeError: CUDA out of memory. Tried to allocate 7.50 GiB (GPU 0; 15.75 GiB total capacity; 7.50 GiB already allocated; 7.30 GiB free; 7.50 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

About parameter bbox_embed_diff_each_layer in DABDETR.py.

models/DAB_DETR/DABDETR.py line 85 should read
"bbox_embed_diff_each_layer: don't share weights of prediction heads. Default is False (shared weights)."
rather than the original "Default for True", so that it is consistent with line 72.
Am I right?

A question about attention weight calculation

Hi~ Thanks for your excellent work! I'm confused about an operation in the attention weight calculation.

In the implementation of the attention there is a small modification which I have not found in the paper.

The code is:

# previous choice of Conditional DETR and nn.MultiheadAttention
attn_output_weights = softmax(attn_output_weights, dim=-1)

# DAB-DETR modified this line:
attn_output_weights = softmax(attn_output_weights - attn_output_weights.max(dim=-1, keepdim=True)[0], dim=-1)

Does this procedure refer to some previous studies that I have not read?
Will doing this improve the performance?
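One thing that can be said with certainty (a standard identity, not specific to this repo): softmax is invariant to subtracting a per-row constant, so the modified line produces exactly the same weights and only improves numerical stability of the exponentials.

import torch

x = torch.randn(4, 8) * 50
a = torch.softmax(x, dim=-1)
b = torch.softmax(x - x.max(dim=-1, keepdim=True)[0], dim=-1)
print(torch.allclose(a, b))   # True: same output, but b cannot overflow in exp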

Visualizing results

Thanks for your great work.
After testing the code, how can I plot the image results with bounding boxes?
Is there a command or script for this?
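Not a command from this repo, but a minimal matplotlib sketch (assuming you already have post-processed absolute xyxy boxes and per-box scores) for drawing predictions:

import matplotlib.pyplot as plt
import matplotlib.patches as patches

def plot_boxes(image, boxes, scores, threshold=0.5):
    # image: HxWx3 array; boxes: Nx4 in (x0, y0, x1, y1); scores: N
    fig, ax = plt.subplots()
    ax.imshow(image)
    for (x0, y0, x1, y1), s in zip(boxes, scores):
        if s < threshold:
            continue
        ax.add_patch(patches.Rectangle((x0, y0), x1 - x0, y1 - y0,
                                       fill=False, edgecolor="red", linewidth=2))
        ax.text(x0, y0, f"{s:.2f}", color="white", backgroundcolor="red")
    plt.show()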
