Some questions about QSAttn (meta-detr, open issue)

nhw649 commented on June 30, 2024

Some questions about QSAttn.

Comments (8)

nanfangAlan commented on June 30, 2024

You can find the answers in Sec. 4.2 and Table 8.

ZhangGongjie commented on June 30, 2024

@Yuxuan-W The implementation is the same as in the paper. If you read the code carefully, you will find that CAM is not used at every encoder layer.

nhw649 commented on June 30, 2024

When computing category codes, src is the support feature. Why do src, category_code and tsp need to go through QSAttn? This is not mentioned in the paper, and it cannot be seen from Fig. 4.

In deformable_transformer.py:

def forward_supp_branch(self, src, pos, reference_points, spatial_shapes, level_start_index, padding_mask, tsp, support_boxes):
    # self attention
    src2 = self.self_attn(self.with_pos_embed(src, pos), reference_points, src, spatial_shapes, level_start_index, padding_mask)
    src = src + self.dropout1(src2)
    src = self.norm1(src)

    support_img_h, support_img_w = spatial_shapes[0, 0], spatial_shapes[0, 1]
    supp_roi = torchvision.ops.roi_align(
        src.transpose(1, 2).reshape(src.shape[0], -1, support_img_h, support_img_w),
        support_boxes,
        output_size=(7, 7),
        spatial_scale=1 / 32.0,
        aligned=True)

    supp_roi = supp_roi.mean(3).mean(2)  # [episode_size, C]
    category_code = supp_roi.sigmoid()  # [episode_size, C]

    if self.QSAttn:
        # siamese attention
        src, tsp = self.siamese_attn(src,
                                     inverse_sigmoid(category_code).unsqueeze(0).expand(src.shape[0], -1, -1),
                                     category_code.unsqueeze(0).expand(src.shape[0], -1, -1),
                                     tsp)

        # ffn
        src = self.forward_ffn(src + tsp)
    else:
        src = self.forward_ffn(src)

    return src, category_code
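
For readers checking the data flow in forward_supp_branch, here is a minimal, framework-free sketch of the category-code step. It is a simplification under stated assumptions: plain Python lists stand in for tensors, box_avg_pool is a hypothetical helper that averages the feature cells inside an integer box instead of calling torchvision.ops.roi_align (no bilinear sampling, no spatial_scale), and inverse_sigmoid is assumed to follow the clamped-logit form used in Deformable DETR. It also makes one detail of the siamese_attn call visible: because inverse_sigmoid undoes sigmoid, the second argument is (up to clamping) the raw pooled support feature, while the third is its sigmoid.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def inverse_sigmoid(x, eps=1e-5):
    # Assumed clamped-logit form (as in Deformable DETR's helper of the same name).
    x = min(max(x, 0.0), 1.0)
    x1 = max(x, eps)
    x2 = max(1.0 - x, eps)
    return math.log(x1 / x2)

def box_avg_pool(feat, h, w, box):
    """Hypothetical stand-in for roi_align(...).mean(3).mean(2).

    feat: H*W rows (row-major, index y * w + x), each a list of C channels,
          i.e. one image of src with shape [HW, C].
    box:  (x0, y0, x1, y1) in feature-map cells, end-exclusive.
    Returns a length-C vector: the mean over all cells inside the box.
    """
    x0, y0, x1, y1 = box
    c = len(feat[0])
    acc = [0.0] * c
    n = 0
    for y in range(y0, y1):
        for x in range(x0, x1):
            cell = feat[y * w + x]
            for k in range(c):
                acc[k] += cell[k]
            n += 1
    return [a / n for a in acc]

# Toy support feature: H=2, W=3, C=2.
feat = [[float(i), float(-i)] for i in range(6)]
pooled = box_avg_pool(feat, 2, 3, (0, 0, 2, 2))           # plays the role of supp_roi
category_code = [sigmoid(v) for v in pooled]               # values in (0, 1)
recovered = [inverse_sigmoid(v) for v in category_code]   # back to the pooled feature
```

The box covers cells 0, 1, 3 and 4, so the pooled vector is [2.0, -2.0]; applying inverse_sigmoid to the code recovers it up to floating-point error.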

Yuxuan-W commented on June 30, 2024

Sorry for my careless reading; I have removed the comment to avoid confusion.

ZhangGongjie commented on June 30, 2024

@Yuxuan-W No problem. Any discussions are welcome.

@nhw649 Please allow me some time to address your concern, as I have a full-time job now and haven't read my own paper in a long time.

nhw649 commented on June 30, 2024

Okay, please take your time. Thank you.

Yuxuan-W commented on June 30, 2024

In DeformableTransformerEncoder, you will find that self.QSAttn == True only for the first layer. However, the category code computed by the first layer, namely category_code[0], is not involved in siamese_attn, because forward_supp_branch computes category_code from src before siamese_attn is applied in that layer. In other words, only category_code[1:] are affected by siamese_attn, but they are never used.

So I guess it is just an implementation convenience. There is no problem, and it is consistent with the paper.
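
Yuxuan-W's argument can be checked with a toy provenance trace. This is a sketch under assumptions, not the Meta-DETR code: run_support_encoder is a hypothetical function, the encoder depth of 6 is chosen only for illustration, QSAttn is enabled only in layer 0 (as described above), and each layer computes its category code from src before any siamese attention in that layer, mirroring the order in forward_supp_branch. Each value is a tuple of operation names it has passed through, so we can see which codes depend on siamese_attn.

```python
def run_support_encoder(num_layers, qsattn_layers):
    """Return, for each layer, the ops its category code depends on."""
    src = ("input",)  # provenance of the support feature
    codes = []
    for i in range(num_layers):
        src = src + ("self_attn",)
        # The category code is taken from src BEFORE siamese attention.
        codes.append(src + ("roi_pool", "sigmoid"))
        if i in qsattn_layers:
            src = src + ("siamese_attn",)
        src = src + ("ffn",)
    return codes

codes = run_support_encoder(6, qsattn_layers={0})
```

Only codes[1:] carry "siamese_attn" in their provenance; codes[0], the one reported as actually used, does not.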

ZhangGongjie commented on June 30, 2024

@nhw649 I hope Yuxuan-W's comments have addressed your concerns. Thank you, @Yuxuan-W!
