Some questions about QSAttn. (meta-detr, OPEN)

nhw649 commented on June 30, 2024
Some questions about QSAttn.

from meta-detr.

Comments (8)

nanfangAlan commented on June 30, 2024

You can find the answers in Sec. 4.2 and Table 8.

ZhangGongjie commented on June 30, 2024

@Yuxuan-W The implementation is the same as in the paper. If you read the code carefully, you will find that CAM is not used at every encoder layer.

nhw649 commented on June 30, 2024

When computing category codes, src is the support feature. Why do src, category_code, and tsp need to go through QSAttn? This is not mentioned in the paper and cannot be seen in Fig. 4.

In deformable_transformer.py:

def forward_supp_branch(self, src, pos, reference_points, spatial_shapes, level_start_index, padding_mask, tsp, support_boxes):
    # self attention
    src2 = self.self_attn(self.with_pos_embed(src, pos), reference_points, src, spatial_shapes, level_start_index, padding_mask)
    src = src + self.dropout1(src2)
    src = self.norm1(src)

    support_img_h, support_img_w = spatial_shapes[0, 0], spatial_shapes[0, 1]
    supp_roi = torchvision.ops.roi_align(
        src.transpose(1, 2).reshape(src.shape[0], -1, support_img_h, support_img_w),
        support_boxes,
        output_size=(7, 7),
        spatial_scale=1 / 32.0,
        aligned=True)

    supp_roi = supp_roi.mean(3).mean(2)  # [episode_size, C]
    category_code = supp_roi.sigmoid() # [episode_size, C]

    if self.QSAttn:
        # siamese attention
        src, tsp = self.siamese_attn(src,
                                     inverse_sigmoid(category_code).unsqueeze(0).expand(src.shape[0], -1, -1),
                                     category_code.unsqueeze(0).expand(src.shape[0], -1, -1),
                                     tsp)

        # ffn
        src = self.forward_ffn(src + tsp)
    else:
        src = self.forward_ffn(src)

    return src, category_code

Yuxuan-W commented on June 30, 2024

Sorry for my careless reading. I have removed my earlier comment to avoid confusion.

ZhangGongjie commented on June 30, 2024

@Yuxuan-W No problem. Any discussions are welcome.

@nhw649 Please allow me some time to address your concern. I have a full-time job now, and I haven't read my own paper in a long time.

nhw649 commented on June 30, 2024

Okay, please take your time. Thank you.

Yuxuan-W commented on June 30, 2024

In DeformableTransformerEncoder, you will find that self.QSAttn == True holds only for the first layer. However, the category code computed in that first layer, namely category_code[0], is taken before siamese_attn runs, so it is not involved in siamese_attn. In other words, only category_code[1:] are affected by siamese_attn, and they are never used.

So I guess this is just an implementation convenience. There is no problem, and the code is consistent with the paper.
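
To make the ordering argument concrete, here is a toy sketch of the data flow. It is not the Meta-DETR code: scalars stand in for feature tensors, and self_attn, pooled_code, siamese_attn, ffn, supp_layer and run_encoder are hypothetical stand-ins for illustration only. It assumes, as described above, that the attention step is enabled only in the first layer.

```python
# Toy sketch only: scalars replace tensors, and every function below is a
# hypothetical stand-in, not the actual Meta-DETR implementation.

def self_attn(src):
    # Stand-in for deformable self-attention + residual + norm.
    return src + 1.0

def pooled_code(src):
    # Stand-in for RoIAlign + mean pooling + sigmoid.
    return src * 0.5

def siamese_attn(src, code, tsp):
    # Stand-in for the query-support attention; returns updated (src, tsp).
    return src + code, tsp + code

def ffn(src):
    # Stand-in for the feed-forward block.
    return src * 2.0

def supp_layer(src, tsp, use_qsattn):
    src = self_attn(src)
    code = pooled_code(src)  # the code is read off BEFORE siamese_attn runs
    if use_qsattn:
        src, tsp = siamese_attn(src, code, tsp)
        src = ffn(src + tsp)
    else:
        src = ffn(src)
    return src, code

def run_encoder(src, tsp, num_layers, qsattn_on_first_layer):
    codes = []
    for i in range(num_layers):
        # Assumption from the discussion: only layer 0 enables the attention.
        use_qsattn = qsattn_on_first_layer and i == 0
        src, code = supp_layer(src, tsp, use_qsattn)
        codes.append(code)
    return codes

codes_with = run_encoder(1.0, 0.0, num_layers=3, qsattn_on_first_layer=True)
codes_without = run_encoder(1.0, 0.0, num_layers=3, qsattn_on_first_layer=False)
```

In this toy, codes_with[0] equals codes_without[0], while the later entries differ. Only the codes of later layers see the effect of the first layer's attention step.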

ZhangGongjie commented on June 30, 2024

@nhw649 I hope Yuxuan-W's comments have addressed your concerns. Thank you, @Yuxuan-W!
