Some Questions about selfpatch [OPEN]

bryanwong17 commented on August 19, 2024
Some Questions

from selfpatch.

Comments (12)

yanjk3 commented on August 19, 2024

I am not the author, but I hope my answers help.
A1: Because the neighbors of a patch are defined within the same view. A "neighbor" is hard to define across views (although you could try to define it with a spatial prior).
A2: The local views are fed into the teacher network so that they contribute to the SelfPatch loss, i.e., the same-view loss mentioned above. This may not be strictly necessary, but it may accelerate convergence.
A3: "loc=True" means aggregating the neighbors' features, and it is enabled in the teacher network. For example, the i-th patch of the teacher network aggregates its neighbors' features. The student model does not aggregate them. The similarity between the student's i-th patch and the teacher's i-th patch (which includes the neighbors' features) is then maximized to learn patch-level representations.
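
To make A3 concrete, here is a minimal PyTorch sketch of the idea. It is not the SelfPatch implementation: it assumes the teacher aggregates by plain averaging over each patch's 3x3 grid neighborhood (the patch itself included) and that "similarity" means cosine similarity. The function names aggregate_neighbors and patch_similarity_loss are hypothetical.

```python
import torch
import torch.nn.functional as F


def aggregate_neighbors(patches, grid_h, grid_w):
    """Average each patch with its 3x3 grid neighborhood (assumed aggregation rule).

    patches: (batch, grid_h * grid_w, dim) patch features.
    """
    b, n, d = patches.shape
    assert n == grid_h * grid_w
    # Lay the patch sequence back out on its 2D grid: (batch, dim, H, W).
    grid = patches.transpose(1, 2).reshape(b, d, grid_h, grid_w)
    # count_include_pad=False keeps border patches from being diluted by padding.
    pooled = F.avg_pool2d(grid, kernel_size=3, stride=1, padding=1,
                          count_include_pad=False)
    # Back to sequence layout: (batch, H * W, dim).
    return pooled.flatten(2).transpose(1, 2)


def patch_similarity_loss(student_patches, teacher_patches, grid_h, grid_w):
    """Negative mean cosine similarity between student patch i and the
    teacher's neighbor-aggregated patch i (same view)."""
    with torch.no_grad():  # the teacher side only provides targets
        target = aggregate_neighbors(teacher_patches, grid_h, grid_w)
    sim = F.cosine_similarity(student_patches, target, dim=-1)
    return -sim.mean()


# Toy usage: a 14x14 grid of 8-dimensional patch features.
student = torch.randn(2, 196, 8)
teacher = torch.randn(2, 196, 8)
loss = patch_similarity_loss(student, teacher, 14, 14)
```

Only the teacher side is aggregated here, matching the description above; minimizing the returned value maximizes the patch-wise similarity.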

I hope this helps.

bryanwong17 commented on August 19, 2024

Hi @yanjk3, thank you very much for the answers; I really appreciate it. It makes more sense now that I know the authors made a slight modification to the original DINO.

bryanwong17 commented on August 19, 2024

Hi @yanjk3, when I use eval_knn.py from the original DINO to evaluate SelfPatch, I get the following error:

size mismatch for pos_embed: copying a param with shape torch.Size([1, 196, 384]) from checkpoint, the shape in current model is torch.Size([1, 197, 384]).

Do you have any idea how I can fix it? Thank you.

yanjk3 commented on August 19, 2024

This is because the SelfPatch checkpoint does not contain the CLS token, so the size of the position embedding does not match. In SelfPatch, the CLS token lives in the SelfPatchHead (https://github.com/alinlab/SelfPatch/blob/main/selfpatch_vision_transformer.py#L362), so the ViT backbone does not need one.
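
The 196 vs. 197 in the error message is just the patch count with and without the CLS slot. A small arithmetic sketch, assuming 224x224 inputs and 16x16 patches (values consistent with 196 but not stated in this thread); pos_embed_length is a hypothetical helper:

```python
def pos_embed_length(img_size, patch_size, has_cls_token):
    """Number of position-embedding slots for a square image and square patches."""
    patches_per_side = img_size // patch_size
    num_patches = patches_per_side ** 2
    # A CLS token, when present, takes one extra slot.
    return num_patches + (1 if has_cls_token else 0)


print(pos_embed_length(224, 16, has_cls_token=False))  # 196, as in the checkpoint
print(pos_embed_length(224, 16, has_cls_token=True))   # 197, as in the DINO ViT
```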

I think you can fix it by modifying DINO's ViT code (https://github.com/facebookresearch/dino/blob/main/vision_transformer.py#L147), changing
self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 1, embed_dim))
to
self.pos_embed = nn.Parameter(torch.zeros(1, num_patches, embed_dim))
Then delete the '-1' in lines 175 and 176, and swap lines 202 and 205.
However, because the SelfPatch checkpoint does not contain the CLS token, the ViT model will randomly initialize one, which may lead to a performance drop. Instead of using the CLS token, I think you can apply global average pooling to the output of the last transformer block to get a global feature representation of the image.
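
Before editing the model, it can help to list exactly which parameters disagree. The sketch below compares two state dicts; compare_state_dicts is a hypothetical helper, and the toy dicts only stand in for the real DINO model and SelfPatch checkpoint (the parameter names cls_token and pos_embed follow DINO's ViT).

```python
import torch


def compare_state_dicts(model_state, checkpoint_state):
    """Return (shape mismatches, parameters missing from the checkpoint)."""
    shape_mismatches = {
        name: (tuple(checkpoint_state[name].shape), tuple(param.shape))
        for name, param in model_state.items()
        if name in checkpoint_state
        and tuple(checkpoint_state[name].shape) != tuple(param.shape)
    }
    # Anything missing here would keep its random initialization after loading.
    missing_in_checkpoint = [n for n in model_state if n not in checkpoint_state]
    return shape_mismatches, missing_in_checkpoint


# Toy stand-ins: an unmodified DINO-style model vs. a checkpoint without CLS.
model_state = {
    "cls_token": torch.zeros(1, 1, 384),
    "pos_embed": torch.zeros(1, 197, 384),
}
checkpoint_state = {"pos_embed": torch.zeros(1, 196, 384)}

mismatches, missing = compare_state_dicts(model_state, checkpoint_state)
print(mismatches)  # {'pos_embed': ((1, 196, 384), (1, 197, 384))}
print(missing)     # ['cls_token']
```

With a real model you would pass model.state_dict() and the loaded checkpoint dict instead of the toy dicts.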

bryanwong17 commented on August 19, 2024

Hi @yanjk3, thank you for your answers. Could you show how I can apply global average pooling to the last transformer block?

yanjk3 commented on August 19, 2024

First make sure you have deleted the CLS token from the ViT. Then insert
x = x.mean(dim=1)
after
x = self.norm(x)
and return x.
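
In isolation, the two lines above do the following. This is a minimal sketch on random tensors rather than a patch to DINO's vision_transformer.py; pool_patch_tokens is a hypothetical helper, and the shapes assume 196 patch tokens of width 384 with no CLS token.

```python
import torch
import torch.nn as nn


def pool_patch_tokens(x, norm):
    """Normalize patch tokens, then average over the token dimension.

    x: (batch, num_patches, embed_dim) output of the last transformer block,
       with no CLS token in the sequence.
    """
    x = norm(x)
    x = x.mean(dim=1)  # global average pooling over patches
    return x


tokens = torch.randn(2, 196, 384)  # stand-in for the last block's output
norm = nn.LayerNorm(384)           # stand-in for self.norm
features = pool_patch_tokens(tokens, norm)
print(features.shape)  # torch.Size([2, 384])
```

The result has one feature vector per image, which is the shape a k-NN evaluation would otherwise take from the CLS token.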

bryanwong17 commented on August 19, 2024

Hi @yanjk3, I took your advice, but the accuracy is 3% lower than that of the original DINO under the same settings for eval_knn.py. Do you have any suggestions for this? How can the accuracy be measured more reliably? Is it better to check with eval_linear.py or eval_knn.py? Thanks.

yanjk3 commented on August 19, 2024

To overcome the performance drop, I recommend copying the SelfPatch ViT into the DINO ViT.
The main difference between them is:

SelfPatch uses a CA block after the ViT blocks to aggregate the global feature representation and output the CLS token.

Using this CLS token may improve performance.
Unfortunately, the released checkpoint contains only the ViT backbone, so to get a precise answer you would have to pre-train the entire model yourself.

bryanwong17 commented on August 19, 2024

Hi @yanjk3, sorry, I don't quite follow. What do you mean by copying the SelfPatch ViT to the DINO ViT?

yanjk3 commented on August 19, 2024

I mean you should replace the DINO ViT model's code with the SelfPatch ViT model's code.

bryanwong17 commented on August 19, 2024

Hi @yanjk3, do you mean adding everything you previously suggested to the code for the DINO ViT model (vision_transformer.py)?

alijavidani commented on August 19, 2024

Hi @bryanwong17, @yanjk3. I'm having the same problem: I cannot run the evaluation using eval_knn.py.
Did you find a solution to this problem?
Thanks in advance.
