Confusion about fine-tune (simmim) [CLOSED]

Breeze-Zero commented on May 20, 2024
Confusion about fine-tune

from simmim.

Comments (8)

Breeze-Zero commented on May 20, 2024

Thanks for sharing and asking. Our finding is that if you want good results on your own downstream tasks, a second stage of supervised pretraining after SimMIM (or similar approaches such as MAE) is highly recommended. This second-stage supervised pretraining introduces additional semantics that help other downstream tasks. This is what we did for our 3B Swin V2 training: SimMIM + supervised (classification).

Thank you for taking the time to reply, but my question has not been answered. I am curious why, after SimMIM pre-training, the training loss on downstream tasks such as segmentation converges at nearly the same speed as with a model trained from initialization weights (perhaps partly because the decoder holds half of the segmentation network's parameters). When I previously tried a contrastive self-supervised method like DINO, the downstream training loss converged very quickly, so I am confused by this, and I am also checking whether I made a mistake in my setup.
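One way to rule out a setup mistake is to check how many backbone weights the pre-trained checkpoint actually covers. The sketch below is only an illustration under stated assumptions: the function name `match_pretrained_keys`, the `encoder.` prefix, and the toy key names are hypothetical and should be replaced with the names found in your own checkpoint and backbone.

```python
def match_pretrained_keys(ckpt_keys, model_keys, strip_prefix="encoder."):
    """Compare checkpoint key names with backbone key names.

    strip_prefix is an ASSUMPTION about how the pre-training checkpoint
    names its backbone weights; inspect your own checkpoint to confirm it.
    """
    # Drop the prefix where present so names line up with the backbone.
    renamed = {
        k[len(strip_prefix):] if k.startswith(strip_prefix) else k
        for k in ckpt_keys
    }
    model_keys = set(model_keys)
    matched = renamed & model_keys
    return {
        "matched": sorted(matched),
        "missing_in_ckpt": sorted(model_keys - renamed),  # stay at init values
        "unused_in_ckpt": sorted(renamed - model_keys),   # never loaded
        "coverage": len(matched) / len(model_keys) if model_keys else 0.0,
    }


# Toy key names, purely illustrative.
ckpt_keys = [
    "encoder.patch_embed.proj.weight",
    "encoder.layers.0.blocks.0.attn.qkv.weight",
    "decoder.0.weight",
]
model_keys = [
    "patch_embed.proj.weight",
    "layers.0.blocks.0.attn.qkv.weight",
    "norm.weight",
]

report = match_pretrained_keys(ckpt_keys, model_keys)
print(report["coverage"], report["missing_in_ckpt"])
```

If the coverage is low, most of the backbone is still at its initialization values, which would make the two loss curves look alike. In PyTorch, `load_state_dict(..., strict=False)` returns the missing and unexpected keys, which gives the same information for a real model.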


ancientmooner commented on May 20, 2024

Thanks for sharing and asking. Our finding is that if you want good results on your own downstream tasks, a second stage of supervised pretraining after SimMIM (or similar approaches such as MAE) is highly recommended. This second-stage supervised pretraining introduces additional semantics that help other downstream tasks. This is what we did for our 3B Swin V2 training: SimMIM + supervised (classification).


Breeze-Zero commented on May 20, 2024

Supplementary training data for reference:
[Two attached images, not preserved: 1642838225(1), 1642838294(1)]


ancientmooner commented on May 20, 2024

I did not quite follow your steps. Is this the comparison you mean?

SimMIM pre-training + segmentation fine-tune (red)
vs. supervised pre-training + segmentation fine-tune (blue)


Breeze-Zero commented on May 20, 2024

SimMIM pre-training backbone + segmentation fine-tune (red)
vs. Initialization weight backbone + segmentation fine-tune (blue)


ancientmooner commented on May 20, 2024

SimMIM pre-training backbone + segmentation fine-tune (red) vs. Initialization weight backbone + segmentation fine-tune (blue)

Thank you for your clarification. In general, the model with pretraining will converge much faster.

Yes, it is probably because the head is heavy compared to the backbone. Another possible explanation is that this problem is relatively simple, so both methods converge very fast.
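To check how heavy the head is relative to the backbone, one can sum the parameter counts by name prefix. This is a minimal sketch, not code from the SimMIM repository: the function name `parameter_share`, the `backbone.` and `decode_head.` prefixes, and the toy shapes are all hypothetical.

```python
from math import prod


def parameter_share(param_shapes, prefixes):
    """Return the fraction of all parameters that falls under each prefix.

    param_shapes: dict mapping parameter name -> shape tuple.
    prefixes: name prefixes to group by (hypothetical names below).
    """
    counts = {p: 0 for p in prefixes}
    total = 0
    for name, shape in param_shapes.items():
        n = prod(shape)  # number of elements in this tensor
        total += n
        for p in prefixes:
            if name.startswith(p):
                counts[p] += n
                break
    if total == 0:
        return {p: 0.0 for p in prefixes}
    return {p: c / total for p, c in counts.items()}


# Toy shapes, purely illustrative. With a PyTorch model one could build
# the dict as {n: tuple(p.shape) for n, p in model.named_parameters()}.
shapes = {
    "backbone.layer0.weight": (96, 48),
    "backbone.layer0.bias": (96,),
    "decode_head.conv.weight": (64, 72),
    "decode_head.conv.bias": (96,),
}

shares = parameter_share(shapes, ("backbone.", "decode_head."))
print(shares)
```

If the decoder's share is close to one half, as suggested earlier in the thread, a large part of the network starts from scratch in both runs, which could explain why the loss curves look similar.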


Asers387 commented on May 20, 2024

Thanks for sharing and asking. Our finding is that if you want good results on your own downstream tasks, a second stage of supervised pretraining after SimMIM (or similar approaches such as MAE) is highly recommended. This second-stage supervised pretraining introduces additional semantics that help other downstream tasks. This is what we did for our 3B Swin V2 training: SimMIM + supervised (classification).

Would it be possible to explain what exactly you mean by "second-stage supervised pretraining"? Is there any documentation you could link concerning this? Thanks!


ywdong commented on May 20, 2024

Thanks for sharing and asking. Our finding is that if you want good results on your own downstream tasks, a second stage of supervised pretraining after SimMIM (or similar approaches such as MAE) is highly recommended. This second-stage supervised pretraining introduces additional semantics that help other downstream tasks. This is what we did for our 3B Swin V2 training: SimMIM + supervised (classification).

Thank you for taking the time to reply, but my question has not been answered. I am curious why, after SimMIM pre-training, the training loss on downstream tasks such as segmentation converges at nearly the same speed as with a model trained from initialization weights (perhaps partly because the decoder holds half of the segmentation network's parameters). When I previously tried a contrastive self-supervised method like DINO, the downstream training loss converged very quickly, so I am confused by this, and I am also checking whether I made a mistake in my setup.

I have the same problem. @834799106 Did you solve it?
