Comments (8)
Thanks for sharing and asking. Our finding is that if you want good results on your own downstream tasks, a second-stage supervised pre-training after SimMIM (or similar approaches such as MAE) is highly recommended. This second-stage supervised pre-training introduces additional semantics that are helpful for other downstream tasks. This is what we did for our 3B SwinV2 training: SimMIM + supervised (classification).
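A minimal sketch of that two-stage recipe in PyTorch, using a torchvision Swin model as a stand-in for the actual backbone and assuming a SimMIM-style checkpoint that stores encoder weights under an `encoder.` prefix (both are assumptions, not the released SimMIM code):

```python
import torch
import torch.nn as nn
from torchvision.models import swin_t  # stand-in for the actual Swin backbone


def second_stage_pretrain(simmim_ckpt: str, loader, num_classes: int = 1000,
                          epochs: int = 1) -> nn.Module:
    """Stage 2: supervised classification pre-training on top of a SimMIM
    (stage 1) encoder. The 'encoder.' prefix and the 'model' key are
    assumptions about the checkpoint layout, not the released SimMIM code."""
    backbone = swin_t()
    ckpt = torch.load(simmim_ckpt, map_location="cpu")
    encoder_state = {k[len("encoder."):]: v for k, v in ckpt["model"].items()
                     if k.startswith("encoder.")}
    backbone.load_state_dict(encoder_state, strict=False)

    # Fresh classification head for the supervised stage (e.g. ImageNet-1k).
    backbone.head = nn.Linear(backbone.head.in_features, num_classes)
    optimizer = torch.optim.AdamW(backbone.parameters(), lr=1e-4,
                                  weight_decay=0.05)
    criterion = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for images, labels in loader:  # standard supervised loop (sketch)
            loss = criterion(backbone(images), labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return backbone  # initialize downstream tasks from this, not raw SimMIM
```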
Thank you for replying despite your busy schedule, but my question has not been solved. I am curious why, after SimMIM pre-training, the training loss on downstream tasks such as segmentation converges at almost the same speed as with a randomly initialized model (perhaps partly because about half of the segmentation network's parameters are in the decoder). I previously tried contrastive self-supervised methods like DINO, whose downstream training loss converges very quickly, so I am confused by this, and I am also checking whether there is a problem in my setup.
Supplementary training data for reference.
I did not quite follow your steps. Is it the following comparison:
SimMIM pre-training + segmentation fine-tune (red)
vs. supervised pre-training + segmentation fine-tune (blue)
SimMIM pre-trained backbone + segmentation fine-tune (red)
vs. randomly initialized backbone + segmentation fine-tune (blue)
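To make the two runs concrete, a rough sketch (same stand-in backbone and assumed checkpoint layout as above); the `strict=False` report doubles as a sanity check that the pre-trained weights actually loaded:

```python
import torch
from torchvision.models import swin_t  # stand-in for the segmentation backbone


def build_backbone(simmim_ckpt=None):
    """Red run: pass a SimMIM checkpoint path; blue run: pass None to keep
    the default random initialization."""
    backbone = swin_t()
    if simmim_ckpt is not None:
        ckpt = torch.load(simmim_ckpt, map_location="cpu")
        # Assumed layout: encoder weights under an "encoder." prefix.
        state = {k[len("encoder."):]: v for k, v in ckpt["model"].items()
                 if k.startswith("encoder.")}
        missing, unexpected = backbone.load_state_dict(state, strict=False)
        # If 'missing' covers most of the backbone, the pre-trained weights
        # never actually loaded, which would also explain identical curves.
        print(f"missing={len(missing)} unexpected={len(unexpected)}")
    return backbone


red_backbone = build_backbone("simmim_pretrained.pth")  # SimMIM pre-trained
blue_backbone = build_backbone()                        # random initialization
```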
Thank you for the clarification. In general, a pre-trained model converges much faster.
Yes, it is probably because the head is heavy compared to the backbone. Another possible explanation is that this task is relatively simple, so both setups converge very fast.
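One quick way to check the first explanation is to count the parameters on each side. A hypothetical helper, assuming MMSegmentation-style `backbone`/`decode_head` attributes (adapt the names to your own segmentor):

```python
import torch.nn as nn


def head_share(model: nn.Module) -> None:
    """Print how the parameters split between backbone and decode head."""
    def count(m: nn.Module) -> int:
        return sum(p.numel() for p in m.parameters())

    backbone, head = count(model.backbone), count(model.decode_head)
    total = backbone + head
    print(f"backbone: {backbone / 1e6:.1f}M ({backbone / total:.0%})")
    print(f"head:     {head / 1e6:.1f}M ({head / total:.0%})")
    # If the head holds a large share of the parameters, its random
    # initialization dominates the early training loss and masks the
    # benefit of backbone pre-training.
```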
Would it be possible to explain what exactly you mean by "second-stage supervised pretraining"? Is there any documentation you could link concerning this? Thanks!
I also have the same problem. @834799106 Did you solve it?
Related Issues (20)
- When training, 'Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to' occurs HOT 1
- A bug in loading the pre-trained checkpoint: in utils.py line 118, the checkpoint from line 111 is not used
- For CNN architectures like ResNet-50
- Could you please release the pre-trained R50 model?
- Why is there a "no_weight_decay" function for Swin-T but not for ViT
- Train from scratch HOT 1
- Performance using the cosine distance
- Indentation bug in utils.remap_pretrained_keys_vit
- Why Swin-Large-W12 contains [36, 36] `encoder.layers.3.blocks.0.attn.relative_position_index` HOT 1
- The specific configs and code for downstream tasks, like semantic segmentation and object detection
- Any plan to support 3D version?
- Questions about AvgDist
- any plan to release the code of DDP training of swinV2-G with multi machine?
- Tt
- Setting for Linear eval
- Could you provide the pretrained ViT-Base model with patch size 16?
- How can I solve this problem HOT 2
- Effect of self-supervised pre-trained weights on downstream task accuracy
- inverse swin HOT 1
- About mask_token in SimMIM codes