Comments (8)
Thanks for sharing and asking. Our finding is that if you want good results on your own downstream tasks, a second-stage supervised pre-training after SimMIM (or similar approaches such as MAE) is highly recommended. This second-stage supervised pre-training introduces additional semantics that are helpful for other downstream tasks. This is what we did for our 3B SwinV2 training: SimMIM + supervised (classification).
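A minimal sketch of that two-stage recipe in PyTorch, using a torchvision Swin model as a stand-in for the actual backbone and assuming a SimMIM-style checkpoint that stores encoder weights under an `encoder.` prefix (both are assumptions, not the released SimMIM code):

```python
import torch
import torch.nn as nn
from torchvision.models import swin_t  # stand-in for the actual Swin backbone


def second_stage_pretrain(simmim_ckpt: str, loader, num_classes: int = 1000,
                          epochs: int = 1) -> nn.Module:
    """Stage 2: supervised classification pre-training on top of a SimMIM
    (stage 1) encoder. The 'encoder.' prefix and the 'model' key are
    assumptions about the checkpoint layout, not the released SimMIM code."""
    backbone = swin_t()
    ckpt = torch.load(simmim_ckpt, map_location="cpu")
    encoder_state = {k[len("encoder."):]: v for k, v in ckpt["model"].items()
                     if k.startswith("encoder.")}
    backbone.load_state_dict(encoder_state, strict=False)

    # Fresh classification head for the supervised stage (e.g. ImageNet-1k).
    backbone.head = nn.Linear(backbone.head.in_features, num_classes)
    optimizer = torch.optim.AdamW(backbone.parameters(), lr=1e-4,
                                  weight_decay=0.05)
    criterion = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for images, labels in loader:  # standard supervised loop (sketch)
            loss = criterion(backbone(images), labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return backbone  # initialize downstream tasks from this, not raw SimMIM
```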
Thank you for replying despite your busy schedule, but my question has not been solved. I am curious why, after SimMIM pre-training, the training loss on downstream tasks such as segmentation converges at almost the same speed as with a randomly initialized model (perhaps partly because about half of the segmentation network's parameters are in the decoder). I previously tried contrastive self-supervised methods like DINO, whose downstream training loss converges very quickly, so I am confused by this, and I am also checking whether there is a problem in my setup.
Supplementary training data for reference.
I did not quite follow your steps. Is it the following comparison:
SimMIM pre-training + segmentation fine-tune (red)
vs. supervised pre-training + segmentation fine-tune (blue)
SimMIM pre-trained backbone + segmentation fine-tune (red)
vs. randomly initialized backbone + segmentation fine-tune (blue)
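To make the two runs concrete, a rough sketch (same stand-in backbone and assumed checkpoint layout as above); the `strict=False` report doubles as a sanity check that the pre-trained weights actually loaded:

```python
import torch
from torchvision.models import swin_t  # stand-in for the segmentation backbone


def build_backbone(simmim_ckpt=None):
    """Red run: pass a SimMIM checkpoint path; blue run: pass None to keep
    the default random initialization."""
    backbone = swin_t()
    if simmim_ckpt is not None:
        ckpt = torch.load(simmim_ckpt, map_location="cpu")
        # Assumed layout: encoder weights under an "encoder." prefix.
        state = {k[len("encoder."):]: v for k, v in ckpt["model"].items()
                 if k.startswith("encoder.")}
        missing, unexpected = backbone.load_state_dict(state, strict=False)
        # If 'missing' covers most of the backbone, the pre-trained weights
        # never actually loaded, which would also explain identical curves.
        print(f"missing={len(missing)} unexpected={len(unexpected)}")
    return backbone


red_backbone = build_backbone("simmim_pretrained.pth")  # SimMIM pre-trained
blue_backbone = build_backbone()                        # random initialization
```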
Thank you for the clarification. In general, a pre-trained model converges much faster.
Yes, it is probably because the head is heavy compared to the backbone. Another possible explanation is that this task is relatively simple, so both setups converge very fast.
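One quick way to check the first explanation is to count the parameters on each side. A hypothetical helper, assuming MMSegmentation-style `backbone`/`decode_head` attributes (adapt the names to your own segmentor):

```python
import torch.nn as nn


def head_share(model: nn.Module) -> None:
    """Print how the parameters split between backbone and decode head."""
    def count(m: nn.Module) -> int:
        return sum(p.numel() for p in m.parameters())

    backbone, head = count(model.backbone), count(model.decode_head)
    total = backbone + head
    print(f"backbone: {backbone / 1e6:.1f}M ({backbone / total:.0%})")
    print(f"head:     {head / 1e6:.1f}M ({head / total:.0%})")
    # If the head holds a large share of the parameters, its random
    # initialization dominates the early training loss and masks the
    # benefit of backbone pre-training.
```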
Would it be possible to explain what exactly you mean by "second-stage supervised pretraining"? Is there any documentation you could link concerning this? Thanks!
I also have the same problem. @834799106 Did you solve it?
Related Issues (20)
- When training, 'Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to' occurs HOT 1
- A bug in loading the pre-trained checkpoint: in utils.py line 118, the checkpoint from line 111 is not used
- For CNN architectures like ResNet-50
- Could you please release the pre-trained R50 model?
- Why is there a "no_weight_decay" function for Swin-T but not for ViT
- Train from scratch HOT 1
- Performance using the cosine distance
- Indentation bug in utils.remap_pretrained_keys_vit
- Why Swin-Large-W12 contains [36, 36] `encoder.layers.3.blocks.0.attn.relative_position_index` HOT 1
- The specific configs and code for downstream tasks, like semantic segmentation and object detection
- Any plan to support 3D version?
- Questions about AvgDist
- any plan to release the code of DDP training of swinV2-G with multi machine?
- Tt
- Setting for Linear eval
- Could you provide the pretrained ViT-Base model with patch size 16?
- How can I solve this problem HOT 2
- Effect of self-supervised pre-trained weights on downstream task accuracy
- inverse swin HOT 1
- About mask_token in SimMIM codes