
alpha-vl / convmae

468 stars · 11 watchers · 39 forks · 8.74 MB

ConvMAE: Masked Convolution Meets Masked Autoencoders

License: MIT License

Python 99.80% Shell 0.20%
backbone computer-vision masked-image-modeling object-detection semantic-segmentation mae

convmae's People

Contributors: alpha-vl, linziyi96, teleema


convmae's Issues

output of FastConvMAE

I used your FastConvMAE to pretrain on ImageNet data.

In your code, you said the output should be:

[screenshot]
However, when I used the pretrained model to predict, it gave me a prediction of size torch.Size([4, 196, 768]).
I also tested MAE mode, which gives a prediction of size torch.Size([1, 196, 768]).

Can you explain why?
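For what it's worth, one plausible explanation (an assumption on my part, not confirmed by the repo): if FastConvMAE uses complementary masking, four disjoint 25%-visible masks jointly cover every token, so each image yields four masked views and hence a leading dimension of 4. A minimal sketch of such mask generation:

```python
import torch

# Sketch of complementary masking (an assumption about FastConvMAE's behavior):
# four disjoint visible sets, each 25% of the 196 tokens, together cover all of
# them, so one image produces four masked views and four prediction tensors.
L = 196
perm = torch.randperm(L)
visible_sets = [perm[i * L // 4:(i + 1) * L // 4] for i in range(4)]
assert torch.cat(visible_sets).sort().values.equal(torch.arange(L))
```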

Hi

Hi,
Great work, congratulations!
How did you draw the pictures in the "Visualization" section of README.md?

unpatchify-related problems

There is a small problem in the open-source code: self.patch_embed is not defined in the model's unpatchify function, so the original image dimensions cannot be restored. I hope it can be fixed for our convenience. Thank you for your answer.
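For anyone hitting this before a fix lands, here is a self-contained unpatchify in the style of the original MAE reference code, taking the patch size as an argument instead of reading it from self.patch_embed (a sketch; adjust patch_size to the model you use):

```python
import torch

def unpatchify(x, patch_size=16, in_chans=3):
    """x: (N, L, patch_size**2 * in_chans) -> imgs: (N, in_chans, H, W).

    Assumes a square grid of patches, as in the MAE reference code.
    """
    p = patch_size
    h = w = int(x.shape[1] ** 0.5)
    assert h * w == x.shape[1], "token count must be a perfect square"
    x = x.reshape(x.shape[0], h, w, p, p, in_chans)
    x = torch.einsum('nhwpqc->nchpwq', x)  # interleave patches back into a grid
    return x.reshape(x.shape[0], in_chans, h * p, w * p)

imgs = unpatchify(torch.randn(2, 196, 768))  # -> torch.Size([2, 3, 224, 224])
```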

Training on a custom dataset

Could you provide a tutorial on how to train and finetune on a custom dataset? Also, how can the input image size be changed for detection? The current code does not seem to support custom image sizes.

Doubts about masking strategy

Hi! Thanks for the open-source code. I have a doubt about the masking strategy.
From the paper: "Uniformly masking stage-1 input tokens from the H/4 × W/4 feature maps would cause all tokens of stage-3 to have partially visible information and requires keeping all stage-3 tokens." Why would visible information pass to stage-3 if the image was masked in the first stage?
Thanks very much!
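For intuition: each stage-3 token pools a 4×4 block of stage-1 tokens, so randomly masking individual stage-1 tokens leaves almost every stage-3 token with some visible input. My reading of the paper is that ConvMAE instead samples the mask at stage-3 resolution and upsamples it to the earlier stages, along these lines:

```python
import torch
import torch.nn.functional as F

# Sketch: sample the random mask at stage-3 resolution (H/16 x W/16) and
# upsample it, so every stage-1/stage-2 token is fully visible or fully masked.
keep = (torch.rand(1, 1, 14, 14) > 0.75).float()               # keep ~25% of stage-3 tokens
mask_s2 = F.interpolate(keep, scale_factor=2, mode='nearest')  # 28 x 28 (H/8)
mask_s1 = F.interpolate(keep, scale_factor=4, mode='nearest')  # 56 x 56 (H/4)
```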

How long does the pretraining stage take on V100s?

Hi,

Thank you for your excellent work!
We would like to know how long ImageNet-1K pretraining takes on a machine with 8 V100s.
Also, will you release the manuscript about Faster ConvMAE soon? We can't wait to learn more details about it.

about the training loss

Hello! I observe that the training loss takes many epochs to decrease from 0.42 to 0.39. Does a training-loss decrease from 0.42 to 0.39 really make a big difference in the test results?

How to finetune on my own dataset

What should I do if I want to finetune the current pretrained model on my own dataset instead of ImageNet's val dataset? Could you answer this? Thank you very much.

Pretraining implementation

I have implemented pretraining code based on the MAE repo, but I wonder about one thing: in the decoder phase, do you (1) sum the features of all 3 stages and then normalize, or (2) normalize the last-stage feature and then sum it with the two previous ones? I ask because I got a NaN loss after 270 epochs with approach (1). By the way, have you ever seen a NaN loss during training?
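For concreteness, here is how I read the two orderings (a sketch with hypothetical stage features already projected to the decoder width; which ordering the authors actually use is exactly the question):

```python
import torch
import torch.nn as nn

norm = nn.LayerNorm(768)
s1, s2, s3 = torch.randn(3, 2, 196, 768).unbind(0)  # hypothetical stage features

fused_1 = norm(s1 + s2 + s3)   # (1) sum all three stages, then normalize
fused_2 = norm(s3) + s1 + s2   # (2) normalize the last stage, then add the others
```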

ImageNet Evaluation

Thanks for sharing the great work.
I had difficulty reproducing the evaluation results in FINETUNE.md. My evaluation results are:
* Acc@1 1.090 Acc@5 2.188 loss 8.955
* Accuracy of the network on the 50000 test images: 1.1%
That is obviously far too large a gap.

I downloaded ImageNet-1K following your guidance and prepared it following Jasonlee1995. Are there any details I have missed, or any specific requirements for preparing the dataset?
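As a side note, an Acc@1 of ~1% is near chance level for 1000 classes, which in my experience usually points to a mis-arranged val split (torchvision-style loaders expect val images sorted into per-class subfolders) or a checkpoint that did not actually load. A quick sanity check (the path is a placeholder):

```python
from torchvision import datasets

# Sanity check: a correctly prepared ImageNet-1K val split should report
# 1000 classes and 50000 images when read as an ImageFolder.
val = datasets.ImageFolder('path/to/imagenet/val')
print(len(val.classes), len(val))  # expect: 1000 50000
```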

Hi, I need help

Hello, I would like to ask how to display the per-class accuracy in the finetuning output, and how to use the downstream-task model to visualize detections.

Questions about ConvMAE-v2

ConvMAE-v2 is great work, and I am very interested in some of the details in the paper. Where can I find the code for ConvMAE-v2?

Total memory consumption for training with batch size 32.

I have tried training the ConvMAE detector (as provided in this repository) on 2 GPUs with 32 GB each (V100). It looks like I can only train with batch size = 2; going beyond batch size 2 raises CUDA out of memory. Also, with such a small batch size, training does not seem to produce a well-trained model. Could you tell me the recommended memory size for training the model with batch size = 32?

Thank you so much.
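Not an answer on the recommended memory, but a generic workaround sketch: gradient accumulation can approximate a batch of 32 from micro-batches of 2 (plain PyTorch below; a detectron2 pipeline would need the equivalent at its solver level). Note this only matches large-batch training for batch-size-independent layers; BatchNorm statistics still see the micro-batch.

```python
import torch
import torch.nn as nn

# Gradient-accumulation sketch (dummy model and data, not ConvMAE-specific):
# micro-batches of 2 accumulated over 16 steps approximate a 32-image batch.
model = nn.Linear(8, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loader = [(torch.randn(2, 8), torch.randn(2, 1)) for _ in range(32)]

accum_steps = 16
optimizer.zero_grad()
for i, (x, y) in enumerate(loader):
    loss = nn.functional.mse_loss(model(x), y) / accum_steps
    loss.backward()                      # gradients add up across micro-batches
    if (i + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```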

Question about ConvMAE-v2

Thank you for your excellent work!

When I load the ConvMAE-v2-Base pretrained checkpoint [https://drive.google.com/file/d/1gykVKNDlRn8eiuXk5bUj1PbSnHXFzLnI/view?usp=sharing], it has a cls_token parameter, which is not in models_convmae.py.

Does the ConvMAE-v2 model differ from models_convmae.py in some details? Thanks!

Running pretrained convvit on larger image sizes

Hi,
I would like to see how well the pretrained base model runs on my own dataset, but the current model is configured for an image size of 224.
In the original MAE code, the interpolate_pos_embed function lets the user resize the positional embedding to allow for larger images.
In your linear-probing code, that same call is commented out, and (understandably) it would not work the same way, as there are multiple positional embeddings to take care of.
Do you have a function that allows the pretrained model to run on different image sizes?
Thanks
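In case it helps, here is a minimal sketch of bicubic grid interpolation in the spirit of MAE's interpolate_pos_embed; ConvMAE would presumably need this applied to each stage's positional embedding separately (the function and its assumptions are mine, not from the repo):

```python
import torch
import torch.nn.functional as F

def interpolate_grid_pos_embed(pos_embed, new_h, new_w):
    """pos_embed: (1, H*W, C) grid positional embedding without a cls token."""
    n, l, c = pos_embed.shape
    h = w = int(l ** 0.5)
    assert h * w == l, "expects a square grid"
    pe = pos_embed.reshape(n, h, w, c).permute(0, 3, 1, 2)       # (1, C, H, W)
    pe = F.interpolate(pe, size=(new_h, new_w), mode='bicubic',
                       align_corners=False)
    return pe.permute(0, 2, 3, 1).reshape(n, new_h * new_w, c)

# e.g. a 14x14 grid (224px, patch 16) resized for 512px inputs (32x32 grid)
new_pe = interpolate_grid_pos_embed(torch.randn(1, 196, 768), 32, 32)
```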

How can I train 200 epochs for DET?

Hi ,
I want to train the pretrained model in the detectron2 framework for object detection,
but the code trains for only 1 epoch and then ends.
Is this a bug?
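One thing worth checking (an assumption about the cause, not a confirmed diagnosis): detectron2 schedules training by iterations rather than epochs, so the epoch count is implied by MAX_ITER, the batch size, and the dataset size. A sketch:

```python
from detectron2.config import get_cfg

# Detectron2 trains for SOLVER.MAX_ITER iterations; "epochs" are implicit.
# Rough numbers: COCO train2017 has ~118k images.
cfg = get_cfg()
cfg.SOLVER.IMS_PER_BATCH = 16
cfg.SOLVER.MAX_ITER = 118_000 * 200 // cfg.SOLVER.IMS_PER_BATCH  # ~200 epochs
```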

Visualization VIT feature

Hi, authors.

How did you visualize the attention maps shown in your results?

  1. Using the encoder (ViT)?
  2. Using the decoder (ViT)?

Given input x -> y = encoder(x) -> decoder(y), do you then use the final ViT output of decoder(y)?
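Not the authors' method, but one generic recipe: in MAE/timm-style Attention modules the softmaxed weights pass through attn_drop, so a forward hook there captures the attention map. A self-contained toy demonstrating the mechanism (the Attention class below is mine, not ConvMAE's):

```python
import torch
import torch.nn as nn

# Minimal MAE/timm-style attention; attn_drop sees the (N, heads, L, L) weights.
class Attention(nn.Module):
    def __init__(self, dim, heads=4):
        super().__init__()
        self.heads, self.scale = heads, (dim // heads) ** -0.5
        self.qkv = nn.Linear(dim, dim * 3)
        self.attn_drop = nn.Dropout(0.0)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):
        n, l, c = x.shape
        q, k, v = self.qkv(x).reshape(n, l, 3, self.heads,
                                      c // self.heads).permute(2, 0, 3, 1, 4)
        attn = self.attn_drop((q @ k.transpose(-2, -1) * self.scale).softmax(dim=-1))
        return self.proj((attn @ v).transpose(1, 2).reshape(n, l, c))

maps = []
attn = Attention(64)
attn.attn_drop.register_forward_hook(lambda m, i, o: maps.append(o.detach()))
attn(torch.randn(1, 196, 64))
print(maps[0].shape)  # torch.Size([1, 4, 196, 196]) -- the attention map
```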

masked convolution

Hi! Thanks for the open-source code.
I noticed that the masked convolution in the code only masks the residual branch; the skip connection has no mask, as shown in line 119 of ConvMAE/vision_transformer.py. The corresponding code is:

x = x + self.drop_path(self.conv2(self.attn(mask * self.conv1(self.norm1(x.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)))))

Will this lead to information leakage in the convolution stages?
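For discussion, here is a hypothetical variant that re-applies the mask after the skip connection, which is one way leakage through the convolution could be cleaned up (a sketch, not the repo's code):

```python
import torch
import torch.nn as nn

# Hypothetical masked-conv residual block: the residual branch is masked before
# the convolution (as in the quoted line), and the output is re-masked after
# the skip so values spread into masked positions by the conv are zeroed again.
class MaskedConvBlock(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.conv = nn.Conv2d(dim, dim, kernel_size=5, padding=2, groups=dim)

    def forward(self, x, mask):
        # x: (N, C, H, W); mask: (N, 1, H, W), 1 = visible, 0 = masked
        y = self.norm(x.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)
        y = self.conv(mask * y)
        return mask * (x + y)

block = MaskedConvBlock(16)
out = block(torch.randn(2, 16, 14, 14), torch.ones(2, 1, 14, 14))
```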

about the pretrained model convmae.pth

I downloaded your pretrained model, and when I tried to load it,
it gave me the following errors:

_IncompatibleKeys(missing_keys=['mask_token', 'decoder_pos_embed', 'stage1_output_decode.weight', 'stage1_output_decode.bias', 'stage2_output_decode.weight', 'stage2_output_decode.bias', 'decoder_embed.weight', 'decoder_embed.bias', 'decoder_blocks.0.norm1.weight', 'decoder_blocks.0.norm1.bias', 'decoder_blocks.0.attn.qkv.weight', 'decoder_blocks.0.attn.qkv.bias', 'decoder_blocks.0.attn.proj.weight', 'decoder_blocks.0.attn.proj.bias', 'decoder_blocks.0.norm2.weight', 'decoder_blocks.0.norm2.bias', 'decoder_blocks.0.mlp.fc1.weight', 'decoder_blocks.0.mlp.fc1.bias', 'decoder_blocks.0.mlp.fc2.weight', 'decoder_blocks.0.mlp.fc2.bias', 'decoder_blocks.1.norm1.weight', 'decoder_blocks.1.norm1.bias', 'decoder_blocks.1.attn.qkv.weight', 'decoder_blocks.1.attn.qkv.bias', 'decoder_blocks.1.attn.proj.weight', 'decoder_blocks.1.attn.proj.bias', 'decoder_blocks.1.norm2.weight', 'decoder_blocks.1.norm2.bias', 'decoder_blocks.1.mlp.fc1.weight', 'decoder_blocks.1.mlp.fc1.bias', 'decoder_blocks.1.mlp.fc2.weight', 'decoder_blocks.1.mlp.fc2.bias', 'decoder_blocks.2.norm1.weight', 'decoder_blocks.2.norm1.bias', 'decoder_blocks.2.attn.qkv.weight', 'decoder_blocks.2.attn.qkv.bias', 'decoder_blocks.2.attn.proj.weight', 'decoder_blocks.2.attn.proj.bias', 'decoder_blocks.2.norm2.weight', 'decoder_blocks.2.norm2.bias', 'decoder_blocks.2.mlp.fc1.weight', 'decoder_blocks.2.mlp.fc1.bias', 'decoder_blocks.2.mlp.fc2.weight', 'decoder_blocks.2.mlp.fc2.bias', 'decoder_blocks.3.norm1.weight', 'decoder_blocks.3.norm1.bias', 'decoder_blocks.3.attn.qkv.weight', 'decoder_blocks.3.attn.qkv.bias', 'decoder_blocks.3.attn.proj.weight', 'decoder_blocks.3.attn.proj.bias', 'decoder_blocks.3.norm2.weight', 'decoder_blocks.3.norm2.bias', 'decoder_blocks.3.mlp.fc1.weight', 'decoder_blocks.3.mlp.fc1.bias', 'decoder_blocks.3.mlp.fc2.weight', 'decoder_blocks.3.mlp.fc2.bias', 'decoder_blocks.4.norm1.weight', 'decoder_blocks.4.norm1.bias', 'decoder_blocks.4.attn.qkv.weight', 'decoder_blocks.4.attn.qkv.bias', 'decoder_blocks.4.attn.proj.weight', 'decoder_blocks.4.attn.proj.bias', 'decoder_blocks.4.norm2.weight', 'decoder_blocks.4.norm2.bias', 'decoder_blocks.4.mlp.fc1.weight', 'decoder_blocks.4.mlp.fc1.bias', 'decoder_blocks.4.mlp.fc2.weight', 'decoder_blocks.4.mlp.fc2.bias', 'decoder_blocks.5.norm1.weight', 'decoder_blocks.5.norm1.bias', 'decoder_blocks.5.attn.qkv.weight', 'decoder_blocks.5.attn.qkv.bias', 'decoder_blocks.5.attn.proj.weight', 'decoder_blocks.5.attn.proj.bias', 'decoder_blocks.5.norm2.weight', 'decoder_blocks.5.norm2.bias', 'decoder_blocks.5.mlp.fc1.weight', 'decoder_blocks.5.mlp.fc1.bias', 'decoder_blocks.5.mlp.fc2.weight', 'decoder_blocks.5.mlp.fc2.bias', 'decoder_blocks.6.norm1.weight', 'decoder_blocks.6.norm1.bias', 'decoder_blocks.6.attn.qkv.weight', 'decoder_blocks.6.attn.qkv.bias', 'decoder_blocks.6.attn.proj.weight', 'decoder_blocks.6.attn.proj.bias', 'decoder_blocks.6.norm2.weight', 'decoder_blocks.6.norm2.bias', 'decoder_blocks.6.mlp.fc1.weight', 'decoder_blocks.6.mlp.fc1.bias', 'decoder_blocks.6.mlp.fc2.weight', 'decoder_blocks.6.mlp.fc2.bias', 'decoder_blocks.7.norm1.weight', 'decoder_blocks.7.norm1.bias', 'decoder_blocks.7.attn.qkv.weight', 'decoder_blocks.7.attn.qkv.bias', 'decoder_blocks.7.attn.proj.weight', 'decoder_blocks.7.attn.proj.bias', 'decoder_blocks.7.norm2.weight', 'decoder_blocks.7.norm2.bias', 'decoder_blocks.7.mlp.fc1.weight', 'decoder_blocks.7.mlp.fc1.bias', 'decoder_blocks.7.mlp.fc2.weight', 'decoder_blocks.7.mlp.fc2.bias', 'decoder_norm.weight', 'decoder_norm.bias', 'decoder_pred.weight', 
'decoder_pred.bias'], unexpected_keys=[])
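Note that every missing key above is a decoder-side parameter and unexpected_keys is empty, so the released file apparently contains encoder weights only. Loading with strict=False is the usual pattern in that case (the file name and model constructor below are placeholders, not the repo's confirmed API):

```python
import torch
import models_convmae  # repo module; the constructor name below is a guess

model = models_convmae.convmae_convvit_base_patch16()  # hypothetical constructor
checkpoint = torch.load('convmae_base.pth', map_location='cpu')
state = checkpoint.get('model', checkpoint)        # weights may nest under 'model'
msg = model.load_state_dict(state, strict=False)   # tolerate missing decoder keys
print(msg.missing_keys)                            # decoder-side params, as above
```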

Time required to train one epoch.

Dear authors:
Thank you for sharing this excellent work! May I ask how the time overhead of ConvMAE pretraining compares to MAE? Could you provide the time required to train one epoch for the two methods on the same type of GPU?

Model settings and checkpoint do not match

Thanks for your great work!

But I have a problem with the model settings for your provided checkpoints.
When I load your checkpoints, the model settings that load correctly do not match what is written in the paper:

  1. the mlp_ratio of Large and Huge
  2. the patch_size of Huge

I want to find out what's going on, thanks a lot!

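One way to settle this from your side is to read the hyperparameters straight off the checkpoint tensor shapes (a sketch; the file name and state-dict keys below are guesses and may differ in the actual checkpoints):

```python
import torch

# Infer patch size and mlp_ratio from tensor shapes in the checkpoint.
checkpoint = torch.load('convmae_huge.pth', map_location='cpu')
state = checkpoint.get('model', checkpoint)
w = state['patch_embed1.proj.weight']       # hypothetical key; shape (C, 3, p, p)
print('stage-1 patch size:', w.shape[-1])
fc1 = state['blocks.0.mlp.fc1.weight']      # hypothetical key; (mlp_ratio*C, C)
print('mlp_ratio:', fc1.shape[0] / fc1.shape[1])
```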
