alpha-vl / convmae
ConvMAE: Masked Convolution Meets Masked Autoencoders
License: MIT License
Hi,
Great work! Congratulations!
How are the pictures in the "Visualization" section of README.md drawn?
There is a small problem in the open-source code: self.patch_embed is not defined in the model's unpatchify function, so the original image dimensions cannot be restored. I hope it can be fixed for our convenience. Thank you for your answer.
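For reference, here is a minimal sketch of an unpatchify that does not rely on self.patch_embed; it assumes the decoder reconstructs square 16x16 patches on a square token grid, which may differ from the authors' intended fix.

```python
import torch

def unpatchify(x: torch.Tensor, p: int = 16, in_chans: int = 3) -> torch.Tensor:
    """x: (N, L, p*p*in_chans) -> imgs: (N, in_chans, H, W), assuming a square token grid."""
    n, l, _ = x.shape
    h = w = int(l ** 0.5)
    assert h * w == l, "token count must form a square grid"
    x = x.reshape(n, h, w, p, p, in_chans)
    x = torch.einsum('nhwpqc->nchpwq', x)      # regroup patches back into an image
    return x.reshape(n, in_chans, h * p, w * p)
```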
Hi, @gaopengpjlab, could you kindly provide the full checkpoint (including the decoder) of ConvMAE-v2-S? Lots of thanks!
Could you provide a tutorial on how to train and fine-tune with a custom dataset? Also, how can the input image size be changed for detection? The current code does not seem to support custom image sizes.
Hi! Thanks for the open-source code. I have a question about the masking strategy.
The paper says: "Uniformly masking stage-1 input tokens from the H/4 × W/4 feature maps would cause all tokens of stage-3 to have partially visible information and requires keeping all stage-3 tokens." Why does visible information propagate to stage-3 if the image was masked in the first stage?
Thanks very much!
ConvMAE-v2 is much better than v1. What is the difference, please?
Hi,
Thank you for your excellent work!
We would like to know how long pretraining on ImageNet-1K takes on a machine with 8 V100 GPUs.
Also, will you release the manuscript on Faster ConvMAE soon? We can't wait to learn more details about it.
Hello! I observe that it takes many epochs for the training loss to drop from 0.42 to 0.39. Does that decrease actually make a big difference in the test results?
What should I do if I want to fine-tune the current pretrained model on my own dataset instead of ImageNet's validation set? Thank you very much.
Thank you for your impressive work! VideoConvMAE still seems to lack a code release; could you update it?
I have implemented pretraining code based on the MAE repo, but I wonder about one thing: in the decoder phase, (1) do you sum the features of all 3 stages and then normalize the result, or (2) normalize the feature of the last stage and then sum it with the two previous ones? I got a NaN loss after 270 epochs with approach (1). By the way, have you ever encountered a NaN loss during training?
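To make the question concrete, a minimal sketch of the two orderings being compared; the names feat1/feat2/feat3 and the LayerNorm dimension are placeholders, not the repository's actual code.

```python
import torch
import torch.nn as nn

norm = nn.LayerNorm(768)                                             # placeholder embed_dim
feat1, feat2, feat3 = (torch.randn(2, 196, 768) for _ in range(3))   # stage outputs as tokens

# (1) sum all three stage features, then normalize the sum
fused_1 = norm(feat1 + feat2 + feat3)

# (2) normalize only the last-stage feature, then add the two earlier ones
fused_2 = norm(feat3) + feat1 + feat2
```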
Thank you for your excellent work. I noticed that in Table 6, the LIN results for ConvMAE-Base pretrained for 200 epochs are missing. May I ask what they are? Or would it be convenient for you to provide the ConvMAE-Base checkpoint pretrained for 200 epochs?
Thanks for sharing the great work.
I encountered difficulties in reproducing the evaluation results in FINETUNE.md. My evaluation results are:
* Acc@1 1.090 Acc@5 2.188 loss 8.955
Accuracy of the network on the 50000 test images: 1.1%
That is obviously far too large a gap.
I downloaded ImageNet-1K following your guidance and prepared it following Jasonlee1995. Are there any details I have missed, or any specific requirements for preparing the dataset?
Hello, I would like to ask how to report the per-class accuracy after fine-tuning, and how to use the downstream-task model to visualize detection results.
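For the per-class accuracy part, a minimal sketch in plain PyTorch; the model and dataloader are placeholders, and this is not taken from the repository's evaluation script.

```python
import torch

@torch.no_grad()
def per_class_accuracy(model, loader, num_classes: int, device="cuda"):
    """Accumulate correct / total counts per class over a validation loader."""
    correct = torch.zeros(num_classes)
    total = torch.zeros(num_classes)
    model.eval()
    for images, targets in loader:
        preds = model(images.to(device)).argmax(dim=1).cpu()
        for cls in targets.unique():
            mask = targets == cls
            total[cls] += mask.sum()
            correct[cls] += (preds[mask] == cls).sum()
    return correct / total.clamp(min=1)
```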
ConvMAE-v2 is great work, and I am very interested in some details of the paper. Where can I find the code for ConvMAE-v2?
I have tried training the ConvMAE detector (as provided in this repository) with two 32 GB GPUs (V100). It looks like I can only train with batch size 2; going beyond that raises CUDA out-of-memory errors. With such a small batch size, training does not seem to produce a well-trained model. Could you tell me the recommended GPU memory for training the model with batch size 32?
Thank you so much.
Thank you for your excellent work!
When I load the ConvMAE-v2-Base pretrained checkpoint [https://drive.google.com/file/d/1gykVKNDlRn8eiuXk5bUj1PbSnHXFzLnI/view?usp=sharing], it contains a cls_token parameter, which is not in models_convmae.py.
Does the ConvMAE-v2 model differ from models_convmae.py in some details? Thanks!
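A possible workaround while waiting for an answer: drop checkpoint keys that the current model does not define (such as cls_token) before loading. This sketch assumes the extra keys are simply unused, which the authors would need to confirm; build_model and the checkpoint path are placeholders.

```python
import torch

ckpt = torch.load("convmaev2_base.pth", map_location="cpu")   # placeholder path
state_dict = ckpt.get("model", ckpt)

model = build_model()          # placeholder, e.g. a constructor from models_convmae
model_keys = set(model.state_dict().keys())
filtered = {k: v for k, v in state_dict.items() if k in model_keys}

msg = model.load_state_dict(filtered, strict=False)
print("missing:", msg.missing_keys, "unexpected:", msg.unexpected_keys)
```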
Hi - I'd like to do patches of size 32x32, and a smaller model in general.
Anything I change breaks the entire code. It would be really helpful if you refactored all of the places that hard-code 4, 2, 16, etc. throughout the code for MaskedAutoencoderConvViT.
Thanks,
Dan
Why use convolutions instead? Since upsampling is already employed to obtain the mask matrix, it seems like transformers could also be used.
Hi,
I am looking to see how well the pretrained base model runs on my own dataset, but the current model is configured for an image size of 224.
In the original MAE code, the interpolate_pos_embed function lets the user resize the positional embedding to accommodate larger input images.
In your linear-probing code, that same call is commented out and (obviously) would not work the same way, since there are multiple positional embeddings to take care of.
Do you have a function that allows the pretrained model to run on different image sizes?
Thanks
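Until an official helper is provided, here is a generic sketch of bicubic resizing of an absolute positional embedding, in the spirit of MAE's interpolate_pos_embed; it assumes a square grid without a cls token and would have to be applied to each of ConvMAE's positional embeddings separately.

```python
import torch
import torch.nn.functional as F

def resize_pos_embed(pos_embed: torch.Tensor, new_hw: tuple) -> torch.Tensor:
    """pos_embed: (1, H*W, C) without a cls token -> (1, new_h*new_w, C)."""
    _, n, c = pos_embed.shape
    h = w = int(n ** 0.5)
    grid = pos_embed.reshape(1, h, w, c).permute(0, 3, 1, 2)           # (1, C, H, W)
    grid = F.interpolate(grid, size=new_hw, mode="bicubic", align_corners=False)
    return grid.permute(0, 2, 3, 1).reshape(1, new_hw[0] * new_hw[1], c)
```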
Have you tried a pure convolutional network? Does that work?
Hi,
I want to train the pretrained model with the detectron2 framework for object detection.
But the code only trains for 1 epoch and then ends.
Is this a bug?
Hi, author.
How do you visualize the attention maps shown in your results?
Given input x -> y = encoder(x) -> decoder(y), do you then use the final ViT block of decoder(y)?
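This is not the authors' method, just a generic sketch of how attention maps are often extracted from a ViT-style block with a forward hook; the module path model.blocks[-1].attn and the timm-style qkv/num_heads/scale attributes are assumptions about the layout, and model/images are placeholders.

```python
import torch

attn_maps = []

def grab_attention(module, inputs, output):
    """Recompute softmax(QK^T * scale) from the attention module's own qkv projection."""
    x = inputs[0]                                              # (B, N, C)
    B, N, C = x.shape
    qkv = module.qkv(x).reshape(B, N, 3, module.num_heads, C // module.num_heads)
    q, k = qkv.permute(2, 0, 3, 1, 4)[:2]                      # each (B, heads, N, head_dim)
    attn = (q @ k.transpose(-2, -1)) * module.scale
    attn_maps.append(attn.softmax(dim=-1).detach().cpu())

handle = model.blocks[-1].attn.register_forward_hook(grab_attention)  # assumed module path
with torch.no_grad():
    model(images)                                              # placeholder input batch
handle.remove()
```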
Specifically: GPU type, number of GPUs, and GPU training time for pretraining, detection training, and segmentation training.
Hi! Thanks for the open-source code.
I noticed that the masked convolution in the code only masks the residual branch, while the skip connection is not masked, as shown in line 119 of "ConvMAE/vision_transformer.py". The corresponding code is as follows:
"x = x + self.drop_path(self.conv2(self.attn(mask * self.conv1(self.norm1(x.permute(0, 2, 3, 1)).permute(0, 3, 1, 2))))) "
Will this lead to information leakage in the convolution stages?
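To make the concern concrete, a simplified, self-contained sketch of the update in question, together with a hypothetical variant that also masks the shortcut; this illustrates the question rather than claiming which behaviour is correct.

```python
import torch
import torch.nn as nn

conv1 = nn.Conv2d(64, 64, 1)
conv2 = nn.Conv2d(64, 64, 1)
x = torch.randn(1, 64, 56, 56)
mask = (torch.rand(1, 1, 56, 56) > 0.75).float()   # 1 = visible, 0 = masked

# As in the quoted line: only the residual branch sees the mask.
out_repo = x + conv2(mask * conv1(x))

# Hypothetical variant implied by the question: mask the shortcut too,
# so masked positions contribute nothing downstream.
out_masked_skip = mask * x + conv2(mask * conv1(x))
```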
I downloaded your pretrained model, and when I tried to load it, it gave me the following errors:
_IncompatibleKeys(missing_keys=['mask_token', 'decoder_pos_embed', 'stage1_output_decode.weight', 'stage1_output_decode.bias', 'stage2_output_decode.weight', 'stage2_output_decode.bias', 'decoder_embed.weight', 'decoder_embed.bias', 'decoder_blocks.0.norm1.weight', 'decoder_blocks.0.norm1.bias', 'decoder_blocks.0.attn.qkv.weight', 'decoder_blocks.0.attn.qkv.bias', 'decoder_blocks.0.attn.proj.weight', 'decoder_blocks.0.attn.proj.bias', 'decoder_blocks.0.norm2.weight', 'decoder_blocks.0.norm2.bias', 'decoder_blocks.0.mlp.fc1.weight', 'decoder_blocks.0.mlp.fc1.bias', 'decoder_blocks.0.mlp.fc2.weight', 'decoder_blocks.0.mlp.fc2.bias', 'decoder_blocks.1.norm1.weight', 'decoder_blocks.1.norm1.bias', 'decoder_blocks.1.attn.qkv.weight', 'decoder_blocks.1.attn.qkv.bias', 'decoder_blocks.1.attn.proj.weight', 'decoder_blocks.1.attn.proj.bias', 'decoder_blocks.1.norm2.weight', 'decoder_blocks.1.norm2.bias', 'decoder_blocks.1.mlp.fc1.weight', 'decoder_blocks.1.mlp.fc1.bias', 'decoder_blocks.1.mlp.fc2.weight', 'decoder_blocks.1.mlp.fc2.bias', 'decoder_blocks.2.norm1.weight', 'decoder_blocks.2.norm1.bias', 'decoder_blocks.2.attn.qkv.weight', 'decoder_blocks.2.attn.qkv.bias', 'decoder_blocks.2.attn.proj.weight', 'decoder_blocks.2.attn.proj.bias', 'decoder_blocks.2.norm2.weight', 'decoder_blocks.2.norm2.bias', 'decoder_blocks.2.mlp.fc1.weight', 'decoder_blocks.2.mlp.fc1.bias', 'decoder_blocks.2.mlp.fc2.weight', 'decoder_blocks.2.mlp.fc2.bias', 'decoder_blocks.3.norm1.weight', 'decoder_blocks.3.norm1.bias', 'decoder_blocks.3.attn.qkv.weight', 'decoder_blocks.3.attn.qkv.bias', 'decoder_blocks.3.attn.proj.weight', 'decoder_blocks.3.attn.proj.bias', 'decoder_blocks.3.norm2.weight', 'decoder_blocks.3.norm2.bias', 'decoder_blocks.3.mlp.fc1.weight', 'decoder_blocks.3.mlp.fc1.bias', 'decoder_blocks.3.mlp.fc2.weight', 'decoder_blocks.3.mlp.fc2.bias', 'decoder_blocks.4.norm1.weight', 'decoder_blocks.4.norm1.bias', 'decoder_blocks.4.attn.qkv.weight', 'decoder_blocks.4.attn.qkv.bias', 'decoder_blocks.4.attn.proj.weight', 'decoder_blocks.4.attn.proj.bias', 'decoder_blocks.4.norm2.weight', 'decoder_blocks.4.norm2.bias', 'decoder_blocks.4.mlp.fc1.weight', 'decoder_blocks.4.mlp.fc1.bias', 'decoder_blocks.4.mlp.fc2.weight', 'decoder_blocks.4.mlp.fc2.bias', 'decoder_blocks.5.norm1.weight', 'decoder_blocks.5.norm1.bias', 'decoder_blocks.5.attn.qkv.weight', 'decoder_blocks.5.attn.qkv.bias', 'decoder_blocks.5.attn.proj.weight', 'decoder_blocks.5.attn.proj.bias', 'decoder_blocks.5.norm2.weight', 'decoder_blocks.5.norm2.bias', 'decoder_blocks.5.mlp.fc1.weight', 'decoder_blocks.5.mlp.fc1.bias', 'decoder_blocks.5.mlp.fc2.weight', 'decoder_blocks.5.mlp.fc2.bias', 'decoder_blocks.6.norm1.weight', 'decoder_blocks.6.norm1.bias', 'decoder_blocks.6.attn.qkv.weight', 'decoder_blocks.6.attn.qkv.bias', 'decoder_blocks.6.attn.proj.weight', 'decoder_blocks.6.attn.proj.bias', 'decoder_blocks.6.norm2.weight', 'decoder_blocks.6.norm2.bias', 'decoder_blocks.6.mlp.fc1.weight', 'decoder_blocks.6.mlp.fc1.bias', 'decoder_blocks.6.mlp.fc2.weight', 'decoder_blocks.6.mlp.fc2.bias', 'decoder_blocks.7.norm1.weight', 'decoder_blocks.7.norm1.bias', 'decoder_blocks.7.attn.qkv.weight', 'decoder_blocks.7.attn.qkv.bias', 'decoder_blocks.7.attn.proj.weight', 'decoder_blocks.7.attn.proj.bias', 'decoder_blocks.7.norm2.weight', 'decoder_blocks.7.norm2.bias', 'decoder_blocks.7.mlp.fc1.weight', 'decoder_blocks.7.mlp.fc1.bias', 'decoder_blocks.7.mlp.fc2.weight', 'decoder_blocks.7.mlp.fc2.bias', 'decoder_norm.weight', 'decoder_norm.bias', 'decoder_pred.weight', 
'decoder_pred.bias'], unexpected_keys=[])
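For context, a minimal sketch of how such a load is often handled when a released checkpoint contains only encoder weights: load non-strictly and check that every missing key is decoder-side. This is a guess about the cause, not a confirmed explanation; model is assumed to already be built from models_convmae, and the checkpoint path is a placeholder.

```python
import torch

ckpt = torch.load("convmae_base.pth", map_location="cpu")   # placeholder path
state_dict = ckpt.get("model", ckpt)

msg = model.load_state_dict(state_dict, strict=False)        # model: placeholder ConvMAE instance
decoder_only = all(k.startswith(("decoder", "mask_token",
                                 "stage1_output_decode", "stage2_output_decode"))
                   for k in msg.missing_keys)
print("all missing keys are decoder-side:", decoder_only)
```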
Dear author:
Thank you for sharing the excellent work! May I ask how the time overhead of ConvMAE pre-training compares to MAE? Can you provide the time required to train an epoch for these two methods on the same type of GPU?
Thanks for your great work!
But I have a problem with the model configuration of your provided checkpoint.
When I load your checkpoint, the configuration that loads correctly does not match what is written in the paper.
I would like to find out what is going on. Thanks a lot!