visual-attention-network / van-classification
License: Apache License 2.0
I've tried to use VAN-Large as the backbone for binary semantic segmentation and found it very unstable.
Model: UperNet, previously well tested with ResNet-50 and Swin-Base.
Features: same as in https://github.com/Visual-Attention-Network/VAN-Segmentation
Just switching the backbone to VAN-Large fails after 5 of 7 epochs: the model generates NaN outputs.
At the same time, F1 and accuracy grow from epoch 1 to epoch 5, so this is not divergence.
Right now I have no time to dive deeper and find the source of the NaNs, so this is just feedback.
First, thanks for the great and helpful work!
I wonder why a Sigmoid isn't used in the LKA module, which can be written as:
In previous works, a Sigmoid is always the last part of the attention module, as in the SE block [1] or CBAM [2]:
Are there any ablation studies on this change? Was the Sigmoid removed because it hurt performance?
To the best of my knowledge, I cannot come up with a reason to remove the Sigmoid other than a slight reduction in computational cost. I would really appreciate it if you could answer my question!
[1] Hu J, Shen L, Sun G. Squeeze-and-Excitation Networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2018: 7132-7141.
[2] Woo S, Park J, Lee J Y, et al. CBAM: Convolutional Block Attention Module. Proceedings of the European Conference on Computer Vision (ECCV). 2018: 3-19.
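For reference, the LKA forward pass can be sketched as below. The module names (conv0, conv_spatial, conv1) follow the released state_dict; the use_sigmoid flag is my own hypothetical addition showing where an SE/CBAM-style gate would go, and is not part of the released code:

```python
import torch
import torch.nn as nn

class LKA(nn.Module):
    def __init__(self, dim, use_sigmoid=False):
        super().__init__()
        # 5x5 depth-wise conv
        self.conv0 = nn.Conv2d(dim, dim, 5, padding=2, groups=dim)
        # 7x7 depth-wise conv with dilation 3 (effective 19x19)
        self.conv_spatial = nn.Conv2d(dim, dim, 7, padding=9,
                                      groups=dim, dilation=3)
        # 1x1 conv for channel mixing
        self.conv1 = nn.Conv2d(dim, dim, 1)
        self.use_sigmoid = use_sigmoid

    def forward(self, x):
        u = x.clone()
        attn = self.conv1(self.conv_spatial(self.conv0(x)))
        if self.use_sigmoid:  # hypothetical SE/CBAM-style gating, not in the release
            attn = torch.sigmoid(attn)
        return u * attn       # the released code multiplies the raw attention map

x = torch.randn(1, 8, 14, 14)
y = LKA(8)(x)  # same spatial shape in and out
```

As written, the attention map is applied without any squashing; the question is whether an ablation motivated dropping the usual Sigmoid here.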
Thanks for your amazing work! It converges fast and achieves impressive accuracy on ImageNet.
In your pth file, I found that args.sync_bn is False.
In your train.py, the model broadcasts the BN buffers before each forward pass instead of using SyncBN.
So I have a small question: why don't you use SyncBN?
Thanks very much!
How to choose the target layer to compute CAM on your VAN?
Thank you so much for this enlightening and inspiring work. I don't quite understand why LKC (i.e., Large Kernel Convolution) = DW-Conv + DW-D-Conv + 1x1 Conv.
My current understanding is summarized in the following table:
So, as I understand it, what Fig. 2 expresses is that DW-Conv, DW-D-Conv and 1x1 Conv each carry part of the LKC's properties, at lower computational complexity; that is, the plus and equal signs in Fig. 2 denote the addition of properties.
Is my understanding correct? If not, please give a more intuitive explanation. Thank you very much!
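As a back-of-envelope check of this reading (my own arithmetic, not from the paper's text): stacking a 5x5 DW-Conv with a 7x7 DW-D-Conv at dilation 3 covers the large kernel's receptive field, the 1x1 conv supplies the cross-channel mixing, and the combined parameter count is far below a dense 21x21 convolution:

```python
def effective_kernel(k, dilation=1):
    """Effective size of a k x k conv with the given dilation."""
    return dilation * (k - 1) + 1

def stacked_rf(kernels):
    """Receptive field of stride-1 convolutions applied in sequence."""
    rf = 1
    for k in kernels:
        rf += k - 1
    return rf

C = 64  # example channel count (my choice, not from the paper)
rf = stacked_rf([effective_kernel(5), effective_kernel(7, dilation=3)])

# Parameter counts for a C -> C layer, biases ignored:
params_decomposed = 5 * 5 * C + 7 * 7 * C + C * C  # DW-Conv + DW-D-Conv + 1x1
params_dense = 21 * 21 * C * C                     # one dense 21 x 21 conv

print(rf)                 # 23 -- covers the 21 x 21 target
print(params_decomposed)  # 8832
print(params_dense)       # 1806336
```

So the "plus" signs buy (roughly) the same receptive field and the channel mixing of the large kernel at a small fraction of the cost.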
Hello, I have a doubt about when to use the freeze_patch_emb function: did you freeze those parameters when training your model?
Can someone help with porting the models and weights from PyTorch to Keras?
Thanks in advance!
Yusuf
Hi, it looks like all the dimension-permutation operations in PatchEmbed and Block are unnecessary, making the whole network a convolutional neural network rather than a transformer. Is that true? I was wondering about your opinion on this argument. Thanks :)
I found the ImageNet-22k pretrained models on the Tsinghua Cloud,
but I can't find the 1k fine-tuned models.
Do you have any release plans or links?
For classification, can you tell me the total GPU memory usage of the base version? Thanks!
In Table 3, changing attention (mul) to add reduces VAN performance from 75.4 to 74.6. I think this is a really large gap. However, in the ablation study you state that "Besides, replacing attention with adding operation is also not achieving a lower accuracy". Is it okay to phrase it like that when the performance drop is 0.8 points?
Also, can't add be treated as a type of attention function? In Attention Mechanisms in Computer Vision: A Survey, we have the formula:
I can treat the function f here as an addition operation, can't I?
Does anyone know how to use the downloaded models?
Hi, is there already a repo, or will there be upcoming ones, for the object detection and/or instance segmentation tasks? Thanks.
Thank you for releasing this code and pre-trained models.
Could you verify what the data processing configuration was for your pre-trained models? Is it the following, which I get when running your example training script?
Data processing configuration for current model + dataset:
input_size: (3, 224, 224)
interpolation: bicubic
mean: (0.5, 0.5, 0.5)
std: (0.5, 0.5, 0.5)
crop_pct: 0.9
Env:
torch == 1.10.1
timm == 0.4.12
Script:
ckpt_file = 'ckpt/van_base_828.pth.tar'
ckpt = torch.load(ckpt_file, map_location="cpu")
Error message:
RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory
torch == 1.10.1 satisfies the requirement (torch >= 1.7), and we do need a recent PyTorch version for other parts of our project. Can anyone help? Thanks!
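For what it's worth, one way to narrow this down (a sketch, not from the repo): checkpoints saved by torch >= 1.6 are zip archives, so a truncated download or an HTML error page saved under the .pth.tar name fails exactly this central-directory check. A quick stdlib test, with a hypothetical diagnose_checkpoint helper:

```python
import os
import zipfile

def diagnose_checkpoint(path):
    """Heuristic check for the 'failed finding central directory' error:
    torch >= 1.6 saves checkpoints as zip archives, so a file that fails
    zipfile's central-directory check is either truncated mid-download or
    not a zip-format checkpoint at all (e.g. an HTML error page, or a
    legacy pre-1.6 pickle)."""
    size = os.path.getsize(path)
    if zipfile.is_zipfile(path):
        return f"{path}: looks like a complete zip archive ({size} bytes)"
    return (f"{path}: not a valid zip ({size} bytes); re-download it, "
            f"or it may be a legacy pickle-format checkpoint")
```

If the file fails this check, re-downloading (and comparing file sizes) is usually the fix; downgrading torch won't help with a corrupted archive.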
Where is the detection model?
The forward functions of LKA and Attention both use u = x.clone() to take a copy of the input before computing the attention map.
Is this a new trick? What advantages do you observe?
Thanks.
Hello! What model did you use for the classification problem? I know VAN is the backbone, but train.py
indicates the default model is ResNet?
It seems that the released model is not the best one.
python validate.py "MYDATA/ImageNet" --model "van_tiny" --checkpoint "weights/van_tiny_754.pth.tar" -b 200
The result is as follows: Acc@1 72.032 (27.968) Acc@5 91.126 (8.874)
I checked the state_dict of the checkpoint and found that the EMA version does not exist; I think the EMA version's accuracy would match the paper.
So will you check it?
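For reference, a sketch of the check described above (my own code, assuming the timm-style checkpoint layout, where the EMA copy, when saved, lives under the 'state_dict_ema' key alongside 'state_dict'):

```python
def find_ema_weights(ckpt):
    """Return the EMA weights if a timm-style checkpoint carries them.

    timm's checkpoint saver stores the EMA copy under 'state_dict_ema';
    if that key is absent, the checkpoint was saved without EMA weights
    and only ckpt['state_dict'] (the raw training weights) is available,
    which would explain the accuracy gap against the paper.
    """
    if isinstance(ckpt, dict) and "state_dict_ema" in ckpt:
        return ckpt["state_dict_ema"]
    return None
```

Applied to a checkpoint loaded with torch.load(..., map_location="cpu"), this returns None for van_tiny_754.pth.tar, which matches the observation above.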
Thanks for publishing such excellent work! It would be perfect if you could provide the semantic segmentation version. I really appreciate your help!
Thanks for your great work!
It would be very kind if you released the code for VAN detection. :)
Thanks for your great work! I noticed that the visualization results of Grad-CAM are wonderful.
I would appreciate it if you could release the code!
Hello,
Thanks for your great work.
When I try to run validate.py with the model "van_small_811.pth.tar" downloaded from Google Drive, I get the following error:
Error(s) in loading state_dict for DPN:
Missing key(s) in state_dict: "features.conv1_1.conv.weight", "features.conv1_1.bn.weight", "features.conv1_1.bn.bias", "features.conv1_1.bn.running_mean", "features.conv1_1.bn.running_var", "features.conv2_1.c1x1_w_s1.bn.weight", "features.conv2_1.c1x1_w_s1.bn.bias", "features.conv2_1.c1x1_w_s1.bn.running_mean", "features.conv2_1.c1x1_w_s1.bn.running_var", "features.conv2_1.c1x1_w_s1.conv.weight", "features.conv2_1.c1x1_a.bn.weight", "features.conv2_1.c1x1_a.bn.bias", "features.conv2_1.c1x1_a.bn.running_mean", "features.conv2_1.c1x1_a.bn.running_var", "features.conv2_1.c1x1_a.conv.weight", "features.conv2_1.c3x3_b.bn.weight", "features.conv2_1.c3x3_b.bn.bias", "features.conv2_1.c3x3_b.bn.running_mean", "features.conv2_1.c3x3_b.bn.running_var", "features.conv2_1.c3x3_b.conv.weight", "features.conv2_1.c1x1_c.bn.weight", "features.conv2_1.c1x1_c.bn.bias", "features.conv2_1.c1x1_c.bn.running_mean", "features.conv2_1.c1x1_c.bn.running_var", "features.conv2_1.c1x1_c.conv.weight", "features.conv2_2.c1x1_a.bn.weight", "features.conv2_2.c1x1_a.bn.bias", "features.conv2_2.c1x1_a.bn.running_mean", "features.conv2_2.c1x1_a.bn.running_var", "features.conv2_2.c1x1_a.conv.weight", "features.conv2_2.c3x3_b.bn.weight", "features.conv2_2.c3x3_b.bn.bias", "features.conv2_2.c3x3_b.bn.running_mean", "features.conv2_2.c3x3_b.bn.running_var", "features.conv2_2.c3x3_b.conv.weight", "features.conv2_2.c1x1_c.bn.weight", "features.conv2_2.c1x1_c.bn.bias", "features.conv2_2.c1x1_c.bn.running_mean", "features.conv2_2.c1x1_c.bn.running_var", "features.conv2_2.c1x1_c.conv.weight", "features.conv2_3.c1x1_a.bn.weight", "features.conv2_3.c1x1_a.bn.bias", "features.conv2_3.c1x1_a.bn.running_mean", "features.conv2_3.c1x1_a.bn.running_var", "features.conv2_3.c1x1_a.conv.weight", "features.conv2_3.c3x3_b.bn.weight", "features.conv2_3.c3x3_b.bn.bias", "features.conv2_3.c3x3_b.bn.running_mean", "features.conv2_3.c3x3_b.bn.running_var", "features.conv2_3.c3x3_b.conv.weight", 
"features.conv2_3.c1x1_c.bn.weight", "features.conv2_3.c1x1_c.bn.bias", "features.conv2_3.c1x1_c.bn.running_mean", "features.conv2_3.c1x1_c.bn.running_var", "features.conv2_3.c1x1_c.conv.weight", "features.conv3_1.c1x1_w_s2.bn.weight", "features.conv3_1.c1x1_w_s2.bn.bias", "features.conv3_1.c1x1_w_s2.bn.running_mean", "features.conv3_1.c1x1_w_s2.bn.running_var", "features.conv3_1.c1x1_w_s2.conv.weight", "features.conv3_1.c1x1_a.bn.weight", "features.conv3_1.c1x1_a.bn.bias", "features.conv3_1.c1x1_a.bn.running_mean", "features.conv3_1.c1x1_a.bn.running_var", "features.conv3_1.c1x1_a.conv.weight", "features.conv3_1.c3x3_b.bn.weight", "features.conv3_1.c3x3_b.bn.bias", "features.conv3_1.c3x3_b.bn.running_mean", "features.conv3_1.c3x3_b.bn.running_var", "features.conv3_1.c3x3_b.conv.weight", "features.conv3_1.c1x1_c.bn.weight", "features.conv3_1.c1x1_c.bn.bias", "features.conv3_1.c1x1_c.bn.running_mean", "features.conv3_1.c1x1_c.bn.running_var", "features.conv3_1.c1x1_c.conv.weight", "features.conv3_2.c1x1_a.bn.weight", "features.conv3_2.c1x1_a.bn.bias", "features.conv3_2.c1x1_a.bn.running_mean", "features.conv3_2.c1x1_a.bn.running_var", "features.conv3_2.c1x1_a.conv.weight", "features.conv3_2.c3x3_b.bn.weight", "features.conv3_2.c3x3_b.bn.bias", "features.conv3_2.c3x3_b.bn.running_mean", "features.conv3_2.c3x3_b.bn.running_var", "features.conv3_2.c3x3_b.conv.weight", "features.conv3_2.c1x1_c.bn.weight", "features.conv3_2.c1x1_c.bn.bias", "features.conv3_2.c1x1_c.bn.running_mean", "features.conv3_2.c1x1_c.bn.running_var", "features.conv3_2.c1x1_c.conv.weight", "features.conv3_3.c1x1_a.bn.weight", "features.conv3_3.c1x1_a.bn.bias", "features.conv3_3.c1x1_a.bn.running_mean", "features.conv3_3.c1x1_a.bn.running_var", "features.conv3_3.c1x1_a.conv.weight", "features.conv3_3.c3x3_b.bn.weight", "features.conv3_3.c3x3_b.bn.bias", "features.conv3_3.c3x3_b.bn.running_mean", "features.conv3_3.c3x3_b.bn.running_var", "features.conv3_3.c3x3_b.conv.weight", 
"features.conv3_3.c1x1_c.bn.weight", "features.conv3_3.c1x1_c.bn.bias", "features.conv3_3.c1x1_c.bn.running_mean", "features.conv3_3.c1x1_c.bn.running_var", "features.conv3_3.c1x1_c.conv.weight", "features.conv3_4.c1x1_a.bn.weight", "features.conv3_4.c1x1_a.bn.bias", "features.conv3_4.c1x1_a.bn.running_mean", "features.conv3_4.c1x1_a.bn.running_var", "features.conv3_4.c1x1_a.conv.weight", "features.conv3_4.c3x3_b.bn.weight", "features.conv3_4.c3x3_b.bn.bias", "features.conv3_4.c3x3_b.bn.running_mean", "features.conv3_4.c3x3_b.bn.running_var", "features.conv3_4.c3x3_b.conv.weight", "features.conv3_4.c1x1_c.bn.weight", "features.conv3_4.c1x1_c.bn.bias", "features.conv3_4.c1x1_c.bn.running_mean", "features.conv3_4.c1x1_c.bn.running_var", "features.conv3_4.c1x1_c.conv.weight", "features.conv4_1.c1x1_w_s2.bn.weight", "features.conv4_1.c1x1_w_s2.bn.bias", "features.conv4_1.c1x1_w_s2.bn.running_mean", "features.conv4_1.c1x1_w_s2.bn.running_var", "features.conv4_1.c1x1_w_s2.conv.weight", "features.conv4_1.c1x1_a.bn.weight", "features.conv4_1.c1x1_a.bn.bias", "features.conv4_1.c1x1_a.bn.running_mean", "features.conv4_1.c1x1_a.bn.running_var", "features.conv4_1.c1x1_a.conv.weight", "features.conv4_1.c3x3_b.bn.weight", "features.conv4_1.c3x3_b.bn.bias", "features.conv4_1.c3x3_b.bn.running_mean", "features.conv4_1.c3x3_b.bn.running_var", "features.conv4_1.c3x3_b.conv.weight", "features.conv4_1.c1x1_c.bn.weight", "features.conv4_1.c1x1_c.bn.bias", "features.conv4_1.c1x1_c.bn.running_mean", "features.conv4_1.c1x1_c.bn.running_var", "features.conv4_1.c1x1_c.conv.weight", "features.conv4_2.c1x1_a.bn.weight", "features.conv4_2.c1x1_a.bn.bias", "features.conv4_2.c1x1_a.bn.running_mean", "features.conv4_2.c1x1_a.bn.running_var", "features.conv4_2.c1x1_a.conv.weight", "features.conv4_2.c3x3_b.bn.weight", "features.conv4_2.c3x3_b.bn.bias", "features.conv4_2.c3x3_b.bn.running_mean", "features.conv4_2.c3x3_b.bn.running_var", "features.conv4_2.c3x3_b.conv.weight", 
"features.conv4_2.c1x1_c.bn.weight", "features.conv4_2.c1x1_c.bn.bias", "features.conv4_2.c1x1_c.bn.running_mean", "features.conv4_2.c1x1_c.bn.running_var", "features.conv4_2.c1x1_c.conv.weight", "features.conv4_3.c1x1_a.bn.weight", "features.conv4_3.c1x1_a.bn.bias", "features.conv4_3.c1x1_a.bn.running_mean", "features.conv4_3.c1x1_a.bn.running_var", "features.conv4_3.c1x1_a.conv.weight", "features.conv4_3.c3x3_b.bn.weight", "features.conv4_3.c3x3_b.bn.bias", "features.conv4_3.c3x3_b.bn.running_mean", "features.conv4_3.c3x3_b.bn.running_var", "features.conv4_3.c3x3_b.conv.weight", "features.conv4_3.c1x1_c.bn.weight", "features.conv4_3.c1x1_c.bn.bias", "features.conv4_3.c1x1_c.bn.running_mean", "features.conv4_3.c1x1_c.bn.running_var", "features.conv4_3.c1x1_c.conv.weight", "features.conv4_4.c1x1_a.bn.weight", "features.conv4_4.c1x1_a.bn.bias", "features.conv4_4.c1x1_a.bn.running_mean", "features.conv4_4.c1x1_a.bn.running_var", "features.conv4_4.c1x1_a.conv.weight", "features.conv4_4.c3x3_b.bn.weight", "features.conv4_4.c3x3_b.bn.bias", "features.conv4_4.c3x3_b.bn.running_mean", "features.conv4_4.c3x3_b.bn.running_var", "features.conv4_4.c3x3_b.conv.weight", "features.conv4_4.c1x1_c.bn.weight", "features.conv4_4.c1x1_c.bn.bias", "features.conv4_4.c1x1_c.bn.running_mean", "features.conv4_4.c1x1_c.bn.running_var", "features.conv4_4.c1x1_c.conv.weight", "features.conv4_5.c1x1_a.bn.weight", "features.conv4_5.c1x1_a.bn.bias", "features.conv4_5.c1x1_a.bn.running_mean", "features.conv4_5.c1x1_a.bn.running_var", "features.conv4_5.c1x1_a.conv.weight", "features.conv4_5.c3x3_b.bn.weight", "features.conv4_5.c3x3_b.bn.bias", "features.conv4_5.c3x3_b.bn.running_mean", "features.conv4_5.c3x3_b.bn.running_var", "features.conv4_5.c3x3_b.conv.weight", "features.conv4_5.c1x1_c.bn.weight", "features.conv4_5.c1x1_c.bn.bias", "features.conv4_5.c1x1_c.bn.running_mean", "features.conv4_5.c1x1_c.bn.running_var", "features.conv4_5.c1x1_c.conv.weight", "features.conv4_6.c1x1_a.bn.weight", 
"features.conv4_6.c1x1_a.bn.bias", "features.conv4_6.c1x1_a.bn.running_mean", "features.conv4_6.c1x1_a.bn.running_var", "features.conv4_6.c1x1_a.conv.weight", "features.conv4_6.c3x3_b.bn.weight", "features.conv4_6.c3x3_b.bn.bias", "features.conv4_6.c3x3_b.bn.running_mean", "features.conv4_6.c3x3_b.bn.running_var", "features.conv4_6.c3x3_b.conv.weight", "features.conv4_6.c1x1_c.bn.weight", "features.conv4_6.c1x1_c.bn.bias", "features.conv4_6.c1x1_c.bn.running_mean", "features.conv4_6.c1x1_c.bn.running_var", "features.conv4_6.c1x1_c.conv.weight", "features.conv4_7.c1x1_a.bn.weight", "features.conv4_7.c1x1_a.bn.bias", "features.conv4_7.c1x1_a.bn.running_mean", "features.conv4_7.c1x1_a.bn.running_var", "features.conv4_7.c1x1_a.conv.weight", "features.conv4_7.c3x3_b.bn.weight", "features.conv4_7.c3x3_b.bn.bias", "features.conv4_7.c3x3_b.bn.running_mean", "features.conv4_7.c3x3_b.bn.running_var", "features.conv4_7.c3x3_b.conv.weight", "features.conv4_7.c1x1_c.bn.weight", "features.conv4_7.c1x1_c.bn.bias", "features.conv4_7.c1x1_c.bn.running_mean", "features.conv4_7.c1x1_c.bn.running_var", "features.conv4_7.c1x1_c.conv.weight", "features.conv4_8.c1x1_a.bn.weight", "features.conv4_8.c1x1_a.bn.bias", "features.conv4_8.c1x1_a.bn.running_mean", "features.conv4_8.c1x1_a.bn.running_var", "features.conv4_8.c1x1_a.conv.weight", "features.conv4_8.c3x3_b.bn.weight", "features.conv4_8.c3x3_b.bn.bias", "features.conv4_8.c3x3_b.bn.running_mean", "features.conv4_8.c3x3_b.bn.running_var", "features.conv4_8.c3x3_b.conv.weight", "features.conv4_8.c1x1_c.bn.weight", "features.conv4_8.c1x1_c.bn.bias", "features.conv4_8.c1x1_c.bn.running_mean", "features.conv4_8.c1x1_c.bn.running_var", "features.conv4_8.c1x1_c.conv.weight", "features.conv4_9.c1x1_a.bn.weight", "features.conv4_9.c1x1_a.bn.bias", "features.conv4_9.c1x1_a.bn.running_mean", "features.conv4_9.c1x1_a.bn.running_var", "features.conv4_9.c1x1_a.conv.weight", "features.conv4_9.c3x3_b.bn.weight", "features.conv4_9.c3x3_b.bn.bias", 
"features.conv4_9.c3x3_b.bn.running_mean", "features.conv4_9.c3x3_b.bn.running_var", "features.conv4_9.c3x3_b.conv.weight", "features.conv4_9.c1x1_c.bn.weight", "features.conv4_9.c1x1_c.bn.bias", "features.conv4_9.c1x1_c.bn.running_mean", "features.conv4_9.c1x1_c.bn.running_var", "features.conv4_9.c1x1_c.conv.weight", "features.conv4_10.c1x1_a.bn.weight", "features.conv4_10.c1x1_a.bn.bias", "features.conv4_10.c1x1_a.bn.running_mean", "features.conv4_10.c1x1_a.bn.running_var", "features.conv4_10.c1x1_a.conv.weight", "features.conv4_10.c3x3_b.bn.weight", "features.conv4_10.c3x3_b.bn.bias", "features.conv4_10.c3x3_b.bn.running_mean", "features.conv4_10.c3x3_b.bn.running_var", "features.conv4_10.c3x3_b.conv.weight", "features.conv4_10.c1x1_c.bn.weight", "features.conv4_10.c1x1_c.bn.bias", "features.conv4_10.c1x1_c.bn.running_mean", "features.conv4_10.c1x1_c.bn.running_var", "features.conv4_10.c1x1_c.conv.weight", "features.conv4_11.c1x1_a.bn.weight", "features.conv4_11.c1x1_a.bn.bias", "features.conv4_11.c1x1_a.bn.running_mean", "features.conv4_11.c1x1_a.bn.running_var", "features.conv4_11.c1x1_a.conv.weight", "features.conv4_11.c3x3_b.bn.weight", "features.conv4_11.c3x3_b.bn.bias", "features.conv4_11.c3x3_b.bn.running_mean", "features.conv4_11.c3x3_b.bn.running_var", "features.conv4_11.c3x3_b.conv.weight", "features.conv4_11.c1x1_c.bn.weight", "features.conv4_11.c1x1_c.bn.bias", "features.conv4_11.c1x1_c.bn.running_mean", "features.conv4_11.c1x1_c.bn.running_var", "features.conv4_11.c1x1_c.conv.weight", "features.conv4_12.c1x1_a.bn.weight", "features.conv4_12.c1x1_a.bn.bias", "features.conv4_12.c1x1_a.bn.running_mean", "features.conv4_12.c1x1_a.bn.running_var", "features.conv4_12.c1x1_a.conv.weight", "features.conv4_12.c3x3_b.bn.weight", "features.conv4_12.c3x3_b.bn.bias", "features.conv4_12.c3x3_b.bn.running_mean", "features.conv4_12.c3x3_b.bn.running_var", "features.conv4_12.c3x3_b.conv.weight", "features.conv4_12.c1x1_c.bn.weight", 
"features.conv4_12.c1x1_c.bn.bias", "features.conv4_12.c1x1_c.bn.running_mean", "features.conv4_12.c1x1_c.bn.running_var", "features.conv4_12.c1x1_c.conv.weight", "features.conv4_13.c1x1_a.bn.weight", "features.conv4_13.c1x1_a.bn.bias", "features.conv4_13.c1x1_a.bn.running_mean", "features.conv4_13.c1x1_a.bn.running_var", "features.conv4_13.c1x1_a.conv.weight", "features.conv4_13.c3x3_b.bn.weight", "features.conv4_13.c3x3_b.bn.bias", "features.conv4_13.c3x3_b.bn.running_mean", "features.conv4_13.c3x3_b.bn.running_var", "features.conv4_13.c3x3_b.conv.weight", "features.conv4_13.c1x1_c.bn.weight", "features.conv4_13.c1x1_c.bn.bias", "features.conv4_13.c1x1_c.bn.running_mean", "features.conv4_13.c1x1_c.bn.running_var", "features.conv4_13.c1x1_c.conv.weight", "features.conv4_14.c1x1_a.bn.weight", "features.conv4_14.c1x1_a.bn.bias", "features.conv4_14.c1x1_a.bn.running_mean", "features.conv4_14.c1x1_a.bn.running_var", "features.conv4_14.c1x1_a.conv.weight", "features.conv4_14.c3x3_b.bn.weight", "features.conv4_14.c3x3_b.bn.bias", "features.conv4_14.c3x3_b.bn.running_mean", "features.conv4_14.c3x3_b.bn.running_var", "features.conv4_14.c3x3_b.conv.weight", "features.conv4_14.c1x1_c.bn.weight", "features.conv4_14.c1x1_c.bn.bias", "features.conv4_14.c1x1_c.bn.running_mean", "features.conv4_14.c1x1_c.bn.running_var", "features.conv4_14.c1x1_c.conv.weight", "features.conv4_15.c1x1_a.bn.weight", "features.conv4_15.c1x1_a.bn.bias", "features.conv4_15.c1x1_a.bn.running_mean", "features.conv4_15.c1x1_a.bn.running_var", "features.conv4_15.c1x1_a.conv.weight", "features.conv4_15.c3x3_b.bn.weight", "features.conv4_15.c3x3_b.bn.bias", "features.conv4_15.c3x3_b.bn.running_mean", "features.conv4_15.c3x3_b.bn.running_var", "features.conv4_15.c3x3_b.conv.weight", "features.conv4_15.c1x1_c.bn.weight", "features.conv4_15.c1x1_c.bn.bias", "features.conv4_15.c1x1_c.bn.running_mean", "features.conv4_15.c1x1_c.bn.running_var", "features.conv4_15.c1x1_c.conv.weight", 
"features.conv4_16.c1x1_a.bn.weight", "features.conv4_16.c1x1_a.bn.bias", "features.conv4_16.c1x1_a.bn.running_mean", "features.conv4_16.c1x1_a.bn.running_var", "features.conv4_16.c1x1_a.conv.weight", "features.conv4_16.c3x3_b.bn.weight", "features.conv4_16.c3x3_b.bn.bias", "features.conv4_16.c3x3_b.bn.running_mean", "features.conv4_16.c3x3_b.bn.running_var", "features.conv4_16.c3x3_b.conv.weight", "features.conv4_16.c1x1_c.bn.weight", "features.conv4_16.c1x1_c.bn.bias", "features.conv4_16.c1x1_c.bn.running_mean", "features.conv4_16.c1x1_c.bn.running_var", "features.conv4_16.c1x1_c.conv.weight", "features.conv4_17.c1x1_a.bn.weight", "features.conv4_17.c1x1_a.bn.bias", "features.conv4_17.c1x1_a.bn.running_mean", "features.conv4_17.c1x1_a.bn.running_var", "features.conv4_17.c1x1_a.conv.weight", "features.conv4_17.c3x3_b.bn.weight", "features.conv4_17.c3x3_b.bn.bias", "features.conv4_17.c3x3_b.bn.running_mean", "features.conv4_17.c3x3_b.bn.running_var", "features.conv4_17.c3x3_b.conv.weight", "features.conv4_17.c1x1_c.bn.weight", "features.conv4_17.c1x1_c.bn.bias", "features.conv4_17.c1x1_c.bn.running_mean", "features.conv4_17.c1x1_c.bn.running_var", "features.conv4_17.c1x1_c.conv.weight", "features.conv4_18.c1x1_a.bn.weight", "features.conv4_18.c1x1_a.bn.bias", "features.conv4_18.c1x1_a.bn.running_mean", "features.conv4_18.c1x1_a.bn.running_var", "features.conv4_18.c1x1_a.conv.weight", "features.conv4_18.c3x3_b.bn.weight", "features.conv4_18.c3x3_b.bn.bias", "features.conv4_18.c3x3_b.bn.running_mean", "features.conv4_18.c3x3_b.bn.running_var", "features.conv4_18.c3x3_b.conv.weight", "features.conv4_18.c1x1_c.bn.weight", "features.conv4_18.c1x1_c.bn.bias", "features.conv4_18.c1x1_c.bn.running_mean", "features.conv4_18.c1x1_c.bn.running_var", "features.conv4_18.c1x1_c.conv.weight", "features.conv4_19.c1x1_a.bn.weight", "features.conv4_19.c1x1_a.bn.bias", "features.conv4_19.c1x1_a.bn.running_mean", "features.conv4_19.c1x1_a.bn.running_var", 
"features.conv4_19.c1x1_a.conv.weight", "features.conv4_19.c3x3_b.bn.weight", "features.conv4_19.c3x3_b.bn.bias", "features.conv4_19.c3x3_b.bn.running_mean", "features.conv4_19.c3x3_b.bn.running_var", "features.conv4_19.c3x3_b.conv.weight", "features.conv4_19.c1x1_c.bn.weight", "features.conv4_19.c1x1_c.bn.bias", "features.conv4_19.c1x1_c.bn.running_mean", "features.conv4_19.c1x1_c.bn.running_var", "features.conv4_19.c1x1_c.conv.weight", "features.conv4_20.c1x1_a.bn.weight", "features.conv4_20.c1x1_a.bn.bias", "features.conv4_20.c1x1_a.bn.running_mean", "features.conv4_20.c1x1_a.bn.running_var", "features.conv4_20.c1x1_a.conv.weight", "features.conv4_20.c3x3_b.bn.weight", "features.conv4_20.c3x3_b.bn.bias", "features.conv4_20.c3x3_b.bn.running_mean", "features.conv4_20.c3x3_b.bn.running_var", "features.conv4_20.c3x3_b.conv.weight", "features.conv4_20.c1x1_c.bn.weight", "features.conv4_20.c1x1_c.bn.bias", "features.conv4_20.c1x1_c.bn.running_mean", "features.conv4_20.c1x1_c.bn.running_var", "features.conv4_20.c1x1_c.conv.weight", "features.conv5_1.c1x1_w_s2.bn.weight", "features.conv5_1.c1x1_w_s2.bn.bias", "features.conv5_1.c1x1_w_s2.bn.running_mean", "features.conv5_1.c1x1_w_s2.bn.running_var", "features.conv5_1.c1x1_w_s2.conv.weight", "features.conv5_1.c1x1_a.bn.weight", "features.conv5_1.c1x1_a.bn.bias", "features.conv5_1.c1x1_a.bn.running_mean", "features.conv5_1.c1x1_a.bn.running_var", "features.conv5_1.c1x1_a.conv.weight", "features.conv5_1.c3x3_b.bn.weight", "features.conv5_1.c3x3_b.bn.bias", "features.conv5_1.c3x3_b.bn.running_mean", "features.conv5_1.c3x3_b.bn.running_var", "features.conv5_1.c3x3_b.conv.weight", "features.conv5_1.c1x1_c.bn.weight", "features.conv5_1.c1x1_c.bn.bias", "features.conv5_1.c1x1_c.bn.running_mean", "features.conv5_1.c1x1_c.bn.running_var", "features.conv5_1.c1x1_c.conv.weight", "features.conv5_2.c1x1_a.bn.weight", "features.conv5_2.c1x1_a.bn.bias", "features.conv5_2.c1x1_a.bn.running_mean", 
"features.conv5_2.c1x1_a.bn.running_var", "features.conv5_2.c1x1_a.conv.weight", "features.conv5_2.c3x3_b.bn.weight", "features.conv5_2.c3x3_b.bn.bias", "features.conv5_2.c3x3_b.bn.running_mean", "features.conv5_2.c3x3_b.bn.running_var", "features.conv5_2.c3x3_b.conv.weight", "features.conv5_2.c1x1_c.bn.weight", "features.conv5_2.c1x1_c.bn.bias", "features.conv5_2.c1x1_c.bn.running_mean", "features.conv5_2.c1x1_c.bn.running_var", "features.conv5_2.c1x1_c.conv.weight", "features.conv5_3.c1x1_a.bn.weight", "features.conv5_3.c1x1_a.bn.bias", "features.conv5_3.c1x1_a.bn.running_mean", "features.conv5_3.c1x1_a.bn.running_var", "features.conv5_3.c1x1_a.conv.weight", "features.conv5_3.c3x3_b.bn.weight", "features.conv5_3.c3x3_b.bn.bias", "features.conv5_3.c3x3_b.bn.running_mean", "features.conv5_3.c3x3_b.bn.running_var", "features.conv5_3.c3x3_b.conv.weight", "features.conv5_3.c1x1_c.bn.weight", "features.conv5_3.c1x1_c.bn.bias", "features.conv5_3.c1x1_c.bn.running_mean", "features.conv5_3.c1x1_c.bn.running_var", "features.conv5_3.c1x1_c.conv.weight", "features.conv5_bn_ac.bn.weight", "features.conv5_bn_ac.bn.bias", "features.conv5_bn_ac.bn.running_mean", "features.conv5_bn_ac.bn.running_var", "classifier.weight", "classifier.bias".
Unexpected key(s) in state_dict: "patch_embed1.proj.weight", "patch_embed1.proj.bias", "patch_embed1.norm.weight", "patch_embed1.norm.bias", "patch_embed1.norm.running_mean", "patch_embed1.norm.running_var", "patch_embed1.norm.num_batches_tracked", "block1.0.layer_scale_1", "block1.0.layer_scale_2", "block1.0.norm1.weight", "block1.0.norm1.bias", "block1.0.norm1.running_mean", "block1.0.norm1.running_var", "block1.0.norm1.num_batches_tracked", "block1.0.attn.proj_1.weight", "block1.0.attn.proj_1.bias", "block1.0.attn.spatial_gating_unit.conv0.weight", "block1.0.attn.spatial_gating_unit.conv0.bias", "block1.0.attn.spatial_gating_unit.conv_spatial.weight", "block1.0.attn.spatial_gating_unit.conv_spatial.bias", "block1.0.attn.spatial_gating_unit.conv1.weight", "block1.0.attn.spatial_gating_unit.conv1.bias", "block1.0.attn.proj_2.weight", "block1.0.attn.proj_2.bias", "block1.0.norm2.weight", "block1.0.norm2.bias", "block1.0.norm2.running_mean", "block1.0.norm2.running_var", "block1.0.norm2.num_batches_tracked", "block1.0.mlp.fc1.weight", "block1.0.mlp.fc1.bias", "block1.0.mlp.dwconv.dwconv.weight", "block1.0.mlp.dwconv.dwconv.bias", "block1.0.mlp.fc2.weight", "block1.0.mlp.fc2.bias", "block1.1.layer_scale_1", "block1.1.layer_scale_2", "block1.1.norm1.weight", "block1.1.norm1.bias", "block1.1.norm1.running_mean", "block1.1.norm1.running_var", "block1.1.norm1.num_batches_tracked", "block1.1.attn.proj_1.weight", "block1.1.attn.proj_1.bias", "block1.1.attn.spatial_gating_unit.conv0.weight", "block1.1.attn.spatial_gating_unit.conv0.bias", "block1.1.attn.spatial_gating_unit.conv_spatial.weight", "block1.1.attn.spatial_gating_unit.conv_spatial.bias", "block1.1.attn.spatial_gating_unit.conv1.weight", "block1.1.attn.spatial_gating_unit.conv1.bias", "block1.1.attn.proj_2.weight", "block1.1.attn.proj_2.bias", "block1.1.norm2.weight", "block1.1.norm2.bias", "block1.1.norm2.running_mean", "block1.1.norm2.running_var", "block1.1.norm2.num_batches_tracked", "block1.1.mlp.fc1.weight", 
"block1.1.mlp.fc1.bias", "block1.1.mlp.dwconv.dwconv.weight", "block1.1.mlp.dwconv.dwconv.bias", "block1.1.mlp.fc2.weight", "block1.1.mlp.fc2.bias", "norm1.weight", "norm1.bias", "patch_embed2.proj.weight", "patch_embed2.proj.bias", "patch_embed2.norm.weight", "patch_embed2.norm.bias", "patch_embed2.norm.running_mean", "patch_embed2.norm.running_var", "patch_embed2.norm.num_batches_tracked", "block2.0.layer_scale_1", "block2.0.layer_scale_2", "block2.0.norm1.weight", "block2.0.norm1.bias", "block2.0.norm1.running_mean", "block2.0.norm1.running_var", "block2.0.norm1.num_batches_tracked", "block2.0.attn.proj_1.weight", "block2.0.attn.proj_1.bias", "block2.0.attn.spatial_gating_unit.conv0.weight", "block2.0.attn.spatial_gating_unit.conv0.bias", "block2.0.attn.spatial_gating_unit.conv_spatial.weight", "block2.0.attn.spatial_gating_unit.conv_spatial.bias", "block2.0.attn.spatial_gating_unit.conv1.weight", "block2.0.attn.spatial_gating_unit.conv1.bias", "block2.0.attn.proj_2.weight", "block2.0.attn.proj_2.bias", "block2.0.norm2.weight", "block2.0.norm2.bias", "block2.0.norm2.running_mean", "block2.0.norm2.running_var", "block2.0.norm2.num_batches_tracked", "block2.0.mlp.fc1.weight", "block2.0.mlp.fc1.bias", "block2.0.mlp.dwconv.dwconv.weight", "block2.0.mlp.dwconv.dwconv.bias", "block2.0.mlp.fc2.weight", "block2.0.mlp.fc2.bias", "block2.1.layer_scale_1", "block2.1.layer_scale_2", "block2.1.norm1.weight", "block2.1.norm1.bias", "block2.1.norm1.running_mean", "block2.1.norm1.running_var", "block2.1.norm1.num_batches_tracked", "block2.1.attn.proj_1.weight", "block2.1.attn.proj_1.bias", "block2.1.attn.spatial_gating_unit.conv0.weight", "block2.1.attn.spatial_gating_unit.conv0.bias", "block2.1.attn.spatial_gating_unit.conv_spatial.weight", "block2.1.attn.spatial_gating_unit.conv_spatial.bias", "block2.1.attn.spatial_gating_unit.conv1.weight", "block2.1.attn.spatial_gating_unit.conv1.bias", "block2.1.attn.proj_2.weight", "block2.1.attn.proj_2.bias", "block2.1.norm2.weight", 
"block2.1.norm2.bias", "block2.1.norm2.running_mean", "block2.1.norm2.running_var", "block2.1.norm2.num_batches_tracked", "block2.1.mlp.fc1.weight", "block2.1.mlp.fc1.bias", "block2.1.mlp.dwconv.dwconv.weight", "block2.1.mlp.dwconv.dwconv.bias", "block2.1.mlp.fc2.weight", "block2.1.mlp.fc2.bias", "norm2.weight", "norm2.bias", "patch_embed3.proj.weight", "patch_embed3.proj.bias", "patch_embed3.norm.weight", "patch_embed3.norm.bias", "patch_embed3.norm.running_mean", "patch_embed3.norm.running_var", "patch_embed3.norm.num_batches_tracked", "block3.0.layer_scale_1", "block3.0.layer_scale_2", "block3.0.norm1.weight", "block3.0.norm1.bias", "block3.0.norm1.running_mean", "block3.0.norm1.running_var", "block3.0.norm1.num_batches_tracked", "block3.0.attn.proj_1.weight", "block3.0.attn.proj_1.bias", "block3.0.attn.spatial_gating_unit.conv0.weight", "block3.0.attn.spatial_gating_unit.conv0.bias", "block3.0.attn.spatial_gating_unit.conv_spatial.weight", "block3.0.attn.spatial_gating_unit.conv_spatial.bias", "block3.0.attn.spatial_gating_unit.conv1.weight", "block3.0.attn.spatial_gating_unit.conv1.bias", "block3.0.attn.proj_2.weight", "block3.0.attn.proj_2.bias", "block3.0.norm2.weight", "block3.0.norm2.bias", "block3.0.norm2.running_mean", "block3.0.norm2.running_var", "block3.0.norm2.num_batches_tracked", "block3.0.mlp.fc1.weight", "block3.0.mlp.fc1.bias", "block3.0.mlp.dwconv.dwconv.weight", "block3.0.mlp.dwconv.dwconv.bias", "block3.0.mlp.fc2.weight", "block3.0.mlp.fc2.bias", "block3.1.layer_scale_1", "block3.1.layer_scale_2", "block3.1.norm1.weight", "block3.1.norm1.bias", "block3.1.norm1.running_mean", "block3.1.norm1.running_var", "block3.1.norm1.num_batches_tracked", "block3.1.attn.proj_1.weight", "block3.1.attn.proj_1.bias", "block3.1.attn.spatial_gating_unit.conv0.weight", "block3.1.attn.spatial_gating_unit.conv0.bias", "block3.1.attn.spatial_gating_unit.conv_spatial.weight", "block3.1.attn.spatial_gating_unit.conv_spatial.bias", 
"block3.1.attn.spatial_gating_unit.conv1.weight", "block3.1.attn.spatial_gating_unit.conv1.bias", "block3.1.attn.proj_2.weight", "block3.1.attn.proj_2.bias", "block3.1.norm2.weight", "block3.1.norm2.bias", "block3.1.norm2.running_mean", "block3.1.norm2.running_var", "block3.1.norm2.num_batches_tracked", "block3.1.mlp.fc1.weight", "block3.1.mlp.fc1.bias", "block3.1.mlp.dwconv.dwconv.weight", "block3.1.mlp.dwconv.dwconv.bias", "block3.1.mlp.fc2.weight", "block3.1.mlp.fc2.bias", "block3.2.layer_scale_1", "block3.2.layer_scale_2", "block3.2.norm1.weight", "block3.2.norm1.bias", "block3.2.norm1.running_mean", "block3.2.norm1.running_var", "block3.2.norm1.num_batches_tracked", "block3.2.attn.proj_1.weight", "block3.2.attn.proj_1.bias", "block3.2.attn.spatial_gating_unit.conv0.weight", "block3.2.attn.spatial_gating_unit.conv0.bias", "block3.2.attn.spatial_gating_unit.conv_spatial.weight", "block3.2.attn.spatial_gating_unit.conv_spatial.bias", "block3.2.attn.spatial_gating_unit.conv1.weight", "block3.2.attn.spatial_gating_unit.conv1.bias", "block3.2.attn.proj_2.weight", "block3.2.attn.proj_2.bias", "block3.2.norm2.weight", "block3.2.norm2.bias", "block3.2.norm2.running_mean", "block3.2.norm2.running_var", "block3.2.norm2.num_batches_tracked", "block3.2.mlp.fc1.weight", "block3.2.mlp.fc1.bias", "block3.2.mlp.dwconv.dwconv.weight", "block3.2.mlp.dwconv.dwconv.bias", "block3.2.mlp.fc2.weight", "block3.2.mlp.fc2.bias", "block3.3.layer_scale_1", "block3.3.layer_scale_2", "block3.3.norm1.weight", "block3.3.norm1.bias", "block3.3.norm1.running_mean", "block3.3.norm1.running_var", "block3.3.norm1.num_batches_tracked", "block3.3.attn.proj_1.weight", "block3.3.attn.proj_1.bias", "block3.3.attn.spatial_gating_unit.conv0.weight", "block3.3.attn.spatial_gating_unit.conv0.bias", "block3.3.attn.spatial_gating_unit.conv_spatial.weight", "block3.3.attn.spatial_gating_unit.conv_spatial.bias", "block3.3.attn.spatial_gating_unit.conv1.weight", "block3.3.attn.spatial_gating_unit.conv1.bias", 
"block3.3.attn.proj_2.weight", "block3.3.attn.proj_2.bias", "block3.3.norm2.weight", "block3.3.norm2.bias", "block3.3.norm2.running_mean", "block3.3.norm2.running_var", "block3.3.norm2.num_batches_tracked", "block3.3.mlp.fc1.weight", "block3.3.mlp.fc1.bias", "block3.3.mlp.dwconv.dwconv.weight", "block3.3.mlp.dwconv.dwconv.bias", "block3.3.mlp.fc2.weight", "block3.3.mlp.fc2.bias", "norm3.weight", "norm3.bias", "patch_embed4.proj.weight", "patch_embed4.proj.bias", "patch_embed4.norm.weight", "patch_embed4.norm.bias", "patch_embed4.norm.running_mean", "patch_embed4.norm.running_var", "patch_embed4.norm.num_batches_tracked", "block4.0.layer_scale_1", "block4.0.layer_scale_2", "block4.0.norm1.weight", "block4.0.norm1.bias", "block4.0.norm1.running_mean", "block4.0.norm1.running_var", "block4.0.norm1.num_batches_tracked", "block4.0.attn.proj_1.weight", "block4.0.attn.proj_1.bias", "block4.0.attn.spatial_gating_unit.conv0.weight", "block4.0.attn.spatial_gating_unit.conv0.bias", "block4.0.attn.spatial_gating_unit.conv_spatial.weight", "block4.0.attn.spatial_gating_unit.conv_spatial.bias", "block4.0.attn.spatial_gating_unit.conv1.weight", "block4.0.attn.spatial_gating_unit.conv1.bias", "block4.0.attn.proj_2.weight", "block4.0.attn.proj_2.bias", "block4.0.norm2.weight", "block4.0.norm2.bias", "block4.0.norm2.running_mean", "block4.0.norm2.running_var", "block4.0.norm2.num_batches_tracked", "block4.0.mlp.fc1.weight", "block4.0.mlp.fc1.bias", "block4.0.mlp.dwconv.dwconv.weight", "block4.0.mlp.dwconv.dwconv.bias", "block4.0.mlp.fc2.weight", "block4.0.mlp.fc2.bias", "block4.1.layer_scale_1", "block4.1.layer_scale_2", "block4.1.norm1.weight", "block4.1.norm1.bias", "block4.1.norm1.running_mean", "block4.1.norm1.running_var", "block4.1.norm1.num_batches_tracked", "block4.1.attn.proj_1.weight", "block4.1.attn.proj_1.bias", "block4.1.attn.spatial_gating_unit.conv0.weight", "block4.1.attn.spatial_gating_unit.conv0.bias", "block4.1.attn.spatial_gating_unit.conv_spatial.weight", 
"block4.1.attn.spatial_gating_unit.conv_spatial.bias", "block4.1.attn.spatial_gating_unit.conv1.weight", "block4.1.attn.spatial_gating_unit.conv1.bias", "block4.1.attn.proj_2.weight", "block4.1.attn.proj_2.bias", "block4.1.norm2.weight", "block4.1.norm2.bias", "block4.1.norm2.running_mean", "block4.1.norm2.running_var", "block4.1.norm2.num_batches_tracked", "block4.1.mlp.fc1.weight", "block4.1.mlp.fc1.bias", "block4.1.mlp.dwconv.dwconv.weight", "block4.1.mlp.dwconv.dwconv.bias", "block4.1.mlp.fc2.weight", "block4.1.mlp.fc2.bias", "norm4.weight", "norm4.bias", "head.weight", "head.bias".
First, many thanks for this great work. However, I noticed something odd in the code, and I don't know whether it is intentional.
SpatialAttention already has a shortcut connection in its forward function, but Block's forward function wraps the same attention module in another shortcut. The residual connection is therefore applied twice around the same module; it might be better to remove the inner one.
Please point out any mistakes in my reasoning, thanks.
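The doubled shortcut described above can be illustrated with a minimal numeric sketch. This is not the actual VAN code: `inner` stands in for the attention body, and the function names are illustrative only.

```python
# Sketch of the residual pattern the comment describes.

def spatial_attention(x, inner):
    # SpatialAttention-style forward: adds its own shortcut
    # around the attention body.
    return inner(x) + x

def block(x, inner):
    # Block-style forward: adds a second shortcut around the
    # attention module, so the input is added in twice overall.
    return x + spatial_attention(x, inner)

# With inner(x) = 0, the output is 2 * x, showing that the input
# flows through two stacked residual paths instead of one.
print(block(3.0, lambda v: 0.0))  # -> 6.0
```

With identity-zero attention the block doubles its input, which is exactly the redundancy the comment points out: removing one of the two shortcuts would restore the usual `x + f(x)` form.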