van-classification's Introduction

Visual Attention Network (VAN) paper pdf

This is a PyTorch implementation of VAN proposed by our paper "Visual Attention Network".

Comparison

Figure 1: Comparison with different vision backbones on the ImageNet-1K validation set.

Citation:

@article{guo2022visual,
  title={Visual Attention Network},
  author={Guo, Meng-Hao and Lu, Cheng-Ze and Liu, Zheng-Ning and Cheng, Ming-Ming and Hu, Shi-Min},
  journal={arXiv preprint arXiv:2202.09741},
  year={2022}
}

News:

2022.02.22 Released the paper on arXiv and the code on GitHub.

2022.02.25 Supported by Jimm.

2022.03.15 Supported by Hugging Face.

2022.04 Supported by PaddleClas.

2022.05 Supported by OpenMMLab.

For more code, please refer to Papers with Code.

2022.07.08 Updated the paper on arXiv (ImageNet-22K results; SOTA for panoptic segmentation, 58.2 PQ). Segmentation models are available.

Classification models: see here. We are working on it.

Abstract:

While originally designed for natural language processing (NLP) tasks, the self-attention mechanism has recently taken various computer vision areas by storm. However, the 2D nature of images brings three challenges for applying self-attention in computer vision. (1) Treating images as 1D sequences neglects their 2D structures. (2) The quadratic complexity is too expensive for high-resolution images. (3) It only captures spatial adaptability but ignores channel adaptability. In this paper, we propose a novel large kernel attention (LKA) module to enable self-adaptive and long-range correlations in self-attention while avoiding the above issues. We further introduce a novel neural network based on LKA, namely Visual Attention Network (VAN). While extremely simple and efficient, VAN outperforms the state-of-the-art vision transformers (ViTs) and convolutional neural networks (CNNs) by a large margin in extensive experiments, including image classification, object detection, semantic segmentation, instance segmentation, etc.

Decomposition

Figure 2: Decomposition diagram of large-kernel convolution. A standard convolution can be decomposed into three parts: a depth-wise convolution (DW-Conv), a depth-wise dilation convolution (DW-D-Conv) and a 1×1 convolution (1×1 Conv).
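
For concreteness, here is a minimal PyTorch sketch of this decomposition, assuming the paper's K=21, d=3 setting (a 5×5 DW-Conv, a 7×7 DW-D-Conv with dilation 3, and a 1×1 Conv); the channel count is hypothetical:

import torch
import torch.nn as nn

dim = 64  # hypothetical channel count
decomposed = nn.Sequential(
    nn.Conv2d(dim, dim, kernel_size=5, padding=2, groups=dim),              # DW-Conv
    nn.Conv2d(dim, dim, kernel_size=7, padding=9, groups=dim, dilation=3),  # DW-D-Conv
    nn.Conv2d(dim, dim, kernel_size=1),                                     # 1x1 Conv
)

x = torch.randn(1, dim, 56, 56)
print(decomposed(x).shape)  # spatial size is preserved: torch.Size([1, 64, 56, 56])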

LKA

Figure 3: The structure of different modules: (a) the proposed Large Kernel Attention (LKA); (b) a non-attention module; (c) the self-attention module; (d) a stage of our Visual Attention Network (VAN). CFF means convolutional feed-forward network. The difference between (a) and (b) is the element-wise multiplication. It is worth noting that (c) is designed for 1D sequences.
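
Putting the decomposition together with the element-wise multiplication gives a sketch of (a); the repo's Attention module additionally wraps this in 1×1 projections and an activation, so treat the following as a minimal sketch rather than the full implementation:

import torch
import torch.nn as nn

class LKA(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.conv0 = nn.Conv2d(dim, dim, 5, padding=2, groups=dim)   # DW-Conv
        self.conv_spatial = nn.Conv2d(dim, dim, 7, padding=9,
                                      groups=dim, dilation=3)        # DW-D-Conv
        self.conv1 = nn.Conv2d(dim, dim, 1)                          # 1x1 Conv

    def forward(self, x):
        u = x.clone()                                    # value branch to be gated
        attn = self.conv1(self.conv_spatial(self.conv0(x)))
        return u * attn                                  # element-wise multiplication

x = torch.randn(1, 64, 14, 14)
print(LKA(64)(x).shape)  # torch.Size([1, 64, 14, 14])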

Image Classification

1. Data preparation: ImageNet with the following folder structure (a minimal loading sketch follows the tree).

│imagenet/
├──train/
│  ├── n01440764
│  │   ├── n01440764_10026.JPEG
│  │   ├── n01440764_10027.JPEG
│  │   ├── ......
│  ├── ......
├──val/
│  ├── n01440764
│  │   ├── ILSVRC2012_val_00000293.JPEG
│  │   ├── ILSVRC2012_val_00002138.JPEG
│  │   ├── ......
│  ├── ......
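
This is the standard ImageFolder layout, so for illustration the validation split can be loaded with torchvision (a minimal sketch; the 0.5 mean/std follow the data-processing config quoted in the issues below):

from torchvision import datasets, transforms

transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5)),
])
val_set = datasets.ImageFolder('/path/to/imagenet/val', transform=transform)
print(len(val_set.classes))  # 1000 synset folders such as n01440764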

2. VAN Models (IN-1K)

Model    #Params (M)  GFLOPs  Top-1 Acc (%)  Download
VAN-B0   4.1          0.9     75.4           Google Drive, Tsinghua Cloud, Hugging Face 🤗
VAN-B1   13.9         2.5     81.1           Google Drive, Tsinghua Cloud, Hugging Face 🤗
VAN-B2   26.6         5.0     82.8           Google Drive, Tsinghua Cloud, Hugging Face 🤗
VAN-B3   44.8         9.0     83.9           Google Drive, Tsinghua Cloud, Hugging Face 🤗
VAN-B4   TODO         TODO    TODO           TODO
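
Once downloaded, a checkpoint can be loaded along these lines (a sketch: it assumes the repo's model file registers the VAN variants with timm, that VAN-B0 corresponds to van_tiny in the training scripts below, and that the filename matches your download):

import timm
import torch
import van  # the repo's model definitions; importing registers them with timm

model = timm.create_model('van_tiny', pretrained=False)       # VAN-B0
ckpt = torch.load('van_tiny_754.pth.tar', map_location='cpu')
model.load_state_dict(ckpt.get('state_dict', ckpt))  # timm nests weights under 'state_dict'
model.eval()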

3. Requirements

1. PyTorch >= 1.7
2. timm == 0.4.12

4. Train

We use 8 GPUs for training by default. Run the following command (it is also provided in train.sh):

MODEL=van_tiny # van_{tiny, small, base, large}
DROP_PATH=0.1 # drop path rates [0.1, 0.1, 0.1, 0.2] for [tiny, small, base, large]
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 bash distributed_train.sh 8 /path/to/imagenet \
	  --model $MODEL -b 128 --lr 1e-3 --drop-path $DROP_PATH

5. Validate

Run the following command (it is also provided in eval.sh):

MODEL=van_tiny # van_{tiny, small, base, large}
python3 validate.py /path/to/imagenet --model $MODEL \
  --checkpoint /path/to/model -b 128

6. Acknowledgment

Our implementation is mainly based on pytorch-image-models and PoolFormer. Thanks to their authors.

LICENSE

This repo is under the Apache-2.0 license. For commercial use, please contact the authors.

van-classification's People

Contributors

ak391, ceshine, menghaoguo, shkarupa-alex


van-classification's Issues

Unstable as backbone for semantic segmentation

I've tried to use VAN-Large as the backbone for binary semantic segmentation and found it very unstable.

Model: UPerNet, previously well tested with ResNet-50 and Swin-Base.
Features: same as in https://github.com/Visual-Attention-Network/VAN-Segmentation

Just switching the backbone to VAN-Large fails after 5 of 7 epochs: the model generates NaN outputs.
At the same time, F1 and accuracy grow over epochs 1-5, so this is not divergence.

Right now I have no time to dive deeper to find the NaNs' source, so this is just feedback.
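
A generic way to localize such NaNs, for anyone hitting the same problem, is PyTorch's anomaly detection (a minimal sketch with a stand-in model, not the repo's code):

import torch
import torch.nn as nn

torch.autograd.set_detect_anomaly(True)   # slower; enable only while debugging

model = nn.Linear(8, 2)                   # stand-in for the segmentation network
x, target = torch.randn(4, 8), torch.randint(0, 2, (4,))
loss = nn.functional.cross_entropy(model(x), target)
assert torch.isfinite(loss), 'loss became non-finite'
loss.backward()   # anomaly mode raises at the op that produced NaN/Inf gradients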

code for visualization

Thanks for your great work! I noticed that the visualization results of Grad-CAM are wonderful.
I would appreciate it if you could release the code!

Attention vs Add in LKA

In Table 3, changing attention (mul) to add reduces VAN's performance from 75.4 to 74.6. I think this is really huge. However, in the ablation study, you stated that "Besides, replacing attention with adding operation is also not achieving a lower accuracy". Is it okay to say it like that, given that the performance drop is 0.8?

Can't we treat add as a type of attention function? In "Attention Mechanisms in Computer Vision: A Survey", we have the formula:
image
Can't I treat the function f here as an addition operation?
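
Schematically, the two Table 3 variants under discussion differ only in the combination operator (a toy sketch, with attn standing for the LKA output):

import torch

u, attn = torch.randn(2, 8, 4, 4), torch.randn(2, 8, 4, 4)
out_attention = u * attn  # "attention": the input is re-weighted element-wise
out_add = u + attn        # ablation: plain addition, no re-weighting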

how to understand large kernel convolution decomposition?

image

Thank you so much for this enlightening and inspiring work. I don't quite understand why LKC (i.e., large kernel convolution) = DW-Conv + DW-D-Conv + 1×1 Conv.

My current understanding is summarized in the following table:

image

So according to my understanding, what you want to express in Fig. 2 is: DW-Conv, DW-D-Conv and 1×1 Conv each have part of the LKC's properties, and have lower computational complexity. That is to say, the plus and equal signs in Fig. 2 denote the addition of properties.

Is my understanding correct? If not, please give a more intuitive explanation. Thank you very much!

found reduplicated shortcut connections for the same attention module

Firstly, many thanks for this great work. However, I found something weird in the code, and I don't know whether it is intentional.

I find that SpatialAttention already has a shortcut connection in its forward function, but I also find another shortcut connection in Block's forward function. I think the shortcut is duplicated for the same attention module; it might be better to remove the first residual connection.

Please point out any possible mistakes in my comments, thanks.
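
For readers of this issue, a simplified sketch of the two residual sites in question (hypothetical sizes; layer scale, norms and drop-path are omitted, and a 5×5 depth-wise conv stands in for the full LKA):

import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.proj_1 = nn.Conv2d(dim, dim, 1)
        self.gate = nn.Conv2d(dim, dim, 5, padding=2, groups=dim)  # stand-in for LKA
        self.proj_2 = nn.Conv2d(dim, dim, 1)

    def forward(self, x):
        shortcut = x.clone()
        x = self.proj_2(self.gate(nn.functional.gelu(self.proj_1(x))))
        return x + shortcut          # first residual, inside the attention module

class Block(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.attn = SpatialAttention(dim)

    def forward(self, x):
        return x + self.attn(x)      # second residual, around the attention module

print(Block()(torch.randn(1, 64, 14, 14)).shape)  # torch.Size([1, 64, 14, 14])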

Why not use SyncBN

Thanks for your amazing work! It converges fast and gets amazing accuracy on ImageNet.
In your pth file, I found args.sync_bn is False.
In your train.py, I found that the model broadcasts BN buffers before each forward pass instead of using SyncBN.
So, I have a small question, why don't you use SyncBN?
Thanks very much!
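
For reference, the standard PyTorch route to SyncBN (which, per this issue, the repo does not take) is convert_sync_batchnorm; a minimal sketch with a stand-in model:

import torch.nn as nn

model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8))  # stand-in model
model = nn.SyncBatchNorm.convert_sync_batchnorm(model)        # BN -> SyncBN for DDP
print(model)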

Issue about training

I have tried to train VAN-Classification on ImageNet as the README does, but I find the top-1 accuracy converges to 56% instead of 75.4%. (I trained on 1 GPU instead of 8; could this influence the result?)
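
One plausible factor, inferred from the training command in section 4 rather than from an official reply: the default lr of 1e-3 is tuned for a global batch of 8 GPUs × 128 = 1024, so a single-GPU run at batch 128 would call for a linearly scaled lr:

# Linear learning-rate scaling, assuming the README's 8 x 128 = 1024 reference batch.
base_lr, base_batch = 1e-3, 8 * 128
my_batch = 128                          # single-GPU batch from the same command
print(base_lr * my_batch / base_batch)  # 0.000125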

Why Sigmoid isn't used in the LKA module?

First, thanks for the great and helpful work!
I wonder why Sigmoid isn't used in the LKA module which can be written as:
image
Since in previous works Sigmoid is typically used as the last part of the attention module, as in the SE block [1] or CBAM [2]:
image
image
Are there any ablation studies on this change? Was Sigmoid removed because it harms performance?
To the best of my knowledge, I cannot come up with a reason to remove Sigmoid other than a slight reduction in computational cost. I would really appreciate it if you could answer my question!

[1] Hu J, Shen L, Sun G. Squeeze-and-Excitation Networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018: 7132-7141.
[2] Woo S, Park J, Lee J-Y, et al. CBAM: Convolutional Block Attention Module. In: Proceedings of the European Conference on Computer Vision (ECCV), 2018: 3-19.
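
Schematically, the question is whether the gating weights pass through a squashing function before the multiplication (a toy sketch, with attn standing for the LKA output):

import torch

u, attn = torch.randn(2, 8, 4, 4), torch.randn(2, 8, 4, 4)
van_style = u * attn                # LKA: unnormalized gating weights
se_style = u * torch.sigmoid(attn)  # SE/CBAM style: weights squashed into (0, 1)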

Why use .clone() for shortcut connection?

The forward functions for LKA and Attention both use u = x.clone(), apparently to detach u from the computation graph.
Is this a new trick? What advantages do you observe?
Thanks.
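
As a general PyTorch note (not the authors' stated rationale): .clone() does not detach a tensor from the autograd graph; it makes a copy that is safe against later in-place modification, unlike a plain alias:

import torch

x = torch.zeros(3)
alias, copy = x, x.clone()
x.add_(1.0)     # in-place update of x
print(alias)    # tensor([1., 1., 1.]) - the alias follows x
print(copy)     # tensor([0., 0., 0.]) - the clone kept the old values
# Detaching from the graph would instead be x.detach().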

Question about semantic segmentation

Thanks for publishing such excellent work! It would be perfect if you could provide the semantic segmentation version. I really appreciate your help!

Loading pretrained model error

Hello,
Thanks for your great work.
When I try to run validate.py using the checkpoint "van_small_811.pth.tar" downloaded from Google Drive, I get the following error:

Error(s) in loading state_dict for DPN:
	Missing key(s) in state_dict: "features.conv1_1.conv.weight", "features.conv1_1.bn.weight", "features.conv1_1.bn.bias", ..., "features.conv5_bn_ac.bn.running_var", "classifier.weight", "classifier.bias" (several hundred DPN keys, truncated here).
	Unexpected key(s) in state_dict: "patch_embed1.proj.weight", "patch_embed1.proj.bias", "patch_embed1.norm.weight", ..., "norm4.weight", "norm4.bias", "head.weight", "head.bias" (the full set of VAN keys, truncated here).

Question related to permutation operation used in the model

Hi, it looks like all the dimension permutation operations in PatchEmbed and Block are unnecessary, making the whole network a convolutional neural network rather than a transformer. Is that true? I would like to hear your opinion on this argument. Thanks :)

question on the number of parameter

I read your paper and like it, but I have one question.
In equation (3), there is a (2d-1)^2 term.
Since it is a depth-wise convolution, I think this should be (2d-1)^2 * C.
Why is the number of parameters (2d-1)^2? Am I misunderstanding the LKA structure?
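
For orientation, a back-of-the-envelope parameter count under the paper's decomposition (ignoring biases; a reconstruction, not a quote of equation (3)), with C channels, kernel size K and dilation d, shows each depth-wise term does carry a factor of C:

params(LKA) = C·(2d−1)²  [DW-Conv]  +  C·⌈K/d⌉²  [DW-D-Conv]  +  C²  [1×1 Conv]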

Code for detection

Thanks for your great work!
It would be so kind of you to release the code for VAN detection. :)

Shortcut in the Attention module's code

Hello, authors!
I noticed that the Attention module in the code contains a shortcut, used as a residual connection, that is not annotated in the paper.
May I ask: is this connection necessary?

configuration for pre-trained models

Thank you for releasing this code and pre-trained models.

Could you verify what the data processing configuration was for your pre-trained models? Is it the following, which I get when running your example training script?

Data processing configuration for current model + dataset:
        input_size: (3, 224, 224)
        interpolation: bicubic
        mean: (0.5, 0.5, 0.5)
        std: (0.5, 0.5, 0.5)
        crop_pct: 0.9

OverlapPatchEmbed

For classification, can you tell me the total GPU memory usage for the base version? Thanks!

Port to TF/Keras

Can someone help with porting the models and weights from PyTorch to Keras?

Thanks in advance!

Yusuf

Failed to load van_base_828.pth.tar

Env:

torch == 1.10.1
timm == 0.4.12

Script:

ckpt_file = 'ckpt/van_base_828.pth.tar'
ckpt = torch.load(ckpt_file, map_location="cpu")

Error message:

RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory

Since torch == 1.10.1 satisfies the requirement (torch >= 1.7), and we need a recent PyTorch version for other parts of our project, can anyone help? Thanks!
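
A general note, not from the repo: this error usually indicates a truncated or corrupted file rather than a version problem, since torch.save has produced a zip container since PyTorch 1.6; a quick sanity check:

import zipfile

path = 'ckpt/van_base_828.pth.tar'
# A truncated download loses the zip central directory, which is exactly
# what the error message reports.
print(zipfile.is_zipfile(path))  # False strongly suggests an incomplete file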

acc not match(72.03 vs 75.4)

It seems that the released model is not the best one.
python validate.py "MYDATA/ImageNet" --model "van_tiny" --checkpoint "weights/van_tiny_754.pth.tar" -b 200
The result is as follows: Acc@1 72.032 (27.968), Acc@5 91.126 (8.874).
image

I checked the state_dict of the checkpoint and found that the EMA weights do not exist; I think the EMA version's accuracy would match the paper.
image

So, will you check it?
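
For anyone checking the same thing: timm's checkpoint saver stores EMA weights under 'state_dict_ema' when model EMA is enabled, so the keys can be inspected directly (path as in the command above):

import torch

ckpt = torch.load('weights/van_tiny_754.pth.tar', map_location='cpu')
print(ckpt.keys())  # look for 'state_dict_ema' alongside 'state_dict'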
