
Code for our CVPR 2019 paper: Selective Kernel Networks; See zhihu:https://zhuanlan.zhihu.com/p/59690223


SKNet: Selective Kernel Networks (paper)

By Xiang Li[1,2], Wenhai Wang[3,2], Xiaolin Hu[4] and Jian Yang[1].

[1] PCALab, Nanjing University of Science and Technology  [2] Momenta  [3] Nanjing University  [4] Tsinghua University

Approach

Figure 1: The Diagram of a Selective Kernel Convolution module.
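The fuse-and-select steps in Figure 1 can be illustrated numerically. The following numpy sketch is our own simplification (the function name, the pre-computed two-branch inputs, and the plain matrices standing in for the FC layers are all assumptions, not the Caffe implementation); it shows how a channel-wise softmax weighs the two kernel branches:

```python
import numpy as np

def sk_select(u1, u2, w_reduce, w_a, w_b):
    """Fuse-and-select step of a Selective Kernel unit (simplified sketch).

    u1, u2:   branch feature maps of shape (C, H, W), e.g. outputs of the
              3x3 and 5x5 convolution branches (assumed precomputed here).
    w_reduce: (d, C) matrix reducing the channel descriptor to d dims.
    w_a, w_b: (C, d) per-branch attention matrices (A and B in the paper).
    """
    # Fuse: element-wise sum, then global average pooling -> descriptor s
    s = (u1 + u2).mean(axis=(1, 2))            # shape (C,)
    # Compact feature z = ReLU(W s); d = max(C // r, L) in the paper
    z = np.maximum(w_reduce @ s, 0.0)          # shape (d,)
    # Select: softmax across the two branches, independently per channel
    logits = np.stack([w_a @ z, w_b @ z])      # shape (2, C)
    e = np.exp(logits - logits.max(axis=0))    # stabilised softmax
    a, b = e / e.sum(axis=0)                   # each (C,); a + b == 1
    # Weighted combination of the branch features
    return a[:, None, None] * u1 + b[:, None, None] * u2
```

Note that when the two attention matrices coincide, each branch receives weight 0.5 and the unit degenerates to a plain average of the branches, which makes the selection mechanism easy to sanity-check.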

Implementation

In this repository, all models are implemented in Caffe.

We use the same data augmentation strategies as SENet.

There are two new layers introduced for efficient training and inference: the Axpy and CuDNNBatchNorm layers.

  • The Axpy layer is already implemented in SENet.
  • The CuDNNBatchNorm layer is mainly borrowed from GENet.
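For orientation, the Axpy layer fuses a per-channel scale and a residual element-wise sum into a single operation, y + a·x. A minimal numpy sketch of the computation it performs (an illustration under that assumption, not the Caffe/CUDA implementation):

```python
import numpy as np

def axpy(a, x, y):
    """Axpy sketch: per-channel scale then residual add, y + a * x.

    a: (C,) attention scalars (e.g. the SE/SK gating output);
    x: (C, H, W) residual-branch feature map; y: (C, H, W) identity path.
    Fusing both steps avoids materialising the intermediate a * x tensor.
    """
    a = np.asarray(a).reshape(-1, 1, 1)   # broadcast scalars over H, W
    return a * x + y
```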

Trained Models

Table 2. Single crop validation error on ImageNet-1k (center 224x224/320x320 crop from resized image with shorter side = 256).

| Model | Top-1 (224x) | Top-1 (320x) | #P | GFLOPs |
|:--|:--:|:--:|:--:|:--:|
| ResNeXt-50 | 22.23 | 21.05 | 25.0M | 4.24 |
| AttentionNeXt-56 | 21.76 | – | 31.9M | 6.32 |
| InceptionV3 | 21.20 | – | 27.1M | 5.73 |
| ResNeXt-50 + BAM | 21.70 | 20.15 | 25.4M | 4.31 |
| ResNeXt-50 + CBAM | 21.40 | 20.38 | 27.7M | 4.25 |
| SENet-50 | 21.12 | 19.71 | 27.7M | 4.25 |
| SKNet-50 | 20.79 | 19.32 | 27.5M | 4.47 |
| ResNeXt-101 | 21.11 | 19.86 | 44.3M | 7.99 |
| Attention-92 | 19.50 | – | 51.3M | 10.43 |
| DPN-92 | 20.70 | 19.30 | 37.7M | 6.50 |
| DPN-98 | 20.20 | 18.90 | 61.6M | 11.70 |
| InceptionV4 | 20.00 | – | 42.0M | 12.31 |
| Inception-ResNetV2 | 19.90 | – | 55.0M | 13.22 |
| ResNeXt-101 + BAM | 20.67 | 19.15 | 44.6M | 8.05 |
| ResNeXt-101 + CBAM | 20.60 | 19.42 | 49.2M | 8.00 |
| SENet-101 | 20.58 | 18.61 | 49.2M | 8.00 |
| SKNet-101 | 20.19 | 18.40 | 48.9M | 8.46 |

Download:

| Model | Caffe model |
|:--|:--|
| SKNet-50 | GoogleDrive |
| SKNet-101 | GoogleDrive |

20190323_Update: The SKNet-101 model was deleted by mistake. We are retraining a model and it will be available in 2-3 days.

20190326_Update: The SKNet-101 model is ready.

Attention weights correspond to object scales in low/middle layers

We examine the selection distributions from a class-wise perspective on the SK_2_3 (low), SK_3_4 (middle) and SK_5_3 (high) layers:

Figure 2: Average mean attention difference (mean attention value of kernel 5x5 minus that of kernel 3x3) on SK units of SKNet-50, for each of 1,000 categories using all validation samples on ImageNet. On low- or middle-level SK units (e.g., SK_2_3, SK_3_4), 5x5 kernels are clearly given more emphasis as the target object becomes larger (1.0x -> 1.5x).

More details of attention distributions on specific images are as follows:

Citation

If you use Selective Kernel Convolution in your research, please cite the paper:

@inproceedings{li2019selective,
  title={Selective Kernel Networks},
  author={Li, Xiang and Wang, Wenhai and Hu, Xiaolin and Yang, Jian},
  booktitle={IEEE Conference on Computer Vision and Pattern Recognition},
  year={2019}
}


sknet's Issues

model

What folder is your model in?

Question: initialize A and B

Can you tell me how the soft attention vectors A/B from the paper are initialized?
(Since I don't use Caffe, it is hard for me to find the part that initializes the matrices.)
Thanks in advance.

Why is SKNet-101's conv3_x/B_fc1 output channel 16?

I find that the output channel of conv3_x/B_fc1 is 16, which is quite confusing. As the SKNet paper mentions, the output channel d of the first fully connected layer (B_fc1) follows equation (4):

d = max(C/r, L),

where C is the number of input channels, r is the reduction factor (r=16) and L denotes the minimal value (L=32). According to this equation, conv3_x/B_fc1's output channel should equal L, which is 32 (d = max(256/16, 32) = max(16, 32) = 32).
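The arithmetic in equation (4) is easy to check directly; a small sketch assuming the C, r and L values quoted in the question (the function name is ours):

```python
def sk_fc1_dim(C, r=16, L=32):
    """d = max(C // r, L): output width of the first FC layer in the
    fuse step, per equation (4) of the SKNet paper (r=16, L=32)."""
    return max(C // r, L)

# For conv3_x with C = 256: max(256 // 16, 32) = max(16, 32) = 32,
# which is why the observed value of 16 looks inconsistent with the paper.
```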

Question about cuDNN batch normalization

Hi, could you please briefly explain what statistics are stored in the cuDNN batch norm layer?
The standard batch norm layer saves the mean, variance and a scale factor, but when I retrieve the parameters from your trained model there are 4 of them. I assume they stand for different factors, but it is hard to understand what they are. Thanks in advance!

Questions about the fuse process

Thanks for your impressive work! After reading your code and paper, I have some questions about the fuse design. In SENet, the self-attention is implemented with global pooling followed by two convolutions that map the channel-wise descriptor back to C×1. In your paper, however, you create two matrices to map the dimension back to C×1. Why? How about using two separate convolutions to map the dimension to C×1? That looks more efficient.

‘DiagonalAffineMap’ does not name a type virtual inline DiagonalAffineMap<Dtype> coord_map()

Hi, implus!

When building Caffe, I ran into the following problem:

make all
PROTOC src/caffe/proto/caffe.proto
CXX .build_release/src/caffe/proto/caffe.pb.cc
CXX src/caffe/layer_factory.cpp
CXX src/caffe/solvers/nesterov_solver.cpp
...
CXX src/caffe/layers/reduction_layer.cpp
CXX src/caffe/layers/inner_product_layer.cpp
CXX src/caffe/layers/cudnn_batch_norm_layer.cpp
In file included from src/caffe/layers/cudnn_batch_norm_layer.cpp:10:0:
./include/caffe/layers/cudnn_batch_norm_layer.hpp:34:18: error: ‘DiagonalAffineMap’ does not name a type
virtual inline DiagonalAffineMap coord_map() {
^
make: *** [.build_release/src/caffe/layers/cudnn_batch_norm_layer.o] Error 1

Would you mind uploading your caffe package?

caffe model preprocess

Does anyone have preprocessing code for the Caffe model? I tried resize and crop, but the results are wrong. What I tested is sknet-resnet50.prototxt.

Problems about cudnn_batch_norm_layer.cu

~/code/SKNet/caffe$ make all
NVCC src/caffe/layers/cudnn_batch_norm_layer.cu
src/caffe/layers/cudnn_batch_norm_layer.cu(29): error: class "caffe::Caffe" has no member "cudnn_handle"

src/caffe/layers/cudnn_batch_norm_layer.cu(46): error: class "caffe::Caffe" has no member "cudnn_handle"

src/caffe/layers/cudnn_batch_norm_layer.cu(106): error: class "caffe::Caffe" has no member "cudnn_handle"

3 errors detected in the compilation of "/tmp/tmpxft_00001f9f_00000000-11_cudnn_batch_norm_layer.compute_61.cpp1.ii".
make: *** [.build_release/cuda/src/caffe/layers/cudnn_batch_norm_layer.o] Error 1

Parameters

Hi. How do you calculate the model parameters and GFLops? Thanks.
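One common convention for convolution layers (counting weights without bias terms, and one multiply-accumulate per output element) can be sketched as below; this is an assumption on our part, not the authors' exact counting script, and published GFLOPs figures differ in whether they count MACs or 2×MACs:

```python
def conv_params(c_in, c_out, k, groups=1):
    """Weight count of a k x k convolution (bias omitted).
    Grouped convolutions divide the input channels among the groups."""
    return c_out * (c_in // groups) * k * k

def conv_macs(c_in, c_out, k, h_out, w_out, groups=1):
    """Multiply-accumulate count: one MAC per weight per output position."""
    return conv_params(c_in, c_out, k, groups) * h_out * w_out

# Example: the ResNet stem, a 7x7 conv from 3 to 64 channels producing a
# 112x112 map, has 64 * 3 * 49 = 9408 weights.
```

Summing these per-layer counts over the network (plus the FC layers) gives the #P and GFLOPs columns, up to the MAC-vs-FLOP convention.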

Question about the performance of the pytorch model

I referenced the code from pppLang and ran two SKNet-50 ImageNet experiments with a PyTorch model; both are about 0.5% lower in accuracy than the results in your paper.
So I want to know: are there any training details I am missing, or is it just a platform difference?
My code is here.

Thanks in advance.
