This project is forked from visual-attention-network/van-classification.


License: Apache License 2.0



Visual Attention Network (VAN)

This is a PyTorch implementation of VAN, proposed in our paper "Visual Attention Network".

Comparison

Figure 1: Comparison with different vision backbones on the ImageNet-1K validation set.

Citation:

@article{guo2022visual,
  title={Visual Attention Network},
  author={Guo, Meng-Hao and Lu, Cheng-Ze and Liu, Zheng-Ning and Cheng, Ming-Ming and Hu, Shi-Min},
  journal={arXiv preprint arXiv:2202.09741},
  year={2022}
}

News:

2022.02.22 Released the paper on arXiv and the code on GitHub.

2022.02.25 Supported by Jimm

2022.03.15 Supported by Hugging Face.

2022.04 Supported by PaddleCls.

2022.05 Supported by OpenMMLab.

For more code, please refer to Papers with Code.

2022.07.08 Updated the paper on arXiv (added ImageNet-22K results; SOTA for panoptic segmentation, 58.2 PQ). Segmentation models are available.

For classification models, see here; we are working on them.

Abstract:

While originally designed for natural language processing (NLP) tasks, the self-attention mechanism has recently taken various computer vision areas by storm. However, the 2D nature of images brings three challenges for applying self-attention in computer vision. (1) Treating images as 1D sequences neglects their 2D structures. (2) The quadratic complexity is too expensive for high-resolution images. (3) It only captures spatial adaptability but ignores channel adaptability. In this paper, we propose a novel large kernel attention (LKA) module to enable self-adaptive and long-range correlations in self-attention while avoiding the above issues. We further introduce a novel neural network based on LKA, namely Visual Attention Network (VAN). While extremely simple and efficient, VAN outperforms the state-of-the-art vision transformers (ViTs) and convolutional neural networks (CNNs) by a large margin in extensive experiments, including image classification, object detection, semantic segmentation, instance segmentation, etc.

Decomposition

Figure 2: Decomposition diagram of large-kernel convolution. A standard convolution can be decomposed into three parts: a depth-wise convolution (DW-Conv), a depth-wise dilation convolution (DW-D-Conv) and a 1×1 convolution (1×1 Conv).
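The coverage of the decomposed pair can be checked with a short sketch (pure Python; this is the standard stacked-convolution receptive-field rule, not code from the repo):

```python
def stacked_rf(k_dw: int, k_dil: int, dilation: int) -> int:
    """Receptive field of a k_dw x k_dw depth-wise conv followed by a
    k_dil x k_dil depth-wise dilated conv, both stride 1.
    Each layer widens the receptive field by (kernel_size - 1) * dilation."""
    rf = 1
    rf += (k_dw - 1) * 1          # DW-Conv, dilation 1
    rf += (k_dil - 1) * dilation  # DW-D-Conv
    return rf

# A 5x5 DW-Conv plus a 7x7 DW-D-Conv with dilation 3 spans a 23x23 area,
# enough to cover a large 21x21 kernel at a fraction of the parameter cost.
print(stacked_rf(5, 7, 3))  # 23
```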

LKA

Figure 3: The structure of different modules: (a) the proposed Large Kernel Attention (LKA); (b) a non-attention module; (c) the self-attention module; (d) a stage of our Visual Attention Network (VAN). CFF means convolutional feed-forward network. The difference between (a) and (b) is the element-wise multiplication. It is worth noting that (c) is designed for 1D sequences.
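The gating in module (a) can be sketched in a few lines (NumPy; `toy_attn` is a hypothetical stand-in for the paper's conv chain, used here only to make the block runnable):

```python
import numpy as np

def lka_gate(x: np.ndarray, attn_fn) -> np.ndarray:
    """LKA output: an attention map computed from the input (in the paper,
    DW-Conv -> DW-D-Conv -> 1x1 Conv) gates the input element-wise.
    This element-wise multiply is what distinguishes module (a) from the
    plain non-attention module (b)."""
    attn = attn_fn(x)   # attention map, same shape as x
    return attn * x     # element-wise multiplication (Hadamard product)

# toy stand-in for the conv chain: normalize so large activations dominate
toy_attn = lambda t: t / (np.abs(t).max() + 1e-6)
x = np.array([[1.0, -2.0], [4.0, 0.5]])
out = lka_gate(x, toy_attn)
```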

Image Classification

1. Data preparation: ImageNet with the following folder structure.

│imagenet/
├──train/
│  ├── n01440764
│  │   ├── n01440764_10026.JPEG
│  │   ├── n01440764_10027.JPEG
│  │   ├── ......
│  ├── ......
├──val/
│  ├── n01440764
│  │   ├── ILSVRC2012_val_00000293.JPEG
│  │   ├── ILSVRC2012_val_00002138.JPEG
│  │   ├── ......
│  ├── ......
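Before launching training, the layout above can be verified with a quick sanity check (a hypothetical helper, not part of the repo):

```python
import os

def check_imagenet_layout(root: str) -> bool:
    """Return True if root contains train/ and val/ directories whose
    immediate children are WordNet-ID class folders (e.g. n01440764)."""
    for split in ("train", "val"):
        split_dir = os.path.join(root, split)
        if not os.path.isdir(split_dir):
            return False
        classes = [d for d in os.listdir(split_dir)
                   if os.path.isdir(os.path.join(split_dir, d))]
        # every class folder should be a synset ID like n01440764
        if not classes or not all(c.startswith("n") for c in classes):
            return False
    return True
```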

2. VAN Models (IN-1K)

Model #Params(M) GFLOPs Top1 Acc(%) Download
VAN-B0 4.1 0.9 75.4 Google Drive, Tsinghua Cloud, Hugging Face 🤗
VAN-B1 13.9 2.5 81.1 Google Drive, Tsinghua Cloud, Hugging Face 🤗
VAN-B2 26.6 5.0 82.8 Google Drive, Tsinghua Cloud, Hugging Face 🤗
VAN-B3 44.8 9.0 83.9 Google Drive, Tsinghua Cloud, Hugging Face 🤗
VAN-B4 TODO TODO TODO TODO

3. Requirements

1. PyTorch >= 1.7
2. timm == 0.4.12

4. Train

We use 8 GPUs for training by default. Run the following command (also provided in train.sh):

MODEL=van_tiny # van_{tiny, small, base, large}
DROP_PATH=0.1 # drop path rates [0.1, 0.1, 0.1, 0.2] for [tiny, small, base, large]
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 bash distributed_train.sh 8 /path/to/imagenet \
	  --model $MODEL -b 128 --lr 1e-3 --drop-path $DROP_PATH
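The model-to-drop-path mapping in the comments above can be captured in a small helper (a hypothetical convenience that mirrors train.sh; not part of the repo):

```python
# drop-path rates from the schedule above: [0.1, 0.1, 0.1, 0.2]
# for [tiny, small, base, large]
DROP_PATH = {"van_tiny": 0.1, "van_small": 0.1,
             "van_base": 0.1, "van_large": 0.2}

def train_command(model: str, data_dir: str, n_gpus: int = 8,
                  batch_size: int = 128, lr: float = 1e-3) -> str:
    """Assemble the distributed_train.sh invocation used in train.sh."""
    dp = DROP_PATH[model]
    return (f"bash distributed_train.sh {n_gpus} {data_dir} "
            f"--model {model} -b {batch_size} --lr {lr} --drop-path {dp}")

print(train_command("van_tiny", "/path/to/imagenet"))
```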

5. Validate

Run the following command (also provided in eval.sh):

MODEL=van_tiny # van_{tiny, small, base, large}
python3 validate.py /path/to/imagenet --model $MODEL \
  --checkpoint /path/to/model -b 128

6. Acknowledgment

Our implementation is mainly based on pytorch-image-models and PoolFormer. Thanks to their authors.

LICENSE

This repo is under the Apache-2.0 license. For commercial use, please contact the authors.
