
Vision Transformer Attention Mechanism Benchmark

This repository, maintained by woodminus, provides a comprehensive benchmark of the attention mechanisms used in Vision Transformers. It offers re-implementations of each mechanism and reports their parameter counts, FLOPs, and CPU/GPU throughput.

Requirements

  • PyTorch 1.8+
  • timm
  • ninja
  • einops
  • fvcore
  • matplotlib
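
All of these are available from PyPI; a typical install looks like the following (package names assumed from the list above, with torch as the PyPI name for PyTorch):

pip install torch timm ninja einops fvcore matplotlib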

Testing Environment

  • NVIDIA RTX 3090
  • Intel® Core™ i9-10900X CPU @ 3.70GHz
  • 32 GB memory
  • Ubuntu 22.04
  • PyTorch 1.8.1 + CUDA 11.1

Settings

  • input: 14 x 14 = 196 tokens (the 1/16 scale feature map of a standard 224 x 224 image in ImageNet-1K training)
  • batch size for speed testing (images/s): 64
  • embedding dimension: 768
  • number of heads: 12
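
For reference, a standard ViT attention layer under these settings looks as follows. This is a minimal sketch using timm's generic Attention module as a stand-in, not one of the benchmarked re-implementations:

import torch
from timm.models.vision_transformer import Attention

# Benchmark settings from the list above
B, N, C, heads = 64, 196, 768, 12  # batch, tokens (14 x 14), embed dim, heads

attn = Attention(dim=C, num_heads=heads)
x = torch.randn(B, N, C)  # flattened 1/16 scale feature map
out = attn(x)             # shape (64, 196, 768)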

Testing

For example, to test HiLo attention:

cd attentions/
python hilo.py

By default, the script benchmarks the model on both CPU and GPU, and FLOPs are measured with fvcore. Edit the source file if you need a different configuration.

Outputs:

Number of Params: 2.2 M
FLOPs = 298.3 M
throughput averaged with 30 times
batch_size 64 throughput on CPU 1029
throughput averaged with 30 times
batch_size 64 throughput on GPU 5104
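
The measurement behind numbers like these can be sketched roughly as follows; the actual scripts in attentions/ may differ, and timm's generic Attention again stands in for a specific mechanism:

import time
import torch
from fvcore.nn import FlopCountAnalysis
from timm.models.vision_transformer import Attention  # stand-in module

model = Attention(dim=768, num_heads=12).eval()

# FLOPs for a single image (batch size 1), reported in millions
flops = FlopCountAnalysis(model, torch.randn(1, 196, 768)).total()
print(f"FLOPs = {flops / 1e6:.1f} M")

def throughput(model, x, runs=30):
    # images/s, averaged over `runs` forward passes after a short warm-up
    with torch.no_grad():
        for _ in range(10):
            model(x)
        if x.is_cuda:
            torch.cuda.synchronize()
        start = time.time()
        for _ in range(runs):
            model(x)
        if x.is_cuda:
            torch.cuda.synchronize()
    return runs * x.shape[0] / (time.time() - start)

x = torch.randn(64, 196, 768)  # batch size 64, as in the settings above
print(f"batch_size 64 throughput on CPU {throughput(model, x):.0f}")
if torch.cuda.is_available():
    model, x = model.cuda(), x.cuda()
    print(f"batch_size 64 throughput on GPU {throughput(model, x):.0f}")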

Supported Attentions

  • Numerous attention mechanisms are supported, each listed with links to its paper and original code.

Single Attention Layer Benchmark

| Name | Params (M) | FLOPs (M) | CPU Speed (images/s) | GPU Speed (images/s) | Demo |

Each supported attention mechanism is benchmarked on parameter count, FLOPs, and CPU/GPU throughput, alongside a demo link.

Note: Each method has its own hyperparameters. For a fair comparison on 1/16 scale feature maps, all methods in the above table adopt their default 1/16 scale settings, as released in their official code repositories. For example, on 1/16 scale feature maps, HiLo in LITv2 adopts a window size of 2 and an alpha of 0.9. Future work will cover more scales and memory benchmarking.
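
For instance, HiLo with these 1/16 scale defaults can be instantiated as below, assuming the HiLo class in attentions/hilo.py keeps the LITv2 interface, whose forward pass also takes the feature map height and width (check the source for the exact signature):

import torch
from hilo import HiLo  # run from inside attentions/

# LITv2 defaults for 1/16 scale feature maps: window size 2, alpha 0.9
model = HiLo(dim=768, num_heads=12, window_size=2, alpha=0.9)
x = torch.randn(64, 196, 768)  # batch, tokens, dim
out = model(x, 14, 14)         # 14 x 14 feature map height and width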

License

This repository is released under the Apache 2.0 license as found in the LICENSE file.
