ToMato: Token Merging at Once

About our code

Vision Transformer (ViT) performs strongly on a wide range of vision tasks by splitting an image into patches and passing them through transformer blocks. However, its large model size and computational cost lead to high inference latency and make it hard to accelerate. To accelerate ViT efficiently, we introduce ToMato (Token Merging at Once), a simple framework that recursively merges tokens at the first transformer block by comparing each token's similarity to its adjacent tokens. Applying ToMato to the DeiT-base model reduces latency by 22.19% while maintaining a high Top-1 accuracy of 80.14%.
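To make the idea of merging adjacent tokens by similarity concrete, here is a minimal plain-Python sketch. It is not the repository's implementation: the function names, the cosine-similarity measure, the `threshold` and `max_rounds` parameters, and the treatment of tokens as a 1D sequence are all assumptions made for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def merge_adjacent(tokens, threshold=0.9, max_rounds=8):
    """Repeatedly average neighbouring tokens whose similarity exceeds threshold.

    tokens: list of (vector, size) pairs, where size counts the patches
    already merged into that token. threshold and max_rounds are
    hypothetical knobs, not values taken from the ToMato code.
    """
    for _ in range(max_rounds):
        merged, i, changed = [], 0, False
        while i < len(tokens):
            if i + 1 < len(tokens) and cosine(tokens[i][0], tokens[i + 1][0]) > threshold:
                (va, sa), (vb, sb) = tokens[i], tokens[i + 1]
                total = sa + sb
                # Size-weighted mean keeps the result an average over all merged patches.
                merged.append(([(x * sa + y * sb) / total for x, y in zip(va, vb)], total))
                i += 2
                changed = True
            else:
                merged.append(tokens[i])
                i += 1
        tokens = merged
        if not changed:  # stop once a full pass merges nothing
            break
    return tokens

# Four toy "patch" tokens: the first two and the last two are near-duplicates.
patches = [([1.0, 0.0], 1), ([0.99, 0.05], 1), ([0.0, 1.0], 1), ([0.02, 1.0], 1)]
result = merge_adjacent(patches)
print(len(result), [size for _, size in result])  # 2 [2, 2]
```

In this toy run the four tokens collapse into two, each covering two patches, so every later block would process half as many tokens. That reduction in sequence length is where the latency saving comes from.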

How to install

Clone the repository to your computer:

git clone https://github.com/Transformer04/ToMato.git

How to test

To evaluate the accuracy of our model, open test_batch.py and change the dataset directory path on line 40 to point to your dataset. Then run test_batch.py:

python test_batch.py
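For readers unfamiliar with the accuracy metrics reported below, the following sketch shows how Top-k accuracy is typically computed from per-class scores. It is an illustrative stand-in and not the contents of test_batch.py; the `topk_accuracy` helper and the toy scores are hypothetical.

```python
def topk_accuracy(scores, labels, k=1):
    """Percentage of samples whose true label is among the k highest scores."""
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        hits += label in ranked[:k]
    return 100.0 * hits / len(labels)

# Toy scores for 4 samples over 3 classes (hypothetical values).
scores = [
    [0.1, 0.7, 0.2],
    [0.5, 0.3, 0.2],
    [0.2, 0.3, 0.5],
    [0.4, 0.35, 0.25],
]
labels = [1, 1, 2, 2]

print(topk_accuracy(scores, labels, k=1))  # 50.0
print(topk_accuracy(scores, labels, k=2))  # 75.0
```

Top-5 accuracy is the same computation with k=5, which requires at least five classes.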

Datasets

Testing and validation were conducted on the ImageNet-mini-1000 dataset, which is available at the following link: https://www.kaggle.com/datasets/ifigotin/imagenetmini-1000

Experiment Results

Here are some expected results when using the timm implementation off-the-shelf on ImageNet-1k val using a V100:

Model     Top-1 acc (%)   Top-5 acc (%)   Latency (s)
DeiT-B    81.41           953             13.2132
ToMe-B    84.57           309             13
OURS-B    85.82           95              7
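The Latency (s) column can be compared across models as a relative reduction against the baseline. The sketch below shows one way to time a callable and compute that percentage. The helper names and placeholder workloads are assumptions for illustration, not the measurement code used for the table.

```python
import time

def mean_latency(fn, warmup=2, runs=5):
    """Average wall-clock seconds per call of fn, after a few warm-up calls."""
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    return (time.perf_counter() - start) / runs

def relative_reduction(baseline_s, accelerated_s):
    """Latency reduction in percent, relative to the baseline."""
    return 100.0 * (baseline_s - accelerated_s) / baseline_s

# Placeholder workloads standing in for a baseline and an accelerated model.
def slow():
    return sum(i * i for i in range(200_000))

def fast():
    return sum(i * i for i in range(100_000))

baseline = mean_latency(slow)
accelerated = mean_latency(fast)
print(f"measured reduction: {relative_reduction(baseline, accelerated):.1f}%")

# Worked example with round numbers: 10 s down to 8 s is a 20% reduction.
print(relative_reduction(10.0, 8.0))  # 20.0
```

When timing a model on a GPU, CUDA kernels run asynchronously, so a call such as torch.cuda.synchronize() is needed before reading the clock. Without it the measured time can understate the real latency.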

Visualization

License and Contributing

This code was implemented with reference to the ToMe code, the official PyTorch implementation of ToMe from the paper Token Merging: Your ViT but Faster, by
Daniel Bolya, Cheng-Yang Fu, Xiaoliang Dai, Peizhao Zhang, Christoph Feichtenhofer, and Judy Hoffman.

For licensing, please refer to the CC-BY-NC 4.0 license. For contributing, see the contributing guidelines and the code of conduct.

@inproceedings{bolya2022tome,
  title={Token Merging: Your {ViT} but Faster},
  author={Bolya, Daniel and Fu, Cheng-Yang and Dai, Xiaoliang and Zhang, Peizhao and Feichtenhofer, Christoph and Hoffman, Judy},
  booktitle={International Conference on Learning Representations},
  year={2023}
}
