
chrischen1023 / biformer


This project forked from rayleizhu/biformer


[CVPR 2023] Official code release of our paper "BiFormer: Vision Transformer with Bi-Level Routing Attention"

Home Page: https://arxiv.org/abs/2303.08810

License: MIT License

Shell 0.15% Python 99.85%

biformer's Introduction

Official PyTorch implementation of BiFormer, from the following paper:

BiFormer: Vision Transformer with Bi-Level Routing Attention. CVPR 2023.
Lei Zhu, Xinjiang Wang, Zhanghan Ke, Wayne Zhang, and Rynson Lau


News

  • 2023-04-11: Object detection code is released. Thanks to a bug fix, it achieves significantly better results than those reported in the paper.

  • 2023-03-24: To improve memory and computation efficiency, we are working on optimizing BRA with CUDA. Please stay tuned.

    • Collaborations and contributions are welcome, especially if you are an expert in CUDA/cutlass. There is a chance to co-author a paper.
  • 2023-03-24: For better readability, BRA and BiFormer-STL have been refactored. See ops/bra_nchw.py and models/biformer_stl_nchw.py. We still keep the legacy (and slightly messy) implementation for compatibility with previously released checkpoints.

Results and Pre-trained Models

ImageNet-1K trained models

name               resolution  acc@1  #params  FLOPs  model  log  tensorboard log*
BiFormer-T         224x224     81.4   13.1 M   2.2 G  model  log  -
BiFormer-S         224x224     83.8   25.5 M   4.5 G  model  log  tensorboard.dev
BiFormer-B         224x224     84.3   56.8 M   9.8 G  model  log  -
BiFormer-STL       224x224     82.7   28.4 M   4.6 G  model  log  -
BiFormer-STL-nchw  224x224     82.7   28.4 M   4.6 G  model  log  tensorboard.dev

* : reproduced after the acceptance of our paper.

Here, BiFormer-STL (Swin-Tiny-Layout) is the model used in our ablation study. We hope it provides a good starting point for developing your own awesome attention mechanisms.

All files can be accessed from OneDrive.

Installation

Please check INSTALL.md for installation instructions.

Evaluation

We ran the evaluation on a Slurm cluster using the command below:

python hydra_main.py \
    data_path=./data/in1k input_size=224  batch_size=128 dist_eval=true \
    +slurm=${CLUSTER_ID} slurm.nodes=1 slurm.ngpus=8 \
    eval=true load_release=true model='biformer_small'

To test on a local machine, you may try:

python -m torch.distributed.launch --nproc_per_node=8 main.py \
  --data_path ./data/in1k --input_size 224 --batch_size 128 --dist_eval \
  --eval --load_release --model biformer_small

This should give:

* Acc@1 83.754 Acc@5 96.638 loss 0.869
Accuracy of the network on the 50000 test images: 83.8%

Note: When load_release=true is set, the released checkpoints are downloaded automatically, so you do not need to download them manually in advance.
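If you want to check an evaluation run against the results table programmatically, the final metrics line shown above can be parsed with a few lines of Python. This is only a minimal sketch: the helper name `parse_eval_line`, the hard-coded log line, and the 0.1-point tolerance are illustrative assumptions and are not part of this repository.

```python
import re

# Final metrics line as printed by the evaluation command above.
LOG_LINE = "* Acc@1 83.754 Acc@5 96.638 loss 0.869"


def parse_eval_line(line):
    """Extract top-1 accuracy, top-5 accuracy, and loss from a metrics line."""
    pattern = r"Acc@1\s+([\d.]+)\s+Acc@5\s+([\d.]+)\s+loss\s+([\d.]+)"
    match = re.search(pattern, line)
    if match is None:
        raise ValueError(f"no metrics found in line: {line!r}")
    acc1, acc5, loss = (float(group) for group in match.groups())
    return {"acc1": acc1, "acc5": acc5, "loss": loss}


metrics = parse_eval_line(LOG_LINE)

# acc@1 listed for BiFormer-S in the results table.
reported_acc1 = 83.8

# Hypothetical tolerance: the table rounds to one decimal place.
matches_table = abs(metrics["acc1"] - reported_acc1) < 0.1
print(metrics, matches_table)
```

The regular expression only targets the summary line format shown above; other log lines would need their own patterns.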

Training

To launch training on a Slurm cluster, use the command below:

python hydra_main.py \
    data_path=./data/in1k input_size=224  batch_size=128 dist_eval=true \
    +slurm=${CLUSTER_ID} slurm.nodes=1 slurm.ngpus=8 \
    model='biformer_small'  drop_path=0.15 lr=5e-4

Note: Our codebase automatically generates an output directory for experiment logs and checkpoints, named according to the passed arguments. For example, the command above will produce an output directory like:

$ tree -L 3 outputs/ 
outputs/
└── cls
    └── batch_size.128-drop_path.0.15-input_size.224-lr.5e-4-model.biformer_small-slurm.ngpus.8-slurm.nodes.1
        └── 20230307-21:33:26
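The directory name in the listing above looks like the command-line overrides sorted by key, with "." between key and value and "-" between items. The sketch below reproduces that pattern in plain Python. It is inferred from the listing only: the function `override_dirname`, the placeholder cluster name `my_cluster`, and the set of excluded keys are assumptions, and the actual codebase may build the name differently (for example through Hydra's own override-dirname support).

```python
def override_dirname(overrides, exclude_keys=(), kv_sep=".", item_sep="-"):
    """Build a directory name from `key=value` command-line overrides."""
    pairs = []
    for item in overrides:
        key, sep, value = item.partition("=")
        key = key.lstrip("+")  # drop the "+" prefix used to append a new key
        if not sep or key in exclude_keys:
            continue  # skip malformed items and excluded keys
        pairs.append((key, value))
    pairs.sort()  # alphabetical by key, as in the listing above
    return item_sep.join(f"{key}{kv_sep}{value}" for key, value in pairs)


# Overrides from the training command, as the program would receive them
# after the shell has removed the quotes around the model name.
overrides = [
    "data_path=./data/in1k", "input_size=224", "batch_size=128", "dist_eval=true",
    "+slurm=my_cluster",  # placeholder for ${CLUSTER_ID}
    "slurm.nodes=1", "slurm.ngpus=8",
    "model=biformer_small", "drop_path=0.15", "lr=5e-4",
]

# Assumed exclusions: keys that do not appear in the listed directory name.
name = override_dirname(overrides, exclude_keys={"data_path", "dist_eval", "slurm"})
print(name)
```

Encoding the arguments in the path means runs with different hyperparameters land in different directories, while the timestamp subdirectory separates repeated runs of the same configuration.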

Acknowledgement

This repository is built using the timm library and the ConvNeXt and UniFormer repositories.

License

This project is released under the MIT license. Please see the LICENSE file for more information.

Citation

If you find this repository helpful, please consider citing:

@InProceedings{zhu2023biformer,
  author    = {Lei Zhu and Xinjiang Wang and Zhanghan Ke and Wayne Zhang and Rynson Lau},
  title     = {BiFormer: Vision Transformer with Bi-Level Routing Attention},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2023},
}

TODOs

  • Add camera-ready paper link
  • IN1k standard training code, log, and pretrained checkpoints
  • IN1k token-labeling code
  • Semantic segmentation code
  • Object detection code
  • Swin-Tiny-Layout (STL) models
  • Refactor BRA and BiFormer code
  • Visualization demo
  • More efficient implementation with Triton. See Triton issue #1279
  • More efficient implementation (fusing gather and attention) with CUDA

