dumpmemory / neighborhood-attention-transformer

This project forked from shi-labs/neighborhood-attention-transformer

[Preprint] Neighborhood Attention Transformer, 2022

Home Page: https://arxiv.org/abs/2204.07143

License: MIT License

Neighborhood Attention Transformers

Powerful hierarchical vision transformers based on sliding window attention.

Neighborhood Attention (NA, local attention) was introduced in our original paper, NAT, and runs efficiently with our extension to PyTorch, NATTEN.

We recently introduced a new model, DiNAT, which extends NA by dilating neighborhoods (DiNA, sparse global attention, a.k.a. dilated local attention).
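
As a rough illustration of what dilating a neighborhood means, here is a minimal 1D sketch in plain Python. It assumes an odd kernel size, and it treats dilation as "run NA among the tokens that share the same index modulo the dilation". The helper names `na_window` and `dina_window` are hypothetical and are not part of NATTEN.

```python
def na_window(index, length, kernel_size):
    """Indices of the `kernel_size` nearest neighbors of `index` in a 1D sequence.

    Near the edges the window is shifted inward so it always holds
    exactly `kernel_size` tokens.
    """
    if kernel_size > length:
        raise ValueError("kernel_size must not exceed the sequence length")
    start = min(max(index - kernel_size // 2, 0), length - kernel_size)
    return list(range(start, start + kernel_size))


def dina_window(index, length, kernel_size, dilation):
    """Dilated neighborhood: plain NA restricted to tokens with the same
    residue modulo `dilation` as the query token."""
    group = list(range(index % dilation, length, dilation))
    if kernel_size > len(group):
        raise ValueError("kernel_size * dilation is too large for this length")
    position = index // dilation  # where the query sits inside its group
    return [group[p] for p in na_window(position, len(group), kernel_size)]


print(na_window(5, 12, 3))       # [4, 5, 6]
print(dina_window(5, 12, 3, 2))  # [3, 5, 7]
print(dina_window(11, 12, 3, 2)) # [7, 9, 11]
```

With the same kernel size, the dilated window spans a wider stretch of the sequence while attending to the same number of tokens.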

Combining NA and DiNA preserves locality, maintains translational equivariance, expands the receptive field exponentially, and captures longer-range inter-dependencies, leading to significant performance boosts in downstream vision tasks, as in StyleNAT for image generation.
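
The receptive field claim can be checked with simple arithmetic. The sketch below assumes a 1D interior token (no edge effects) and uses the same counting as stacked dilated convolutions: each layer with kernel size k and dilation d widens the receptive field by (k - 1) * d. The function name and the dilation schedule 1, k, k^2 are illustrative assumptions, not necessarily the schedule used by the released models.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field width (in tokens) after stacking one attention
    layer per entry in `dilations`, for a token far from the edges."""
    width = 1
    for dilation in dilations:
        width += (kernel_size - 1) * dilation
    return width


# Three NA layers (dilation 1 everywhere): growth is linear in depth.
print(receptive_field(3, [1, 1, 1]))  # 7

# Three layers with dilations 1, 3, 9: growth is exponential in depth (3 ** 3).
print(receptive_field(3, [1, 3, 9]))  # 27
```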

News

March 25, 2023

  • Neighborhood Attention Transformer was accepted to CVPR 2023!

November 18, 2022

  • NAT and DiNAT are now available through HuggingFace's transformers.
    • NAT and DiNAT classification models are also available on the HuggingFace Model Hub: NAT | DiNAT

November 11, 2022

October 8, 2022

  • NATTEN is now available as a pip package!
    • You can now install NATTEN with pre-compiled wheels, and start using it in seconds.
    • NATTEN will be maintained and developed as a separate project to support broader usage of sliding window attention, even beyond computer vision.

September 29, 2022

Dilated Neighborhood Attention 🔥

A new hierarchical vision transformer based on Neighborhood Attention (local attention) and Dilated Neighborhood Attention (sparse global attention) that delivers a significant performance boost in downstream tasks.

Check out the DiNAT README.

Neighborhood Attention Transformer

Our original paper, Neighborhood Attention Transformer (NAT), introduced the first efficient sliding-window local attention.

How Neighborhood Attention works

Neighborhood Attention localizes the query token's (red) receptive field to its nearest neighboring tokens in the key-value pair (green). This is equivalent to dot-product self-attention when the neighborhood size is identical to the image dimensions. Note that tokens near the edges are special cases: their neighborhoods cannot be centered on them.

[Animation: Neighborhood Attention visualization]
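
The description above can be sketched in a few lines of plain Python. This is a simplified 1D, single-head reference, not the NATTEN implementation: it omits positional biases and projections, and the function names are hypothetical. Each query attends only to its `kernel_size` nearest tokens, with the window shifted inward at the edges.

```python
import math


def neighborhood(index, length, kernel_size):
    """Range of the `kernel_size` nearest tokens to `index` (clamped at edges)."""
    start = min(max(index - kernel_size // 2, 0), length - kernel_size)
    return range(start, start + kernel_size)


def softmax(scores):
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def neighborhood_attention(queries, keys, values, kernel_size):
    """Scaled dot-product attention where token i only sees its neighborhood."""
    length, dim = len(queries), len(queries[0])
    scale = 1.0 / math.sqrt(dim)
    outputs = []
    for i, query in enumerate(queries):
        window = neighborhood(i, length, kernel_size)
        weights = softmax([dot(query, keys[j]) * scale for j in window])
        outputs.append([
            sum(w * values[j][c] for w, j in zip(weights, window))
            for c in range(len(values[0]))
        ])
    return outputs


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -1.0], [0.0, 0.0]]
local_out = neighborhood_attention(tokens, tokens, tokens, kernel_size=3)
# With kernel_size equal to the sequence length, every window covers all
# tokens, so this reduces to ordinary dot-product self-attention.
global_out = neighborhood_attention(tokens, tokens, tokens, kernel_size=5)
```

In this sketch the first and last tokens attend to windows [0, 1, 2] and [2, 3, 4] respectively, which is the edge behavior mentioned above.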

Citation

@inproceedings{hassani2023neighborhood,
	title        = {Neighborhood Attention Transformer},
	author       = {Ali Hassani and Steven Walton and Jiachen Li and Shen Li and Humphrey Shi},
	booktitle    = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
	month        = {June},
	year         = {2023},
	pages        = {6185-6194}
}
@article{hassani2022dilated,
	title        = {Dilated Neighborhood Attention Transformer},
	author       = {Ali Hassani and Humphrey Shi},
	year         = 2022,
	url          = {https://arxiv.org/abs/2209.15001},
	eprint       = {2209.15001},
	archiveprefix = {arXiv},
	primaryclass = {cs.CV}
}
@article{walton2022stylenat,
	title        = {StyleNAT: Giving Each Head a New Perspective},
	author       = {Steven Walton and Ali Hassani and Xingqian Xu and Zhangyang Wang and Humphrey Shi},
	year         = 2022,
	url          = {https://arxiv.org/abs/2211.05770},
	eprint       = {2211.05770},
	archiveprefix = {arXiv},
	primaryclass = {cs.CV}
}
