This project is forked from wofmanaf/rest.

This is an official implementation for "ResT: An Efficient Transformer for Visual Recognition".

License: Apache License 2.0

Python 100.00%

rest's Introduction

ResT

By Qing-Long Zhang and Yu-Bin Yang

[State Key Laboratory for Novel Software Technology at Nanjing University]

This repo is the official implementation of "ResT: An Efficient Transformer for Visual Recognition". It currently includes code and models for the following tasks:

Image Classification: Included in this repo. See get_started.md for a quick start.

Object Detection and Instance Segmentation: Based on detectron2.

ResT was initially described in an arXiv paper and serves as a general-purpose backbone for computer vision. It can handle input images of arbitrary size. In addition, ResT compresses the memory of standard multi-head self-attention (MSA) and models the interaction between attention heads while preserving their diversity.
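
To give a feel for why compressing the attention memory matters, here is a rough back-of-the-envelope sketch. It assumes the compression works by spatially downsampling the keys and values before attention; the function name, the stride of 4 before the first attention block, and the reduction factor of 8 are illustrative assumptions and are not taken from this repo.

```python
def attention_map_entries(height, width, patch_stride, kv_reduction=1):
    """Count the entries in one head's attention map (queries x keys).

    height, width: input image size in pixels.
    patch_stride:  total downsampling before the attention block (assumed value).
    kv_reduction:  per-side downsampling applied to keys/values only
                   (assumed mechanism); 1 corresponds to standard MSA.
    """
    h, w = height // patch_stride, width // patch_stride
    n_queries = h * w
    n_keys = (h // kv_reduction) * (w // kv_reduction)
    return n_queries * n_keys


# A 224x224 image with a hypothetical stride-4 stem gives 56x56 = 3136 tokens.
standard = attention_map_entries(224, 224, patch_stride=4)
compressed = attention_map_entries(224, 224, patch_stride=4, kv_reduction=8)

print(standard)                 # 9834496
print(compressed)               # 153664
print(standard // compressed)   # 64
```

Under these assumptions the attention map shrinks by the square of the reduction factor, and nothing in the calculation depends on a fixed input resolution.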

Main Results on ImageNet with Pretrained Models

ImageNet-1K Pretrained Models

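| name | resolution | acc@1 | acc@5 | #params | FLOPs | FPS | 1K model |
| --- | --- | --- | --- | --- | --- | --- | --- |
| ResT-Lite | 224x224 | 77.2 | 93.7 | 10.5M | 1.4G | 1246 | baidu |
| ResT-Small | 224x224 | 79.6 | 94.9 | 13.7M | 1.9G | 1043 | baidu |
| ResT-Base | 224x224 | 81.6 | 95.7 | 30.3M | 4.3G | 673 | baidu |
| ResT-Large | 224x224 | 83.6 | 96.3 | 51.6M | 7.9G | 429 | baidu |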

Note: The access code for Baidu is rest. Pretrained models are also available on Google Drive.

Main Results on Downstream Tasks

COCO Object Detection (2017 val)

Attention: The results of downstream tasks are very sensitive to the training settings, so a fluctuation of 1.0 is normal.

| Backbone | Method | pretrain | Lr Schd | box mAP | mask mAP | #params | model |
| --- | --- | --- | --- | --- | --- | --- | --- |
| ResT-Small | RetinaNet | ImageNet-1K | 1x | 40.3 | - | 23.4M | baidu |
| ResT-Base | RetinaNet | ImageNet-1K | 1x | 42.0 | - | 40.5M | baidu |
| ResT-Small | Mask R-CNN | ImageNet-1K | 1x | 39.6 | 37.2 | 33.3M | baidu |
| ResT-Base | Mask R-CNN | ImageNet-1K | 1x | 41.6 | 38.7 | 49.8M | baidu |

Note: These are the results with LN (a comparison is shown in the Appendix). To train with ResT backbones, you need to convert the original pretrained weights to the d2 (detectron2) format with convert_to_d2.py. The access code for Baidu is rest.
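
For orientation, the sketch below shows the general shape of a pickled checkpoint that detectron2 can load: a dict with "model", "__author__", and "matching_heuristics" keys. It is not the contents of convert_to_d2.py. The function name, the parameter names, and the placeholder values are hypothetical, and a real conversion would store NumPy arrays loaded from the pretrained weights rather than plain lists.

```python
import io
import pickle


def wrap_for_detectron2(state_dict, author="ResT"):
    """Wrap a {parameter_name: array} mapping in the dict layout that
    detectron2 reads from .pkl checkpoints.

    The author string is an arbitrary label (hypothetical value here).
    """
    return {
        "model": dict(state_dict),
        "__author__": author,
        # Ask detectron2 to match parameter names heuristically on load.
        "matching_heuristics": True,
    }


# Placeholder weights with made-up names; real values would be NumPy arrays.
weights = {"stem.conv.weight": [0.0, 1.0], "stem.conv.bias": [0.5]}

# Serialize to an in-memory buffer instead of a file to keep the sketch self-contained.
buffer = io.BytesIO()
pickle.dump(wrap_for_detectron2(weights), buffer)

restored = pickle.loads(buffer.getvalue())
print(sorted(restored))  # ['__author__', 'matching_heuristics', 'model']
```

For actual training, use the convert_to_d2.py script mentioned above, since it knows how the ResT parameter names map onto the detectron2 backbone.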

Citing ResT

@article{zhql2021ResT,
  title={ResT: An Efficient Transformer for Visual Recognition},
  author={Zhang, Qinglong and Yang, Yubin},
  journal={arXiv preprint arXiv:2105.13677v3},
  year={2021}
}
