wdrink / stts

Official PyTorch implementation of the ECCV 2022 paper: Efficient Video Transformers with Spatial-Temporal Token Selection.

License: MIT License

stts's Introduction

Official PyTorch implementation of STTS, from the following paper:

Efficient Video Transformers with Spatial-Temporal Token Selection, ECCV 2022.

Junke Wang*, Xitong Yang*, Hengduo Li, Li Liu, Zuxuan Wu, Yu-Gang Jiang.

Fudan University, University of Maryland, BirenTech Research


We present STTS, a token selection framework that dynamically selects a few informative tokens in both temporal and spatial dimensions conditioned on input video samples.
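To make the idea concrete, here is a minimal sketch of score-based token selection in plain Python. It is not the repository's implementation: the scores are written by hand (in STTS they would come from a learned scorer network), plain hard top-k is used for the selection step, spatial selection is simplified to picking individual patches, and the names `select_top_k`, `frame_scores`, `patch_scores` and `keep_ratio` are hypothetical.

```python
import math


def select_top_k(scores, keep_ratio):
    """Return the indices of the highest-scoring tokens, in their original order.

    Hypothetical helper for illustration only; not part of the STTS code base.
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    # Keep at least one token, rounding the budget up.
    k = max(1, math.ceil(len(scores) * keep_ratio))
    # Rank token indices by descending score (ties keep the earlier index first).
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    # Restore temporal/spatial order among the survivors.
    return sorted(ranked[:k])


# Temporal selection: one made-up importance score per frame.
frame_scores = [0.10, 0.80, 0.30, 0.90, 0.20, 0.70, 0.40, 0.60]
kept_frames = select_top_k(frame_scores, keep_ratio=0.5)

# Spatial selection: one made-up score per patch inside a kept frame.
patch_scores = [0.05, 0.95, 0.50, 0.65]
kept_patches = select_top_k(patch_scores, keep_ratio=0.5)

print(kept_frames, kept_patches)
```

Because the scores depend on the input video, a different clip would yield a different set of kept frames and patches, which is what "conditioned on input video samples" refers to.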

Model Zoo

MViT with STTS on Kinetics-400

name                 acc@1 (%)  FLOPs (G)  model
MViT-T0_0.9-S4_0.9   78.1       56.4       model
MViT-T0_0.8-S4_0.9   77.9       47.2       model
MViT-T0_0.6-S4_0.9   77.5       38.1       model
MViT-T0_0.5-S4_0.7   76.6       23.3       model
MViT-T0_0.4-S4_0.6   75.6       12.1       model

VideoSwin with STTS on Kinetics-400

name              acc@1 (%)  FLOPs (G)  model
VideoSwin-T0_0.9  81.9       252.5      model
VideoSwin-T0_0.8  81.6       223.4      model
VideoSwin-T0_0.6  81.4       181.4      model
VideoSwin-T0_0.5  81.1       121.6      model
VideoSwin-T0_0.4  80.7       91.4       model
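
The accuracy/compute trade-off in the two tables can be read off with a few lines of arithmetic. The sketch below uses only the numbers listed above and compares every row with the first (least pruned) row of its table; it does not compare against the unpruned backbones, whose figures are not given here. The function name `tradeoff` and the model-name spelling (borrowed from the config file naming) are assumptions made for illustration.

```python
# (name, top-1 accuracy in %, FLOPs) copied from the Model Zoo tables.
MVIT = [
    ("MViT-T0_0.9-S4_0.9", 78.1, 56.4),
    ("MViT-T0_0.8-S4_0.9", 77.9, 47.2),
    ("MViT-T0_0.6-S4_0.9", 77.5, 38.1),
    ("MViT-T0_0.5-S4_0.7", 76.6, 23.3),
    ("MViT-T0_0.4-S4_0.6", 75.6, 12.1),
]
VIDEOSWIN = [
    ("VideoSwin-T0_0.9", 81.9, 252.5),
    ("VideoSwin-T0_0.8", 81.6, 223.4),
    ("VideoSwin-T0_0.6", 81.4, 181.4),
    ("VideoSwin-T0_0.5", 81.1, 121.6),
    ("VideoSwin-T0_0.4", 80.7, 91.4),
]


def tradeoff(rows):
    """For each row after the first, return (name, accuracy drop, % FLOPs saved)
    relative to the first row of the same table."""
    _, base_acc, base_flops = rows[0]
    result = []
    for name, acc, flops in rows[1:]:
        acc_drop = round(base_acc - acc, 1)
        flops_saved = round(100.0 * (1.0 - flops / base_flops), 1)
        result.append((name, acc_drop, flops_saved))
    return result


for table in (MVIT, VIDEOSWIN):
    for name, acc_drop, flops_saved in tradeoff(table):
        print(f"{name}: -{acc_drop} acc@1, {flops_saved}% fewer FLOPs")
```

For instance, the most aggressive MViT setting gives up 2.5 points of top-1 accuracy relative to the first row while using roughly 78.5% fewer FLOPs.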

Installation

Please check the MViT and VideoSwin subdirectories for installation instructions and data preparation.

Training and Evaluation

MViT

For both training and evaluation with MViT as the backbone, run:

cd MViT

python tools/run_net.py --cfg path_to_your_config

For example, to evaluate MViT-T0_0.6-S4_0.9, run:

python tools/run_net.py --cfg configs/Kinetics/t0_0.6_s4_0.9.yaml
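
The config file in this example appears to encode the model name: `t0_0.6` for the temporal setting and `s4_0.9` for the spatial one. The helper below is a small sketch of that naming pattern. It is inferred from this single example, so other configs may not follow it, and the function name `mvit_config_path` and its parameter names are hypothetical.

```python
def mvit_config_path(temporal_index, temporal_ratio, spatial_index, spatial_ratio):
    """Build a config path in the style of the README example.

    Assumption: every MViT config follows the pattern seen in
    configs/Kinetics/t0_0.6_s4_0.9.yaml. Check the configs directory
    before relying on this.
    """
    filename = (
        f"t{temporal_index}_{temporal_ratio}"
        f"_s{spatial_index}_{spatial_ratio}.yaml"
    )
    return f"configs/Kinetics/{filename}"


# Reproduces the path used in the example command.
print(mvit_config_path(0, 0.6, 4, 0.9))
```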

VideoSwin

For training, use:

cd VideoSwin

bash tools/dist_train.sh path_to_your_config $NUM_GPUS --checkpoint path_to_your_checkpoint --validate --test-last

For evaluation, use:

bash tools/dist_test.sh path_to_your_config path_to_your_checkpoint $NUM_GPUS --eval top_k_accuracy

For example, to evaluate VideoSwin-T0_0.9 on a single node with 8 GPUs, run:

cd VideoSwin

bash tools/dist_test.sh configs/Kinetics/t0_0.875.py ./checkpoints/t0_0.875.pth 8 --eval top_k_accuracy
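
When evaluating several checkpoints, it can help to assemble the command from its parts instead of editing the string by hand. The sketch below only mirrors the argument order shown above (config, checkpoint, GPU count, then `--eval`); the function name `videoswin_test_command` is hypothetical, and the command is printed rather than executed.

```python
import shlex


def videoswin_test_command(config, checkpoint, num_gpus, metric="top_k_accuracy"):
    """Return the evaluation command as an argument list.

    The argument order follows the README's dist_test.sh usage line;
    nothing beyond that usage line is assumed about the script.
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    return [
        "bash", "tools/dist_test.sh",
        config, checkpoint, str(num_gpus),
        "--eval", metric,
    ]


cmd = videoswin_test_command(
    "configs/Kinetics/t0_0.875.py", "./checkpoints/t0_0.875.pth", 8
)
# shlex.join quotes arguments where needed, so the output is safe to paste into a shell.
print(shlex.join(cmd))
```

To actually launch the evaluation, the list could be passed to `subprocess.run(cmd, check=True)` from inside the VideoSwin directory.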

License

This project is released under the MIT license. Please see the LICENSE file for more information.

Citation

If you find this repository helpful, please consider citing:

@inproceedings{wang2021efficient,
  title={Efficient video transformers with spatial-temporal token selection},
  author={Wang, Junke and Yang, Xitong and Li, Hengduo and Li, Liu and Wu, Zuxuan and Jiang, Yu-Gang},
  booktitle={ECCV},
  year={2022}
}

stts's Issues

scorer network

Congratulations on the great work. I would like to ask about the scorer network in that file.

Welcome update to OpenMMLab 2.0

I am Vansin, the technical operator of OpenMMLab. In September of last year, we announced the release of OpenMMLab 2.0 at the World Artificial Intelligence Conference in Shanghai. We invite you to upgrade your algorithm library to OpenMMLab 2.0 using MMEngine, which can be used for both research and commercial purposes. If you have any questions, please feel free to join us on the OpenMMLab Discord at https://discord.gg/amFNsyUBvm or add me on WeChat (van-sin) and I will invite you to the OpenMMLab WeChat group.

Here are the branches of the OpenMMLab 2.0 repositories:

Repo              OpenMMLab 1.0 branch  OpenMMLab 2.0 branch
MMEngine          -                     0.x
MMCV              1.x                   2.x
MMDetection       0.x, 1.x, 2.x         3.x
MMAction2         0.x                   1.x
MMClassification  0.x                   1.x
MMSegmentation    0.x                   1.x
MMDetection3D     0.x                   1.x
MMEditing         0.x                   1.x
MMPose            0.x                   1.x
MMDeploy          0.x                   1.x
MMTracking        0.x                   1.x
MMOCR             0.x                   1.x
MMRazor           0.x                   1.x
MMSelfSup         0.x                   1.x
MMRotate          1.x                   1.x
MMYOLO            -                     0.x

Attention: please create a new virtual environment for OpenMMLab 2.0.
