This project forked from tusen-ai/sst

Paper and Codes for “Embracing Single Stride 3D Object Detector with Sparse Transformer” (CVPR 2022)

License: Apache License 2.0

Python 89.15% Dockerfile 0.03% C++ 6.51% Cuda 4.22% Shell 0.08%

SST: Single-stride Sparse Transformer

This is the official implementation of paper:

Embracing Single Stride 3D Object Detector with Sparse Transformer

Authors: Lue Fan, Ziqi Pang, Tianyuan Zhang, Yu-Xiong Wang, Hang Zhao, Feng Wang, Naiyan Wang, Zhaoxiang Zhang

Paper Link | Explanation in Chinese

NEWS

  • 🔥 SST is accepted at CVPR 2022.
  • Added support for Weighted NMS (CPU version) from RangeDet, improving performance on the vehicle class by ~1 AP. See the Usage section.
  • We refactored the code to provide clearer function prototypes and make it easier to understand. See ./configs/sst_refactor
  • Added support for voxel-based region partition in ./configs/sst_refactor. Users can easily use voxel-based SST by modifying the recover_bev function in the backbone.
  • Waymo Leaderboard results updated in SST_v1

Visualization of a sequence by AB3DMOT tracking:

[Figure: demo-min]

Introduction and Highlights

  • SST is a single-stride network, which maintains the original feature resolution from the beginning to the end of the network. Thanks to the single-stride characteristic, SST achieves strong performance on small object detection (Pedestrian, Cyclist).
  • For simplicity, SST is almost identical to the basic PointPillars in MMDetection3D except for the backbone. Even with this basic setting, SST achieves state-of-the-art performance on Pedestrian and Cyclist and outperforms PointPillars by more than 10 AP at a cost of only 1.5x latency.
  • SST consists of 6 Sparse Regional Attention (SRA) blocks, which operate on the sparse voxel set. SRA is similar to Submanifold Sparse Convolution (SSC) but much more powerful. Its locality and sparsity guarantee efficiency in the single-stride setting.
  • SRA can also be used in many other tasks that process sparse point clouds. Our implementation of SRA relies only on the pure Python APIs in PyTorch, without the engineering effort required by the CUDA implementation of sparse convolution.
  • Better use of rich point observations: thanks to the single stride, SST benefits more from multi-sweep point clouds.
  • Large room for further improvement, for example a second stage, an anchor-free head, IoU scores, and advanced techniques from the many kinds of vision transformers.
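
To make the idea of attention over a sparse voxel set more concrete, here is a minimal sketch in plain Python. It is not the code in this repo: the function names (group_by_region, attend, sparse_regional_attention), the 2D integer voxel coordinates, and the region size of 8 are all assumptions chosen for illustration. The sketch also leaves out what a real transformer block would have, such as learned projections, multiple heads, and positional embeddings. It only shows the two properties described above: attention is computed among non-empty voxels that share a local region, and the output has exactly one feature per input voxel (no downsampling).

```python
import math
from collections import defaultdict


def group_by_region(coords, region_size):
    """Group indices of non-empty voxels by the square region they fall in."""
    regions = defaultdict(list)
    for i, (x, y) in enumerate(coords):
        regions[(x // region_size, y // region_size)].append(i)
    return dict(regions)


def attend(features, members):
    """Single-head dot-product self-attention restricted to one region.

    Queries, keys and values are the raw features (no learned projections).
    """
    dim = len(features[members[0]])
    out = {}
    for q in members:
        scores = [
            sum(a * b for a, b in zip(features[q], features[k])) / math.sqrt(dim)
            for k in members
        ]
        peak = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        out[q] = [
            sum(w * features[k][d] for w, k in zip(weights, members)) / total
            for d in range(dim)
        ]
    return out


def sparse_regional_attention(coords, features, region_size):
    """Apply attention inside every region; empty space is never visited."""
    result = [None] * len(coords)
    for members in group_by_region(coords, region_size).values():
        for idx, vec in attend(features, members).items():
            result[idx] = vec
    return result


# Four non-empty voxels on a large, mostly empty grid (hypothetical data).
coords = [(0, 0), (1, 2), (9, 9), (40, 3)]
features = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [0.5, -0.5]]
out = sparse_regional_attention(coords, features, region_size=8)
```

In this toy input the first two voxels share a region and exchange information, while the other two are alone in their regions and keep their own features. The cost depends on the number of non-empty voxels per region, not on the size of the grid.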

Usage

PyTorch >= 1.9 is recommended for better support of the checkpoint technique (alternatively, with torch < 1.9 you can manually replace the checkpoint interface with the one from torch >= 1.9).
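
If you want to check the recommendation above from a script, a small helper like the following may be useful. It is only a sketch: version_tuple and meets_recommendation are hypothetical names, not part of this repo or of PyTorch, and the helper works on a plain version string so that it runs without torch installed.

```python
import re


def version_tuple(version):
    """Turn a version string such as '1.9.0+cu111' into (1, 9, 0)."""
    numeric = re.match(r"\d+(?:\.\d+)*", version)
    if numeric is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in numeric.group().split("."))


def meets_recommendation(version, minimum=(1, 9)):
    """Compare numerically, so that '1.10' counts as newer than '1.9'."""
    return version_tuple(version) >= minimum


# In a real environment you would pass torch.__version__ instead of a literal.
ok_new = meets_recommendation("1.10.2+cu113")
ok_old = meets_recommendation("1.8.1")
```

Comparing integer tuples rather than raw strings matters here, because the string "1.10" sorts before "1.9".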

Our implementation is based on MMDetection3D, so follow their getting_started guide and then run the script run.sh. You will get a basic SST result after 5~7 hours, depending on your devices.

We only provide the single-stage model here. For our two-stage models, please follow LiDAR-RCNN. Applying other powerful second-stage detectors to our single-stage SST is also a good choice.

We borrow Weighted NMS from RangeDet and observe a ~1 AP improvement on our best Vehicle model. To use it, clone RangeDet and run pip install -v -e . in its root directory. Then refer to config/sst/sst_waymoD5_1x_car_8heads_wnms.py to modify your config and enable Weighted NMS. Note that we only implement the CPU version for now, so it is relatively slow. Do NOT use it on 3-class models, as it leads to a performance drop.
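
For readers unfamiliar with the general idea behind Weighted NMS, the sketch below shows one common variant in plain Python. It is not the RangeDet implementation: it assumes axis-aligned 2D boxes given as (x1, y1, x2, y2) instead of rotated 3D boxes, it assumes positive scores, and the names iou and weighted_nms as well as the 0.5 threshold are illustrative choices. Instead of discarding the boxes that overlap the top-scoring one, it merges them into a score-weighted average box.

```python
def iou(a, b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def weighted_nms(boxes, scores, iou_thr=0.5):
    """Merge each cluster of overlapping boxes into a score-weighted average.

    Returns a list of (merged_box, score) pairs, where the score is that of
    the highest-scoring box in the cluster. Scores are assumed to be positive.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    merged = []
    while order:
        top = order[0]
        # The top box always belongs to its own cluster.
        cluster = [top] + [
            i for i in order[1:] if iou(boxes[top], boxes[i]) >= iou_thr
        ]
        total = sum(scores[i] for i in cluster)
        box = tuple(
            sum(scores[i] * boxes[i][d] for i in cluster) / total for d in range(4)
        )
        merged.append((box, scores[top]))
        order = [i for i in order if i not in cluster]
    return merged


# Two overlapping detections of one object plus one separate detection.
boxes = [(0, 0, 10, 10), (1, 0, 11, 10), (50, 50, 60, 60)]
scores = [0.9, 0.3, 0.8]
merged = weighted_nms(boxes, scores)
```

Here the first two boxes are merged into one box that sits closer to the higher-scoring detection, and the third box is kept on its own.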

Play with your first single-stride model

In ./configs/sst/, we provide a basic config sst_waymoD5_1x_ped_cyc_8heads_3f to show the power of our single-stride network on small object detection (Pedestrian and Cyclist). With this config (only 20% of the training data for 12 epochs), we get very good performance, better than most other published methods (WOD validation split):

          Ped AP/APH    Cyc AP/APH
Level 1   80.51/75.48   70.44/69.43
Level 2   72.18/67.51   67.94/67.00

(Based on PointPillars, single stage, 3 sweeps, 20% training data for 12 epochs, taking ~7 hours on 8 2080Ti GPUs)

Main results

All the results of the single-stage models are reproducible with this repo, so please let us know if you have trouble reproducing them. We also find that replacing a pillar-based conv backbone with SST usually brings some improvement. Discussions are welcome if you cannot obtain satisfactory performance with SST in your projects.

Waymo Leaderboard

Model      #Sweeps  Veh_L1  Ped_L1  Cyc_L1  Veh_L2  Ped_L2  Cyc_L2
SST_TS_3f  3        80.99   83.30   75.69   73.08   76.93   73.22

Please visit the website for detailed results: SST_v1

One stage model (based on PointPillars) on Waymo validation split

Model   #Sweeps  Veh_L1  Ped_L1  Cyc_L1  Veh_L2  Ped_L2  Cyc_L2
SST_1f  1        73.57   80.01   70.72   64.80   71.66   68.01
SST_3f  3        75.16   83.24   75.96   66.52   76.17   73.59

Note that we train the 3 classes together, so the performance above is a little bit lower than that reported in our paper.

TODO

  • Build an SRA block with an API similar to Sparse Convolution for more convenient usage.

Citation

Please consider citing our work as follows if it is helpful.

@article{fan2021embracing,
  title={Embracing Single Stride 3D Object Detector with Sparse Transformer},
  author={Fan, Lue and Pang, Ziqi and Zhang, Tianyuan and Wang, Yu-Xiong and Zhao, Hang and Wang, Feng and Wang, Naiyan and Zhang, Zhaoxiang},
  journal={arXiv preprint arXiv:2112.06375},
  year={2021}
}

Acknowledgments

This project is based on the following codebases.

We thank the authors of CenterPoint for providing their detailed results.
