FocalFormer3D: Focusing on Hard Instance for 3D Object Detection

FocalFormer3D: Focusing on Hard Instance for 3D Object Detection, ICCV 2023.

Yilun Chen, Zhiding Yu, Yukang Chen, Shiyi Lan, Anima Anandkumar, Jiaya Jia, Jose M. Alvarez

[Paper]

False negatives (FN) in 3D object detection, e.g., missed predictions of pedestrians, vehicles, or other obstacles, can lead to potentially dangerous situations in autonomous driving. Despite their severity, false negatives are understudied in many current 3D detection methods. In this work, we propose Hard Instance Probing (HIP), a general pipeline that identifies FN in a multi-stage manner and guides the models to focus on excavating difficult instances. For 3D object detection, we instantiate this method as FocalFormer3D, a simple yet effective detector that excels at excavating difficult objects and improving prediction recall. FocalFormer3D features a multi-stage query generation to discover hard objects and a box-level transformer decoder to efficiently distinguish objects from massive object candidates. Experimental results on the nuScenes and Waymo datasets validate the superior performance of FocalFormer3D. The advantage leads to strong performance on both detection and tracking, in both LiDAR and multi-modal settings. Notably, FocalFormer3D achieves 70.5 mAP and 73.9 NDS on the nuScenes detection benchmark and 72.1 AMOTA on the nuScenes tracking benchmark, ranking 1st on the nuScenes LiDAR leaderboard for both tasks.
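The multi-stage idea behind Hard Instance Probing can be illustrated with a small, self-contained sketch. The code below is a conceptual illustration only, not the repository's implementation; the function name, tensor shapes, and hyperparameters are hypothetical. Each stage selects the top-scoring BEV heatmap peaks as object queries and masks their neighborhoods out, so later stages are pushed toward locations that earlier stages missed, i.e., the hard instances.

import torch

def hard_instance_probing(bev_heatmap, num_stages=3, queries_per_stage=200, radius=2):
    """Conceptual sketch of multi-stage query generation (hypothetical API).

    bev_heatmap: (C, H, W) class heatmap predicted from the BEV feature map.
    Returns (y, x) query locations gathered over all stages.
    """
    C, H, W = bev_heatmap.shape
    accumulated_mask = torch.zeros(H, W, dtype=torch.bool)  # positions already "explained"
    queries = []

    for stage in range(num_stages):
        # Suppress locations covered by earlier stages so this stage
        # focuses on the remaining (hard) instances.
        scores = bev_heatmap.max(dim=0).values.masked_fill(accumulated_mask, float('-inf'))

        topk = torch.topk(scores.flatten(), queries_per_stage)
        ys = torch.div(topk.indices, W, rounding_mode='floor')
        xs = topk.indices % W
        queries.append(torch.stack([ys, xs], dim=1))

        # Mark a small neighborhood around each selected peak as covered.
        for y, x in zip(ys.tolist(), xs.tolist()):
            accumulated_mask[max(0, y - radius):min(H, y + radius + 1),
                             max(0, x - radius):min(W, x + radius + 1)] = True

    # In FocalFormer3D the collected candidates are then refined by a
    # box-level transformer decoder; that step is omitted here.
    return torch.cat(queries, dim=0)

# Toy usage on a random heatmap: 10 classes on a 180x180 BEV grid.
if __name__ == "__main__":
    heatmap = torch.rand(10, 180, 180)
    print(hard_instance_probing(heatmap).shape)  # -> torch.Size([600, 2])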

News

Results

3D Object Detection (nuScenes Detection & Tracking Leaderboard)

Model               Modality   mAP    NDS    AMOTA   Result Files
FocalFormer3D-F     C+L        72.4   74.5   73.9    Detection / Tracking
FocalFormer3D       L          68.7   72.6   71.5    Detection / Tracking
FocalFormer3D-TTA   L          70.5   73.9   72.1    Detection / Tracking

3D Object Detection (on nuScenes validation set)

Model                Modality   mAP    NDS    Checkpoint
FocalFormer3D-F      C+L        70.5   73.1   FocalFormer3D_LC.pth
FocalFormer3D        L          66.4   70.9   FocalFormer3D_L.pth
DeformFormer3D       L          65.5   70.7   DeformFormer3D_L.pth
DeformFormer3D-R50   C          30.0   36.3   DeformFormer3D_C_R50.pth

3D Object Detection (on Waymo validation set)

Since the Waymo Open Dataset (WOD) license does not allow distributing pretrained weights, we only report results for models trained on the full training split and on a 20% (1/5) training split. The official LEVEL_2 mAP/mAPH metrics on the Waymo validation set are reported below.

Model                        Modality   Overall_L2    Veh_L2        Ped_L2        Cyc_L2
FocalFormer3D (1/5 split)    L          68.1 / 65.6   66.4 / 65.9   69.0 / 62.8   69.0 / 67.9
DeformFormer3D (1/5 split)   L          67.2 / 64.5   65.8 / 65.3   68.3 / 61.9   67.4 / 66.4
FocalFormer3D                L          71.5 / 69.0   68.1 / 67.6   72.7 / 66.8   73.7 / 72.6
DeformFormer3D               L          70.9 / 68.4   67.7 / 67.3   72.4 / 66.4   72.6 / 71.4

Get Started

a. Installation and data preparation.

This implementation is built upon mmdetection3d; please follow the steps in install.md to prepare the environment.
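As a quick sanity check after following install.md, you can verify that the core packages of the mmdetection3d stack import and that CUDA is visible before launching training. This is a generic snippet, not part of the repository:

import importlib
import torch

# Core packages expected by an mmdetection3d-based project.
for pkg in ("mmcv", "mmdet", "mmdet3d"):
    try:
        mod = importlib.import_module(pkg)
        print(f"{pkg:8s} {getattr(mod, '__version__', 'unknown')}")
    except ImportError as err:
        print(f"{pkg:8s} MISSING ({err})")

print("torch   ", torch.__version__)
print("cuda ok ", torch.cuda.is_available(), "devices:", torch.cuda.device_count())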

b. Pretrained image backbone. Download the nuImages-pretrained image backbone weights and place them in pretrained/.

c. Training and evaluation.

# train deformformer3d with 8 GPUs
bash tools/dist_train.sh projects/configs/focalformer3d/DeformFormer3D_L.py 8
# finetune focalformer3d with 8 GPUs
bash tools/dist_train.sh projects/configs/focalformer3d/FocalFormer3D_L.py 8
# test with 8 GPUs
bash tools/dist_test.sh projects/configs/focalformer3d/FocalFormer3D_L.py ${CHECKPOINT_FILE} 8 

d. Test-time augmentation.

The test-time augmentation configuration can be found in FocalFormer3D_LC_TTA.py. Available augmentations include double-flipping and 3-scale scaling.
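For orientation, a test-time-augmentation block in mmdetection3d-style configs usually follows the pattern sketched below (a double flip over the BEV plane plus three global scales). This is a generic illustration; the exact fields and values are defined in FocalFormer3D_LC_TTA.py and may differ.

# Hypothetical excerpt in the style of an mmdetection3d test pipeline;
# see FocalFormer3D_LC_TTA.py for the actual settings.
test_time_aug = dict(
    type='MultiScaleFlipAug3D',
    img_scale=(1333, 800),
    pts_scale_ratio=[0.95, 1.0, 1.05],   # 3-scale scaling of the point cloud
    flip=True,
    pcd_horizontal_flip=True,            # double flip over the BEV plane
    pcd_vertical_flip=True,
    transforms=[
        dict(
            type='GlobalRotScaleTrans',
            rot_range=[0, 0],
            scale_ratio_range=[1.0, 1.0],
            translation_std=[0, 0, 0]),
        dict(type='RandomFlip3D', sync_2d=False),
        # ... the usual point-loading / formatting transforms go here ...
    ])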

e. Submission.

To prepare a model for leaderboard submission, update your configuration file and re-train it as follows:

  1. Replace all instances of xxx_train.pkl with xxx_trainval.pkl in the config file.
  2. Retrain the model using the procedure detailed above.

At the inference stage,

  1. Substitute all instances of xxx_val.pkl with xxx_test.pkl in the config file.
  2. Execute the following command:
# test with 8 GPUs
bash tools/dist_test.sh projects/configs/focalformer3d/FocalFormer3D_L.py ${CHECKPOINT_FILE} 8 --format-only

The test results will be saved in the ./work_dirs/submissions/ directory, as specified in tools/test.py.
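The two substitutions above amount to pointing the dataset annotation files at the trainval and test info files. A hypothetical excerpt of the dataset section is sketched below; the concrete info-file names (written here in the common nuscenes_infos_*.pkl style) and dict keys depend on the actual config.

# Hypothetical dataset excerpt after the substitutions; file names and keys
# follow common mmdetection3d conventions and may differ from the real config.
data_root = 'data/nuscenes/'
data = dict(
    train=dict(ann_file=data_root + 'nuscenes_infos_trainval.pkl'),  # was ..._train.pkl
    val=dict(ann_file=data_root + 'nuscenes_infos_test.pkl'),        # was ..._val.pkl
    test=dict(ann_file=data_root + 'nuscenes_infos_test.pkl'))       # was ..._val.pkl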

TODO

  • Release Code
  • Test-Time Aug

Acknowledgement

Many thanks to the following open-source projects:

LICENSE

Copyright © 2023, NVIDIA Corporation. All rights reserved.

This work is made available under the Nvidia Source Code License-NC. Click here to view a copy of this license.

The pre-trained models are shared under CC-BY-NC-SA-4.0. If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original.

For business inquiries, please visit our website and submit the form: NVIDIA Research Licensing

Reference

@InProceedings{focalformer3d,
  title={FocalFormer3D: Focusing on Hard Instance for 3D Object Detection},
  author={Chen, Yilun and Yu, Zhiding and Chen, Yukang and Lan, Shiyi and Anandkumar, Anima and Jia, Jiaya and Alvarez, Jose M},
  booktitle={ICCV},
  year={2023}
}
