Progressive Coordinate Transforms for Monocular 3D Object Detection

This repository is the official implementation of PCT.

Introduction

In this paper, we propose a novel and lightweight approach, dubbed Progressive Coordinate Transforms (PCT), to facilitate learning coordinate representations for monocular 3D object detection. Specifically, a localization boosting mechanism with a confidence-aware loss is introduced to progressively refine the localization prediction. In addition, semantic image representations are exploited to compensate for the use of patch proposals. Despite being lightweight and simple, our strategy establishes a new state of the art among monocular 3D detectors on the competitive KITTI benchmark. At the same time, PCT generalizes well to most coordinate-based 3D detection frameworks.

(Figure: overview of the PCT architecture)
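
For intuition, here is a minimal, hypothetical sketch of the progressive refinement idea (illustrative only, not the authors' implementation): a stack of lightweight stages repeatedly updates a coarse 3D location, and each stage's localization loss is weighted by a predicted confidence.

```python
# Hypothetical sketch of progressive coordinate refinement with a
# confidence-aware loss; module names, shapes, and the loss form are illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RefineStage(nn.Module):
    """One lightweight stage: predicts a residual 3D offset and a confidence logit."""
    def __init__(self, feat_dim=256):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(feat_dim + 3, 128), nn.ReLU(),
                                 nn.Linear(128, 4))  # 3 offsets + 1 confidence logit

    def forward(self, feat, coarse_xyz):
        out = self.mlp(torch.cat([feat, coarse_xyz], dim=-1))
        delta, conf_logit = out[:, :3], out[:, 3]
        return coarse_xyz + delta, conf_logit

def confidence_aware_loss(pred_xyz, conf_logit, gt_xyz):
    """L1 localization error weighted by a learned confidence, plus a log penalty
    that keeps the confidence from collapsing to zero."""
    conf = torch.sigmoid(conf_logit)
    loc = F.l1_loss(pred_xyz, gt_xyz, reduction='none').sum(dim=-1)
    return (conf * loc - torch.log(conf + 1e-6)).mean()

# Progressive refinement over several stages.
stages = nn.ModuleList([RefineStage() for _ in range(3)])
feat = torch.randn(8, 256)   # per-proposal image features
xyz = torch.randn(8, 3)      # coarse 3D centre predictions
gt = torch.randn(8, 3)
total_loss = 0.0
for stage in stages:
    xyz, conf_logit = stage(feat, xyz)
    total_loss = total_loss + confidence_aware_loss(xyz, conf_logit, gt)
```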

Requirements

Installation

Download this repository (tested with Python 3.7, PyTorch 1.3.1, and Ubuntu 16.04.7). Additional dependencies such as cv2, yaml, and tqdm are also required; please install them accordingly:

cd #root
pip install -r requirements

Then, you need to compile the evaluation script:

cd #root/tools/kitti_eval
sh compile.sh

Prepare your data

First, you should download the KITTI dataset, and organize the data as follows (* indicates an empty directory to store the data generated in subsequent steps):


#ROOT
  |data
    |KITTI
      |2d_detections
      |ImageSets
      |pickle_files *
      |object
        |training
          |calib
          |image_2
          |label_2
          |depth *
          |pseudo_lidar (optional for Pseudo-LiDAR)*
          |velodyne (optional for FPointNet)
        |testing
          |calib
          |image_2
          |depth *
          |pseudo_lidar (optional for Pseudo-LiDAR)*
          |velodyne (optional for FPointNet)
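
As a quick sanity check of this layout, a hypothetical helper (assuming it is run from #ROOT) could be:

```python
# Hypothetical sanity check of the expected KITTI directory layout.
import os

required = [
    'data/KITTI/2d_detections',
    'data/KITTI/ImageSets',
    'data/KITTI/pickle_files',
    'data/KITTI/object/training/calib',
    'data/KITTI/object/training/image_2',
    'data/KITTI/object/training/label_2',
    'data/KITTI/object/training/depth',
    'data/KITTI/object/testing/calib',
    'data/KITTI/object/testing/image_2',
    'data/KITTI/object/testing/depth',
]
missing = [d for d in required if not os.path.isdir(d)]
print('all directories present' if not missing else f'missing: {missing}')
```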

Second, you need to prepare your depth maps and put them in data/KITTI/object/training/depth. For ease of use, we also provide the estimated depth maps (generated with the pretrained models provided by DORN and Pseudo-LiDAR):

| Monocular (DORN) | Stereo (PSMNet) |
| --- | --- |
| trainval (~1.6G), test (~1.6G) | trainval (~2.5G) |

Then, you need to generate 2D image features for the 2D bounding boxes and put them in data/KITTI/pickle_files/org. We train the 2D detector following RTM3D; you can also use your own 2D detector for training and inference.
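
The exact feature format expected by the dataloader should be taken from the repository; purely as an illustration of the general pattern (ROI-pool backbone features for each 2D box and dump them per image), a hypothetical sketch is shown below. The backbone, box coordinates, and file naming are placeholders.

```python
# Purely illustrative: extract per-box features from a generic backbone and
# dump them to data/KITTI/pickle_files/org. The real file layout and feature
# dimensions must match what the PCT dataloader expects.
import os
import pickle
import torch
import torchvision
from torchvision.ops import roi_align

backbone = torchvision.models.resnet18(pretrained=True)
backbone = torch.nn.Sequential(*list(backbone.children())[:-2]).eval()  # stride-32 feature map

image = torch.randn(1, 3, 384, 1280)                    # a KITTI-sized image tensor
boxes = torch.tensor([[0., 100., 200., 300., 400.]])    # (batch_idx, x1, y1, x2, y2)

with torch.no_grad():
    feat_map = backbone(image)
    # spatial_scale maps image coordinates onto the stride-32 feature map
    box_feats = roi_align(feat_map, boxes, output_size=(7, 7), spatial_scale=1 / 32)

os.makedirs('data/KITTI/pickle_files/org', exist_ok=True)
with open('data/KITTI/pickle_files/org/000000.pickle', 'wb') as f:
    pickle.dump(box_feats.numpy(), f)
```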

Finally, generate the training data using the provided scripts:

cd #root/tools/data_prepare
python patch_data_prepare_val.py --gen_train --gen_val --gen_val_detection --car_only
mv *.pickle ../../data/KITTI/pickle_files

Prepare Waymo dataset

We also provide Waymo usage instructions for monocular 3D detection on the Waymo Open Dataset.

Training

Move to the workspace and train the model (you also need to modify the path to the pickle files in the config file; a sketch for doing this programmatically follows the commands below):

 cd #root
 cd experiments/pct
 python ../../tools/train_val.py --config config_val.yaml
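
If you prefer to patch the config programmatically rather than by hand, a small hypothetical helper is sketched below; the actual key that holds the pickle path must be checked against your copy of config_val.yaml.

```python
# Hypothetical helper: load config_val.yaml, inspect its keys, and point the
# dataset entry at your pickle_files directory before writing the file back.
import yaml

with open('config_val.yaml') as f:
    cfg = yaml.safe_load(f)

print(cfg.keys())  # find the dataset / path entries
# Example edit (the key name is a placeholder; adjust to the real one):
# cfg['dataset']['pickle_path'] = '/abs/path/to/data/KITTI/pickle_files'

with open('config_val.yaml', 'w') as f:
    yaml.safe_dump(cfg, f)
```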

Evaluation

Generate the results using the trained model:

 python ../../tools/train_val.py --config config_val.yaml --e

and evaluate the generated results using:

../../tools/kitti_eval/evaluate_object_3d_offline_ap11 ../../data/KITTI/object/training/label_2 ./output

or

../../tools/kitti_eval/evaluate_object_3d_offline_ap40 ../../data/KITTI/object/training/label_2 ./output
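
The two binaries differ in how average precision is interpolated: the AP11 variant averages precision over 11 equally spaced recall points (0, 0.1, ..., 1.0), while the AP40 variant averages over 40 points (1/40, 2/40, ..., 1.0). A small self-contained illustration with a hypothetical precision-recall curve:

```python
# Illustration of KITTI's 11-point vs 40-point interpolated average precision.
import numpy as np

def interpolated_ap(recall, precision, points):
    """Average the best precision achieved at or beyond each sampled recall."""
    ap = 0.0
    for r in points:
        mask = recall >= r
        ap += precision[mask].max() if mask.any() else 0.0
    return ap / len(points)

recall = np.linspace(0, 1, 101)                 # hypothetical PR curve
precision = np.clip(1.0 - 0.7 * recall, 0, 1)

ap11 = interpolated_ap(recall, precision, np.linspace(0.0, 1.0, 11))
ap40 = interpolated_ap(recall, precision, np.linspace(1.0 / 40, 1.0, 40))
print(f'AP11 = {ap11:.4f}, AP40 = {ap40:.4f}')
```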

Because the data preparation process is tedious, we also provide the generated results for evaluation. Unzip output.zip and then execute the above evaluation commands. The results are:

| Models | AP3D11@mod. | AP3D11@easy | AP3D11@hard |
| --- | --- | --- | --- |
| PatchNet + PCT | 27.53 / 34.65 | 38.39 / 47.16 | 24.44 / 28.47 |

Acknowledgements

This code benefits from the excellent work of PatchNet and uses the off-the-shelf models provided by DORN and RTM3D.

Citation

@article{wang2021pct,
  title={Progressive Coordinate Transforms for Monocular 3D Object Detection},
  author={Wang, Li and Zhang, Li and Zhu, Yi and Zhang, Zhi and He, Tong and Li, Mu and Xue, Xiangyang},
  journal={arXiv preprint arXiv:2108.05793},
  year={2021}
}

Contact

For questions regarding PCT-3D, feel free to post an issue here or contact the authors directly ([email protected]).

Security

See CONTRIBUTING for more information.

License

This project is licensed under the Apache-2.0 License.


Issues

How much time does it take to convert Waymo to KITTI format?

Thank you for the amazing work. I wanted to know how much time it takes to convert Waymo to KITTI format using the script

python converter.py --save_dir datasets/waymo_open_organized/ --split validation

The validation split seems to take a lot of time on my machine, so I wanted to confirm.

Waymo Results: mAP for all classes or only for the vehicle class

Hi PCT authors,
I had a small query regarding the Waymo results. Table 7 of your paper reports the mAP on the Waymo dataset. Do you report the mAP/mAPH of all the classes, or only the mAP/mAPH for the vehicle (car) class?

PS: Another paper, CaDDN, only reports mAP on the vehicle (car) class in its Table 2.

Waymo evaluation: Metrics of all Level 1 Objects same as Metrics of [0, 30) Level 1 Objects

Hi PCT authors,
I am using your waymo_eval.py for evaluating my Waymo model. Here is the output

OBJECT_TYPE_TYPE_VEHICLE_LEVEL_1/AP: 0.34
OBJECT_TYPE_TYPE_VEHICLE_LEVEL_1/APH: 0.33
OBJECT_TYPE_TYPE_VEHICLE_LEVEL_2/AP: 0.02
OBJECT_TYPE_TYPE_VEHICLE_LEVEL_2/APH: 0.02
RANGE_TYPE_VEHICLE_[0, 30)_LEVEL_1/AP: 0.34
RANGE_TYPE_VEHICLE_[0, 30)_LEVEL_1/APH: 0.33
RANGE_TYPE_VEHICLE_[0, 30)_LEVEL_2/AP: 0.04
RANGE_TYPE_VEHICLE_[0, 30)_LEVEL_2/APH: 0.04
RANGE_TYPE_VEHICLE_[30, 50)_LEVEL_1/AP: 0.12
RANGE_TYPE_VEHICLE_[30, 50)_LEVEL_1/APH: 0.12
RANGE_TYPE_VEHICLE_[30, 50)_LEVEL_2/AP: 0.00
RANGE_TYPE_VEHICLE_[30, 50)_LEVEL_2/APH: 0.00
RANGE_TYPE_VEHICLE_[50, +inf)_LEVEL_1/AP: 0.05
RANGE_TYPE_VEHICLE_[50, +inf)_LEVEL_1/APH: 0.05
RANGE_TYPE_VEHICLE_[50, +inf)_LEVEL_2/AP: 0.00
RANGE_TYPE_VEHICLE_[50, +inf)_LEVEL_2/APH: 0.00

You will quickly notice that the AP for all Level 1 vehicles (0.34) is the same as the AP for [0, 30) Level 1 vehicles (0.34). This strange behavior also shows up for the Level 1 vehicle APH and other Level 1 classes (not shown here). Generally, the AP for all Level 1 vehicles should be less than the AP for [0, 30) Level 1 vehicles, as correctly reported in Table 7 of your paper.

I am unable to understand this behavior, so I wanted to ask if you saw something similar on your end.

PS: Level 2 metrics do NOT show this behavior; e.g., in the above output, the AP for all Level 2 objects (0.02) is less than the AP for [0, 30) Level 2 objects (0.04), as expected.

I am using Anaconda, and the following packages are in my conda environment:

blas                      1.0                         mkl    anaconda
cudatoolkit               10.1.243             h6bb024c_0    anaconda
cudnn                     7.6.5                cuda10.1_0    anaconda
google-auth               1.22.1                     py_0    anaconda
google-auth-oauthlib      0.4.1                      py_2    anaconda
google-pasta              0.2.0                      py_0    anaconda
protobuf                  3.13.0.1         py36he6710b0_1    anaconda
py-opencv                 3.4.2            py36hb342d67_1
python                    3.6.13               h12debd9_1  
tensorboard               2.2.1              pyh532a8cf_0    anaconda
tensorflow                2.1.0           gpu_py36h2e5cdaa_0    anaconda
tensorflow-gpu            2.1.0                h0d30ee6_0    anaconda

Could you provide the output.zip file or pretrained model checkpoints?

Hi. I noticed that you mentioned the following in the README.md file:

Because the data preparation process is tedious, we also provide the generated results for evaluation. Unzip output.zip and then execute the above evaluation commands.
...

However, I did not find a link to the result file. Would you be willing to share the detection results or pretrained model with us? Thank you very much.

Some confusion and a request

First of all, thank you for your excellent work, but I have some points of confusion and a request.
1. At line 27 of kitti_dataset, you load labels from 'ddmp' rather than the provided "label_2". What preprocessing did you apply to the labels? I did not find an explanation in your paper.
2. As you state in the paper, the performance of the 2D detector has no positive correlation with the final 3D detection accuracy. How, then, should I choose a 2D detector, given that I cannot simply pick the best one?
3. Have you run experiments with other coordinate-based detectors? The paper only reports PatchNet + PCT.
4. Could you provide the 2D detection feature files used in training and testing so that I can run the code?

About the Waymo results

Hi, you mentioned that you use AdaBins trained on Waymo. How did you do that, since Waymo does not provide ground-truth depth maps? Another question: did you train the whole model (depth completion, 2D detection, and 3D detection) in an end-to-end manner?

ModuleNotFoundError: No module named 'lib.helpers.decorator_helper_level'

When I run python ../../tools/train_val.py --config config_val.yaml,
I get the following error:

Traceback (most recent call last):
  File "../../tools/train_val.py", line 19, in <module>
    from lib.helpers.trainer_helper import Trainer
  File "/newnfs/zzwu/08_3d_code/progressive-coordinate-transforms/lib/helpers/trainer_helper.py", line 11, in <module>
    from lib.helpers.decorator_helper_level import decorator_level
ModuleNotFoundError: No module named 'lib.helpers.decorator_helper_level'

About generating the 2D detection features

Hi, thanks for sharing your great work!
Could you share your 2D detection feature file?
Or could you tell me which layer's features should be saved from RTM3D?

you need to generate image 2D features for the 2D bounding boxes and put them to data/KITTI/pickle_files/org
