This project is forked from chaytonmin/occupancy-mae.

Voxel-MAE: Masked Autoencoders for Pre-training Large-scale Point Clouds

Repository for our arXiv paper "Voxel-MAE: Masked Autoencoders for Pre-training Large-scale Point Clouds".

Introduction

Mask-based pre-training has achieved great success for self-supervised learning in images, video, and language, without manually annotated supervision. However, it has not yet been studied for large-scale point clouds, whose redundant spatial information is characteristic of autonomous driving. Because large-scale point clouds contain a huge number of points, it is impractical to reconstruct the raw input points. In this paper, we instead propose a masked voxel classification network for large-scale point cloud pre-training. Our key idea is to divide the point cloud into voxels and classify whether each voxel contains points. This simple strategy makes the network voxel-aware of object shape, improving performance on downstream tasks such as 3D object detection. Even with a 90% masking ratio, Voxel-MAE can still learn representative features, thanks to the high spatial redundancy of large-scale point clouds. We also validate Voxel-MAE on an unsupervised domain adaptation task, which demonstrates its generalization ability. Voxel-MAE shows that it is feasible to pre-train on large-scale point clouds without data annotations to enhance the perception ability of autonomous vehicles. Extensive experiments with 3D object detectors (SECOND, CenterPoint, and PV-RCNN) on two popular datasets (KITTI, Waymo) confirm the effectiveness of our pre-trained model.
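The masking-and-classification idea above can be sketched as a toy data-preparation step. The following is a minimal illustration only, not code from this repository; the function names, grid range, and voxel size are hypothetical:

```python
import numpy as np

def voxelize_occupancy(points, grid_min, grid_max, voxel_size):
    """Convert an (N, 3) point cloud into a binary occupancy grid."""
    grid_min = np.asarray(grid_min, dtype=np.float32)
    grid_max = np.asarray(grid_max, dtype=np.float32)
    dims = np.ceil((grid_max - grid_min) / voxel_size).astype(int)
    occ = np.zeros(dims, dtype=np.uint8)
    idx = np.floor((points - grid_min) / voxel_size).astype(int)
    inside = np.all((idx >= 0) & (idx < dims), axis=1)  # drop out-of-range points
    occ[tuple(idx[inside].T)] = 1
    return occ

def split_visible_masked(occ, mask_ratio=0.9, seed=0):
    """Randomly hide `mask_ratio` of the occupied voxels.

    The encoder sees only the visible voxels; the decoder is trained to
    classify whether each voxel contains points (a binary label), which is
    far cheaper than reconstructing the raw points.
    """
    rng = np.random.default_rng(seed)
    occupied = np.argwhere(occ == 1)
    perm = rng.permutation(len(occupied))
    n_mask = int(len(occupied) * mask_ratio)
    masked = occupied[perm[:n_mask]]    # prediction targets
    visible = occupied[perm[n_mask:]]   # encoder input
    return visible, masked
```

With a 90% mask ratio, only one in ten occupied voxels reaches the encoder, yet the occupancy labels of the remaining voxels still supervise shape-aware feature learning.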

Flowchart of Voxel-MAE

Installation

Please refer to INSTALL.md for the installation of OpenPCDet (v0.5).

Getting Started

Please refer to GETTING_STARTED.md.

Usage

First, pre-train Voxel-MAE

KITTI:

Train with multiple GPUs:

```shell
bash ./scripts/dist_train_voxel_mae.sh ${NUM_GPUS} --cfg_file cfgs/kitti_models/voxel_mae_kitti.yaml --batch_size ${BATCH_SIZE}
```

Train with a single GPU:

```shell
python3 train_voxel_mae.py --cfg_file cfgs/kitti_models/voxel_mae_kitti.yaml --batch_size 4
```

Waymo:

```shell
python3 train_voxel_mae.py --cfg_file cfgs/kitti_models/voxel_mae_waymo.yaml --batch_size 4
```

Then train OpenPCDet

Train the detector as in OpenPCDet, using the pre-trained model from Voxel-MAE:

```shell
bash ./scripts/dist_train.sh ${NUM_GPUS} --cfg_file cfgs/kitti_models/second.yaml --batch_size ${BATCH_SIZE} --pretrained_model ../output/kitti/voxel_mae/ckpt/check_point_10.pth
```
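The `--pretrained_model` flag only needs the parameters whose names and shapes match between the pre-trained backbone and the detector; detection heads are trained from scratch. A minimal sketch of that matching step (plain dicts with shape tuples standing in for state dicts; this is an illustration, not OpenPCDet's actual loader):

```python
def filter_matching_params(pretrained, model_state):
    """Return checkpoint entries whose name and shape match the model,
    plus the names that were skipped (e.g. pre-training decoder weights)."""
    matched, skipped = {}, []
    for name, shape in pretrained.items():
        if name in model_state and model_state[name] == shape:
            matched[name] = shape   # safe to copy into the detector
        else:
            skipped.append(name)    # left at random init / dropped
    return matched, skipped
```

In practice the matched tensors would then be copied into the detector, e.g. with PyTorch's `load_state_dict(..., strict=False)`.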

Performance

KITTI 3D Object Detection

Pre-trained model will be provided soon.

The results are the 3D detection performance at moderate difficulty on the val set of the KITTI dataset. Results of OpenPCDet are from here.

| | Car@R11 | Pedestrian@R11 | Cyclist@R11 | Voxel-MAE 3D Detection |
| --- | --- | --- | --- | --- |
| SECOND | 78.62 | 52.98 | 67.15 | |
| Voxel-MAE+SECOND | 78.89 | 53.32 | 68.00 | |
| SECOND-IoU | 79.09 | 55.74 | 71.31 | |
| Voxel-MAE+SECOND-IoU | 79.21 | 55.82 | 72.18 | |
| PV-RCNN | 83.61 | 57.90 | 70.47 | |
| Voxel-MAE+PV-RCNN | 83.75 | 59.36 | 71.99 | |

Waymo Open Dataset Baselines

Similar to OpenPCDet, all models are trained on single frames using 20% (~32k frames) of the training samples, and the results in each cell are mAP/mAPH calculated by the official Waymo evaluation metrics on the whole validation set (version 1.2).

| Performance@(train with 20% data) | Vec_L1 | Vec_L2 | Ped_L1 | Ped_L2 | Cyc_L1 | Cyc_L2 | Voxel-MAE 3D Detection |
| --- | --- | --- | --- | --- | --- | --- | --- |
| SECOND | 70.96/70.34 | 62.58/62.02 | 65.23/54.24 | 57.22/47.49 | 57.13/55.62 | 54.97/53.53 | |
| Voxel-MAE+SECOND | 71.18/70.56 | 62.88/62.31 | 67.19/55.61 | 59.05/48.77 | 57.71/56.21 | 55.58/54.13 | model model |
| CenterPoint | 71.33/70.76 | 63.16/62.65 | 72.09/65.49 | 64.27/58.23 | 68.68/67.39 | 66.11/64.87 | |
| Voxel-MAE+CenterPoint | 71.87/71.31 | 64.03/63.51 | 73.90/67.10 | 65.80/59.61 | 70.28/69.02 | 67.75/66.52 | model model |
| PV-RCNN (AnchorHead) | 75.41/74.74 | 67.44/66.80 | 71.98/61.24 | 63.70/53.95 | 65.88/64.25 | 63.39/61.82 | |
| Voxel-MAE+PV-RCNN (AnchorHead) | 75.93/75.27 | 67.97/67.34 | 74.03/63.55 | 64.92/55.55 | 67.17/65.54 | 64.63/63.07 | model model |
| PV-RCNN (CenterHead) | 75.95/75.43 | 68.02/67.54 | 75.94/69.40 | 67.66/61.62 | 70.18/68.98 | 67.73/66.57 | |
| Voxel-MAE+PV-RCNN (CenterHead) | 77.34/76.82 | 68.70/68.23 | 77.70/71.15 | 69.54/63.45 | 70.54/69.39 | 68.10/66.99 | model model |
| PV-RCNN++ | 77.82/77.32 | 69.07/68.62 | 77.99/71.36 | 69.92/63.74 | 71.80/70.71 | 69.31/68.26 | |
| Voxel-MAE+PV-RCNN++ | 78.24/77.74 | 69.56/69.11 | 79.84/73.22 | 71.06/64.97 | 71.75/70.64 | 69.26/68.20 | model model |

License

Our codes are released under the Apache 2.0 license.

Acknowledgement

This repository is based on OpenPCDet.

Citation

If you find this project useful in your research, please consider citing:

@ARTICLE{Voxel-MAE,
    title={Voxel-MAE: Masked Autoencoders for Pre-training Large-scale Point Clouds},
    author={{Min}, Chen and {Zhao}, Dawei and {Xiao}, Liang and {Nie}, Yiming and {Dai}, Bin},
    journal={arXiv e-prints},
    year={2022}
}
