eltociear / 3dtrans

This project is forked from pjlab-adg/3dtrans.

An Open-source Codebase for exploring Continuous-learning/Pre-training-oriented Autonomous Driving Task

License: Apache License 2.0

Shell 0.69% C++ 2.34% Python 92.78% C 0.16% Cuda 3.97% Dockerfile 0.07%

3DTrans: An Open-source Codebase for Continuous Learning towards Autonomous Driving Task

3DTrans includes Transfer Learning Techniques and Scalable Pre-training Techniques for tackling continuous learning in autonomous driving, as follows.

  1. We implement Transfer Learning Techniques covering four settings:
  • Unsupervised Domain Adaptation (UDA) for 3D Point Clouds
  • Active Domain Adaptation (ADA) for 3D Point Clouds
  • Semi-Supervised Domain Adaptation (SSDA) for 3D Point Clouds
  • Multi-dataset Domain Fusion (MDF) for 3D Point Clouds
  2. We implement Scalable Pre-training Techniques, which continuously improve model performance on downstream tasks as more pre-training data are fed into the pre-training network:
  • AD-PT: Autonomous Driving Pre-Training with Large-scale Point Cloud Dataset
  • SPOT: Scalable 3D Pre-training via Occupancy Prediction for Autonomous Driving

Team Home:

  • A Team Home for Member Information and Profile, Project Link

Overview

News 🔥

  • We will release the Reconstruction-Simulation Dataset obtained using the ReSimAD method (updated on Sep. 2023).
  • We will release all code of AD-PT here; see AD-PT for details (updated on Sep. 2023).
  • We released the AD-PT pre-trained checkpoints; see AD-PT pre-trained checkpoints (updated on Aug. 2023).
  • Based on 3DTrans, we achieved significant performance gains on a series of downstream perception benchmarks including Waymo, nuScenes, and KITTI, under different baseline models like PV-RCNN++, SECOND, CenterPoint, PV-RCNN (updated on Jun. 2023).
  • 3DTrans added support for Semi-Supervised Domain Adaptation (SSDA) for 3D Object Detection (updated on Nov. 2022).
  • 3DTrans added support for Active Domain Adaptation (ADA) for 3D Object Detection, achieving a good trade-off between high performance and annotation cost (updated on Oct. 2022).
  • 3DTrans added support for several typical transfer learning techniques (such as TQS, CLUE, SN, ST3D, Pseudo-labeling, SESS, and Mean-Teacher) for adapting and transferring autonomous-driving models.
  • 3DTrans added support for Multi-domain Dataset Fusion (MDF) for 3D Object Detection, enabling existing 3D models to learn effectively from multiple off-the-shelf 3D datasets (updated on Sep. 2022).
  • 3DTrans added support for Unsupervised Domain Adaptation (UDA) for 3D Object Detection, for deploying a well-trained source model to an unlabeled target domain (updated on July 2022).
  • We computed the object-size distribution of each public AD dataset; see object-size statistics.

We hope this repository will inspire research on 3D model generalization and help push the limits of perception performance. 🗼

Installation for 3DTrans

You may refer to INSTALL.md for the installation of 3DTrans.

Getting Started

  • Please refer to Readme for Datasets to prepare the datasets and convert them into the 3DTrans format. 3DTrans also supports reading and writing data from Ceph Petrel-OSS; see Readme for Datasets for details.

  • Please refer to Readme for UDA for understanding the problem definition of UDA and performing the UDA adaptation process.

  • Please refer to Readme for ADA for understanding the problem definition of ADA and performing the ADA adaptation process.

  • Please refer to Readme for SSDA for understanding the problem definition of SSDA and performing the SSDA adaptation process.

  • Please refer to Readme for MDF for understanding the problem definition of MDF and performing the MDF joint-training process.

Model Zoo

We cannot provide the Waymo-related pretrained models due to the Waymo Dataset License Agreement, but you can easily reach similar performance by training with the corresponding configs.

UDA Results:

Here, we report the cross-dataset (Waymo-to-KITTI) adaptation results using the BEV/3D AP performance as the evaluation metric. Please refer to Readme for UDA for experimental results of more cross-domain settings.
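The Car@R40 columns report AP averaged over 40 recall positions. As a rough illustration of that averaging step, here is a minimal, simplified sketch that assumes the precision/recall curve has already been computed. It is not the KITTI/OpenPCDet evaluation code, which also handles difficulty levels, ignored boxes, and IoU matching. The function name `ap_r40` is hypothetical.

```python
from typing import Sequence


def ap_r40(recalls: Sequence[float], precisions: Sequence[float]) -> float:
    """Average precision (in %) sampled at the 40 recall positions 1/40, 2/40, ..., 1.

    `recalls` and `precisions` describe one PR curve, one entry per score
    threshold. For each sampled recall r, the interpolated precision is the
    maximum precision reached at any recall >= r (0 if r is never reached).
    """
    if len(recalls) != len(precisions):
        raise ValueError("recalls and precisions must have the same length")
    total = 0.0
    for i in range(1, 41):
        r = i / 40
        candidates = [p for rec, p in zip(recalls, precisions) if rec >= r]
        total += max(candidates, default=0.0)
    return 100.0 * total / 40
```

For a toy curve with precision 1.0 up to recall 0.5 and precision 0.5 at recall 1.0, `ap_r40([0.5, 1.0], [1.0, 0.5])` averages 20 samples of 1.0 and 20 samples of 0.5, giving 75.0.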

  • All LiDAR-based models are trained with 4 NVIDIA A100 GPUs and are available for download.
  • For Waymo dataset training, we train the model using 20% data.
  • The domain adaptation time is measured with 4 NVIDIA A100 GPUs and PyTorch 1.8.1.
  • Pre-SN means that SN (statistical normalization) is applied while pre-training the source-only model.
  • Post-SN means that SN (statistical normalization) is applied during the adaptation stage.
| Model | Training time | Adaptation | Car@R40 (BEV / 3D AP) | Download |
|---|---|---|---|---|
| PointPillar | ~7.1 hours | Source-only with SN | 74.98 / 49.31 | - |
| PointPillar | ~0.6 hours | Pre-SN | 81.71 / 57.11 | model-57M |
| PV-RCNN | ~23 hours | Source-only with SN | 69.92 / 60.17 | - |
| PV-RCNN | ~23 hours | Source-only | 74.42 / 40.35 | - |
| PV-RCNN | ~3.5 hours | Pre-SN | 84.00 / 74.57 | model-156M |
| PV-RCNN | ~1 hour | Post-SN | 84.94 / 75.20 | model-156M |
| Voxel R-CNN | ~16 hours | Source-only with SN | 75.83 / 55.50 | - |
| Voxel R-CNN | ~16 hours | Source-only | 64.88 / 19.90 | - |
| Voxel R-CNN | ~2.5 hours | Pre-SN | 82.56 / 67.32 | model-201M |
| Voxel R-CNN | ~2.2 hours | Post-SN | 85.44 / 76.78 | model-201M |
| PV-RCNN++ | ~20 hours | Source-only with SN | 67.22 / 56.50 | - |
| PV-RCNN++ | ~20 hours | Source-only | 67.68 / 20.82 | - |
| PV-RCNN++ | ~2.2 hours | Post-SN | 86.86 / 79.86 | model-193M |
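Statistical normalization (SN) shifts source-domain object sizes toward the target-domain mean size. The following sketch illustrates the idea for a single frame. It assumes OpenPCDet-style boxes `(x, y, z, dx, dy, dz, heading)` with `z` at the box center, and a size offset `size_delta` (target mean minus source mean) that has already been computed. It is a hypothetical pure-Python illustration, not the 3DTrans implementation. It rescales the points inside each box around the box center and raises the center by half the height change so the object stays on the ground. It does not remove background points that end up inside enlarged boxes.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
Box = Tuple[float, float, float, float, float, float, float]  # x, y, z, dx, dy, dz, heading


def _to_local(p: Point, center: Sequence[float], yaw: float) -> Point:
    """Express a world point in a box frame (origin at center, x along heading)."""
    px, py, pz = p[0] - center[0], p[1] - center[1], p[2] - center[2]
    c, s = math.cos(-yaw), math.sin(-yaw)
    return (c * px - s * py, s * px + c * py, pz)


def _to_world(q: Point, center: Sequence[float], yaw: float) -> Point:
    """Inverse of _to_local: rotate by yaw, then translate to center."""
    c, s = math.cos(yaw), math.sin(yaw)
    return (c * q[0] - s * q[1] + center[0], s * q[0] + c * q[1] + center[1], q[2] + center[2])


def statistical_normalize(points: Sequence[Point], boxes: Sequence[Box],
                          size_delta: Tuple[float, float, float]) -> Tuple[List[Point], List[Box]]:
    """Resize every box by size_delta and stretch the points inside it accordingly."""
    new_points = list(points)
    new_boxes: List[Box] = []
    for box in boxes:
        center, dims, yaw = box[0:3], box[3:6], box[6]
        new_dims = tuple(d + delta for d, delta in zip(dims, size_delta))
        if min(new_dims) <= 0:  # skip boxes that would collapse
            new_boxes.append(tuple(box))
            continue
        scale = tuple(n / d for n, d in zip(new_dims, dims))
        # Raise the center by half the height change so the box bottom stays put.
        new_center = (center[0], center[1], center[2] + size_delta[2] / 2)
        for i, p in enumerate(points):
            q = _to_local(p, center, yaw)
            if all(abs(q[k]) <= dims[k] / 2 for k in range(3)):
                scaled = (q[0] * scale[0], q[1] * scale[1], q[2] * scale[2])
                new_points[i] = _to_world(scaled, new_center, yaw)
        new_boxes.append((*new_center, *new_dims, yaw))
    return new_points, new_boxes
```

With `size_delta=(0.4, 0.0, 0.2)`, a point 1 m ahead of the center of a 4 m long box moves to 1.1 m ahead (and 0.1 m up), and the box grows to 4.4 m x 2.0 m x 2.2 m.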

ADA Results:

Here, we report the Waymo-to-KITTI adaptation results using the BEV/3D AP performance. Please refer to Readme for ADA for experimental results of more cross-domain settings.

  • All LiDAR-based models are trained with 4 NVIDIA A100 GPUs and are available for download.
  • For Waymo dataset training, we train the model using 20% data.
  • The domain adaptation time is measured with 4 NVIDIA A100 GPUs and PyTorch 1.8.1.
| Model | Training time | Adaptation | Car@R40 (BEV / 3D AP) | Download |
|---|---|---|---|---|
| PV-RCNN | ~23h@4 A100 | Source Only | 67.95 / 27.65 | - |
| PV-RCNN | ~1.5h@2 A100 | Bi3D (1% annotation budget) | 87.12 / 78.03 | Model-58M |
| PV-RCNN | ~10h@2 A100 | Bi3D (5% annotation budget) | 89.53 / 81.32 | Model-58M |
| PV-RCNN | ~1.5h@2 A100 | TQS | 82.00 / 72.04 | Model-58M |
| PV-RCNN | ~1.5h@2 A100 | CLUE | 82.13 / 73.14 | Model-50M |
| PV-RCNN | ~10h@2 A100 | Bi3D+ST3D | 87.83 / 81.23 | Model-58M |
| Voxel R-CNN | ~16h@4 A100 | Source Only | 64.87 / 19.90 | - |
| Voxel R-CNN | ~1.5h@2 A100 | Bi3D (1% annotation budget) | 88.09 / 79.14 | Model-72M |
| Voxel R-CNN | ~6h@2 A100 | Bi3D (5% annotation budget) | 90.18 / 81.34 | Model-72M |
| Voxel R-CNN | ~1.5h@2 A100 | TQS | 78.26 / 67.11 | Model-72M |
| Voxel R-CNN | ~1.5h@2 A100 | CLUE | 81.93 / 70.89 | Model-72M |
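The annotation budget in the Bi3D rows is the fraction of target-domain frames selected for labeling. The sketch below only shows how a percentage budget turns into a frame count and a selection. It ranks frames by a generic, precomputed informativeness score. That scoring is an assumption and is not Bi3D's actual selection criterion. `select_for_annotation` is a hypothetical name.

```python
from typing import Dict, List


def select_for_annotation(frame_scores: Dict[str, float], budget_ratio: float) -> List[str]:
    """Return the highest-scoring frame IDs, about budget_ratio of all frames.

    frame_scores maps a target-domain frame ID to a precomputed score where
    larger means "more worth annotating". At least one frame is requested.
    """
    if not 0.0 < budget_ratio <= 1.0:
        raise ValueError("budget_ratio must be in (0, 1]")
    # round() avoids float artifacts such as 0.29 * 100 == 28.999999999999996.
    budget = max(1, round(budget_ratio * len(frame_scores)))
    # Sort by descending score; ties are broken by frame ID for determinism.
    ranked = sorted(frame_scores, key=lambda fid: (-frame_scores[fid], fid))
    return ranked[:budget]
```

For instance, a 1% budget over the 3,712 frames of the commonly used KITTI train split gives round(37.12) = 37 frames.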

SSDA Results:

We report target-domain results for Waymo-to-nuScenes adaptation using BEV/3D AP as the evaluation metric, and for Waymo-to-ONCE adaptation using the ONCE evaluation metric. Please refer to Readme for SSDA for results in more cross-domain settings.

  • The domain adaptation time is measured with 4 NVIDIA A100 GPUs and PyTorch 1.8.1.
  • For Waymo dataset training, we train the model using 20% data.
  • second_5%_FT denotes fine-tuning the SECOND model with 5% of the nuScenes training data.
  • second_5%_SESS denotes adapting the baseline model with SESS (Self-Ensembling Semi-Supervised).
  • second_5%_PS denotes fine-tuning the source-only model on nuScenes with 5% labeled data, then pseudo-labeling the remaining 95% unlabeled nuScenes data.
  • The pvrcnn_ and pvplus_ prefixes denote the same settings for PV-RCNN and PV-RCNN++.
| Model | Training time | Adaptation | Car@R40 (BEV / 3D AP) | Download |
|---|---|---|---|---|
| SECOND | ~11 hours | source-only (Waymo) | 27.85 / 16.43 | - |
| SECOND | ~0.4 hours | second_5%_FT | 45.95 / 26.98 | model-61M |
| SECOND | ~1.8 hours | second_5%_SESS | 47.77 / 28.74 | model-61M |
| SECOND | ~1.7 hours | second_5%_PS | 47.72 / 29.37 | model-61M |
| PV-RCNN | ~24 hours | source-only (Waymo) | 40.31 / 23.32 | - |
| PV-RCNN | ~1.0 hour | pvrcnn_5%_FT | 49.58 / 34.86 | model-150M |
| PV-RCNN | ~5.5 hours | pvrcnn_5%_SESS | 49.92 / 35.28 | model-150M |
| PV-RCNN | ~5.4 hours | pvrcnn_5%_PS | 49.84 / 35.07 | model-150M |
| PV-RCNN++ | ~16 hours | source-only (Waymo) | 31.96 / 19.81 | - |
| PV-RCNN++ | ~1.2 hours | pvplus_5%_FT | 49.94 / 34.28 | model-185M |
| PV-RCNN++ | ~4.2 hours | pvplus_5%_SESS | 51.14 / 35.25 | model-185M |
| PV-RCNN++ | ~3.6 hours | pvplus_5%_PS | 50.84 / 35.39 | model-185M |
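The `_5%_PS` setting combines a small labeled subset with pseudo-labels on the rest. The sketch below shows the two bookkeeping steps under stated assumptions: a seeded random 5% / 95% split of target frame IDs, and class-wise confidence filtering of the fine-tuned model's predictions. The helper names, the `(box, class_name, score)` prediction format, and the threshold values are hypothetical; the split and thresholds that 3DTrans actually uses may differ.

```python
import random
from typing import Dict, List, Sequence, Tuple

Prediction = Tuple[tuple, str, float]  # (box, class_name, confidence score)


def split_labeled(frame_ids: Sequence[str], labeled_ratio: float = 0.05,
                  seed: int = 0) -> Tuple[List[str], List[str]]:
    """Randomly split frame IDs into a labeled subset and an unlabeled remainder."""
    ids = sorted(frame_ids)  # sort first so the split depends only on the seed
    random.Random(seed).shuffle(ids)
    n_labeled = round(labeled_ratio * len(ids))
    return ids[:n_labeled], ids[n_labeled:]


def filter_pseudo_labels(predictions: Sequence[Prediction],
                         score_thresh: Dict[str, float]) -> List[Prediction]:
    """Keep predictions whose score reaches the threshold of their class."""
    return [p for p in predictions
            if p[1] in score_thresh and p[2] >= score_thresh[p[1]]]
```

A typical loop would fine-tune on the `labeled` frames, run the fine-tuned model on the `unlabeled` frames, and keep only the filtered predictions as training targets.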
  • For Waymo-to-ONCE adaptation, we employ 8 NVIDIA A100 GPUs for model training.
  • PS denotes pseudo-labeling the unlabeled ONCE data and re-training the model on the pseudo-labeled data.
  • SESS denotes that we utilize the SESS method to adapt the baseline.
  • For ONCE, the IoU thresholds for evaluation are 0.7, 0.3, 0.5 for Vehicle, Pedestrian, Cyclist.
| Model | ONCE Data | Method | Vehicle@AP | Pedestrian@AP | Cyclist@AP | Download |
|---|---|---|---|---|---|---|
| Centerpoint | Labeled (4K) | Train from scratch | 74.93 | 46.21 | 67.36 | model-96M |
| Centerpoint_Pede | Labeled (4K) | PS | - | 49.14 | - | model-96M |
| PV-RCNN++ | Labeled (4K) | Train from scratch | 79.78 | 35.91 | 63.18 | model-188M |
| PV-RCNN++ | Small Dataset (100K) | SESS | 80.02 | 46.24 | 66.41 | model-188M |
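ONCE evaluation counts a detection as a true positive only if its IoU with a ground-truth box reaches the class-specific threshold listed above (0.7 for Vehicle, 0.3 for Pedestrian, 0.5 for Cyclist). The sketch below applies those thresholds with a simplified axis-aligned 3D IoU that ignores heading. That simplification is an assumption for illustration, whereas the official evaluation handles oriented boxes. The function names are hypothetical.

```python
from typing import Dict, Sequence

# Class-specific IoU thresholds for ONCE evaluation, as listed above.
ONCE_IOU_THRESH: Dict[str, float] = {"Vehicle": 0.7, "Pedestrian": 0.3, "Cyclist": 0.5}


def iou_3d_axis_aligned(a: Sequence[float], b: Sequence[float]) -> float:
    """IoU of two axis-aligned boxes given as (x, y, z, dx, dy, dz) with center coordinates."""
    inter = 1.0
    for k in range(3):
        lo = max(a[k] - a[k + 3] / 2, b[k] - b[k + 3] / 2)
        hi = min(a[k] + a[k + 3] / 2, b[k] + b[k + 3] / 2)
        if hi <= lo:
            return 0.0  # no overlap along this axis
        inter *= hi - lo
    vol_a = a[3] * a[4] * a[5]
    vol_b = b[3] * b[4] * b[5]
    return inter / (vol_a + vol_b - inter)


def is_true_positive(pred_box: Sequence[float], gt_box: Sequence[float], cls: str) -> bool:
    """Match rule: IoU must reach the threshold of the ground-truth class."""
    return iou_3d_axis_aligned(pred_box, gt_box) >= ONCE_IOU_THRESH[cls]
```

Two 2 m cubes offset by 1 m along x have an IoU of 1/3, so they would count as a Pedestrian match but not as a Cyclist or Vehicle match.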

MDF Results:

Here, we report Waymo-and-nuScenes consolidation results. The models are jointly trained on the Waymo and nuScenes datasets, and evaluated on Waymo using LEVEL_2 mAP/mAPH and on nuScenes using BEV/3D AP. Please refer to Readme for MDF for more results.

  • All LiDAR-based models are trained with 8 NVIDIA A100 GPUs and are available for download.
  • The multi-domain dataset fusion (MDF) training time is measured with 8 NVIDIA A100 GPUs and PyTorch 1.8.1.
  • For Waymo, we train the model on 20% of the training data to save training time.
  • PV-RCNN-nuScenes denotes a PV-RCNN model trained only on nuScenes, and PV-RCNN-DM denotes a model trained on the directly merged Waymo and nuScenes datasets. PV-RCNN-DT denotes domain attention-aware multi-dataset training.
| Baseline | MDF Methods | Waymo@Vehicle | Waymo@Pedestrian | Waymo@Cyclist | nuScenes@Car | nuScenes@Pedestrian | nuScenes@Cyclist |
|---|---|---|---|---|---|---|---|
| PV-RCNN-nuScenes | nuScenes only | 35.59 / 35.21 | 3.95 / 2.55 | 0.94 / 0.92 | 57.78 / 41.10 | 24.52 / 18.56 | 10.24 / 8.25 |
| PV-RCNN-Waymo | Waymo only | 66.49 / 66.01 | 64.09 / 58.06 | 62.09 / 61.02 | 32.99 / 17.55 | 3.34 / 1.94 | 0.02 / 0.01 |
| PV-RCNN-DM | Direct Merging | 57.82 / 57.40 | 48.24 / 42.81 | 54.63 / 53.64 | 48.67 / 30.43 | 12.66 / 8.12 | 1.67 / 1.04 |
| PV-RCNN-Uni3D | Uni3D | 66.98 / 66.50 | 65.70 / 59.14 | 61.49 / 60.43 | 60.77 / 42.66 | 27.44 / 21.85 | 13.50 / 11.87 |
| PV-RCNN-DT | Domain Attention | 67.27 / 66.77 | 65.86 / 59.38 | 61.38 / 60.34 | 60.83 / 43.03 | 27.46 / 22.06 | 13.82 / 11.52 |

| Baseline | MDF Methods | Waymo@Vehicle | Waymo@Pedestrian | Waymo@Cyclist | nuScenes@Car | nuScenes@Pedestrian | nuScenes@Cyclist |
|---|---|---|---|---|---|---|---|
| Voxel-RCNN-nuScenes | nuScenes only | 31.89 / 31.65 | 3.74 / 2.57 | 2.41 / 2.37 | 53.63 / 39.05 | 22.48 / 17.85 | 10.86 / 9.70 |
| Voxel-RCNN-Waymo | Waymo only | 67.05 / 66.41 | 66.75 / 60.83 | 63.13 / 62.15 | 34.10 / 17.31 | 2.99 / 1.69 | 0.05 / 0.01 |
| Voxel-RCNN-DM | Direct Merging | 58.26 / 57.87 | 52.72 / 47.11 | 50.26 / 49.50 | 51.40 / 31.68 | 15.04 / 9.99 | 5.40 / 3.87 |
| Voxel-RCNN-Uni3D | Uni3D | 66.76 / 66.29 | 66.62 / 60.51 | 63.36 / 62.42 | 60.18 / 42.23 | 30.08 / 24.37 | 14.60 / 12.32 |
| Voxel-RCNN-DT | Domain Attention | 66.96 / 66.50 | 68.23 / 62.00 | 62.57 / 61.64 | 60.42 / 42.81 | 30.49 / 24.92 | 15.91 / 13.35 |

| Baseline | MDF Methods | Waymo@Vehicle | Waymo@Pedestrian | Waymo@Cyclist | nuScenes@Car | nuScenes@Pedestrian | nuScenes@Cyclist |
|---|---|---|---|---|---|---|---|
| PV-RCNN++-DM | Direct Merging | 63.79 / 63.38 | 55.03 / 49.75 | 59.88 / 58.99 | 50.91 / 31.46 | 17.07 / 12.15 | 3.10 / 2.20 |
| PV-RCNN++-Uni3D | Uni3D | 68.55 / 68.08 | 69.83 / 63.60 | 64.90 / 63.91 | 62.51 / 44.16 | 33.82 / 27.18 | 22.48 / 19.30 |
| PV-RCNN++-DT | Domain Attention | 68.51 / 68.05 | 69.81 / 63.58 | 64.39 / 63.43 | 62.33 / 44.16 | 33.44 / 26.94 | 21.64 / 18.52 |
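The Direct Merging (DM) baseline pools the samples of both datasets after mapping their class names into one shared label space. The sketch below shows that pooling step and tags each sample with its source dataset, the kind of domain information that dataset-aware methods such as Uni3D or domain attention can exploit. The class mappings and the sample dictionary layout are hypothetical examples, not the 3DTrans data format.

```python
from typing import Dict, List


def direct_merge(datasets: Dict[str, List[dict]],
                 class_maps: Dict[str, Dict[str, str]]) -> List[dict]:
    """Concatenate samples from several datasets into one unified-label list.

    datasets maps a dataset name to samples shaped like
    {"frame_id": str, "labels": [class_name, ...]}; class_maps maps each dataset's
    class names to shared names. Classes without a mapping are dropped.
    """
    merged = []
    for name, samples in datasets.items():
        mapping = class_maps[name]
        for sample in samples:
            merged.append({
                "frame_id": sample["frame_id"],
                "labels": [mapping[c] for c in sample["labels"] if c in mapping],
                "domain": name,  # remember which dataset the sample came from
            })
    return merged


# Hypothetical mapping into a shared Vehicle / Pedestrian / Cyclist label space.
CLASS_MAPS = {
    "waymo": {"Vehicle": "Vehicle", "Pedestrian": "Pedestrian", "Cyclist": "Cyclist"},
    "nuscenes": {"car": "Vehicle", "pedestrian": "Pedestrian", "bicycle": "Cyclist"},
}
```

Keeping the `domain` tag costs nothing for plain DM training but makes it possible to add dataset-specific components later without re-reading the data.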

AD-PT Results on Waymo

AD-PT shows strong generalization on 3D point clouds. We first pre-train the 3D and 2D backbones with AD-PT on the ONCE dataset (from 100K to 1M samples), then fine-tune the model on different datasets. Here, we report the fine-tuning results on Waymo.

| Model | Data amount | Overall | Vehicle | Pedestrian | Cyclist |
|---|---|---|---|---|---|
| SECOND (From scratch) | 3% | 52.00 / 37.70 | 58.11 / 57.44 | 51.34 / 27.38 | 46.57 / 28.28 |
| SECOND (AD-PT) | 3% | 55.41 / 51.78 | 60.53 / 59.93 | 54.91 / 45.78 | 50.79 / 49.65 |
| SECOND (From scratch) | 20% | 60.62 / 56.86 | 64.26 / 63.73 | 59.72 / 50.38 | 57.87 / 56.48 |
| SECOND (AD-PT) | 20% | 61.26 / 57.69 | 64.54 / 64.00 | 60.25 / 51.21 | 59.00 / 57.86 |
| CenterPoint (From scratch) | 3% | 59.00 / 56.29 | 57.12 / 56.57 | 58.66 / 52.44 | 61.24 / 59.89 |
| CenterPoint (AD-PT) | 3% | 61.21 / 58.46 | 60.35 / 59.79 | 60.57 / 54.02 | 62.73 / 61.57 |
| CenterPoint (From scratch) | 20% | 66.47 / 64.01 | 64.91 / 64.42 | 66.03 / 60.34 | 68.49 / 67.28 |
| CenterPoint (AD-PT) | 20% | 67.17 / 64.65 | 65.33 / 64.83 | 67.16 / 61.20 | 69.39 / 68.25 |
| PV-RCNN++ (From scratch) | 3% | 63.81 / 61.10 | 64.42 / 63.93 | 64.33 / 57.79 | 62.69 / 61.59 |
| PV-RCNN++ (AD-PT) | 3% | 68.33 / 65.69 | 68.17 / 67.70 | 68.82 / 62.39 | 68.00 / 67.00 |
| PV-RCNN++ (From scratch) | 20% | 69.97 / 67.58 | 69.18 / 68.75 | 70.88 / 65.21 | 69.84 / 68.77 |
| PV-RCNN++ (AD-PT) | 20% | 71.55 / 69.23 | 70.62 / 70.19 | 72.36 / 66.82 | 71.69 / 70.70 |
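Fine-tuning from AD-PT starts the downstream detector from pre-trained backbone weights. In this sketch, other modules such as the detection heads keep their fresh initialization. The snippet filters a pre-trained state dict down to the backbone parameters whose names and shapes match the downstream model. It assumes OpenPCDet-style module names (`backbone_3d`, `backbone_2d`). The helper `select_pretrained_weights` is hypothetical and is not the loading code shipped with AD-PT.

```python
from typing import Any, Callable, Dict, Iterable


def select_pretrained_weights(
    pretrained: Dict[str, Any],
    model_state: Dict[str, Any],
    prefixes: Iterable[str] = ("backbone_3d.", "backbone_2d."),
    shape_of: Callable[[Any], tuple] = lambda t: tuple(t.shape),
) -> Dict[str, Any]:
    """Keep pre-trained tensors that belong to the given modules and fit the model.

    A parameter is kept only if its name starts with one of `prefixes`, the
    downstream model has a parameter with the same name, and both shapes agree.
    """
    prefixes = tuple(prefixes)
    selected = {}
    for key, value in pretrained.items():
        if not key.startswith(prefixes):
            continue  # e.g. pre-training heads are not transferred
        if key in model_state and shape_of(model_state[key]) == shape_of(value):
            selected[key] = value
    return selected
```

With PyTorch, the result could be applied via `model.load_state_dict(selected, strict=False)`. That call returns the missing and unexpected keys, so you can check which parameters kept their initial values.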

ReSimAD

Here, we provide the Download Link for our reconstruction-simulation dataset generated by ReSimAD. It consists of nuScenes-like, KITTI-like, ONCE-like, and Waymo-like datasets of simulated, target-domain-like points.

Specifically, please refer to LiDARSimLib for the technical details of simulating target-domain-like points from the reconstructed meshes. For the perception module, please refer to PV-RCNN and PV-RCNN++ for model training and evaluation.

We report the zero-shot cross-dataset (Waymo-to-nuScenes) adaptation results using the BEV/3D AP performance as the evaluation metric for a fair comparison. Please refer to ReSimAD for more details.

| Methods | Training time | Adaptation | Car@R40 (BEV / 3D AP) | Ckpt |
|---|---|---|---|---|
| PV-RCNN | ~23 hours | Source-only | 31.02 / 17.75 | Not Available (Waymo License) |
| PV-RCNN | ~8 hours | ST3D | 36.42 / 22.99 | - |
| PV-RCNN | ~8 hours | ReSimAD | 37.85 / 21.33 | ReSimAD_ckpt |
| PV-RCNN++ | ~20 hours | Source-only | 29.93 / 18.77 | Not Available (Waymo License) |
| PV-RCNN++ | ~2.2 hours | ST3D | 34.68 / 17.17 | - |
| PV-RCNN++ | ~8 hours | ReSimAD | 40.73 / 23.72 | ReSimAD_ckpt |

Visualization Tools for 3DTrans

Acknowledgements

  • Our code is heavily based on OpenPCDet v0.5.2. Thanks to the OpenPCDet Development Team for their awesome codebase.

  • Our 3D point cloud pre-training task is based on the ONCE Dataset. Thanks to the ONCE Development Team for their inspiring data release.

Technical Papers

@software{Zhang_3DTrans_An_Open-source,
author = {Zhang, Bo and Yuan, Jiakang and Yan, Xiangchao},
license = {Apache-2.0},
title = {{3DTrans: An Open-source Codebase for Continuous Learning towards Autonomous Driving Task}},
url = {https://github.com/BOBrown/CCDA_LGFA}
}
@inproceedings{zhang2023uni3d,
  title={Uni3D: A Unified Baseline for Multi-dataset 3D Object Detection},
  author={Zhang, Bo and Yuan, Jiakang and Shi, Botian and Chen, Tao and Li, Yikang and Qiao, Yu},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={9253--9262},
  year={2023}
}
@inproceedings{yuan2023bi3d,
  title={Bi3D: Bi-domain Active Learning for Cross-domain 3D Object Detection},
  author={Yuan, Jiakang and Zhang, Bo and Yan, Xiangchao and Chen, Tao and Shi, Botian and Li, Yikang and Qiao, Yu},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={15599--15608},
  year={2023}
}
@article{yuan2023AD-PT,
  title={AD-PT: Autonomous Driving Pre-Training with Large-scale Point Cloud Dataset},
  author={Yuan, Jiakang and Zhang, Bo and Yan, Xiangchao and Chen, Tao and Shi, Botian and Li, Yikang and Qiao, Yu},
  journal={arXiv preprint arXiv:2306.00612},
  year={2023}
}
@inproceedings{huang2023sug,
  title={SUG: Single-dataset Unified Generalization for 3D Point Cloud Classification},
  author={Huang, Siyuan and Zhang, Bo and Shi, Botian and Gao, Peng and Li, Yikang and Li, Hongsheng},
  booktitle={Proceedings of the 31st ACM International Conference on Multimedia},
  year={2023}
}

Contributors

bobrown, jiakangyuan