zhangyp15 / occformer

[ICCV 2023] OccFormer: Dual-path Transformer for Vision-based 3D Semantic Occupancy Prediction

Home Page: https://arxiv.org/abs/2304.05316

License: Apache License 2.0

Languages: Python 90.73%, C++ 5.00%, Cuda 3.47%, Shell 0.67%, Jupyter Notebook 0.05%, Batchfile 0.04%, Makefile 0.03%, Dockerfile 0.02%, CSS 0.01%
Topics: autonomous-driving, multi-camera, occupancy-prediction, semantic-segmentation, semantic-scene-completion, semantic-scene-understanding

occformer's Introduction

OccFormer: Dual-path Transformer for Vision-based 3D Semantic Occupancy Prediction

News

  • [2023/04/20] Added more pretrained weights.
  • [2023/04/12] Paper is on arXiv.
  • [2023/04/11] Code and demo released.

Introduction

Vision-based perception for autonomous driving has undergone a transformation from bird's-eye-view (BEV) representations to 3D semantic occupancy. Compared with BEV planes, 3D semantic occupancy additionally provides structural information along the vertical direction. This paper presents OccFormer, a dual-path transformer network that effectively processes the 3D volume for semantic occupancy prediction. OccFormer achieves long-range, dynamic, and efficient encoding of the camera-generated 3D voxel features by decomposing the heavy 3D processing into local and global transformer pathways along the horizontal plane. For the occupancy decoder, we adapt the vanilla Mask2Former to 3D semantic occupancy by proposing preserve-pooling and class-guided sampling, which notably mitigate sparsity and class imbalance. Experimental results demonstrate that OccFormer significantly outperforms existing methods for semantic scene completion on the SemanticKITTI dataset and for LiDAR semantic segmentation on the nuScenes dataset.
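For intuition, below is a minimal sketch of the dual-path decomposition written from the paper's description, not the released implementation; all module names, shapes, and fusion details are illustrative assumptions.

import torch
import torch.nn as nn

class DualPathBlock(nn.Module):
    # Illustrative sketch of the dual-path idea (NOT the released OccFormer
    # code): the 3D volume (B, C, X, Y, Z) is processed only along the
    # horizontal plane, via a local path (attention within each horizontal
    # slice) and a global path (attention on a height-collapsed BEV plane
    # that is broadcast back to 3D).
    def __init__(self, dim, num_heads=8):
        super().__init__()
        self.local_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.global_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.fuse = nn.Linear(2 * dim, dim)

    def forward(self, x):  # x: (B, C, X, Y, Z)
        B, C, X, Y, Z = x.shape
        # Local path: self-attention within each horizontal slice.
        loc = x.permute(0, 4, 2, 3, 1).reshape(B * Z, X * Y, C)
        loc, _ = self.local_attn(loc, loc, loc)
        loc = loc.reshape(B, Z, X, Y, C).permute(0, 4, 2, 3, 1)
        # Global path: collapse height to BEV, attend globally, broadcast back.
        bev = x.mean(dim=4).flatten(2).transpose(1, 2)  # (B, X*Y, C)
        bev, _ = self.global_attn(bev, bev, bev)
        bev = bev.transpose(1, 2).reshape(B, C, X, Y)
        bev = bev.unsqueeze(-1).expand(-1, -1, -1, -1, Z)
        # Fuse the two pathways back into a 3D feature volume.
        out = torch.cat([loc, bev], dim=1).permute(0, 2, 3, 4, 1)
        return self.fuse(out).permute(0, 4, 1, 2, 3)  # (B, C, X, Y, Z)

For example, DualPathBlock(64)(torch.randn(1, 64, 16, 16, 4)) returns a (1, 64, 16, 16, 4) volume; the key point is that attention cost scales with the horizontal plane, not with the full 3D volume.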

[Figure: framework overview]

Demo

nuScenes:

[Demo video with legend]

SemanticKITTI:

[Demo video]

Benchmark Results

LiDAR Segmentation on the nuScenes test set: [Image: nusc_test]

Semantic Scene Completion on the SemanticKITTI test set: [Image: kitti_test]

Getting Started

[1] Check installation for installation instructions. Our code is mainly based on mmdetection3d.

[2] Check data_preparation for preparing SemanticKITTI and nuScenes datasets.

[3] Check train_and_eval for training and evaluation.

[4] Check predict_and_visualize for prediction and visualization.

[5] Check test_submission for preparing the test submission to SemanticKITTI SSC and nuScenes LiDAR Segmentation.

Model Zoo

We provide pretrained weights for the SemanticKITTI and nuScenes datasets, reproduced with the released codebase.

Dataset | Backbone | SC IoU | SSC mIoU | LiDARSeg mIoU | Model Weights | Training Logs
SemanticKITTI | EfficientNetB7 | 36.42 (val), 34.46 (test) | 13.50 (val), 12.37 (test) | - | Link | Link
nuScenes | R50 | - | - | 68.1 | Link | Link
nuScenes | R101-DCN | - | - | 70.0 | Link | Link

For the SemanticKITTI dataset, the validation performance may fluctuate between roughly 13.2 and 13.6 SSC mIoU, given the limited number of training samples.

Related Projects

TPVFormer: Tri-perspective view (TPV) representation for 3D semantic occupancy.

OpenOccupancy: A large-scale benchmark extending nuScenes for surrounding semantic occupancy perception.

Acknowledgement

This project is developed based on the following open-source projects: MonoScene, BEVDet, BEVFormer, Mask2Former. Thanks for their excellent work.

Citation

If you find this project helpful, please consider giving this repo a star or citing the following paper:

@article{zhang2023occformer,
  title={OccFormer: Dual-path Transformer for Vision-based 3D Semantic Occupancy Prediction},
  author={Zhang, Yunpeng and Zhu, Zheng and Du, Dalong},
  journal={arXiv preprint arXiv:2304.05316},
  year={2023}
}

occformer's People

Contributors: yunpengzhangphigent, zhangyp15

occformer's Issues

“KeyError: 'MaskHungarianAssigner is already registered in bbox_assigner'”

Hi,
Thank you for sharing the great work!
When I run the script bash tools/dist_test.sh projects/configs/occformer_nusc/occformer_nusc_r50_256x704.py /rockywin.wang/occ_net/OccFormer/occformer_nusc_r50.pth 4 --pred-save /rockywin.wang/occ_net/OccFormer/01_pred_dir,
I get the error below.
The MMCV version is 1.4.0.

te/attempt_3/0/error.json
Traceback (most recent call last):
  File "tools/test.py", line 19, in <module>
    from projects.mmdet3d_plugin.datasets.builder import build_dataloader
  File "/rockywin.wang/occ_net/OccFormer/projects/mmdet3d_plugin/__init__.py", line 1, in <module>
    from .occformer import *
  File "/rockywin.wang/occ_net/OccFormer/projects/mmdet3d_plugin/occformer/__init__.py", line 5, in <module>
    from .mask2former import *
  File "/rockywin.wang/occ_net/OccFormer/projects/mmdet3d_plugin/occformer/mask2former/__init__.py", line 2, in <module>
    from .assigners import *
  File "/rockywin.wang/occ_net/OccFormer/projects/mmdet3d_plugin/occformer/mask2former/assigners/__init__.py", line 1, in <module>
    from .mask_hungarian_assigner import MaskHungarianAssigner
  File "/rockywin.wang/occ_net/OccFormer/projects/mmdet3d_plugin/occformer/mask2former/assigners/mask_hungarian_assigner.py", line 12, in <module>
    class MaskHungarianAssigner(BaseAssigner):
  File "/opt/conda/lib/python3.7/site-packages/mmcv/utils/registry.py", line 312, in _register
    module_class=cls, module_name=name, force=force)
  File "/opt/conda/lib/python3.7/site-packages/mmcv/utils/registry.py", line 246, in _register_module
    raise KeyError(f'{name} is already registered '
KeyError: 'MaskHungarianAssigner is already registered in bbox_assigner'
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 172562) of binary: /opt/conda/bin/python
ERROR:torch.distributed.elastic.agent.server.local_elastic_agent:[default] Worker group failed
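Not an official fix, but duplicate-key errors in the mmcv registry are commonly worked around by registering the plugin class with force=True, so a re-import overwrites the earlier entry instead of raising. A sketch against mask_hungarian_assigner.py:

# Sketch of a common mmcv workaround (an assumption, not the authors' fix):
# force=True lets this class overwrite an existing 'bbox_assigner' entry
# instead of raising KeyError when the plugin is imported a second time.
from mmdet.core.bbox.builder import BBOX_ASSIGNERS
from mmdet.core.bbox.assigners import BaseAssigner

@BBOX_ASSIGNERS.register_module(force=True)
class MaskHungarianAssigner(BaseAssigner):
    ...  # body unchanged from the repo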

ModuleNotFoundError: No module named 'visualize_nusc_release'

Hi,
When I run the script python projects/mmdet3d_plugin/visualize/visualize_nusc_video.py $YOUR_SAVE_PATH $YOUR_VISUALIZE_SAVE_PATH, I get the error below.

Traceback (most recent call last):
  File "projects/mmdet3d_plugin/visualize/visualize_nusc_video.py", line 8, in <module>
    from visualize_nusc_release import draw_nusc_occupancy
ModuleNotFoundError: No module named 'visualize_nusc_release'

I cannot find and import the visualize_nusc_release module.
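The script imports a sibling file by its bare module name, which only resolves when the script's own directory is on sys.path (e.g. when running from inside projects/mmdet3d_plugin/visualize/). A hedged workaround is to add that directory explicitly:

# Sketch of a workaround: make the script's directory importable before the
# sibling import at the top of visualize_nusc_video.py.
import os
import sys
sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))

from visualize_nusc_release import draw_nusc_occupancy  # now resolvable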

When I try to execute the code after installing PyQt5 using pip install PyQT5, the following error occurs.

QObject::moveToThread: Current thread (0xd4bcc0) is not the object's thread (0x1416720).
Cannot move to target thread (0xd4bcc0)

qt.qpa.plugin: Could not load the Qt platform plugin "xcb" in "/home/limlab/anaconda3/envs/occformer/lib/python3.7/site-packages/cv2/qt/plugins" even though it was found.
This application failed to start because no Qt platform plugin could be initialized. Reinstalling the application may fix this problem.

Available platform plugins are: xcb, eglfs, linuxfb, minimal, minimalegl, offscreen, vnc, wayland-egl, wayland, wayland-xcomposite-egl, wayland-xcomposite-glx, webgl.


Aborted (core dumped)

Please help me.
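The Qt failure is a common clash between the Qt plugins bundled with opencv-python and the system/PyQt5 ones, not something specific to this repo; forcing a headless Qt platform before cv2 is imported (or switching to opencv-python-headless) usually avoids it. A sketch of that common workaround, not an official fix:

# Pick a headless Qt platform before cv2/PyQt5 load their plugins, so the
# bundled "xcb" plugin is never initialized. "offscreen" appears in the list
# of available platforms in the error message above.
import os
os.environ["QT_QPA_PLATFORM"] = "offscreen"

import cv2  # import only after the environment variable is set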

furthest_point_sample.py error

Hi, thank you for sharing your work.
However, when I use
python tools/create_data.py nuscenes --root-path ./data/nuscenes --out-dir ./data --version v1.0 --canbus ./data
to generate the info, something goes wrong:
ImportError: /home/guixingtai/anaconda3/envs/occformer/lib/python3.7/site-packages/mmdet3d-0.17.1-py3.7-linux-x86_64.egg/mmdet3d/ops/furthest_point_sample/furthest_point_sample_ext.cpython-37m-x86_64-linux-gnu.so: undefined symbol: _Z39furthest_point_sampling_kernel_launcheriiiPKfPfPiP11CUstream_st

Can you help me?
Thanks!
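An undefined CUDA symbol in a compiled extension usually means mmdet3d was built without its CUDA kernels (for example, in an environment where CUDA was not visible at build time); rebuilding mmdet3d from source in the current environment typically resolves it. A small diagnostic sketch (assumes mmdet3d 0.17.1 installed from source):

import torch
print(torch.__version__, torch.version.cuda, torch.cuda.is_available())

# This import reproduces the undefined-symbol error if the extension was
# built without the .cu kernels; after rebuilding mmdet3d in the current
# environment it should succeed.
from mmdet3d.ops.furthest_point_sample import furthest_point_sample_ext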

Expected dtype object, got 'numpy.dtype[uint8]'

When I try to run test.py with one GPU, the terminal reports "Expected dtype object, got 'numpy.dtype[uint8]'" at line 107 of projects/mmdet3d_plugin/datasets/pipelines/loading_nusc_occ.py. But I have checked lines 122-123, nb.jit('u1[:,:,:](u1[:,:,:],i8[:,:])', nopython=True, cache=True, parallel=False), and the datatypes match processed_label and label_voxel_pair. Can you help me solve this issue? Thanks!
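For reference, this particular error is usually a numba/NumPy version mismatch (older numba releases cannot parse the dtype objects introduced in NumPy 1.20) rather than a problem with the arrays themselves. A minimal reproduction sketch with a hypothetical stand-in body:

import numba as nb
import numpy as np

print(nb.__version__, np.__version__)  # roughly, numba >= 0.53 is needed for NumPy >= 1.20

# If this trivially-bodied decorated function already raises
# "Expected dtype object, got 'numpy.dtype[uint8]'", the environment
# (not the data pipeline) is at fault.
@nb.jit('u1[:,:,:](u1[:,:,:], i8[:,:])', nopython=True, cache=True, parallel=False)
def nb_check(processed_label, label_voxel_pair):
    return processed_label  # stand-in; the repo's function aggregates labels per voxel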

Can OccFormer have a per-voxel segmentation config instead of mask-cls?

Hi there, I was wondering whether the Mask2FormerOccHead can be adapted to use a simple per-voxel loss such as CE (or the Geo-Scal loss from MonoScene). I find that the mask-and-cls paradigm of Mask2Former makes it somewhat difficult to apply known loss functions in voxel space. Can you think of a way to adapt this in your code? Thank you.
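Not an answer from the authors, but for reference: once a decoder emits dense per-voxel logits, plain cross-entropy in voxel space is straightforward. A sketch with assumed shapes and an assumed ignore index of 255 (neither is taken from the repo):

import torch
import torch.nn.functional as F

# Sketch: per-voxel cross-entropy on dense occupancy logits.
# Assumed shapes: logits (B, C, X, Y, Z), labels (B, X, Y, Z) with
# 255 marking unlabeled/unknown voxels.
def per_voxel_ce(voxel_logits, voxel_labels):
    return F.cross_entropy(voxel_logits, voxel_labels, ignore_index=255)

loss = per_voxel_ce(torch.randn(2, 17, 128, 128, 16),
                    torch.randint(0, 17, (2, 128, 128, 16)))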

fov_mask

I want to visualize a photo with the mask, but I don't know how to generate the fov_mask (the photo comes from SemanticKITTI). Where is this method provided in the OccFormer source code? Please help; I have had a terrible time with this problem.
[Image: mask example]
I need to generate this kind of picture with the visibility shadow.
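Not an official answer, but field-of-view masks for SemanticKITTI are usually built by projecting every voxel center through the camera calibration and keeping the voxels that land inside the image (MonoScene, which this project builds on, follows this pattern). A sketch with illustrative names:

import numpy as np

def compute_fov_mask(vox_origin, voxel_size, dims, cam_K, T_velo_to_cam, img_shape):
    # Sketch with illustrative argument names (not from the OccFormer code):
    # keep voxels whose centers project inside the camera image.
    X, Y, Z = dims
    xv, yv, zv = np.meshgrid(np.arange(X), np.arange(Y), np.arange(Z), indexing='ij')
    centers = np.stack([xv, yv, zv], axis=-1).reshape(-1, 3).astype(np.float64)
    centers = centers * voxel_size + vox_origin + voxel_size / 2.0  # centers, LiDAR frame
    homo = np.concatenate([centers, np.ones((centers.shape[0], 1))], axis=1)
    cam_pts = (T_velo_to_cam @ homo.T).T[:, :3]      # LiDAR -> camera frame
    in_front = cam_pts[:, 2] > 0                     # points behind the camera are invisible
    pix = (cam_K @ cam_pts.T).T
    pix = pix[:, :2] / np.maximum(pix[:, 2:3], 1e-6)  # perspective divide
    H, W = img_shape
    inside = (pix[:, 0] >= 0) & (pix[:, 0] < W) & (pix[:, 1] >= 0) & (pix[:, 1] < H)
    return (in_front & inside).reshape(X, Y, Z)      # boolean fov mask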

Clarification on Training Data for the Reported Results on SemanticKITTI Test

Hello,

I would like to seek clarification regarding the training data used for the reported results on the SemanticKITTI test set. Specifically, I would like to know whether the results were obtained by training solely on the "train" subset or by combining both the "train" and "val" subsets.

Thanks for your assistance in providing this information.

Efficiency of the method

Congratulations on the amazing work.

I am curious about the efficiency of the method. Could you shed some light on the inference time and the memory requirement?
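The README does not ship a benchmark script, but inference latency and peak memory of any model can be measured with standard PyTorch utilities. A generic sketch, where model and inputs are placeholders and a CUDA device is assumed:

import torch

@torch.no_grad()
def benchmark(model, inputs, warmup=10, iters=50):
    # Generic latency / peak-memory measurement; not a script from this repo.
    torch.cuda.reset_peak_memory_stats()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    for _ in range(warmup):          # warm up kernels and allocator
        model(inputs)
    torch.cuda.synchronize()
    start.record()
    for _ in range(iters):
        model(inputs)
    end.record()
    torch.cuda.synchronize()
    ms_per_iter = start.elapsed_time(end) / iters
    peak_gb = torch.cuda.max_memory_allocated() / 1024 ** 3
    return ms_per_iter, peak_gb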

What's the results without the query-based Transformer decoder?

Congratulations on creating such an excellent and solid work!

However, I'm wondering about the results achieved without the query-based Transformer decoder, or in other words, the isolated impact of the Transformer Occupancy Decoder.
Given that the Dual-path Transformer Encoder guides the Voxel Features with the BEV feature, it seems that the voxel features should already possess sufficient fine-grained features for SSC.
Additionally, the absence of instance-level annotations could possibly reduce the impact of the Transformer Decoder and one-to-one matching.

I would appreciate any insights you may have on this matter.

“AttributeError: 'NuScenes' object has no attribute 'lidarseg_name2idx_mapping'”

Hi,
When I ran the script python projects/mmdet3d_plugin/tools/validate_lidarseg_submission.py --result-path $YOUR_SAVE_PATH --dataroot data/nuscenes --zip-out ., I got the error below.

  File "/rockywin.wang/occ_net/OccFormer/projects/mmdet3d_plugin/tools/validate_lidarseg_submission_my.py", line 158, in <module>
    zip_out=zip_out_)
  File "/rockywin.wang/occ_net/OccFormer/projects/mmdet3d_plugin/tools/validate_lidarseg_submission_my.py", line 35, in validate_submission
    mapper = LidarsegClassMapper(nusc)
  File "/opt/conda/lib/python3.7/site-packages/nuscenes/eval/lidarseg/utils.py", line 140, in __init__
    self.fine_idx_2_coarse_idx_mapping = self.get_fine_idx_2_coarse_idx()
  File "/opt/conda/lib/python3.7/site-packages/nuscenes/eval/lidarseg/utils.py", line 241, in get_fine_idx_2_coarse_idx
    for fine_name, fine_idx in self.nusc.lidarseg_name2idx_mapping.items():
AttributeError: 'NuScenes' object has no attribute 'lidarseg_name2idx_mapping'

Below is my data structure:
[Image: data directory structure]
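For context: the devkit only attaches the lidarseg tables (and hence lidarseg_name2idx_mapping) when the nuScenes-lidarseg expansion files are present under dataroot. A quick hedged check, with the version name assumed to be v1.0-test for the submission split and file locations as commonly laid out by the expansion:

import os
from nuscenes import NuScenes

dataroot = 'data/nuscenes'
# The lidarseg expansion places lidarseg.json in the version folder and the
# .bin label files under a top-level lidarseg/ directory (assumed layout).
print(os.path.isfile(os.path.join(dataroot, 'v1.0-test', 'lidarseg.json')))
print(os.path.isdir(os.path.join(dataroot, 'lidarseg')))

nusc = NuScenes(version='v1.0-test', dataroot=dataroot, verbose=True)
print(hasattr(nusc, 'lidarseg_name2idx_mapping'))  # False -> expansion missing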

depth

[Image: depth visualization result]

Hello, I ran the code for visualizing gt_depths overlaid on the image; why does the output look this way? I have not worked on depth estimation before, but the depth images I have seen do not look like this.
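For context (not the repo's exact code): ground-truth depth in such pipelines typically comes from projecting sparse LiDAR returns into the image, so a correct visualization looks like scattered colored dots over mostly empty pixels rather than a dense depth map. A minimal projection sketch with illustrative names:

import numpy as np

def lidar_to_depth_map(points_cam, cam_K, img_shape):
    # Sketch: rasterize LiDAR points already in the camera frame, (N, 3),
    # into a sparse depth map. Names are illustrative, not from the repo.
    H, W = img_shape
    depth = np.zeros((H, W), dtype=np.float32)      # 0 = no LiDAR return
    pts = points_cam[points_cam[:, 2] > 0]          # keep points in front of the camera
    pix = (cam_K @ pts.T).T
    u = (pix[:, 0] / pix[:, 2]).astype(np.int64)
    v = (pix[:, 1] / pix[:, 2]).astype(np.int64)
    keep = (u >= 0) & (u < W) & (v >= 0) & (v < H)
    depth[v[keep], u[keep]] = pts[keep, 2]          # sparse: most pixels stay 0
    return depth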
