[CVPR 2024] Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications

License: MIT License


cam4docc's Introduction

Cam4DOcc

The official code and data for the benchmark with baselines of our paper: Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications

This work has been accepted by CVPR 2024 πŸŽ‰

Junyi Ma#, Xieyuanli Chen#, Jiawei Huang, Jingyi Xu, Zhen Luo, Jintao Xu, Weihao Gu, Rui Ai, Hesheng Wang*

Citation

If you use Cam4DOcc in an academic work, please cite our paper:

@inproceedings{ma2024cvpr,
	author = {Junyi Ma and Xieyuanli Chen and Jiawei Huang and Jingyi Xu and Zhen Luo and Jintao Xu and Weihao Gu and Rui Ai and Hesheng Wang},
	title = {{Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications}},
	booktitle = {Proc.~of the IEEE/CVF Conf.~on Computer Vision and Pattern Recognition (CVPR)},
	year = 2024
}

Installation

We follow the installation instructions of our codebase OpenOccupancy, which are also posted here:
  • Create a conda virtual environment and activate it
conda create -n cam4docc python=3.7 -y
conda activate cam4docc
  • Install PyTorch and torchvision (tested on torch==1.10.1 & cuda=11.3)
conda install pytorch==1.10.1 torchvision==0.11.2 torchaudio==0.10.1 cudatoolkit=11.3 -c pytorch -c conda-forge
  • Install gcc>=5 in conda env
conda install -c omgarcia gcc-6
  • Install mmcv, mmdet, and mmseg
pip install mmcv-full==1.4.0
pip install mmdet==2.14.0
pip install mmsegmentation==0.14.1
  • Install mmdet3d from the source code
git clone https://github.com/open-mmlab/mmdetection3d.git
cd mmdetection3d
git checkout v0.17.1 # Other versions may not be compatible.
python setup.py install
  • Install other dependencies
pip install timm
pip install open3d-python
pip install PyMCubes
pip install spconv-cu113
pip install fvcore
pip install setuptools==59.5.0

pip install lyft_dataset_sdk # for lyft dataset
  • Install occupancy pooling
git clone [email protected]:haomo-ai/Cam4DOcc.git
cd Cam4DOcc
export PYTHONPATH="."
python setup.py develop
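
After installation, a quick sanity check of the pinned versions can catch mismatches early. A minimal sketch; the expected versions follow from the steps above:

# check_env.py -- verify the pinned environment
import torch
import mmcv, mmdet, mmseg

print(torch.__version__, torch.version.cuda)   # expect 1.10.1 and 11.3
print(mmcv.__version__)                        # expect 1.4.0
print(mmdet.__version__, mmseg.__version__)    # expect 2.14.0 and 0.14.1
print(torch.cuda.is_available())               # should be True on a GPU machine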

Data Structure

nuScenes dataset

  • Please link your nuScenes dataset and the nuScenes-Occupancy annotations to the data folder.

Lyft dataset

  • Please link your Lyft dataset to the data folder.
  • The required folders are listed below.

Note that the folders under cam4docc will be generated automatically the first time you run our training or evaluation scripts.

Cam4DOcc
β”œβ”€β”€ data/
β”‚   β”œβ”€β”€ nuscenes/
β”‚   β”‚   β”œβ”€β”€ maps/
β”‚   β”‚   β”œβ”€β”€ samples/
β”‚   β”‚   β”œβ”€β”€ sweeps/
β”‚   β”‚   β”œβ”€β”€ lidarseg/
β”‚   β”‚   β”œβ”€β”€ v1.0-test/
β”‚   β”‚   β”œβ”€β”€ v1.0-trainval/
β”‚   β”‚   β”œβ”€β”€ nuscenes_occ_infos_train.pkl
β”‚   β”‚   β”œβ”€β”€ nuscenes_occ_infos_val.pkl
β”‚   β”œβ”€β”€ nuScenes-Occupancy/
β”‚   β”œβ”€β”€ lyft/
β”‚   β”‚   β”œβ”€β”€ maps/
β”‚   β”‚   β”œβ”€β”€ train_data/
β”‚   β”‚   β”œβ”€β”€ images/   # from train images, containing xxx.jpeg
β”‚   β”œβ”€β”€ cam4docc
β”‚   β”‚   β”œβ”€β”€ GMO/
β”‚   β”‚   β”‚   β”œβ”€β”€ segmentation/
β”‚   β”‚   β”‚   β”œβ”€β”€ instance/
β”‚   β”‚   β”‚   β”œβ”€β”€ flow/
β”‚   β”‚   β”œβ”€β”€ MMO/
β”‚   β”‚   β”‚   β”œβ”€β”€ segmentation/
β”‚   β”‚   β”‚   β”œβ”€β”€ instance/
β”‚   β”‚   β”‚   β”œβ”€β”€ flow/
β”‚   β”‚   β”œβ”€β”€ GMO_lyft/
β”‚   β”‚   β”‚   β”œβ”€β”€ ...
β”‚   β”‚   β”œβ”€β”€ MMO_lyft/
β”‚   β”‚   β”‚   β”œβ”€β”€ ...

Alternatively, you can manually modify the path parameters in the config files instead of using the default data structure. The defaults are listed here:

occ_path = "./data/nuScenes-Occupancy"
depth_gt_path = './data/depth_gt'
train_ann_file = "./data/nuscenes/nuscenes_occ_infos_train.pkl"
val_ann_file = "./data/nuscenes/nuscenes_occ_infos_val.pkl"
cam4docc_dataset_path = "./data/cam4docc/"
nusc_root = './data/nuscenes/'
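
If you prefer to adjust these paths programmatically, they can be inspected or overridden via mmcv's standard Config API. A sketch; the keys are the config variables listed above, and the alternative path is hypothetical:

from mmcv import Config

cfg = Config.fromfile('./projects/configs/baselines/OCFNet_in_Cam4DOcc_V1.1.py')
print(cfg.occ_path)                            # './data/nuScenes-Occupancy' by default
cfg.occ_path = '/mnt/data/nuScenes-Occupancy'  # hypothetical alternative location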

Training and Evaluation

We integrate the Cam4DOcc dataset generation pipeline directly into the dataloader, so you can simply run the training or evaluation scripts and wait 😏

Optionally, you can set only_generate_dataset=True in the config files to only generate the Cam4DOcc data without model training and inference.
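
For instance, the relevant line in a config file would look like this (only the flag name is confirmed by this README; where to place it follows the existing config):

only_generate_dataset = True   # generate the Cam4DOcc data, skip model training and inference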

Train OCFNetV1.1 with 8 GPUs

OCFNetV1.1 can forecast inflated GMO and others. In this case, vehicles and humans are treated as one unified category.

For the nuScenes dataset, please run

bash run.sh ./projects/configs/baselines/OCFNet_in_Cam4DOcc_V1.1.py 8

For the Lyft dataset, please run

bash run.sh ./projects/configs/baselines/OCFNet_in_Cam4DOcc_V1.1_lyft.py 8

Train OCFNetV1.2 with 8 GPUs

OCFNetV1.2 can forecast inflated GMO including bicycle, bus, car, construction vehicle, motorcycle, trailer, truck, pedestrian, and others. In this case, vehicles and humans are divided into multiple categories for a clearer evaluation of forecasting performance (see the class-split sketch below).
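
For reference, the class split implied above matches num_cls = 9 for V1.2 in the Basic Information table. The exact class-name strings in the configs may differ; construction_vehicle follows nuScenes naming and is an assumption:

# Sketch of the V1.2 GMO class split; the name strings are illustrative
gmo_classes = ['bicycle', 'bus', 'car', 'construction_vehicle',
               'motorcycle', 'trailer', 'truck', 'pedestrian']
num_cls = len(gmo_classes) + 1   # plus 'others' -> 9, matching num_cls for V1.2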

For the nuScenes dataset, please run

bash run.sh ./projects/configs/baselines/OCFNet_in_Cam4DOcc_V1.2.py 8

For the Lyft dataset, please run

bash run.sh ./projects/configs/baselines/OCFNet_in_Cam4DOcc_V1.2_lyft.py 8
  • Training and testing will be several times faster once the dataset has been generated during the first epoch.

Test OCFNet for different tasks

If you only want to test the performance of occupancy prediction for the present frame (current observation), please set test_present=True in the config files. Otherwise, forecasting performance over the future interval is evaluated.

bash run_eval.sh $PATH_TO_CFG $PATH_TO_CKPT $GPU_NUM
# e.g. bash run_eval.sh ./projects/configs/baselines/OCFNet_in_Cam4DOcc_V1.1.py ./work_dirs/OCFNet_in_Cam4DOcc_V1.1/epoch_20.pth  8

Please set save_pred and save_path in the config files if you need to save prediction results.
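
Put together, the evaluation-related flags in a config might look as follows. The flag names test_present, save_pred, and save_path come from this README; the values shown are illustrative:

test_present = True            # evaluate present-frame occupancy only
save_pred = True               # save prediction results
save_path = './output/preds/'  # hypothetical output directory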

VPQ evaluation of 3D instance prediction will be refined in the future.

Visualization

Please install the dependencies as follows:

sudo apt-get install Xvfb
pip install xvfbwrapper
pip install mayavi

Xvfb may be needed for visualization on a headless server.
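
If mayavi cannot open a display on your server, wrapping the rendering in a virtual display is one option. A sketch using xvfbwrapper's standard context manager; the offscreen setting and output filename are illustrative, not taken from the viz scripts:

from xvfbwrapper import Xvfb

with Xvfb():                       # start a virtual X display for headless rendering
    from mayavi import mlab
    mlab.options.offscreen = True  # render without opening a window
    # ... build the occupancy scene here, then save it instead of showing it:
    # mlab.savefig('occ_viz.png')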

Visualize ground-truth occupancy labels. Set show_time_change = True if you want to show the changing state of occupancy in time intervals.

cd viz
python viz_gt.py

Visualize occupancy forecasting results. Set show_time_change = True if you want to show the changing state of occupancy in time intervals.

cd viz
python viz_pred.py

There is still room for improvement. Camera-only 4D occupancy forecasting remains challenging, especially for predicting over longer time intervals with many moving objects. We envision this benchmark as a valuable evaluation tool, and our OCFNet can serve as a foundational codebase for future research on 4D occupancy forecasting.

Basic Information

Some basic information as well as key parameters of our current version are listed below.

Type                 Info                                            Parameter
train                23,930 sequences                                train_capacity
val                  5,119 frames                                    test_capacity
voxel size           0.2 m                                           voxel_x/y/z
range                [-51.2 m, -51.2 m, -5 m, 51.2 m, 51.2 m, 3 m]   point_cloud_range
volume size          [512, 512, 40]                                  occ_size
classes              2 for V1.1 / 9 for V1.2                         num_cls
observation frames   3                                               time_receptive_field
future frames        4                                               n_future_frames
extension frames     6                                               n_future_frames_plus
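
Note that the volume size follows directly from the range and voxel size, which gives a quick consistency check whenever you change either. A worked sketch:

# occ_size is derived from point_cloud_range and the voxel size
point_cloud_range = [-51.2, -51.2, -5.0, 51.2, 51.2, 3.0]  # [x_min, y_min, z_min, x_max, y_max, z_max] in meters
voxel_size = 0.2
occ_size = [round((point_cloud_range[i + 3] - point_cloud_range[i]) / voxel_size) for i in range(3)]
print(occ_size)  # [512, 512, 40]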

Our proposed OCFNet can still perform well when trained with partial data. Try decreasing train_capacity if you want to explore sparser supervision signals.

In addition, please make sure that n_future_frames_plus <= time_receptive_field + n_future_frames, because n_future_frames_plus is the actual number of predicted frames: the model also estimates past frames rather than only the n_future_frames future ones.
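
In code form, the constraint reads as a simple assertion over the defaults above (a sketch):

time_receptive_field = 3   # observed past frames
n_future_frames = 4        # future frames in the evaluation interval
n_future_frames_plus = 6   # frames actually predicted, including past ones
assert n_future_frames_plus <= time_receptive_field + n_future_frames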

Pretrained Models

We will provide pretrained models for the erratum version. Your patience is appreciated.

Deprecated:

Please download our pretrained models (for epoch=20) to resume training or reproduce results.

Version   Google Drive   Baidu Cloud   Config
V1.0      link           link          only vehicle
V1.1      link           link          OCFNet_in_Cam4DOcc_V1.1.py
V1.2      link           link          OCFNet_in_Cam4DOcc_V1.2.py

Other Baselines

We also provide evaluations of the forecasting performance of other baselines on Cam4DOcc.

TODO

The tutorial is being updated ...

We will release our pretrained models as soon as possible. OCFNetV1.3 and OCFNetV2 are on their way ...

Acknowledgement

We thank the fantastic works OpenOccupancy, PowerBEV, and FIERY for their pioneering code releases, which provide the codebase for this benchmark.

cam4docc's People

Contributors

bit-mjy, chen-xieyuanli

cam4docc's Issues

Possible issue with feature warping

In line 293 of ocfnet.py, I think transformed_grid.shape[1] = 4, which may indicate that only 4 voxel grid features are warped from x into wraped_x. I am not sure I understand correctly, but I think the for loop should iterate over the number of voxels. Could you check this part again?

4D annotations

Hi, could you provide the 4D annotations as zip files?
Generating the dataset is taking longer than a week.

Thanks

How many GPU resources does this task require in total?

Hi! I noticed that your script uses 8 GPUs for both training and testing. How many GPU resources are needed in total to reproduce the results of this task (training and testing)? Thanks for looking into this!

How to change the grid_size?

Hello! I have read this paper, which is an impressive work. However, I want to run your code on my computer (RTX 4090), and the default grid size of 512 x 512 x 40 is too big for it.

I have tried changing the grid_size to 200 x 200 x 16 and it runs properly. However, when I run viz_gt.py, I get an unexpected visualization result (screenshots omitted).

about computation cost

Hi, thanks for your great work.

Can you tell me approximately how much GPU memory is needed for this task?

how much GPU memory do we need?

Thanks for your release. I only have one GPU with 48 GB of memory, so I ran

bash run.sh ./projects/configs/baselines/OCFNet_in_Cam4DOcc_V1.1.py 1

but it seems my GPU is not enough:

RuntimeError: CUDA out of memory. Tried to allocate 198.00 MiB (GPU 0; 47.54 GiB total capacity; 45.25 GiB already allocated; 18.19 MiB free; 45.46 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Is there a way to run this code with a single 48 GB GPU?

About training time

Hi,
I am trying to run the training code in this repository with 5 GPUs, but as training continues, it requires more and more time. Can you help me with it?


Error with running a baseline

I experience the following problem when using config OCFNet_in_Cam4DOcc_V1.2.py:

  File "/home/eitan/Cam4DOcc/projects/occ_plugin/occupancy/image2bev/ViewTransformerLSSBEVDepth.py", line 495, in forward
    depth = self.depth_conv(depth)
  File "/fs/scratch/rb_bd_dlp_rng-dl01_cr_AID_employees/users/eitan/envs/cam4docc/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/fs/scratch/rb_bd_dlp_rng-dl01_cr_AID_employees/users/eitan/envs/cam4docc/lib/python3.7/site-packages/torch/nn/modules/container.py", line 141, in forward
    input = module(input)
  File "/fs/scratch/rb_bd_dlp_rng-dl01_cr_AID_employees/users/eitan/envs/cam4docc/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/fs/scratch/rb_bd_dlp_rng-dl01_cr_AID_employees/users/eitan/envs/cam4docc/lib/python3.7/site-packages/mmcv/ops/deform_conv.py", line 378, in forward
    False, self.im2col_step)
  File "/fs/scratch/rb_bd_dlp_rng-dl01_cr_AID_employees/users/eitan/envs/cam4docc/lib/python3.7/site-packages/mmcv/ops/deform_conv.py", line 109, in forward
    im2col_step=cur_im2col_step)
RuntimeError: CUDA error: no kernel image is available for execution on the device

What could be the problem? The installation seemed to work fine without any errors.

Where is the segmentation folder?

Traceback (most recent call last):
  File "tools/train.py", line 204, in <module>
    main()
  File "tools/train.py", line 200, in main
    meta=meta)
  File "/home/glj/roseupram/Cam4DOcc/projects/occ_plugin/occupancy/apis/train.py", line 27, in custom_train_model
    meta=meta)
  File "/home/glj/roseupram/Cam4DOcc/projects/occ_plugin/occupancy/apis/mmdet_train.py", line 113, in custom_train_detector
    runner.run(data_loaders, cfg.workflow)
  File "/home/glj/miniconda3/envs/cam4docc/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 127, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/home/glj/miniconda3/envs/cam4docc/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 47, in train
    for i, data_batch in enumerate(self.data_loader):
  File "/home/glj/miniconda3/envs/cam4docc/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 521, in __next__
    data = self._next_data()
  File "/home/glj/miniconda3/envs/cam4docc/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1203, in _next_data
    return self._process_data(data)
  File "/home/glj/miniconda3/envs/cam4docc/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1229, in _process_data
    data.reraise()
  File "/home/glj/miniconda3/envs/cam4docc/lib/python3.7/site-packages/torch/_utils.py", line 434, in reraise
    raise exception
FileNotFoundError: Caught FileNotFoundError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/glj/miniconda3/envs/cam4docc/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/glj/miniconda3/envs/cam4docc/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/glj/miniconda3/envs/cam4docc/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/glj/roseupram/Cam4DOcc/projects/occ_plugin/datasets/cam4docc_dataset.py", line 100, in __getitem__
    data = self.prepare_train_data(idx)
  File "/home/glj/roseupram/Cam4DOcc/projects/occ_plugin/datasets/cam4docc_dataset.py", line 312, in prepare_train_data
    example = self.prepare_sequential_data(index)
  File "/home/glj/roseupram/Cam4DOcc/projects/occ_plugin/datasets/cam4docc_dataset.py", line 370, in prepare_sequential_data
    example = self.pipeline(input_seq_data)
  File "/home/glj/miniconda3/envs/cam4docc/lib/python3.7/site-packages/mmdet/datasets/pipelines/compose.py", line 40, in __call__
    data = t(data)
  File "/home/glj/roseupram/Cam4DOcc/projects/occ_plugin/datasets/pipelines/loading_instance.py", line 392, in __call__
    np.savez(seg_label_path, segmentation_saved_list2)
  File "<__array_function__ internals>", line 6, in savez
  File "/home/glj/miniconda3/envs/cam4docc/lib/python3.7/site-packages/numpy/lib/npyio.py", line 616, in savez
    _savez(file, args, kwds, False)
  File "/home/glj/miniconda3/envs/cam4docc/lib/python3.7/site-packages/numpy/lib/npyio.py", line 712, in _savez
    zipf = zipfile_factory(file, mode="w", compression=compression)
  File "/home/glj/miniconda3/envs/cam4docc/lib/python3.7/site-packages/numpy/lib/npyio.py", line 112, in zipfile_factory
    return zipfile.ZipFile(file, *args, **kwargs)
  File "/home/glj/miniconda3/envs/cam4docc/lib/python3.7/zipfile.py", line 1240, in __init__
    self.fp = io.open(file, filemode)
FileNotFoundError: [Errno 2] No such file or directory: './data/segmentation/2175d0e84f224ea69907e5c338bde395_b60a2bc82d88419497510082d9301a59.npz'

