xy-guo / liga-stereo

Code for LIGA-Stereo Detector, ICCV'21

License: Apache License 2.0


liga-stereo's Introduction

LIGA-Stereo

Introduction

This is the official implementation of the paper LIGA-Stereo: Learning LiDAR Geometry Aware Representations for Stereo-based 3D Detector (ICCV'21), by Xiaoyang Guo, Shaoshuai Shi, Xiaogang Wang and Hongsheng Li.

[project page] [paper] [code]

Installation

Requirements

All the code has been tested in the following environment:

Linux (tested on Ubuntu 14.04 / 16.04)
Python 3.7
PyTorch 1.6.0
Torchvision 0.7.0
CUDA 9.2 / 10.1
spconv (commit f22dd9)
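
Before building the extensions, it can help to compare the active environment against the versions listed above. The following is a minimal sketch, not part of the repository: the helper names `installed_version` and `check_environment` and the `EXPECTED` table are illustrative, and only the Python, PyTorch and Torchvision versions from the list are checked.

```python
import importlib
import sys

# Versions taken from the requirements list above; the table itself is illustrative.
EXPECTED = {"torch": "1.6.0", "torchvision": "0.7.0"}


def installed_version(module_name):
    """Return the module's __version__ string, "unknown" if it has none,
    or None if the module cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return str(getattr(module, "__version__", "unknown"))


def check_environment(expected=EXPECTED):
    """Return a list of mismatch messages (empty if everything matches)."""
    problems = []
    major, minor = sys.version_info[:2]
    if (major, minor) != (3, 7):
        problems.append(f"Python {major}.{minor} found, 3.7 was tested")
    for name, wanted in expected.items():
        found = installed_version(name)
        if found is None:
            problems.append(f"{name} is not installed")
        elif not found.startswith(wanted):
            # startswith tolerates local build suffixes such as "+cu101"
            problems.append(f"{name} {found} found, {wanted} was tested")
    return problems


if __name__ == "__main__":
    for line in check_environment() or ["environment matches the tested versions"]:
        print(line)
```

A mismatch does not necessarily mean the code will fail; it only flags a combination that differs from the tested one.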

Installation Steps

a. Clone this repository.

git clone https://github.com/xy-guo/LIGA-Stereo.git

b. Install the dependent libraries as follows:

Install the dependent python libraries:

pip install -r requirements.txt

Install the SparseConv library. We use the implementation from [spconv].

git clone https://github.com/traveller59/spconv
cd spconv
git reset --hard f22dd9
git submodule update --init --recursive
python setup.py bdist_wheel
pip install ./dist/spconv-1.2.1-cp37-cp37m-linux_x86_64.whl
cd ..

Install the modified mmdetection from [mmdetection_kitti]:

git clone https://github.com/xy-guo/mmdetection_kitti
cd mmdetection_kitti
python setup.py develop
cd ..

c. Install this library by running the following command from the root of this repository:

python setup.py develop

Getting Started

The dataset configs are located within configs/stereo/dataset_configs, and the model configs are located within configs/stereo for different datasets.

Dataset Preparation

Currently, we only provide a dataloader for the KITTI dataset.

Please download the official KITTI 3D object detection dataset and organize the downloaded files as follows (the road planes are provided by OpenPCDet [road plane], which are optional for training LiDAR models):

LIGA_PATH
├── data
│   ├── kitti
│   │   │── ImageSets
│   │   │── training
│   │   │   ├──calib & velodyne & label_2 & image_2 & (optional: planes)
│   │   │── testing
│   │   │   ├──calib & velodyne & image_2
├── configs
├── liga
├── tools
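
The layout above can be checked with a short script before generating the data infos. This is a sketch only: the function name `missing_kitti_dirs` and the `REQUIRED` table are illustrative, and the directory names are copied from the tree above (the optional `planes` folder is not checked).

```python
from pathlib import Path

# Sub-directories named in the tree above; "planes" is optional and skipped.
REQUIRED = {
    "training": ["calib", "velodyne", "label_2", "image_2"],
    "testing": ["calib", "velodyne", "image_2"],
}


def missing_kitti_dirs(kitti_root):
    """Return the expected sub-directories that do not exist under kitti_root."""
    root = Path(kitti_root)
    missing = []
    if not (root / "ImageSets").is_dir():
        missing.append("ImageSets")
    for split, names in REQUIRED.items():
        for name in names:
            if not (root / split / name).is_dir():
                missing.append(f"{split}/{name}")
    return missing


if __name__ == "__main__":
    # Run from LIGA_PATH so that data/kitti resolves as in the tree above.
    for entry in missing_kitti_dirs("data/kitti"):
        print("missing:", entry)
```

The training logs quoted in the issues further down also read files from `training/image_3`, so adding `image_3` to both lists is probably sensible for stereo training.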

You can also choose to link your KITTI dataset path by

YOUR_KITTI_DATA_PATH=~/data/kitti_object
ln -s $YOUR_KITTI_DATA_PATH/training/ ./data/kitti/
ln -s $YOUR_KITTI_DATA_PATH/testing/ ./data/kitti/

Generate the data infos by running the following command:

python -m liga.datasets.kitti.lidar_kitti_dataset create_kitti_infos
python -m liga.datasets.kitti.lidar_kitti_dataset create_gt_database_only

Training & Testing

Test and evaluate the pretrained models

To test with multiple GPUs:

./scripts/dist_test_ckpt.sh ${NUM_GPUS} ./configs/stereo/kitti_models/liga.yaml ./ckpt/pretrained_liga.pth

Train a model

Train with multiple GPUs

./scripts/dist_train.sh ${NUM_GPUS} 'exp_name' ./configs/stereo/kitti_models/liga.yaml

Pretrained Models

Google Drive

Citation

@InProceedings{Guo_2021_ICCV,
    author = {Guo, Xiaoyang and Shi, Shaoshuai and Wang, Xiaogang and Li, Hongsheng},
    title = {LIGA-Stereo: Learning LiDAR Geometry Aware Representations for Stereo-based 3D Detector},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month = {October},
    year = {2021}
}

Acknowledgements

Parts of the code are migrated from OpenPCDet and DSGN.

liga-stereo's Issues

GPU memory usage

The GPU memory usage reported in your paper is about 10G, but on my machine it is about 18G when I train the model. Do the settings in the repo differ from those in the paper?

scripts/dist_train.sh

Hello,
Thanks for your excellent work!

I have several problems with distributed training.

When I run "CUDA_VISIBLE_DEVICE=0 python3 tools/train.py --cfg_file ${cfg} --batch_size 1" or
"CUDA_VISIBLE_DEVICE=0 ./scripts/dist_train.sh 1 exp cfg_path", it works.
But when I run
"python3 tools/train.py --cfg_file ${cfg} --batch_size 1" or
"CUDA_VISIBLE_DEVICE=0,1,2,3 python3 tools/train.py --cfg_file ${cfg} --batch_size 1" or
"CUDA_VISIBLE_DEVICE=0,1,2,3 ./scripts/dist_train.sh 4 exp cfg_path", it does not work. How should I modify the code for distributed training?

pseudo-lidar coord

Hello,

What's the difference between the pseudo-LiDAR coordinate system and the LiDAR coordinate system?

Thank you

The implementation process and experimental results of using "soft targets".

Hi Xiaoyang! Thanks for your great work.
In the Introduction of LIGA-Stereo, you mentioned

'Comparing with traditional knowledge distillation for recognition tasks, we did not take the final erroneous classification and regression predictions from the LiDAR model as “soft” targets, which we found benefits little for training stereo detection networks.'

Could you please elaborate on your implementation process and experimental results?

will you release the mmlab version of LIGA-Stereo?

Thanks for your excellent work!

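Question about the U-Net network

Hello, in part (b) of the model in Figure 2, a 2D Aggregation Network is used between the first BEV feature and the second BEV feature. Where is it in the code? Could you point out the exact location (down to the line where it starts)? Thanks!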

Segmentation fault on CUDA 11.0/torch 1.7.1

Thank you for your great contribution.

CUDA 11.0?

I managed to compile everything in a Docker container with CUDA 11.0/PyTorch 1.7.1, including spconv (spconv showed no errors during build and install).

But right after training starts, at the first step, the code ends with this error:

CUDA_VISIBLE_DEVICES=0 ./scripts/dist_train.sh 1 exp_name configs/stereo/kitti_models/liga.3d-and-bev.yaml

subprocess.CalledProcessError: Command '['/usr/bin/python3', '-u', 'tools/train.py', '--local_rank=0', '--launcher', 'pytorch', '--fix_random_seed', '--sync_bn', '--save_to_file', '--cfg_file', 'configs/stereo/kitti_models/liga.3d-and-bev.yaml', '--exp_name', 'exp_name']' died with <Signals.SIGSEGV: 11>.

Then I rewrote your code for single-GPU training without distributed training (the rewritten code is in my fork). Everything else is the same, and the failure turns out to be a segmentation fault.

python3 tools/train.py --cfg configs/stereo/kitti_models/liga.3d-and-bev.yaml --launcher=none --batch_size 1

Segmentation fault (core dumped)

I have not fully investigated where it happens.

CUDA 10

I then tried a lower CUDA version, but the 3090 only supports CUDA 11+, and the current model is too large to fit into a single 1080Ti/2080Ti (similar to DSGN?).

Batch size > 1 on single GPU

First, thank you for your great work and code.

I saw in your code that you force the batch_size_per_gpu = 1. What's the reason for this config? If I want to train a larger batch size on a single GPU, which parts should I modify?

Looking forward to your answer. Thanks.

Tips on installing for pytorch>1.11, to avoid <THC/THC.h> bugs

If you hit build errors about <THC/THC.h>, you can make the following modifications:

Comment out this line:

LIGA-Stereo/liga/ops/build_cost_volume/src/BuildCostVolume_cuda.cu

Line 5 in aee3731

#include <THC/THC.h>

Define a new ceil_div function:

int ceil_div(int a, int b) {
    return (a + b - 1) / b;
}

Replace this line:

LIGA-Stereo/liga/ops/build_cost_volume/src/BuildCostVolume_cuda.cu

Line 232 in aee3731

dim3 grid(std::min(THCCeilDiv((long)(output_size / 2), 512L), 4096L));

with

dim3 grid(std::min(ceil_div((long)(output_size / 2), 512), 4096));

Replace this line:

LIGA-Stereo/liga/ops/build_cost_volume/src/BuildCostVolume_cuda.cu

Line 278 in aee3731

dim3 grid(std::min(THCCeilDiv((long)grad.numel(), 512L), 4096L));

with

dim3 grid(std::min(ceil_div((long)(grad.numel()), 512), 4096));

Replace THCudaCheck(cudaGetLastError()); with AT_CUDA_CHECK(cudaGetLastError());
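
To make the grid-size arithmetic in the replaced lines concrete, here is a small Python sketch of the same computation. It mirrors the `ceil_div` helper from the tip; the function name `grid_size` and its parameter names are illustrative, with 512 threads per block and a cap of 4096 blocks taken from the lines quoted above.

```python
def ceil_div(a, b):
    """Integer ceiling of a / b for positive b, same formula as the C helper."""
    return (a + b - 1) // b


def grid_size(num_elements, threads_per_block=512, max_blocks=4096):
    """Blocks needed to cover num_elements, capped at max_blocks."""
    return min(ceil_div(num_elements, threads_per_block), max_blocks)


if __name__ == "__main__":
    for n in (1, 512, 513, 10**9):
        print(n, "->", grid_size(n))
```

For 513 elements two blocks are needed, since one block of 512 threads leaves one element uncovered. Very large inputs hit the 4096-block cap, in which case a kernel would typically loop over the remaining elements inside each thread.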

No module named liga.datasets.kitti.kitti_dataset

Thanks for your great work~

When I run the following commands:
python -m liga.datasets.kitti.kitti_dataset create_kitti_infos
python -m liga.datasets.kitti.kitti_dataset create_gt_database_only

I get this error:
No module named liga.datasets.kitti.kitti_dataset

I find that there are only stereo_kitti_dataset.py and lidar_kitti_dataset.py in the path: liga/datasets/kitti/
Any suggestions would be deeply appreciated!
Thanks again.

Wrong versions of both mmcv and mmdet

Hello,

Thanks a lot for your wonderful work. I followed the instructions to install mmdet and mmcv, but got the error "cannot import name 'MultiScaleDeformableAttention' from 'mmcv.cnn.bricks.transformer'". It seems that this module is not defined in mmcv.cnn.

I tried other versions, but none of them satisfies all the requirements of the repository. Could you please share the versions of mmcv and mmdet that you used in your project?

Thanks in advance. Hoping to hear from you soon.

Best.

please help me!

pseudo lidar coordinates

Hi, thanks for your great work!
I have a question about the coordinate system.

I notice that the stereo_kitti_dataset.py file introduces a pseudo-LiDAR coordinate system.

LIGA-Stereo/liga/datasets/kitti/stereo_kitti_dataset.py

Line 366 in aee3731

gt_boxes_lidar = box_utils.boxes3d_kitti_camera_to_lidar(

I would like to know why this function is rect_to_lidar_pseudo rather than rect_to_lidar. Is there any difference in labelling between the stereo (binocular) and monocular settings?

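Generating the data infos fails

Hello, I set up the environment and tried to generate the data infos, but running your command as given does not work; it reports that the module does not exist.

It only works after I changed it to the lidar_kitti_dataset.py under liga's datasets directory. Is this correct?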

Error: AttributeError: module 'matplotlib.cbook' has no attribute '_rename_parameter'

Hello Xiaoyang,

Thanks a lot for your great contribution! I am facing a problem when I run the following command:

python -m liga.datasets.kitti.kitti_dataset create_kitti_infos
python -m liga.datasets.kitti.kitti_dataset create_gt_database_only

First, I could not find kitti_dataset under ~/liga/dataset/kitti/; I only have lidar_kitti_dataset.py and stereo_kitti_dataset.py. When I then ran the command "python -m liga.datasets.kitti.kitti_dataset create_kitti_infos", it returned the error: "AttributeError: module 'matplotlib.cbook' has no attribute '_rename_parameter'".

Any ideas and suggestions will be helpful.

Thanks in advance.

When I run these scripts, I have some questions

Thanks for sharing. When I first run either of the following commands in my Docker container,
'./scripts/dist_train.sh 1 dev configs/stereo/kitti_models/liga.yaml'
or
'./scripts/dist_test_ckpt.sh 1 ./configs/stereo/kitti_models/liga.yaml ./ckpt/pretrained_liga.pth'
nothing is printed!
If I cancel the process with Ctrl+C and run it again, it shows:
```bash
Traceback (most recent call last):
  File "tools/train.py", line 211, in <module>
    main()
  File "tools/train.py", line 73, in main
    args.tcp_port, args.local_rank, backend='nccl'
  File "/root/LIGA-Stereo-master/liga/utils/common_utils.py", line 181, in init_dist_pytorch
    world_size=num_gpus
  File "/root/miniconda3/envs/liga/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 422, in init_process_group
    store, rank, world_size = next(rendezvous_iterator)
  File "/root/miniconda3/envs/liga/lib/python3.7/site-packages/torch/distributed/rendezvous.py", line 126, in _tcp_rendezvous_handler
    store = TCPStore(result.hostname, result.port, world_size, start_daemon, timeout)
RuntimeError: Address already in use
Traceback (most recent call last):
  File "/root/miniconda3/envs/liga/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/root/miniconda3/envs/liga/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/root/miniconda3/envs/liga/lib/python3.7/site-packages/torch/distributed/launch.py", line 261, in <module>
    main()
  File "/root/miniconda3/envs/liga/lib/python3.7/site-packages/torch/distributed/launch.py", line 257, in main
    cmd=cmd)
subprocess.CalledProcessError: Command '['/root/miniconda3/envs/liga/bin/python', '-u', 'tools/train.py', '--local_rank=0', '--launcher', 'pytorch', '--fix_random_seed', '--sync_bn', '--save_to_file', '--cfg_file', 'configs/stereo/kitti_models/liga.yaml', '--exp_name', 'dev']' returned non-zero exit status 1.
```
How should I solve it?
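
The "Address already in use" error in the traceback above is raised while the TCP store for distributed training is being created, which usually means the rendezvous port is still held by another process (for example the earlier run that was interrupted). The following sketch, using only the Python standard library, shows one way to test a port and to ask the OS for a free one. The function names are illustrative, and how the port is passed to the training script is an assumption based on the `args.tcp_port` name in the traceback; check `tools/train.py` for the actual option.

```python
import socket


def port_in_use(port, host="127.0.0.1"):
    """Return True if something is already listening on (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        # connect_ex returns 0 when the connection succeeds
        return sock.connect_ex((host, port)) == 0


def find_free_port(host="127.0.0.1"):
    """Ask the OS for an unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))
        return sock.getsockname()[1]


if __name__ == "__main__":
    print("free port:", find_free_port())
```

The port returned by `find_free_port` is released before it is used, so another process could in principle grab it in between; for a single-machine training run this is rarely a problem. Making sure the interrupted run's processes have actually exited is the other half of the fix.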

subprocess.CalledProcessError: Command '' died with <Signals.SIGSEGV: 11>.

Hi! Thanks for sharing your awesome code.
I ran into a problem when running this code.
My error messages:

data/kitti/training/image_2/001773.png
data/kitti/training/image_2/001816.png
data/kitti/training/image_2/002829.png
data/kitti/training/image_3/001773.png
data/kitti/training/image_3/001816.png
data/kitti/training/image_3/002829.png
{'NAME': 'filter_truncated', 'AREA_RATIO_THRESH': None, 'AREA_2D_RATIO_THRESH': None, 'GT_TRUNCATED_THRESH': 0.98}
filter truncated ratio: null 3d boxes [[ 2.99       -3.87       -0.66499996  4.43        1.84        1.75
  -0.2907964 ]] flipped False image idx 890 frame_id 001773 

/home/users/gaoshiyu01/anaconda3/envs/liga5/lib/python3.7/site-packages/torch/optim/lr_scheduler.py:123: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`.  Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
  "https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate", UserWarning)
/home/users/gaoshiyu01/anaconda3/envs/liga5/lib/python3.7/site-packages/torch/optim/lr_scheduler.py:143: UserWarning: The epoch parameter in `scheduler.step()` was not necessary and is being deprecated where possible. Please use `scheduler.step()` to step the scheduler. During the deprecation, if epoch is different from None, the closed form is used instead of the new chainable form, where available. Please open an issue if you are unable to replicate your use case: https://github.com/pytorch/pytorch/issues/new/choose.
  warnings.warn(EPOCH_DEPRECATION_WARNING, UserWarning)
data/kitti/training/image_2/004052.png
data/kitti/training/image_3/004052.png
Traceback (most recent call last):
  File "/home/users/gaoshiyu01/anaconda3/envs/liga5/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/users/gaoshiyu01/anaconda3/envs/liga5/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/users/gaoshiyu01/anaconda3/envs/liga5/lib/python3.7/site-packages/torch/distributed/launch.py", line 261, in <module>
    main()
  File "/home/users/gaoshiyu01/anaconda3/envs/liga5/lib/python3.7/site-packages/torch/distributed/launch.py", line 257, in main
    cmd=cmd)
subprocess.CalledProcessError: Command '['/home/users/gaoshiyu01/anaconda3/envs/liga5/bin/python', '-u', 'tools/train.py', '--local_rank=1', '--launcher', 'pytorch', '--fix_random_seed', '--sync_bn', '--save_to_file', '--cfg_file', './configs/stereo/kitti_models/liga.3d-and-bev.yaml', '--exp_name', 'test1']' died with <Signals.SIGSEGV: 11>.

This seems like a common bug caused by mmdet, so I followed the instructions from the mmdet bug report and checked my runtime and compile-time libraries with nvcc, but everything seems all right. I still have no idea how to fix it. Could you please provide more info? Thanks a lot :)

My environment:

nvcc --version: 10.1
nvidia-smi: 10.2
Cudatoolkit: 10.1
python: 3.7.13
pytorch: 1.6.0
spconv : 1.2.1
mmcv-full: 1.2.1
mmdet: 2.6.0
mmpycocotools : 12.0.3

My conda list:

```

Name Version Build Channel
libgcc_mutex 0.1 main defaults
_openmp_mutex 5.1 1_gnu defaults
addict 2.4.0 pypi_0 pypi
blas 1.0 mkl defaults
ca-certificates 2022.07.19 h06a4308_0 defaults
certifi 2022.6.15 py37h06a4308_0 defaults
cudatoolkit 10.1.243 h6bb024c_0 defaults
cycler 0.11.0 pypi_0 pypi
cython 0.29.32 pypi_0 pypi
easydict 1.9 pypi_0 pypi
fire 0.4.0 pypi_0 pypi
fonttools 4.37.2 pypi_0 pypi
freetype 2.11.0 h70c0345_0 defaults
future 0.18.2 pypi_0 pypi
giflib 5.2.1 h7b6447c_0 defaults
imageio 2.21.3 pypi_0 pypi
importlib-metadata 4.12.0 pypi_0 pypi
intel-openmp 2021.4.0 h06a4308_3561 defaults
jpeg 9e h7f8727e_0 defaults
kiwisolver 1.4.4 pypi_0 pypi
lcms2 2.12 h3be6417_0 defaults
ld_impl_linux-64 2.38 h1181459_1 defaults
lerc 3.0 h295c915_0 defaults
libdeflate 1.8 h7f8727e_5 defaults
libffi 3.3 he6710b0_2 defaults
libgcc-ng 11.2.0 h1234567_1 defaults
libgomp 11.2.0 h1234567_1 defaults
libpng 1.6.37 hbc83047_0 defaults
libstdcxx-ng 11.2.0 h1234567_1 defaults
libtiff 4.4.0 hecacb30_0 defaults
libwebp 1.2.2 h55f646e_0 defaults
libwebp-base 1.2.2 h7f8727e_0 defaults
liga 0.1.0+0 dev_0
llvmlite 0.39.1 pypi_0 pypi
lz4-c 1.9.3 h295c915_1 defaults
matplotlib 3.5.3 pypi_0 pypi
mkl 2021.4.0 h06a4308_640 defaults
mkl-service 2.4.0 py37h7f8727e_0 defaults
mkl_fft 1.3.1 py37hd3c417c_0 defaults
mkl_random 1.2.2 py37h51133e4_0 defaults
mmcv-full 1.2.1 pypi_0 pypi
mmdet 2.6.0 dev_0
mmpycocotools 12.0.3 pypi_0 pypi
ncurses 6.3 h5eee18b_3 defaults
networkx 2.6.3 pypi_0 pypi
ninja 1.10.2 h06a4308_5 defaults
ninja-base 1.10.2 hd09550d_5 defaults
numba 0.56.2 pypi_0 pypi
numpy 1.21.5 py37h6c91a56_3 defaults
numpy-base 1.21.5 py37ha15fc14_3 defaults
opencv-python 4.6.0.66 pypi_0 pypi
openssl 1.1.1q h7f8727e_0 defaults
packaging 21.3 pypi_0 pypi
pillow 9.2.0 py37hace64e9_1 defaults
pip 22.1.2 py37h06a4308_0 defaults
protobuf 3.20.1 pypi_0 pypi
pyparsing 3.0.9 pypi_0 pypi
python 3.7.13 h12debd9_0 defaults
python-dateutil 2.8.2 pypi_0 pypi
pytorch 1.6.0 py3.7_cuda10.1.243_cudnn7.6.3_0 pytorch
pywavelets 1.3.0 pypi_0 pypi
pyyaml 6.0 pypi_0 pypi
readline 8.1.2 h7f8727e_1 defaults
scikit-image 0.19.3 pypi_0 pypi
scipy 1.7.3 pypi_0 pypi
setuptools 59.8.0 pypi_0 pypi
six 1.16.0 pyhd3eb1b0_1 defaults
spconv 1.2.1 pypi_0 pypi
sqlite 3.39.2 h5082296_0 defaults
tensorboardx 2.5.1 pypi_0 pypi
termcolor 2.0.1 pypi_0 pypi
terminaltables 3.1.10 pypi_0 pypi
tifffile 2021.11.2 pypi_0 pypi
tk 8.6.12 h1ccaba5_0 defaults
torchvision 0.7.0 py37_cu101 pytorch
tqdm 4.64.1 pypi_0 pypi
typing-extensions 4.3.0 pypi_0 pypi
wheel 0.37.1 pyhd3eb1b0_0 defaults
xz 5.2.5 h7f8727e_1 defaults
yapf 0.32.0 pypi_0 pypi
zipp 3.8.1 pypi_0 pypi
zlib 1.2.12 h5eee18b_3 defaults
zstd 1.5.2 ha4553b6_0 defaults

```

'stereo_kitti_dataset.py' has no function 'create_kitti_infos()' and 'create_gt_database_only'. How can I generate the data infos and gt_database?

Hi, Xiaoyang! I'm trying to reimplement your awesome work.

In 'Getting Started', you mentioned 'Generate the data infos by running the following command:'

python -m liga.datasets.kitti.kitti_dataset create_kitti_infos
python -m liga.datasets.kitti.kitti_dataset create_gt_database_only

Unfortunately, there are only 'lidar_kitti_dataset' and 'stereo_kitti_dataset' in './liga/datasets/kitti/'. I successfully created kitti_infos and gt_database by running python -m liga.datasets.kitti.lidar_kitti_dataset create_kitti_infos and python -m liga.datasets.kitti.lidar_kitti_dataset create_gt_database_only.

However, I don't know how to create kitti_infos for the stereo detector. When I ran python -m liga.datasets.kitti.stereo_kitti_dataset create_kitti_infos, I could not get the .pkl files (kitti_infos), because stereo_kitti_dataset.py has no 'create_kitti_infos()' or 'create_gt_database_only'.

More directly, if I want to train the whole LIGA-Stereo instead of just the modified SECOND, should I first create kitti_infos for the Stereo detector and then run ./scripts/dist_train.sh ${NUM_GPUS} 'exp_name' ./configs/stereo/kitti_models/liga.3d-and-bev.yaml?

Looking forward to your answer!

The first command of installation steps should be modified to 'git clone https://github.com/xy-guo/LIGA-Stereo.git'.

Why do you use "VoxelBackBone4xNoFinalBnReLU"?

Hello, the LiDAR model configs use "VoxelBackBone4x", but LIGA uses "VoxelBackBone4xNoFinalBnReLU". Why is that?