
[CVPR 2022 Oral & TPAMI 2023] Learning Optical Flow and Scene Flow with Bidirectional Camera-LiDAR Fusion

Home Page: https://arxiv.org/abs/2303.12017



CamLiFlow & CamLiRAFT


This is the official PyTorch implementation for our two papers:

  • CamLiFlow: Bidirectional Camera-LiDAR Fusion for Joint Optical Flow and Scene Flow Estimation (CVPR 2022 Oral)
  • Learning Optical Flow and Scene Flow with Bidirectional Camera-LiDAR Fusion (TPAMI 2023, the extended version)

An article explaining our work in Chinese: https://zhuanlan.zhihu.com/p/616384758

Changes to the Conference Paper

In this extended version, we instantiate a new type of bidirectional fusion pipeline, CamLiRAFT, which is based on recurrent all-pairs field transforms. CamLiRAFT obtains significant performance improvements over the original PWC-based CamLiFlow and sets a new state-of-the-art record on various datasets.

  • Comparison with stereo scene flow methods: On FlyingThings3D, CamLiRAFT achieves 1.73 EPE2D and 0.049 EPE3D, 21% and 20% lower error compared to CamLiFlow. On KITTI, even the non-rigid CamLiRAFT performs on par with the previous state-of-the-art method RigidMask (SF-all: 4.97% vs. 4.89%). By refining the background scene flow with rigid priors, CamLiRAFT further achieves an error of 4.26%, ranking first on the leaderboard.

  • Comparison with LiDAR-only scene flow methods: The LiDAR-only variant of our method, dubbed CamLiRAFT-L, also outperforms all previous LiDAR-only scene flow methods in terms of both accuracy and speed (see Tab. 5 in the paper). Thus, CamLiRAFT-L can also serve as a strong baseline for LiDAR-only scene flow estimation.

  • Comparison on MPI Sintel: Without finetuning on Sintel, CamLiRAFT achieves 2.38 AEPE on the final pass of the Sintel training set, reducing the error by 12% and 18% over RAFT and RAFT-3D respectively. This demonstrates that our method has good generalization performance and can handle non-rigid motion.

  • Training schedule: The original CamLiFlow requires a complicated training schedule of Things (L2 loss) -> Things (Robust loss) -> Driving -> KITTI and takes about 10 days to train. CamLiRAFT simplifies the schedule to Things -> KITTI, and the training only takes about 3 days. (Tested on 4x RTX 3090 GPUs)

News

  • 2023-11-05: CamLiRAFT is accepted to TPAMI. Thanks for the valuable suggestions from the reviewers!
  • 2023-09-20: We provide a demo for CamLiRAFT, see demo.py for more details.
  • 2023-03-22: We release CamLiRAFT, an extended version of CamLiFlow on https://arxiv.org/abs/2303.12017.
  • 2022-03-29: Our paper is selected for an oral presentation.
  • 2022-03-07: We release the code and the pretrained weights.
  • 2022-03-03: Our paper is accepted by CVPR 2022.
  • 2021-11-20: Our paper is available at https://arxiv.org/abs/2111.10502.
  • 2021-11-04: Our method ranked first on the leaderboard of KITTI Scene Flow.

Pretrained Weights

| Model | Training set | Weights | Comments |
|---|---|---|---|
| CamLiRAFT | Things (80e) | camliraft_things80e.pt | Best generalization performance |
| CamLiRAFT | Things (150e) | camliraft_things150e.pt | Best performance on Things |
| CamLiRAFT | Things (150e) -> KITTI (800e) | camliraft_things150e_kitti800e.pt | Best performance on KITTI |
| CamLiRAFT-L | Things-Occ (100e) | camliraft_l_best_things_occ.pt | Best performance on Things-Occ |
| CamLiRAFT-L | Things-Occ (100e) | camliraft_l_best_kitti_occ.pt | Best generalization performance on KITTI-Occ |
| CamLiRAFT-L | Things-Noc (100e) | camliraft_l_best_things_noc.pt | Best performance on Things-Noc |
| CamLiRAFT-L | Things-Noc (100e) | camliraft_l_best_kitti_noc.pt | Best generalization performance on KITTI-Noc |

Things-Occ means "occluded FlyingThings3D" and Things-Noc means "non-occluded FlyingThings3D". Same for KITTI-Occ and KITTI-Noc.
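
If you want to inspect a downloaded checkpoint before passing it to the scripts below, here is a minimal sketch. The file name and key layout are assumptions; the actual checkpoints may store the weights differently.

import torch

# Minimal sketch (assumption: the checkpoint is a regular torch.save() archive
# stored at the path below). Printing the top-level keys shows whether the
# weights live under a nested key such as 'state_dict' or at the top level.
ckpt = torch.load('checkpoints/camliraft_things150e.pt', map_location='cpu')
if isinstance(ckpt, dict):
    print(list(ckpt.keys())[:10])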

Precomputed Results

Here, we provide precomputed results for the submission to the online benchmark of KITTI Scene Flow. * denotes refining the background scene flow with rigid priors.

| Model | D1-all | D2-all | Fl-all | SF-all | Link |
|---|---|---|---|---|---|
| CamLiFlow | 1.81% | 3.19% | 4.05% | 5.62% | camliflow-wo-refine.zip |
| CamLiFlow * | 1.81% | 2.95% | 3.10% | 4.43% | camliflow.zip |
| CamLiRAFT | 1.81% | 3.02% | 3.43% | 4.97% | camliraft-wo-refine.zip |
| CamLiRAFT * | 1.81% | 2.94% | 2.96% | 4.26% | camliraft.zip |

Environment

Create a PyTorch environment using conda:

conda create -n camliraft python=3.7
conda activate camliraft
conda install pytorch==1.10.2 torchvision==0.11.3 cudatoolkit=11.3 -c pytorch

Install mmcv and mmdet:

pip install openmim
mim install mmcv-full==1.4.0
mim install mmdet==2.14.0

Install other dependencies:

pip install opencv-python open3d tensorboard hydra-core==1.1.0

Compile CUDA extensions for faster training and evaluation:

cd models/csrc
python setup.py build_ext --inplace
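
To check that the extension can be imported afterwards, a quick sanity check (run from the repository root; this assumes the models package is importable from there):

python -c "from models.csrc import wrapper; print('models.csrc.wrapper imported')"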

Download the ResNet-50 pretrained on ImageNet-1k:

wget https://download.pytorch.org/models/resnet50-11ad3fa6.pth
mkdir pretrain
mv resnet50-11ad3fa6.pth pretrain/

NG-RANSAC is also required if you want to evaluate on KITTI. Please follow https://github.com/vislearn/ngransac to install the library.

Demo

Download one of the CamLiRAFT checkpoints listed above, then run the following script to launch a demo that estimates optical flow and scene flow from a pair of images and point clouds:

python demo.py --model camliraft --weights /path/to/camliraft/checkpoint.pt

Note that CamLiRAFT is not very robust to objects at larger distances, as the network has only been trained on data with depths of less than 35 m. If you get poor results on your own data, try scaling the depth of the point cloud to a range of 5 to 35 m.
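
For example, a rough way to bring a distant point cloud into that range is to scale it uniformly. The helper below is a hypothetical sketch, not part of this repository; uniform scaling preserves the pinhole projection (and hence the image alignment), and the predicted scene flow can be divided by the same factor afterwards.

import numpy as np

def rescale_point_cloud(pc, target_max_depth=35.0):
    # pc: (N, 3) camera-frame points in meters (x, y, z with z = depth).
    # Hypothetical helper, not part of this repository: uniformly scale the
    # point cloud so its maximum depth stays within the training range.
    pc = np.asarray(pc, dtype=np.float32)
    scale = 1.0
    if pc[:, 2].max() > target_max_depth:
        scale = target_max_depth / float(pc[:, 2].max())
    return pc * scale, scale  # divide the predicted scene flow by `scale` to undo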

Evaluate CamLiFlow and CamLiRAFT

FlyingThings3D

First, download and preprocess the dataset (see preprocess_flyingthings3d_subset.py for detailed instructions):

python preprocess_flyingthings3d_subset.py --input_dir /mnt/data/flyingthings3d_subset

Then, download the pretrained weights camliraft_things150e.pt and save it to checkpoints/camliraft_things150e.pt.

Now you can reproduce the results in Table 2 (see the extended paper):

python eval_things.py testset=flyingthings3d_subset model=camliraft ckpt.path=checkpoints/camliraft_things150e.pt

KITTI

First, download the following parts:

Unzip them and organize the directory as follows:

datasets/kitti_scene_flow
├── testing
│   ├── calib_cam_to_cam
│   ├── calib_imu_to_velo
│   ├── calib_velo_to_cam
│   ├── disp_ganet
│   ├── flow_occ
│   ├── image_2
│   ├── image_3
│   ├── semantic_ddr
└── training
    ├── calib_cam_to_cam
    ├── calib_imu_to_velo
    ├── calib_velo_to_cam
    ├── disp_ganet
    ├── disp_occ_0
    ├── disp_occ_1
    ├── flow_occ
    ├── image_2
    ├── image_3
    ├── obj_map
    ├── semantic_ddr

Then, download the pretrained weights camliraft_things150e_kitti800e.pt and save it to checkpoints/camliraft_things150e_kitti800e.pt.

To reproduce the results without leveraging rigid-body assumptions (SF-all: 4.97%):

python kitti_submission.py testset=kitti model=camliraft ckpt.path=checkpoints/camliraft_things150e_kitti800e.pt

To reproduce the results with rigid background refinement (SF-all: 4.26%), you need to further refine the background scene flow:

python refine_background.py

Results are saved to submission/testing. The initial non-rigid estimations are indicated by the _initial suffix.

Sintel

First, download the flow dataset from http://sintel.is.tue.mpg.de and the depth dataset from https://sintel-depth.csail.mit.edu/landing.

Unzip them and organize the directory as follows:

datasets/sintel
├── depth
│   ├── README_depth.txt
│   ├── sdk
│   └── training
└── flow
    ├── bundler
    ├── flow_code
    ├── README.txt
    ├── test
    └── training

Then, download the pretrained weights camliraft_things80e.pt and save it to checkpoints/camliraft_things80e.pt.

Now you can reproduce the results in Table 4 (see the extended paper):

python eval_sintel.py testset=sintel model=camliraft ckpt.path=checkpoints/camliraft_things80e.pt

Evaluate CamLiRAFT-L

FlyingThings3D

There are two different ways of preprocessing the data. The first setting, proposed by HPLFlowNet, keeps only non-occluded points during preprocessing. The second setting, proposed by FlowNet3D, retains the occluded points.

# Non-occluded
python eval_things_noc_sf.py testset=flyingthings3d_subset_hpl model=camliraft_l ckpt.path=checkpoints/camliraft_l_best_things_noc.pt
# Occluded
python eval_things_occ_sf.py testset=flyingthings3d_subset_flownet3d model=camliraft_l ckpt.path=checkpoints/camliraft_l_best_things_occ.pt

KITTI

As with FlyingThings3D, there are two different ways of preprocessing the data. We report results for both settings.

# Non-occluded
python eval_kitti_noc_sf.py testset=kitti model=camliraft_l ckpt.path=checkpoints/camliraft_l_best_kitti_noc.pt
# Occluded
python eval_kitti_occ_sf.py testset=kitti model=camliraft_l ckpt.path=checkpoints/camliraft_l_best_kitti_occ.pt

Training

FlyingThings3D

You need to preprocess the FlyingThings3D dataset before training (see preprocess_flyingthings3d_subset.py for detailed instructions).

Train CamLiRAFT on FlyingThings3D (150 epochs):

python train.py trainset=flyingthings3d_subset valset=flyingthings3d_subset model=camliraft

The entire training process takes about 3 days on 4x RTX 3090 GPUs.

KITTI

Finetune the model on KITTI using the weights trained on FlyingThings3D:

python train.py trainset=kitti valset=kitti model=camliraft ckpt.path=checkpoints/camliraft_things150e.pt

The entire training process takes about 0.5 days on 4x RTX 3090 GPUs. We use the last checkpoint (800th) to generate the submission.

Citation

If you find our work useful in your research, please cite:

@article{liu2023learning,
  title   = {Learning Optical Flow and Scene Flow with Bidirectional Camera-LiDAR Fusion},
  author  = {Haisong Liu and Tao Lu and Yihui Xu and Jia Liu and Limin Wang},
  journal = {arXiv preprint arXiv:2303.12017},
  year    = {2023}
}

@inproceedings{liu2022camliflow,
  title     = {Camliflow: bidirectional camera-lidar fusion for joint optical flow and scene flow estimation},
  author    = {Liu, Haisong and Lu, Tao and Xu, Yihui and Liu, Jia and Li, Wenjie and Chen, Lijun},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages     = {5791--5801},
  year      = {2022}
}


camliflow's Issues

Question about the implementation of the ablation in the first row of Table 8 in your PAMI version

Hi,
Thanks for your excellent work. In Table 8 of the PAMI version, you mention that the naive implementation directly projects 3D features onto the image plane, with empty locations filled by zeros. However, I cannot find the corresponding code in this project; could you please explain the details? For example, the projected coordinates are fractional, which makes them hard to process at the lower levels.
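
For reference, such a naive projection baseline is often implemented by scattering each point's feature to its nearest pixel, which also addresses the fractional-coordinate question. The sketch below is only an illustrative guess, not code from this repository; fx, fy, cx, cy denote camera intrinsics.

import torch

def project_features_to_image(xyz, feat, fx, fy, cx, cy, H, W):
    # xyz: (N, 3) camera-frame points; feat: (N, C) per-point features.
    # Hypothetical sketch: scatter features to the nearest pixel and leave
    # empty locations filled with zeros.
    u = torch.round(xyz[:, 0] / xyz[:, 2] * fx + cx).long()  # pixel column
    v = torch.round(xyz[:, 1] / xyz[:, 2] * fy + cy).long()  # pixel row
    valid = (xyz[:, 2] > 0) & (u >= 0) & (u < W) & (v >= 0) & (v < H)
    out = torch.zeros(feat.shape[1], H, W, dtype=feat.dtype)
    out[:, v[valid], u[valid]] = feat[valid].t()  # later points overwrite earlier ones
    return out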

"out of memory" , when execute kitti_submission

When I execute kitti_submission.py, it shows the following error message:
File "kitti_submission.py", line 162, in
evaluator.run()
File "/opt/conda/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "kitti_submission.py", line 104, in run
flow_3d_dense = knn_interpolation(
File "/workspace/CamLiFlow/models/utils.py", line 149, in knn_interpolation
knn_indices = k_nearest_neighbor(input_xyz, query_xyz, k) # [batch_size, n_queries, 3]
File "/workspace/CamLiFlow/models/csrc/wrapper.py", line 128, in k_nearest_neighbor
return _k_nearest_neighbor_py(input_xyz, query_xyz, k)
File "/workspace/CamLiFlow/models/csrc/wrapper.py", line 117, in _k_nearest_neighbor_py
dists = squared_distance(_query_xyz, _input_xyz)
File "/workspace/CamLiFlow/models/csrc/wrapper.py", line 50, in squared_distance
dist = -2 * torch.matmul(xyz1, xyz2.permute(0, 2, 1))
RuntimeError: CUDA out of memory. Tried to allocate 14.21 GiB (GPU 0; 22.38 GiB total capacity; 14.25 GiB already allocated; 7.34 GiB free; 14.29 GiB reserved in total by PyTorch)

I have set batch_size=1, but it doesn't work. How much GPU memory is needed to execute this script?
Could you give me some advice? Thank you!
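
The failing allocation is the dense distance matrix of shape [batch_size, n_queries, n_inputs] built inside squared_distance, which is consistent with the 14.21 GiB allocation reported above for full-resolution KITTI point clouds. A possible workaround, sketched below as a hypothetical replacement rather than code from this repository, is to run the k-nearest-neighbor search in chunks so that only a slice of the matrix is alive at a time:

import torch

def k_nearest_neighbor_chunked(input_xyz, query_xyz, k, chunk=4096):
    # input_xyz: [B, n_inputs, 3], query_xyz: [B, n_queries, 3].
    # Hypothetical sketch: compute distances chunk by chunk and keep only the
    # k smallest per query, avoiding the full [B, n_queries, n_inputs] matrix.
    indices = []
    for start in range(0, query_xyz.shape[1], chunk):
        q = query_xyz[:, start:start + chunk]                      # [B, c, 3]
        d = -2 * torch.matmul(q, input_xyz.permute(0, 2, 1))       # [B, c, n_inputs]
        d += (q ** 2).sum(-1, keepdim=True)
        d += (input_xyz ** 2).sum(-1).unsqueeze(1)
        indices.append(d.topk(k, dim=-1, largest=False).indices)   # [B, c, k]
    return torch.cat(indices, dim=1)                               # [B, n_queries, k]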

About Testing

Your article is great!
I want to test optical flow on my own datasets, so what should I do next? I only see the evaluation scripts; there is no demo or test module to support testing on my own data.
Hope to get a reply.

Loss becomes inf in training

Thank you for open-sourcing this wonderful work!
When I tried to train your model, the loss became inf in the first Things pretraining stage.
I use 4x RTX 3090 GPUs, PyTorch 1.9.1+cu111, and Python 3.8.
I haven't changed the data preprocessing or training code. Have you encountered this problem before?

Here is the training log.
train.log

Are camera intrinsic parameters required?

Must the camera intrinsic parameters be available at inference time?
In practice, can I use a depth estimation method to obtain image depth, instead of deriving the point-cloud depth from the camera intrinsics?
Looking forward to your reply, thanks!

Error when loading FlyingThings3D

Hi, thanks for the nice work. However, I got the following error when loading the FlyingThings3D subset after preprocessing. I've checked that the point clouds actually have zero length. Do you have any idea why I got this error and how to fix it? Thanks.

Traceback (most recent call last):
File "main.py", line 554, in
main(args)
File "main.py", line 301, in main
for i, sample in enumerate(train_loader):
File "/mimer/NOBACKUP/groups/alvis_cvl/yushanz/miniconda3/envs/sceneflow/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 628, in next
data = self._next_data()
File "/mimer/NOBACKUP/groups/alvis_cvl/yushanz/miniconda3/envs/sceneflow/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1333, in _next_data
return self._process_data(data)
File "/mimer/NOBACKUP/groups/alvis_cvl/yushanz/miniconda3/envs/sceneflow/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1359, in _process_data
data.reraise()
File "/mimer/NOBACKUP/groups/alvis_cvl/yushanz/miniconda3/envs/sceneflow/lib/python3.8/site-packages/torch/_utils.py", line 543, in reraise
raise exception
ValueError: Caught ValueError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/mimer/NOBACKUP/groups/alvis_cvl/yushanz/miniconda3/envs/sceneflow/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop
data = fetcher.fetch(index)
File "/mimer/NOBACKUP/groups/alvis_cvl/yushanz/miniconda3/envs/sceneflow/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 58, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/mimer/NOBACKUP/groups/alvis_cvl/yushanz/miniconda3/envs/sceneflow/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 58, in
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/mimer/NOBACKUP/groups/alvis_cvl/yushanz/scene_flow/sceneflow/dataloader/flyingthings3d.py", line 65, in getitem
indices2 = np.random.choice(pc2.shape[0], size=self.n_points, replace=pc2.shape[0] < self.n_points)
File "mtrand.pyx", line 909, in numpy.random.mtrand.RandomState.choice
ValueError: a must be greater than 0 unless no samples are taken

Implementation details in CamLiRAFT

Hi, thanks for the great work!
I wonder how you obtained the numbers in Table 2 and Table 5 of CamLiRAFT.
From my understanding, Table 2 is trained on the dataset using the lifting preprocessing from FlowNet3D, with a training/testing split of 19640/3824, and all points, including the occluded ones, are evaluated for the 3D metrics.
The left part of Table 2 is trained on the dataset using the lifting preprocessing from HPLFlowNet, and the right part using the one from FlowNet3D, with a training/testing split of 19640/3824; only non-occluded points are evaluated for the 3D metrics.
Is that correct? How did you get these numbers? Did you train the model yourself? Are these numbers final, or will they change, since the paper is currently an arXiv version?

Does not work on Quadro RTX 6000

Thanks for your great work! The code didn't work on that platform because of the correlation module in /model/csrc/correlation2d:
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

Missing link to download pretrained models for camliraft_l (lidar only)

Hello,
I would like to use demo.py with the model camliraft_l (lidar only).

I added the model in the assert (line 149) and I am adding the argument --model camliraft_l in my shell command line.

The network still cannot run (using camliraft_things80e.pt), since I suppose the weights do not match.

Can you please share the pre-trained .pt files also for the camliraft_l model? Is that the issue?

Best wishes,
Guido

About perspect2parallel and parallel2perspect

Hi there, thanks for your excellent work!
While debugging, I'm confused about the functions perspect2parallel and parallel2perspect. Could you tell me what these two functions do, or which part of the paper they correspond to? Thanks a lot!

About PointConvDW

Hi~
Thanks for your great work. I am curious about the PointConvDW used in CamLiRAFT: why is the standard PointConv used in Encoder3D, while PointConvDW is used later in FlowHead3D, GRU3D, and MotionEncoder3D? Is there a specific reason?
