cogsys-tuebingen / mobilestereonet

Lightweight stereo matching network based on MobileNet blocks

License: Apache License 2.0

Python 100.00%

mobilestereonet's Introduction

MobileStereoNet

This repository contains the code for "MobileStereoNet: Towards Lightweight Deep Networks for Stereo Matching", presented at WACV 2022 [Paper] [Supp] [arXiv] [Video Presentation].

Figure: input image, 2D-MobileStereoNet prediction and 3D-MobileStereoNet prediction.

Evaluation Results

MobileStereoNets are trained and tested using SceneFlow (SF), KITTI and DrivingStereo (DS) datasets.
In the following tables, the first column shows the training sets. For instance, in the case of "SF + KITTI2015", the model is first pretrained on the SceneFlow dataset and then finetuned on KITTI images.
The results are reported as End-point Error (EPE) in pixels; lower is better.
Note that some experiments evaluate the zero-shot cross-dataset generalizability, e.g. when the model is trained on "SF + DS" and evaluated on "KITTI2015 val" or "KITTI2012 train".
The related trained models are provided in the tables as hyperlinks.
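For reference, EPE is the mean absolute difference between predicted and ground-truth disparity over the valid pixels. The minimal sketch below computes it in plain Python; the validity mask (ground truth strictly between 0 and max_disp) and the max_disp default of 192 are assumptions modelled on GwcNet-style evaluation, not taken from this repository.

```python
def end_point_error(pred, gt, max_disp=192.0):
    """Mean absolute disparity error (pixels) over valid ground-truth pixels.

    pred, gt: 2D lists (rows of floats) with the same shape.
    Assumed validity mask: 0 < gt < max_disp (0 marks missing ground truth).
    """
    total, count = 0.0, 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            if 0.0 < g < max_disp:
                total += abs(p - g)
                count += 1
    if count == 0:
        raise ValueError("no valid ground-truth pixels")
    return total / count


pred = [[10.0, 20.5], [30.0, 5.0]]
gt = [[11.0, 20.0], [0.0, 5.0]]    # 0.0 marks a pixel without ground truth
print(end_point_error(pred, gt))   # errors 1.0, 0.5 and 0.0 -> 0.5
```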

2D-MobileStereoNet

Training set	SF test	DS test	KITTI2015 val	KITTI2012 train
SF	1.14	6.59	2.42	2.45
DS	-	0.67	1.02	0.96
SF + DS	-	0.73	1.04	1.04
SF + KITTI2015	-	1.41	0.79	1.18
DS + KITTI2015	-	0.79	0.65	0.91
SF + DS + KITTI2015	-	0.83	0.68	0.90

3D-MobileStereoNet

Training set	SF test	DS test	KITTI2015 val	KITTI2012 train
SF	0.80	4.50	10.30	9.38
DS	-	0.60	1.16	1.14
SF + DS	-	0.57	1.12	1.10
SF + KITTI2015	-	1.53	0.65	0.90
DS + KITTI2015	-	0.65	0.60	0.85
SF + DS + KITTI2015	-	0.62	0.59	0.83

Results on KITTI 2015 validation

Predictions of different networks

Results on KITTI 2015 Leaderboard

Leaderboard
2D-MobileStereoNet on the leaderboard
3D-MobileStereoNet on the leaderboard

Computational Complexity

Requirements for computing the complexity with two methods (Flops counter and THOP):

pip install --upgrade git+https://github.com/sovrasov/flops-counter.pytorch.git
pip install --upgrade git+https://github.com/Lyken17/pytorch-OpCounter.git
pip install onnx

Run the following command to see the complexity in terms of the number of operations and parameters:

python cost.py

You can also compute the complexity of each part of the network separately; the input size of each module is specified in cost.py.
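To illustrate why MobileNet blocks keep these counts low, the sketch below compares the weights of a standard convolution with those of a depthwise separable one (a per-channel k x k convolution followed by a 1 x 1 pointwise convolution). It is plain arithmetic with biases ignored, not the output of cost.py; the actual MobileStereoNet blocks (for example MobileNetV2-style inverted residuals with expansion) will have different exact counts.

```python
def standard_conv_params(k, c_in, c_out, dims=2):
    # One k^dims x c_in filter per output channel (bias ignored).
    return k ** dims * c_in * c_out


def depthwise_separable_params(k, c_in, c_out, dims=2):
    # Depthwise: one k^dims filter per input channel.
    depthwise = k ** dims * c_in
    # Pointwise: a 1x1 (or 1x1x1) convolution mixing c_in into c_out channels.
    pointwise = c_in * c_out
    return depthwise + pointwise


# dims=3 corresponds to 3D convolutions, as applied to a 4D cost volume.
for dims in (2, 3):
    std = standard_conv_params(3, 32, 64, dims)
    sep = depthwise_separable_params(3, 32, 64, dims)
    print(f"{dims}D 3-wide kernel, 32->64 channels: {std} vs {sep} weights "
          f"({std / sep:.1f}x fewer)")
```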

Installation

Requirements

The code is tested on:

Ubuntu 18.04
Python 3.6
PyTorch 1.4.0
Torchvision 0.5.0
CUDA 10.0

Setting up the environment

conda env create --file mobilestereonet.yml
conda activate mobilestereonet

SceneFlow Dataset Preparation

Download the finalpass images and the disparity data for SceneFlow FlyingThings3D, Driving and Monkaa. For both the image and the disparity data, move the directories inside the TRAIN and TEST directories of the Driving and Monkaa datasets (15mm_focallength/35mm_focallength for Driving, funnyworld_x2 etc. for Monkaa) into the FlyingThings3D TRAIN and TEST directories, respectively.

It should look like this:

frames_finalpass
│
└───TEST
│   │
│   └───A
│   └───B
│   └───C
│   
│
└───TRAIN
│   │
│   └───15mm_focallength
│   └───35mm_focallength
│   └───A
│   └───a_rain_of_stones_x2
│   └─── ..
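The move step can be scripted with the standard library. This is a minimal sketch, assuming you point it at the directory that directly contains the Driving or Monkaa scene folders (e.g. 15mm_focallength, 35mm_focallength) and at the matching FlyingThings3D TRAIN or TEST directory. The function name and the example paths are hypothetical, and the layout of the extracted archives may differ, so compare the result with the tree above and with the paths in ./filenames/.

```python
import shutil
from pathlib import Path


def merge_subdirs(src_dir, dst_dir):
    """Move every subdirectory of src_dir into dst_dir (hypothetical helper).

    Refuses to overwrite: raises FileExistsError if a directory with the
    same name already exists in dst_dir. Returns the moved names.
    """
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for child in sorted(src_dir.iterdir()):
        if not child.is_dir():
            continue  # skip stray files such as archives or checksums
        target = dst_dir / child.name
        if target.exists():
            raise FileExistsError(f"{target} already exists")
        shutil.move(str(child), str(target))
        moved.append(child.name)
    return moved


# Hypothetical paths; run once per data type (images, disparity) and split:
# merge_subdirs("/Datasets/Driving/frames_finalpass",
#               "/Datasets/SceneFlow/frames_finalpass/TRAIN")
```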

Training

Set a variable for the dataset directory, e.g. DATAPATH="/Datasets/SceneFlow/". Then, run train.py as below:

Pretraining on SceneFlow

python train.py --dataset sceneflow --datapath $DATAPATH --trainlist ./filenames/sceneflow_train.txt --testlist ./filenames/sceneflow_test.txt --epochs 20 --lrepochs "10,12,14,16:2" --batch_size 8 --test_batch_size 8 --model MSNet2D
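The --lrepochs value encodes the learning-rate schedule. Assuming the GwcNet convention (this code is based on GwcNet), "10,12,14,16:2" divides the learning rate by 2 at each of epochs 10, 12, 14 and 16. The helper below is a hypothetical re-implementation of that rule with 0-based epochs, and the base rate of 0.001 is an assumption for the example.

```python
def learning_rate_at(epoch, base_lr, lrepochs):
    """Learning rate at a 0-based epoch for a schedule like "10,12,14,16:2".

    Assumed semantics: base_lr is divided by the number after ':' once for
    every listed epoch that has already been reached.
    """
    epochs_str, rate_str = lrepochs.split(":")
    downscale_epochs = [int(e) for e in epochs_str.split(",")]
    rate = float(rate_str)
    lr = base_lr
    for e in downscale_epochs:
        if epoch >= e:
            lr /= rate
    return lr


for epoch in (0, 9, 10, 12, 16, 19):
    print(epoch, learning_rate_at(epoch, 0.001, "10,12,14,16:2"))
```

Under the same assumption, "200:10" in the finetuning command divides the rate by 10 from epoch 200 onwards.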

Finetuning on KITTI

python train.py --dataset kitti --datapath $DATAPATH --trainlist ./filenames/kitti15_train.txt --testlist ./filenames/kitti15_val.txt --epochs 400 --lrepochs "200:10" --batch_size 8 --test_batch_size 8 --loadckpt ./checkpoints/pretrained.ckpt --model MSNet2D

The arguments in both cases can be set differently depending on the model, dataset and hardware resources.

Prediction

The following script creates disparity maps for a specified model:

python prediction.py --datapath $DATAPATH --testlist ./filenames/kitti15_test.txt --loadckpt ./checkpoints/finetuned.ckpt --dataset kitti --colored True --model MSNet2D
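If the disparity maps are meant for the KITTI benchmark, KITTI stores disparity as 16-bit PNGs in which each value is the disparity times 256 and 0 marks an invalid pixel. Whether prediction.py writes exactly this format is not stated here; the sketch below only shows the value mapping, and writing the PNG itself needs an image library (e.g. Pillow or OpenCV), which is left out.

```python
def encode_kitti_disparity(disp):
    """Disparity in pixels -> uint16 values as stored in KITTI disparity PNGs.

    KITTI convention: stored value = disparity * 256, and 0 marks an invalid
    pixel. Negative values are clamped to 0 and large ones to 65535.
    """
    return [
        [min(65535, int(round(max(d, 0.0) * 256.0))) for d in row]
        for row in disp
    ]


def decode_kitti_disparity(values):
    """Inverse mapping; invalid (0) pixels become None."""
    return [[v / 256.0 if v > 0 else None for v in row] for row in values]


encoded = encode_kitti_disparity([[1.5, 0.0, 300.0]])
print(encoded)
print(decode_kitti_disparity(encoded))
```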

Credits

The implementation of this code is based on PSMNet and GwcNet. Also, we would like to thank the authors of THOP: PyTorch-OpCounter, Flops counter and KITTI python utils.

License

This project is released under the Apache 2.0 license.

Citation

If you use this code, please cite this paper:

@inproceedings{shamsafar2022mobilestereonet,
  title={MobileStereoNet: Towards Lightweight Deep Networks for Stereo Matching},
  author={Shamsafar, Faranak and Woerz, Samuel and Rahim, Rafia and Zell, Andreas},
  booktitle={Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision},
  pages={2417--2426},
  year={2022}
}

Contact

The repository is maintained by Faranak Shamsafar.
[email protected]


mobilestereonet's Issues

mobilestereonet3d: ONNX conversion error, how to convert it?

Links to trained models are broken

Hi, I found your code pretty promising and I would like to use it in a project I am developing, to get depth map predictions as an additional cue for a distinct task and dataset. However, I found that the links to download the trained models are broken. Could you make them available somehow? Thanks a lot in advance.
Xavier

How can I reduce GPU memory usage?

My device is a 3080 with 16GB, but with the original code I can only train with batch_size=1; otherwise, CUDA runs out of memory.

I tried Mixed Precision, which lets me increase batch_size to 2. How can I train with a larger batch size without dropping accuracy?

Can't reproduce the reported performance

Hello! I am using the MSNet2D setting, but I am unable to achieve an EPE (end-point error) of 1.14; I can only get it down to 1.40. Could you please advise if I am doing something wrong?

Here are the parameters I am using:

Batch size: 8
Number of epochs: 20
Optimizer: Adam
Learning rate: 0.001
Betas: (0.9, 0.999)
Test batch size: 8
Model: MSNet2D

Why are there 3D convolutions in the 2D model?

e.g. https://github.com/cogsys-tuebingen/mobilestereonet/blob/main/models/MSNet2D.py#L73

How to set the maxdisp value for a custom dataset?

Does the maxdisp value have an influence on the inference time or the performance? Besides, the model does not work well on my own dataset; I suspect the maxdisp value is the problem.
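In cost-volume networks, maxdisp sets the number of disparity hypotheses, so a larger value typically costs more memory and run time, while disparities beyond it cannot be predicted. A rough way to pick it for a custom rig is d_max = f * B / Z_min, with f the focal length in pixels, B the baseline and Z_min the closest depth of interest. In the sketch below, rounding up to a multiple of 16 (192, the common PSMNet/GwcNet default, is one) is an assumption, not a documented requirement of this repository, and the rig values are hypothetical.

```python
import math


def required_max_disparity(focal_px, baseline_m, min_depth_m, multiple=16):
    """Smallest maxdisp (pixels) that covers objects as close as min_depth_m.

    Largest expected disparity: d = f * B / Z_min. Rounding up to a multiple
    of `multiple` is an assumption, meant to keep the disparity range
    divisible by the network's downsampling factors.
    """
    d_max = focal_px * baseline_m / min_depth_m
    return multiple * math.ceil(d_max / multiple)


# Hypothetical rig: f = 721.5 px, B = 0.54 m, nearest object at 2 m.
print(required_max_disparity(721.5, 0.54, 2.0))  # -> 208
```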

Slow inference on RTX 3050Ti

Is 0.11 seconds inference time normal for RTX 3050Ti?
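Whether 0.11 s is normal depends on the input resolution, the model variant (2D or 3D) and the GPU, so no reference figure is given here. A fair measurement excludes warm-up iterations (cuDNN autotuning, memory allocation) and, on the GPU, synchronizes after each call because CUDA kernels run asynchronously. The generic helper below is a sketch; in real use you would pass torch.cuda.synchronize as sync and call the model under torch.no_grad(), and the names model, left and right in the comment are hypothetical.

```python
import statistics
import time


def measure_latency(fn, warmup=10, runs=50, sync=None):
    """Median wall-clock latency of fn() in seconds.

    sync: optional callable invoked after every call, e.g.
    torch.cuda.synchronize, so that asynchronous GPU work is included.
    """
    def call_once():
        fn()
        if sync is not None:
            sync()

    for _ in range(warmup):  # excluded from the statistics
        call_once()
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        call_once()
        times.append(time.perf_counter() - start)
    return statistics.median(times)


# In real use (hypothetical names `model`, `left`, `right`):
# with torch.no_grad():
#     t = measure_latency(lambda: model(left, right), sync=torch.cuda.synchronize)
```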

disp to depth

How can I convert the disparity to depth for KITTI? Thank you.
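For a rectified stereo pair, depth follows from Z = f * B / d, with f the focal length in pixels, B the baseline in metres and d the disparity in pixels. A minimal sketch follows; for KITTI, f is usually read from the calibration file (the first entry of the left colour camera's P2 projection matrix) and B is roughly 0.54 m, but both values in the example are illustrative assumptions.

```python
def disparity_to_depth(disp, focal_px, baseline_m, min_disp=1e-3):
    """Convert a disparity map (pixels) into depth (metres) via Z = f * B / d.

    disp is a list of rows; pixels with disparity <= min_disp (invalid or
    effectively at infinity) become None instead of dividing by ~0.
    """
    return [
        [focal_px * baseline_m / d if d > min_disp else None for d in row]
        for row in disp
    ]


# Illustrative KITTI-like values (assumptions): f ~ 721.5 px, B ~ 0.54 m.
depth = disparity_to_depth([[50.0, 10.0, 0.0]], focal_px=721.5, baseline_m=0.54)
print(depth)
```

If the disparity comes from a KITTI-format 16-bit PNG, divide the stored values by 256 first.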

Could you please share your model files?

I want to compare other methods with yours. Could you please share your model files? My mail address is [email protected]. Thanks a lot. @draeger

Thank you! Working to get this running in DepthAI for Embedded Use-Case

Hi there,

I just wanted to reach out to say that this looks awesome! We make an AI-capable stereo camera ecosystem called DepthAI (GitHub: https://github.com/luxonis/depthai-hardware), which is actually pretty well tuned to run MobileNetV1- and MobileNetV2-based networks.

So we are SUPER excited to try to get this running in our ecosystem!

No real issue here - just wanted to say thanks - and that we'll be working to get this running in DepthAI, issue luxonis/depthai#476.

-Brandon \ OpenCV / Luxonis

Could you please upload MSNet2D, MSNet3D and the submodule?

File names in file list don't seem to match file names in dataset(s)

E.g. https://github.com/cogsys-tuebingen/mobilestereonet/blob/main/filenames/ds_test.txt#L1 doesn't seem to match the file names of DrivingStereo as provided on the linked website, https://drivingstereo-dataset.github.io/, which are in the form 2018-07-11-14-48-52_2018-07-11-15-45-53-241.jpg. The folder structure also seems to be different, and the same issue seems to apply to SceneFlow.
Is there a mismatch in the version of the dataset(s), or is preprocessing required? If so, is the preprocessing described somewhere (maybe I missed it)?

train the network

Thank you for your great work. I wanted to see the details of the extension from 2D to 3D convolution and found that there is no model! I noticed a deletion record in the history; is something wrong? Waiting for the rest, thank you again.

Converting pytorch model to torchscript or ONNX format for inference in C++

Hi @AdrianZw

Are there any scripts or resources to convert the mobilestereonet to torchscript or ONNX format? I want to use those formats for inference in C++. Any suggestions would be greatly appreciated.

Thanks.

Pretrained Network

Hi,

Thank you for this project! Are there any trained weights you can share?

Can't find the model!

Hi,
Thanks for your great work. Unfortunately, I can't find the model weights. Would you please share the download link for the model weights? Thanks in advance.

Legacy autograd function with non-static forward method is deprecated

Hi,
I am using CUDA 11.1 and encountered the following error:
RuntimeError: Legacy autograd function with non-static forward method is deprecated. Please use new-style autograd function with static forward method. (Example: https://pytorch.org/docs/stable/autograd.html#torch.autograd.Function)

import torch


def make_nograd_func(func):
    def wrapper(*f_args, **f_kwargs):
        with torch.no_grad():
            ret = func(*f_args, **f_kwargs)
        return ret

    return wrapper

This is the function that throws the error when calling func().

thop missing from mobilestereonet.yml

conda env create --file mobilestereonet.yml

... env created successfully ...

(base) tornado@tornado:~/mobilestereonet$ conda activate mobilestereonet
(mobilestereonet) tornado@tornado:~/mobilestereonet$ ls
cost.py  datasets  filenames  images  LICENSE  mobilestereonet.yml  models  prediction.py  README.md  train.py  utils
(mobilestereonet) tornado@tornado:~/mobilestereonet$ python cost.py
Traceback (most recent call last):
  File "cost.py", line 28, in <module>
    from thop import profile
ModuleNotFoundError: No module named 'thop'

Fixed by:

pip install thop

Training MSNet3D error

I am trying to train MSNet3D on DrivingStereo and KITTI2015 but ran into this error:

  File "train.py", line 205, in
    train()
  File "train.py", line 98, in train
    loss, scalar_outputs, image_outputs = train_sample(sample, compute_metrics=do_summary)
  File "train.py", line 155, in train_sample
    disp_ests = model(imgL, imgR)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 161, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 171, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/parallel/parallel_apply.py", line 86, in parallel_apply
    output.reraise()
  File "/opt/conda/lib/python3.8/site-packages/torch/_utils.py", line 428, in reraise
    raise self.exc_type(msg)
RuntimeError: Caught RuntimeError in replica 0 on device 0.
Original Traceback (most recent call last):
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/parallel/parallel_apply.py", line 61, in _worker
    output = module(*input, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/trainer/mobile_stereo/models/MSNet3D.py", line 124, in forward
    out1 = self.encoder_decoder1(cost0)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/trainer/mobile_stereo/models/MSNet3D.py", line 46, in forward
    conv5 = F.relu(self.conv5(conv4) + self.redir2(conv2), inplace=True)
RuntimeError: The size of tensor a (4) must match the size of tensor b (3) at non-singleton dimension 2

Does anybody know how to solve this problem?
Looking forward to your reply!
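The names in the traceback (conv5 added to redir2(conv2)) resemble the PSMNet/GwcNet encoder-decoder pattern, so one plausible cause, not confirmed here, is a tensor size that does not survive two stride-2 downsamplings followed by transposed convolutions back up. The sketch below assumes those layer settings and traces the size along one axis: a size of 6 gives exactly the 4 vs 3 mismatch, and under these assumptions only sizes divisible by 4 round-trip. Which axis "dimension 2" is depends on MSNet3D's tensor layout (for a (B, C, D, H, W) cost volume it would be the disparity axis), so check maxdisp and the crop size accordingly.

```python
import math


def hourglass_sizes(n):
    """Size along one axis through an assumed PSMNet/GwcNet-style hourglass.

    Assumptions: stride-2 convolutions with kernel 3 and padding 1
    (out = ceil(n / 2)) and stride-2 transposed convolutions with kernel 3,
    padding 1 and output_padding 1 (out = 2 * n).
    """
    half = math.ceil(n / 2)        # after the first stride-2 conv (conv2)
    quarter = math.ceil(half / 2)  # after the second stride-2 conv (conv4)
    up_half = 2 * quarter          # conv5, added to redir2(conv2) of size `half`
    up_full = 2 * half             # next upsampling, added to a size-n skip
    return half, quarter, up_half, up_full


def hourglass_ok(n):
    half, _, up_half, up_full = hourglass_sizes(n)
    return up_half == half and up_full == n


print(hourglass_sizes(6))  # (3, 2, 4, 6): 4 vs 3, as in the traceback
print([n for n in range(1, 25) if hourglass_ok(n)])
```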

where to download the right scene flow dataset for pretraining?

Could you point me to where to download the right dataset for SF pretraining?

After following the SceneFlow hyperlink, I downloaded the SceneFlow disparity and frames_finalpass data.

But when I checked ./filenames/sceneflow_train.txt, the file paths were different from those in the dataset.

In ./filenames/sceneflow_train.txt, for example:
frames_finalpass/TRAIN/15mm_focallength/scene_backwards/slow/left/0224.png
frames_finalpass/TRAIN/A/0605/left/0015.png
frames_finalpass/TRAIN/B/0665/left/0006.png

In the downloaded dataset, for example:
frames_finalpass/15mm_focallength/scene_backwards/slow/left/0224.png
There are no TRAIN/, TRAIN/A/ or TRAIN/B/ directories.

Where is TRAIN/, TRAIN/A/, or TRAIN/B from?
Is there a script that I can run to automatically download all the dataset needed and prepare them in the right directory hierarchy as specified in ./filenames/*.txt? Please advise.

Thank you very much for your help in advance.

Will there be a release of the pretrained model?

Same

Have tried monochrome stereo pair input?

Hi,

I am impressed by your work and tried to run prediction on kitti dataset. It looks good.

But when I tried to run prediction on a monochrome dataset captured by an OAK-D stereo camera, the results look like random noise. I am wondering whether you have ever run your model on monochrome input and, if so, how it performed.

If you like, I have attached a pair of monochrome stereo images below for you to try. Please let me know what your test results look like. Thanks a lot.
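A model trained on RGB images expects three input channels. A common workaround, which the authors do not state they tested, is to replicate the gray channel into R, G and B and then apply the same normalization as during training. In the sketch below, the ImageNet mean and standard deviation (as used by PSMNet/GwcNet-style loaders) are an assumption about this repository's preprocessing, and the function name is hypothetical.

```python
IMAGENET_MEAN = (0.485, 0.456, 0.406)  # assumed normalization statistics
IMAGENET_STD = (0.229, 0.224, 0.225)


def gray_to_normalized_rgb(gray):
    """8-bit grayscale rows -> channel-first 3 x H x W list of floats.

    The single channel is replicated into R, G and B, scaled to [0, 1],
    then normalized per channel with the assumed ImageNet statistics.
    """
    return [
        [[(v / 255.0 - mean) / std for v in row] for row in gray]
        for mean, std in zip(IMAGENET_MEAN, IMAGENET_STD)
    ]


chw = gray_to_normalized_rgb([[0, 128, 255]])
print(len(chw), len(chw[0]), len(chw[0][0]))
```

If the output stays noisy, rectification quality, a maxdisp that does not fit the camera, or the domain gap to the training data are equally plausible causes.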

CUDA out of memory error

Hi. I get an out-of-memory error when running train.py. I reduced the batch size but get the same error. My GPU is an NVIDIA 1060 6GB.
The error is:
RuntimeError: CUDA out of memory. Tried to allocate 32.00 MiB (GPU 0; 5.94 GiB total capacity; 4.78 GiB already allocated; 27.19 MiB free; 4.80 GiB reserved in total by PyTorch)

I appreciate any help to fix this error. thanks in advance.

Update:
I could successfully train a model with batch size 1, but with a batch size greater than 1 I get the out-of-memory error discussed above.