
reversing's Introduction

Reversing the cycle

Code for "Reversing the cycle: self-supervised deep stereo through enhanced monocular distillation"

Filippo Aleotti, Fabio Tosi, Li Zhang, Matteo Poggi, Stefano Mattoccia

Paper · Short presentation (video) · Long presentation (video)

Citation

@inproceedings{aleotti2020reversing,
  title={Reversing the cycle: self-supervised deep stereo through enhanced monocular distillation},
  author={Aleotti, Filippo and Tosi, Fabio and Zhang, Li and Poggi, Matteo and Mattoccia, Stefano},
  booktitle = {16th European Conference on Computer Vision (ECCV)},
  year={2020},
  publisher={Springer}
}

Abstract

In many fields, self-supervised learning solutions are rapidly evolving and filling the gap with supervised approaches. This fact occurs for depth estimation based on either monocular or stereo, with the latter often providing a valid source of self-supervision for the former. In contrast, to soften typical stereo artefacts, we propose a novel self-supervised paradigm reversing the link between the two. Purposely, in order to train deep stereo networks, we distill knowledge through a monocular completion network. This architecture exploits single-image clues and few sparse points, sourced by traditional stereo algorithms, to estimate dense yet accurate disparity maps by means of a consensus mechanism over multiple estimations. We thoroughly evaluate with popular stereo datasets the impact of different supervisory signals showing how stereo networks trained with our paradigm outperform existing self-supervised frameworks. Finally, our proposal achieves notable generalization capabilities dealing with domain shift issues.

From top to bottom: the left image, the right image, and the predicted disparity map. No ground truth or LiDAR data were used to train the network.

Framework

Settings

Code tested with PyTorch 1.0.1 and Python 3.x on a single GPU. Requirements can be installed with:

pip install -r requirements

Train

To replicate the pipeline, you have to:

  1. Generate stereo labels with a Traditional Stereo Methodology
  2. Train the Monocular Completion Network (MCN) using such labels
  3. Generate the proxy labels for the stereo network using the Consensus Mechanism
  4. Train the stereo network on the proxy labels created in (3)

Generate stereo labels with a Traditional Stereo Methodology

To create the initial stereo labels for training MCN, we used the code of the paper "Learning confidence measures in the wild", by F. Tosi, M. Poggi, A. Tonioni, L. Di Stefano and S. Mattoccia (BM/W).

Once compiled, you can generate stereo labels for a given pair with the command:

./build/bmvc2017 -l [left_image] -r [right_image] -o [output_path] \
                 -p da ds lrc apkr dsm uc med \
                 -n da ds lrc apkr uc \ 
                 -t 0.4 -b 1 -d 192

Train the Monocular Completion Network

We used MonoResMatch as the monocular network, modifying it to also accept sparse stereo points as input. The original code of the network can be found here. Since the original network was developed in TensorFlow, we used the same framework to train our MCN.

You can install MCN requirements in a new python 3 environment:

cd mono
pip install -r requirements.txt

Note that at this point we use TensorFlow 1.10, while the Stereo section requires TensorFlow 2.0. This difference is due to the automatic launch of TensorBoard during stereo training; however, you can disable it and use TensorBoard 1.10 for the last stage as well.

Then,

python main.py --is_training \
               --data_path_image $path_full_kitti \
               --data_path_proxy $path_proxy \
               --batch_size $batch_size \
               --iterations $iterations \
               --patch_width $patch_width \
               --patch_height $patch_height \
               --initial_learning_rate $initial_learning_rate \
               --filenames_file $file_training \
               --learning_rate_schedule $learning_rate_schedule \
               --log_directory $log_directory \
               --width $width --height $height

where:

  • is_training: training flag. Add it when training
  • data_path_image: path to the RGB dataset
  • data_path_proxy: path to the traditional stereo proxies
  • patch_height: height of the crop used at training time
  • patch_width: width of the crop used at training time
  • batch_size: batch size
  • iterations: number of training steps
  • learning_rate_schedule: milestones at which the learning rate changes (see the sketch after this list)
  • initial_learning_rate: initial value of the learning rate
  • filenames_file: path to the txt file listing the training filenames
  • log_directory: directory where checkpoints will be saved
  • width: width of images after the initial resize
  • height: height of images after the initial resize
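
As a rough illustration of how a milestone-based schedule typically behaves (the decay factor and the helper function below are assumptions, not the repository's implementation):

def learning_rate_at(step, initial_learning_rate, milestones, decay=0.5):
    # Piecewise-constant schedule: the rate is scaled by `decay`
    # (an assumed factor) every time a milestone is passed.
    lr = initial_learning_rate
    for milestone in milestones:
        if step >= milestone:
            lr *= decay
    return lr

# e.g. with milestones [60000, 80000]:
#   step 10000 -> initial_learning_rate
#   step 70000 -> initial_learning_rate * 0.5
#   step 90000 -> initial_learning_rate * 0.25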

You can test MCN in single-inference mode using the test scripts:

python main.py  --output_path $output_path \
                --data_path_image $path_to_KITTI/2015/training \
                --data_path_proxy $path_proxy \
                --filenames_file "./utils/filenames/kitti_2015_test" \
                --checkpoint_path $checkpoint_path \
                --width 1280 --height 384 \
                --number_hypothesis -1
cd ..
python test/kitti.py --prediction $output_path --gt $path_to_KITTI/2015/training

where:

  • output_path: where the filtered proxies will be saved
  • data_path_image: path to RGB dataset
  • data_path_proxy: path to the traditional stereo proxies. You can use BM/W again or a different method (e.g., SGM/L).
  • filenames_file: path to dataset filename file
  • checkpoint_path: path to pre-trained MCN
  • width: width of resized image
  • height: height of resized image
  • number_hypothesis: number of multiple inferences. If -1, do not apply the consensus mechanism, and save the single prediction of the network
  • input_points: controls the fraction of traditional stereo points given as input to the network. Default is 0.95, meaning that the network keeps only 5% of the points as input

In this case the model takes a few random stereo points as input and does not apply the consensus mechanism over multiple inferences. To test MCN with the consensus mechanism, run multiple inferences over the testing split and change the testing code so that it masks out not only invalid ground-truth points but also invalid points in the predictions.
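
For intuition only, here is a minimal NumPy sketch of a consensus over multiple predictions, in which pixels where the inferences disagree are filtered out (the stacking, the agreement threshold and the function name are assumptions, not the exact code of the paper's consensus mechanism):

import numpy as np

def consensus(predictions, max_std=1.0):
    # predictions: list of HxW disparity maps, one per inference.
    stack = np.stack(predictions, axis=0)   # N x H x W
    mean = stack.mean(axis=0)               # average disparity per pixel
    std = stack.std(axis=0)                 # per-pixel disagreement
    valid = std < max_std                   # keep only consistent pixels
    proxy = np.where(valid, mean, 0.0)      # 0 marks invalid pixels in the proxy
    return proxy, valid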

Generate the proxy labels for the stereo network using the Consensus Mechanism

You can generate the monocular proxies by running the consensus mechanism over multiple predictions. To obtain these proxies, run the same script used to test MCN, changing just a few parameters.

python main.py  --output_path $output_path \
                --data_path_image $data_path_image  \
                --data_path_proxy $path_proxy \
                --filenames_file $filenames_file \
                --checkpoint_path $checkpoint_path \
                --width $width --height $height \
                --number_hypothesis $n \
                --temp_folder $temp

where:

  • number_hypothesis: number of multiple inferences. If -1, do not apply the consensus mechanism and save the single prediction of the network
  • temp: temporary directory that contains the multiple inferences
  • right: flag. Add it to generate proxies for right images (image_03 on KITTI) instead of left images (image_02 on KITTI)

For instance, to generate proxies for the KITTI training set, the script is:

python main.py  --output_path $output_path \
                --data_path_image $data_path_image  \
                --data_path_proxy $path_proxy \
                --filenames_file "./utils/filenames/kitti_2015_train.txt" \
                --checkpoint_path $checkpoint_path \
                --width 1280 --height 384 \
                --number_hypothesis 25 \
                --temp_folder kitti_temp

NOTE: if number_hypothesis > 0, then at each step we make a prediction for both the original and the horizontally flipped image. This means that with number_hypothesis == 25 you will obtain 50 predictions (i.e., half of them come from flipped inputs).
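
A sketch of this doubling, assuming some predict function that maps an image and its sparse points to a disparity map (the helper below is illustrative, not a function of this repository; in the actual pipeline each hypothesis would also use a different random subset of sparse points):

import numpy as np

def predict_with_flips(predict_fn, image, points, number_hypothesis):
    # Run the network twice per hypothesis: on the original input and on the
    # horizontally flipped one, flipping the second prediction back, so the
    # total number of predictions is 2 * number_hypothesis.
    predictions = []
    for _ in range(number_hypothesis):
        predictions.append(predict_fn(image, points))
        flipped = predict_fn(np.fliplr(image), np.fliplr(points))
        predictions.append(np.fliplr(flipped))
    return predictions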

Train the stereo network

At this point, you can train the stereo network on the monocular proxies.

cd stereo
python main.py --mode train --model $model --epoch $epochs --milestone $milestone \
               --datapath $datapath --dataset $dataset \
               --proxy $proxy_path \
               --crop_w $w --crop_h $h --batch $batch \
               --loss_weights=$loss_weights

where:

  • model: architecture to train. Choices are [psm, iresnet, stereodepth, gwcnet]
  • epoch: number of training epochs
  • dataset: training dataset. Choices are [KITTI, DS]
  • crop_w: width of the cropped image
  • crop_h: height of the cropped image
  • batch: batch size
  • milestone: epoch at which the learning rate changes
  • datapath: path to RGB images
  • proxy: path to proxies
  • loss_weights: per-scale loss weights, as a comma-separated sequence, e.g. 0.2,0.6,1.0 (see the sketch after this list)
  • maxdisp: maximum disparity. Default is 192
  • rgb_ext: extension of rgb images. Default is .jpg
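
To illustrate how per-scale weights typically enter a proxy-supervised loss, here is a minimal PyTorch sketch (the masking and the L1 formulation are assumptions; the repository's actual loss may differ):

import torch
import torch.nn.functional as F

def multi_scale_proxy_loss(predictions, proxy, weights=(0.2, 0.6, 1.0)):
    # predictions: list of B x 1 x h_i x w_i disparity maps, one per scale.
    # proxy: B x 1 x H x W proxy disparities, with 0 marking invalid pixels.
    valid = proxy > 0
    total = proxy.new_zeros(())
    for weight, pred in zip(weights, predictions):
        # Upsample each prediction to the proxy resolution before comparing.
        pred = F.interpolate(pred, size=proxy.shape[-2:],
                             mode='bilinear', align_corners=False)
        total = total + weight * torch.abs(pred - proxy)[valid].mean()
    return total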

For instance, to train psm the command is:

python main.py --mode train --model psm --epoch 11 --milestone 8 \
               --datapath $datapath --dataset KITTI \
               --proxy $proxy_path \
               --crop_w 640 --crop_h 192 --batch 2 \
               --loss_weights=0.2,0.6,1.0 \
               --maxdisp 192

Test

Pretrained models, trained on KITTI, are available for download:

  • PSMNet: weights
  • IResNet: weights
  • Stereodepth: weights
  • GwcNet: weights
  • MCN: weights

KITTI

You can test them by running a command like this:

python main.py --mode "test" --model $model \
                --gpu_ids $gpu \
                --datapath $data \
                --ckpt $ckpt \
                --results $results_dir \
                --qualitative \
                --final_h 384 \
                --final_w 1280

where:

  • model: network architecture. Options are [psm, iresnet, stereodepth, gwcnet]
  • gpu_ids: gpu index. Default is 0
  • datapath: path to images
  • ckpt: path to pre-trained model
  • results: where results will be saved
  • qualitative: also save colored maps
  • final_h: height of image after padding
  • final_w: width of image after padding

Then, you can test disparities using the testing script:

python test/kitti.py --prediction $result/16bit/$model/KITTI --gt $path

where:

  • prediction: path to the 16-bit disparities predicted by the model (see the sketch after this list)
  • gt: path to KITTI/2015/training on your machine
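
For reference, KITTI stores disparities as 16-bit PNGs in which the pixel value divided by 256 gives the disparity in pixels; a minimal sketch of the standard EPE and D1 metrics follows (it is not the repository's exact evaluation code):

import numpy as np
from PIL import Image

def load_disparity(path):
    # KITTI-style 16-bit PNG: value / 256 = disparity in pixels, 0 = invalid.
    return np.array(Image.open(path), dtype=np.float32) / 256.0

def epe_d1(pred, gt):
    # End-point error and D1 outlier rate, computed on valid ground-truth pixels.
    valid = gt > 0
    err = np.abs(pred[valid] - gt[valid])
    epe = err.mean()
    d1 = np.mean((err > 3.0) & (err > 0.05 * gt[valid])) * 100.0
    return epe, d1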

Middlebury

You can generate and evaluate predictions on Middlebury (at quarter resolution) by running:

cd stereo
python main.py --mode "test" --model $model \
                --gpu_ids 0 \
                --maxdisp 192  \
                --dataset "MIDDLEBURY" \
                --ckpt $ckpt  \
                --datapath $datapath \
                --results "./results" \
                --final_h 512 \
                --final_w 1024
cd ..
python test/middlebury.py --prediction stereo/results/16bit/$model/MIDDLEBURY --gt $gt

where:

  • gt: path to MiddleburyTrainingQ folder (the script looks for pfm ground-truth)

ETH3D

You can generate and evaluate predictions on ETH3D by running:

cd stereo
python main.py --mode "test" --model $model \
                --gpu_ids 0 \
                --maxdisp 192  \
                --dataset "ETH3D" \
                --ckpt $ckpt  \
                --datapath $datapath \
                --results "./results" \
                --final_h 512 \
                --final_w 1024
cd ..
python test/eth.py --prediction stereo/results/16bit/$model/ETH3D --gt $gt

where:

  • gt: path to ETH3D/training folder (the script looks for pfm ground-truth)

Single inference

You can run the network on a single stereo pair, or on a list of pairs, using the following script:

python single_shot.py --left $left --right $right \
                      --model $model --ckpt $ckpt \
                      --maxdisp 192 \
                      --qualitative --cmap $map \
                      --final_h 384 \
                      --final_w 1280 \
                      --results $result_dir \
                      --gpu_ids 0 \
                      --maxval $maxval

where:

  • left: list (space separated) of paths to left images
  • right: list (space separated) of paths to right images
  • ckpt: path to the checkpoint
  • maxdisp: maximum disparity value. Default is 192
  • qualitative: if added, save the prediction using a colormap; otherwise, save a 16-bit png image (see the sketch after this list)
  • cmap: colormap applied when qualitative is set. Choices are [kitti, magma, jet, gray]
  • final_h: height of the image after padding
  • final_w: width of the image after padding
  • results: folder where predictions will be saved. If it does not exist, it will be created
  • gpu_ids: index of the gpu to use
  • maxval: optional. For the kitti colormap, you can also set maxval. Default is -1
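
As a hedged sketch of the two output modes (the function name is illustrative, and since the kitti colormap is repository-specific, matplotlib's magma/jet/gray are used here as stand-ins):

import numpy as np
from PIL import Image
import matplotlib.pyplot as plt

def save_prediction(disp, path, qualitative=False, cmap='magma', maxval=-1):
    if qualitative:
        # Normalize by the map maximum (or by maxval, if given) and apply a colormap.
        top = disp.max() if maxval <= 0 else maxval
        colored = plt.get_cmap(cmap)(np.clip(disp / top, 0.0, 1.0))[..., :3]
        Image.fromarray((colored * 255).astype(np.uint8)).save(path)
    else:
        # 16-bit PNG following the KITTI convention: disparity * 256.
        Image.fromarray((disp * 256.0).astype(np.uint16)).save(path)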

Acknowledgment

We thank the authors who shared the code of their works.


reversing's Issues

About reproduction

Hello,

I can't reproduce the results in the paper using the pretrained psmnet model.
As shown in the paper, the RMSE for the psmnet model is 3.764, but I could not reproduce this result.

Can you please give me advice on this issue?

Thank you.

proxy for stereo model

Hello,

Thank you for the great work!

Can you please provide the KITTI proxies from the mono network to train the stereo net?

The provided model is damaged!

I tried to download the pretrained models, such as psm.tar, but the file is damaged!
Of the other pretrained models I downloaded, only mcn-bm-w-arc.zip is complete!
I tried Chrome, Edge and Firefox, and only the zip files work.
Could you upload the other models in zip format?
Thanks!

Better results without accounting for difficult images in Middlebury

Hi,

Thanks for publishing this great work.

I was testing your code on Middlebury and I realized that you obtain slightly better results when you don't multiply by 0.5 on difficult images.
Are you applying this factor because some previous work did the same, or is there another reason for it? If a previous work does this scaling, could you tell me which one?

By the way, I am referring to this:

difficult_images = ["PianoL", "Playroom", "Playtable", "Shelves", "Vintage"]
...
...
if seq in difficult_images:
    scalar = 0.5

Thanks in advance,
Sergio

Question Relating to the SGM proxy source

Hi, Thank you for the great work.
I have tried SGM implementations like OpenCV's StereoSGBM_create and libSGM, but unfortunately none of them can reach the D1 & EPE indicated in your paper (D1=4.01, EPE=1.00 in Table 1, SGM with LRC).
Also, nearly all SGM implementations require hyper-parameters such as P1, P2 and uniqueness. It would be highly appreciated if you could provide the values you used when tuning on the KITTI 2015 training set.

Best.

Question about results in Table1 and Table3

Hi,
Thanks for your great work.
I have a question about the settings of the experiments in Table 1 and Table 3.
Table 1 says that results were tested on the KITTI 2015 training set, while Table 3 only says that results were tested on KITTI 2015. So, were the results in Table 3 also obtained on the training set, or on the whole dataset (200 images)?

Also, in Table 3 the best D1 is 2.78 with 92.53% density, while in Table 1 the best D1 is 3.68 with 100% density. Does this mean that MCN can already produce good enough results?

Thanks

incomplete kitti_2015_train.txt

Hi! Thank you for your impressive work! But did you notice that kitti_2015_train.txt, located in mono/utils/filenames/, is incomplete? The paper says there should be 29K rectified stereo images, but this file contains only 19K lines.

proxy disparity generation

Hey, I tried to use the method "Unsupervised-Confidence-Measure" to generate proxy disparities. However, I found it can only produce the left disparity. How did you generate the right one?

Another question concerns the sampling probability, i.e., the argument "input_points" in main.py. I see it differs between training and testing in your paper: 1/1000 in the training stage and 1/20 at test time. But it is not set in the readme, so I want to confirm it.

Hoping for your reply.
