
EPCDepth

EPCDepth is a self-supervised monocular depth estimation model whose supervision comes from the other image of a stereo pair. Details are described in our paper:

Excavating the Potential Capacity of Self-Supervised Monocular Depth Estimation

Rui Peng, Ronggang Wang, Yawen Lai, Luyang Tang, Yangang Cai

ICCV 2021 (arxiv)

EPCDepth produces the most accurate and sharpest results. In the last example of our qualitative comparison, the depth of the person in the second red box should be greater than that of the road sign, because the road sign occludes the person. Only our model accurately captures this occlusion cue.

⚙ Setup

1. Recommended environment

  • PyTorch 1.1
  • Python 3.6

2. KITTI data

You can download the raw KITTI dataset (about 175GB) by running:

wget -i dataset/kitti_archives_to_download.txt -P <your kitti path>/
cd <your kitti path>
unzip "*.zip"

Then we recommend converting the png images to jpeg with this command:

find <your kitti path>/ -name '*.png' | parallel 'convert -quality 92 -sampling-factor 2x2,1x1,1x1 {.}.png {.}.jpg && rm {}'

Alternatively, you can skip this conversion step and manually change the image suffix from .jpg to .png in dataset/kitti_dataset.py. Our pre-trained models were trained on jpg images, and the test performance on png will decrease slightly.
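
For reference, the image paths are built roughly as in the sketch below (adapted from a traceback quoted in the issues further down; the helper name and the side_map values are illustrative assumptions, not the repository's exact code). The extension literal is the part to change if you keep png images:

import os

# Illustrative sketch only: mirrors how dataset/kitti_dataset.py builds KITTI
# image paths. Change the extension argument to ".png" if you skipped the
# jpg conversion above.
def get_img_path(data_path, folder, side_map, side, frame_idx, ext=".jpg"):
    return os.path.join(data_path, folder,
                        "image_0{}/data".format(side_map[side]),
                        "{:010d}{}".format(frame_idx, ext))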

3. Prepare depth hint

Once you have downloaded the KITTI dataset as in the previous step, you need to prepare the depth hints by running:

python precompute_depth_hints.py --data_path <your kitti path>

The generated depth hints will be saved to <your kitti path>/depth_hints. Again, pay attention to the image suffix.
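
Conceptually, a depth hint is a proxy disparity label computed from the stereo pair with a classical stereo matcher (our depth hint module follows DepthHints, see the Acknowledgements). The snippet below is only a minimal sketch of that idea using OpenCV's semi-global matching; the file names, matcher settings, and output format are illustrative and do not match precompute_depth_hints.py exactly.

import cv2
import numpy as np

# Minimal illustration of a classical-stereo depth hint (not the repository's
# exact procedure): compute an SGBM disparity map from a rectified stereo pair.
left = cv2.imread("left.jpg", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.jpg", cv2.IMREAD_GRAYSCALE)

matcher = cv2.StereoSGBM_create(minDisparity=0, numDisparities=128, blockSize=5)
disp = matcher.compute(left, right).astype(np.float32) / 16.0  # SGBM outputs fixed-point x16
np.save("depth_hint.npy", disp)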

📊 Evaluation

1. Download models

Download our pretrained models and put them in <your model path>.

Pre-trained   PP   HxW        Backbone        Output Scale   Abs Rel   Sq Rel   RMSE    δ < 1.25
model18_lr    √    192x640    resnet18 (pt)   d0             0.0998    0.722    4.475   0.888
                                              d2             0.1       0.712    4.462   0.886
model18       √    320x1024   resnet18 (pt)   d0             0.0925    0.671    4.297   0.899
                                              d2             0.0920    0.655    4.268   0.898
model50       √    320x1024   resnet50 (pt)   d0             0.0905    0.646    4.207   0.901
                                              d2             0.0905    0.629    4.187   0.900

Note: pt refers to pre-trained on ImageNet, and the low-resolution results differ slightly from those in the paper.
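
For reference, the error metrics reported in the tables follow the standard KITTI depth-evaluation definitions; a minimal sketch (the function name is illustrative):

import numpy as np

# Standard depth metrics; gt and pred are matched 1-D arrays of valid
# ground-truth and predicted depths.
def depth_metrics(gt, pred):
    abs_rel = np.mean(np.abs(gt - pred) / gt)
    sq_rel = np.mean((gt - pred) ** 2 / gt)
    rmse = np.sqrt(np.mean((gt - pred) ** 2))
    a1 = np.mean(np.maximum(gt / pred, pred / gt) < 1.25)  # δ < 1.25
    return abs_rel, sq_rel, rmse, a1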

2. KITTI evaluation

To recreate the results from our paper, run the command below; the estimated disparity maps will be saved to <your disparity save path>:

python main.py \
    --val --data_path <your kitti path> --resume <your model path>/model18.pth.tar \
    --use_full_scale --post_process --output_scale 0 --disps_path <your disparity save path>

The shape of saved disparities in numpy data format is (N, H, W).
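A minimal sketch of loading them for further analysis (the file name below is only an example; use whatever path you passed to --disps_path):

import numpy as np

# Load the saved disparities: one HxW map per test image.
disps = np.load("<your disparity save path>/disps.npy")
print(disps.shape)  # (N, H, W)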

3. NYUv2 evaluation

We validate the generalization ability on the NYU-Depth-V2 dataset using the model trained on the KITTI dataset. Download the testing data nyu_test.tar.gz and unzip it to <your nyuv2 testing data path>. All evaluation code is in the nyuv2Testing folder. Run:

python nyuv2_testing.py \
    --data_path <your nyuv2 testing data path> \
    --resume <your model path>/model50.pth.tar --post_process \
    --save_dir <your nyuv2 disparity save path>

By default, only the visualization results (in png format) of the predicted disparities and the ground truth on NYUv2 will be saved to <your nyuv2 disparity save path>.

📦 KITTI Results

You can download our precomputed disparity predictions from the following links:

Disparity    PP   HxW        Backbone        Output Scale   Abs Rel   Sq Rel   RMSE    δ < 1.25
disps18_lr   √    192x640    resnet18 (pt)   d0             0.0998    0.722    4.475   0.888
disps18      √    320x1024   resnet18 (pt)   d0             0.0925    0.671    4.297   0.899
disps50      √    320x1024   resnet50 (pt)   d0             0.0905    0.646    4.207   0.901

🖼 Visualization

To visualize the disparity map saved in the KITTI evaluation (or other disparities in numpy data format), run:

python main.py --vis --disps_path <your disparity save path>/disps50.npy

The visualized depth map will be saved to <your disparity save path>/disps_vis in png format.
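
If you prefer to produce colorized maps yourself, here is a minimal stand-alone sketch of the same idea (not the repository's exact implementation; the colormap and output file names are illustrative):

import os
import numpy as np
import matplotlib.pyplot as plt

# Colorize each disparity map and write it as a png, similar to what --vis does.
disps = np.load("<your disparity save path>/disps50.npy")  # (N, H, W)
os.makedirs("disps_vis", exist_ok=True)
for i, disp in enumerate(disps):
    plt.imsave("disps_vis/{:04d}.png".format(i), disp, cmap="magma")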

⏳ Training

To train the model from scratch, run:

python main.py \
    --data_path <your kitti path> --model_dir <checkpoint save dir> \
    --logs_dir <tensorboard save dir> --pretrained --post_process \
    --use_depth_hint --use_spp_distillation --use_data_graft \
    --use_full_scale

🔧 Suggestion

  1. The magnitude of performance improvement: Data Grafting > Full-Scale > Self-Distillation. We noticed that the improvement from self-distillation becomes insignificant when the model capacity is large. Therefore, exploring more accurate self-distillation label extraction methods and better self-distillation strategies is a promising direction for future work.
  2. In our experience, self-supervised monocular depth estimation models with larger backbones converge less stably. You can verify your ideas on a small backbone first, and then adjust the learning rate appropriately when training with a larger backbone.
  3. We found that a pure RSU encoder performs better than the traditional ResNet encoder, but unfortunately there is no RSU encoder pre-trained on ImageNet. We believe that pre-training an RSU encoder on ImageNet and using it to replace the ResNet encoder in this model would yield a large performance improvement.

⚖ Citation

If you find our work useful in your research, please consider citing our paper:

@inproceedings{epcdepth,
    title = {Excavating the Potential Capacity of Self-Supervised Monocular Depth Estimation},
    author = {Peng, Rui and Wang, Ronggang and Lai, Yawen and Tang, Luyang and Cai, Yangang},
    booktitle = {Proceedings of the IEEE International Conference on Computer Vision (ICCV)},
    year = {2021}
}

πŸ‘©β€ Acknowledgements

Our depth hint module refers to DepthHints, the NYUv2 pre-processing refers to P2Net, and the RSU block refers to U2Net.


epcdepth's Issues

Getting different test results on the KITTI

  1. I downloaded your pre-trained model named "model18_lr" from: https://drive.google.com/file/d/1Z60MI_UdTHfoSFSFwLI39yfe8njEN6Kp/view?usp=sharing .

  2. I saved the estimated disparity map by your script:

python main.py \
    --val --data_path <kitti path> --resume <model path>/model18_192x640.pth.tar \
    --use_full_scale --post_process --output_scale 0 --disps_path <disparity save path>

  3. I tested the depth maps using the script provided by monodepth2
    ( https://github.com/nianticlabs/monodepth2/blob/master/evaluate_depth.py ).
    The command is: python evaluate_depth.py --data_path <dataset_dir> --eval_mono --ext_disp_to_eval <saved_depth_map> --post_process.

The result is:
Mono evaluation - using median scaling
Scaling ratios | med: 6.675 | std: 0.085

abs_rel   sq_rel   rmse    rmse_log   a1      a2      a3
0.169     0.981    5.269   0.241      0.745   0.943   0.978

These results are not good. Is there anything I have missed?
Thank you!

Dataset

Thanks for your code. I have some questions. Can I train this model on other datasets? And can I use the pretrained model to predict on images that are not from KITTI?

I sincerely congratulate you on publishing such an excellent article. After reading it, I encountered a problem when running the code; I hope you can help take a look.

Traceback (most recent call last):
  File "main.py", line 55, in <module>
    model.main()
  File "/hpcfiles/users/hx/EPCDepth-main/model.py", line 89, in main
    train_loss = self.train_epoch(epoch)
  File "/hpcfiles/users/hx/EPCDepth-main/model.py", line 189, in train_epoch
    progressbar.Timer(), ",", progressbar.ETA(), ",", progressbar.Variable('LR', width=1), ",",
AttributeError: module 'progressbar' has no attribute 'Variable'

Artifact appears as the training goes on

Hi, dear author, I really appreciate your awesome work! It is more stable and performs better than depth estimation with monocular video.

However, I met a problem when I trained EPCNet on my own dataset.
When the model is trained for only 3 epochs, the performance is good. However, when I train for more epochs (such as 20), artifacts appear on the predicted disparity maps, as shown in the figures attached to the issue.

What could be causing this? Could you give me some advice?
THANK YOU!

Getting different test results on the KITTI

Hi, first of all, thanks for your excellent work! When I tried to reproduce your results on the KITTI test set with your code and pretrained weights, I got different results from those reported in this repository. Specifically, I tested model50 with:
python main.py --val --data_path <kitti path> --resume <model path>/model50.tar --use_full_scale --post_process --output_scale 0 --disps_path <disparity save path> --num_layer 50 --batch_size 4
And the results are:

From              Abs Rel   Sq Rel   RMSE    δ < 1.25
This Repository   0.091     0.646    4.207   0.901
My Reproduction   0.096     0.669    4.254   0.888

Note that the images in my KITTI dataset have the .png extension, and you mentioned that "Our pre-trained model is trained in jpg, and the test performance on png will slightly decrease." Are the differences caused only by the image extension, or am I misunderstanding something else?

TypeError: expected str, bytes or os.PathLike object, not NoneType

Epoch 0/20: N/A% 00/5650 || Elapsed Time: 0:00:00,ETA: --:--:--,LR: -,Loss: ------
Traceback (most recent call last):
  File "main.py", line 55, in <module>
    model.main()
  File "/home/ji322906/EPCDepth/model.py", line 90, in main
    train_loss = self.train_epoch(epoch)
  File "/home/ji322906/EPCDepth/model.py", line 197, in train_epoch
    for batch, data in enumerate(self.train_loader):
  File "/home/ji322906/.conda/envs/jihyungkim94/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 517, in __next__
    data = self._next_data()
  File "/home/ji322906/.conda/envs/jihyungkim94/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1199, in _next_data
    return self._process_data(data)
  File "/home/ji322906/.conda/envs/jihyungkim94/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1225, in _process_data
    data.reraise()
  File "/home/ji322906/.conda/envs/jihyungkim94/lib/python3.8/site-packages/torch/_utils.py", line 429, in reraise
    raise self.exc_type(msg)
TypeError: Caught TypeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/ji322906/.conda/envs/jihyungkim94/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 202, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/ji322906/.conda/envs/jihyungkim94/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/ji322906/.conda/envs/jihyungkim94/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/ji322906/EPCDepth/dataset/kitti_dataset.py", line 151, in __getitem__
    data["curr"] = self.transform(self.get_img(folder, frame_idx, side), is_flip, False, color_aug)
  File "/home/ji322906/EPCDepth/dataset/kitti_dataset.py", line 88, in get_img
    img_path = os.path.join(self.data_path, folder, "image_0{}/data".format(self.side_map[side]), "{:010d}{}".format(frame_idx, ".png"))
  File "/home/ji322906/.conda/envs/jihyungkim94/lib/python3.8/posixpath.py", line 76, in join
    a = os.fspath(a)
TypeError: expected str, bytes or os.PathLike object, not NoneType

How do I fix it??

Multiple GPU training

Hi there, thanks for the excellent work! I am using a larger backbone with your EPCDepth network, which takes a lot of time to train. I am wondering whether the training can be accelerated with multiple GPUs. I have tried using torch.distributed but failed for some reason. Have you tried training with multiple GPUs? I really appreciate any help you can provide.

distortion between near cars and adjacent environment

Thanks to the authors for the interesting idea in the paper.
In my test, the portions of the reconstructed point cloud containing nearby cars are distorted, which means the disparity between nearby, clearly visible cars and the surrounding background is not predicted distinctly.
I guess three reasons may cause this. First, the encoder may not be deep enough, so the semantics are not learned well and the difference between the environment and the vehicles may not be judged correctly. Second, the disparity decoder contains down-sampled stages, so the disparity of a car and its adjacent environment may fall into the same cell of the output feature map. Third, the photometric loss covers large surrounding regions of the image such as the sky, so the fine-grained loss is submerged.
Please tell me if you have ever encountered this situation.
