
oniroai / monodepth-pytorch


Unofficial implementation of Unsupervised Monocular Depth Estimation neural network MonoDepth in PyTorch

Python 87.19% Jupyter Notebook 12.81%
monodepth stereo computer-vision deep-learning pytorch depth-estimation

monodepth-pytorch's Introduction

MonoDepth

demo.gif animation

This repo is inspired by the amazing work of Clément Godard, Oisin Mac Aodha and Gabriel J. Brostow on Unsupervised Monocular Depth Estimation. The original code and paper can be found via the following links:

  1. Original repo
  2. Original paper

MonoDepth-PyTorch

This repository contains code and additional parts for the PyTorch port of the MonoDepth deep learning algorithm. For more information about the original work, please visit the authors' website.

Purpose

The purpose of this repository is to provide a more lightweight depth estimation model with better accuracy. In our version of MonoDepth, we use ResNet50 as the encoder. It is slightly modified (with one extra lateral shrinkage), as in the original repo.

We also added a ResNet18 version and use batch normalization in both cases for training stability. Moreover, the feature extractor is flexible: it can be built from any version of the original ResNet in the torchvision model zoo, with an option to use pretrained weights.

Dataset

KITTI

This algorithm requires stereo-pair images for training and single images for testing. The KITTI dataset was used for training; it contains 38,237 training samples. The raw dataset (about 175 GB) can be downloaded by running:

wget -i kitti_archives_to_download.txt -P ~/my/output/folder/

kitti_archives_to_download.txt may be found in this repo.

Dataloader

The dataloader assumes the following structure of the folder with training examples (the 'data_dir' argument contains the path to that folder): the folder contains subfolders, each with "image_02/data" for left images and "image_03/data" for right images. This structure is the default for the KITTI dataset.

Example data folder structure (path to the "kitti" directory should be passed as 'data_dir' in this example):

data
├── kitti
│   ├── 2011_09_26_drive_0001_sync
│   │   ├── image_02
│   │   │   ├─ data
│   │   │   │   ├── 0000000000.png
│   │   │   │   └── ...
│   │   ├── image_03
│   │   │   ├── data
│   │   │   │   ├── 0000000000.png
│   │   │   │   └── ...
│   ├── ...
├── models
├── output
├── test
│   ├── left
│   │   ├── test_1.jpg
│   │   └── ...
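
For clarity, here is a small sketch of how left/right image pairs could be enumerated from such a folder tree. This is a simplified illustration under the assumption of the layout above, not the repo's actual data-loading code:

    import os
    from glob import glob

    def list_stereo_pairs(data_dir):
        """Collect (left, right) image path pairs from a KITTI-style folder tree."""
        pairs = []
        for drive in sorted(os.listdir(data_dir)):
            left_dir = os.path.join(data_dir, drive, 'image_02', 'data')
            right_dir = os.path.join(data_dir, drive, 'image_03', 'data')
            if not (os.path.isdir(left_dir) and os.path.isdir(right_dir)):
                continue  # skip folders without stereo images
            for left_path in sorted(glob(os.path.join(left_dir, '*.png'))):
                right_path = os.path.join(right_dir, os.path.basename(left_path))
                if os.path.exists(right_path):
                    pairs.append((left_path, right_path))
        return pairs

    # e.g. pairs = list_stereo_pairs('data/kitti')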

Training

An example of training can be found in the Monodepth notebook.

The Model class from main_monodepth_pytorch.py should be initialized with the following params (as an easydict) for training:

  • data_dir: path to the dataset folder
  • val_data_dir: path to the validation dataset folder
  • model_path: path to save the trained model
  • output_directory: where to save disparities for tested images
  • input_height
  • input_width
  • model: model for the encoder (resnet18_md, resnet50_md, or any torchvision version of ResNet: resnet18, resnet34, etc.)
  • pretrained: if a torchvision model is used, it is possible to download weights for the pretrained model
  • mode: train or test
  • epochs: number of epochs,
  • learning_rate
  • batch_size
  • adjust_lr: apply learning rate decay or not
  • tensor_type: 'torch.cuda.FloatTensor' or 'torch.FloatTensor'
  • do_augmentation: do data augmentation or not
  • augment_parameters: lowest and highest values for gamma, lightness and color respectively
  • print_images
  • print_weights
  • input_channels: number of channels in the input tensor (3 for RGB images)
  • num_workers: number of workers to use in the dataloader

Optionally, after initialization, we can load a pretrained model via model.load.

After that, calling train() on the Model class object starts the training process.

Training can also be started by calling main_monodepth_pytorch.py from the terminal and passing parameters as argparse arguments.
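
As an illustration, a minimal training setup could look like the sketch below. The parameter names come from the list above; the paths, image size and augment_parameters values are only illustrative assumptions, not recommended settings:

    from easydict import EasyDict as edict
    from main_monodepth_pytorch import Model

    params = edict({
        'data_dir': 'data/kitti',                  # training data
        'val_data_dir': 'data/kitti_val',          # validation data
        'model_path': 'models/monodepth_resnet18_001.pth',
        'output_directory': 'output',
        'input_height': 256,                       # illustrative size
        'input_width': 512,
        'model': 'resnet18_md',
        'pretrained': False,
        'mode': 'train',
        'epochs': 200,
        'learning_rate': 1e-4,
        'batch_size': 8,
        'adjust_lr': True,
        'tensor_type': 'torch.cuda.FloatTensor',
        'do_augmentation': True,
        'augment_parameters': [0.8, 1.2, 0.5, 2.0, 0.8, 1.2],  # assumed example values
        'print_images': False,
        'print_weights': False,
        'input_channels': 3,
        'num_workers': 4,
    })

    model = Model(params)
    # model.load(params.model_path)  # optional: resume from a saved checkpoint
    model.train()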

Train results and pretrained model

The results presented in the gif were obtained using a model with ResNet18 as the encoder, which can be downloaded from here.

For training the following parameters were used:

`model`: 'resnet18_md'
`epochs`: 200,
`learning_rate`: 1e-4,
`batch_size`: 8,
`adjust_lr`: True,
`do_augmentation`: True

The provided model was trained on the whole dataset, except for the subsets listed below, which were used for hold-out validation.

2011_09_26_drive_0002_sync  2011_09_29_drive_0071_sync
2011_09_26_drive_0014_sync  2011_09_30_drive_0033_sync
2011_09_26_drive_0020_sync  2011_10_03_drive_0042_sync
2011_09_26_drive_0079_sync

The demo gif image is a visualization of the predictions on 2011_09_26_drive_0014_sync subset.

See Monodepth notebook for the details on the training.

Testing

An example of testing can also be found in the Monodepth notebook.

The Model class from main_monodepth_pytorch.py should be initialized with the following params (as an easydict) for testing:

  • data_dir: path to the dataset folder
  • model_path: path to the trained model
  • pretrained: whether to use a pretrained torchvision model (see the training params above)
  • output_directory: where to save disparities for tested images
  • input_height
  • input_width
  • model: model for encoder (resnet18 or resnet50)
  • mode: train or test
  • input_channels: number of channels in the input tensor (3 for RGB images)
  • num_workers: number of workers to use in the dataloader

After that, calling test() on the Model class object starts the testing process.

Testing can also be started by calling main_monodepth_pytorch.py from the terminal and passing parameters as argparse arguments.
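
Analogously to training, a minimal testing sketch might look as follows (paths and image size are again only illustrative assumptions):

    from easydict import EasyDict as edict
    from main_monodepth_pytorch import Model

    test_params = edict({
        'data_dir': 'data/test',
        'model_path': 'models/monodepth_resnet18_001.pth',
        'pretrained': False,
        'output_directory': 'output',
        'input_height': 256,
        'input_width': 512,
        'model': 'resnet18_md',
        'mode': 'test',
        'input_channels': 3,
        'num_workers': 4,
    })

    model = Model(test_params)
    model.test()  # writes disparities for the test images to output_directory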

Requirements

This code was tested with PyTorch 0.4.1, CUDA 9.1 and Ubuntu 16.04. Other required modules:

torchvision
numpy
matplotlib
easydict

monodepth-pytorch's People

Contributors

askorikov, nikolasent, tanmaniac, voeykovroman


monodepth-pytorch's Issues

Multi-GPU support and multi-threaded data loading

I think it will be good to use nn.DataParallel
self.model = nn.DataParallel(self.model)

Maybe also add multi-threading in the dataloader in your utils.py (this can decrease training time by at least 4 times)

loader = DataLoader(dataset, batch_size=batch_size,
                    shuffle=True, num_workers=4)

how to output the predicted right image

Hi,

  1. Is there a way to output the right-side image being reconstructed from the predicted disparity during the self-supervised training step?
  2. Is there a way to 'sharpen' the disparity map being produced?

Thanks,
OG

Difference between ResNet50_md and ResNet model

Hi, I want to ask about the difference between them.
When I calculated the FLOPs of each model, the first one is 2.5 times smaller than the latter. Is there any significant change in your implementation of ResNet50_md compared to the original one, apart from the one extra lateral shrinkage?

Provided model fails to load

I tried to run the model you uploaded here with the following parameters:

python3 main_monodepth_pytorch.py --mode test --model "resnet18_md" --input_channels 3 --data_dir "dataset/" --model_path trained_kitti_model.pth --output_directory out
But I got the following error:

Traceback (most recent call last):
  File "main_monodepth_pytorch.py", line 321, in <module>
    main()
  File "main_monodepth_pytorch.py", line 316, in main
    model_test = Model(args)
  File "main_monodepth_pytorch.py", line 145, in __init__
    self.model.load_state_dict(torch.load(args.model_path))
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 719, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for Resnet18_md:
    size mismatch for iconv3.conv_base.weight: copying a param of torch.Size([64, 130, 3, 3]) from checkpoint, where the shape is torch.Size([64, 128, 3, 3]) in current model.
    size mismatch for iconv2.conv_base.weight: copying a param of torch.Size([32, 98, 3, 3]) from checkpoint, where the shape is torch.Size([32, 96, 3, 3]) in current model.

I removed + 2 in Resnet18_md
self.iconv3 = conv(64+64 + 2, 64, 3, 1)
and
self.iconv2 = conv(64+32 + 2, 32, 3, 1)

The model then loads, but on the forward pass I get this error:

Traceback (most recent call last):
  File "main_monodepth_pytorch.py", line 321, in <module>
    main()
  File "main_monodepth_pytorch.py", line 317, in main
    model_test.test()
  File "main_monodepth_pytorch.py", line 298, in test
    disps = self.model(left)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/stas/gits/MonoDepth-PyTorch/models_resnet.py", line 297, in forward
    iconv3 = self.iconv3(concat3)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/stas/gits/MonoDepth-PyTorch/models_resnet.py", line 19, in forward
    x = self.conv_base(F.pad(x, p2d))
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/conv.py", line 301, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: Given groups=1, weight of size [64, 128, 3, 3], expected input[2, 130, 130, 66] to have 128 channels, but got 130 channels instead

Could you update the uploaded model or code, or maybe tell me what I'm doing wrong? Thanks

Python version 3.6.5, Pytorch version 0.4.1

How do you compute L-R Consistency

    # L-R Consistency
    right_left_disp = [self.generate_image_left(disp_right_est[i],
                       disp_left_est[i]) for i in range(self.n)]
    left_right_disp = [self.generate_image_right(disp_left_est[i],
                       disp_right_est[i]) for i in range(self.n)]


    # L-R Consistency
    lr_left_loss = [torch.mean(torch.abs(right_left_disp[i]
                    - disp_left_est[i])) for i in range(self.n)]
    lr_right_loss = [torch.mean(torch.abs(left_right_disp[i]
                     - disp_right_est[i])) for i in range(self.n)]
    lr_loss = sum(lr_left_loss + lr_right_loss)

Can you explain how you computed it?

Question about Post-processing Code

Thanks a lot for sharing the code. What is the post-processing code doing:
https://github.com/ClubAI/MonoDepth-PyTorch/blob/master/main_monodepth_pytorch.py#L109-L117

def post_process_disparity(disp):
    # disp has shape (2, h, w): disp[0] is the prediction for the original
    # image and disp[1] is the prediction for the horizontally flipped image
    (_, h, w) = disp.shape
    l_disp = disp[0, :, :]
    r_disp = np.fliplr(disp[1, :, :])  # un-flip the second prediction
    m_disp = 0.5 * (l_disp + r_disp)   # average of the two predictions
    (l, _) = np.meshgrid(np.linspace(0, 1, w), np.linspace(0, 1, h))
    # ramp masks that fade over the first 5% of columns on each side,
    # where a single prediction suffers from occlusion/border artifacts
    l_mask = 1.0 - np.clip(20 * (l - 0.05), 0, 1)
    r_mask = np.fliplr(l_mask)
    # near the left border use the flipped prediction, near the right border
    # use the original one, and blend to the average elsewhere
    return r_mask * l_disp + l_mask * r_disp + (1.0 - l_mask - r_mask) * m_disp

Key already registered with the same priority: GroupSpatialSoftmax

Hello, I am trying to test your project, which seems very nice, but I can't figure out how to run a simple test. I tried this:

main_monodepth_pytorch.py --data_dir=test --model_path=models/monodepth_resnet18_001.pth --output_directory=output --input_height=375 --input_width=1242 --model=resnet18 --mode=test --input_channels=3 --num_workers=4
But I get this error: Key already registered with the same priority: GroupSpatialSoftmax
What does that mean? What can I do?

Thanks in advance !

Endoscope images

Hi,

Thanks for the code.
Thanks for the code. I am about to use this repo for training, to estimate depth for images acquired from a stereo endoscope underwater. As far as I can see, this and most other monocular depth methods are applied to street scenes and cars. Is there anything I should or should not do when training the model, given that my aim is to estimate depth for underwater scenes and small, distant objects?

I also noticed that the amount of overlap between my stereo images is not as large as in typical street-view images. So my problem is a smaller amount of overlap.

Thanks for reading

Why Sigmoid?

Why is a sigmoid operation used for the output map?

Training data with monocular

Hi @tanmaniac,

Your work is valuable to me! However, I have a question about training on custom data. If I only have monocular data, not divided into right and left sides, can I still train on it?

Thanks!

This code is able to reproduce similar results to those in the original paper

Hi, months ago I opened an issue about reproducing the performance and forgot to give answers and feedback. Really sorry for that. Now that the previous issue has been closed, I am opening this one to tell users that this code is able to reproduce almost the same results as Godard's paper. The parameter settings are suitable. Using the evaluation code in https://github.com/mrharicot/monodepth/tree/master/utils, one can evaluate the performance on depth metrics. Thanks again for your impressive work!

Best,
Zhenyu

What is the loss value of the trained model?

I want to know the loss value of the provided trained model.
The provided trained model's parameters:
model: 'resnet18_md'
epochs: 200,
learning_rate: 1e-4,
batch_size: 8,
adjust_lr: True,
do_augmentation: True
Thank you.

Test Data

Thank you for your great work!!
Where can I find the test data used here?

Is it the Cityscapes dataset?
Kitti was only used for training?

RMSE keeps increasing after 25 epochs training while disparity prediction looks fine

I am training your code on the KITTI Eigen split with the default hyperparameters from the README. At the beginning everything was fine: the loss decreased to 1.5 and the RMSE was around 10 meters. However, after 25 epochs of training (learning rate kept at 1e-4), the RMSE kept increasing, to something like 200~800 meters. I checked the predicted disparity maps and they looked fine. A further check against the ground truth indicated that the huge RMSE was due to small disparities in some regions, especially textureless areas.

I thought the disparity smoothness loss should prevent jumps in the disparity prediction, right? Any ideas?

License

Can you please add a license to this repo?

About the KITTI Split training data

Hi, I have a question about the training data.
The original author used the remaining 33 scenes for training. However, you use all scenes except for the 8 scenes used for testing. I am confused about why you do not follow the original training split.

Training becomes slower and slower after epochs and results get deteriorated

Hello, I am strictly using the parameters advised here, but the results are getting worse as training progresses.
figure_1

Another problem is that the training process becomes slower (and the loss gets larger, of course):
Epoch: 1 train_loss: 1.8654187655635037 val_loss: 1.3897445498384196 time: 985.607 s Model_saved
Epoch: 2 train_loss: 1.7081988794441578 val_loss: 1.299928411009348 time: 2059.05 s Model_saved
Epoch: 3 train_loss: 1.6670600274022838 val_loss: 1.2872194408765298 time: 3121.77 s Model_saved
Epoch: 4 train_loss: 1.6339830119013636 val_loss: 1.320097159008084 time: 4135.433 s
Epoch: 5 train_loss: 1.6094302785873893 val_loss: 1.505771610942589 time: 5179.27 s

disparity map error

Hi, thanks for your impressive work, which helps me a lot. But I met the following problem and I have no idea about it. Can you give me some pointers to solve it?

0000000012

thanks in advance!

How can I get depth information?

Thank you very much for your good work. I have studied it for several days. I think the output of this code is only the disparities and the disparities with post-processing. Am I right? If so, how can I get depth information for my own input pictures? Looking forward to your reply. Thank you very much.

performance

First, thanks for the code. It helps me a lot.
However, with your code I do not reach the same performance as what is reported in Godard's paper. Did you obtain the same performance? Could you report what you obtain in the readme?
Best,

Doesn't follow standard KITTI directory structure

The KITTI directory structure given in your readme seems to be incorrect.

Directory structure mentioned in your readme:

data
├── kitti
│   ├── 2011_09_26_drive_0001_sync
│   │   ├── image_02
│   │   │   ├─ data
│   │   │   │   ├── 0000000000.png
│   │   │   │   └── ...
│   │   ├── image_03
│   │   │   ├── data
│   │   │   │   ├── 0000000000.png
│   │   │   │   └── ...
│   ├── ...

The standard KITTI directory structure, as mentioned in the mrharicot repo, is as below:

data
├── kitti
│   ├── 2011_09_30
│   │   ├── 2011_09_26_drive_0001_sync
│   │   │   ├── image_02
│   │   │   │   ├─ data
│   │   │   │   │   ├── 0000000000.png
│   │   │   │   │   └── ...
│   │   │   ├── image_03
│   │   │   │   ├─ data
│   │   │   │   │   ├── 0000000000.png
│   │   │   │   │   └── ...
│   │   ├── ...
│   ├── ...

It's not a big deal, but it's still good to follow the standard structure.

datasets

With the original kitti dataset, the data cannot be downloaded in the correct format

Unstable training

Results with lr=1e-3

Epoch 1:
epoch001_disp_left
Epoch 2:
epoch003_disp_left
Epoch 5:
epoch005_disp_left

Disparities are degrading as I train more.

test my data

Thanks for the good job. I want to know: when I only have a single image, should I put this picture in both image_02 and image_03?

Question about how to get the original image

Your code is as follows:

x_shifts = disp[:, 0, :, :]  # Disparity is passed in NCHW format with 1 channel
flow_field = torch.stack((x_base + x_shifts, y_base), dim=3)
# In grid_sample coordinates are assumed to be between -1 and 1
output = F.grid_sample(img, 2*flow_field - 1, mode='bilinear',
                       padding_mode='zeros')

But why did you use 2*flow_field - 1? In my opinion, you should use
output = F.grid_sample(img, flow_field, mode='bilinear', padding_mode='zeros')

Could you explain this? Thank you very much!

Accuracy?

Hi,

Can you report how much of an accuracy improvement you obtained after the changes you mentioned?

Thanks

Why horizontal flip

Hi! I have a question about the horizontal flip transformation.

In my understanding, say the stereo system is left-right (i.e., not top-bottom); then a horizontal flip would also reverse the left-right order of the images. (See the image below for an example.)

While this reversed relation is fine if the output disparity map is allowed to be negative, in your implementation there is a sigmoid on the disparity map. So once the input is flipped, the output disparity is somewhat counter-intuitive.

Could you tell me if I have any mistakes?

Figure_1
