
oniroai / monodepth-pytorch


Unofficial implementation of Unsupervised Monocular Depth Estimation neural network MonoDepth in PyTorch

Python 87.19% Jupyter Notebook 12.81%
monodepth stereo computer-vision deep-learning pytorch depth-estimation

monodepth-pytorch's Introduction

MonoDepth

demo.gif animation

This repo is inspired by the amazing work of Clément Godard, Oisin Mac Aodha and Gabriel J. Brostow on Unsupervised Monocular Depth Estimation. The original code and paper can be found via the following links:

  1. Original repo
  2. Original paper

MonoDepth-PyTorch

This repository contains code and additional parts for the PyTorch port of the MonoDepth deep learning algorithm. For more information about the original work, please visit the authors' website.

Purpose

The purpose of this repository is to provide a more lightweight depth estimation model with better accuracy. In our version of MonoDepth, we use ResNet50 as the encoder. It is slightly modified (with one extra lateral shrinkage), as in the original repo.

We also added a ResNet18 version and use batch normalization in both cases for training stability. Moreover, the feature extractor is flexible: it can be built from any version of the original ResNet in the torchvision model zoo, with an option to use pretrained weights.

Dataset

KITTI

This algorithm requires stereo-pair images for training and single images for testing. The KITTI dataset was used for training; it contains 38,237 training samples. The raw dataset (about 175 GB) can be downloaded by running:

wget -i kitti_archives_to_download.txt -P ~/my/output/folder/

kitti_archives_to_download.txt may be found in this repo.

Dataloader

The dataloader assumes the following structure of the folder with training examples (the 'data_dir' argument contains the path to that folder): the folder contains subfolders, each with "image_02/data" for left images and "image_03/data" for right images. This structure is the default for the KITTI dataset.

Example data folder structure (path to the "kitti" directory should be passed as 'data_dir' in this example):

data
├── kitti
│   ├── 2011_09_26_drive_0001_sync
│   │   ├── image_02
│   │   │   ├─ data
│   │   │   │   ├── 0000000000.png
│   │   │   │   └── ...
│   │   ├── image_03
│   │   │   ├── data
│   │   │   │   ├── 0000000000.png
│   │   │   │   └── ...
│   ├── ...
├── models
├── output
├── test
│   ├── left
│   │   ├── test_1.jpg
│   │   └── ...
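
For clarity, here is a small sketch of how left/right image pairs could be enumerated from such a folder tree. This is a simplified illustration under the assumption of the layout above, not the repo's actual data-loading code:

    import os
    from glob import glob

    def list_stereo_pairs(data_dir):
        """Collect (left, right) image path pairs from a KITTI-style folder tree."""
        pairs = []
        for drive in sorted(os.listdir(data_dir)):
            left_dir = os.path.join(data_dir, drive, 'image_02', 'data')
            right_dir = os.path.join(data_dir, drive, 'image_03', 'data')
            if not (os.path.isdir(left_dir) and os.path.isdir(right_dir)):
                continue  # skip folders without stereo images
            for left_path in sorted(glob(os.path.join(left_dir, '*.png'))):
                right_path = os.path.join(right_dir, os.path.basename(left_path))
                if os.path.exists(right_path):
                    pairs.append((left_path, right_path))
        return pairs

    # e.g. pairs = list_stereo_pairs('data/kitti')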

Training

An example of training can be found in the Monodepth notebook.

The Model class from main_monodepth_pytorch.py should be initialized with the following params (as an easydict) for training:

  • data_dir: path to the dataset folder
  • val_data_dir: path to the validation dataset folder
  • model_path: path to save the trained model
  • output_directory: where to save disparities for tested images
  • input_height
  • input_width
  • model: model for the encoder (resnet18_md, resnet50_md, or any torchvision version of ResNet: resnet18, resnet34, etc.)
  • pretrained: if a torchvision model is used, it is possible to download weights for the pretrained model
  • mode: train or test
  • epochs: number of epochs,
  • learning_rate
  • batch_size
  • adjust_lr: apply learning rate decay or not
  • tensor_type: 'torch.cuda.FloatTensor' or 'torch.FloatTensor'
  • do_augmentation: do data augmentation or not
  • augment_parameters: lowest and highest values for gamma, lightness and color respectively
  • print_images
  • print_weights
  • input_channels: number of channels in the input tensor (3 for RGB images)
  • num_workers: number of workers to use in the dataloader

Optionally, after initialization, we can load a pretrained model via model.load.

After that, calling train() on the Model class object starts the training process.

Training can also be started by calling main_monodepth_pytorch.py from the terminal and passing parameters as argparse arguments.
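
As an illustration, a minimal training setup could look like the sketch below. The parameter names come from the list above; the paths, image size and augment_parameters values are only illustrative assumptions, not recommended settings:

    from easydict import EasyDict as edict
    from main_monodepth_pytorch import Model

    params = edict({
        'data_dir': 'data/kitti',                  # training data
        'val_data_dir': 'data/kitti_val',          # validation data
        'model_path': 'models/monodepth_resnet18_001.pth',
        'output_directory': 'output',
        'input_height': 256,                       # illustrative size
        'input_width': 512,
        'model': 'resnet18_md',
        'pretrained': False,
        'mode': 'train',
        'epochs': 200,
        'learning_rate': 1e-4,
        'batch_size': 8,
        'adjust_lr': True,
        'tensor_type': 'torch.cuda.FloatTensor',
        'do_augmentation': True,
        'augment_parameters': [0.8, 1.2, 0.5, 2.0, 0.8, 1.2],  # assumed example values
        'print_images': False,
        'print_weights': False,
        'input_channels': 3,
        'num_workers': 4,
    })

    model = Model(params)
    # model.load(params.model_path)  # optional: resume from a saved checkpoint
    model.train()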

Train results and pretrained model

The results presented in the gif were obtained using a model with ResNet18 as the encoder, which can be downloaded from here.

For training the following parameters were used:

`model`: 'resnet18_md'
`epochs`: 200,
`learning_rate`: 1e-4,
`batch_size`: 8,
`adjust_lr`: True,
`do_augmentation`: True

The provided model was trained on the whole dataset, except for the subsets listed below, which were used for hold-out validation.

2011_09_26_drive_0002_sync  2011_09_29_drive_0071_sync
2011_09_26_drive_0014_sync  2011_09_30_drive_0033_sync
2011_09_26_drive_0020_sync  2011_10_03_drive_0042_sync
2011_09_26_drive_0079_sync

The demo gif image is a visualization of the predictions on 2011_09_26_drive_0014_sync subset.

See Monodepth notebook for the details on the training.

Testing

An example of testing can also be found in the Monodepth notebook.

The Model class from main_monodepth_pytorch.py should be initialized with the following params (as an easydict) for testing:

  • data_dir: path to the dataset folder
  • model_path: path to the trained model
  • pretrained: whether to use a pretrained torchvision model (see the training params above)
  • output_directory: where to save disparities for tested images
  • input_height
  • input_width
  • model: model for encoder (resnet18 or resnet50)
  • mode: train or test
  • input_channels: number of channels in the input tensor (3 for RGB images)
  • num_workers: number of workers to use in the dataloader

After that, calling test() on the Model class object starts the testing process.

Testing can also be started by calling main_monodepth_pytorch.py from the terminal and passing parameters as argparse arguments.
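
Analogously to training, a minimal testing sketch might look as follows (paths and image size are again only illustrative assumptions):

    from easydict import EasyDict as edict
    from main_monodepth_pytorch import Model

    test_params = edict({
        'data_dir': 'data/test',
        'model_path': 'models/monodepth_resnet18_001.pth',
        'pretrained': False,
        'output_directory': 'output',
        'input_height': 256,
        'input_width': 512,
        'model': 'resnet18_md',
        'mode': 'test',
        'input_channels': 3,
        'num_workers': 4,
    })

    model = Model(test_params)
    model.test()  # writes disparities for the test images to output_directory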

Requirements

This code was tested with PyTorch 0.4.1, CUDA 9.1 and Ubuntu 16.04. Other required modules:

torchvision
numpy
matplotlib
easydict

monodepth-pytorch's People

Contributors

askorikov, nikolasent, tanmaniac, voeykovroman


monodepth-pytorch's Issues

Multi-GPU support and multi-threaded data loading

I think it will be good to use nn.DataParallel
self.model = nn.DataParallel(self.model)

Maybe also add multi-threading in the dataloader in your utils.py (this can decrease training time by at least 4 times)

loader = DataLoader(dataset, batch_size=batch_size,
                    shuffle=True, num_workers=4)

how to output the predicted right image

Hi,

  1. Is there a way to output the right-side image being reconstructed from the predicted disparity during the self-supervised training step?
  2. Is there a way to 'sharpen' the disparity map being produced?

Thanks,
OG

Difference between ResNet50_md and ResNet model

Hi, I want to ask about the difference between them.
When I calculated the FLOPs of each model, the first one is 2.5 times smaller than the latter. Is there any significant change in your implementation of ResNet50_md compared to the original one, apart from the one extra lateral shrinkage?

Provided model fails to load

I tried to run the model you uploaded here with the following parameters:

python3 main_monodepth_pytorch.py --mode test --model "resnet18_md" --input_channels 3 --data_dir "dataset/" --model_path trained_kitti_model.pth --output_directory out
But I got the following error:

Traceback (most recent call last):
  File "main_monodepth_pytorch.py", line 321, in <module>
    main()
  File "main_monodepth_pytorch.py", line 316, in main
    model_test = Model(args)
  File "main_monodepth_pytorch.py", line 145, in __init__
    self.model.load_state_dict(torch.load(args.model_path))
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 719, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for Resnet18_md:
    size mismatch for iconv3.conv_base.weight: copying a param of torch.Size([64, 130, 3, 3]) from checkpoint, where the shape is torch.Size([64, 128, 3, 3]) in current model.
    size mismatch for iconv2.conv_base.weight: copying a param of torch.Size([32, 98, 3, 3]) from checkpoint, where the shape is torch.Size([32, 96, 3, 3]) in current model.

I removed + 2 in Resnet18_md
self.iconv3 = conv(64+64 + 2, 64, 3, 1)
and
self.iconv2 = conv(64+32 + 2, 32, 3, 1)

The model then loads, but on the forward pass I get this error:

Traceback (most recent call last):
  File "main_monodepth_pytorch.py", line 321, in <module>
    main()
  File "main_monodepth_pytorch.py", line 317, in main
    model_test.test()
  File "main_monodepth_pytorch.py", line 298, in test
    disps = self.model(left)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/stas/gits/MonoDepth-PyTorch/models_resnet.py", line 297, in forward
    iconv3 = self.iconv3(concat3)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/stas/gits/MonoDepth-PyTorch/models_resnet.py", line 19, in forward
    x = self.conv_base(F.pad(x, p2d))
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/conv.py", line 301, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: Given groups=1, weight of size [64, 128, 3, 3], expected input[2, 130, 130, 66] to have 128 channels, but got 130 channels instead

Could you update the uploaded model or code, or maybe tell me what I'm doing wrong? Thanks

Python version 3.6.5, Pytorch version 0.4.1

How do you compute L-R Consistency

    # L-R Consistency
    right_left_disp = [self.generate_image_left(disp_right_est[i],
                       disp_left_est[i]) for i in range(self.n)]
    left_right_disp = [self.generate_image_right(disp_left_est[i],
                       disp_right_est[i]) for i in range(self.n)]


    # L-R Consistency
    lr_left_loss = [torch.mean(torch.abs(right_left_disp[i]
                    - disp_left_est[i])) for i in range(self.n)]
    lr_right_loss = [torch.mean(torch.abs(left_right_disp[i]
                     - disp_right_est[i])) for i in range(self.n)]
    lr_loss = sum(lr_left_loss + lr_right_loss)

Can you explain how you computed it?

Question about Post-processing Code

Thanks a lot for sharing the code. What is the post-processing code doing:
https://github.com/ClubAI/MonoDepth-PyTorch/blob/master/main_monodepth_pytorch.py#L109-L117

def post_process_disparity(disp):
    # disp has shape (2, h, w): disp[0] is the prediction for the original
    # image and disp[1] is the prediction for the horizontally flipped image
    (_, h, w) = disp.shape
    l_disp = disp[0, :, :]
    r_disp = np.fliplr(disp[1, :, :])  # un-flip the second prediction
    m_disp = 0.5 * (l_disp + r_disp)   # average of the two predictions
    (l, _) = np.meshgrid(np.linspace(0, 1, w), np.linspace(0, 1, h))
    # ramp masks that fade over the first 5% of columns on each side,
    # where a single prediction suffers from occlusion/border artifacts
    l_mask = 1.0 - np.clip(20 * (l - 0.05), 0, 1)
    r_mask = np.fliplr(l_mask)
    # near the left border use the flipped prediction, near the right border
    # use the original one, and blend to the average elsewhere
    return r_mask * l_disp + l_mask * r_disp + (1.0 - l_mask - r_mask) * m_disp

Key already registered with the same priority: GroupSpatialSoftmax

Hello, I am trying to test your project, which seems very nice, but I can't figure out how to run a simple test. I tried this:

main_monodepth_pytorch.py --data_dir=test --model_path=models/monodepth_resnet18_001.pth --output_directory=output --input_height=375 --input_width=1242 --model=resnet18 --mode=test --input_channels=3 --num_workers=4
But I get this error: Key already registered with the same priority: GroupSpatialSoftmax
What does that mean? What can I do?

Thanks in advance !

Endoscope images

Hi,

Thanks for the code.
Thanks for the code. I am about to use this repo for training, to estimate depth for images acquired from a stereo endoscope underwater. As far as I can see, this and most other monocular depth methods are applied to street scenes and cars. Is there anything I should or should not do when training the model, given that my aim is to estimate depth for underwater scenes and small, distant objects?

I also noticed that the amount of overlap between my stereo images is not as large as in typical street-view images. So my problem is a smaller amount of overlap.

Thanks for reading

Why Sigmoid?

Why is a sigmoid operation used for the output map?

Training data with monocular

Hi @tanmaniac,

Your work is valuable to me! However, I have a question about training on custom data. If I only have monocular data, not divided into right and left sides, can I still train on it?

Thanks!

This code is able to reproduce similar results to those in the original paper

Hi, months ago I opened an issue about reproducing the performance and forgot to give answers and feedback. Really sorry for that. Now that the previous issue has been closed, I am opening this one to tell users that this code is able to reproduce almost the same results as Godard's paper. The parameter settings are suitable. Using the evaluation code in https://github.com/mrharicot/monodepth/tree/master/utils, one can evaluate the performance on depth metrics. Thanks again for your impressive work!

Best,
Zhenyu

What is the loss value of the trained model?

I want to know the loss value of the provided trained model.
The provided trained model's parameters:
model: 'resnet18_md'
epochs: 200,
learning_rate: 1e-4,
batch_size: 8,
adjust_lr: True,
do_augmentation: True
Thank you.

Test Data

Thank you for your great work!!
Where can I find the test data used here?

Is it the Cityscapes dataset?
Kitti was only used for training?

RMSE keeps increasing after 25 epochs training while disparity prediction looks fine

I am training your code on the KITTI Eigen split with the default hyperparameters from the README. At the beginning everything was fine: the loss decreased to 1.5 and the RMSE was around 10 meters. However, after 25 epochs of training (learning rate kept at 1e-4), the RMSE kept increasing, to something like 200~800 meters. I checked the predicted disparity maps and they looked fine. A further check against the ground truth indicated that the huge RMSE was due to small disparities in some regions, especially textureless areas.

I thought the disparity smoothness loss should prevent jumps in the disparity prediction, right? Any ideas?

License

Can you please add a license to this repo?

About the KITTI Split training data

Hi, I have a question about the training data.
The original author used the remaining 33 scenes for training. However, you use all scenes except for the 8 scenes used for testing. I am confused about why you do not follow the original training split.

Training becomes slower and slower after epochs and results get deteriorated

Hello, I am strictly using the parameters advised here, but the results are getting worse as training progresses.
figure_1

Another problem is that the training process becomes slower (and the loss gets larger, of course):
Epoch: 1 train_loss: 1.8654187655635037 val_loss: 1.3897445498384196 time: 985.607 s Model_saved
Epoch: 2 train_loss: 1.7081988794441578 val_loss: 1.299928411009348 time: 2059.05 s Model_saved
Epoch: 3 train_loss: 1.6670600274022838 val_loss: 1.2872194408765298 time: 3121.77 s Model_saved
Epoch: 4 train_loss: 1.6339830119013636 val_loss: 1.320097159008084 time: 4135.433 s
Epoch: 5 train_loss: 1.6094302785873893 val_loss: 1.505771610942589 time: 5179.27 s

disparity map error

Hi, thanks for your impressive work, which helps me a lot. But I met the following problem and I have no idea about it. Can you give me some pointers to solve it?

0000000012

thanks in advance!

How can I get depth information?

Thank you very much for your good work. I have studied it for several days. I think the output of this code is only the disparities and the disparities with post-processing. Am I right? If so, how can I get depth information for my own input pictures? Looking forward to your reply. Thank you very much.

performance

First, thanks for the code. It helps me a lot.
However, with your code I do not reach the same performance as what is reported in Godard's paper. Did you obtain the same performance? Could you report what you obtain in the readme?
Best,

Doesn't follow standard KITTI directory structure

The KITTI directory structure given in your readme seems to be incorrect.

Directory structure mentioned in your readme:

data
├── kitti
│   ├── 2011_09_26_drive_0001_sync
│   │   ├── image_02
│   │   │   ├─ data
│   │   │   │   ├── 0000000000.png
│   │   │   │   └── ...
│   │   ├── image_03
│   │   │   ├── data
│   │   │   │   ├── 0000000000.png
│   │   │   │   └── ...
│   ├── ...

The standard KITTI directory structure, as mentioned in the mrharicot repo, is as below:

data
├── kitti
│   ├── 2011_09_30
│   │   ├── 2011_09_26_drive_0001_sync
│   │   │   ├── image_02
│   │   │   │   ├─ data
│   │   │   │   │   ├── 0000000000.png
│   │   │   │   │   └── ...
│   │   │   ├── image_03
│   │   │   │   ├─ data
│   │   │   │   │   ├── 0000000000.png
│   │   │   │   │   └── ...
│   │   ├── ...
│   ├── ...

It's not a big deal, but it's still good to follow the standard structure.

datasets

With the original kitti dataset, the data cannot be downloaded in the correct format

Unstable training

Results with lr=1e-3

Epoch 1:
epoch001_disp_left
Epoch 2:
epoch003_disp_left
Epoch 5:
epoch005_disp_left

Disparities are degrading as I train more.

test my data

Thanks for the good job. I want to know: when I only have a single image, should I put this picture in both image_02 and image_03?

Question about how to get the original image

Your code is as follows:

x_shifts = disp[:, 0, :, :]  # Disparity is passed in NCHW format with 1 channel
flow_field = torch.stack((x_base + x_shifts, y_base), dim=3)
# In grid_sample coordinates are assumed to be between -1 and 1
output = F.grid_sample(img, 2*flow_field - 1, mode='bilinear',
                       padding_mode='zeros')

But why did you use 2*flow_field - 1? In my opinion, you should use
output = F.grid_sample(img, flow_field, mode='bilinear', padding_mode='zeros')

Could you explain this? Thank you very much!

Accuracy?

Hi,

Can you report how much of an accuracy improvement you obtained after the changes you mentioned?

Thanks

Why horizontal flip

Hi! I have a question about the horizontal flip transformation.

In my understanding, say the stereo system is left-right (i.e., not top-bottom); then a horizontal flip would also reverse the left-right order of the images. (See the image below for an example.)

While this reversed relation is fine if the output disparity map is allowed to be negative, in your implementation there is a sigmoid on the disparity map. So once the input is flipped, the output disparity is somewhat counter-intuitive.

Could you tell me if I have any mistakes?

Figure_1
