hkust-aerial-robotics / mvdepthnet

This repository provides a PyTorch implementation of the 3DV 2018 paper "MVDepthNet: real-time multiview depth estimation neural network".

License: GNU General Public License v3.0

Python 100.00%

mvdepthnet's Introduction

MVDepthNet

A Real-time Multiview Depth Estimation Network

This is an open-source implementation of the 3DV 2018 paper "MVDepthNet: real-time multiview depth estimation neural network" by Kaixuan Wang and Shaojie Shen (arXiv link). If you find the project useful for your research, please cite:

@InProceedings{mvdepthnet,
    author       = "K. Wang and S. Shen",
    title        = "MVDepthNet: real-time multiview depth estimation neural network",
    booktitle    = "International Conference on 3D Vision (3DV)",
    month        = "Sep.",
    year         = "2018",
  }

Given multiple images and the corresponding camera poses, a cost volume is first computed and then combined with the reference image to generate the depth map. An example is shown below.

MVDepthNet example

From left to right: the left image, the right image, the "ground truth" depth from an RGB-D camera, and the estimated depth map.

The following video illustrates the performance of our system:

video

1.0 Prerequisites

  • PyTorch

The PyTorch version used in the implementation is 0.3. Only small changes are needed to run the network under newer PyTorch versions (see the sketch after this list).

  • OpenCV

  • NumPy
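One example of such a change (a sketch, not part of the repository): example2.py wraps the inputs in Variable(..., volatile=True), which PyTorch 0.4 and later replace with the torch.no_grad() context manager. Assuming the tensors and the model are prepared exactly as in example2.py:

import torch

# PyTorch 0.3 style used in example2.py:
#   left_image_cuda = Variable(left_image_cuda, volatile=True)
# Equivalent inference under PyTorch >= 0.4:
with torch.no_grad():
    predict_depths = depthnet(left_image_cuda, right_image_cuda,
                              KRKiUV_cuda_T, KT_cuda_T)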

2.0 Download the model parameters and the samples

UPDATE: the Dropbox links have failed because of heavy traffic. These are the BaiduPan links: model weights: https://pan.baidu.com/s/1CjV6iWBbjWOxGetf2ZXStQ (extraction code: gbfg) and sample data: https://pan.baidu.com/s/1feYfF6qSd7z7_anmR_rgnQ (extraction code: g1fo).

We provide a trained model used in our paper evaluation and some images to run the example code.

Please download the model and the sample images via the links above. Put the model opensource_model.pth.tar under the project folder and extract sample_data.pkl.tar.gz there as well.

3.0 Run the example

Just

python example.py

4.0 Use your own data

To use the network, you need to provide a left image, a right image, the camera intrinsic parameters and the relative camera pose. Images are normalized with a mean of 81.0 and a standard deviation of 35.0, for example

normalized_image = (image - 81.0) / 35.0

We provide the file example2.py to show how to run the network on your own data. left_pose and right_pose are the camera poses in the world frame. The final visualization window shows left_image, right_image, and the predicted depth. A red dot in left_image is used to test the accuracy of the relative pose: the red line drawn in right_image is the corresponding epipolar line, which must pass through the point marked by the red dot in left_image. Otherwise, the pose is not accurate. You can change the position of the tested point in line 56.
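A minimal preprocessing sketch in the spirit of example2.py (the file name and intrinsic values are placeholders; the 320x256 working resolution and the 81.0/35.0 normalization are the ones used by the network):

import cv2
import numpy as np

# placeholder intrinsics; replace with your calibrated values
camera_k = np.array([[540.0, 0.0, 320.0],
                     [0.0, 540.0, 240.0],
                     [0.0, 0.0, 1.0]])

left_image = cv2.imread("left.png")
height, width = left_image.shape[:2]

# resize to the 320x256 network input and scale the intrinsics to match
left_image = cv2.resize(left_image, (320, 256))
camera_k[0, :] *= 320.0 / width
camera_k[1, :] *= 256.0 / height

# normalize and convert to NCHW float layout, shape (1, 3, 256, 320)
left_image = (left_image.astype(np.float32) - 81.0) / 35.0
left_image = np.expand_dims(np.moveaxis(left_image, -1, 0), 0)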

To get good results, the images should have enough translation and enough overlap with each other. Pure rotation does not help the depth estimation.

4.1 Use multiple images

Please refer to depthNet_model.py: use the function getVolume to construct one cost volume per measurement image and average them. Feed the model with the reference image and the averaged cost volume to get the estimated depth map, as sketched below.
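The averaging step itself is just an element-wise mean over the per-measurement cost volumes. A sketch with dummy tensors standing in for the outputs of getVolume (the shape is illustrative only; check depthNet_model.py for the real signature and tensor layout):

import torch

# one cost volume per measurement image, here three dummy volumes of
# illustrative shape (batch, depth_samples, height, width)
volumes = [torch.rand(1, 64, 256, 320) for _ in range(3)]

# element-wise average over the measurement images; this averaged volume,
# together with the reference image, is what the network is fed
avg_volume = torch.stack(volumes, dim=0).mean(dim=0)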

5.0 Acknowledgement

Most of the training data and test data are collected by DeMoN and we thank their work.

mvdepthnet's People

Contributors

wang-kx

mvdepthnet's Issues

how to find the camera intrinsic and the relative camera pose

You mentioned that to use our own dataset we have to get the camera parameters. Can you suggest any tutorial on estimating these camera parameters? I also have one more doubt: what is camera_k in example2.py, and how do I find that camera_k parameter?

Thanks
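For reference (general background, not a reply from the thread): camera_k is the standard 3x3 pinhole intrinsic matrix, which any calibration tool (e.g. OpenCV's chessboard calibration) can estimate. A sketch with placeholder values:

import numpy as np

# fx, fy: focal length in pixels; cx, cy: principal point in pixels.
# The numbers below are placeholders, not calibrated values.
fx, fy, cx, cy = 525.0, 525.0, 319.5, 239.5
camera_k = np.array([[fx, 0.0, cx],
                     [0.0, fy, cy],
                     [0.0, 0.0, 1.0]])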

About camera external matrix

Hello

How can I calculate the left_pose and right_pose matrices in example2.py for my own images taken by an Android phone?

I get the camera intrinsic parameters (focal length = 3.46 mm, sensor size = 4.66*3.51 mm, pixel array size = 4160*3120, orientation = 90 degrees) on Android by installing a device info app.

Having these, how can we get the camera pose (extrinsic matrix)? Is there any other method?

Here are some other camera characteristics that the Camera2 Info app displays

android.lens.facing:  1
android.lens.info.availableApertures:  [1.8]
android.lens.info.availableFilterDensities:  [0.0]
android.lens.info.availableFocalLengths:  [3.46]
android.lens.info.availableOpticalStabilization:  [0]
android.lens.info.focusDistanceCalibration:  2
android.lens.info.hyperfocalDistance:  0.2
android.lens.info.minimumFocusDistance:  14.285714

android.sensor.calibrationTransform1:  ColorSpaceTransform([128/128, 0/128, 0/128], [0/128, 128/128, 0/128], [0/128, 0/128, 128/128])

android.sensor.calibrationTransform2:  ColorSpaceTransform([128/128, 0/128, 0/128], [0/128, 128/128, 0/128], [0/128, 0/128, 128/128])

android.sensor.colorTransform1:  ColorSpaceTransform([1094/1024, -306/1024, -146/1024], [-442/1024, 1388/1024, 52/1024], [-104/1024, 250/1024, 600/1024])

android.sensor.colorTransform2:  ColorSpaceTransform([2263/1024, -1364/1024, -145/1024], [-194/1024, 1257/1024, -56/1024], [-24/1024, 187/1024, 618/1024])

android.sensor.forwardMatrix1:  ColorSpaceTransform([612/1024, 233/1024, 139/1024], [199/1024, 831/1024, -6/1024], [15/1024, -224/1024, 1049/1024])

android.sensor.forwardMatrix2:  ColorSpaceTransform([441/1024, 317/1024, 226/1024], [29/1024, 908/1024, 87/1024], [9/1024, -655/1024, 1486/1024])

android.sensor.info.activeArraySize:  Rect(0, 0 - 4160, 3120)

android.sensor.info.colorFilterArrangement:  0

android.sensor.info.physicalSize:  4.66x3.51

android.sensor.info.pixelArraySize:  4160x3120

android.sensor.info.preCorrectionActiveArraySize:  Rect(0, 0 - 4160, 3120)

android.sensor.info.sensitivityRange:  [50, 3200]

android.sensor.info.whiteLevel:  1023

Thanks
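A side note on the intrinsics (the camera pose itself still has to come from an SfM, SLAM or visual-odometry pipeline, which is beyond this sketch): with the numbers listed above, the focal length in pixels follows directly from the sensor size, ignoring lens distortion and assuming the full sensor is read out without cropping.

# 3.46 mm focal length, 4.66 x 3.51 mm sensor, 4160 x 3120 pixel array
focal_mm = 3.46
sensor_w_mm, sensor_h_mm = 4.66, 3.51
width_px, height_px = 4160, 3120

fx = focal_mm * width_px / sensor_w_mm      # ~3089 px
fy = focal_mm * height_px / sensor_h_mm     # ~3076 px
cx, cy = width_px / 2.0, height_px / 2.0    # principal point assumed at the image center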

how to use in c++

I want to use this in my c++ program. How can I use this python program?

Problems when running example.py

Hi, thank you for the project!

When I run the example.py, I meet the following decoding problem:

Traceback (most recent call last):
  File "example.py", line 16, in <module>
    sample_datas = pickle.load(fp)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc7 in position 1: ordinal not in range(128)

Is the sample images link (https://www.dropbox.com/s/hr59f24byc3x8z3/sample_data.pkl.tar.gz?dl=0) corrupted, or should I modify the encoding? (I have tried encoding='utf-8' but it still does not work.)

Thanks!
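For what it's worth (not a reply from the thread): this error typically means a pickle written under Python 2 is being loaded under Python 3. For pickles that contain NumPy arrays, encoding='latin1' usually works where 'utf-8' does not, because 'latin1' maps all 256 byte values one-to-one. A sketch, using the file name from the sample archive:

import pickle

with open("sample_data.pkl", "rb") as fp:
    sample_datas = pickle.load(fp, encoding="latin1")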

Question about warping the image to construct the cost volume.

Hello, thanks a ton for the code.
I use my own data but get strange warped images.
The left and right images are consecutive frames captured by a mobile phone. The camera intrinsic parameters and the relative camera pose are also known, but my warped images are not as "good" as with your sample data; each is only a small, rotated part of an image.
I don't quite understand the warping part in your GetVolume() function and I am not very familiar with stereo matching. Could you point out where this problem might be?

generate .pkl data

problemrantime
Hello,
I use the TUM dataset to run the code and encounter this problem. When I use np.seterr(divide='ignore', invalid='ignore') to debug, the result is as follows.
result123
Do you know how to solve it?

Yours sincerely,
Huboni

Problems using my own data

Hi, thank you for the great repository! :)

I could successfully test it with your provided test labels. However I am having issues using my own data.

Here is an example of my input data:

First image (left):
96369 798432

Second image (right):
96373 464939

First image extrinsics and intrinsics:

"96369.798432.jpg": {

	"extrinsics": [[0.7245516,0.3634065,-0.5856284,1.018013],[0.05039374,0.8194891,0.5708749,0.02848812],[0.6873757,-0.4431403,0.5754488,-0.4639755],[0,0,0,0.9999999]],

	"imageName": "96369.798432.jpg",

	"intrinsics": [[1619.542,0,959.5],[0,1619.542,539.5],[0,0,1]],

	"timeStamp": 96369.798432

}

Second image intrinsics and extrinsics:

"96373.464939.jpg": {

	"extrinsics": [[0.4488778,0.404342,-0.7968791,1.488057],[0.02514287,0.8857014,0.4635739,0.05461761],[0.8932394,-0.2281239,0.3874055,-0.3899191],[0,0,0,1]],

	"imageName": "96373.464939.jpg",

	"intrinsics": [[1619.076,0,959.5],[0,1619.076,539.5],[0,0,1]],

	"timeStamp": 96373.464938666

}

But the result is:

96370 131751-96373 464939

This is just an example result; I am getting the same wrong results on all of my images.

Any ideas how I could solve it? Thanks! :)

Did I miss something to get a good depth?

Thanks for your great work.
I referred to example2.py to use my own data. However, the result does not seem good.
The code and images are provided below. Please tell me if I missed anything.

# imports needed to make this snippet self-contained (added for completeness)
import cv2
import numpy as np
from numpy.linalg import inv
import torch
from torch import Tensor
from torch.autograd import Variable
import torch.backends.cudnn as cudnn
from depthNet_model import depthNet

left_image = cv2.imread("./left.jpeg")
right_image = cv2.imread("./right.jpeg")

camera_k_left = np.asarray([
                            [1.7141879128232438e+003, 0.,                      1.2686456493940061e+003],
                            [0,                       1.7141879128232438e+003, 9.9575285430241513e+002],
                            [0,                       0,                       1]])

camera_k_right = np.asarray([ 
                            [1.7141879128232438e+003, 0.,                      1.2666075491361062e+003],
                            [0,                       1.7141879128232438e+003, 9.8047895362229440e+002],
                            [0,                       0,                       1]])

left2right = np.asarray([
                        [9.9969708004761548e-001, -1.7112957892382444e-002,  1.7688833100150528e-002, -8.3976622746264312e+001],
                        [1.6926228781311496e-002, 9.9979999147940424e-001,   1.0652690600304717e-002, 6.4193373297895686e+000],
                        [-1.7867594228494717e-002, -1.0350058451847681e-002, 9.9978678995400272e-001, -2.9538222186700258e+000],
                        [0,                       0,                         0,                        1]])

## process images
# scale to 320x256
original_width = left_image.shape[1]
original_height = left_image.shape[0]
factor_x = 320.0 / original_width
factor_y = 256.0 / original_height

left_image = cv2.resize(left_image, (320, 256)) # shape becomes (256, 320, 3)
right_image = cv2.resize(right_image, (320, 256)) # shape becomes (256, 320, 3)
camera_k_left[0, :] *= factor_x
camera_k_left[1, :] *= factor_y
camera_k_right[0, :] *= factor_x
camera_k_right[1, :] *= factor_y

# convert to pytorch format
torch_left_image = np.moveaxis(left_image, -1, 0) # (3, 256, 320)
torch_left_image = np.expand_dims(torch_left_image, 0) # (1, 3, 256, 320)
torch_left_image = (torch_left_image - 81.0)/ 35.0 # whiten
torch_right_image = np.moveaxis(right_image, -1, 0)
torch_right_image = np.expand_dims(torch_right_image, 0)
torch_right_image = (torch_right_image - 81.0) / 35.0

left_image_cuda = Tensor(torch_left_image).cuda()
left_image_cuda = Variable(left_image_cuda, volatile=True)

right_image_cuda = Tensor(torch_right_image).cuda()
right_image_cuda = Variable(right_image_cuda, volatile=True)

## process camera params
# for warp the image to construct the cost volume
pixel_coordinate = np.indices([320, 256]).astype(np.float32)
pixel_coordinate = np.concatenate((pixel_coordinate, np.ones([1, 320, 256])), axis=0)
pixel_coordinate = np.reshape(pixel_coordinate, [3, -1]) # [0,:] in [0,319]; [1,:] in [0,255]; [2,:]==1;

left_in_right_T = left2right[0:3, 3]  # translation vector
left_in_right_R = left2right[0:3, 0:3]  # rotation matrix
KL = camera_k_left
KR = camera_k_right
KL_inverse = inv(KL)
KRK_i = KR.dot(left_in_right_R.dot(KL_inverse))
KRKiUV = KRK_i.dot(pixel_coordinate)
KT = KR.dot(left_in_right_T)
KT = np.expand_dims(KT, -1)
KT = np.expand_dims(KT, 0)
KT = KT.astype(np.float32)
KRKiUV = KRKiUV.astype(np.float32)
KRKiUV = np.expand_dims(KRKiUV, 0)
KRKiUV_cuda_T = Tensor(KRKiUV).cuda()
KT_cuda_T = Tensor(KT).cuda()

# model
depthnet = depthNet()
model_data = torch.load('opensource_model.pth.tar')
depthnet.load_state_dict(model_data['state_dict'])
depthnet = depthnet.cuda()
cudnn.benchmark = True
depthnet.eval()

predict_depths = depthnet(left_image_cuda, right_image_cuda, KRKiUV_cuda_T, KT_cuda_T)

left
right

epipolar
depth

about the .pkl data

hello,
Thank you for your great work. I ran example.py successfully, and the results are great. Now I want to run my own data, but I can't create the .pkl data. What should I do? Do you have the source code for creating this type of data?

Best wishes!
Boni Hu

run example.py problem

When I run example.py, there is an UnpicklingError.

File "G:/GroupMeet/Depth_problem/MVDepthNet-master/example.py", line 19, in
sample_datas = pickle.load(fp)

UnpicklingError: invalid load key, '�'.

How can I solve it?
Is the data corrupted?

Can you help me?
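A guess, not confirmed in the thread: "invalid load key" usually means the file handed to pickle is not a pickle at all, for example the still-compressed sample_data.pkl.tar.gz or a partially downloaded file. A sketch that extracts the archive first:

import tarfile

# extract the downloaded archive into the project folder before running
# example.py, so that pickle reads the extracted .pkl file rather than
# the .tar.gz archive itself
with tarfile.open("sample_data.pkl.tar.gz", "r:gz") as archive:
    archive.extractall(".")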

what dataset and parameters do you use for training?

I used the TUM RGB-D dataset for training. After about 100 epochs, this is an example image pair from the training dataset.
pred_result
It is not as good as what you reported.
So, what dataset do you use for training? TUM (including dynamic objects), NYU v2, or something else? And what training parameters?
Thanks for your great work. It helps a lot.

Inaccurate prediction depth

Thank you for your excellent work. It is very helpful to me, but I have some problems with depth prediction when using it. When using example2.py to generate the predicted inverse depth map, it distinguishes the objects from the background in the scene effectively, but there is an error in the absolute depth of an object. For example, the ground truth is 500 mm but the network returns 260 mm, while the background is at 850 mm but the returned value is 350 mm. What could the problem be?

Dataset appears corrupted

Hello, when I try to download the dataset and the pretrained models, the files appear to be corrupted, any idea what is wrong? Could you make them available somewhere else? Also, I would really like to train this model if possible!
Thanks.

Problem when converting depth map to point cloud

Hi, I am trying to convert the predicted depth map to a point cloud but I am running into some problems. I use the image pair in example2.py for the test. Since MVDepthNet predicts metric inverse depth (0.02 ~ 2 m^-1), I saved the inverse depth map, normalized to 0~255, as a PNG image as shown below. White means large inverse depth, i.e. the pixel is close to the camera.

image

I get the depth from the saved png depth map as follows

double pixel_depth = (1 / (double(pixel) / 255 * (2 - 0.02) + 0.02));

But when I check the generated point cloud, It looks very bad, like this:

image

image

It seems that the predicted depth is very bad, even though the depth map image looks good.
Am I missing something to do the conversion correctly?

Best regards
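For reference, a back-projection sketch (not from the repository; all names and values are placeholders). It operates on the raw predicted depth rather than on an 8-bit PNG re-encoding, since quantizing the inverse depth to 256 levels loses precision:

import numpy as np

def depth_to_pointcloud(depth, K):
    # depth: (H, W) metric depth in meters at the same resolution as K
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pixels = np.stack([u, v, np.ones_like(u)], axis=0).reshape(3, -1).astype(np.float64)
    rays = np.linalg.inv(K) @ pixels        # rays at unit depth, one per pixel
    points = rays * depth.reshape(1, -1)    # scale each ray by its depth
    return points.T                         # (H*W, 3) points in the camera frame

# usage sketch (placeholder names): the network outputs inverse depth,
# so invert it before back-projecting
# inv_depth = predicted_inverse_depth            # (H, W) numpy array
# cloud = depth_to_pointcloud(1.0 / inv_depth, camera_k)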

conv2d error

Hi, there was an error when I ran "python example.py":
Traceback (most recent call last):
  File "example.py", line 62, in <module>
    predict_depths = depthnet(left_image_cuda, right_image_cuda, KRKiUV_cuda_T, KT_cuda_T)
  File "/home/emg/miniconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/emg/code_yj/stereo_code/MVDepthNet-master/depthNet_model.py", line 173, in forward
    conv1 = self.conv1(x)
  File "/home/emg/miniconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/emg/miniconda3/lib/python3.7/site-packages/torch/nn/modules/container.py", line 92, in forward
    input = module(input)
  File "/home/emg/miniconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/emg/miniconda3/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 343, in forward
    return self.conv2d_forward(input, self.weight)
  File "/home/emg/miniconda3/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 340, in conv2d_forward
    self.padding, self.dilation, self.groups)
TypeError: conv2d(): argument 'padding' must be tuple of ints, but found element of type float at pos 1

Do you know how to solve this problem?
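Not a reply from the maintainer, but this is the usual symptom under newer PyTorch when a padding value is computed with true division and ends up a float. Assuming depthNet_model.py derives the padding from the kernel size (an assumption; check the actual code), switching to integer division or an explicit int cast fixes it:

# Python 3: '/' yields a float, which conv2d rejects as a padding value
kernel_size = 3
padding = (kernel_size - 1) / 2      # 1.0 -> TypeError in conv2d
padding = (kernel_size - 1) // 2     # 1   -> accepted

# equivalently, wrap the existing expression:
# nn.Conv2d(in_channels, out_channels, kernel_size, padding=int((kernel_size - 1) / 2))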

Training Code

Dear,
Thank you for your great contribution. I used my own data with example2.py, but the result is a bit poor. I think that if I could train on this data, the result might be better. Could you provide the training code? Many thanks.

Yours sincerely,
HuBoni

Problem in generating point clouds from predicted Depth

Thanks a lot for open-sourcing your project!

The most impressive thing, and what catches most people's attention, is that the paper claims to be better than DeMoN. The heat map looks very good. However, when I try reconstructing the 3D point cloud from the depth, the result looks uneven and wavy, and it doesn't preserve the shape of the objects.

Shown below are the point clouds obtained from MVDepthNet and Demon.

MVDepth Pointcloud:

screenshot from 2018-09-27 19-08-47
screenshot from 2018-09-27 19-57-56

DeMoN pointcloud:

screenshot from 2018-09-27 20-00-03

As shown above, DeMoN point cloud preserves the object structure.

Is it a problem with the model, or is there something I am doing wrong? I can provide the corresponding images and poses if you need them for verification.

Also, is the pretrained model provided with the repo the one trained with or without geometric augmentation?

Any suggestions/ideas are highly appreciated! :)

scene id list in SceneNN dataset

Thank you for making your project available. It's really helpful. I just want to compare my model with MVDepthNet, but the list of SceneNN scene ids used in your paper is not given. Could you tell me which scenes of SceneNN you used for training and testing?

Training script

Hi, Thanks for the great paper and code.
Do you plan on releasing a training script including the geometric augmentation?

Thanks

train data about pkl

Dear,
Thanks for your help. I want to train on the TUM dataset using the training code you offered, but I don't know the structure of the .pkl data. How can I create the .pkl file from the original dataset? Do you have this code?
Yours sincerely,
HuBoni

question about datasets

Hi, I'm working on my own network, and I use the RGB-D and MVS datasets for training. But I find that MVS does not work very well: the test accuracy is bad, even though the training loss looks good and the accuracy on the RGB-D test datasets is similar to yours (I set the number of depth planes to 16, because your paper shows that 16 depth planes already give good results). Have you ever run into this situation?

Question about camera pose and camera intrinsic

Hello, you have a great project!

I have a few question on using it:

  1. Should the camera motion provided to the code go from the left_image pose (reference image) to the right_image pose (measurement image), or the other direction?
  2. Do the camera intrinsics change with image resolution? I have my camera calibrated at a resolution of 640x480; if I want to use your network without retraining it, I'll have to use 320x256. Do I need to re-calibrate my camera at that resolution?

Thank you very much!
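Regarding the second question above (general pinhole-camera reasoning, not a statement from the maintainer): a pure resize needs no re-calibration; the intrinsics scale linearly with the resolution change, exactly as the resize-and-scale step in example2.py does. A sketch with placeholder 640x480 intrinsics:

import numpy as np

# placeholder intrinsics calibrated at 640x480 (illustrative values)
K_640x480 = np.array([[525.0, 0.0, 319.5],
                      [0.0, 525.0, 239.5],
                      [0.0, 0.0, 1.0]])

# scale row 0 by the width ratio and row 1 by the height ratio,
# matching a resize to the 320x256 network input
K_320x256 = K_640x480.copy()
K_320x256[0, :] *= 320.0 / 640.0
K_320x256[1, :] *= 256.0 / 480.0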

How to use it?

Does MVDepthNet support a single camera? Do the left image and right image mean two cameras?

Problem about the selection strategy for the dataset

Hi, thanks for your great work!
Recently I have been working on my project and I want to use the same dataset as your work. In your paper you mention the selection strategy for the dataset, specifically that "at least 70% of the pixels in the reference image should be visible in the measurement image, and the average photo-consistency error between image pairs should be less than 80". I tried to implement this selection strategy to filter the DeMoN dataset, but so far I have not obtained the same number of training/testing samples as stated in your supplementary material. Could you elaborate on the selection strategy a little? What kind of photo-consistency metric did you use for selecting the training/testing samples? Did you compute the photo-consistency error between RGB or grayscale image pairs? When computing the photo-consistency error, did you ignore the invisible pixels (pixels in the reference image that are not visible in the measurement image)? I'd appreciate a reply. Thank you!

TUM validate dataset

Dear,
How do you get the TUM validation dataset from the original TUM dataset? What proportion did you use?

Thank you

About the license for this model

Thank you for sharing your great code. 😄

What is the license for this model? I'd like to reference it in the repository I'm working on if possible, but I want to state the license correctly.

Thank you.

more image pose pairs

Dear WANG-KX,
Thanks for your great project. I have used two image-pose pairs to predict depth, but the result is not very good. Now I want to know how to use more than two image-pose pairs to predict depth.
Thanks for your reply.

Sincerely,
HuBoni

About the depth of MVS test set

Hi,
Sorry for bothering again.
In other issues and in your paper you mention that the depth values of the DeMoN dataset are in metric scale. I checked the depth of the dataset and most of the data look normal. However, when I checked the test set of the MVS dataset, I found something wrong with the depth values. To be specific, in the MVS test set there are some samples of a chair:
image
The mean depth value of the corresponding depth map is about 7.4.
However, when it comes to samples of a building:
image
The mean depth value of the corresponding depth map is only around 2.2.
This is really strange, and I wonder whether you noticed this problem in your experiments?
I am looking forward to your reply.

Best regards!
