hkust-aerial-robotics / mvdepthnet

This repository provides a PyTorch implementation of the 3DV 2018 paper "MVDepthNet: real-time multiview depth estimation neural network".

License: GNU General Public License v3.0

Python 100.00%

mvdepthnet's Introduction

MVDepthNet

A Real-time Multiview Depth Estimation Network

This is an open-source implementation of the 3DV 2018 paper "MVDepthNet: real-time multiview depth estimation neural network" by Kaixuan Wang and Shaojie Shen (arXiv link). If you find the project useful for your research, please cite:

@InProceedings{mvdepthnet,
    author       = "K. Wang and S. Shen",
    title        = "MVDepthNet: real-time multiview depth estimation neural network",
    booktitle    = "International Conference on 3D Vision (3DV)",
    month        = "Sep.",
    year         = "2018",
  }

Given multiple images and the corresponding camera poses, a cost volume is first computed and then combined with the reference image to generate the depth map. An example is shown below.

MVDepthNet example

From left to right: the left image, the right image, the "ground truth" depth from an RGB-D camera, and the estimated depth map.

The following video illustrates the performance of our system:

video

1.0 Prerequisites

  • PyTorch

The PyTorch version used in the implementation is 0.3. Only small changes are needed to run the network under newer PyTorch versions (see the sketch after this list).

  • OpenCV

  • NumPy
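One example of such a change (a sketch, not part of the repository): example2.py wraps the inputs in Variable(..., volatile=True), which PyTorch 0.4 and later replace with the torch.no_grad() context manager. Assuming the tensors and the model are prepared exactly as in example2.py:

import torch

# PyTorch 0.3 style used in example2.py:
#   left_image_cuda = Variable(left_image_cuda, volatile=True)
# Equivalent inference under PyTorch >= 0.4:
with torch.no_grad():
    predict_depths = depthnet(left_image_cuda, right_image_cuda,
                              KRKiUV_cuda_T, KT_cuda_T)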

2.0 Download the model parameters and the samples

UPDATE: the Dropbox links have failed because of heavy traffic. These are the BaiduPan links: model weights: https://pan.baidu.com/s/1CjV6iWBbjWOxGetf2ZXStQ (extraction code: gbfg) and sample data: https://pan.baidu.com/s/1feYfF6qSd7z7_anmR_rgnQ (extraction code: g1fo).

We provide a trained model used in our paper evaluation and some images to run the example code.

Please download the model and the sample images via the links above. Put the model opensource_model.pth.tar under the project folder and extract sample_data.pkl.tar.gz there as well.

3.0 Run the example

Just

python example.py

4.0 Use your own data

To use the network, you need to provide a left image, a right image, the camera intrinsic parameters and the relative camera pose. Images are normalized with a mean of 81.0 and a standard deviation of 35.0, for example

normalized_image = (image - 81.0) / 35.0

We provide the file example2.py to show how to run the network on your own data. left_pose and right_pose are the camera poses in the world frame. The final visualization window shows left_image, right_image, and the predicted depth. A red dot in left_image is used to test the accuracy of the relative pose: the red line drawn in right_image is the corresponding epipolar line, which must pass through the point marked by the red dot in left_image. Otherwise, the pose is not accurate. You can change the position of the tested point in line 56.
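A minimal preprocessing sketch in the spirit of example2.py (the file name and intrinsic values are placeholders; the 320x256 working resolution and the 81.0/35.0 normalization are the ones used by the network):

import cv2
import numpy as np

# placeholder intrinsics; replace with your calibrated values
camera_k = np.array([[540.0, 0.0, 320.0],
                     [0.0, 540.0, 240.0],
                     [0.0, 0.0, 1.0]])

left_image = cv2.imread("left.png")
height, width = left_image.shape[:2]

# resize to the 320x256 network input and scale the intrinsics to match
left_image = cv2.resize(left_image, (320, 256))
camera_k[0, :] *= 320.0 / width
camera_k[1, :] *= 256.0 / height

# normalize and convert to NCHW float layout, shape (1, 3, 256, 320)
left_image = (left_image.astype(np.float32) - 81.0) / 35.0
left_image = np.expand_dims(np.moveaxis(left_image, -1, 0), 0)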

To get good results, the images should have enough translation and enough overlap with each other. Pure rotation does not help the depth estimation.

4.1 Use multiple images

Please refer to depthNet_model.py: use the function getVolume to construct one cost volume per measurement image and average them. Feed the model with the reference image and the averaged cost volume to get the estimated depth map, as sketched below.
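The averaging step itself is just an element-wise mean over the per-measurement cost volumes. A sketch with dummy tensors standing in for the outputs of getVolume (the shape is illustrative only; check depthNet_model.py for the real signature and tensor layout):

import torch

# one cost volume per measurement image, here three dummy volumes of
# illustrative shape (batch, depth_samples, height, width)
volumes = [torch.rand(1, 64, 256, 320) for _ in range(3)]

# element-wise average over the measurement images; this averaged volume,
# together with the reference image, is what the network is fed
avg_volume = torch.stack(volumes, dim=0).mean(dim=0)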

5.0 Acknowledgement

Most of the training data and test data are collected by DeMoN and we thank their work.

mvdepthnet's People

Contributors

wang-kx

mvdepthnet's Issues

how to find the camera intrinsic and the relative camera pose

You mentioned that to use our own dataset we have to get the camera parameters. Can you suggest any tutorial on estimating these camera parameters? I also have one more doubt: what is camera_k in example2.py, and how do I find that camera_k parameter?

Thanks
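For reference (general background, not a reply from the thread): camera_k is the standard 3x3 pinhole intrinsic matrix, which any calibration tool (e.g. OpenCV's chessboard calibration) can estimate. A sketch with placeholder values:

import numpy as np

# fx, fy: focal length in pixels; cx, cy: principal point in pixels.
# The numbers below are placeholders, not calibrated values.
fx, fy, cx, cy = 525.0, 525.0, 319.5, 239.5
camera_k = np.array([[fx, 0.0, cx],
                     [0.0, fy, cy],
                     [0.0, 0.0, 1.0]])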

About camera external matrix

Hello

How can I calculate the left_pose and right_pose matrices in example2.py for my own images taken by an Android phone?

I get the camera intrinsic parameters (focal length = 3.46 mm, sensor size = 4.66*3.51 mm, pixel array size = 4160*3120, orientation = 90 degrees) on Android by installing a device info app.

Having these, how can we get the camera pose (extrinsic matrix)? Is there any other method?

Here are some other camera characteristics that the Camera2 Info app displays

android.lens.facing:  1
android.lens.info.availableApertures:  [1.8]
android.lens.info.availableFilterDensities:  [0.0]
android.lens.info.availableFocalLengths:  [3.46]
android.lens.info.availableOpticalStabilization:  [0]
android.lens.info.focusDistanceCalibration:  2
android.lens.info.hyperfocalDistance:  0.2
android.lens.info.minimumFocusDistance:  14.285714

android.sensor.calibrationTransform1:  ColorSpaceTransform([128/128, 0/128, 0/128], [0/128, 128/128, 0/128], [0/128, 0/128, 128/128])

android.sensor.calibrationTransform2:  ColorSpaceTransform([128/128, 0/128, 0/128], [0/128, 128/128, 0/128], [0/128, 0/128, 128/128])

android.sensor.colorTransform1:  ColorSpaceTransform([1094/1024, -306/1024, -146/1024], [-442/1024, 1388/1024, 52/1024], [-104/1024, 250/1024, 600/1024])

android.sensor.colorTransform2:  ColorSpaceTransform([2263/1024, -1364/1024, -145/1024], [-194/1024, 1257/1024, -56/1024], [-24/1024, 187/1024, 618/1024])

android.sensor.forwardMatrix1:  ColorSpaceTransform([612/1024, 233/1024, 139/1024], [199/1024, 831/1024, -6/1024], [15/1024, -224/1024, 1049/1024])

android.sensor.forwardMatrix2:  ColorSpaceTransform([441/1024, 317/1024, 226/1024], [29/1024, 908/1024, 87/1024], [9/1024, -655/1024, 1486/1024])

android.sensor.info.activeArraySize:  Rect(0, 0 - 4160, 3120)

android.sensor.info.colorFilterArrangement:  0

android.sensor.info.physicalSize:  4.66x3.51

android.sensor.info.pixelArraySize:  4160x3120

android.sensor.info.preCorrectionActiveArraySize:  Rect(0, 0 - 4160, 3120)

android.sensor.info.sensitivityRange:  [50, 3200]

android.sensor.info.whiteLevel:  1023

Thanks
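A side note on the intrinsics (the camera pose itself still has to come from an SfM, SLAM or visual-odometry pipeline, which is beyond this sketch): with the numbers listed above, the focal length in pixels follows directly from the sensor size, ignoring lens distortion and assuming the full sensor is read out without cropping.

# 3.46 mm focal length, 4.66 x 3.51 mm sensor, 4160 x 3120 pixel array
focal_mm = 3.46
sensor_w_mm, sensor_h_mm = 4.66, 3.51
width_px, height_px = 4160, 3120

fx = focal_mm * width_px / sensor_w_mm      # ~3089 px
fy = focal_mm * height_px / sensor_h_mm     # ~3076 px
cx, cy = width_px / 2.0, height_px / 2.0    # principal point assumed at the image center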

how to use in c++

I want to use this in my c++ program. How can I use this python program?

Problems when running example.py

Hi, thank you for the project!

When I run the example.py, I meet the following decoding problem:

Traceback (most recent call last):
  File "example.py", line 16, in <module>
    sample_datas = pickle.load(fp)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc7 in position 1: ordinal not in range(128)

Is the sample images link (https://www.dropbox.com/s/hr59f24byc3x8z3/sample_data.pkl.tar.gz?dl=0) corrupted, or should I modify the encoding? (I have tried encoding='utf-8' but it still does not work.)

Thanks!
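For what it's worth (not a reply from the thread): this error typically means a pickle written under Python 2 is being loaded under Python 3. For pickles that contain NumPy arrays, encoding='latin1' usually works where 'utf-8' does not, because 'latin1' maps all 256 byte values one-to-one. A sketch, using the file name from the sample archive:

import pickle

with open("sample_data.pkl", "rb") as fp:
    sample_datas = pickle.load(fp, encoding="latin1")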

Question about warping the image to construct the cost volume.

Hello, thanks a ton for the code.
I use my own data but get strange warped images.
The left and right images are consecutive frames captured by a mobile phone. The camera intrinsic parameters and the relative camera pose are also known, but my warped images are not as "good" as with your sample data; each is only a small, rotated part of an image.
I don't quite understand the warping part in your GetVolume() function and I am not very familiar with stereo matching. Could you point out where this problem might be?

generate .pkl data

problemrantime
Hello,
I use the TUM dataset to run the code and encounter this problem. When I use np.seterr(divide='ignore', invalid='ignore') to debug, the result is as follows.
result123
Do you know how to solve it?

Yours sincerely,
Huboni

Problems using my own data

Hi, thank you for the great repository! :)

I could successfully test it with your provided test labels. However I am having issues using my own data.

Here is an example of my input data:

First image (left):
96369 798432

Second image (right):
96373 464939

First image extrinsics and intrinsics:

"96369.798432.jpg": {

	"extrinsics": [[0.7245516,0.3634065,-0.5856284,1.018013],[0.05039374,0.8194891,0.5708749,0.02848812],[0.6873757,-0.4431403,0.5754488,-0.4639755],[0,0,0,0.9999999]],

	"imageName": "96369.798432.jpg",

	"intrinsics": [[1619.542,0,959.5],[0,1619.542,539.5],[0,0,1]],

	"timeStamp": 96369.798432

}

Second image intrinsics and extrinsics:

"96373.464939.jpg": {

	"extrinsics": [[0.4488778,0.404342,-0.7968791,1.488057],[0.02514287,0.8857014,0.4635739,0.05461761],[0.8932394,-0.2281239,0.3874055,-0.3899191],[0,0,0,1]],

	"imageName": "96373.464939.jpg",

	"intrinsics": [[1619.076,0,959.5],[0,1619.076,539.5],[0,0,1]],

	"timeStamp": 96373.464938666

}

But the result is:

96370 131751-96373 464939

This is just an example result; I am getting the same wrong results on all of my images.

Any ideas how I could solve it? Thanks! :)

Did I miss something to get a good depth?

Thanks for your great work.
I referred to example2.py to use my own data. However, the result does not seem good.
The code and images are provided below. Please tell me if I missed anything.

# imports needed to make this snippet self-contained (added for completeness)
import cv2
import numpy as np
from numpy.linalg import inv
import torch
from torch import Tensor
from torch.autograd import Variable
import torch.backends.cudnn as cudnn
from depthNet_model import depthNet

left_image = cv2.imread("./left.jpeg")
right_image = cv2.imread("./right.jpeg")

camera_k_left = np.asarray([
                            [1.7141879128232438e+003, 0.,                      1.2686456493940061e+003],
                            [0,                       1.7141879128232438e+003, 9.9575285430241513e+002],
                            [0,                       0,                       1]])

camera_k_right = np.asarray([ 
                            [1.7141879128232438e+003, 0.,                      1.2666075491361062e+003],
                            [0,                       1.7141879128232438e+003, 9.8047895362229440e+002],
                            [0,                       0,                       1]])

left2right = np.asarray([
                        [9.9969708004761548e-001, -1.7112957892382444e-002,  1.7688833100150528e-002, -8.3976622746264312e+001],
                        [1.6926228781311496e-002, 9.9979999147940424e-001,   1.0652690600304717e-002, 6.4193373297895686e+000],
                        [-1.7867594228494717e-002, -1.0350058451847681e-002, 9.9978678995400272e-001, -2.9538222186700258e+000],
                        [0,                       0,                         0,                        1]])

## process images
# scale to 320x256
original_width = left_image.shape[1]
original_height = left_image.shape[0]
factor_x = 320.0 / original_width
factor_y = 256.0 / original_height

left_image = cv2.resize(left_image, (320, 256)) # shape becomes (256, 320, 3)
right_image = cv2.resize(right_image, (320, 256)) # shape becomes (256, 320, 3)
camera_k_left[0, :] *= factor_x
camera_k_left[1, :] *= factor_y
camera_k_right[0, :] *= factor_x
camera_k_right[1, :] *= factor_y

# convert to pytorch format
torch_left_image = np.moveaxis(left_image, -1, 0) # (3, 256, 320)
torch_left_image = np.expand_dims(torch_left_image, 0) # (1, 3, 256, 320)
torch_left_image = (torch_left_image - 81.0)/ 35.0 # whiten
torch_right_image = np.moveaxis(right_image, -1, 0)
torch_right_image = np.expand_dims(torch_right_image, 0)
torch_right_image = (torch_right_image - 81.0) / 35.0

left_image_cuda = Tensor(torch_left_image).cuda()
left_image_cuda = Variable(left_image_cuda, volatile=True)

right_image_cuda = Tensor(torch_right_image).cuda()
right_image_cuda = Variable(right_image_cuda, volatile=True)

## process camera params
# for warp the image to construct the cost volume
pixel_coordinate = np.indices([320, 256]).astype(np.float32)
pixel_coordinate = np.concatenate((pixel_coordinate, np.ones([1, 320, 256])), axis=0)
pixel_coordinate = np.reshape(pixel_coordinate, [3, -1]) # [0,:] in [0,319]; [1,:] in [0,255]; [2,:]==1;

left_in_right_T = left2right[0:3, 3]  # translation vector
left_in_right_R = left2right[0:3, 0:3]  # rotation matrix
KL = camera_k_left
KR = camera_k_right
KL_inverse = inv(KL)
KRK_i = KR.dot(left_in_right_R.dot(KL_inverse))
KRKiUV = KRK_i.dot(pixel_coordinate)
KT = KR.dot(left_in_right_T)
KT = np.expand_dims(KT, -1)
KT = np.expand_dims(KT, 0)
KT = KT.astype(np.float32)
KRKiUV = KRKiUV.astype(np.float32)
KRKiUV = np.expand_dims(KRKiUV, 0)
KRKiUV_cuda_T = Tensor(KRKiUV).cuda()
KT_cuda_T = Tensor(KT).cuda()

# model
depthnet = depthNet()
model_data = torch.load('opensource_model.pth.tar')
depthnet.load_state_dict(model_data['state_dict'])
depthnet = depthnet.cuda()
cudnn.benchmark = True
depthnet.eval()

predict_depths = depthnet(left_image_cuda, right_image_cuda, KRKiUV_cuda_T, KT_cuda_T)

left
right

epipolar
depth

about the .pkl data

hello,
Thank you for your great work. I ran example.py successfully, and the results are great. Now I want to run my own data, but I can't create the .pkl data. What should I do? Do you have the source code for creating this type of data?

Best wishes!
Boni Hu

run example.py problem

When I run example.py, there is an UnpicklingError.

File "G:/GroupMeet/Depth_problem/MVDepthNet-master/example.py", line 19, in
sample_datas = pickle.load(fp)

UnpicklingError: invalid load key, '�'.

How can I solve it?
Is the data corrupted?

Can you help me?
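A guess, not confirmed in the thread: "invalid load key" usually means the file handed to pickle is not a pickle at all, for example the still-compressed sample_data.pkl.tar.gz or a partially downloaded file. A sketch that extracts the archive first:

import tarfile

# extract the downloaded archive into the project folder before running
# example.py, so that pickle reads the extracted .pkl file rather than
# the .tar.gz archive itself
with tarfile.open("sample_data.pkl.tar.gz", "r:gz") as archive:
    archive.extractall(".")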

what dataset and parameters do you use for training?

I used the TUM RGB-D dataset for training. After about 100 epochs, this is an example image pair from the training dataset.
pred_result
It is not as good as what you reported.
So, what dataset do you use for training? TUM (including dynamic objects), NYU v2, or something else? And what training parameters?
Thanks for your great work. It helps a lot.

Inaccurate prediction depth

Thank you for your excellent work. It is very helpful to me, but I have some problems with depth prediction when using it. When using example2.py to generate the predicted inverse depth map, it distinguishes the objects from the background in the scene effectively, but there is an error in the absolute depth of an object. For example, the ground truth is 500 mm but the network returns 260 mm, while the background is at 850 mm but the returned value is 350 mm. What could the problem be?

Dataset appears corrupted

Hello, when I try to download the dataset and the pretrained models, the files appear to be corrupted, any idea what is wrong? Could you make them available somewhere else? Also, I would really like to train this model if possible!
Thanks.

Problem when converting depth map to point cloud

Hi, I am trying to convert the predicted depth map to a point cloud but I am running into some problems. I use the image pair in example2.py for the test. Since MVDepthNet predicts metric inverse depth (0.02 ~ 2 m^-1), I saved the inverse depth map, normalized to 0~255, as a PNG image as shown below. White means large inverse depth, i.e. the pixel is close to the camera.

image

I get the depth from the saved png depth map as follows

double pixel_depth = (1 / (double(pixel) / 255 * (2 - 0.02) + 0.02));

But when I check the generated point cloud, It looks very bad, like this:

image

image

It seems that the predicted depth is very bad, even though the depth map image looks good.
Am I missing something to do the conversion correctly?

Best regards
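For reference, a back-projection sketch (not from the repository; all names and values are placeholders). It operates on the raw predicted depth rather than on an 8-bit PNG re-encoding, since quantizing the inverse depth to 256 levels loses precision:

import numpy as np

def depth_to_pointcloud(depth, K):
    # depth: (H, W) metric depth in meters at the same resolution as K
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pixels = np.stack([u, v, np.ones_like(u)], axis=0).reshape(3, -1).astype(np.float64)
    rays = np.linalg.inv(K) @ pixels        # rays at unit depth, one per pixel
    points = rays * depth.reshape(1, -1)    # scale each ray by its depth
    return points.T                         # (H*W, 3) points in the camera frame

# usage sketch (placeholder names): the network outputs inverse depth,
# so invert it before back-projecting
# inv_depth = predicted_inverse_depth            # (H, W) numpy array
# cloud = depth_to_pointcloud(1.0 / inv_depth, camera_k)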

conv2d error

Hi, there was an error when I ran "python example.py":
Traceback (most recent call last):
  File "example.py", line 62, in <module>
    predict_depths = depthnet(left_image_cuda, right_image_cuda, KRKiUV_cuda_T, KT_cuda_T)
  File "/home/emg/miniconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/emg/code_yj/stereo_code/MVDepthNet-master/depthNet_model.py", line 173, in forward
    conv1 = self.conv1(x)
  File "/home/emg/miniconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/emg/miniconda3/lib/python3.7/site-packages/torch/nn/modules/container.py", line 92, in forward
    input = module(input)
  File "/home/emg/miniconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/emg/miniconda3/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 343, in forward
    return self.conv2d_forward(input, self.weight)
  File "/home/emg/miniconda3/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 340, in conv2d_forward
    self.padding, self.dilation, self.groups)
TypeError: conv2d(): argument 'padding' must be tuple of ints, but found element of type float at pos 1

Do you know how to solve this problem?
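Not a reply from the maintainer, but this is the usual symptom under newer PyTorch when a padding value is computed with true division and ends up a float. Assuming depthNet_model.py derives the padding from the kernel size (an assumption; check the actual code), switching to integer division or an explicit int cast fixes it:

# Python 3: '/' yields a float, which conv2d rejects as a padding value
kernel_size = 3
padding = (kernel_size - 1) / 2      # 1.0 -> TypeError in conv2d
padding = (kernel_size - 1) // 2     # 1   -> accepted

# equivalently, wrap the existing expression:
# nn.Conv2d(in_channels, out_channels, kernel_size, padding=int((kernel_size - 1) / 2))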

Training Code

Dear,
Thank you for your great contribution. I used my own data with example2.py, but the result is a bit poor. I think that if I could train on this data, the result might be better. Could you provide the training code? Many thanks.

Yours sincerely,
HuBoni

Problem in generating point clouds from predicted Depth

Thanks a lot for open-sourcing your project!

The most impressive thing, and what catches most people's attention, is that the paper claims to be better than DeMoN. The heat map looks very good. However, when I try reconstructing the 3D point cloud from the depth, the result looks uneven and wavy, and it doesn't preserve the shape of the objects.

Shown below are the point clouds obtained from MVDepthNet and Demon.

MVDepth Pointcloud:

screenshot from 2018-09-27 19-08-47
screenshot from 2018-09-27 19-57-56

DeMoN pointcloud:

screenshot from 2018-09-27 20-00-03

As shown above, DeMoN point cloud preserves the object structure.

Is it a problem with the model, or is there something I am doing wrong? I can provide the corresponding images and poses if you need them for verification.

Also, is the pretrained model provided with the repo the one trained with or without geometric augmentation?

Any suggestions/ideas are highly appreciated! :)

scene id list in SceneNN dataset

Thank you for making your project available. It's really helpful. I just want to compare my model with MVDepthNet, but the list of SceneNN scene ids used in your paper is not given. Could you tell me which scenes of SceneNN you used for training and testing?

Training script

Hi, Thanks for the great paper and code.
Do you plan on releasing a training script including the geometric augmentation?

Thanks

train data about pkl

Dear,
Thanks for your help. I want to train on the TUM dataset using the training code you offered, but I don't know the structure of the .pkl data. How can I create the .pkl file from the original dataset? Do you have this code?
Yours sincerely,
HuBoni

question about datasets

Hi, I'm working on my own network, and I use the RGB-D and MVS datasets for training. But I find that MVS does not work very well: the test accuracy is bad, even though the training loss looks good and the accuracy on the RGB-D test datasets is similar to yours (I set the number of depth planes to 16, because your paper shows that 16 depth planes already give good results). Have you ever run into this situation?

Question about camera pose and camera intrinsic

Hello, you have a great project!

I have a few question on using it:

  1. Should the camera motion provided to the code go from the left_image pose (reference image) to the right_image pose (measurement image), or the other direction?
  2. Do the camera intrinsics change with image resolution? I have my camera calibrated at a resolution of 640x480; if I want to use your network without retraining it, I'll have to use 320x256. Do I need to re-calibrate my camera at that resolution?

Thank you very much!
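Regarding the second question above (general pinhole-camera reasoning, not a statement from the maintainer): a pure resize needs no re-calibration; the intrinsics scale linearly with the resolution change, exactly as the resize-and-scale step in example2.py does. A sketch with placeholder 640x480 intrinsics:

import numpy as np

# placeholder intrinsics calibrated at 640x480 (illustrative values)
K_640x480 = np.array([[525.0, 0.0, 319.5],
                      [0.0, 525.0, 239.5],
                      [0.0, 0.0, 1.0]])

# scale row 0 by the width ratio and row 1 by the height ratio,
# matching a resize to the 320x256 network input
K_320x256 = K_640x480.copy()
K_320x256[0, :] *= 320.0 / 640.0
K_320x256[1, :] *= 256.0 / 480.0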

How to use it?

Does MVDepthNet support a single camera? Do the left image and right image mean two cameras?

Problem about the selection strategy for the dataset

Hi, thanks for your great work!
Recently I have been working on my project and I want to use the same dataset as your work. In your paper you mention the selection strategy for the dataset, specifically that "at least 70% of the pixels in the reference image should be visible in the measurement image, and the average photo-consistency error between image pairs should be less than 80". I tried to implement this selection strategy to filter the DeMoN dataset, but so far I have not obtained the same number of training/testing samples as stated in your supplementary material. Could you elaborate on the selection strategy a little? What kind of photo-consistency metric did you use for selecting the training/testing samples? Did you compute the photo-consistency error between RGB or grayscale image pairs? When computing the photo-consistency error, did you ignore the invisible pixels (pixels in the reference image that are not visible in the measurement image)? I'd appreciate a reply. Thank you!

TUM validate dataset

Dear,
How do you get the TUM validation dataset from the original TUM dataset? What proportion did you use?

Thank you

About the license for this model

Thank you for sharing your great code. 😄

What is the license for this model? I'd like to reference it in the repository I'm working on if possible, but I want to state the license correctly.

Thank you.

more image pose pairs

Dear WANG-KX,
Thanks for your great project. I have used two image-pose pairs to predict depth, but the result is not very good. Now I want to know how to use more than two image-pose pairs to predict depth.
Thanks for your reply.

Sincerely,
HuBoni

About the depth of MVS test set

Hi,
Sorry for bothering again.
In other issues and in your paper you mention that the depth values of the DeMoN dataset are in metric scale. I checked the depth of the dataset and most of the data look normal. However, when I checked the test set of the MVS dataset, I found something wrong with the depth values. To be specific, in the MVS test set there are some samples of a chair:
image
The mean depth value of the corresponding depth map is about 7.4.
However, when it comes to samples of a building:
image
The mean depth value of the corresponding depth map is only around 2.2.
This is really strange, and I wonder whether you noticed this problem in your experiments?
I am looking forward to your reply.

Best regards!
