sunghoonim / dpsnet


[ICLR19] DPSNet: End-to-end Deep Plane Sweep Stereo

License: MIT License

Python 93.33% Shell 6.67%

dpsnet's Issues

Significance of multiplying and dividing the camera intrinsic params by 4

I don't understand the significance of multiplying and dividing the camera intrinsic parameters by the constant 4 in PSNet.py to create the variables intrinsics4 and intrinsics_inv4.

Is it related to resizing the images from the original calibration resolution, or to something else?
Also, please shed some light on how this factor of 4 should be adjusted for different configurations.
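
For context, a factor like this typically shows up when features or cost volumes are computed at a fraction of the input resolution: pixel coordinates shrink by that factor, so the focal lengths and principal point have to shrink with them. The sketch below illustrates the idea with a plain pinhole model. It is not the repository's code; the function names, the sample intrinsics, and the assumption that the factor 4 corresponds to quarter-resolution feature maps are all hypothetical.

```python
def scale_intrinsics(fx, fy, cx, cy, factor):
    """Return pinhole intrinsics for an image downsampled by `factor`.

    Ignores the half-pixel offset that some resizing conventions introduce.
    """
    return fx / factor, fy / factor, cx / factor, cy / factor


def project(fx, fy, cx, cy, point):
    """Project a 3D point (x, y, z) in the camera frame to pixel coordinates."""
    x, y, z = point
    return fx * x / z + cx, fy * y / z + cy


# Hypothetical full-resolution intrinsics and a sample 3D point.
full = (500.0, 500.0, 320.0, 240.0)
point = (0.2, -0.1, 2.0)

u, v = project(*full, point)                           # pixel in the full image
u4, v4 = project(*scale_intrinsics(*full, 4), point)   # pixel in the 1/4 image
print((u, v), (u4, v4))
```

Under these assumptions the second pixel is the first one divided by 4, so a different downsampling ratio would call for a matching factor.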

Some questions about the learning rate in training

I found that the learning rate is never updated during training; it seems to stay at 2e-04 the whole time. The code that updates the learning rate is lr = args.lr * (0.1 ** (epoch // 10)), but epoch only ranges from 0 to 10. Is the learning rate meant to be set like this? Looking forward to your reply. Thanks!
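
The behavior described above can be checked by evaluating the quoted formula directly. This is a small sketch: step_decay_lr is a hypothetical helper, and the base rate 2e-4 is the value mentioned in the question.

```python
def step_decay_lr(base_lr, epoch, step=10, gamma=0.1):
    """Learning rate under the quoted schedule: base_lr * gamma ** (epoch // step)."""
    return base_lr * (gamma ** (epoch // step))


base_lr = 2e-4
schedule = {epoch: step_decay_lr(base_lr, epoch) for epoch in range(12)}
for epoch, lr in schedule.items():
    print(epoch, lr)
```

For epochs 0 through 9 the integer division yields 0, so the rate stays at the base value; the first decay would only happen at epoch 10.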

Why are the reference image / target images name flipped?

Hi, I know this is a trivial question, but I would like to clarify the parameters passed from train_loader to dpsnet, as I noticed the variable names differ. Why aren't the naming conventions consistent?

Loading in the data:
for i, (tgt_img, ref_imgs, ref_poses, intrinsics, intrinsics_inv, tgt_depth) in enumerate(train_loader):

Doing a forward pass:
depths = dpsnet(tgt_img_var, ref_imgs_var, pose, intrinsics_var, intrinsics_inv_var)

In PSNet Class:
def forward(self, ref, targets, pose, intrinsics, intrinsics_inv):
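
To make the mismatch concrete, the sketch below binds the call-site names to the parameter names positionally. The forward function here is only a stand-in with the same parameter names as the signature quoted above (without self); it is not the real model.

```python
import inspect


def forward(ref, targets, pose, intrinsics, intrinsics_inv):
    """Stand-in with the parameter names of the quoted PSNet.forward."""


# Names of the variables passed at the call site, in order.
call_site_names = [
    "tgt_img_var", "ref_imgs_var", "pose", "intrinsics_var", "intrinsics_inv_var",
]

# Positional binding: the i-th argument goes to the i-th parameter.
mapping = dict(inspect.signature(forward).bind(*call_site_names).arguments)
for parameter, argument in mapping.items():
    print(f"{parameter:15s} <- {argument}")
```

With positional binding, the training script's target image lands in the parameter named ref, and its reference images land in targets.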

About retraining this network using my datasets

Hello, I used your network to test my own dataset, and the results are not as good as on your test set, so I decided to collect my own data for retraining. How much data do I need to prepare? Are 2000 datasets enough? I look forward to your reply. Thanks.

CUDA error: out of memory

When I test on a 12GB TITAN X GPU, I get this:

Traceback (most recent call last):
  File "test.py", line 127, in <module>
    main()
  File "test.py", line 90, in main
    output_depth = dpsnet(tgt_img_var, ref_imgs_var, pose, intrinsics_var, intrinsics_inv_var)
  File "/home/weixk15/anaconda3/envs/DU/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/weixk15/work/DPSNet/models/PSNet.py", line 106, in forward
    cost0 = self.dres1(cost0) + cost0
  File "/home/weixk15/anaconda3/envs/DU/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/weixk15/anaconda3/envs/DU/lib/python3.6/site-packages/torch/nn/modules/container.py", line 91, in forward
    input = module(input)
  File "/home/weixk15/anaconda3/envs/DU/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/weixk15/anaconda3/envs/DU/lib/python3.6/site-packages/torch/nn/modules/container.py", line 91, in forward
    input = module(input)
  File "/home/weixk15/anaconda3/envs/DU/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/weixk15/anaconda3/envs/DU/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 421, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: CUDA error: out of memory
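
The failing line operates on a cost volume, so a rough size estimate can help judge whether the input resolution is the culprit. The sketch below assumes a single 5D tensor of shape batch x channels x depth labels x H/4 x W/4 stored as 32-bit floats. The helper name and every number in the example (resolution, 64 labels, 64 channels, downscale 4) are assumptions for illustration, not values read from the repository.

```python
def cost_volume_bytes(height, width, n_labels, channels,
                      batch=1, downscale=4, bytes_per_value=4):
    """Approximate size in bytes of one cost-volume tensor."""
    h, w = height // downscale, width // downscale
    return batch * channels * n_labels * h * w * bytes_per_value


# Hypothetical example: one 480x640 image, 64 depth labels, 64 channels.
size = cost_volume_bytes(480, 640, n_labels=64, channels=64)
print(f"{size / 2**20:.0f} MiB per tensor")
```

A network keeps several tensors of this size alive at once, and more still if gradients are being tracked, so the total can be many times this figure.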

Using images from Multiple Cameras

I am planning to use views from multiple cameras in the DPSNet framework and have made the necessary changes. However, I am not sure whether the forward pass of the model should take the inverse of the 'ref' intrinsics or of the 'tgt' intrinsics. The confusion is compounded by the interchanged ref and tgt keywords.

Clarification for the codes

Hi!

Thank you for sharing the code and your amazing paper.

I am just a beginner in this area, so I am going through your code for some insights. However, I ran into some problems, mostly in the architecture parts.

In your paper, you mention a 7x7 filter for the first layer, but I could not find such a layer. I wonder if I am misunderstanding the code.
Also, you mention four fixed-size average pooling blocks with sizes 16, 8, 4, 2. But I found the average pooling kernel sizes in the feature_extraction class in submodule.py to be 32, 16, 8, 4. I wonder if this part is the spatial pyramid pooling you mention in your paper.

Thank you in advance for answering my questions!

question about target images

Hi, thanks for the code. I am confused about whether you use only one picture as the target image in training. I printed the index of the data loader in sequences_folders.py, but got only one index number.

Could not find a format to read the specified file

Line 59 in preparedata_test.py has imageio.imread(img.tobytes()). This gives an error saying that it could not find a format to read the specified file in mode 'i'.

question about training datasets

Channel axis dimension is not valid.

Hi. I get an error while running test.py on torch version 1.0. I can't run test.py because of scipy.misc.imsave. The input of imsave should be HWC, but in the present code the input is given as CHW, so the imsave function raises an error.

In my case, I changed the imsave code from

disp = (255*tensor2array(torch.from_numpy(output_disp_n), max_value=args.nlabel, colormap='bone'))
imsave(output_dir/'{:04d}_disp{}'.format(i,'.png'), disp)

to

disp = (255*tensor2array(torch.from_numpy(output_disp_n), max_value=args.nlabel, colormap='bone')).astype(np.uint8).transpose((1, 2, 0))
imsave(output_dir/'{:04d}_disp{}'.format(i,'.png'), disp)

thank you

question regarding colmap evaluation

Thank you for sharing the code! I have two questions about your evaluation of COLMAP. 1) Did you run COLMAP using only the few images in the test set (mostly 2 stereo images per scene), or did you run COLMAP on the full sequence? 2) Did you provide COLMAP with the known poses, or did you let it recover the poses as well?

Error occurred while preparing the downloaded dataset

I found that an error occurred when running python .\preparedata_train.py after downloading the dataset.
The error information is as follows:

Traceback (most recent call last):
  File ".\preparedata_train.py", line 109, in <module>
    preparedata()
  File ".\preparedata_train.py", line 87, in preparedata
    dump_example(scene)
  File ".\preparedata_train.py", line 60, in dump_example
    img = imageio.imread(img.tobytes())
  File "D:\ProgramData\Anaconda3\lib\site-packages\imageio\core\functions.py", line 265, in imread
    reader = read(uri, format, "i", **kwargs)
  File "D:\ProgramData\Anaconda3\lib\site-packages\imageio\core\functions.py", line 182, in get_reader
    "Could not find a format to read the specified file in %s mode" % modename
ValueError: Could not find a format to read the specified file in single-image mode

The error occurred when calling the imageio.imread API. I checked the input variable, which is a flattened image.
Is there any way to solve this problem? Thanks a lot.
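
imageio.imread has to infer the file format from the bytes it receives, so one way to narrow the failure down is to check what the buffer actually contains. The sketch below inspects well-known file signatures; sniff_image_format is a hypothetical helper written for illustration and is not part of imageio or of this repository.

```python
def sniff_image_format(data):
    """Guess the container format of an encoded image from its leading bytes."""
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if data.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    if data[:2] == b"BM":
        return "bmp"
    return "unknown"


# Synthetic headers only; in practice `data` would be img.tobytes().
print(sniff_image_format(b"\x89PNG\r\n\x1a\n" + b"\x00" * 8))
print(sniff_image_format(b"RIFF\x00\x00\x00\x00WEBPVP8 "))
print(sniff_image_format(bytes(16)))
```

If the buffer comes back as "unknown" (for example raw pixel data), or as a format the installed imageio has no plugin for, that would be consistent with the error above.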

Reproduce the benchmark

I cannot reproduce the benchmark using your code and pretrained model.

dataset	method	Abs_rel	Abs_diff	Sq_rel	rms	Log_rms	a1	a2	a3
mvs	Dpsnet-paper	0.0722	0.2095	0.0798	0.4928	0.1527	0.8930	0.9502	0.9760
mvs	Dpsnet-code	0.0809	0.1901	0.0660	0.3996	0.1531	0.8952	0.9580	0.9851
sun3d	Dpsnet-paper	0.1470	0.3234	0.1071	0.4269	0.1906	0.7892	0.9317	0.9672
sun3d	Dpsnet-code	0.1576	0.3404	0.1268	0.4538	0.1998	0.7917	0.9341	0.9790

This table is what I got by running test.py on sun3d and mvs. Can you help me with this?
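
For readers comparing the columns, the sketch below computes the metrics named in the table header using their conventional definitions from the depth estimation literature. It is an assumption that the repository's evaluation follows these definitions exactly; its masking, depth range, and scaling may differ, which can itself explain gaps between runs. The function name and the sample values are hypothetical.

```python
import math


def depth_metrics(gt, pred):
    """Conventional depth-error metrics over paired positive depth values."""
    n = len(gt)
    diffs = [g - p for g, p in zip(gt, pred)]
    ratios = [max(g / p, p / g) for g, p in zip(gt, pred)]
    return {
        "abs_rel": sum(abs(d) / g for d, g in zip(diffs, gt)) / n,
        "abs_diff": sum(abs(d) for d in diffs) / n,
        "sq_rel": sum(d * d / g for d, g in zip(diffs, gt)) / n,
        "rms": math.sqrt(sum(d * d for d in diffs) / n),
        "log_rms": math.sqrt(
            sum((math.log(g) - math.log(p)) ** 2 for g, p in zip(gt, pred)) / n
        ),
        # Fraction of pixels whose ratio to ground truth is below 1.25^k.
        "a1": sum(r < 1.25 for r in ratios) / n,
        "a2": sum(r < 1.25 ** 2 for r in ratios) / n,
        "a3": sum(r < 1.25 ** 3 for r in ratios) / n,
    }


metrics = depth_metrics(gt=[1.0, 2.0, 4.0], pred=[1.0, 2.0, 2.0])
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```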

Thanks sincerely for your extraordinary work and code.

Data Download

No such file or directory error when downloading the dataset with the download_traindata.sh script

Questions regarding the point cloud reconstruction evaluation

Hi,

Thanks for making the code public.

I've looked at the paper and found that the method has impressive results on ETH3D depth estimation, but I was wondering if it also performs well on the actual benchmark that ETH3D proposes (i.e., point cloud reconstruction).

Just to check, do you have some preliminary results on how it performs on the full point-cloud evaluation on ETH3D or Tanks and Temples?

Thanks!

Reproduce results on ETH3D

Hi, Thank you for sharing the code! It's great!

I'm wondering if I could reproduce the results on ETH3D in Table 2 in your paper.
I tried the following code as mentioned in README:
python test_ETH3D.py ./dataset/ETH3D_results/ --sequence-length 3 --output-print --pretrained-dps ./pretrained/dpsnet.pth.tar
But I got different results from the ones in the paper.

Depth Results :

abs_rel	abs_diff	sq_rel	rms	log_rms	a1	a2	a3
0.0952	0.5250	0.2231	1.0379	0.1743	0.8786	0.9451	0.9674

Do you have any idea what I'm missing?

Code for converting depth maps to ply

I would like to know which code was used to convert the depth maps to PLY files for submission to ETH3D.
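
As a generic illustration of the conversion being asked about, the sketch below back-projects a depth map through pinhole intrinsics and writes the points as ASCII PLY text. This is not the authors' code: it keeps points in the camera frame, ignores poses, and does no multi-view fusion or filtering, all of which a benchmark submission would likely need. The function names and the toy depth map are hypothetical.

```python
def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project a depth map (list of rows) to 3D points in the camera frame."""
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z > 0:  # treat non-positive depth as invalid
                points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points


def points_to_ply(points):
    """Serialize 3D points as an ASCII PLY string."""
    header = [
        "ply",
        "format ascii 1.0",
        f"element vertex {len(points)}",
        "property float x",
        "property float y",
        "property float z",
        "end_header",
    ]
    body = [f"{x:.6f} {y:.6f} {z:.6f}" for x, y, z in points]
    return "\n".join(header + body) + "\n"


# Toy 2x2 depth map with one invalid pixel and made-up intrinsics.
depth = [[1.0, 2.0], [0.0, 4.0]]
points = depth_to_points(depth, fx=1.0, fy=1.0, cx=0.5, cy=0.5)
ply_text = points_to_ply(points)
print(ply_text)
```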

MVS Dataset not used for training?

Hi! I found that your download_train.py downloads the mvs_train dataset, but according to your datapreparation_train.py, you didn't use the MVS dataset for training. I can see mvs_test in your test data preparation, so I'm assuming that you ONLY use the MVS dataset for testing. Is that correct?

I tried to retrain your network using your default configuration without putting the MVS dataset into training, but the resulting accuracy on mvs_test is visibly lower than with your pretrained model (and the paper). So I'm wondering whether you included the MVS dataset in training to get that pretrained model?

What's the use of order.txt file?

Hi, I am trying to use DPSNet to generate depth maps for my dataset. I have used COLMAP to get the SfM results, but I found that order.txt is needed if I use test_ETH3D.py. What is the meaning of order.txt? How can I generate it?
Thanks

Question about Table 1 and Figure 4

Hi @sunghoonim ,

Thank you for sharing codes.

I'm going to compare DPSNet with other methods on DeMoN's testing dataset, but I wonder how you produced the results for COLMAP, DeMoN, and DeepMVS.
I simply ran each piece of software and quantitatively evaluated the depth maps, but the results are far from your table.
When I visualized the depth maps, I found that there was not much difference.

I want to know the following things:

When using COLMAP, do you set camera intrinsics with provided ones or just use automatic reconstruction?
How do you convert DeMoN's results to absolute metrics? Do you have any code available online?
Could you share results on DeMoN's testing dataset?
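
On the second question, one widely used way to compare scale-ambiguous predictions against metric ground truth is median scaling. Whether the paper's numbers were produced this way is not stated here, so the sketch below is only an illustration of that general technique; median_scale and the sample values are hypothetical.

```python
from statistics import median


def median_scale(pred, gt):
    """Rescale predictions so their median matches the ground-truth median.

    Only pixels with positive ground truth and positive prediction
    contribute to the scale estimate.
    """
    valid = [(p, g) for p, g in zip(pred, gt) if p > 0 and g > 0]
    scale = median(g for _, g in valid) / median(p for p, _ in valid)
    return [p * scale for p in pred], scale


scaled, scale = median_scale(pred=[0.5, 1.0, 2.0], gt=[1.0, 2.0, 4.0])
print(scale, scaled)
```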

Thanks.

About the efficiency

Hi,
The depth maps presented in the paper are really amazing. I wonder how much time it takes to produce a single depth map?

Thanks,
Miller

What's the version of libraries and programs?

Could you please provide the version of libraries and programs? For example, if I don't use tensorboardX==1.2, the code will report an error. And I don't know what version of scipy should be used. Thanks very much.
UserWarning: From scipy 0.13.0, the output shape of zoom() is calculated with round() instead of int() - for these inputs the size of the returned array has changed.

  UserWarning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details.

"See the documentation of nn.Upsample for details.".format(mode))

Incompatible problem with scipy

Hi, I want to ask what version of scipy module you used.

When I used scipy 0.19.1, [custom_transforms.py line 64] gave a warning message.

"From scipy 0.13.0, the output shape of zoom() is calculated "
"with round() instead of int() - for these inputs the size of "
"the returned array has changed."

This message occurs because execution enters [ if output_shape != output_shape_old: ].

As a result of this warning, I think, [submodule.py line 120] gives a CUDA runtime error because inputs of the wrong size are passed to the convolution network (self.firstconv).

Below are the [image size after zoom / image size before zoom] pairs:
(311, 484) (310, 483)
(370, 360) (370, 359)
(252, 394) (251, 393)
(352, 568) (352, 567)
......
(411, 614) (410, 613)
(319, 431) (319, 430)
(346, 528) (346, 527)
(256, 532) (255, 532)
(306, 578) (305, 577)
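
The warning text says the output shape is computed with round() instead of int(). The sketch below reproduces just that arithmetic to show how the two rules can disagree by one pixel. zoom_output_shape is a hypothetical helper, not scipy's API, and the shape and zoom factor are made-up values.

```python
def zoom_output_shape(shape, zoom, use_round=True):
    """Output shape of a zoom, computed with round() or with int() truncation."""
    to_int = round if use_round else int
    return tuple(int(to_int(size * zoom)) for size in shape)


shape, zoom = (300, 467), 1.0365
new_shape = zoom_output_shape(shape, zoom, use_round=True)
old_shape = zoom_output_shape(shape, zoom, use_round=False)
print(new_shape, old_shape)
```

Here 300 * 1.0365 = 310.95 rounds to 311 but truncates to 310, while the second dimension agrees under both rules, which matches the kind of off-by-one pairs listed above.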

I would appreciate it so much if you could help me with this problem.

ETH3d Dataset

Hi!
Could you provide the link to the preprocessed ETH3D dataset for testing?

Question about ETH3D Dataset Testing

Hi, I have some problems with your test_ETH3D code. First, I don't know why the images are resized to 832x544. I see that in the test code you add some padding to the gt_depth map. Second, when I use your code for DeepMVS and DeMoN, which are also included in test_ETH3D.py, I found that most results are inf (I just made all the predicted depth maps and gt maps 810x540). In my experiments, I found that some values of DeepMVS are inf after masking (output_depth_[mask]/scale, line 92), so I changed them to the max depth, but the result is worse than in your paper. Is there anything wrong in my experiment?
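
An alternative to replacing inf values with the max depth is to exclude such pixels before computing errors. The sketch below shows that filtering step under stated assumptions: valid_pairs is a hypothetical helper, and the depth bounds are placeholders. Dropping pixels changes which pixels are scored, so numbers obtained this way are not necessarily comparable with the paper's.

```python
import math


def valid_pairs(gt, pred, min_depth=0.0, max_depth=float("inf")):
    """Keep (gt, pred) pairs with in-range ground truth and finite, positive prediction."""
    return [
        (g, p)
        for g, p in zip(gt, pred)
        if math.isfinite(g) and math.isfinite(p)
        and min_depth < g < max_depth and p > 0
    ]


gt = [1.0, 0.0, 2.0, 3.0]
pred = [1.1, 1.0, float("inf"), 2.9]
pairs = valid_pairs(gt, pred)
print(pairs)
```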

cuda error: out of memory

I ran into trouble when trying to train on the suggested RGBD dataset using a GTX1080Ti.
RuntimeError: cuda runtime error (2) : out of memory at /opt/conda/conda-bld/pytorch_1524586445097/work/aten/src/THC/generic/THCStorage.cu:58.
I have tried python 3.6.4 with torch 1.0.1, 0.4.0, and 0.4.1; none of them works. All of them are OK for testing. The weird thing is that both 0.4.0 and 0.4.1 take about 0.5s per image, while version 1.0.1 takes 4s per image. Anyway, should I modify some command-line parameters to make training work on a GTX1080Ti?

About the number of images

Hi,

Thank you for sharing the code.

I would like to ask how you produced Figure 7 in your paper.
I think that the test sets in MVS, SUN3D, RGBD, and Scenes11 contain only two views, but your evaluation was done using more than two views.
Which dataset did you use?

Testing on other datasets

Hi, I'm trying to test DPSNet on my own datasets. I've got several images together with their corresponding camera intrinsics and poses, but without depth maps. When I tested them, an error occurred saying that the .npy file is missing. So my questions are:
(1) Are the .npy files the ground truth depth maps for the input images?
(2) Are the .npy files necessary for testing, or how can I run the testing process without them?

Thanks a lot!

about using this network

Hi sunghoonim, thank you very much for sharing this network.
I only have a sequence of stereo image pairs, captured by a pair of left/right cameras.

I only want to estimate the depth of a stereo image pair (I have no pose.txt info for the image sequence). Can I use this network, and how do I train and test with my dataset? Thanks a lot.

Reason for multiplying 0.3 with the min depth and number of depth labels

Hi, what's the reason for using args.nlabel*args.mindepth*0.3 instead of just args.nlabel*args.mindepth? I tried both and noticed the difference, but I am not sure why the depth map looks better when it is multiplied by 0.3.

Testing code question

Dear author:

Thank you for providing such great work. I noticed that you set the maximum depth to 64 in the training code; however, in the test code, you set it to 10.
Is there any reason for your settings?
