sunghoonim / dpsnet
[ICLR19] DPSNet: End-to-end Deep Plane Sweep Stereo
License: MIT License
I do not understand the significance of multiplying and dividing the camera intrinsic parameters by the constant 4 in PSNet.py to create the variables intrinsics4 and intrinsics_inv4. Is this related to resizing the images from the original calibration resolution, or to something else?
Also, please shed some light on how this constant should be adjusted for different configurations.
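One plausible reading, not confirmed by the authors in this thread: if the cost volume is built on feature maps at 1/4 of the input resolution, the pinhole intrinsics have to be rescaled to that resolution. The sketch below assumes a standard 3x3 intrinsic matrix; `scale_intrinsics` and the calibration values are hypothetical.

```python
import numpy as np

def scale_intrinsics(K, K_inv, factor):
    """Rescale intrinsics for an image downsampled by `factor` (hypothetical helper)."""
    K_s = K.copy()
    K_inv_s = K_inv.copy()
    K_s[:2, :] /= factor        # fx, fy, cx, cy shrink with the image
    K_inv_s[:2, :2] *= factor   # the inverse scales the opposite way
    return K_s, K_inv_s

# Made-up calibration for a 640x480 image.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
K4, K_inv4 = scale_intrinsics(K, np.linalg.inv(K), 4.0)

# The rescaled inverse should still be the true inverse of the rescaled matrix.
print(np.allclose(K_inv4, np.linalg.inv(K4)))
```

Under this reading, the constant would follow the downsampling factor of the feature extractor rather than the resolution at which the camera was calibrated.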
I found that the learning rate is never updated during training; it seems to stay at 2e-04. The code that updates the learning rate is lr = args.lr * (0.1 ** (epoch // 10)), but epoch only ranges from 0 to 10. Is the learning rate meant to be set like this? I look forward to your reply. Thanks!
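The arithmetic behind this observation can be checked directly. The sketch below evaluates the quoted formula, assuming the training loop iterates over `range(10)` (epochs 0 to 9) and a base rate of 2e-4; `step_lr` is a hypothetical name.

```python
def step_lr(base_lr, epoch, step=10, gamma=0.1):
    """Step decay in the form quoted above: base_lr * gamma ** (epoch // step)."""
    return base_lr * (gamma ** (epoch // step))

base_lr = 2e-4
rates = [step_lr(base_lr, epoch) for epoch in range(10)]

print(set(rates))             # a single value: the rate never changes
print(step_lr(base_lr, 10))   # the first decay would only happen at epoch 10
```

Since `epoch // 10` is 0 for every epoch below 10, the decay factor stays at 1 for the whole run.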
Hi, I know this is a trivial question, but I would like to clarify the parameters passed from train_loader to dpsnet, as I noticed that the variable names differ. Why aren't the naming conventions consistent?
Loading in the data:
for i, (tgt_img, ref_imgs, ref_poses, intrinsics, intrinsics_inv, tgt_depth) in enumerate(train_loader):
Doing a forward pass:
depths = dpsnet(tgt_img_var, ref_imgs_var, pose, intrinsics_var, intrinsics_inv_var)
In PSNet Class:
def forward(self, ref, targets, pose, intrinsics, intrinsics_inv):
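Because the call passes its arguments by position, the names on the two sides are independent of each other. A minimal sketch with a stand-in function (not the real model) shows which caller variable ends up in which parameter:

```python
def forward(ref, targets, pose, intrinsics, intrinsics_inv):
    """Stand-in with the same parameter names as the signature quoted above."""
    return {"ref": ref, "targets": targets}

# Names as used at the call site in the training loop.
tgt_img_var = "target image"
ref_imgs_var = ["reference image 1", "reference image 2"]

bound = forward(tgt_img_var, ref_imgs_var, None, None, None)
print(bound["ref"])       # receives the caller's target image
print(bound["targets"])   # receives the caller's reference images
```

So inside the class, `ref` holds what the training script calls the target image, and `targets` holds what the script calls the reference images.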
Hello, I want to test your network on my own dataset. The results are not as good as on your test set, so I decided to collect my own data for retraining. How many samples do I need to prepare? Are 2000 enough? I look forward to your reply. Thanks.
When I test on a 12GB TITAN X GPU, I get this error:
Traceback (most recent call last):
File "test.py", line 127, in <module>
main()
File "test.py", line 90, in main
output_depth = dpsnet(tgt_img_var, ref_imgs_var, pose, intrinsics_var, intrinsics_inv_var)
File "/home/weixk15/anaconda3/envs/DU/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "/home/weixk15/work/DPSNet/models/PSNet.py", line 106, in forward
cost0 = self.dres1(cost0) + cost0
File "/home/weixk15/anaconda3/envs/DU/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "/home/weixk15/anaconda3/envs/DU/lib/python3.6/site-packages/torch/nn/modules/container.py", line 91, in forward
input = module(input)
File "/home/weixk15/anaconda3/envs/DU/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "/home/weixk15/anaconda3/envs/DU/lib/python3.6/site-packages/torch/nn/modules/container.py", line 91, in forward
input = module(input)
File "/home/weixk15/anaconda3/envs/DU/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "/home/weixk15/anaconda3/envs/DU/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 421, in forward
self.padding, self.dilation, self.groups)
RuntimeError: CUDA error: out of memory
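A rough way to see why the cost-volume stage can exhaust memory is to estimate the size of a single 5-D float32 tensor. The sketch below assumes a volume shaped (batch, channels, nlabel, H/4, W/4); the shape and every number in it are hypothetical, not read from the repository.

```python
def volume_bytes(batch, channels, nlabel, height, width, downscale=4, bytes_per_value=4):
    """Size of a float32 tensor shaped (batch, channels, nlabel, H/downscale, W/downscale)."""
    h, w = height // downscale, width // downscale
    return batch * channels * nlabel * h * w * bytes_per_value

# Hypothetical settings, not taken from the repository.
size = volume_bytes(batch=1, channels=64, nlabel=64, height=480, width=640)
print(size / 2**20, "MiB per copy of the volume")
```

The estimate grows linearly with the number of depth labels and with the pixel count, so larger test images or a larger nlabel raise it quickly.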
I am planning to use views from multiple cameras in the DPSNet framework and have made the necessary changes. However, I am not sure whether the inverse intrinsics passed to the model's forward pass should be those of the 'ref' camera or those of the 'tgt' camera. The confusion is made worse by the ref and tgt names being swapped between the training script and the model.
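For reference, the textbook plane-sweep relation back-projects a pixel of the image whose depth is being estimated with that camera's inverse intrinsics, then projects the 3D point into the other view with the other camera's intrinsics. The sketch below shows only this generic geometry; it is not taken from the repository, and `warp_pixel` and all values are hypothetical.

```python
import numpy as np

def warp_pixel(uv, depth, K_src_inv, K_dst, R, t):
    """Map pixel `uv` at `depth` in the source view to the destination view."""
    p = np.array([uv[0], uv[1], 1.0])
    X_src = depth * (K_src_inv @ p)   # back-project with the source camera's inverse intrinsics
    X_dst = R @ X_src + t             # rigid transform from source to destination frame
    q = K_dst @ X_dst                 # project with the destination camera's intrinsics
    return q[:2] / q[2]

K_a = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
K_b = np.array([[400.0, 0.0, 300.0], [0.0, 400.0, 200.0], [0.0, 0.0, 1.0]])

# Identity pose: the principal point of camera A lands on the principal point of camera B.
warped = warp_pixel((320.0, 240.0), 2.0, np.linalg.inv(K_a), K_b, np.eye(3), np.zeros(3))
print(warped)
```

Which of the two images plays the source role in DPSNet's forward pass has to be checked against the code itself, given the swapped names.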
Hi!
Thank you for sharing the codes and your amazing paper.
I am just a beginner in this area, so I am going through your code for some insights. However, I have run into some problems, mostly with the architecture.
In your paper, you mention a 7x7 filter for the first layer, but I did not manage to find such a layer. I wonder if I have misunderstood the code.
Also, you mention four fixed-size average pooling blocks with sizes 16, 8, 4, 2. But I found the average pooling kernel sizes in the feature_extraction class in submodule.py to be 32, 16, 8, 4. I wonder if this part is the spatial pyramid pooling you mention in your paper.
Thank you in advance for answering my questions!
Hi, thanks for the code. I am confused about whether you use only one picture as the target image in training. I printed the data loader index in sequences_folders.py, but got only one index number.
Line 59 in preparedata_test.py has imageio.imread(img.tobytes()). This gives an error saying that it could not find a format to read the specified file in mode 'i'.
Hi. I get an error while running test.py with torch version 1.0. I can't run test.py because of scipy.misc.imsave. The input of imsave should be HWC, but in the present code the value is passed as CHW, which causes an error in the imsave function.
In my case, I changed the imsave code from
disp = (255*tensor2array(torch.from_numpy(output_disp_n), max_value=args.nlabel, colormap='bone'))
imsave(output_dir/'{:04d}_disp{}'.format(i,'.png'), disp)
to
disp = (255*tensor2array(torch.from_numpy(output_disp_n), max_value=args.nlabel, colormap='bone')).astype(np.uint8).transpose((1, 2, 0))
imsave(output_dir/'{:04d}_disp{}'.format(i,'.png'), disp)
thank you
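The layout change in the fix above can be illustrated on a dummy array. This sketch assumes the colour-mapped array is channel-first with values in [0, 1]; the array itself is made up.

```python
import numpy as np

# Hypothetical colour-mapped disparity: 3 channels, height 4, width 6, values in [0, 1].
chw = np.random.rand(3, 4, 6)

# Scale to 8-bit and move the channel axis last, as in the fix above.
hwc = (255 * chw).astype(np.uint8).transpose((1, 2, 0))
print(chw.shape, "->", hwc.shape, hwc.dtype)
```

Image writers generally expect the (height, width, channels) layout, which is what the transpose produces.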
Thank you for sharing the code! I have two questions about your evaluation of COLMAP. 1) Did you run COLMAP using only the few images in the test set (mostly 2 stereo images per scene), or did you run COLMAP on the full sequence? 2) Did you provide COLMAP with the known poses, or did you let it recover the poses as well?
I found that an error occurred when running python .\preparedata_train.py after downloading the dataset.
The error information is as follows:
Traceback (most recent call last):
File ".\preparedata_train.py", line 109, in <module>
preparedata()
File ".\preparedata_train.py", line 87, in preparedata
dump_example(scene)
File ".\preparedata_train.py", line 60, in dump_example
img = imageio.imread(img.tobytes())
File "D:\ProgramData\Anaconda3\lib\site-packages\imageio\core\functions.py", line 265, in imread
reader = read(uri, format, "i", **kwargs)
File "D:\ProgramData\Anaconda3\lib\site-packages\imageio\core\functions.py", line 182, in get_reader
"Could not find a format to read the specified file in %s mode" % modename
ValueError: Could not find a format to read the specified file in single-image mode
The error occurred when calling the imageio.imread API. I checked the input variable, which is a flattened image.
Is there any solution to this problem? Thanks a lot.
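One way to narrow this down, assuming the array holds an encoded image stream rather than raw pixels (not verified here), is to look at the first bytes of `img.tobytes()`. The helper below is a hypothetical diagnostic that checks a few well-known file signatures.

```python
def sniff_image_format(buf):
    """Guess the container format from the leading bytes (hypothetical helper)."""
    if buf.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if buf.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if buf[:4] == b"RIFF" and buf[8:12] == b"WEBP":
        return "webp"
    return "unknown"

# Synthetic headers for illustration only.
print(sniff_image_format(b"RIFF\x00\x00\x00\x00WEBPVP8 "))
print(sniff_image_format(b"\x00" * 16))
```

If the buffer starts with a known signature, the installed imageio may simply lack a reader for that format; if it does not, the bytes may be raw pixels that need reshaping instead of decoding.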
I cannot reproduce the benchmark using your code and pretrained model.
dataset | method | Abs_rel | Abs_diff | Sq_rel | rms | Log_rms | a1 | a2 | a3 |
---|---|---|---|---|---|---|---|---|---|
mvs | Dpsnet-paper | 0.0722 | 0.2095 | 0.0798 | 0.4928 | 0.1527 | 0.8930 | 0.9502 | 0.9760 |
mvs | Dpsnet-code | 0.0809 | 0.1901 | 0.0660 | 0.3996 | 0.1531 | 0.8952 | 0.9580 | 0.9851 |
sun3d | Dpsnet-paper | 0.1470 | 0.3234 | 0.1071 | 0.4269 | 0.1906 | 0.7892 | 0.9317 | 0.9672 |
sun3d | Dpsnet-code | 0.1576 | 0.3404 | 0.1268 | 0.4538 | 0.1998 | 0.7917 | 0.9341 | 0.9790 |
This table is what I got by running test.py on sun3d and mvs. Can you help me with this?
Thanks sincerely for your extraordinary work and code.
I get a "No such file or directory" error when downloading the dataset with the download_traindata.sh script.
Hi,
Thanks for making the code public.
I've looked at the paper and found that the method has impressive results on ETH3D depth estimation, but I was wondering whether it also performs well in the actual benchmark that ETH3D proposes (i.e., point cloud reconstruction).
Just to check, do you have any preliminary results on how it performs in the full point-cloud evaluation on ETH3D or Tanks and Temples?
Thanks!
Hi, Thank you for sharing the code! It's great!
I'm wondering if I could reproduce the results on ETH3D in Table 2 in your paper.
I tried the following code as mentioned in README:
python test_ETH3D.py ./dataset/ETH3D_results/ --sequence-length 3 --output-print --pretrained-dps ./pretrained/dpsnet.pth.tar
But I got different results from the ones in the paper.
Depth Results :
abs_rel | abs_diff | sq_rel | rms | log_rms | a1 | a2 | a3 |
---|---|---|---|---|---|---|---|
0.0952 | 0.5250 | 0.2231 | 1.0379 | 0.1743 | 0.8786 | 0.9451 | 0.9674 |
Do you have any ideas what I'm missing?
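When comparing numbers like these, it helps to pin down the metric definitions. The sketch below uses the commonly used forms of these depth metrics; whether the repository's test scripts mask, clip, or average in exactly this way is an assumption, and `depth_metrics` is a hypothetical name.

```python
import numpy as np

def depth_metrics(gt, pred):
    """Common depth-error metrics over valid (positive) ground-truth pixels."""
    mask = gt > 0
    gt, pred = gt[mask], pred[mask]
    diff = gt - pred
    ratio = np.maximum(gt / pred, pred / gt)
    return {
        "abs_rel": np.mean(np.abs(diff) / gt),
        "abs_diff": np.mean(np.abs(diff)),
        "sq_rel": np.mean(diff ** 2 / gt),
        "rms": np.sqrt(np.mean(diff ** 2)),
        "log_rms": np.sqrt(np.mean((np.log(gt) - np.log(pred)) ** 2)),
        "a1": np.mean(ratio < 1.25),
        "a2": np.mean(ratio < 1.25 ** 2),
        "a3": np.mean(ratio < 1.25 ** 3),
    }

gt = np.array([[1.0, 2.0], [4.0, 0.0]])     # 0 marks a pixel without ground truth
pred = np.array([[1.1, 1.8], [4.0, 3.0]])
print(depth_metrics(gt, pred))
```

Differences in the valid-pixel mask, in depth clipping, or in averaging per image versus per pixel can each shift the reported values.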
I would like to know which code was used to convert depth maps to PLY files for submission to ETH3D.
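The code the authors used is not given in this thread. As a generic illustration only, a single depth map can be back-projected with the inverse intrinsics and written as an ASCII PLY point cloud; both helper functions and all values below are hypothetical, and a real submission would likely also need fusion across views and a common world frame.

```python
import io
import numpy as np

def depth_to_points(depth, K_inv):
    """Back-project every pixel with positive depth to a 3D point in the camera frame."""
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]
    pix = np.stack([u.ravel(), v.ravel(), np.ones(h * w)])   # 3 x N homogeneous pixels
    pts = (K_inv @ pix) * depth.ravel()                      # scale each ray by its depth
    return pts.T[depth.ravel() > 0]

def write_ascii_ply(points, stream):
    """Write an N x 3 array as a minimal ASCII PLY point cloud."""
    stream.write("ply\nformat ascii 1.0\n")
    stream.write("element vertex {}\n".format(len(points)))
    stream.write("property float x\nproperty float y\nproperty float z\nend_header\n")
    for x, y, z in points:
        stream.write("{:.6f} {:.6f} {:.6f}\n".format(x, y, z))

K = np.array([[500.0, 0.0, 1.0], [0.0, 500.0, 1.0], [0.0, 0.0, 1.0]])
depth = np.full((2, 3), 2.0)   # made-up 2x3 depth map
points = depth_to_points(depth, np.linalg.inv(K))
buf = io.StringIO()
write_ascii_ply(points, buf)
print(buf.getvalue())
```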
Hi! I found that your download_train.py downloads the mvs_train dataset, but according to your datapreparation_train.py, you didn't use the MVS dataset for training. I can see mvs_test in your test data preparation, so I assume that you use the MVS dataset ONLY for testing. Is that correct?
I tried to retrain your network with your default configuration, without the MVS dataset in training, but the resulting accuracy on mvs_test is visibly lower than with your pretrained model (and the paper). So I'm wondering whether you included the MVS dataset in training to get that pretrained model.
Hi, I am trying to use DPSNet to generate depth maps for my dataset. I used COLMAP to get the SfM results, but I found that order.txt is needed to use test_ETH3D.py. What is the meaning of order.txt? How can I generate it?
Thanks
Hi @sunghoonim ,
Thank you for sharing the code.
I'm going to compare DPSNet with other methods on DeMoN's testing dataset, but I wonder how you produced the results for COLMAP, DeMoN, and DeepMVS.
I simply ran each program and qualitatively evaluated the depth maps, but the results are far from your table.
When I visualized the depth maps, I found that there was not much difference.
I want to know the following things:
Thanks.
Hi,
The depth maps presented in the paper are really amazing. I wonder how long it takes to produce a single depth map?
Thanks,
Miller
Could you please provide the versions of the libraries and programs you used? For example, if I don't use tensorboardX==1.2, the code reports an error. I also don't know which version of scipy should be used. Thanks very much.
UserWarning: From scipy 0.13.0, the output shape of zoom() is calculated with round() instead of int() - for these inputs the size of the returned array has changed.
UserWarning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details.
"See the documentation of nn.Upsample for details.".format(mode))
Hi, I want to ask which version of the scipy module you used.
With scipy 0.19.1, [custom_transforms.py line 64] produces this warning message:
"From scipy 0.13.0, the output shape of zoom() is calculated "
"with round() instead of int() - for these inputs the size of "
"the returned array has changed."
This message appears because execution enters the branch [ if output_shape != output_shape_old: ].
As a result, I think [submodule.py line 120] raises a CUDA runtime error, because inputs of the wrong size are given to the convolution network (self.firstconv).
Below are the image sizes [after zoom / before zoom]:
(311, 484) (310, 483)
(370, 360) (370, 359)
(252, 394) (251, 393)
(352, 568) (352, 567)
......
(411, 614) (410, 613)
(319, 431) (319, 430)
(346, 528) (346, 527)
(256, 532) (255, 532)
(306, 578) (305, 577)
I would appreciate it very much if you could help me with this problem.
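The one-pixel differences in the list above are consistent with what the warning describes: rounding the scaled size instead of truncating it. A minimal sketch, with a hypothetical input size and zoom factor and a hypothetical helper name:

```python
def zoomed_shape(shape, zoom, use_round):
    """Output shape of a resize by `zoom`, using round() or int() as the warning describes."""
    if use_round:
        return tuple(int(round(d * zoom)) for d in shape)
    return tuple(int(d * zoom) for d in shape)

shape, zoom = (480, 640), 0.647   # hypothetical input size and scale factor
print(zoomed_shape(shape, zoom, use_round=True))
print(zoomed_shape(shape, zoom, use_round=False))
```

Here the height differs by one pixel between the two rules while the width does not, which is the kind of mismatch that can break code expecting a fixed size.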
Hi!
Could you provide a link to the preprocessed ETH3D dataset for testing?
Hi, I have some problems with your test_ETH3D code. First, I don't know why the images are resized to 832x544; I see that the test code adds some padding to the gt_depth map. Second, when I use your code for DeepMVS and DeMoN, which are also included in test_ETH3D.py, I find that most results are inf (I made all predicted depth maps and ground-truth maps 810x540). In my experiments, some DeepMVS values are inf after masking (output_depth_[mask]/scale, line 92), so I changed them to the maximum depth, but the result is worse than in your paper. Is there anything wrong in my experiment?
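One possible explanation for the 832x544 size, assuming the network needs input sides divisible by 32 (an assumption, not confirmed in this thread), is that 810x540 is padded up to the next multiple. The helper name below is hypothetical.

```python
import math

def padded_size(width, height, multiple=32):
    """Smallest size >= (width, height) whose sides are multiples of `multiple`."""
    return (math.ceil(width / multiple) * multiple,
            math.ceil(height / multiple) * multiple)

print(padded_size(810, 540))
```

If this is the reason, predictions from other methods would need the same padding or a matching crop before the masked comparison.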
I run into trouble when trying to train on the suggested RGBD dataset with a GTX 1080 Ti.
RuntimeError: cuda runtime error (2) : out of memory at /opt/conda/conda-bld/pytorch_1524586445097/work/aten/src/THC/generic/THCStorage.cu:58.
I have tried Python 3.6.4 with torch 1.0.1, 0.4.0, and 0.4.1; none of them works for training, although all of them are fine for testing. The weird thing is that both 0.4.0 and 0.4.1 take about 0.5 s per image, while 1.0.1 takes 4 s per image. Should I modify some command-line parameters to make training work on a GTX 1080 Ti?
Hi,
Thank you for sharing the code.
I would like to ask how you produced Figure 7 in your paper.
I think that the test sets in MVS, SUN3D, RGBD, and Scenes11 contain only two views, but your evaluation was done using more than two views.
Which dataset did you use?
Hi, I'm trying to test DPSNet on my own dataset. I have several images together with their corresponding camera intrinsics and poses, but no depth maps. When I tested them, an error occurred saying that the .npy file is missing. So my questions are:
(1) Are the .npy files the ground-truth depth maps for the input images?
(2) Are the .npy files necessary for testing? If not, how can I run the testing process without them?
Thanks a lot!
Hi sunghoonim, thank you very much for sharing this network.
I only have a sequence of stereo image pairs, captured by a pair of left/right cameras.
I only want to estimate the depth of a stereo image pair (I have no pose.txt information for the image sequence). Can I use this network, and how do I train and test with my dataset? Thanks a lot.
Hi, what's the reason for using args.nlabel*args.mindepth*0.3 instead of just args.nlabel*args.mindepth? I tried both and noticed the difference, but I am not sure why the depth map looks better when it is multiplied by 0.3.