
haofengac / monodepth-fpn-pytorch

323 stars · 9 watchers · 70 forks · 4.2 MB

Single Image Depth Estimation with Feature Pyramid Network

License: MIT License

Languages: Jupyter Notebook 97.54%, Python 2.46%
Topics: depth-map, depth-estimation, kitti-dataset, nyu-depth, feature-pyramid-network, depth-prediction, monocular-depth, pytorch

monodepth-fpn-pytorch's People

Contributors

bhpfelix, haofengac


monodepth-fpn-pytorch's Issues

Angle-based loss versus inverse cosine loss

Thanks for your great work!

Have you experimented with an arccos-based normal loss? Does the performance vary when using arccos instead of (1 - normalized inner product)?

The code I am referring to is:

prod = ( grad_fake[:,:,None,:] @ grad_real[:,:,:,None] ).squeeze(-1).squeeze(-1)
fake_norm = torch.sqrt( torch.sum( grad_fake**2, dim=-1 ) )
real_norm = torch.sqrt( torch.sum( grad_real**2, dim=-1 ) )
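
For reference, a minimal sketch of the two variants being compared, assuming grad_fake and grad_real are (B, N, 3) tensors of surface gradients as in the snippet above; the clamp is there because the gradient of acos blows up at ±1:

import torch

def cosine_normal_loss(grad_fake, grad_real, eps=1e-8):
    # 1 - normalized inner product, as in the snippet above
    prod = ( grad_fake[:,:,None,:] @ grad_real[:,:,:,None] ).squeeze(-1).squeeze(-1)
    fake_norm = torch.sqrt( torch.sum( grad_fake**2, dim=-1 ) )
    real_norm = torch.sqrt( torch.sum( grad_real**2, dim=-1 ) )
    return 1 - torch.mean( prod / (fake_norm * real_norm + eps) )

def arccos_normal_loss(grad_fake, grad_real, eps=1e-8):
    # angle-based variant: penalize the angle itself instead of 1 - cos
    prod = ( grad_fake[:,:,None,:] @ grad_real[:,:,:,None] ).squeeze(-1).squeeze(-1)
    fake_norm = torch.sqrt( torch.sum( grad_fake**2, dim=-1 ) )
    real_norm = torch.sqrt( torch.sum( grad_real**2, dim=-1 ) )
    cos = ( prod / (fake_norm * real_norm + eps) ).clamp(-1 + 1e-7, 1 - 1e-7)
    return torch.mean( torch.acos(cos) )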

Problem in the training phase

I am training your code on the Stanford 2D-3D-S dataset, but the predicted depth map lies between 0 and 1 while the real depth can reach 128. Do you know why?
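
Not from the repo, but a common explanation for this symptom: if the network head squashes predictions into [0, 1] (e.g. through a sigmoid), or the labels were normalized at load time, the outputs have to be rescaled by the dataset's depth range before being compared to metric ground truth. A hypothetical sketch, where model, image, and MAX_DEPTH are placeholders:

MAX_DEPTH = 128.0                 # assumed maximum depth of the dataset, in label units

pred = model(image)               # lies in [0, 1] if the head ends in a sigmoid
pred_metric = pred * MAX_DEPTH    # rescale to metric depth before evaluating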

constants.py

Hi, I can't find constants.py in the repository, although it appears in the imports. What can I do about that?

Lack of clarity over the dataset

I can see from the KITTI dataset code that a kitti_training_images.txt file is required, but I can't find any such file in the KITTI depth or raw dataset downloads. Can you make it clearer how we need to procure the KITTI dataset in order to use the code directly?

Pretrained model

Hi, is a pretrained model available? It would greatly help with development times. Thank you in advance!

Code question

# decode FPN features p5..p2 into depth maps at four scales
d5, d4, d3, d2 = self.up1(self.agg1(p5)), self.up2(self.agg2(p4)), self.up3(self.agg3(p3)), self.agg4(p2)
_, _, H, W = d2.size()
# upsample every scale to d2's (H, W) and concatenate along the channel dimension
vol = torch.cat( [ F.upsample(d, size=(H,W), mode='bilinear') for d in [d5,d4,d3,d2] ], dim=1 )
Could you explain what these lines mean and what they do?

The code is incomplete

Hi @xanderchf,

Thanks for your great work. It would be a great help if you could release the full code.

Fill in the missing values of the sparse depth maps of the KITTI dataset

Hi,

Thanks for posting your code!

I have a question about the depth maps in the KITTI dataset. You mentioned that before training you use the NYU toolbox to fill in the sparse depth maps of the KITTI dataset. I am wondering:

  1. Are the sparse depth images the ones from the official website under the "depth completion/depth prediction" section (about 14 GB)?

  2. Can you explain more specifically how to use the toolbox to fill in the sparse depth maps? And how accurate is it to use the filled-in depth maps as ground-truth labels? (A rough Python stand-in is sketched below.)

Thank you very much!
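
The fill_depth_colorization routine mentioned above belongs to the MATLAB NYU toolbox, so it is not shown here; as a rough Python stand-in (not the authors' method), a nearest-neighbor fill can densify a sparse KITTI map for quick experiments:

import numpy as np
from scipy import ndimage

def fill_depth_nearest(depth):
    # fill zero (missing) pixels with the value of the nearest valid pixel
    invalid = depth <= 0
    idx = ndimage.distance_transform_edt(invalid, return_distances=False,
                                         return_indices=True)
    return depth[tuple(idx)]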

Endoscope images

Hi,

Thanks for the code.
I am about to use this repo to train a model that estimates depth for images acquired from a stereo endoscope under water. As far as I can see, this and most monocular depth methods are applied to streets and cars. Is there anything I should do, or avoid, when training the model, given that my aim is to get depth for underwater scenes and small, distant objects?

I also noticed that the overlap between my stereo images is not as large as in typical street-view images, so my problem involves a smaller amount of overlap.

Thanks for reading

Lack of instructions

I have now run into a couple of problems: modules such as "constants" and "utils" are missing, but that's not the major problem. I'm trying to run main_fpn.py with python main_fpn.py --epochs 40 --cuda --bs 4 --num_workers 3 --output_dir output_dir/, but I only get NaNs as the network output and as the loss values.

Could you please write more detailed instructions on how to run your code?
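
Not from the repo, but a hedged debugging sketch: PyTorch's anomaly detection stops at the first operation whose backward pass produces a NaN, which helps locate where the NaNs enter (model, criterion, and the inputs are placeholders here):

import torch

torch.autograd.set_detect_anomaly(True)   # fail loudly at the op that first yields NaN/Inf

pred = model(images)                      # hypothetical forward pass
loss = criterion(pred, targets)
loss.backward()                           # raises with a traceback to the offending op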

I can't find utils.py

Hi @xanderchf
Thanks for your code!
But I can't find utils.py. It would be helpful if you could release this part of the code.

Thanks!

Problem with choosing the ground-truth form: filled depth or projected depth

Hi @xanderchf

Thanks to your post, I tried to fill in the missing values in the ground truth by using the NYU toolbox (fill_depth_colorization in MATLAB) and applied the colorized map, like this:

[image: gt_1]

But I'm curious about one thing: when you compute the errors between the ground truth and the prediction, did you use the filled depth or the projected depth? In other words, does your network output filled depth or projected depth?
Thanks

Problem with the NYU dataset Matlab code

Hi, I just want to know whether you made any modifications to the file process_raw.m from https://github.com/janivanecky/Depth-Estimation/tree/master/dataset, or to any other file in the official toolbox for the NYU dataset. I'm getting an error about get_synched_frames: it seems that it cannot go through the scene subfolder (like basement_0001a); it returns 0 files for the scene even though all the files are there. I ran it in Octave on Linux Mint, using the raw dataset, single file (~428 GB), from https://cs.nyu.edu/~silberman/datasets/nyu_depth_v2.html.

Thanks a lot for your help.

Metrics code

There is a lot of information about the evaluation of this network and its benchmarks, but could you explain how you calculate these metrics (ARD, SRD, the threshold metrics, etc.) and share your code for them?
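
The repo's own evaluation code isn't shown in this thread, but here is a hedged sketch of the standard monocular-depth metrics these abbreviations usually denote (ARD: absolute relative difference, SRD: squared relative difference, plus the threshold accuracies delta < 1.25^k):

import torch

def depth_metrics(pred, gt):
    # evaluate only on valid ground-truth pixels
    valid = gt > 0
    pred, gt = pred[valid].clamp(min=1e-6), gt[valid]

    ard = torch.mean(torch.abs(pred - gt) / gt)              # absolute relative difference
    srd = torch.mean((pred - gt) ** 2 / gt)                  # squared relative difference
    rmse = torch.sqrt(torch.mean((pred - gt) ** 2))
    rmse_log = torch.sqrt(torch.mean((torch.log(pred) - torch.log(gt)) ** 2))

    ratio = torch.max(pred / gt, gt / pred)                  # threshold accuracies
    deltas = [torch.mean((ratio < 1.25 ** k).float()) for k in (1, 2, 3)]
    return (ard, srd, rmse, rmse_log, *deltas)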

How to set grad loss factor?

In your repo, the grad loss factor is set to 10. Is the reason to keep the depth loss and the grad loss in the same range, for example around 0.0x?
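
Not an answer from the author, but a quick way to check the balancing intuition in the question is to print the raw magnitude of each term before weighting; the loss functions here are hypothetical stand-ins for the repo's own terms:

# hypothetical stand-ins for the repo's loss terms
depth_term = rmse_log_loss(pred, gt)
grad_term = grad_loss(pred, gt)

# compare raw magnitudes to sanity-check the factor of 10
print(f"depth: {depth_term.item():.4f}  grad: {grad_term.item():.4f}")

total = depth_term + 10.0 * grad_term   # 10 is the grad loss factor quoted above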

How to avoid output saturation?

I'm training your model on the NYU Depth V2 dataset, but I can't prevent the model's predictions from quickly degenerating to a single depth value for the entire image. How did you avoid this? The model doesn't appear to learn at all: the loss doesn't decrease, it just oscillates around the same value, and very little gradient, if any, reaches the parameters. I've tried different learning rates with the same result.
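
Not from the repo, but a hedged sketch for checking the "very little gradient" symptom described above; model is whichever network is being trained:

# after loss.backward(), inspect per-parameter gradient norms to see
# whether (and where) the gradient is actually vanishing
for name, param in model.named_parameters():
    if param.grad is not None:
        print(f"{name}: {param.grad.norm().item():.3e}")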

What are the settings of the params DOUBLE_BIAS and WEIGHT_DECAY?

First of all, thanks for your code. I don't know the settings for the params DOUBLE_BIAS and WEIGHT_DECAY. Training shows me:

[epoch 0][iter 10] loss: nan RMSElog: nan grad_loss: nan normal_loss: nan
[epoch 0][iter 20] loss: nan RMSElog: nan grad_loss: nan normal_loss: nan
[epoch 0][iter 30] loss: nan RMSElog: nan grad_loss: nan normal_loss: nan
[epoch 0][iter 40] loss: nan RMSElog: nan grad_loss: nan normal_loss: nan
[epoch 0][iter 50] loss: nan RMSElog: nan grad_loss: nan normal_loss: nan
[epoch 0][iter 60] loss: nan RMSElog: nan grad_loss: nan normal_loss: nan
[epoch 0][iter 70] loss: nan RMSElog: nan grad_loss: nan normal_loss: nan
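
Not from the repo, but NaNs in an RMSE-log term are commonly caused by taking the log of zero or negative depth values; a hedged sketch of a guarded version:

import torch

def rmse_log_loss(pred, gt, eps=1e-6):
    # clamp before log so zero/empty depth pixels cannot produce NaN
    pred = pred.clamp(min=eps)
    gt = gt.clamp(min=eps)
    return torch.sqrt(torch.mean((torch.log(pred) - torch.log(gt)) ** 2))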

Why must fake and real be multiplied by 10 when computing the RMSE loss?

You apply transforms.ToTensor() to the depth after loading it. I found that PyTorch actually computes depth = depth / 255 there. I would guess that, when computing the loss, it would be better to multiply by 255 in order to get a correct loss. I haven't read your code thoroughly and don't know what you did when preprocessing the data, so I am confused about why fake and real must be multiplied by 10 when computing the loss.

I'm a beginner in this field. Thanks for your great work.
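
A hedged demonstration of the scaling behavior the question refers to: torchvision's ToTensor rescales 8-bit image data to [0, 1] by dividing by 255, which is why any later multiplicative factor on the prediction and the ground truth matters:

import numpy as np
from torchvision import transforms

depth_uint8 = np.full((4, 4, 1), 255, dtype=np.uint8)  # toy 8-bit "depth" image
tensor = transforms.ToTensor()(depth_uint8)
print(tensor.max())  # tensor(1.) -- uint8 inputs are divided by 255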
