
jiaxiongq / deeplidar


Deep Surface Normal Guided Depth Prediction for Outdoor Scene from Sparse LiDAR Data and Single Color Image (CVPR 2019)

License: MIT License

Python 100.00%

deeplidar's Issues

Down- and up-sampling on KITTI dataset

Hi,

Thanks for open-sourcing this great piece of work!

I have tried to run the code using the KITTI depth completion data as described in the instructions and have had some issues with mismatched tensors. Upon closer inspection, it appears that this arises from lines such as the following (from depthCompletionNew.py):

normal2 = F.upsample(normal2, (normal2.size()[2] * 2,normal2.size()[3] * 2),mode='bilinear',align_corners=True)

The mismatch seems to occur because the upsample simply doubles the downsampled size rather than matching the input size, causing inconsistent sizes if the input tensor has a dimension with an odd value.
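
For what it's worth, a minimal sketch (not the repository's fix) of the workaround I am considering: interpolating to the exact size of the tensor being merged with, rather than doubling, so odd input dimensions no longer cause a mismatch. The tensors below are hypothetical stand-ins.

    # Sketch only: interpolate to the target tensor's size instead of doubling.
    import torch
    import torch.nn.functional as F

    normal2 = torch.randn(1, 3, 13, 38)   # odd height after repeated downsampling
    skip = torch.randn(1, 3, 27, 76)      # feature map it must be combined with

    # Doubling gives 26 x 76, which no longer matches the 27 x 76 skip tensor.
    doubled = F.interpolate(normal2, scale_factor=2, mode='bilinear', align_corners=True)

    # Interpolating to the exact target size always lines up.
    matched = F.interpolate(normal2, size=skip.shape[2:], mode='bilinear', align_corners=True)
    print(doubled.shape, matched.shape, skip.shape)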

This is an issue for me since the KITTI data I am using sometimes contains images with an odd number of pixels in one of the dimensions. Since this does not seem to have been an issue for you or others using this dataset, I am wondering whether I am supposed to preprocess the KITTI dataset, or obtain the dataset you used from somewhere other than the main KITTI AWS server?

Thanks for the help!

about low-cost LiDAR

I have a question: your paper mentions producing accurate depth from a low-cost LiDAR. Do you mean a LiDAR with few beams, e.g. a 4-12 beam sensor? How well would a 4-beam LiDAR plus a single RGB image work for producing dense depth?

How to run inference on other datasets

Hello JiaxiongQ!
I recently began training this excellent work on the KITTI dataset, as described in README.md. I would like to know how to run inference on my own dataset, where the depth maps from a Velodyne-64 are stored in .npy format, and obtain the final dense depth maps as .npy as well.
Alternatively, if I use your pretrained model directly on my own dataset (which has a depth map and a synchronized color image), what should I change in your code?
Thanks for your great code.

How to get surface normals?

Hello, thank you for open-sourcing the code.
I would like to run your code. As a first step, how should the surface_normal folder be used? I want to extract surface normals from KITTI.
Should I directly call calplanenormal in tool.cpp?
Looking forward to your reply, thank you.

How to visualize

Hello. How did you convert the depth map to an RGB visualization? I can't find this part in your code. Looking forward to your reply.
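
In case it helps others with the same question, here is a minimal sketch (not the authors' code) of one common way to map a single-channel depth map to a color image with a matplotlib colormap; pred is a hypothetical HxW array of predicted depths.

    # Sketch only: colorize a depth map with a matplotlib colormap and save it.
    import numpy as np
    import matplotlib.pyplot as plt
    from matplotlib import cm

    pred = np.random.uniform(0.0, 80.0, size=(256, 1216))   # hypothetical depth in meters

    # Normalize to [0, 1], apply the colormap, and drop the alpha channel (HxWx3 RGB).
    norm = plt.Normalize(vmin=pred.min(), vmax=pred.max())
    colored = cm.jet(norm(pred))[:, :, :3]
    plt.imsave('pred_color.png', colored)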

about synthetic dataset

Thank you for your excellent work.
I have some questions about the details of the synthetic dataset. For example, in "SEQ0" of "Town 11", how are the LiDAR depth images saved? Are they stored as ori_depth * 256 in uint16, as KITTI does, or with some other processing? Likewise, how are the boundary and normal images saved? Could you give us some details about them? What is the difference between the depth images in "lidar" and "lidar_m", and likewise between "normal" and "normal_m"? If possible, we would appreciate a README that briefly explains these details. Thank you so much.
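
For reference, a minimal sketch of the KITTI depth-completion PNG convention that the question refers to (this describes KITTI, not a confirmed statement about the synthetic data): depth in meters is multiplied by 256 and stored as uint16, with 0 marking pixels that have no measurement.

    # Sketch only: decode a KITTI-style 16-bit depth PNG back to meters.
    import numpy as np
    from PIL import Image

    def read_kitti_depth_png(path):
        img = np.array(Image.open(path), dtype=np.uint16)   # 16-bit PNG
        depth = img.astype(np.float32) / 256.0              # back to meters
        valid = img > 0                                      # 0 means no measurement
        return depth, valid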

DCU

Hello,
Thank you for releasing your code.
I wonder, regarding the ablation study in your paper, whether I can treat "-normal" as a pure DCU module and reproduce the accuracy reported there. And is the corresponding network code the submodule?

question about environment

Hello,
First of all, thank you very much for open-sourcing such excellent code; I am a newcomer who has just started in this area. I ran into the following problem and hope you can help confirm and resolve it, thanks!
Following your instructions I should install Python 2.7 and PyTorch (0.4.0+), but the PyTorch website says "PyTorch does not support Python 2.7 on Windows. Please install with Python 3." Are you using Linux or macOS?

Training Part 2 (TrainD) and Surface Normals

Hi, Thank you for providing the code!
My issue is that I want to run part 2 of the training (sparse depth) on another dataset that does not provide camera calibration parameters. I want to adjust your code so that it still runs without them, so let's assume I do not have the variable "params".

  1. In the total loss for TrainD, what is the purpose of lines 124 to 131? Why do you apply Conv2d layers to the matrices k1 and k2?

  2. Will the normal loss work if I make it the same as in the TrainN file, or are the computations from lines 87 to 109 really necessary to get accurate surface normals?

Thanks!

The consumption of GPU memory seems abnormal

Hi @JiaxiongQ, thanks for the wonderful work. When I run the training process on the synthetic dataset, the GPU memory consumption seems abnormal.
The model has 143,981,012 parameters, and the saved ".tar" model is about 500 MB.
When I train the network on 4x RTX 2080 Ti (11 GB each), I can only set the batch size to 4 (i.e. a batch size of 1 per GPU), and the memory consumption exceeds 7 GB per GPU.
Q1: Is that a normal amount of GPU memory consumption?
Q2: If it is normal, do you have any ideas for reducing the memory consumption? A batch size of 1 will take a long time to train the model.

Looking forward to your reply, thanks.

Inference

The raw LiDAR point cloud I use looks like this:
[image]
The image I use is an RGB image.
When I run the test code, the resulting image is very strange:
[image]
Could you give a usage example so I can confirm that I have not got the inputs wrong?
Should the point-cloud (projected depth) image also be 16-bit? I generated it myself from the laser scan and the transformation matrix; does it need to be saved as 16-bit data?
torchvision is 0.2.0; PyTorch environment:
[image]

Subsampling Results

Hi @JiaxiongQ

Could you please share the numbers for the subsampling figure? I am trying to reproduce the numbers but they seem to increase rather quickly for me.
Has the network been fine-tuned/re-trained for every level of sparsity or does the network automatically handle the additional sparsity of input?
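
For context, this is roughly what I mean by subsampling; a minimal sketch (not necessarily the paper's protocol) that keeps a random fraction of the valid LiDAR measurements and zeros out the rest:

    # Sketch only: randomly drop valid measurements from a sparse depth map.
    import numpy as np

    def subsample_sparse_depth(sparse, keep_ratio, seed=0):
        rng = np.random.default_rng(seed)
        out = sparse.copy()
        valid = np.flatnonzero(out > 0)
        n_drop = int(round(len(valid) * (1.0 - keep_ratio)))
        drop = rng.choice(valid, size=n_drop, replace=False)
        out.flat[drop] = 0.0
        return out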

About the evaluation result on Delta metric

Thanks for the excellent work. In your paper you train/test the model on the NYU_v2 dataset and report the delta metrics; for delta(1.25^3) you reach 100%, but I can't find the evaluation code in your project.
Here are some questions:

  1. When you calculate delta(1.25^3), did you round the value? E.g. is the reported 100% actually 99.998%?
  2. If not, could you please provide the evaluation code?

Looking forward to your reply. Thanks a lot.
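
For reference, the delta metrics themselves are standard; a minimal sketch of how they are usually computed (this is not the authors' evaluation code):

    # Sketch only: fraction of valid pixels with max(pred/gt, gt/pred) below a threshold.
    import numpy as np

    def delta_accuracy(pred, gt, threshold):
        valid = gt > 0
        ratio = np.maximum(pred[valid] / gt[valid], gt[valid] / pred[valid])
        return float((ratio < threshold).mean())

    # Usage: delta(1.25), delta(1.25^2), delta(1.25^3)
    # for i in (1, 2, 3): print(delta_accuracy(pred, gt, 1.25 ** i))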

Why is the output of 'predict_normalE2' set to 2?

Thank you for releasing the code. I have a question about generating dense depth.
In 'depthCompletionNew_block' and 'depthCompletionNew_blockN' functions, the final output is generated by 'predict_normalE2'.
However, a depth map is usually a 1-channel map, while the output of 'predict_normalE2' has 2 channels.
What is the reasoning behind this?
Thank you very much!

def predict_normalE2(in_planes):
    return nn.Conv2d(in_planes, 2, kernel_size=1, stride=1, padding=0, bias=True)

Details about the synthetic lidar data

Thank you for sharing the code and synthetic data. I would like to ask two questions:

  1. Could you please confirm whether the released lidar data (e.g., "000020.bin") records the intensity (remission)? Put simply, is the "lidar.bin" file in the format of "xyz coordinates only" or "xyz coordinates + remission" as in KITTI?

  2. what do the suffixes "_m" and "_s" refer to in the 'lidar_m' and 'normal_s' folders?

Looking forward to your reply.

surface normals for synthetic data

Hi! Thanks for your great work.
How did you compute the surface normal ground truth for the CARLA data? From the ground-truth dense depth map by local plane fitting?

test.py for val_selection_cropped

Hello, when I try to reproduce the inference results I use the KITTI val_selection_cropped data,
but the predictions confuse me: when I print pred.max(), the values are all around 10^5. Shouldn't they be a few hundred at most? The predicted pictures are shown below.
[images]

final pred depth visualization

Hello,
I would like to ask how you display the estimated single-channel depth map as the color images shown in your paper.
I have tried other methods, but none of them look good, so I would like to know how you convert the depth map to a color image.
Looking forward to your reply, thank you.

Are the trainN, trainD and train networks trained separately?

Thank you very much for open-sourcing this excellent code; it is very useful for the direction I am researching. Since there are no comments, I find some parts difficult to read. Sorry for bothering you by email before; if you have time, could you help me with the following questions?
Q1. Are trainN, trainD and train trained separately? If I only need the final depth map, is it enough to train only train? I did not find any parameter sharing between them, although I may have missed it.
Q2. Could you explain the purpose of training the network to generate normals? I did not see the generated normals being used when producing the depth map.
Q3. valid_mask = (target > 0.0).detach() // what is the purpose of detaching the gradient here where target is greater than zero?
Q4. outC, outN, maskC3, maskN3, normals2 = model(inputl, sparse, mask) // what does mask represent here? Is it the LiDAR data? And what do the generated maskN3 and maskC3 represent?
If you have time, please help answer these. Thank you very much!

Question about Attention Map and MaskFt()

Hello, JiaxiongQ!
I am reading your code in depthCompletionNew.py, and I found that in the RGB pathway and the depth pathway we get two dense depth maps and their masks (or attention maps). Why not just use these directly with F.softmax() to compute the weights for the two dense depth maps? Instead, you feed the two masks into a MaskFt() function to get another two masks. Can you tell me more about the purpose of MaskFt()?

Thanks a lot!

Prediction result seems abnormal.

Thanks for your wonderful work!
I encountered a problem while visualizing the results on KITTI using your pretrained model. I plotted the results ('pred') from test.py (pred, time_temp = test(imgL, sparse, mask)). The visualization results are abnormal. Any advice on this?
[image]
I used an image from the 2D detection training set and obtained the sparse LiDAR depth map by projecting the LiDAR point cloud into the image plane.
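
For completeness, this is roughly how I build the sparse depth map; a minimal sketch assuming KITTI-style calibration (P_rect from calib_cam_to_cam.txt, R0_rect, and the 3x4 velodyne-to-camera transform), with all names hypothetical:

    # Sketch only: project velodyne points into the image to form a sparse depth map.
    import numpy as np

    def lidar_to_sparse_depth(points, P, R0, Tr, h, w):
        # points: Nx4 velodyne scan; P: 3x4 projection; R0: 3x3 rectification; Tr: 3x4 velo-to-cam.
        xyz = points[:, :3].T                                 # 3 x N
        cam = R0 @ (Tr[:, :3] @ xyz + Tr[:, 3:4])             # rectified camera coordinates
        front = cam[2] > 0.1                                  # keep points in front of the camera
        cam = cam[:, front]
        proj = P[:, :3] @ cam + P[:, 3:4]
        u = np.round(proj[0] / proj[2]).astype(int)
        v = np.round(proj[1] / proj[2]).astype(int)
        depth = np.zeros((h, w), dtype=np.float32)
        inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
        depth[v[inside], u[inside]] = cam[2, inside]          # meters; colliding points keep the last one
        return depth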

Are the used camera intrinsics values correct?

I noticed the camera intrinsics f, cx, and cy used in the surface-normals tool and in trainLoader.py differ from the ones provided with KITTI for many but not all values. If I'm not mistaken, the correct camera matrix values for depth images are in the P_rect_02 and P_rect_03 projection matrices in the calib_cam_to_cam.txt files.

The values currently used in code:

INTRINSICS = {
    "2011_09_26": (721.5377, 596.5593, 149.8540),
    "2011_09_28": (707.0493, 604.0814, 162.5066),
    "2011_09_29": (718.3351, 600.3891, 159.5122),
    "2011_09_30": (707.0912, 601.8873, 165.1104),
    "2011_10_03": (718.8560, 607.1928, 161.2157),
}

Values from KITTI calibration files:

INTRINSICS = {
    "2011_09_26": (721.5377, 609.5593, 172.8540),
    "2011_09_28": (707.0493, 604.0814, 180.5066),
    "2011_09_29": (718.3351, 600.3891, 181.5122),
    "2011_09_30": (707.0912, 601.8873, 183.1104),
    "2011_10_03": (718.8560, 607.1928, 185.2157),
}

In particular, all cy values differ by a varying integer amount. Did you crop the images vertically, perhaps?
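
For what it's worth, a minimal sketch (assuming the standard KITTI calib_cam_to_cam.txt layout) of reading f, cx, cy for camera 2 from the P_rect_02 line; note that cropping N rows from the top of the image would reduce cy by exactly N, which could explain the integer offsets:

    # Sketch only: read fx, cx, cy from a KITTI calib_cam_to_cam.txt file.
    import numpy as np

    def load_intrinsics(calib_path, cam='02'):
        with open(calib_path) as f:
            for line in f:
                if line.startswith(f'P_rect_{cam}:'):
                    P = np.array(line.split(':', 1)[1].split(), dtype=np.float64).reshape(3, 4)
                    return P[0, 0], P[0, 2], P[1, 2]   # fx, cx, cy
        raise ValueError(f'P_rect_{cam} not found in {calib_path}')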

Question about save of dense depth map

Hello, JiaxiongQ!
When I run test.py, I am a little confused about how the dense depth image is saved; can you explain the reasoning behind the save process? (I usually just use cv2.imwrite() to save PNGs.)
[image]
As we can see, pred is the dense depth map, which is multiplied by 256 to get pred_show; is this to minimize the quantization error when saving?
And what is the reason for building the buffer with tobytes() and using T.shape?
Could you explain what happens in the save process?

Thanks for your reply!
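
In case it is useful to others, my understanding (an assumption, not a confirmed description of test.py) is that multiplying by 256 and writing a 16-bit PNG follows the KITTI submission format, in which uint16 pixel values encode depth * 256; the tobytes()/T.shape handling would then just build a 16-bit PIL image from the raw buffer. A minimal sketch of an equivalent save with OpenCV:

    # Sketch only: write a KITTI-style 16-bit depth PNG.
    import cv2
    import numpy as np

    def save_kitti_depth_png(pred_m, path):
        pred_show = np.clip(pred_m * 256.0, 0, 65535).astype(np.uint16)   # depth[m] * 256 as uint16
        cv2.imwrite(path, pred_show)                                      # OpenCV keeps 16-bit for PNG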

Question about dataset and trainloader

Hello JiaxiongQ!
I noticed in your code that the trainloader is only constructed once, before the training epochs start. Does this mean that across epochs the same 256x1216 region is cropped from each image for training? Would it make sense to recreate the trainloader at the beginning of every epoch for more randomness?
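
For illustration, a minimal sketch (not the repository's loader) of a crop drawn inside __getitem__; if the offsets are sampled on every access like this, a single DataLoader already yields a different 256x1216 window in each epoch and reloading it would not add randomness:

    # Sketch only: random crop sampled per access, not fixed per dataset instance.
    import random

    def random_crop(img, sparse, gt, ch=256, cw=1216):
        h, w = img.shape[:2]
        y = random.randint(0, h - ch)
        x = random.randint(0, w - cw)
        return img[y:y+ch, x:x+cw], sparse[y:y+ch, x:x+cw], gt[y:y+ch, x:x+cw]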

About surface_normal prediction

Hello, JiaxiongQ!
When I read your code, I found that after predicting the surface normal vectors you perform some transformation operations on them. The code is:
outputN = torch.zeros_like(normals2)
outputN[:,:,:,0] = -normals2[:,:,:,0]
outputN[:,:,:,1] = -normals2[:,:,:,2]
outputN[:,:,:,2] = -normals2[:,:,:,1]
I'm confused about this part. Can you explain it? Thank you very much.

issue

Hello,
First of all, thank you for open-sourcing such excellent code. While reproducing your training procedure I ran into some confusion and hope you can clear it up, thanks!
Training the first part of the model with trainN.py mainly uses trainLoaderN.py and nomalLoader.py from the dataloader module; nomalLoader.py stores the paths of the three kinds of data in three lists, and trainLoaderN.py then loads the data. I have three questions:
1. I downloaded the KITTI dataset, which has three archives; only two seem to be used here, data_depth_velodyne and data_depth_annotated. data_depth_velodyne appears to be the sparse LiDAR data, while data_depth_annotated is denser. Is the data in data_depth_annotated what is used to generate the surface normals?
2. You also use color images. From which KITTI download tab do the color image files under your_filepath/data_depth_velodyne/train/../image_02/data/ and the corresponding image_03/data/ come?
3. In nomalLoader.py the paths of both the left and right color images seem to be stored in a single list object, imagesl. During training, is there no need to distinguish left images from right images?

The application of ‘get_transform’ functions

processed = preprocess.get_transform(augment=False)

Hello, JiaxiongQ!
Thanks for your excellent code. While reading trainloader.py I noticed that you apply
processed = preprocess.get_transform(augment=False) to each input (sparse, image, mask, and so on). As far as I can tell, this function converts the input from a numpy array into a tensor and normalizes it to 0~1 by dividing by 256. I am wondering whether this is mandatory and whether it influences the final output.
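
For context, my understanding of what that transform amounts to (an assumption on my part, not the repository's preprocess module) is sketched below; if the network was trained on inputs scaled this way, skipping the division would change the input statistics it expects:

    # Sketch only: numpy array -> float tensor scaled to roughly [0, 1].
    import numpy as np
    import torch

    def to_normalized_tensor(arr, scale=256.0):
        t = torch.from_numpy(np.asarray(arr, dtype=np.float32)) / scale
        return t.permute(2, 0, 1) if t.dim() == 3 else t.unsqueeze(0)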

How to prepare the data for `trainN.py`?

Hi, thanks for this amazing repo. @JiaxiongQ

I'm trying to get trainN.py and nomalLoader.py to work in order to train the first NN.
This is what I understand I need so far in order to train:

  1. Download data_depth_velodyne which is the sparse Lidar dataset.
  2. Download data_depth_annotated which is the ground-truth (Dense) Lidar dataset.
  3. Use the second repo. in order to generate from the ground-truth Dense Lidar dataset the ground-truth normals.
  4. Download ALL the RGB KITTI images from all the categories (City | Residential | Road | Campus | Person | Calibration). Is there a link to download them all at once instead of one by one?

Question 1: Do I need to extract all the RGB images one by one into the data_depth_velodyne/train/..*sync/ folders, i.e. add image_02 and image_03 folders to each of the sync folders? (This is implied by your code.)

Question 2: Is there a way to download all the RGB images in one shot instead of clicking and extracting them one by one into all the folders?

In nomalLoader.py the function dataloader(filepath) returns 3 variables: left_train, normalS_train, normal_gts, which are:
a. left_train - the RGB KITTI image folders data_depth_velodyne/train/..*sync/image_02 & 03/data.
b. normalS_train - the sparse LiDAR folders data_depth_velodyne/train/..*sync/proj_depth/velodyne_raw/image_02 & 03/.
c. normal_gts - the folder with all the normals I generated from the dense ground truth: data_depth_annotated/*_sync/proj_depth/groundtruth/image_02 & image_03 -> gt/out/train/*_sync/image_02 & image_03, or should it all be in gt/out/train/*_sync/? Because in the code there isn't anything about concatenating image_02 & image_03.

Question 3: please look at c., I asked there about the ground-truth normals.

Question 4: When and where the synthetic data is used? Do we use it also in trainN.py? Do we use it in all the 3 NNs?

Question 5: How many epochs is recommended to train on?
Other than that, thank you. It took me many hours just to get to the point where I understand how to get the data ready (and I am still trying). I'll definitely add a guide on how to prepare the data for training after this post, so others can save many hours understanding the process.

train process and train data

Hello,
After reading your code and the related paper, I have the following questions about the training process and the datasets used:
1. The synthetic dataset is used to train the normal-estimation module, and that module's parameters are then kept fixed during the later training stages, correct?
2. Then trainD.py trains the subsequent networks; is the training data here the KITTI dataset? Why does the model used in trainD not have the attention module?
3. Finally, train.py is run to train the whole model; is this also on the KITTI dataset?
4. If the above is the training procedure, then computing normals for the KITTI data is not needed at all, is it?
Sorry for the trouble, and looking forward to your reply.
Thanks and best wishes!

Broken tar file

I downloaded the tar file and tried to untar it, but I get an error. Can you update the pre-trained model download link?

How to train on other datasets

Hello, sorry to bother you. I have a few questions I would like to ask:
1. If I want to fine-tune your model with some of my own data, how should I train? Is it enough to just use train.py?
2. If I only use train.py, does the input not include surface normals? I see that train.py takes only three inputs, i.e. all_left_img, all_sparse, all_depth = lsn.dataloader(datapath).
Thanks.

Attention maps issue

Hello. I am trying to run your code and encountered weird behavior in the attention maps (predMaskC and predMaskN). I am using your pretrained model. I also tried torchvision 0.2.0 and torch 0.4.0, but it didn't help. Do you have any suggestions?
[image]

Code of 'calplanenormal' is missing in 'surface_normal/clean.hpp'

Hello!
In clean.hpp, the example program for generating surface normals, the key function calplanenormal has been removed; the calplanenormal function called in demo is neither declared nor defined:

    Mat calplanenormal(Mat &src){ ... }   // the missing function

    void demo(){
        Mat src = imread(INPUT_FILE_NAME, CV_LOAD_IMAGE_ANYDEPTH);
        Mat res = calplanenormal(src);
        imwrite(OUTPUT_NAME, res);
    }

Cindy's version (https://github.com/Cindy-xdZhang/surface-normal) does define this function.
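
As a stopgap until the function is restored, a minimal sketch (in Python rather than the repository's C++, and not the missing calplanenormal) of one common way to estimate per-pixel normals from a dense depth map: unproject with the camera intrinsics and take the cross product of the horizontal and vertical point differences. f, cx, cy are hypothetical intrinsics.

    # Sketch only: normals from a depth map via cross products of point-cloud gradients.
    import numpy as np

    def normals_from_depth(depth, f, cx, cy):
        h, w = depth.shape
        u, v = np.meshgrid(np.arange(w), np.arange(h))
        X = (u - cx) * depth / f
        Y = (v - cy) * depth / f
        pts = np.stack([X, Y, depth], axis=-1)
        dx = np.gradient(pts, axis=1)
        dy = np.gradient(pts, axis=0)
        n = np.cross(dx, dy)
        return n / (np.linalg.norm(n, axis=-1, keepdims=True) + 1e-8)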

RGB Images aren't the same size as raw and gt depth data

Hi, @JiaxiongQ.
Thanks for this amazing repo and paper, I'd love to get your help.
I'm trying to run the test.py file.
My setup:

   gt_fold = '/home/someuser/depth_est/train/2011_09_26_drive_0001_sync/proj_depth/groundtruth/image_02/'
    left_fold = '/home/someuser/depth_est/Kitti_RGB/2011_09_26/2011_09_26_drive_0001_extract/image_02/data/'
    lidar2_raw ='/home/someuser/depth_est/velodyne/train/2011_09_26_drive_0001_sync/proj_depth/velodyne_raw/image_02/'

I've downloaded the RGB Images from here:
http://www.cvlibs.net/datasets/kitti/raw_data.php?type=city
Which are:
2011_09_26_drive_0001 (0.4 GB)
Length: 114 frames (00:11 minutes)
Image resolution: 1392 x 512 pixels

Their resolution isn't compatible with the raw and gt depth data that I downloaded from cvlibs:
http://www.cvlibs.net/datasets/kitti/eval_depth.php?benchmark=depth_completion
(I downloaded the 14GB and 2GB which only contains folders with train and val)

I guess I need to download this one for compatible RGB images: "Download manually selected validation and test data sets (5 GB)"?

Also about the baidu files, I'm not able to download them from:
https://pan.baidu.com/s/1ayNWa7_9Ia2f6_lYzW8paA?errno=0&errmsg=Auth%20Login%20Sucess&&bduss=&ssnerror=0&traceid=#list/path=%2F

It just downloads some Baidu .exe file instead of your files. How can I solve this? (I'm currently also trying to find a solution via some guides on Reddit.)

Warped visualization

Hi, I am using the scene_vis repo (linked here) to visualize the DeepLiDAR depth maps as 3D point-clouds and am encountering some odd visuals. I have attached two images below. The first is a projected point cloud visualization using depth maps generated via IP-Basic (classical method). The second is a point cloud of the same scene using a depth map from DeepLiDAR. As you can see, there is warping around the top and bottom. I have modified the projection matrix according to the cropped image size of DeepLiDAR's inputs and there is still no improvement. Any suggestions?

[images]

Surface normal training data for trainN.py

Hello,
First of all, thank you for open-sourcing such excellent code. I ran into the following issues while reproducing your training procedure and hope you can help confirm and resolve them, thanks!

Q1: The code that generates surface normals produces them from the LiDAR depth and the ground-truth depth, correct? And the generated surface normals are all 3-channel?

Q2: For the first training stage, the inputs to trainN.py are the RGB image, the surface normals generated from the LiDAR depth, and the surface normals generated from the ground-truth depth. Are these three the inputs?

Q3: Using the three inputs from Q2, I found that trainN.py calls trainLoader.py rather than trainLoaderN.py; is this a mistake?

Q4: Following your code, with trainN.py calling trainLoader.py, the images that are read in are all 3-channel, so why does the code reshape them to 1 channel? Is this a typo? (If I instead follow trainLoaderN.py, only one of the surface-normal inputs is converted from multi-channel to single-channel while the other is not; what is the reason for that?)

How to train trainD.py separately?

Hello!
Thank you very much for open-sourcing this state-of-the-art code. I have a few questions about the training process:

  1. For my task I need a depth-completion network that is as lightweight as possible, so I plan to train only trainD.py from scratch. In its dataloader I found three file paths, filepathl, filepathd and filepathgt; the first two are the data_depth_velodyne/train and data_depth_annotated/train folders of the KITTI depth dataset. What should the third one, filepathgt, be for my task, or can the code be modified so that trainD.py can be trained independently?
  2. After training trainD.py, without the surface normals, is it possible to further train train.py from the model and results obtained with trainD.py alone to improve the RMSE or other metrics?

MAE metric unusually high

Hi, I'm running the pretrained model on the cropped validation data and then evaluating the generated depth maps using the code for the error metrics in test.py. The values for RMSE, iRMSE, and iMAE are within the expected order of magnitude/range, but the MAE comes out unusually high (significantly higher than the RMSE, which is theoretically impossible). Any thoughts on why this may be happening?
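
For reference, a minimal sketch of how MAE and RMSE are usually computed over valid pixels (KITTI reports both in millimeters); since RMSE >= MAE by Jensen's inequality, an MAE above the RMSE usually points to a bug such as evaluating over different masks or mixing units:

    # Sketch only: MAE and RMSE in millimeters over pixels with ground truth.
    import numpy as np

    def mae_rmse(pred_m, gt_m):
        valid = gt_m > 0
        err_mm = (pred_m[valid] - gt_m[valid]) * 1000.0
        return np.abs(err_mm).mean(), np.sqrt((err_mm ** 2).mean())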
