fangchangma / sparse-to-dense.pytorch

ICRA 2018 "Sparse-to-Dense: Depth Prediction from Sparse Depth Samples and a Single Image" (PyTorch Implementation)

Python 100.00%
depth-prediction depth-estimation depth-image slam pytorch sparse-depth-samples deep-learning depth-completion

sparse-to-dense.pytorch's People

Contributors

abdo-eldesokey, akariasai, timethy


sparse-to-dense.pytorch's Issues

Getting benchmark results

Hi,

Thank you for publicly releasing this code. I was wondering what parameters to use to arrive at the same/similar results as those in your paper / KITTI depth completion benchmark.

Should I be using the default parameters?

Best regards,
Shreyas

Data generation script

Dear Fangchang Ma,
Thanks for your great work!

Can you please share the HDF5 file generation script? Thanks.

Best.

KITTI Validation Split

Hi,

In your paper you mention "[...] a random subset of 3200 images from the test sequences [...]" for validation on the KITTI dataset.

A fresh checkout of this repo validates the network against the full set of 40k validation images, though, which gives results quite different from those reported in your paper (i.e. an RMSE of 4.7 m vs. 6.2 m).

Could you elaborate on which 3200 images you took specifically?

Question about input image resize

Hi, I have a question about the image resize in train_transform of the NYU data loader. It is at line 60 of nyu_dataloader.py. I wonder how the resize operation can speed up rotation?

No rgb image normalization during pre-process

Are the RGB images normalized before being fed into the model for training? I can't find it; it seems that the RGB images are only divided by 255 and converted to tensors. If not, why doesn't the depth estimation task need the usual normalization step (subtract the mean, then divide by the standard deviation), which is routine for other CV tasks such as semantic segmentation and object detection?
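For reference, here is a minimal sketch of the usual ImageNet-style normalization with torchvision (the mean/std values are the standard ImageNet statistics, not something taken from this repo):

    import torchvision.transforms as T

    # standard ImageNet statistics; whether they help depth estimation is exactly the question above
    normalize = T.Normalize(mean=[0.485, 0.456, 0.406],
                            std=[0.229, 0.224, 0.225])

    rgb_transform = T.Compose([
        T.ToTensor(),   # scales uint8 RGB to [0, 1]
        normalize,      # then subtracts the mean and divides by the std per channel
    ])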

Is there an easy way to run inference on a different dataset

Currently the supported way to do inference is via the --evaluate option. However, the data loaders are written to process HDF5 files. I want to run inference on the KITTI depth completion benchmark dataset; I have the sparse depth, ground truth depth, and RGB images. Doing so, however, requires modifying the kitti_dataloader. Is there a straightforward way of doing this? Thank you
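A minimal sketch of wrapping KITTI depth-completion PNGs in a Dataset for inference; the folder layout and the 1/256 m depth encoding are assumptions about the benchmark data, and this is not the repo's kitti_dataloader:

    import os
    import numpy as np
    from PIL import Image
    import torch
    from torch.utils.data import Dataset

    class KittiCompletionFolder(Dataset):
        """Reads matching rgb / sparse-depth PNG pairs from two parallel folders."""
        def __init__(self, rgb_dir, sparse_dir):
            self.rgb_dir, self.sparse_dir = rgb_dir, sparse_dir
            self.names = sorted(os.listdir(rgb_dir))

        def __len__(self):
            return len(self.names)

        def __getitem__(self, i):
            rgb = np.asarray(Image.open(os.path.join(self.rgb_dir, self.names[i])),
                             dtype=np.float32) / 255.0
            # the benchmark stores depth as 16-bit PNG in 1/256 m units; 0 means "no measurement"
            d = np.asarray(Image.open(os.path.join(self.sparse_dir, self.names[i])),
                           dtype=np.float32) / 256.0
            rgbd = np.dstack([rgb, d])                        # H x W x 4
            return torch.from_numpy(rgbd.transpose(2, 0, 1))  # C x H x W

You would still need to apply the same crop/resize the model was trained with before feeding these tensors to the network.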

The principle of implementing a simple Visual Odometry (VO) algorithm

I would like to ask about the specific implementation principles or methods behind the following statement in "D. Application: Dense Map from Visual Odometry Features": "we implement a simple visual odometry (VO) algorithm with data from one of the test scenes in the NYU-Depth-v2 dataset." Thank you~

Saving the upproj model fails

Hi

I have a problem when training using the upproj layer. What I do is

python main.py -a resnet50 -d upproj -m rgb

At the beginning everything goes well, and the model can also be transferred to the GPU. But it always reports an error when trying to save the torch model (after the validation of epoch 0; I can see the REL/RMSE of the validation):

Traceback (most recent call last):
  File "main.py", line 393, in <module>
    main()
  File "main.py", line 236, in main
    }, is_best, epoch)
  File "main.py", line 377, in save_checkpoint
    torch.save(state, checkpoint_filename)
  File "/cm/shared/apps/pytorch/0.1.12/lib64/python2.7/site-packages/torch/serialization.py", line 120, in save
    return _save(obj, f, pickle_module, pickle_protocol)
  File "/cm/shared/apps/pytorch/0.1.12/lib64/python2.7/site-packages/torch/serialization.py", line 186, in _save
    pickler.dump(obj)
cPickle.PicklingError: Can't pickle <class 'models.UpProjModule'>: attribute lookup models.UpProjModule failed

But if I use deconv3, everything works well.

Could you help me with this issue? Thanks a lot.
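One workaround that usually sidesteps this kind of pickling error is to save only the state_dict instead of the full module object; this is a sketch, not the repo's save_checkpoint, and the variable names simply follow the surrounding code:

    # tensors pickle fine even when the module class itself does not
    torch.save({'epoch': epoch,
                'arch': args.arch,
                'state_dict': model.state_dict(),
                'optimizer': optimizer.state_dict()},
               checkpoint_filename)

    # to resume, rebuild the model first, then load the weights
    model.load_state_dict(torch.load(checkpoint_filename)['state_dict'])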

Share the trained model ?

Hi, thanks for the outstanding work. I wonder if you could share the trained model based on NYU2, because the results I get in PyTorch differ from the paper. More specifically, I used the following parameters: samples=200, modality=rgbd, resnet50, decoder=upproj, criterion=l1, lr=0.01, and got the following results: RMSE=0.244, Delta1=0.967, REL=0.049, which differ from the paper. I would appreciate it if you could reply. Thank you.

Scaling factor cancels out for depth values

I just found your code as I was searching for NYU Depth Data loaders. First of all, thanks a lot for making this public.

I think your scaling is not applied correctly.
You divide the depth by s, but then reverse this by applying the scaling transformation to both the image and the depth values. Therefore you end up with the same depth values, but a scaled image. To correct this, you would have to divide the depth values by the squared scale factor while still applying the same transformation.

I'm not sure whether this has a noticeable impact, but the camera-world geometry is not correct as a result, and maybe that also affects the training itself.

Benchmark on KITTI vs NYU Depth v2

RMS on the NYU Depth v2 dataset using Ours-200 is 0.230.
RMS on the KITTI dataset using Ours-200 is 3.851.

Why is the RMS difference between the two datasets so large (0.230 vs. 3.851)?

Question about training data resolution

The training data is resized and cropped to 228×304. At inference time, I think the input image must be resized to the same size as in training.
In my SLAM application, the resolution of the input image is 480×640, so to run inference the input images must be downsampled to 228×304.
Can we train the model at a resolution of 480×640, and thus avoid the downsampling and upsampling?

CUDA sync?

Thank you so much for the great work!

This might be a stupid question, but I am getting the following error: 'RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False.' This is my first time dealing with PyTorch, so I guess my question is: how do you run the code, or make it recognize the CUDA I installed, to get rid of that error? Am I making sense?

Many thanks
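For reference, a minimal sketch of loading a CUDA-trained checkpoint on a machine where torch.cuda.is_available() is False (the path is a placeholder, and a reasonably recent PyTorch is assumed):

    import torch

    # map every GPU tensor in the checkpoint onto the CPU at load time
    checkpoint = torch.load('path/to/checkpoint.pth.tar', map_location='cpu')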

Different sparse input when each sample input is loaded

Hi,
I have read your implementation and I have a question about how the sparse depth input is generated for the NYU dataset. You generate the sparse input when the ground-truth depth is loaded, via the dense_to_sparse function. But this sparse input may not be the same at the next epoch, when the ground-truth depth is loaded again.
Do I understand correctly? If I have misunderstood something, please explain it to me!

Thanks,
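For reference, a sketch of how seeding from the sample index would make the sparse mask identical across epochs; this is an illustration only, not the repo's dense_to_sparse:

    import numpy as np

    def uniform_sparse_mask(depth, num_samples, index=None):
        # with index=None a fresh mask is drawn on every load (different each epoch);
        # passing the sample index makes the mask deterministic across epochs
        rng = np.random.RandomState(index) if index is not None else np.random
        prob = float(num_samples) / depth.size
        return rng.uniform(size=depth.shape) < prob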

GPU Usage

Dear Fangchang Ma,
Thanks for your work, it's great!

I noticed that the GPU usage is far from 100% on my machine and I was curious as to how difficult you think it is to pipeline the reading and pre-processing of the training data with running the GPU.

I.e., while the GPU is working on one batch, the CPU could already begin reading in the next batch.
Since I have a rather slow CPU I think this could help quite a bit.

What do you think?

Best,
Tim
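For what it's worth, PyTorch's DataLoader already pipelines this way when num_workers > 0 (which the repo exposes via args.workers): worker processes read and pre-process the next batches on the CPU while the GPU consumes the current one. A minimal sketch with illustrative values:

    import torch

    train_loader = torch.utils.data.DataLoader(
        train_dataset,
        batch_size=16,
        shuffle=True,
        num_workers=4,     # CPU workers prefetch and transform batches in parallel
        pin_memory=True)   # page-locked buffers make host-to-GPU copies cheaper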

An issue with "resume" mode

Hi,
When I call --resume mode, I get an error from line 109 in main.py.
I checked and found that when you load the checkpoint, you overwrite args in line 104, which removes the path to the checkpoint that is needed in line 109.
I suggest that you save the path in an intermediate variable.
Thank you.
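A sketch of the suggested fix, assuming the checkpoint dict stores the original args (as the overwrite in line 104 implies):

    # keep the checkpoint path before args is replaced by the saved one
    chkpt_path = args.resume
    checkpoint = torch.load(chkpt_path)
    args = checkpoint['args']
    args.resume = chkpt_path   # restore it so the later code can still find the file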

Using another model

Can I use a model other than resnet18/resnet50? I modified the "create new model" part in main.py, but I can't use the model I want. Can you tell me how? The relevant block is:

# create new model
else:
    train_loader, val_loader = create_data_loaders(args)
    print("=> creating Model ({}-{}) ...".format(args.arch, args.decoder))
    in_channels = len(args.modality)
    if args.arch == 'resnet50':
        model = ResNet(layers=50, decoder=args.decoder, output_size=train_loader.dataset.output_size,
            in_channels=in_channels, pretrained=args.pretrained)
    elif args.arch == 'resnet18':
        model = ResNet(layers=18, decoder=args.decoder, output_size=train_loader.dataset.output_size,
            in_channels=in_channels, pretrained=args.pretrained)
    elif args.arch == 'resnet101':
        model = ResNet(layers=101, decoder=args.decoder, output_size=train_loader.dataset.output_size,
            in_channels=in_channels, pretrained=args.pretrained)
    elif args.arch == 'resnet152':
        model = ResNet(layers=152, decoder=args.decoder, output_size=train_loader.dataset.output_size,
            in_channels=in_channels, pretrained=args.pretrained)
    print("=> model created.")
    optimizer = torch.optim.SGD(model.parameters(), args.lr, \
        momentum=args.momentum, weight_decay=args.weight_decay)

    # model = torch.nn.DataParallel(model).cuda() # for multi-gpu training
    model = model.cuda()
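If the goal is just another torchvision ResNet depth, one way to extend the block above is a lookup table instead of the if/elif chain; this is a sketch and assumes models.ResNet accepts any of these layer counts:

    arch_to_layers = {'resnet18': 18, 'resnet34': 34, 'resnet50': 50,
                      'resnet101': 101, 'resnet152': 152}
    if args.arch not in arch_to_layers:
        raise ValueError('unsupported architecture: {}'.format(args.arch))
    model = ResNet(layers=arch_to_layers[args.arch], decoder=args.decoder,
                   output_size=train_loader.dataset.output_size,
                   in_channels=in_channels, pretrained=args.pretrained)

A backbone that is not a ResNet would also need its own encoder/decoder wiring inside models.py.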

Failed to reproduce the RGB based problem, whereas the RGBd problem works fine for me.

Here is my reproduced result for RGBd:

1. uniform sampling number: 20, backbone: resnet50, RMSE=0.351
2. uniform sampling number: 50, backbone: resnet50, RMSE=0.281
3. uniform sampling number: 200, backbone: resnet50, RMSE=0.236

As for the RGB-based problem:
the reported RMSE is 0.514 according to Table II, but I only get 0.807. The result folder name of my experiment is
nyudepthv2.sparsifier=uar.samples=0.modality=rgb.arch=resnet50.decoder=deconv3.criterion=l1.lr=0.01.bs=16.pretrained=True; the only change I made was setting the "modality" value to "rgb".

As far as I know:
1: the data in http://datasets.lids.mit.edu/sparse-to-dense/data/nyudepthv2.tar.gz was split into two folders, namely train (47584 samples) and val (654 samples);
2: there are two relevant descriptions in the paper:
"We use the official split of data, where 249 scenes are used for training and the remaining 215 for testing. In particular, for the sake of benchmarking, the small labeled test dataset with 654 images is used for evaluating the final performance"
"For training, we sample spatially evenly from each raw video sequence from the training dataset"

There are three questions I want to confirm:

A: As for the 215 testing scenes: since there are only the train and val folders, I'm confused about where the testing data is.
B: Regarding the training data (about 47584 samples): does it come from the aforementioned "official split of data" for training, with a sampling strategy then applied to reduce the number? There are indeed "407,024 new unlabeled frames" mentioned at https://cs.nyu.edu/~silberman/datasets/nyu_depth_v2.html. Is this "official split of data" a split of these 407,024 unlabeled frames? Could you specify the procedure behind the "official split of data"?
C: Do the experiment settings for the RGB-based problem differ from those for the RGBd-based problem, e.g. the backbone or training parameters? Or did you use more data for training in the RGB-based experiments, i.e. a higher sampling ratio from the "official split of data", so that more data is used to train the model? Could you give me some advice on narrowing the gap of 0.514 vs. 0.807 RMSE?

Thanks for your time and looking forward to your reply!

Exploding REL value

During testing I noticed that the REL value is sometimes extremely large (e.g. 149966823318618112), which suggests we are dividing by almost zero.

After some debugging I found that this is indeed the case. In metrics.py:42,

    self.absrel = (abs_diff / target).mean()

the minimal value of target is sometimes on the order of 1e-10.

I believe this comes from resizing the depth maps with interpolation, which sometimes produces really small but non-zero values.

I suggest building the valid mask with something like target >= EPS, for some EPS around 10e-9 or so.

Or of course, "fix" the interpolation in the data transform phase somehow.
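A sketch of the suggested guard in metrics.py (the EPS value follows the order of magnitude proposed above; variable names are illustrative):

    EPS = 1e-9
    valid_mask = target > EPS                    # instead of masking only on target > 0
    abs_diff = (output[valid_mask] - target[valid_mask]).abs()
    absrel = float((abs_diff / target[valid_mask]).mean())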

replace the method of "misc.imresize(img, self.size, self.interpolation, 'F')"

the function of "scipy.misc" is be used at dataloaders/transforms.py.
now, the function of "scipy.misc" has been discarded, i look for the function of "skimage.transform.resize" to replace it.
but when i replace it, the code get am error: 'float' object is not iterable.
Is there any way to fix this, or is there any other functions that can replace the "scipy.misc" function.
thanks.
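One replacement that behaves like the old float-mode imresize is cv2.resize; the "'float' object is not iterable" error usually means a scale factor was passed where skimage expects an output shape. A sketch, assuming self.size may be either a float scale factor or an (h, w) tuple as with scipy.misc.imresize, and that adding OpenCV as a dependency is acceptable:

    import numpy as np
    import cv2

    def imresize_f(img, size, interpolation='bilinear'):
        interp = {'nearest': cv2.INTER_NEAREST,
                  'bilinear': cv2.INTER_LINEAR,
                  'bicubic': cv2.INTER_CUBIC}[interpolation]
        if isinstance(size, (int, float)):              # scale factor, not an output shape
            h, w = img.shape[:2]
            size = (int(round(h * size)), int(round(w * size)))
        # cv2.resize takes (width, height), the reverse of numpy's (rows, cols)
        return cv2.resize(img.astype(np.float32), (size[1], size[0]),
                          interpolation=interp)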

[NYU] Different Scaling in Training and Validation

When training on NYU both depth and rgb images are scaled for computational efficiency before applying rotation.

During training the images are scaled to:

transforms.Resize(250.0 / iheight)

and during validation:

transforms.Resize(240.0 / iheight)

I think that both scale-factors must be the same. Different scales implicitly change the distance of objects and therefore make it hard for the network to derive possibly learned "depth from scale".

pose information for processed data

Hi, there is no pose information in the processed NYU data. Could you share the processed NYU training data with pose information? Many thanks!

Question about decoder

Hi,
You said only the deconv operation is implemented in the decoder, but in models.py there are implementations of upproj and upconv. Is there anything wrong with upproj and upconv, or do you simply mean they are not recommended?

about the preprocessed NYU Depth V2

Hi, thanks for sharing this great work. I have a question: is the preprocessed NYU Depth V2 data augmented as described in your paper? And by the way, how many images are in the dataset? Thanks.

An issue with the arguments

Hi,
I just noticed that there is a conflict between the two arguments "data" and "decoder", as both of them use -d as their default short flag.
This causes an error message when running main.py.
Thank you.
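A minimal sketch of the fix: only one option can own the -d short flag, so one of the two has to give it up (flag names here are illustrative, not necessarily the repo's final choice):

    import argparse

    parser = argparse.ArgumentParser()
    parser.add_argument('--data', metavar='DATA', default='nyudepthv2',
                        help='dataset name; no short flag, so it no longer clashes')
    parser.add_argument('-d', '--decoder', default='deconv3',
                        help='decoder type keeps -d to itself')
    args = parser.parse_args(['--data', 'kitti', '-d', 'upproj'])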

Question about pretrained model

Hi, thanks for sharing the code. These days I'm trying your code, but I found that Lg10=nan(nan) and I couldn't get the net to train. I guess it was because I didn't use a pretrained model. I wonder whether the ImageNet-pretrained model is necessary when using this version?

How is the loss calculated for the KITTI dataset?

I would like to know how the loss is calculated for the KITTI dataset, since the depth information comes from LiDAR and is sparse (18k/208k in the paper).
Does it only calculate the loss for those predictions that have a corresponding ground truth?
If so, how do you make sure that predictions without a corresponding ground truth stay within a reasonable range?
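For reference, a masked L1 loss of the kind typically used with sparse LiDAR ground truth only penalizes pixels where a measurement exists; predictions without ground truth contribute nothing to the loss (and are therefore not explicitly constrained). A sketch, not necessarily identical to this repo's criteria:

    import torch.nn as nn

    class MaskedL1Loss(nn.Module):
        def forward(self, pred, target):
            valid = (target > 0).detach()       # LiDAR provides depth only at these pixels
            diff = target[valid] - pred[valid]  # predictions without GT are ignored entirely
            return diff.abs().mean()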

How can I use this repo on Windows OS

I saw that the lambda function doesn't work on Windows OS, so I changed it to 0.

But then work_id becomes a problem. How can I run this repo on Windows OS?
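On Windows, DataLoader workers are started with spawn, so everything handed to them, including worker_init_fn, has to be picklable, and a lambda is not. A sketch of a drop-in replacement using a top-level function (the surrounding DataLoader arguments follow the snippet quoted in the "Same Training Data on Resume" issue below):

    import numpy as np
    import torch

    def worker_init_fn(work_id):
        np.random.seed(work_id)     # same behaviour as the original lambda, but picklable

    train_loader = torch.utils.data.DataLoader(
        train_dataset, batch_size=args.batch_size, shuffle=True,
        num_workers=args.workers, pin_memory=True, sampler=None,
        worker_init_fn=worker_init_fn)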

Problem with uniform sampling

Hi. I found a discrepancy between the implementation of uniform sampling in dense_to_sparse.py and the method discussed in the paper. Basically, the paper suggests that the Bernoulli probability should be applied only to valid depth pixels from the ground truth, whereas the code samples from all pixels. For this reason, the input contains some invalid depth values (the ones that are missing in the ground truth, with values of zero in the input depth matrix), which are meaningless during training. I've changed the code and it apparently yields better results than the original implementation. Here's the modification:

Original: (screenshot not preserved)

Modified: (screenshot not preserved)

It might not be the most efficient implementation, but I think the overall idea is correct. Could you please take a look? Thanks
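Since the screenshots are no longer visible, here is a sketch of the described change; the function and variable names only loosely follow dense_to_sparse.py and are partly assumptions:

    import numpy as np

    def dense_to_sparse_uniform(depth, num_samples):
        valid = depth > 0                                    # pixels with ground-truth depth
        n_valid = max(int(valid.sum()), 1)
        prob = float(num_samples) / n_valid                  # Bernoulli probability over valid pixels only
        mask_keep = np.random.uniform(0, 1, depth.shape) < prob
        return mask_keep & valid                             # never sample where GT is missing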

Same Training Data on Resume

When creating the train-loader, every worker is initialized with its ID as a random seed.

    if not args.evaluate:
        train_loader = torch.utils.data.DataLoader(
            train_dataset, batch_size=args.batch_size, shuffle=True,
            num_workers=args.workers, pin_memory=True, sampler=None,
            worker_init_fn=lambda work_id:np.random.seed(work_id))

When I decide to train the model for more than 15 epochs (i.e. with the --resume and --epochs arguments), the work_ids will be the same as in the first training, leading to essentially training the network with the same data twice.

I would suggest modifying the above code, e.g. to:

    worker_init_fn=lambda work_id: np.random.seed(work_id + args.epochs))

to generate new training data when resuming the training.

License for repo

Hi, I am experimenting with your work here (which is great, by the way) and I noticed that this repo (PyTorch) does not have a license (as opposed to the original Torch version). Any reason for that? I just thought it would be great to have one so that people (like myself) are more comfortable playing with the code, making changes, etc. Have you considered adding a license to this version?

Thanks

Implementing SLAM

Hello everyone, I want to implement monocular SLAM. Using only RGB images, can I use the same method to implement that? Where would the sparse depth come from?

Training Stops after first Epoch

Hi,
I am trying to train the model with
python3 main.py -a resnet50 -d deconv3 -m rgbd -s 100
on the nyudepthv2 dataset.

Here is the output of the program right after the first epoch is finished:


Train Epoch: 0 [5920/5948]      t_Data=0.004(0.004) t_GPU=0.093(0.093) RMSE=0.41(0.54) MAE=0.27(0.38) Delta1=0.888(0.762) REL=234.655(1369143557398969.500) Lg10=0.052(nan) 
=> output: results/NYUDataset.modality=rgbd.nsample=100.arch=resnet50.decoder=deconv3.criterion=l1.lr=0.01.bs=8
Train Epoch: 0 [5930/5948]      t_Data=0.004(0.004) t_GPU=0.093(0.093) RMSE=0.33(0.54) MAE=0.24(0.38) Delta1=0.941(0.762) REL=0.351(1366834725411658.500) Lg10=0.046(nan) 
=> output: results/NYUDataset.modality=rgbd.nsample=100.arch=resnet50.decoder=deconv3.criterion=l1.lr=0.01.bs=8
Train Epoch: 0 [5940/5948]      t_Data=0.004(0.004) t_GPU=0.091(0.093) RMSE=0.29(0.54) MAE=0.18(0.38) Delta1=0.965(0.763) REL=0.074(1366117206601379.500) Lg10=0.030(nan) 

Traceback (most recent call last):
  File "main.py", line 355, in <module>
    main()
  File "main.py", line 195, in main
    result, img_merge = validate(val_loader, model, epoch)
  File "main.py", line 299, in validate
    img_merge = utils.merge_into_row(input, target, depth_pred)
  File "/upb/departments/pc2/scratch/maaft/sparse-to-dense/utils.py", line 15, in merge_into_row
    img_merge = np.hstack([rgb, depth, pred])
  File "/root/anaconda3/lib/python3.6/site-packages/numpy/core/shape_base.py", line 288, in hstack
    return _nx.concatenate(arrs, 1)
ValueError: all the input array dimensions except for the concatenation axis must match exactly

GPU is a GTX1080ti.

Any idea what to do?

Best regards,
maaft
