
vincentfung13 / mine

406 stars · 16 watchers · 43 forks · 7.41 MB

Code and models for our ICCV 2021 paper "MINE: Towards Continuous Depth MPI with NeRF for Novel View Synthesis"

License: MIT License

Python 98.64% Shell 1.36%
deep-learning novel-view-synthesis nerf depth-estimation 3d-reconstruction computer-vision 3d-vision

mine's People

Contributors: vincentfung13

mine's Issues

KITTI training code

Hi,
I was wondering whether you plan to release the KITTI training code at some point.
Apart from this, are the released model checkpoints all pretrained on ImageNet? Thanks!

Question about KITTI raw dataset

Hi,
Thanks for sharing your work! I am wondering when you will release the dataset pipeline for KITTI Raw and the other datasets.
By the way, how should the network be evaluated on each dataset? And what is the reported performance on the LLFF dataset? I couldn't find it in the paper.
Thanks!

Correspondence of formula and code (torch.cumprod)

Dear authors,
Thanks for your impressive work. I found the operation torch.cumprod in the code:

```python
def plane_volume_rendering(rgb_BS3HW, sigma_BS1HW, xyz_BS3HW, is_bg_depth_inf):
    ...
    transparency_acc = torch.cumprod(transparency + 1e-6, dim=1)  # BxSx1xHxW
```

However, I can't find an equation containing a cumulative product in the paper "MINE: ...". Which formula does this operation correspond to?
Thanks a lot.
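For readers with the same question: the cumulative product is the discrete accumulated transmittance from the volume-rendering equation, T_s = ∏_{j≤s} exp(−σ_j δ_j); papers often write the equivalent form exp(−Σ_j σ_j δ_j), so no explicit cumprod appears in print. A minimal sketch of the correspondence, with illustrative names rather than the repo's exact code:

```python
import torch

def accumulated_transmittance(sigma_BS1HW: torch.Tensor, dist_BS1HW: torch.Tensor) -> torch.Tensor:
    # Per-plane transparency: exp(-sigma_s * delta_s) for each of the S planes
    transparency = torch.exp(-sigma_BS1HW * dist_BS1HW)      # BxSx1xHxW
    # Accumulated transmittance T_s = prod_{j<=s} transparency_j -- a running
    # product over the plane dimension, which is exactly torch.cumprod
    return torch.cumprod(transparency + 1e-6, dim=1)         # BxSx1xHxW
```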

Hyperparameters for training

Hello,
I would like to congratulate you on such great work!

Are the hyperparameters for the kitti_raw dataset included in params_kitti_raw.yaml the same ones used to reach the results in the paper or should they be changed?

Training on multiple images per scene

Hi,

I noticed in your code that there is an option to train MINE with multiple images as input. In that case, there is no scale ambiguity, right? Can you give an example of a data-loader for that case?
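For readers looking for a starting point: below is a minimal, hypothetical sketch of a multi-view-per-scene loader. The field names and tensor layout are illustrative assumptions, not the repo's actual interface.

```python
import torch
from torch.utils.data import Dataset

class MultiViewSceneDataset(Dataset):
    """Hypothetical loader returning several posed views per scene.

    Each scene dict is assumed to hold:
      images:     float array, N x H x W x 3, in [0, 1]
      intrinsics: float array, N x 3 x 3
      extrinsics: float array, N x 4 x 4 (camera-from-world)
    """

    def __init__(self, scenes):
        self.scenes = scenes

    def __len__(self):
        return len(self.scenes)

    def __getitem__(self, idx):
        s = self.scenes[idx]
        return {
            "images": torch.as_tensor(s["images"]).permute(0, 3, 1, 2).float(),  # N x 3 x H x W
            "K": torch.as_tensor(s["intrinsics"]).float(),
            "G_cam_world": torch.as_tensor(s["extrinsics"]).float(),
        }
```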

How to prepare my dataset?

Hi,
thanks for your great work!
If I want to train on my own data, how should I process it?
I see the LLFF data has cameras.bin, images.bin, points3D.bin, etc. How do I generate these files?
Could you share the code for that?
Thanks.
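For context, cameras.bin, images.bin, and points3D.bin are COLMAP's sparse-reconstruction outputs; LLFF-style datasets are typically produced by running COLMAP over the captured images. A minimal sketch of the generic COLMAP CLI pipeline, driven from Python (paths are placeholders; this is not the repo's own preprocessing script):

```python
import os
import subprocess

def run_colmap(image_dir: str, workspace: str) -> None:
    """Generic COLMAP sparse reconstruction; writes cameras.bin, images.bin,
    and points3D.bin under <workspace>/sparse/0."""
    db = os.path.join(workspace, "database.db")
    sparse = os.path.join(workspace, "sparse")
    os.makedirs(sparse, exist_ok=True)
    # 1. Detect SIFT features in every image
    subprocess.run(["colmap", "feature_extractor",
                    "--database_path", db, "--image_path", image_dir], check=True)
    # 2. Match features across all image pairs
    subprocess.run(["colmap", "exhaustive_matcher", "--database_path", db], check=True)
    # 3. Incremental sparse reconstruction (bundle adjustment included)
    subprocess.run(["colmap", "mapper", "--database_path", db,
                    "--image_path", image_dir, "--output_path", sparse], check=True)
```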

Reproducibility Discrepancy

I've been trying to reproduce your results on the KITTI Raw dataset using the published code, and I used the code here to create the same splits and preprocessing described in the paper. I ran an evaluation using the pretrained KITTI weights (32 layers), but I got the following results, which don't align with those in the paper.

```
[2021-11-10 18:40:17,526 synthesis_task_kitti.py] Evaluation finished, average losses:
[2021-11-10 18:40:17,531 synthesis_task_kitti.py] val_loss_rgb_src 0.011722 (0.013352)
[2021-11-10 18:40:17,531 synthesis_task_kitti.py] val_loss_ssim_src 0.018813 (0.022328)
[2021-11-10 18:40:17,531 synthesis_task_kitti.py] val_loss_rgb_tgt 0.058807 (0.064112)
[2021-11-10 18:40:17,531 synthesis_task_kitti.py] val_loss_ssim_tgt 0.343890 (0.348406)
[2021-11-10 18:40:17,531 synthesis_task_kitti.py] val_lpips_tgt 0.214107 (0.253099)
[2021-11-10 18:40:17,531 synthesis_task_kitti.py] val_psnr_tgt 18.623119 (18.415305)
```

I've also attached the synthesized target and source images. There could be an issue with the KITTI data loader I created, so I can share it with you to help pinpoint what's causing the discrepancy. If not, I would appreciate it if you could share your KITTI data loader so I can trace the error myself.

[attachments: src_final, tgt_1]

Why is the image normalized twice?

Hi,
Image normalization is performed by "img_transforms" when an image is loaded in "nerf_dataset.py". Why is the input image normalized again in the "ResnetEncoder forward step"?
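For context, the pattern described here is common in monodepth2-style encoders: the dataset transform scales images to [0, 1], and the encoder re-centers them inside its forward pass to roughly match ImageNet statistics. A minimal sketch of the two steps, assuming the monodepth2 convention (the exact values are not verified against this repo):

```python
import torch
import torchvision.transforms as T

# Step 1: dataset-side transform (the img_transforms stage): PIL image -> [0, 1] tensor
img_transforms = T.Compose([T.ToTensor()])  # ToTensor alone already maps pixels to [0, 1]

# Step 2: encoder-side re-normalization at the start of forward()
def encoder_input_norm(x: torch.Tensor) -> torch.Tensor:
    # Re-centers [0, 1] inputs to roughly zero mean / unit variance,
    # a cheap stand-in for full per-channel ImageNet normalization
    return (x - 0.45) / 0.225
```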

Question about training on my own photo set

This is great work!
When I train on my own photo set with the LLFF params, the code raises one of two errors: "assert len(xyzs) >= visible_points_count" in nerf_dataset.py, or "Matrix inverse contains nan!" in utils.py. Some data trains successfully, some raises the first error, and some the second.
I would like to know whether the problem lies in how the training data was captured, or whether it can be addressed in the code. If the captured dataset is the problem, how should the images be captured correctly?

Question about Training Data Requirements

Hi,
Thanks for your interesting work.

I have a question regarding the training data, but I can't seem to find the answer in the paper.
Do you need ground-truth depth maps during training or not?
Say I give you a purely image dataset like CIFAR-10: can you run your method on this data, or does it need to contain "additional" information? If so, what is this "additional" information?

I know that during inference you only need the image, but I want to know what information is required during training.

Sincerely,
Hadi.

out of memory

I trained on the LLFF dataset with two 2080 Ti GPUs, but it reports "out of memory". I changed the batch size from 2 to 1 in the config file, but it still doesn't work. What should I do?
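One generic workaround once the batch size is already 1 is activation checkpointing, which recomputes activations in the backward pass instead of storing them; if the config exposes it, reducing the number of MPI planes (e.g. 64 to 32) is another knob to try. Below is a sketch of the standard PyTorch pattern — a general technique, not something this repo necessarily exposes (use_reentrant=False requires a recent PyTorch):

```python
import torch
from torch.utils.checkpoint import checkpoint

class CheckpointedBlock(torch.nn.Module):
    """Wraps any sub-module so its activations are recomputed during backward
    instead of stored, trading extra compute for lower peak memory."""

    def __init__(self, block: torch.nn.Module):
        super().__init__()
        self.block = block

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return checkpoint(self.block, x, use_reentrant=False)
```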

KITTI split and LPIPS computation

Hi,

Thank you for the fantastic work! I have two small questions regarding model evaluation.

  1. KITTI raw data split
    Section 4.1 mentions that 20 city sequences from KITTI Raw are used for training and 4 sequences for testing. However, there are 28 city sequences in KITTI Raw in total. Do you use the remaining 4 sequences anywhere in the pipeline? And are the 20 training sequences and 4 test sequences exactly the same as those used in Tulsiani 2018, as implemented here?

  2. LPIPS computation
    You computed LPIPS here. According to the dataloader implemented here, your inputs to LPIPS are in the range [0, 1], while LPIPS expects inputs in the range [-1, 1], as mentioned in their docs. Am I missing something, or should the inputs indeed be normalized to obtain the correct LPIPS score? (See the rescaling sketch after this issue.)

Thank you in advance for the time.
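For reference, the pip lpips package (richzhang/PerceptualSimilarity) expects inputs in [-1, 1] by default, and its forward pass also accepts a normalize flag that rescales [0, 1] inputs internally. A minimal sketch of both options, assuming the lpips package is installed:

```python
import torch
import lpips

loss_fn = lpips.LPIPS(net='vgg')  # or net='alex'

img0 = torch.rand(1, 3, 256, 256)  # in [0, 1], as a ToTensor-style loader produces
img1 = torch.rand(1, 3, 256, 256)

# Option 1: rescale [0, 1] -> [-1, 1] manually before calling LPIPS
d_manual = loss_fn(img0 * 2 - 1, img1 * 2 - 1)

# Option 2: let the library rescale (normalize=True tells it the inputs are in [0, 1])
d_auto = loss_fn(img0, img1, normalize=True)
```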

Questions about Eq. (3), Eq. (8), and Eq. (12) in the paper

I have some questions about the equations in the paper.
I think those equations should be corrected.
If I misunderstood something, please let me know.

(3)
In the paper: [equation screenshot]
Expected: [equation screenshot]

(8) The parenthesis position looks wrong.
In the paper: [equation screenshot]
Expected: [equation screenshot]

(12) The scale factors defined in MPI and in MINE are reciprocals of each other, but the equations do not reflect the difference.
In the paper: [equation screenshot]
Expected: [equation screenshot]

Qualitative comparison on KITTI

Hi,
your paper includes a qualitative comparison with single-view MPI on the KITTI dataset,
but I cannot find their pretrained KITTI model in their repository.
Did you train their model yourself to obtain the qualitative results?
Could you provide me a copy of these qualitative results (for academic purposes only)?
Thank you.

Unable to train

[screenshot]
Without the ResNet-50 and VGG-16 pretrained models downloaded, the loss is 0 and training cannot proceed.

Implementation detail of plane homography warping between the src and tgt cameras

In the operations/homography_sampler.py file,
[code screenshot]
lines 107-108 compute the plane homography warping matrix between the src camera and the tgt camera, following the equation:
[equation screenshot]
However, K_inv should be K_tgt_inv, not K_src_inv, and K should be K_src. The issue does not surface when K_tgt = K_src, but it causes errors when the intrinsics are not equal:

```python
H_tgt_src = torch.matmul(K_src, torch.matmul(R_tnd, K_tgt_inv))
```
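For context, the proposed fix matches the textbook plane-induced homography for backward warping: to sample source pixels at target pixel coordinates, H_tgt→src = K_src (R − t nᵀ / a) K_tgt⁻¹, where (n, a) parametrize the MPI plane and (R, t) is the relative pose. A hedged sketch with illustrative shapes (frame conventions and the sign of t must match the repo's; R_tnd below stands for the bracketed term in the snippet above):

```python
import torch

def plane_homography(K_src, K_tgt_inv, R, t, n, a):
    """Textbook plane-induced homography (illustrative, not the repo's code).

    Shapes: K_src, K_tgt_inv, R: Bx3x3; t: Bx3x1; n: Bx1x3; a: Bx1x1,
    for the plane n^T x = a expressed in the appropriate camera frame.
    """
    R_tnd = R - torch.matmul(t, n) / a          # R - t n^T / a, Bx3x3
    return torch.matmul(K_src, torch.matmul(R_tnd, K_tgt_inv))
```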

Preprocessing and Training Flow for Other Datasets

Hello authors, thank you for your great work.

You noted in the README:

Apart from the LLFF dataset, we experimented on the RealEstate10K, KITTI Raw and the Flowers Light Fields datasets - the data pre-processing codes and training flow for these datasets will be released later.

I believe the last update on this was in October 2021, so I am following up. Will you be able to release the dataloaders/code soon?

All the best,

Minimum hardware requirements

Thank you for your nice work!

If I want to run your code, do I need the 48 V100 GPUs you mentioned in the paper?

What are the minimum requirements to run this code?

Thanks in advance.

Real-time rendering question

It is a great pleasure to read your excellent work, but I have one question: can this method process video and live streams in real time?
Thanks.
