vincentfung13 / mine
Code and models for our ICCV 2021 paper "MINE: Towards Continuous Depth MPI with NeRF for Novel View Synthesis"
License: MIT License
Hi,
I was wondering if you plan to release the KITTI training code at some point?
Apart from this, are the released model checkpoints all pretrained on ImageNet? Thanks!
Hi,
Thanks for sharing your work! I am wondering when you will release the dataset pipeline for KITTI raw and the other datasets.
By the way, how should the network be evaluated on each dataset? And what is the reported performance on the LLFF dataset? I couldn't find it in the paper.
Thanks!
Dear authors:
Thanks for your impressive work. I found the operation "torch.cumprod" in the code:

def plane_volume_rendering(rgb_BS3HW, sigma_BS1HW, xyz_BS3HW, is_bg_depth_inf):
    ...
    transparency_acc = torch.cumprod(transparency + 1e-6, dim=1)  # BxSx1xHxW

However, I can't find an equation containing a cumprod operation in the paper "MINE: ...". Which formula should I refer to?
Thanks a lot.
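For context, here is a minimal sketch of how cumprod realizes the accumulated transmittance in NeRF-style discrete volume rendering. Tensor names follow the repo's BxSxCxHxW convention, but the distance computation and the exact shift are assumptions for illustration, not the repo's exact code:

```python
import torch

def plane_volume_rendering_sketch(rgb_BS3HW, sigma_BS1HW, xyz_BS3HW):
    # Distance between consecutive planes along the ray (illustrative; the
    # actual repo derives these distances from the plane depths differently).
    dists = torch.norm(xyz_BS3HW[:, 1:] - xyz_BS3HW[:, :-1], dim=2, keepdim=True)
    dists = torch.cat([dists, dists[:, -1:]], dim=1)  # BxSx1xHxW

    # Per-plane transmittance exp(-sigma_i * delta_i) and opacity alpha_i.
    transparency = torch.exp(-sigma_BS1HW * dists)
    alpha = 1.0 - transparency

    # T_i = prod_{j<i} exp(-sigma_j * delta_j): the product term of the
    # discrete rendering equation, computed with cumprod over the S planes.
    # The 1e-6 guards against exact zeros killing gradients downstream.
    transparency_acc = torch.cumprod(transparency + 1e-6, dim=1)
    transparency_acc = torch.cat(
        [torch.ones_like(transparency_acc[:, 0:1]), transparency_acc[:, :-1]],
        dim=1)

    weights = transparency_acc * alpha               # BxSx1xHxW
    rgb_out = torch.sum(weights * rgb_BS3HW, dim=1)  # Bx3xHxW
    return rgb_out, weights
```

In other words, the cumprod corresponds to the product inside the accumulated transmittance T_i = exp(-sum_{j<i} sigma_j * delta_j) of the discrete volume rendering equation (Eq. 3 of the original NeRF paper), even though the MINE paper does not write it out as a cumulative product.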
Hello,
I would like to congratulate you on such great work!
Are the hyperparameters for the kitti_raw dataset included in params_kitti_raw.yaml the same ones used to reach the results in the paper or should they be changed?
Hi,
I noticed in your code that there is an option to train MINE with multiple images as input. In that case, there is no scale ambiguity, right? Can you give an example of a data-loader for that case?
Hi,
Thanks for your great work!
How should I process my own data if I want to train on it?
I see the LLFF data includes cameras.bin, images.bin, points3D.bin, etc. How do I generate these files?
Could you share the code for that?
Thanks.
I've been trying to reproduce your results on the KITTI Raw dataset using the published code, and I used the code here to create the same splits and preprocessing described in the paper. I ran an evaluation with the pretrained KITTI weights (32 layers), but the results below don't align with those reported in the paper.
[2021-11-10 18:40:17,526 synthesis_task_kitti.py] Evaluation finished, average losses:
[2021-11-10 18:40:17,531 synthesis_task_kitti.py] val_loss_rgb_src 0.011722 (0.013352)
[2021-11-10 18:40:17,531 synthesis_task_kitti.py] val_loss_ssim_src 0.018813 (0.022328)
[2021-11-10 18:40:17,531 synthesis_task_kitti.py] val_loss_rgb_tgt 0.058807 (0.064112)
[2021-11-10 18:40:17,531 synthesis_task_kitti.py] val_loss_ssim_tgt 0.343890 (0.348406)
[2021-11-10 18:40:17,531 synthesis_task_kitti.py] val_lpips_tgt 0.214107 (0.253099)
[2021-11-10 18:40:17,531 synthesis_task_kitti.py] val_psnr_tgt 18.623119 (18.415305)
I've also attached the synthesized target and source images. There could be an issue with the KITTI data loader I created, so I can share it with you to help pinpoint what's causing the discrepancy. Alternatively, I would appreciate it if you could share your KITTI data loader so I can trace the error myself.
Hi,
Image normalization is performed by "img_transforms" when loading images in "nerf_dataset.py". Why is the input image normalized again in the ResnetEncoder forward step?
Have you tried estimated depth evaluation on KITTI dataset?
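For reference, here is a hedged sketch of the two normalization steps this question is about. The exact constants are assumptions: the dataset transform is assumed to be standard torchvision-style ImageNet normalization, and the encoder step follows the monodepth2-style ResnetEncoder, which applies its own coarse (x - 0.45) / 0.225 rescaling in forward:

```python
import torch

# Assumed step 1 (img_transforms in nerf_dataset.py): torchvision-style
# ImageNet normalization of an image already scaled to [0, 1].
IMAGENET_MEAN = torch.tensor([0.485, 0.456, 0.406]).view(1, 3, 1, 1)
IMAGENET_STD = torch.tensor([0.229, 0.224, 0.225]).view(1, 3, 1, 1)

def dataset_normalize(img_01):
    return (img_01 - IMAGENET_MEAN) / IMAGENET_STD

# Assumed step 2 (ResnetEncoder.forward, monodepth2 style): a second,
# coarser normalization with a single mean/std shared by all channels.
def encoder_normalize(x):
    return (x - 0.45) / 0.225
```

If both steps run, the encoder sees doubly normalized inputs; the second step only makes sense if the encoder expects raw [0, 1] input, which is exactly what the question is probing.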
This is great work!
When I train on my own photo set with the LLFF params, the code reports one of two errors: "assert len(xyzs) >= visible_points_count" in nerf_dataset.py, or "Matrix inverse contains nan!" in utils.py. Some data trains successfully, some data triggers the first error, and some triggers the second.
I would like to know whether the problem lies with the captured training data or whether the code can be improved. If the problem is with the captured dataset, how should the images be captured correctly?
Hi,
Thanks for your interesting work.
I have a question regarding training data but I seem not to be able to find it in the paper.
Do you need ground truth depth maps during training or not?
Say I give you a purely image dataset like CIFAR-10: can you run your method on this data, or does it need to contain "additional" information? If so, what is this "additional" information?
I know that during inference you only need the image, but I want to know what information is required during training.
Sincerely,
Hadi.
I am training on the LLFF dataset with two 2080 Ti GPUs, but it reports "out of memory". I changed the batch size from 2 to 1 in the config file, but it still doesn't work. What should I do?
Hi,
Thank you for the fantastic work! I have two small questions regarding model evaluation.
KITTI raw data split
Section 4.1 mentions that 20 city sequences from KITTI Raw are used for training and 4 sequences for testing. However, there are 28 city sequences in KITTI Raw in total. Do you use the remaining 4 sequences anywhere in the pipeline? Are the 20 training sequences and 4 test sequences exactly the same as those used in Tulsiani 2018, as implemented here?
LPIPS computation
You computed LPIPS here. According to the dataloader implemented here, your inputs to LPIPS are in the range [0, 1], while LPIPS expects inputs in [-1, 1], as mentioned in their doc. Am I missing something, or should the inputs indeed be normalized to get the correct LPIPS score?
Thank you in advance for the time.
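For anyone hitting the same question: a minimal sketch of rescaling [0, 1] images into the [-1, 1] range that LPIPS expects by default. (The lpips package also exposes a `normalize=True` flag on its forward call that accepts [0, 1] inputs directly, if I recall its API correctly; the call in the comment below is illustrative, not the repo's code.)

```python
import torch

def to_lpips_range(img_01):
    """Map an image tensor from [0, 1] to the [-1, 1] range LPIPS expects."""
    return img_01 * 2.0 - 1.0

# Illustrative usage, assuming the richzhang/PerceptualSimilarity package:
#   lpips_fn = lpips.LPIPS(net='vgg')
#   score = lpips_fn(to_lpips_range(pred), to_lpips_range(gt))
```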
When I run the demo, I get this error: No module named 'kornia'.
When running inference on a 1060 3GB, there is not enough GPU memory. How should I modify the code? Thanks.
I have some questions about the equations in the paper.
I think those equations should be corrected.
If I misunderstood something, please let me know.
(8) The parenthesis position seems off.
[equation in the paper]
[expected equation]
(12) The scale factors defined in MPI and MINE are in a reciprocal relationship, but the equations do not reflect the difference.
[equation in the paper]
[expected equation]
Hi,
there is a qualitative comparison with single-view MPI on the KITTI dataset in your paper,
but I cannot find their pretrained KITTI model in their repository.
Did you train their model yourself to get the qualitative results?
Could you provide me a copy of these qualitative results? (just for academic purposes)
Thank you.
In the operations/homography_sampler.py file, lines 107-108 calculate the plane homography warping matrix between the src camera and the tgt camera, following the plane-induced homography equation.
However, K_inv should be K_tgt_inv, not K_src_inv, and K should be K_src. The issue does not surface when K_tgt = K_src, but it causes errors when the intrinsics are not equal. The corrected line would be:
H_tgt_src = torch.matmul(K_src, torch.matmul(R_tnd, K_tgt_inv))
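For reference, a sketch of the plane-induced homography with the intrinsics placed as the issue suggests. Variable names are illustrative: (R, t) is assumed to map target-frame points into the source frame, and n, d define the plane in the target frame; the repo builds the R_tnd = R - t n^T / d term elsewhere.

```python
import torch

def plane_homography(K_src, K_tgt, R, t, n, d):
    """Plane-induced homography mapping target pixels to source pixels:
    x_src ~ K_src @ (R - t n^T / d) @ K_tgt^{-1} @ x_tgt
    """
    R_tnd = R - torch.matmul(t.view(3, 1), n.view(1, 3)) / d
    return torch.matmul(K_src, torch.matmul(R_tnd, torch.inverse(K_tgt)))
```

With K_tgt = K_src the swapped intrinsics cancel out, which is why the bug only surfaces when the two cameras have different intrinsics, exactly as the issue observes.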
Hello authors, thank you for your great work.
You noted in the README:
Apart from the LLFF dataset, we experimented on the RealEstate10K, KITTI Raw and the Flowers Light Fields datasets - the data pre-processing codes and training flow for these datasets will be released later.
I believe the last update on this was in October 2021, so I am following up. Will you be able to release the dataloaders/code soon?
All the best,
Thank you for your nice work!
If I want to run your code, do I need the 48 V100 GPUs mentioned in the paper?
What are the minimum requirements to run this code?
Thanks in advance.
Thanks for open-sourcing the great work!
I look forward to discussing this work with anyone interested in it. Thank you.
WeChat link as follows:
https://wx3.sinaimg.cn/mw690/b0e12477gy1gwcfjj9ez9j20tc10xjuv.jpg
It is a great honor to read your excellent work, but I have a question: can this method process video and live streams in real time?
Thanks.