donydchen / mvsplat

🌊 [ECCV'24] MVSplat: Efficient 3D Gaussian Splatting from Sparse Multi-View Images

Home Page: https://donydchen.github.io/mvsplat

License: Other

Shell 0.88% Python 99.12%

cost-volume gaussian-splatting novel-view-synthesis feed-forward-gaussian-splatting eccv2024

mvsplat's Issues

Error in evaluation with n_contexts=3

Hi, thanks for the great work. I ran your DTU evaluation with the view sampler "evaluation_index_dtu_nctx3.json" and it results in the following:

qkv = rearrange(qkv, "(v b) n t -> b n (v t)", v=self.n_frames)

File "/home/shubhendujena/anaconda3/envs/mvsplat/lib/python3.10/site-packages/einops/einops.py", line 591, in rearrange
return reduce(tensor, pattern, reduction="rearrange", **axes_lengths)
File "/home/shubhendujena/anaconda3/envs/mvsplat/lib/python3.10/site-packages/einops/einops.py", line 533, in reduce
raise EinopsError(message + "\n {}".format(e))
einops.EinopsError: Error while processing rearrange-reduction pattern "(v b) n t -> b n (v t)".
Input tensor shape: torch.Size([3, 384, 256]). Additional info: {'v': 2}.
Shape mismatch, can't divide axis of length 3 in chunks of 2

I'd be grateful if you could help me fix this

Thanks in advance
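
For readers hitting the same error: the pattern `(v b) n t` splits the leading axis into `v` views times `b` batch items, so the axis length must be a multiple of `v`. The traceback shows a leading axis of 3 with `v=2`, which suggests the model is still configured for two context views while the sampler supplies three. The sketch below reproduces only that divisibility check in plain Python; `split_view_axis` is a hypothetical helper for illustration, not part of einops or MVSplat.

```python
def split_view_axis(shape, n_frames):
    """Mimic the "(v b) n t" split: return (v, b, n, t) or raise if it cannot divide."""
    leading, n, t = shape
    if leading % n_frames != 0:
        raise ValueError(
            f"can't divide axis of length {leading} in chunks of {n_frames}"
        )
    return (n_frames, leading // n_frames, n, t)


# Shape reported in the traceback: three context views, batch size 1.
shape = (3, 384, 256)

print(split_view_axis(shape, n_frames=3))  # the split succeeds when v matches the view count

try:
    split_view_axis(shape, n_frames=2)     # the configuration from the traceback
except ValueError as err:
    print("fails as in the issue:", err)
```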

out of memory

Hi, thanks for your amazing work! I have run into a problem: CUDA runs out of memory. I have 4 A100s (40 GB each), but training still reports out of memory, even after reducing the batch size. I would actually prefer to train on a single A100, but I don't know how to change the code to use only one GPU, and with 40 GB of memory I guess the batch size needs to be changed too.
I am still a first-year student with no one to ask and no clues on how to run it successfully. (Please forgive my poor English.)
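
A possible starting point, sketched in Python: restrict the process to one GPU through `CUDA_VISIBLE_DEVICES` and lower the per-GPU batch size with the `data_loader.train.batch_size` override that appears in other commands on this page. The batch size of 2 is an arbitrary guess for a 40 GB card, the helper name is hypothetical, and the assumption that the trainer uses every visible GPU is not confirmed in this thread.

```python
import os
import subprocess


def build_single_gpu_command(batch_size, gpu_index=0, experiment="re10k"):
    """Return (argv, env) for a single-GPU training run. Helper name is hypothetical."""
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)  # hide all other GPUs from PyTorch
    argv = [
        "python", "-m", "src.main",
        f"+experiment={experiment}",
        f"data_loader.train.batch_size={batch_size}",
    ]
    return argv, env


if __name__ == "__main__":
    argv, env = build_single_gpu_command(batch_size=2)
    print("CUDA_VISIBLE_DEVICES=" + env["CUDA_VISIBLE_DEVICES"], " ".join(argv))
    # Uncomment to launch from the repository root:
    # subprocess.run(argv, env=env, check=True)
```

If memory still runs out, lowering the batch size further is the simplest knob to try first.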

Got errors when evaluating.

Hi, great job!
I set the CUDA device with "set CUDA_VISIBLE_DEVICES=0" and ran the evaluation command:
python -m src.main +experiment=acid checkpointing.load=checkpoints/acid.ckpt mode=test dataset/view_sampler=evaluation dataset.view_sampler.index_path=assets/evaluation_index_acid.json test.compute_scores=true
but something goes wrong:
Setting up [LPIPS] perceptual loss: trunk [vgg], v[0.1], spatial [off]
Loading model from: C:\Users\user.conda\envs\mvsplat\lib\site-packages\lpips\weights\v0.1\vgg.pth
Initializing distributed: GLOBAL_RANK: 1, MEMBER: 2/2
[W socket.cpp:663] [c10d] The client socket has failed to connect to [DESKTOP-M7IQG5O]:2765 (system error: 10049
Error executing job with overrides: ['+experiment=acid', 'checkpointing.load=checkpoints/acid.ckpt', 'mode=test', 'dataset/view_sampler=evaluation', 'dataset.view_sampler.index_path=assets/evaluation_index_acid.json', 'test.compute_scores=true']
Traceback (most recent call last):
File "F:\Github\mvsplat\src\main.py", line 143, in train
trainer.test(
File "C:\Users\user.conda\envs\mvsplat\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 754, in test
return call._call_and_handle_interrupt(
File "C:\Users\user.conda\envs\mvsplat\lib\site-packages\pytorch_lightning\trainer\call.py", line 43, in _call_and_handle_interrupt
return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs)
File "C:\Users\user.conda\envs\mvsplat\lib\site-packages\pytorch_lightning\strategies\launchers\subprocess_script.py", line 105, in launch
return function(*args, **kwargs)
File "C:\Users\user.conda\envs\mvsplat\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 794, in _test_impl
results = self._run(model, ckpt_path=ckpt_path)
File "C:\Users\user.conda\envs\mvsplat\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 943, in _run
self.strategy.setup_environment()
File "C:\Users\user.conda\envs\mvsplat\lib\site-packages\pytorch_lightning\strategies\ddp.py", line 154, in setup_environment
self.setup_distributed()
File "C:\Users\user.conda\envs\mvsplat\lib\site-packages\pytorch_lightning\strategies\ddp.py", line 203, in setup_distributed
_init_dist_connection(self.cluster_environment, self._process_group_backend, timeout=self._timeout)
File "C:\Users\user.conda\envs\mvsplat\lib\site-packages\lightning_fabric\utilities\distributed.py", line 291, in _init_dist_connection
torch.distributed.init_process_group(torch_distributed_backend, rank=global_rank, world_size=world_size, **kwargs)
File "C:\Users\user.conda\envs\mvsplat\lib\site-packages\torch\distributed\c10d_logger.py", line 74, in wrapper
func_return = func(*args, **kwargs)
File "C:\Users\user.conda\envs\mvsplat\lib\site-packages\torch\distributed\distributed_c10d.py", line 1148, in init_process_group
default_pg, _ = _new_process_group_helper(
File "C:\Users\user.conda\envs\mvsplat\lib\site-packages\torch\distributed\distributed_c10d.py", line 1268, in _new_process_group_helper
raise RuntimeError("Distributed package doesn't have NCCL built in")
RuntimeError: Distributed package doesn't have NCCL built in
I followed the instructions and don't know why this happens. Please help!
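
Some context that may help: the paths in the log show a Windows machine, and PyTorch's Windows builds do not ship NCCL, so a DDP launch that requests the `nccl` backend fails in this way. The log line `MEMBER: 2/2` also suggests that two processes were started even though a single GPU was intended. The sketch below shows one way to choose a backend name by platform; `pick_backend` is a hypothetical helper, and how the choice would be passed to this repository's trainer is not shown in the issue.

```python
import sys


def pick_backend(platform_name=sys.platform, cuda_available=True):
    """Choose a torch.distributed backend name for the given platform."""
    if platform_name.startswith("win"):
        return "gloo"   # NCCL is not built into PyTorch on Windows
    if cuda_available:
        return "nccl"   # usual choice for multi-GPU training on Linux
    return "gloo"       # CPU-only fallback


print(pick_backend("win32"))   # gloo
print(pick_backend("linux"))   # nccl
```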

RuntimeError: unable to write to file </torch_14471_3616880733_1>: No space left on device (28)

Hi, when running the following test command:

python -m src.main +experiment=re10k checkpointing.load=checkpoints/re10k.ckpt mode=test dataset/view_sampler=evaluation dataset.view_sampler.index_path=assets/evaluation_index_re10k_video.json test.save_video=true test.save_image=false test.compute_scores=false

I get this error:

Saving outputs to /home/ali/git/mvsplat/outputs/2024-04-15/11-22-42.
rm: cannot remove 'outputs/local': No such file or directory
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
Setting up [LPIPS] perceptual loss: trunk [vgg], v[0.1], spatial [off]
Loading model from: /opt/conda/envs/mvsplat/lib/python3.10/site-packages/lpips/weights/v0.1/vgg.pth
Restoring states from the checkpoint path at checkpoints/re10k.ckpt
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
Loaded model weights from the checkpoint at checkpoints/re10k.ckpt
Testing: | | 0/? [00:00<?, ?it/s]
ERROR: Unexpected bus error encountered in worker. This might be caused by insufficient shared memory (shm).
Error executing job with overrides: ['+experiment=re10k', 'checkpointing.load=checkpoints/re10k.ckpt', 'mode=test', 'dataset/view_sampler=evaluation', 'dataset.view_sampler.index_path=assets/evaluation_index_re10k_video.json', 'test.save_video=true', 'test.save_image=false', 'test.compute_scores=false']
Traceback (most recent call last):
  File "/home/ali/git/mvsplat/src/main.py", line 143, in train
    trainer.test(
  File "/opt/conda/envs/mvsplat/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 754, in test
    return call._call_and_handle_interrupt(
  File "/opt/conda/envs/mvsplat/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py", line 44, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/opt/conda/envs/mvsplat/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 794, in _test_impl
    results = self._run(model, ckpt_path=ckpt_path)
  File "/opt/conda/envs/mvsplat/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 987, in _run
    results = self._run_stage()
  File "/opt/conda/envs/mvsplat/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1026, in _run_stage
    return self._evaluation_loop.run()
  File "/opt/conda/envs/mvsplat/lib/python3.10/site-packages/pytorch_lightning/loops/utilities.py", line 182, in _decorator
    return loop_run(self, *args, **kwargs)
  File "/opt/conda/envs/mvsplat/lib/python3.10/site-packages/pytorch_lightning/loops/evaluation_loop.py", line 128, in run
    batch, batch_idx, dataloader_idx = next(data_fetcher)
  File "/opt/conda/envs/mvsplat/lib/python3.10/site-packages/pytorch_lightning/loops/fetchers.py", line 133, in __next__
    batch = super().__next__()
  File "/opt/conda/envs/mvsplat/lib/python3.10/site-packages/pytorch_lightning/loops/fetchers.py", line 60, in __next__
    batch = next(self.iterator)
  File "/opt/conda/envs/mvsplat/lib/python3.10/site-packages/pytorch_lightning/utilities/combined_loader.py", line 341, in __next__
    out = next(self._iterator)
  File "/opt/conda/envs/mvsplat/lib/python3.10/site-packages/pytorch_lightning/utilities/combined_loader.py", line 142, in __next__
    out = next(self.iterators[0])
  File "/opt/conda/envs/mvsplat/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 630, in __next__
    data = self._next_data()
  File "/opt/conda/envs/mvsplat/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1345, in _next_data
    return self._process_data(data)
  File "/opt/conda/envs/mvsplat/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1371, in _process_data
    data.reraise()
  File "/opt/conda/envs/mvsplat/lib/python3.10/site-packages/torch/_utils.py", line 694, in reraise
    raise exception
RuntimeError: Caught RuntimeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/opt/conda/envs/mvsplat/lib/python3.10/site-packages/torch/utils/data/_utils/worker.py", line 308, in _worker_loop
    data = fetcher.fetch(index)
  File "/opt/conda/envs/mvsplat/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 42, in fetch
    return self.collate_fn(data)
  File "/opt/conda/envs/mvsplat/lib/python3.10/site-packages/torch/utils/data/_utils/collate.py", line 265, in default_collate
    return collate(batch, collate_fn_map=default_collate_fn_map)
  File "/opt/conda/envs/mvsplat/lib/python3.10/site-packages/torch/utils/data/_utils/collate.py", line 127, in collate
    return elem_type({key: collate([d[key] for d in batch], collate_fn_map=collate_fn_map) for key in elem})
  File "/opt/conda/envs/mvsplat/lib/python3.10/site-packages/torch/utils/data/_utils/collate.py", line 127, in <dictcomp>
    return elem_type({key: collate([d[key] for d in batch], collate_fn_map=collate_fn_map) for key in elem})
  File "/opt/conda/envs/mvsplat/lib/python3.10/site-packages/torch/utils/data/_utils/collate.py", line 127, in collate
    return elem_type({key: collate([d[key] for d in batch], collate_fn_map=collate_fn_map) for key in elem})
  File "/opt/conda/envs/mvsplat/lib/python3.10/site-packages/torch/utils/data/_utils/collate.py", line 127, in <dictcomp>
    return elem_type({key: collate([d[key] for d in batch], collate_fn_map=collate_fn_map) for key in elem})
  File "/opt/conda/envs/mvsplat/lib/python3.10/site-packages/torch/utils/data/_utils/collate.py", line 119, in collate
    return collate_fn_map[elem_type](batch, collate_fn_map=collate_fn_map)
  File "/opt/conda/envs/mvsplat/lib/python3.10/site-packages/torch/utils/data/_utils/collate.py", line 160, in collate_tensor_fn
    storage = elem._typed_storage()._new_shared(numel, device=elem.device)
  File "/opt/conda/envs/mvsplat/lib/python3.10/site-packages/torch/storage.py", line 866, in _new_shared
    untyped_storage = torch.UntypedStorage._new_shared(size * self._element_size(), device=device)
  File "/opt/conda/envs/mvsplat/lib/python3.10/site-packages/torch/storage.py", line 262, in _new_shared
    return cls._new_using_fd_cpu(size)
**RuntimeError: unable to write to file </torch_14471_3616880733_1>: No space left on device (28)**


Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
Testing: |

I am running the test on an RTX 3070 Ti with 8 GB of VRAM, which I reckon is not enough memory for the test to run.
Is it possible to make the test run on a GPU with less memory (for example by reducing the batch size or the number of test videos loaded into GPU memory)?

Thanks for this great work.
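
Note that the failing call is `_new_shared`, which allocates shared memory for DataLoader workers, and the log itself mentions insufficient shared memory (shm), so the limit hit here may be `/dev/shm` rather than GPU memory. A minimal sketch for checking the free space, assuming a Linux system where shared memory is mounted at `/dev/shm`; `free_bytes` is a hypothetical helper.

```python
import os
import shutil


def free_bytes(path="/dev/shm"):
    """Return free space in bytes at `path`, or None if the path does not exist."""
    if not os.path.exists(path):
        return None
    return shutil.disk_usage(path).free


free = free_bytes()
if free is None:
    print("/dev/shm not found on this system")
else:
    print(f"/dev/shm free: {free / 2**20:.0f} MiB")
```

If the value is small (Docker containers default to 64 MB of shared memory), starting the container with a larger `--shm-size` or using fewer DataLoader workers are the usual remedies.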

ghosting issue on n3dv dataset

Hello, I tested on the n3dv dataset using the model and weights you provided and found that all NVS images show ghosting. Since the Gaussians are back-projected into the world coordinate system from two views, I suspect one of two causes: the accuracy of the camera parameters (they are float64 in the n3dv dataset and I converted them to float32), or image scaling that prevents the back-projected Gaussians from aligning accurately in depth (I scaled the images from 2k×2k to 512×512). Looking forward to your reply!
The following images are, respectively, the NVS RGB image and the depth map from the source view:

Why are fx and fy adjusted when center-cropping the image patch?

Hello. When center-cropping the image, I think the intrinsics cx and cy are what should be adjusted, but your implementation adjusts fx and fy instead. Is this a bug in the implementation?

mvsplat/src/dataset/shims/patch_shim.py

Lines 15 to 21 in 378ff81

# Center-crop the image.
image = views["image"][:, :, :, row : row + h_new, col : col + w_new]

# Adjust the intrinsics to account for the cropping.
intrinsics = views["intrinsics"].clone()
intrinsics[:, :, 0, 0] *= w / w_new  # fx
intrinsics[:, :, 1, 1] *= h / h_new  # fy
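
One way to read the quoted code, assuming the intrinsics are normalized by image width and height (as the pixelSplat description quoted elsewhere on this page states) and the crop is exactly centered: cropping leaves the focal length in pixels unchanged, so the normalized focal length grows by `w / w_new`, while a principal point at the normalized center stays at 0.5. The sketch below checks this numerically; the function name and the sample numbers are illustrative only.

```python
def center_crop_normalized(fx, fy, cx, cy, w, h, w_new, h_new):
    """Update normalized intrinsics for a center crop from (w, h) to (w_new, h_new)."""
    col = (w - w_new) / 2          # left offset of the crop, in pixels
    row = (h - h_new) / 2          # top offset of the crop, in pixels
    # Focal length in pixels is unchanged, so only the normalizer changes.
    fx_new = fx * w / w_new
    fy_new = fy * h / h_new
    # Convert to pixels, shift by the crop offset, renormalize by the new size.
    cx_new = (cx * w - col) / w_new
    cy_new = (cy * h - row) / h_new
    return fx_new, fy_new, cx_new, cy_new


# Illustrative values: a 640x360 image cropped to 256x256.
fx, fy, cx, cy = center_crop_normalized(0.9, 1.6, 0.5, 0.5, 640, 360, 256, 256)
print(fx, fy)   # focal lengths scale by w / w_new and h / h_new
print(cx, cy)   # a centered principal point remains at 0.5
```

Under these assumptions cx and cy need no update, which would explain why only fx and fy are touched; an off-center principal point would still require the shift shown above.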

How to support the LLFF and Mip-NeRF 360 datasets?

Thank you for MVSplat, I like it very much!

How can the LLFF and Mip-NeRF 360 datasets be supported?

Reproducing results with multiple GPUs

Hi Yuedong, thank you for open-sourcing your great work!

When I trained the model on 3 Nvidia RTX 3090s (batch size 4 per GPU), I got significantly worse results on re10k:

psnr 22.12379274863242
ssim 0.7298626045353773
lpips 0.22073094525619313

Will a smaller batch size or multi-GPU training significantly affect the performance of the model?
By the way, with the official weights I get results consistent with the paper:

psnr 26.386906073201686
ssim 0.8690403559103327
lpips 0.12837660807718004

Difference in diff-gaussian-rasterization-modified

Hi, thanks for the great work. Could you please give some details on what changed in the new diff-gaussian-rasterization package? Would a model trained with the new package be compatible with the old one?

where can I download gmdepth-scale1-resumeflowthings-scannet-5d9d7964.pth?

Concerns about the initial number of 3D Gaussians

Hi, thanks for the great work. The paper mentions D=128, and "After obtaining the multi-view depth predictions, we directly unproject them to 3D point clouds using the camera parameters."
So is the initial number of 3D Gaussians 128*K? Isn't this number too small? Approximately how many 3D Gaussians are there after the training is completed?
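
A rough count, under the assumption (not confirmed in this thread) that D=128 is the number of depth candidates per pixel in the cost volume and that one Gaussian is unprojected per pixel of each context view: the number of Gaussians would then be K × H × W and would not depend on D. The 256×256 resolution below is the input size another issue on this page uses; `num_gaussians` is a hypothetical helper.

```python
def num_gaussians(num_views, height, width, per_pixel=1):
    """Count pixel-aligned Gaussians: one set per pixel of every context view."""
    return num_views * height * width * per_pixel


print(num_gaussians(num_views=2, height=256, width=256))  # 131072
```

Under the same assumption, the count is fixed by the input resolution and the number of views rather than growing during training, since the model predicts the Gaussians in a single feed-forward pass.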

Custom dataset training

Hi, thanks for the great work. I have some questions about custom data training.

In the paper, training on re10k takes only 2 context-view RGB images with their intrinsics and extrinsics as input, and outputs a novel-view RGB image.

1. About znear and zfar: in “dataset_re10k.py” they are set to 1 and 100. Should znear and zfar be modified when training on my custom dataset? What do 1 and 100 mean? Meters?
2. About extrinsics and intrinsics: according to pixelSplat, “Our extrinsics are OpenCV-style camera-to-world matrices. This means that +Z is the camera look vector, +X is the camera right vector, and -Y is the camera up vector. Our intrinsics are normalized, meaning that the first row is divided by image width, and the second row is divided by image height.”
I don’t know the unit of the translation vector T of the extrinsics. Is T in meters? Also, according to your “dataset_re10k.py”, the extrinsics of the raw data are “w2c” and you return “w2c.inverse()” as c2w in the function “convert_poses()”. Is my understanding correct?
3. My custom dataset has 3 context views, while the paper trains with 2. Where can I change this?
4. By the way, the paper uses an MVS cost volume, but the model is mainly trained in the 2-input-view setting. Did you try training in a multiple-input-view setting?
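
To make the quoted conventions concrete, here is a minimal sketch in plain Python that normalizes a pixel-space intrinsics matrix and inverts a rigid world-to-camera matrix into camera-to-world. Function names and sample values are illustrative, not taken from the repository; the translation keeps whatever unit the input poses use.

```python
def normalize_intrinsics(K, width, height):
    """Divide the first row by image width and the second row by image height."""
    return [
        [v / width for v in K[0]],
        [v / height for v in K[1]],
        list(K[2]),
    ]


def invert_rigid(w2c):
    """Invert a 4x4 rigid transform [R | t]: the inverse is [R^T | -R^T t]."""
    R = [row[:3] for row in w2c[:3]]
    t = [row[3] for row in w2c[:3]]
    Rt = [[R[j][i] for j in range(3)] for i in range(3)]          # transpose of R
    t_inv = [-sum(Rt[i][j] * t[j] for j in range(3)) for i in range(3)]
    return [Rt[i] + [t_inv[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


# Illustrative inputs: 640x360 image, camera shifted along the world X axis.
K = [[500.0, 0.0, 320.0], [0.0, 500.0, 180.0], [0.0, 0.0, 1.0]]
w2c = [[1.0, 0.0, 0.0, -2.0],
       [0.0, 1.0, 0.0, 0.0],
       [0.0, 0.0, 1.0, 0.0],
       [0.0, 0.0, 0.0, 1.0]]

print(normalize_intrinsics(K, 640, 360))  # principal point becomes (0.5, 0.5)
print(invert_rigid(w2c))                  # camera center in world coordinates: x = 2
```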

Got wrong results

Hi, dear author! I followed the instructions in README.md to run the evaluation. I downloaded the pretrained models and sub-datasets and saved them to the checkpoints and datasets directories respectively, but I got wrong results. It seems that I missed something important. Do you have any advice on how to deal with it?
python -m src.main +experiment=acid checkpointing.load=checkpoints/acid.ckpt mode=test dataset/view_sampler=evaluation dataset.view_sampler.index_path=assets/evaluation_index_acid.json test.compute_scores=true

Saving outputs to /data5/ly/mmdet/mvsplat/outputs/2024-04-07/09-31-02.
rm: cannot remove 'outputs/local': No such file or directory
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
Setting up [LPIPS] perceptual loss: trunk [vgg], v[0.1], spatial [off]
Loading model from: /data2/ly/conda/envs/mvsplat/lib/python3.10/site-packages/lpips/weights/v0.1/vgg.pth
Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/8
Saving outputs to /data5/ly/mmdet/mvsplat/outputs/2024-04-07/09-31-02.
rm: cannot remove 'outputs/local': No such file or directory
Saving outputs to /data5/ly/mmdet/mvsplat/outputs/2024-04-07/09-31-02.
rm: cannot remove 'outputs/local': No such file or directory
Saving outputs to /data5/ly/mmdet/mvsplat/outputs/2024-04-07/09-31-02.
rm: cannot remove 'outputs/local': No such file or directory
Saving outputs to /data5/ly/mmdet/mvsplat/outputs/2024-04-07/09-31-02.
Saving outputs to /data5/ly/mmdet/mvsplat/outputs/2024-04-07/09-31-02.
rm: cannot remove 'outputs/local': No such file or directory
rm: cannot remove 'outputs/local': No such file or directory
Saving outputs to /data5/ly/mmdet/mvsplat/outputs/2024-04-07/09-31-02.
rm: cannot remove 'outputs/local': No such file or directory
Saving outputs to /data5/ly/mmdet/mvsplat/outputs/2024-04-07/09-31-02.
rm: cannot remove 'outputs/local': No such file or directory
Setting up [LPIPS] perceptual loss: trunk [vgg], v[0.1], spatial [off]
Setting up [LPIPS] perceptual loss: trunk [vgg], v[0.1], spatial [off]
Setting up [LPIPS] perceptual loss: trunk [vgg], v[0.1], spatial [off]
Setting up [LPIPS] perceptual loss: trunk [vgg], v[0.1], spatial [off]
Setting up [LPIPS] perceptual loss: trunk [vgg], v[0.1], spatial [off]
Setting up [LPIPS] perceptual loss: trunk [vgg], v[0.1], spatial [off]
Setting up [LPIPS] perceptual loss: trunk [vgg], v[0.1], spatial [off]
Loading model from: /data2/ly/conda/envs/mvsplat/lib/python3.10/site-packages/lpips/weights/v0.1/vgg.pth
Loading model from: /data2/ly/conda/envs/mvsplat/lib/python3.10/site-packages/lpips/weights/v0.1/vgg.pth
Loading model from: /data2/ly/conda/envs/mvsplat/lib/python3.10/site-packages/lpips/weights/v0.1/vgg.pth
Initializing distributed: GLOBAL_RANK: 5, MEMBER: 6/8
Initializing distributed: GLOBAL_RANK: 4, MEMBER: 5/8
Loading model from: /data2/ly/conda/envs/mvsplat/lib/python3.10/site-packages/lpips/weights/v0.1/vgg.pth
Loading model from: /data2/ly/conda/envs/mvsplat/lib/python3.10/site-packages/lpips/weights/v0.1/vgg.pth
Loading model from: /data2/ly/conda/envs/mvsplat/lib/python3.10/site-packages/lpips/weights/v0.1/vgg.pth
Loading model from: /data2/ly/conda/envs/mvsplat/lib/python3.10/site-packages/lpips/weights/v0.1/vgg.pth
Initializing distributed: GLOBAL_RANK: 2, MEMBER: 3/8
Initializing distributed: GLOBAL_RANK: 3, MEMBER: 4/8
Initializing distributed: GLOBAL_RANK: 6, MEMBER: 7/8
Initializing distributed: GLOBAL_RANK: 7, MEMBER: 8/8
Initializing distributed: GLOBAL_RANK: 1, MEMBER: 2/8

distributed_backend=nccl
All distributed processes registered. Starting with 8 processes

Restoring states from the checkpoint path at checkpoints/acid.ckpt
LOCAL_RANK: 5 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7]
LOCAL_RANK: 4 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7]
LOCAL_RANK: 7 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7]
LOCAL_RANK: 2 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7]
LOCAL_RANK: 6 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7]
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7]
LOCAL_RANK: 1 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7]
LOCAL_RANK: 3 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7]
Loaded model weights from the checkpoint at checkpoints/acid.ckpt
Testing DataLoader 0: 0%| | 0/16 [00:00<?, ?it/s]Setting up [LPIPS] perceptual loss: trunk [vgg], v[0.1], spatial [off]
Setting up [LPIPS] perceptual loss: trunk [vgg], v[0.1], spatial [off]
Setting up [LPIPS] perceptual loss: trunk [vgg], v[0.1], spatial [off]
Setting up [LPIPS] perceptual loss: trunk [vgg], v[0.1], spatial [off]
Setting up [LPIPS] perceptual loss: trunk [vgg], v[0.1], spatial [off]
Setting up [LPIPS] perceptual loss: trunk [vgg], v[0.1], spatial [off]
Setting up [LPIPS] perceptual loss: trunk [vgg], v[0.1], spatial [off]
Setting up [LPIPS] perceptual loss: trunk [vgg], v[0.1], spatial [off]
Loading model from: /data2/ly/conda/envs/mvsplat/lib/python3.10/site-packages/lpips/weights/v0.1/vgg.pth
Loading model from: /data2/ly/conda/envs/mvsplat/lib/python3.10/site-packages/lpips/weights/v0.1/vgg.pth
Loading model from: /data2/ly/conda/envs/mvsplat/lib/python3.10/site-packages/lpips/weights/v0.1/vgg.pth
Testing DataLoader 0: 6%|██████████▏ | 1/16 [00:04<01:04, 0.23it/s]Loading model from: /data2/ly/conda/envs/mvsplat/lib/python3.10/site-packages/lpips/weights/v0.1/vgg.pth
Testing DataLoader 0: 12%|████████████████████▍ | 2/16 [00:04<00:31, 0.45it/s]Loading model from: /data2/ly/conda/envs/mvsplat/lib/python3.10/site-packages/lpips/weights/v0.1/vgg.pth
Testing DataLoader 0: 31%|██████████████████████████████████████████████████▉ | 5/16 [00:04<00:10, 1.02it/s]Loading model from: /data2/ly/conda/envs/mvsplat/lib/python3.10/site-packages/lpips/weights/v0.1/vgg.pth
Testing DataLoader 0: 38%|█████████████████████████████████████████████████████████████▏ | 6/16 [00:05<00:08, 1.18it/s]Loading model from: /data2/ly/conda/envs/mvsplat/lib/python3.10/site-packages/lpips/weights/v0.1/vgg.pth
Testing DataLoader 0: 44%|███████████████████████████████████████████████████████████████████████▎ | 7/16 [00:05<00:06, 1.34it/s]Loading model from: /data2/ly/conda/envs/mvsplat/lib/python3.10/site-packages/lpips/weights/v0.1/vgg.pth
Testing DataLoader 0: 81%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▋ | 13/16 [00:06<00:01, 2.13it/s]psnr 5.61385152890132
ssim 0.0009663698885840579
lpips 0.7411693197030288
encoder: 8 calls, avg. 0.061416834592819214 seconds per call
decoder: 24 calls, avg. 0.0018823047478993733 seconds per call
Testing DataLoader 0: 81%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▋ | 13/16 [00:06<00:01, 2.12it/s]
psnr 5.61385152890132
ssim 0.0009663698885840579
lpips 0.7411693197030288
encoder: 8 calls, avg. 0.06180933117866516 seconds per call
decoder: 24 calls, avg. 0.0019558072090148926 seconds per call
psnr 5.61385152890132
ssim 0.0009663698885840579
lpips 0.7411693197030288
encoder: 8 calls, avg. 0.06272295117378235 seconds per call
decoder: 24 calls, avg. 0.002048651377360026 seconds per call
psnr 5.61385152890132
ssim 0.0009663698885840579
lpips 0.7411693197030288
encoder: 8 calls, avg. 0.0654122531414032 seconds per call
decoder: 24 calls, avg. 0.0019436180591583252 seconds per call
psnr 5.61385152890132
ssim 0.0009663698885840579
lpips 0.7411693197030288
encoder: 8 calls, avg. 0.09683313965797424 seconds per call
decoder: 24 calls, avg. 0.002124359210332235 seconds per call
psnr 5.61385152890132
ssim 0.0009663698885840579
lpips 0.7411693197030288
psnr 5.61385152890132
ssim 0.0009663698885840579
encoder: 8 calls, avg. 0.09448182582855225 seconds per call
decoder: 24 calls, avg. 0.0020943681399027505 seconds per call
lpips 0.7411693197030288
encoder: 8 calls, avg. 0.10512921214103699 seconds per call
decoder: 24 calls, avg. 0.0021263360977172847 seconds per call
psnr 5.61385152890132
ssim 0.0009663698885840579
lpips 0.7411693197030288
encoder: 8 calls, avg. 0.10539361834526062 seconds per call
decoder: 24 calls, avg. 0.002041985591252645 seconds per call

about training with multi GPU

Hi, I have run into a problem. When I run the code with multiple GPUs distributed across different nodes on Slurm, I find that the GPUs on different nodes cannot be used. Does the code support distributed training across nodes?

missing license

will this have an MIT license similar to pixelSplat and UniMatch?

RuntimeError: DataLoader worker (pid 1201923) is killed by signal: Floating point exception.

Thanks for your great work! @donydchen
I tried to train MVSplat using the processed RealEstate10K dataset provided by pixelSplat's author, but the following error occurred.
The training loop ran successfully for 10K steps.
I have no idea what this is. Maybe a division by zero? Have you faced this error before?

Error executing job with overrides: ['+experiment=re10k', 'data_loader.train.batch_size=8']
Traceback (most recent call last):
  File "/home/liang/mvsplat/src/main.py", line 141, in train
    trainer.fit(model_wrapper, datamodule=data_module, ckpt_path=checkpoint_path)
  File "/home/liang/anaconda3/envs/mvsplat/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 544, in fit
    call._call_and_handle_interrupt(
  File "/home/liang/anaconda3/envs/mvsplat/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py", line 44, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/home/liang/anaconda3/envs/mvsplat/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 580, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "/home/liang/anaconda3/envs/mvsplat/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 987, in _run
    results = self._run_stage()
  File "/home/liang/anaconda3/envs/mvsplat/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1033, in _run_stage
    self.fit_loop.run()
  File "/home/liang/anaconda3/envs/mvsplat/lib/python3.10/site-packages/pytorch_lightning/loops/fit_loop.py", line 205, in run
    self.advance()
  File "/home/liang/anaconda3/envs/mvsplat/lib/python3.10/site-packages/pytorch_lightning/loops/fit_loop.py", line 363, in advance
    self.epoch_loop.run(self._data_fetcher)
  File "/home/liang/anaconda3/envs/mvsplat/lib/python3.10/site-packages/pytorch_lightning/loops/training_epoch_loop.py", line 140, in run
    self.advance(data_fetcher)
  File "/home/liang/anaconda3/envs/mvsplat/lib/python3.10/site-packages/pytorch_lightning/loops/training_epoch_loop.py", line 223, in advance
    batch = call._call_strategy_hook(trainer, "batch_to_device", batch, dataloader_idx=0)
  File "/home/liang/anaconda3/envs/mvsplat/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py", line 309, in _call_strategy_hook
    output = fn(*args, **kwargs)
  File "/home/liang/anaconda3/envs/mvsplat/lib/python3.10/site-packages/pytorch_lightning/strategies/strategy.py", line 278, in batch_to_device
    return model._apply_batch_transfer_handler(batch, device=device, dataloader_idx=dataloader_idx)
  File "/home/liang/anaconda3/envs/mvsplat/lib/python3.10/site-packages/pytorch_lightning/core/module.py", line 347, in _apply_batch_transfer_handler
    batch = self._call_batch_hook("transfer_batch_to_device", batch, device, dataloader_idx)
  File "/home/liang/anaconda3/envs/mvsplat/lib/python3.10/site-packages/pytorch_lightning/core/module.py", line 336, in _call_batch_hook
    return trainer_method(trainer, hook_name, *args)
  File "/home/liang/anaconda3/envs/mvsplat/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py", line 157, in _call_lightning_module_hook
    output = fn(*args, **kwargs)
  File "/home/liang/anaconda3/envs/mvsplat/lib/python3.10/site-packages/pytorch_lightning/core/hooks.py", line 613, in transfer_batch_to_device
    return move_data_to_device(batch, device)
  File "/home/liang/anaconda3/envs/mvsplat/lib/python3.10/site-packages/lightning_fabric/utilities/apply_func.py", line 103, in move_data_to_device
    return apply_to_collection(batch, dtype=_TransferableDataType, function=batch_to)
  File "/home/liang/anaconda3/envs/mvsplat/lib/python3.10/site-packages/lightning_utilities/core/apply_func.py", line 72, in apply_to_collection
    return _apply_to_collection_slow(
  File "/home/liang/anaconda3/envs/mvsplat/lib/python3.10/site-packages/lightning_utilities/core/apply_func.py", line 104, in _apply_to_collection_slow
    v = _apply_to_collection_slow(
  File "/home/liang/anaconda3/envs/mvsplat/lib/python3.10/site-packages/lightning_utilities/core/apply_func.py", line 104, in _apply_to_collection_slow
    v = _apply_to_collection_slow(
  File "/home/liang/anaconda3/envs/mvsplat/lib/python3.10/site-packages/lightning_utilities/core/apply_func.py", line 96, in _apply_to_collection_slow
    return function(data, *args, **kwargs)
  File "/home/liang/anaconda3/envs/mvsplat/lib/python3.10/site-packages/lightning_fabric/utilities/apply_func.py", line 97, in batch_to
    data_output = data.to(device, **kwargs)
  File "/home/liang/anaconda3/envs/mvsplat/lib/python3.10/site-packages/torch/utils/data/_utils/signal_handling.py", line 66, in handler
    _error_if_any_worker_fails()
RuntimeError: DataLoader worker (pid 1201923) is killed by signal: Floating point exception.
Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.

Function of times_per_scene

Hi,

Thank you for the work. Could you please clarify the meaning behind "times_per_scene" in dataset_re10k.py?

Thanks in advance

Assistance Needed with DTU Cross-Generalization Test Reproduction

Hi, sincerely appreciate sharing this amazing work!
I am currently working on reproducing the DTU cross-generalization test results as described in your recent publication.

Despite my efforts to follow the experimental setup outlined in the paper, including ensuring the camera requirements are met (with normalized intrinsic parameters and cam2world matrices for extrinsic parameters), I've encountered difficulties in replicating the results presented in your paper, specifically the quality of the images.

For reference, here are the results I obtained:

Would it be possible for you to share the specific dataloader used for the DTU evaluation or provide any guidance or recommendations that could aid in accurately evaluating the tests?

Thanks in advance.

Sincerely,

About torch version

Hi, thanks for your great work!
The CUDA version on my machine is 11.6. Does this mean I have to upgrade CUDA to at least 11.8 to match torch==2.1.2? Or is it possible to use cu116 with torch==1.13.1 and the matching torchvision and torchaudio?
Hoping for your reply!

less effective results by overfitting on re10k subset

Great work.
I tried to quickly validate the effectiveness of the network by "overfitting" on a small re10k subset, and the results are below my expectations. I wonder if I am missing some key points of your work. Below are the settings.

Dataset: re10k subset
Training platform: 4 RTX 4090 GPUs, batch_size=16 in total (4 per GPU).
Hyper-params: same as your newly released code; I didn't change any.

Training command keys:
+experiment=re10k
data_loader.train.batch_size=4
checkpointing.every_n_train_steps=5000

Test command keys:
+experiment=re10k
checkpointing.load=outputs/2024-04-15/17-57-20/checkpoints/epoch_1499-step_15000.ckpt
mode=test
dataset/view_sampler=evaluation
test.compute_scores=true

The results:
Testing DataLoader 0: 93%|██████████████▊ | 38/41 [00:06<00:00, 6.03it/s]
psnr 21.53944028051276
ssim 0.7834970355033875
lpips 0.21417335558094477
encoder: 33 calls, avg. 0.0347950892014937 seconds per call
decoder: 99 calls, avg. 0.0010259873939282966 seconds per call

That is, after "overfitting" on re10k subset by 1499 epochs /15000 steps, the model gets a PSNR with 21.54 on this subset (the renderring visualizations are also not good), much less than my expectation. Generally, I expect that the model could reach PSNR~30 after "overfitting on a small subset" by 100 epochs.

diff-gaussian-rasterization-modified is correctly built and installed. I have checked the test results of your released re10k model, which are consistent with Table 1 (PSNR=26+).

I have referred to issue 14, i.e., the test results are good after large-scale training.

Maybe your proposed model is not suitable for "overfitting" on a small subset? If so, why? That seems counter-intuitive in this field.
I suspect I am missing some key points. I look forward to your clarification. Thanks.

Attached is the training log, for your reference.
20240415_175717.log

What are the best practices for incorporating custom dataset inputs?

Hi @donydchen, kudos on your work. I'm keenly interested in MVSplat and would particularly like to use it with custom datasets, but I'm having trouble with a few things.

1. Do I have to upload a custom dataset to YouTube and use the URL, as in your dataset? Is there another approach I can take? If so, could you please describe it?
2. How do you generate timestamps, camera poses, images, and keys for a particular video?

Thank you in advance.

about test on co3d with pre-trained model

Hi, I tested the pre-trained model on the co3d dataset, but the results look very bad. I checked three things: (1) the intrinsic and extrinsic parameters of the input, using the epipolar model; (2) that the images are resized to 256 * 256; (3) the near and far depth values, which I adjusted carefully. Is this due to the limited generalizability of the pre-trained model? Thank you so much.

Training Time

Thank you for your excellent work!

I noticed that your evaluation mainly focuses on rendering speed. How many hours does it take to train the model?

question about extension to multi-view (>2) inputs and cost volume

Thanks for your great contribution to this promising and interesting field.

I noticed that the paper's main experiments focus on two-view inputs, similar to PixelSplat. However, as you mention in the article, the MVS-based method can naturally be applied to multiple views (>2). Can the current pre-trained model be directly extended to multi-view (>2) input?

Besides, the cost volume used in the paper needs the (near, far) planes for discrete depth sampling. When we extend to other datasets without ground-truth (near, far) as input, how should we deal with it? Also, since each view has a separate cost volume, how do you deal with the increased parameters and the need for cross-view information exchange when the views become denser and the resolution becomes larger?

About scale multiplier

Hi, thanks for this great work. I don't understand why a multiplier generated by intrinsic and pixel_size is used for scales.

        scale_min = self.cfg.gaussian_scale_min
        scale_max = self.cfg.gaussian_scale_max
        scales = scale_min + (scale_max - scale_min) * scales.sigmoid()
        h, w = image_shape
        pixel_size = 1 / torch.tensor((w, h), dtype=torch.float32, device=device)
        multiplier = self.get_scale_multiplier(intrinsics, pixel_size)
        scales = scales * depths[..., None] * multiplier[..., None]

Is this to convert the scale factor from the image space to the camera space?

Thanks in advance!
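
One possible reading of the snippet above, offered as a sketch under assumptions rather than as the repository's actual get_scale_multiplier: if the intrinsics are normalized by image size, one pixel spans 1 / w in normalized image units, dividing by the normalized focal length gives its width on the plane at unit depth, and multiplying by depth gives its width at that depth. The helper name pixel_footprint and all numbers below are hypothetical.

```python
def pixel_footprint(fx_norm: float, image_width: int, depth: float) -> float:
    """Camera-space width covered by one pixel at the given depth.

    Assumes a pinhole camera whose focal length is normalized by image width
    (fx_norm = fx_pixels / image_width). Hypothetical helper, not MVSplat code.
    """
    pixel_size = 1.0 / image_width              # one pixel in normalized image units
    width_at_unit_depth = pixel_size / fx_norm  # back-project through the intrinsics
    return depth * width_at_unit_depth          # footprint grows linearly with depth


# The same pixel covers twice the width at twice the depth.
near = pixel_footprint(fx_norm=0.5, image_width=256, depth=1.0)
far = pixel_footprint(fx_norm=0.5, image_width=256, depth=2.0)
```

Under this reading, multiplying the predicted scales by depth and by the multiplier would tie a Gaussian's camera-space size to the footprint of the pixel it was predicted from.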

about training data

Hi, I have a question about the training data. Did you train a single checkpoint on all the training data? I would rather not train a separate checkpoint for each dataset. Is that possible, and if so, how should I organize the datasets? Do I need to write a different dataloader for each one? Thanks.

Creating world-space covariance matrices

Hi, thanks for the great work. I was trying to understand your code, and I have a question about the following operation.

    # Create world-space covariance matrices.
    covariances = build_covariance(scales, rotations)
    c2w_rotations = extrinsics[..., :3, :3]
    covariances = c2w_rotations @ covariances @ c2w_rotations.transpose(-1, -2)

Could you please clarify why this is being done?

Thanks in advance
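
For context, a covariance matrix expressed in one frame transforms under a rotation R as R Σ Rᵀ. The sketch below illustrates that identity in plain Python, assuming build_covariance returns covariances in each camera's local frame (an assumption, not confirmed by the snippet). The helper names and the example matrices are hypothetical.

```python
import math

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply two 3x3 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def transpose(a: Matrix) -> Matrix:
    """Transpose a 3x3 matrix."""
    return [[a[j][i] for j in range(3)] for i in range(3)]


def rotate_covariance(cov_cam: Matrix, c2w_rotation: Matrix) -> Matrix:
    """Express a camera-frame covariance in the world frame: R @ cov @ R^T."""
    return matmul(matmul(c2w_rotation, cov_cam), transpose(c2w_rotation))


# A Gaussian elongated along the camera's x axis (variance 4 along x).
cov_cam = [[4.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# Hypothetical camera-to-world rotation: 90 degrees about z, so camera x maps to world y.
c, s = math.cos(math.pi / 2), math.sin(math.pi / 2)
rot_z = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

cov_world = rotate_covariance(cov_cam, rot_z)
```

In this example the long axis of the Gaussian ends up along the world y axis, which is what rotating the ellipsoid together with the camera should produce. Only the rotation part of the extrinsics is involved, since a translation moves the mean but leaves the covariance unchanged.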

Difficulty Exporting .ply File from MVSplat Scenes

Hi, I am writing to bring to your attention an issue I encountered while attempting to export .ply files from scenes generated by MVSplat after running the Evaluation phase.

Background:
I utilized the following inputs and steps:

Inputs:
--Dataset: re10k
--Location: datasets/re10k/test
Output:
Scenes generated after the evaluation phase: test -> re10k List of scenes: [ 0c4c5d5f751aabf5 28e8300e004ab30b 57d25dafabb5a238 67a69088a2695987 a56ba2efb5e3fdd9... so on ]

Issue Details:

After processing the Evaluation phase on the re10k dataset, I attempted to export .ply files from the generated scenes using the provided script:
python -m src.paper.generate_point_cloud_figure_mvsplat
+experiment=re10k
checkpointing.load=checkpoints/re10k.ckpt
mode=test
dataset/view_sampler=evaluation

I made a modification to load index.json in generate_point_cloud_figure_mvsplat.py as follows:

    with open("datasets/re10k/test/index.json") as f:
        test_cfgs = json.load(f)

However, upon running the script, I encountered errors.

Questions:
1. How can I specify scenes as inputs in the script?
2. What steps are necessary to successfully export .ply files from the generated scenes after the Evaluation phase?

I appreciate your assistance in resolving this issue. Please let me know if there are any further steps or information required from my end.

Thank you

Custom Dataset + Exporting to PLY

Hi,
I was wondering if there were any resources for training on custom data (video/images + COLMAP camera poses)?
Also, is there a way for exporting the model to a renderable PLY format?

Question about Cost volume construction

Hello, I don't understand the inverse depth domain or the specific operations of the warping step. I can't find an explanation of the inverse depth domain online.
Can you provide some information about them? Thanks!
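
As general background, inverse depth is simply 1 / depth, and sampling uniformly in that domain places more depth candidates close to the camera, where a fixed depth change moves a pixel further in the other view. The following is a generic plane-sweep sketch, not necessarily MVSplat's exact sampling; the function name and the near/far values are hypothetical.

```python
def sample_inverse_depths(near: float, far: float, num_samples: int) -> list[float]:
    """Return depth candidates spaced uniformly in inverse depth (1 / depth)."""
    if num_samples < 2:
        raise ValueError("num_samples must be at least 2")
    inv_near, inv_far = 1.0 / near, 1.0 / far
    step = (inv_far - inv_near) / (num_samples - 1)
    # Walk linearly from 1/near to 1/far, then convert each value back to depth.
    return [1.0 / (inv_near + i * step) for i in range(num_samples)]


depths = sample_inverse_depths(near=1.0, far=100.0, num_samples=5)
gaps = [b - a for a, b in zip(depths, depths[1:])]
```

The candidates start at near, end at far, and the gaps between them grow with distance. In plane-sweep stereo, each candidate depth is then used to back-project a pixel from one view and project it into another view, which is the warping step that feeds the cost volume.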

Why normalize the intrinsics and extrinsics in convert_dtu.py?

For the DTU dataset, I noticed that your code normalizes the camera intrinsics in convert.py. As a result, the poses and intrinsics in the code are hard to understand, and it is inconvenient to use this code with third-party datasets (Waymo or Mip-NeRF 360).
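
A minimal sketch of what intrinsics normalization usually means, assuming the convention in which the first row of K is divided by the image width and the second row by the image height; whether the conversion script follows exactly this convention is an assumption here. The function names and camera values are hypothetical.

```python
def normalize_intrinsics(k_pixels: list[list[float]], width: int, height: int) -> list[list[float]]:
    """Convert pixel-unit intrinsics to resolution-independent ones."""
    return [
        [v / width for v in k_pixels[0]],   # fx, skew, cx in units of image width
        [v / height for v in k_pixels[1]],  # fy, cy in units of image height
        list(k_pixels[2]),                  # last row stays [0, 0, 1]
    ]


def denormalize_intrinsics(k_norm: list[list[float]], width: int, height: int) -> list[list[float]]:
    """Recover pixel-unit intrinsics for any target resolution."""
    return [
        [v * width for v in k_norm[0]],
        [v * height for v in k_norm[1]],
        list(k_norm[2]),
    ]


# Hypothetical 640x480 camera with the principal point at the image center.
k_pixels = [[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]]
k_norm = normalize_intrinsics(k_pixels, width=640, height=480)
k_back = denormalize_intrinsics(k_norm, width=640, height=480)
```

The practical benefit of this form is that the same normalized matrix stays valid when the image is uniformly resized, so a dataset such as Waymo or Mip-NeRF 360 could be adapted by applying the same division to its pixel-unit intrinsics.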

Resolution Requirements

Thank you for the incredible work. I have 2 questions:

How can I test the models at a resolution different from 256x256?
How would the latency of this approach change at higher resolutions?

Thank you for your time.

query regarding render with big view change

Hi, thanks for the great work. I have a question about the advantages and limitations of the feature-matching cost volume formulation:

I see in the paper that the rendered novel views have a large view overlap w.r.t. the input views. This makes sense, because the feature-matching cues between input views can only help determine the depth of pixels visible in the input views, not the occluded side of the scene (e.g., the back side of a car).
If my observation is reasonable, does it mean that if I want to learn novel view synthesis where the novel view is significantly different from the input views (like shifting to the occluded side of the input views), the feature-matching cost volume alone won't help much?

Thanks

About the prediction of opacity.

As in the paper, Sect. 3.2, the opacity is predicted from the matching confidence with 2 convolution layers. However, I can only find the function map_pdf_to_opacity in the code that maps densities to opacities.

I wonder which one is the final implementation. Looking forward to your reply!

	# Center-crop the image.
	image = views["image"][:, :, :, row : row + h_new, col : col + w_new]

	# Adjust the intrinsics to account for the cropping.
	intrinsics = views["intrinsics"].clone()
	intrinsics[:, :, 0, 0] *= w / w_new # fx
	intrinsics[:, :, 1, 1] *= h / h_new # fy
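
The scaling in the snippet above can be checked with a small worked example, assuming the intrinsics are normalized by image size: cropping leaves the focal length in pixels unchanged, so the normalized value must grow by w / w_new (and h / h_new). The function name and the numbers are hypothetical.

```python
def crop_normalized_focal(
    fx_norm: float, fy_norm: float, size: tuple[int, int], crop_size: tuple[int, int]
) -> tuple[float, float]:
    """Rescale normalized focal lengths after cropping an (h, w) image to (h_new, w_new)."""
    h, w = size
    h_new, w_new = crop_size
    return fx_norm * w / w_new, fy_norm * h / h_new


# Hypothetical 360x640 image center-cropped to 256x256.
fx_new, fy_new = crop_normalized_focal(0.8, 1.4, size=(360, 640), crop_size=(256, 256))

# The focal length in pixels is the same before and after the crop.
fx_pixels_before = 0.8 * 640
fx_pixels_after = fx_new * 256
```

For a center crop with the principal point at the image center, the normalized principal point stays at 0.5, which is presumably why only fx and fy are adjusted.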
