Tianqi Liu1 Guangcong Wang2,3 Shoukang Hu2 Liao Shen1 Xinyi Ye1 Yuhang Zang4 Zhiguo Cao1 Wei Li2 Ziwei Liu2
1Huazhong University of Science and Technology 2S-Lab, Nanyang Technological University 3Great Bay University 4Shanghai AI Laboratory
TL;DR: MVSGaussian is a Gaussian-based method designed for efficient reconstruction of unseen scenes from sparse views in a single forward pass. It offers high-quality initialization for fast training and real-time rendering.
- [2024.07.16] The latest code supports multi-batch training (details) and inference; a single RTX 3090 GPU is sufficient to reproduce all of our experimental results.
- [2024.07.16] Added a Demo (Custom Data) that only requires multi-view images as input.
- [2024.07.10] Code and checkpoints are released.
- [2024.07.01] Our work is accepted to ECCV 2024.
- [2024.05.21] Project Page | arXiv | YouTube released.
We present MVSGaussian, a new generalizable 3D Gaussian representation approach derived from Multi-View Stereo (MVS) that can efficiently reconstruct unseen scenes. Specifically, 1) we leverage MVS to encode geometry-aware Gaussian representations and decode them into Gaussian parameters. 2) To further enhance performance, we propose a hybrid Gaussian rendering that integrates an efficient volume rendering design for novel view synthesis. 3) To support fast fine-tuning for specific scenes, we introduce a multi-view geometric consistent aggregation strategy to effectively aggregate the point clouds generated by the generalizable model, serving as the initialization for per-scene optimization. Compared with previous generalizable NeRF-based methods, which typically require minutes of fine-tuning and seconds of rendering per image, MVSGaussian achieves real-time rendering with better synthesis quality for each scene. Compared with the vanilla 3D-GS, MVSGaussian achieves better view synthesis with less training computational cost. Extensive experiments on DTU, Real Forward-facing, NeRF Synthetic, and Tanks and Temples datasets validate that MVSGaussian attains state-of-the-art performance with convincing generalizability, real-time rendering speed, and fast per-scene optimization.
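For intuition, here is a tiny, self-contained sketch of one core step of the pipeline described above: lifting an MVS depth map into 3D points that can serve as Gaussian centers. The camera intrinsics and depth values are made up for illustration; this is not the repo's implementation.

```python
# Toy illustration: unproject an MVS depth map to 3D points that could
# serve as Gaussian centers. All values here are dummy examples.
import numpy as np

def depth_to_points(depth, K):
    """Unproject an HxW depth map to Nx3 camera-space points."""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3)
    rays = pix @ np.linalg.inv(K).T        # pixel -> camera ray (z = 1)
    return rays * depth.reshape(-1, 1)     # scale rays by depth

K = np.array([[500.0, 0, 160], [0, 500.0, 120], [0, 0, 1]])  # dummy intrinsics
depth = np.full((240, 320), 2.0)                             # dummy 2 m depth
points = depth_to_points(depth, K)                           # candidate centers
print(points.shape)                                          # (76800, 3)
```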
- Clone our repository:
  ```
  git clone https://github.com/TQTQliu/MVSGaussian.git
  cd MVSGaussian
  ```
- Set up the Python environment:
  ```
  conda create -n mvsgs python=3.7.13
  conda activate mvsgs
  pip install -r requirements.txt
  pip install torch==1.13.1+cu116 torchvision==0.14.1+cu116 torchaudio==0.13.1 -f https://download.pytorch.org/whl/torch_stable.html
  ```
- Install the Gaussian Splatting renderer:
  ```
  git clone https://github.com/graphdeco-inria/gaussian-splatting --recursive
  pip install gaussian-splatting/submodules/diff-gaussian-rasterization
  pip install gaussian-splatting/submodules/simple-knn
  ```
First, prepare the multi-view image data and then run COLMAP to estimate camera poses. Here we take examples/scene1 (example data) as an example:
```
python lib/colmap/imgs2poses.py -s examples/scene1
```
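imgs2poses.py comes from the LLFF toolchain and writes a poses_bounds.npy next to your images. A minimal sketch for inspecting it, assuming the standard LLFF layout (a 3x5 pose matrix plus near/far bounds per view); verify against your own output:

```python
# Inspect the poses_bounds.npy produced by imgs2poses.py (LLFF format).
import numpy as np

data = np.load("examples/scene1/poses_bounds.npy")  # shape (N_views, 17)
poses = data[:, :15].reshape(-1, 3, 5)              # 3x4 pose + [H, W, focal] column
bounds = data[:, 15:]                               # per-view near/far depths

c2w = poses[:, :, :4]                               # camera-to-world matrices
hwf = poses[:, :, 4]                                # image height, width, focal
print(c2w.shape, hwf[0], bounds[0])
```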
Then execute the following command to render novel views:
```
python run.py --type evaluate --cfg_file configs/mvsgs/colmap_eval.yaml test_dataset.data_root examples/scene1
```
or videos:
```
python run.py --type evaluate --cfg_file configs/mvsgs/colmap_eval.yaml test_dataset.data_root examples/scene1 save_video True
```
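The trailing `key value` pairs in these commands (e.g. `test_dataset.data_root examples/scene1`, `save_video True`) override entries of the YAML config. Assuming a yacs-style CfgNode, which is a common choice in this codebase's lineage but an assumption here, the mechanism looks like:

```python
# Sketch of how trailing "key value" pairs typically override a YAML
# config. Assumes a yacs-style CfgNode; the repo's config system may differ.
from yacs.config import CfgNode as CN

cfg = CN()
cfg.save_video = False
cfg.test_dataset = CN()
cfg.test_dataset.data_root = ""

# Equivalent of: ... test_dataset.data_root examples/scene1 save_video True
cfg.merge_from_list(["test_dataset.data_root", "examples/scene1",
                     "save_video", "True"])
print(cfg.test_dataset.data_root, cfg.save_video)  # examples/scene1 True
```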
- DTU
  Download the DTU data and Depth raw. Unzip and organize them as:
  ```
  mvs_training
  ├── dtu
  │   ├── Cameras
  │   ├── Depths
  │   ├── Depths_raw
  │   └── Rectified
  ```
- Download the NeRF Synthetic, Real Forward-facing, and Tanks and Temples datasets.
- Train generalizable model
  To train a generalizable model from scratch on DTU, first specify `data_root` in `configs/mvsgs/dtu_pretrain.yaml`, then run:
  ```
  python train_net.py --cfg_file configs/mvsgs/dtu_pretrain.yaml train.batch_size 4
  ```
  You can specify the `gpus` in `configs/mvsgs/dtu_pretrain.yaml`. Our code also supports multi-GPU training. The released pretrained model (paper) was trained on 4 RTX 3090 GPUs with a batch size of 1 per GPU:
  ```
  python -m torch.distributed.launch --nproc_per_node=4 train_net.py --cfg_file configs/mvsgs/dtu_pretrain.yaml distributed True gpus 0,1,2,3 train.batch_size 1
  ```
  You can also use 4 GPUs with a batch size of 4 per GPU:
  ```
  python -m torch.distributed.launch --nproc_per_node=4 train_net.py --cfg_file configs/mvsgs/dtu_pretrain.yaml distributed True gpus 0,1,2,3 train.batch_size 4
  ```
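  Note that the three configurations correspond to effective batch sizes of 4 (1 GPU × 4), 4 (4 GPUs × 1), and 16 (4 GPUs × 4). For reference, a minimal sketch of the process-group setup that a `torch.distributed.launch`-style launcher expects; the repo's `train_net.py` handles this internally, and the names below are illustrative, not taken from the repo:

  ```python
  # Minimal DDP setup of the kind torch.distributed.launch / torchrun expects.
  # The launcher sets MASTER_ADDR, MASTER_PORT, and LOCAL_RANK for each process.
  import os
  import torch
  import torch.distributed as dist
  from torch.nn.parallel import DistributedDataParallel as DDP

  def setup(model):
      dist.init_process_group(backend="nccl")            # one process per GPU
      local_rank = int(os.environ.get("LOCAL_RANK", 0))
      torch.cuda.set_device(local_rank)
      return DDP(model.cuda(), device_ids=[local_rank])
  ```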
  We provide the results below as a reference:

  | GPUs | Batch size (per GPU) | DTU (PSNR / SSIM / LPIPS) | Real Forward-facing (PSNR / SSIM / LPIPS) | NeRF Synthetic (PSNR / SSIM / LPIPS) | Tanks and Temples (PSNR / SSIM / LPIPS) | Training time (per epoch) | Training memory | Checkpoint |
  | --- | --- | --- | --- | --- | --- | --- | --- | --- |
  | 1 | 4 | 28.23 / 0.963 / 0.075 | 24.19 / 0.860 / 0.164 | 26.57 / 0.948 / 0.070 | 23.50 / 0.879 / 0.137 | ~12 min | ~22 GB | 1gpu_4batch |
  | 4 | 1 | 28.21 / 0.963 / 0.076 | 24.07 / 0.857 / 0.164 | 26.46 / 0.948 / 0.071 | 23.29 / 0.878 / 0.139 | ~5 min | ~7 GB | 4gpu_1batch (paper) |
  | 4 | 4 | 28.56 / 0.964 / 0.073 | 24.02 / 0.858 / 0.165 | 26.28 / 0.947 / 0.072 | 23.14 / 0.876 / 0.147 | ~14 min | ~23 GB | 4gpu_4batch |
- Per-scene optimization
  One strategy is to optimize only the initial Gaussian point cloud provided by the generalizable model:
  ```
  bash scripts/mvsgs/llff_ft.sh
  bash scripts/mvsgs/nerf_ft.sh
  bash scripts/mvsgs/tnt_ft.sh
  ```
  We provide optimized Gaussian point clouds for each scene here.
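  This initialization comes from the multi-view geometric consistent aggregation described in the paper: depth maps predicted by the generalizable model are cross-checked between views, and only consistent points are kept before merging. A toy numpy sketch of such a two-view consistency check; the threshold and formulation are illustrative, not the paper's exact recipe:

  ```python
  # Toy two-view geometric consistency check of the kind used to filter
  # depth maps before aggregating them into an initial point cloud.
  import numpy as np

  def consistent_mask(depth_ref, K_ref, c2w_ref, depth_src, K_src, c2w_src,
                      rel_thresh=0.01):
      """Keep reference pixels whose depth agrees with a source view."""
      H, W = depth_ref.shape
      u, v = np.meshgrid(np.arange(W), np.arange(H))
      pix = np.stack([u, v, np.ones_like(u)], -1).reshape(-1, 3).astype(float)

      # Lift reference pixels to world space.
      pts = (pix @ np.linalg.inv(K_ref).T) * depth_ref.reshape(-1, 1)
      pts = pts @ c2w_ref[:3, :3].T + c2w_ref[:3, 3]

      # Project into the source view.
      w2c = np.linalg.inv(c2w_src)
      cam = pts @ w2c[:3, :3].T + w2c[:3, 3]
      uvw = cam @ K_src.T
      uu = np.round(uvw[:, 0] / uvw[:, 2]).astype(int)
      vv = np.round(uvw[:, 1] / uvw[:, 2]).astype(int)

      ok = (uu >= 0) & (uu < W) & (vv >= 0) & (vv < H) & (uvw[:, 2] > 0)
      mask = np.zeros(H * W, dtype=bool)
      # Depth implied by the projection vs. depth the source view estimates.
      d_proj = uvw[ok, 2]
      d_src = depth_src[vv[ok], uu[ok]]
      mask[ok] = np.abs(d_proj - d_src) / d_src < rel_thresh
      return mask.reshape(H, W)
  ```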
  You can also run the following commands to get the results of vanilla 3D-GS, whose initialization is obtained via COLMAP:
  ```
  bash scripts/3dgs/llff_ft.sh
  bash scripts/3dgs/nerf_ft.sh
  bash scripts/3dgs/tnt_ft.sh
  ```
  Note that for the LLFF dataset, the point cloud shipped with the original dataset was obtained using all views. For a fair comparison, we use only the training views to regenerate the point cloud, so we recommend downloading the LLFF dataset we processed.
  (Optional) Another approach is to optimize the entire pipeline, similar to NeRF-based methods. Here we take the fern scene from the LLFF dataset as an example:
  ```
  cd ./trained_model/mvsgs
  mkdir llff_ft_fern
  cp dtu_pretrain/latest.pth llff_ft_fern
  cd ../..
  python train_net.py --cfg_file configs/mvsgs/llff/fern.yaml
  ```
- Evaluation on DTU
  Download the pretrained model and put it at `trained_model/mvsgs/dtu_pretrain/latest.pth`.
  Use the following command to evaluate the pretrained model on DTU:
  ```
  python run.py --type evaluate --cfg_file configs/mvsgs/dtu_pretrain.yaml mvsgs.cas_config.render_if False,True mvsgs.cas_config.volume_planes 48,8 mvsgs.eval_depth True
  ```
  The rendered images will be saved in `result/mvsgs/dtu_pretrain`.
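  As a side note, the PSNR values reported throughout this README can be reproduced with a generic implementation like the sketch below (SSIM and LPIPS need their respective packages); this is not necessarily the repo's exact evaluation code:

  ```python
  # Generic PSNR between a rendered image and ground truth, both float
  # arrays scaled to [0, 1].
  import numpy as np

  def psnr(pred, gt, max_val=1.0):
      mse = np.mean((pred - gt) ** 2)
      return 10.0 * np.log10(max_val ** 2 / mse)
  ```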
- Evaluation on Real Forward-facing
  ```
  python run.py --type evaluate --cfg_file configs/mvsgs/llff_eval.yaml
  ```
- Evaluation on NeRF Synthetic
  ```
  python run.py --type evaluate --cfg_file configs/mvsgs/nerf_eval.yaml
  ```
- Evaluation on Tanks and Temples
  ```
  python run.py --type evaluate --cfg_file configs/mvsgs/tnt_eval.yaml
  ```
- Render videos
  Add the `save_video True` argument to save videos, for example:
  ```
  python run.py --type evaluate --cfg_file configs/mvsgs/llff_eval.yaml save_video True
  ```
  For optimized Gaussians, add `-v` to save videos, for example:
  ```
  python lib/render.py -m output/$scene -p $dir_ply -v
  ```
  See `scripts/mvsgs/nerf_ft.sh` for `$scene` and `$dir_ply`.
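  If you prefer to assemble a video yourself from saved frames, a minimal sketch with imageio (requires the imageio and imageio-ffmpeg packages; the frame path below is illustrative, so point it at your actual output folder):

  ```python
  # Stitch saved frames into an mp4. The glob pattern is a made-up example.
  import glob
  import imageio.v2 as imageio

  paths = sorted(glob.glob("result/mvsgs/dtu_pretrain/*.png"))  # hypothetical location
  frames = [imageio.imread(p) for p in paths]
  imageio.mimwrite("video.mp4", frames, fps=30)
  ```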
If you find our work useful for your research, please cite our paper.
```
@article{liu2024mvsgaussian,
  title={MVSGaussian: Fast Generalizable Gaussian Splatting Reconstruction from Multi-View Stereo},
  author={Liu, Tianqi and Wang, Guangcong and Hu, Shoukang and Shen, Liao and Ye, Xinyi and Zang, Yuhang and Cao, Zhiguo and Li, Wei and Liu, Ziwei},
  journal={arXiv preprint arXiv:2405.12218},
  year={2024}
}
```
This project is built on source code shared by Gaussian-Splatting, ENeRF, MVSNeRF, and LLFF. Many thanks for their excellent contributions!
If you have any questions, please feel free to contact Tianqi Liu (tq_liu at hust.edu.cn).