GithubHelp home page GithubHelp logo

sp9103 / directvoxgo Goto Github PK

View Code? Open in Web Editor NEW

This project forked from sunset1995/directvoxgo

0.0 0.0 0.0 4.83 MB

DirectVoxGO reconstructs a scene representation from a set of calibrated images capturing the scene. We achieve NeRF-comparable novel-view synthesis quality with super-fast convergence.

License: Other

Python 78.04% C++ 5.84% Cuda 16.11%

directvoxgo's Introduction

DirectVoxGO

DirectVoxGO (Direct Voxel Grid Optimization, CVPR2022, see our paper) reconstructs a scene representation from a set of calibrated images capturing the scene. The current version have improved speed by 1.8~3.5x and supported forward-facing scenes comparing to the initial publication.

  • NeRF-comparable quality for synthesizing novel views of inward-bounded scene.
  • Super-fast convergence: Our 5 mins/scene vs. NeRF's 10~20+ hrs/scene.
  • No cross-scene pre-training required: We optimize each scene from scratch.
  • Better rendering speed: Our 0.36~0.07 secs vs. NeRF's 29 secs to synthesize a 800x800 images.
  • The newly implemented cuda extension is built just-in-time by pytorch at the first time you run the code.

Below run-times (mm:ss) of our optimization progress are measured on a machine with a single RTX 2080 Ti GPU.

github_teaser_inward_bounded.mp4
github_teaser_forward_facing.mp4

Update

See the report for detail of below update:

  • 2022.03.28: Reporting our large model on forward-facining scenes.
  • 2022.02.10: Extending for forward-facining scenes.
  • 2022.01.23: Improving training speed by 1.8~3.5x. Some intermediate steps are re-implemented in cuda but not fully fused---flexible but sacrficing speed. Telsa V100, RTX 2080 Ti, and RTX 1080 Ti are tested.
  • 2021.11.23: Support CO3D dataset.
  • 2021.11.23: Initial release.

Installation

git clone [email protected]:sunset1995/DirectVoxGO.git
cd DirectVoxGO
pip install -r requirements.txt

Pytorch and torch_scatter installation is machine dependent, please install the correct version for your machine.

Dependencies (click to expand)
  • PyTorch, numpy, torch_scatter: main computation.
  • scipy, lpips: SSIM and LPIPS evaluation.
  • tqdm: progress bar.
  • mmcv: config system.
  • opencv-python: image processing.
  • imageio, imageio-ffmpeg: images and videos I/O.

Download: datasets, trained models, and rendered test views

Directory structure for the datasets (click to expand; only list used files)
data
├── nerf_synthetic     # Link: https://drive.google.com/drive/folders/128yBriW1IG_3NJ5Rp7APSTZsJqdJdfc1
│   └── [chair|drums|ficus|hotdog|lego|materials|mic|ship]
│       ├── [train|val|test]
│       │   └── r_*.png
│       └── transforms_[train|val|test].json
│
├── Synthetic_NSVF     # Link: https://dl.fbaipublicfiles.com/nsvf/dataset/Synthetic_NSVF.zip
│   └── [Bike|Lifestyle|Palace|Robot|Spaceship|Steamtrain|Toad|Wineholder]
│       ├── intrinsics.txt
│       ├── rgb
│       │   └── [0_train|1_val|2_test]_*.png
│       └── pose
│           └── [0_train|1_val|2_test]_*.txt
│
├── BlendedMVS         # Link: https://dl.fbaipublicfiles.com/nsvf/dataset/BlendedMVS.zip
│   └── [Character|Fountain|Jade|Statues]
│       ├── intrinsics.txt
│       ├── rgb
│       │   └── [0|1|2]_*.png
│       └── pose
│           └── [0|1|2]_*.txt
│
├── TanksAndTemple     # Link: https://dl.fbaipublicfiles.com/nsvf/dataset/TanksAndTemple.zip
│   └── [Barn|Caterpillar|Family|Ignatius|Truck]
│       ├── intrinsics.txt
│       ├── rgb
│       │   └── [0|1|2]_*.png
│       └── pose
│           └── [0|1|2]_*.txt
│
├── deepvoxels         # Link: https://drive.google.com/drive/folders/1ScsRlnzy9Bd_n-xw83SP-0t548v63mPH
│   └── [train|validation|test]
│       └── [armchair|cube|greek|vase]
│           ├── intrinsics.txt
│           ├── rgb/*.png
│           └── pose/*.txt
│
├── nerf_llff_data     # Link: https://drive.google.com/drive/folders/128yBriW1IG_3NJ5Rp7APSTZsJqdJdfc1
│   └── [fern|flower|fortress|horns|leaves|orchids|room|trex]
│
└── co3d               # Link: https://github.com/facebookresearch/co3d
    └── [donut|teddybear|umbrella|...]
        ├── frame_annotations.jgz
        ├── set_lists.json
        └── [129_14950_29917|189_20376_35616|...]
            ├── images
            │   └── frame*.jpg
            └── masks
                └── frame*.png

Synthetic-NeRF, Synthetic-NSVF, BlendedMVS, Tanks&Temples, DeepVoxels datasets

We use the datasets organized by NeRF, NSVF, and DeepVoxels. Download links:

Download all our trained models and rendered test views at this link to our logs.

LLFF dataset

We use the LLFF dataset organized by NeRF. Download link: nerf_llff_data.

CO3D dataset

We also support the recent Common Objects In 3D dataset. Our method only performs per-scene reconstruction and no cross-scene generalization.

GO

Train

To train lego scene and evaluate testset PSNR at the end of training, run:

$ python run.py --config configs/nerf/lego.py --render_test

Use --i_print and --i_weights to change the log interval.

Evaluation

To only evaluate the testset PSNR, SSIM, and LPIPS of the trained lego without re-training, run:

$ python run.py --config configs/nerf/lego.py --render_only --render_test \
                                              --eval_ssim --eval_lpips_vgg

Use --eval_lpips_alex to evaluate LPIPS with pre-trained Alex net instead of VGG net.

Reproduction

All config files to reproduce our results:

$ ls configs/*
configs/blendedmvs:
Character.py  Fountain.py  Jade.py  Statues.py

configs/nerf:
chair.py  drums.py  ficus.py  hotdog.py  lego.py  materials.py  mic.py  ship.py

configs/nsvf:
Bike.py  Lifestyle.py  Palace.py  Robot.py  Spaceship.py  Steamtrain.py  Toad.py  Wineholder.py

configs/tankstemple:
Barn.py  Caterpillar.py  Family.py  Ignatius.py  Truck.py

configs/deepvoxels:
armchair.py  cube.py  greek.py  vase.py

Your own config files

Check the comments in configs/default.py for the configuable settings. The default values reproduce our main setup reported in our paper. We use mmcv's config system. To create a new config, please inherit configs/default.py first and then update the fields you want. Below is an example from configs/blendedmvs/Character.py:

_base_ = '../default.py'

expname = 'dvgo_Character'
basedir = './logs/blended_mvs'

data = dict(
    datadir='./data/BlendedMVS/Character/',
    dataset_type='blendedmvs',
    inverse_y=True,
    white_bkgd=True,
)

Development and tuning guide

Extention to new dataset

Adjusting the data related config fields to fit your camera coordinate system is recommend before implementing a new one. We provide two visualization tools for debugging.

  1. Inspect the camera and the allocated BBox.
    • Export via --export_bbox_and_cams_only {filename}.npz:
      python run.py --config configs/nerf/mic.py --export_bbox_and_cams_only cam_mic.npz
    • Visualize the result:
      python tools/vis_train.py cam_mic.npz
  2. Inspect the learned geometry after coarse optimization.
    • Export via --export_coarse_only {filename}.npz (assumed coarse_last.tar available in the train log):
      python run.py --config configs/nerf/mic.py --export_coarse_only coarse_mic.npz
    • Visualize the result:
      python tools/vis_volume.py coarse_mic.npz 0.001 --cam cam_mic.npz
Inspecting the cameras & BBox Inspecting the learned coarse volume

Speed and quality tradeoff

We have reported some ablation experiments in our paper supplementary material. Setting N_iters, N_rand, num_voxels, rgbnet_depth, rgbnet_width to larger values or setting stepsize to smaller values typically leads to better quality but need more computation. Only stepsize is tunable in testing phase, while all the other fields should remain the same as training.

Concurrent works

Plenoxels directly optimize voxel grids and achieve super-fast convergence as well. They use sparse voxel grids but require custom CUDA implementation. They use spherical harmonics to model view-dependent RGB w/o using MLPs. Some of their components could be adapted to our code in the future extension:

  1. Total variation (TV) and Cauchy sparsity regularizer.
  2. Use NDC to extend to forward-facing datas.
  3. Use MSI to extend to unbounded inward-facing 360 datas.
  4. Replace current local-feature conditioned tiny MLP with the spherical harmonic coefficients.

VaxNeRF use Visual Hull to speedup NeRF training. Only 30 lines of code modification are required based on existing NeRF code base.

Instant-ngp use hash table to store the grid values. The collision is alleviated by multi-resolution (conceptual) grids and the follow-up MLP can handle the collision effecitvely. The efficient cuda implementation improve speed as well.

Acknowledgement

The code base is origined from an awesome nerf-pytorch implementation, but it becomes very different from the code base now.

directvoxgo's People

Contributors

sunset1995 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.