neural-deferred-shading's Introduction

Neural Deferred Shading

Official code for the CVPR 2022 paper "Multi-View Mesh Reconstruction with Neural Deferred Shading", a method for fast multi-view reconstruction with analysis-by-synthesis.

Installation

Set up the environment and install the basic requirements using conda:

conda env create -f environment.yml
conda activate nds

Nvdiffrast

To install Nvdiffrast from source, run the following in the main directory:

git clone https://github.com/NVlabs/nvdiffrast.git
cd nvdiffrast
python -m pip install .

pyremesh

Option 1 (preferred): Install pyremesh from pre-built packages in the pyremesh subdirectory.

From the main directory, run:

python -m pip install --no-index --find-links ./ext/pyremesh pyremesh

Option 2: Install pyremesh from source.

Follow the instructions at https://github.com/sgsellan/botsch-kobbelt-remesher-libigl.

Reconstructing DTU Scans

Download the full dataset (2.3 GB) or two samples (300 MB) and unzip the content into the main directory. For example, after unzipping you should have the directory ./data/65_skull.

To start the reconstruction for the skull, run:

python reconstruct.py --input_dir ./data/65_skull/views --input_bbox ./data/65_skull/bbox.txt

or for a general scan:

python reconstruct.py --input_dir ./data/{SCAN-ID}_{SCAN-NAME}/views --input_bbox ./data/{SCAN-ID}_{SCAN-NAME}/bbox.txt

You will find the output meshes in the directory ./out/{SCAN-ID}_{SCAN-NAME}/meshes.

Data Conversion from IDR Format to NDS Format

The DTU dataset in the NDS format is derived from the dataset in IDR format (found here), which includes masks for a selection of objects. After downloading the dataset from IDR, you can convert it from the IDR format to the NDS format by calling the import script as:

python import_dtu_from_idr.py PATH/TO/IDR/DATASET/DIRECTORY PATH/TO/OUTPUT/DIRECTORY

Reconstructing Custom Scenes

Our pipeline expects the input data in a specific structure, which you have to follow for your own scenes.

Views (--input_dir)

The main input is a folder with views, where each view consists of an RGB(A) image and the corresponding camera pose and camera intrinsics. An example folder with N views could look like this (the views do not have to be numbered and can have any file names):

📂views
├─🖼️1.png
├─📜1_k.txt
├─📜1_r.txt
├─📜1_t.txt
⋮
├─🖼️N.png
├─📜N_k.txt
├─📜N_r.txt
└─📜N_t.txt

If present, the alpha channel of the image is used as the object mask.

The files ..._k.txt, ..._r.txt, and ..._t.txt contain numpy-readable arrays with the camera pose (R, t) and intrinsics (K) in the standard OpenCV format, so K and R are 3x3 matrices and t is a 3-dimensional column vector, such that

$$ \begin{pmatrix} x & y & 1 \end{pmatrix}^\top \sim \mathbf{K}(\mathbf{R}\begin{pmatrix} X & Y & Z \end{pmatrix}^\top + \mathbf{t}).$$

The image-space coordinates (x, y) are in pixels, so the top left of the image is (x, y) = (0, 0) and the bottom right is (x, y) = (width, height).
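
To illustrate the expected layout, here is a small sketch that writes hypothetical camera parameters for a view named 1 with numpy and projects a world point using the formula above (all numeric values are made up for illustration; only the file naming and the 3x3 / 3-vector layout follow the format described here):

import os
import numpy as np

os.makedirs("views", exist_ok=True)

# Hypothetical intrinsics and pose for view "1" (illustrative values only).
K = np.array([[1000.0,    0.0, 320.0],
              [   0.0, 1000.0, 240.0],
              [   0.0,    0.0,   1.0]])  # 3x3 intrinsics
R = np.eye(3)                            # 3x3 rotation (world to camera)
t = np.array([0.0, 0.0, 2.5])            # 3-element translation

# Write the numpy-readable text files expected next to views/1.png.
np.savetxt("views/1_k.txt", K)
np.savetxt("views/1_r.txt", R)
np.savetxt("views/1_t.txt", t)

# Project a world point with x ~ K (R X + t) and dehomogenize to pixel coordinates.
X = np.array([0.1, -0.2, 0.3])
x = K @ (R @ X + t)
u, v = x[:2] / x[2]
print(f"pixel coordinates: ({u:.1f}, {v:.1f})")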

Bounding Box (--input_bbox)

Another input to our pipeline is a bounding box of the scene. The bounding box is described by a single text file, which contains a numpy-readable array of size 2x3. The first row has the world space coordinates of the minimum point and the second row those of the maximum point.

For example, if the bounding box is a cube with side length 2 centered at (0, 0, 0), then bbox.txt would simply contain

-1 -1 -1
 1  1  1
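
As a sketch, such a file can be produced with numpy; the point set below is a made-up stand-in for whatever rough geometry or camera information you derive the bounds from:

import numpy as np

# Hypothetical world-space points roughly covering the object.
points = np.array([[-0.8, -0.9, -0.7],
                   [ 0.9,  0.6,  0.8],
                   [ 0.1, -0.3,  0.5]])

# 2x3 array: first row is the minimum corner, second row the maximum corner.
bbox = np.stack([points.min(axis=0), points.max(axis=0)])
np.savetxt("bbox.txt", bbox)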

Initial Mesh (--initial_mesh)

If you would like to start your reconstruction from a custom initial mesh instead of using one of the pre-defined options, you need to provide its path. The mesh file can have any standard format (obj, ply, ...). We use trimesh for loading, so check their list of supported formats.
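
As a quick sanity check (the file name below is a placeholder), any mesh that trimesh can load should also be usable here:

import trimesh

# Load the initial mesh; trimesh infers the format from the file extension.
mesh = trimesh.load("my_initial_mesh.ply", force="mesh")
print(f"{len(mesh.vertices)} vertices, {len(mesh.faces)} faces")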

Customizing Loading Routines

If you want to tinker with our data loading routines to adapt them to your format, have a look at nds.utils.io.read_views() and nds.core.view.View.load().
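
For orientation, a stripped-down reader for a single view in the format described above could look roughly like this; it is only a sketch (the actual pipeline uses the functions named above, and Pillow is used here purely for illustration):

import numpy as np
from PIL import Image

def read_view(views_dir, name):
    """Illustrative stand-alone reader for one view; not the repository's implementation."""
    image = np.asarray(Image.open(f"{views_dir}/{name}.png")) / 255.0  # RGB(A), alpha = mask if present
    K = np.loadtxt(f"{views_dir}/{name}_k.txt")  # 3x3 intrinsics
    R = np.loadtxt(f"{views_dir}/{name}_r.txt")  # 3x3 rotation
    t = np.loadtxt(f"{views_dir}/{name}_t.txt")  # 3-element translation
    return image, K, R, t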

Using the Interactive Viewer

We provide an interactive viewer based on OpenGL to inspect the reconstructed meshes and their learned appearance. Before you can launch the viewer, install the additional dependencies with

conda activate nds
pip install glfw==2.5.3 moderngl==5.6.4 pyrr==0.10.3 pyopengl==3.1.6

The pycuda dependency needs to be built from source with OpenGL support. In your preferred directory, run:

git clone --recursive https://github.com/inducer/pycuda.git
cd pycuda
git checkout v2022.1

conda activate nds
python ./configure.py --cuda-enable-gl
python setup.py install

The viewer is launched by running the Python script view.py, providing the mesh, the neural shader, and a bounding box as input. For example, the reconstruction results for the DTU skull can be viewed by running:

python view.py --mesh ./out/65_skull/meshes/mesh_002000.obj --shader ./out/65_skull/shaders/shader_002000.pt --bbox ./out/65_skull/bbox.txt

Profiling Mode

For the runtime experiments, we added a profiling mode to our reconstruction script that benchmarks individual parts of the code. Since the profiling mode is rather invasive, we have provided it in a separate profiling branch.

The reconstruction can be started in profiling mode by passing the --profile flag to reconstruct.py.
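
For example, profiling the skull reconstruction from above (with the profiling branch checked out) might look like:

python reconstruct.py --input_dir ./data/65_skull/views --input_bbox ./data/65_skull/bbox.txt --profile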

After reconstruction, the output directory will contain the additional file profile.json with the (hierarchical) runtimes.
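
A quick way to inspect it is to pretty-print the JSON with the standard library; the exact keys depend on the run:

import json

with open("./out/65_skull/profile.json") as f:
    profile = json.load(f)

# The runtimes are stored hierarchically; print the nested structure.
print(json.dumps(profile, indent=2))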

Citation

If you find this code or our method useful for your academic research, please cite our paper

@InProceedings{worchel:2022:nds,
      author    = {Worchel, Markus and Diaz, Rodrigo and Hu, Weiwen and Schreer, Oliver and Feldmann, Ingo and Eisert, Peter},
      title     = {Multi-View Mesh Reconstruction with Neural Deferred Shading},
      booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
      month     = {June},
      year      = {2022},
      pages     = {6187-6197}
}

Troubleshooting

CUDA Out of Memory

The reconstruction can be quite heavy on GPU memory and in our experiments we used a GPU with 24 GB.

The memory usage can be reduced by reconstructing at a smaller image resolution. Try passing --image_scale 2 or --image_scale 4 to reconstruct.py to use half or a quarter of the original resolution. Expect lower memory consumption and faster runtime, but degraded reconstruction accuracy.
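
For example, reconstructing the skull at half the original resolution:

python reconstruct.py --input_dir ./data/65_skull/views --input_bbox ./data/65_skull/bbox.txt --image_scale 2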

Reconstruction Hangs at Remeshing

While the remeshing step can take some time, especially at higher mesh resolutions, it sometimes hangs indefinitely. The issue stems from a call to the function remesh_botsch in the pyremesh package that does not return.

For now, the reconstruction has to be aborted and restarted.

neural-deferred-shading's People

Contributors

mworchel, rodrigodzf


neural-deferred-shading's Issues

Camera convention

Hi there!

Which camera convention do you use? Is Rt a c2w or w2c matrix here?

def to_gl_camera(camera, resolution, n=1000, f=5000):

    projection_matrix = Renderer.projection(fx=camera.K[0,0],
                                            fy=camera.K[1,1],
                                            cx=camera.K[0,2],
                                            cy=camera.K[1,2],
                                            n=n,
                                            f=f,
                                            width=resolution[1],
                                            height=resolution[0],
                                            device=camera.device)

    Rt = torch.eye(4, device=camera.device)
    Rt[:3, :3] = camera.R
    Rt[:3, 3] = camera.t
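    # Per the data format section above (x ~ K (R X_world + t)), (R, t) map world
    # points into camera space, so this Rt is a world-to-camera (w2c) matrix.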

    gl_transform = torch.tensor([[1., 0,  0,  0],
                                 [0,  1., 0,  0],
                                 [0,  0, -1., 0],
                                 [0,  0,  0,  1.]], device=camera.device)

    Rt = gl_transform @ Rt
    return projection_matrix @ Rt

How to determine the input bounding box size?

I want to do some testing on my data, but I'm struggling to find a working input bounding box size. Does someone have hints on how to find the right size?

Whenever I use a bounding box instead of an initial mesh, I get the following error:

Traceback (most recent call last):
  File "reconstruct.py", line 84, in <module>
    mesh_initial = generate_mesh(args.initial_mesh, views, AABB.load(args.input_bbox), device=device)
  File "/home/nep/robot_locomotion/neural-deferred-shading/nds/utils/geometry.py", line 355, in generate_mesh
    v, f = mesh_generators[generator_name]()
  File "/home/nep/robot_locomotion/neural-deferred-shading/nds/utils/geometry.py", line 350, in <lambda>
    'vh32': (lambda: compute_visual_hull(views, aabb, grid_size=32, device=device)),
  File "/home/nep/robot_locomotion/neural-deferred-shading/nds/utils/geometry.py", line 249, in compute_visual_hull
    return marching_cubes(voxels, voxels_occupancy, gradient_direction='ascent')
  File "/home/nep/robot_locomotion/neural-deferred-shading/nds/utils/geometry.py", line 203, in marching_cubes
    vertices, faces, normals, values = measure.marching_cubes_lewiner(voxel_occupancy.cpu().numpy(), level=0.5, spacing=spacing, **kwargs)
  File "/home/nep/.local/lib/python3.8/site-packages/skimage/measure/_marching_cubes_lewiner.py", line 276, in marching_cubes_lewiner
    return _marching_cubes_lewiner(volume, level, spacing, gradient_direction,
  File "/home/nep/.local/lib/python3.8/site-packages/skimage/measure/_marching_cubes_lewiner.py", line 302, in _marching_cubes_lewiner
    raise ValueError("Surface level must be within volume data range.")

A OneDrive download link to one of my datasets: https://1drv.ms/u/s!AjFJcUGSEjrpgcdYVbeeM28llLLshg?e=R1auVR

Some insights

Hi, thanks for your great work!
I am here to ask for some possible insights from you. I can see that NDS can reconstruct a perfect surface with lots of details on it, and I wonder how those surface details are captured. The mask loss can capture the coarse shape, so I think the details are captured by the shading loss.
But why does it work so well without any geometric prior and with only RGB images? Could you please share some insights or related pointers on that?

Aside from that, I also want to know whether it is necessary to capture the images under fixed lighting. That is, if the light position follows the camera view direction and moves with it, will the result be very different?

Unable to install required meshzoo version (0.10.2)

Hello,

I tried to build the conda environment with the included environment.yml and ran into the following problem.
The required version of meshzoo (0.10.2) cannot be found; instead, conda suggests installing version 0.9.11.
A quick Google search also did not turn up any information about meshzoo 0.10.2.

I have run the example (65_skull) and it seems that the project works with meshzoo 0.9.11.

Nerf data performance

Has anyone tried this method on the Lego or other scenes from the NeRF synthetic dataset? I find that its reconstruction performance is limited.


Question about getting textured meshes

Hi @mworchel, @GreenFoxLight!
Thank you for sharing this great research. I have one question for you.

Is it possible to get mesh face colors (or vertex colors, if face colors are not possible) from the shader?
If there is a way to do this, please let me know!

Thanks

Isolating the shader from the viewer app

Hi,
Is it possible to somehow "isolate" the neural shader from the viewer app? For example, I can use any .obj viewer app to view the output mesh. However, I couldn't find a way to apply the shader and view the result in a third-party app (Blender, for example). This is important for me because I am using a headless cloud machine for reconstruction, so I don't have a display to use the viewer app.
I realize that the shader is essentially a neural network that needs to be evaluated, and this may not be possible to translate into vertex or pixel shader code (I'm not very knowledgeable about how classic shaders work) that can be used without a powerful GPU. But I still wanted to ask if you can give any insights or point me in a direction to look into.
Thanks

ValueError: Surface level must be within volume data range.

Hi. I'm trying to reconstruct an object captured via Record3D. I have extracted the camera matrices from the .r3d file, added alpha channels to the captured images to act as masks (using this background removal tool), then resized the images to 384×512 and resized the K matrices accordingly. I've attached the resulting dataset below.

cup_384_512.zip

I have tried various bounding box sizes. I tried calculating it from the camera positions; one issue is that all cameras are positioned in front of the object, so this method may not work. Then, just to test things out, I tried entering increasingly large values for the bounding box, from [[-0.5, -0.5, -0.5], [0.5, 0.5, 0.5]] to [[-1000, -1000, -1000], [1000, 1000, 1000]], but nothing seems to work; I keep getting the ValueError: Surface level must be within volume data range. error. Any idea what I'm doing wrong? I can add more details about how I generated the matrices if necessary.

Number of Multi View Images Required

Hi,

This is great work!

How many multi-view images are required to generate an accurate mesh?

Have you tried using Stable Diffusion to create those multi-views?

Thanks!

Issue Reproducing Evaluation Metric

Hi,

Thanks for sharing the code for this project!
I'm trying to reproduce the Chamfer metric scores using the DTU evaluation script, however, the denormalized meshes produced are not aligned for comparison to the ground truth point clouds. Do you know where I can access the transformations necessary to realign them for evaluation?

Many thanks

Details about optimization

Hi, I would like to know: during the training and optimization process, are the positions of the vertices adjusted, or is a new mesh with new vertices and triangles fitted each time?
