jryanshue / nfd

Official codebase for the paper "3D Neural Field Generation using Triplane Diffusion"

License: MIT License

Shell 0.25% Python 10.64% Cuda 0.67% C++ 0.09% Dockerfile 0.01% C 87.95% Makefile 0.04% HTML 0.22% CSS 0.03% Jupyter Notebook 0.01% Cython 0.10%

nfd's Introduction

NFD

This is the official codebase for the paper "3D Neural Field Generation using Triplane Diffusion."

[Teaser images]

3D Neural Field Generation using Triplane Diffusion
J. Ryan Shue*, Eric Ryan Chan*, Ryan Po*, Zachary Ankner*, Jiajun Wu, and Gordon Wetzstein
* equal contribution

https://jryanshue.com/nfd/

Abstract: Diffusion models have emerged as the state-of-the-art for image generation, among other tasks. Here, we present an efficient diffusion-based model for 3D-aware generation of neural fields. Our approach pre-processes training data, such as ShapeNet meshes, by converting them to continuous occupancy fields and factoring them into a set of axis-aligned triplane feature representations. Thus, our 3D training scenes are all represented by 2D feature planes, and we can directly train existing 2D diffusion models on these representations to generate 3D neural fields with high quality and diversity, outperforming alternative approaches to 3D-aware generation. Our approach requires essential modifications to existing triplane factorization pipelines to make the resulting features easy to learn for the diffusion model. We demonstrate state-of-the-art results on 3D generation on several object classes from ShapeNet.
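For intuition, here is a minimal sketch of the triplane representation described above, using hypothetical plane shapes and a generic MLP decoder rather than the repo's actual modules: a 3D query point is projected onto the three axis-aligned feature planes, the bilinearly sampled features are summed, and a small MLP maps them to an occupancy logit.

import torch
import torch.nn as nn
import torch.nn.functional as F

def sample_plane(plane, coords2d):
    # plane: [C, H, W]; coords2d: [N, 2] with values in [-1, 1]
    grid = coords2d.view(1, -1, 1, 2)                           # [1, N, 1, 2]
    feats = F.grid_sample(plane.unsqueeze(0), grid,
                          mode='bilinear', align_corners=True)  # [1, C, N, 1]
    return feats.squeeze(0).squeeze(-1).T                       # [N, C]

def query_occupancy(plane_xy, plane_xz, plane_yz, decoder, xyz):
    # xyz: [N, 3] in [-1, 1]; aggregate the three plane features by summation
    feats = (sample_plane(plane_xy, xyz[:, [0, 1]])
             + sample_plane(plane_xz, xyz[:, [0, 2]])
             + sample_plane(plane_yz, xyz[:, [1, 2]]))
    return decoder(feats)                                       # [N, 1] occupancy logits

# Toy example with made-up sizes
C, H = 8, 64
planes = [torch.randn(C, H, H) for _ in range(3)]
decoder = nn.Sequential(nn.Linear(C, 64), nn.ReLU(), nn.Linear(64, 1))
points = torch.rand(1024, 3) * 2 - 1
logits = query_occupancy(planes[0], planes[1], planes[2], decoder, points)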

Setup

Run:

cd nfd
conda env create -f environment.yml
conda activate nfd
pip install -e .

Download pretrained models:

source download_models.sh

Sampling from pretrained models

To run the models from our paper:

cd nfd
conda activate nfd

Cars:

python gen_samples.py --ddpm_ckpt models/cars/ddpm_cars_ckpts/ema_0.9999_405000.pt \
    --decoder_ckpt models/cars/car_decoder.pt --stats_dir models/cars/statistics/cars_triplanes_stats \
    --save_dir samples/cars_samples --num_samples 8 --num_steps 250 --shape_resolution 256

Chairs:

python gen_samples.py --ddpm_ckpt models/chairs/ddpm_chairs_ckpts/ema_0.9999_200000.pt \
    --decoder_ckpt models/chairs/chair_decoder.pt --stats_dir models/chairs/statistics/chairs_triplanes_stats \
    --save_dir samples/chairs_samples --num_samples 8 --num_steps 250 --shape_resolution 256

Planes:

python gen_samples.py --ddpm_ckpt models/planes/ddpm_planes_ckpts/ema_0.9999_220000.pt \
    --decoder_ckpt models/planes/plane_decoder.pt --stats_dir models/planes/statistics/planes_triplanes_stats \
    --save_dir samples/planes_samples --num_samples 8 --num_steps 250 --shape_resolution 256
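gen_samples.py writes the generated shapes to --save_dir. Assuming the samples come out as mesh files (an assumption; check the script for the actual output format and extension), a quick way to inspect them is with trimesh:

import glob
import trimesh

# Hypothetical output location and extension; adjust to whatever gen_samples.py writes.
for path in sorted(glob.glob('samples/cars_samples/*.obj')):
    mesh = trimesh.load(path, force='mesh')   # force a Trimesh even if the file holds a scene
    print(path, len(mesh.vertices), 'vertices,', len(mesh.faces), 'faces')
    # mesh.show()  # opens an interactive viewer if a display is available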

Training

Coming soon!

nfd's People

Contributors

jryanshue

nfd's Issues

Dataset splits used in paper

The appendix of the paper mentions the size of the datasets used, e.g. 4045 for planes, but there is no mention of the train/val/test splits. Was the diffusion model trained on all of the data?

How to train.

Thanks for sharing your great work!
Could you give an example or a command to train the NFD? Do you have any advice? I noticed that you have answered here regarding the training script.
Thanks a lot! I would really like to try training NFD.

gen_sample import

Hi,

Thank you for sharing the pretrained weights with us!
When I try to run gen_samples.py as instructed, the import line
"import neural_field_diffusion.scripts.image_sample as image_sample" takes a very long time. Is it the same on your end?

Thanks in advance

Looking forward to code release

Nice work bringing diffusion models into 3D generation!

I am wondering when you will release your code. Do you have any plans?

Thanks!

small gradient when training

Hi~ thanks for the great work

I use a trainable triplane and decoder to fit objects. I use multi-view images instead of occupancy, and MSE loss instead of BCE loss. The gradient for the triplane seems too small, around 1e-4, even when I use the 'sum' reduction for the MSE loss.

Did you encounter the same issue? Is it caused by training the decoder on too few objects, or by something else?

Looking forward to your reply!!
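For reference, a minimal sketch (toy shapes, not the repo's training code) for comparing triplane gradient magnitudes under an MSE objective with 'sum' reduction versus a BCE-with-logits objective of the kind used for occupancy fitting:

import torch
import torch.nn.functional as F

plane = torch.randn(8, 32, 32, requires_grad=True)    # toy triplane parameters
target = (torch.rand(8, 32, 32) > 0.5).float()        # toy targets in [0, 1]

mse = F.mse_loss(torch.sigmoid(plane), target, reduction='sum')
grad_mse = torch.autograd.grad(mse, plane)[0]

bce = F.binary_cross_entropy_with_logits(plane, target, reduction='mean')
grad_bce = torch.autograd.grad(bce, plane)[0]

print(grad_mse.abs().mean(), grad_bce.abs().mean())   # compare typical gradient scales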

pyrender.constants.RenderFlags.FACE_NORMALS flag while rendering images for FID calculation

Hi. Thanks for your great work!

I'm trying to compute the FID score based on your code for rendering FID images from a given mesh.

Checking the rendered images, I observed that they look like the two attached examples.

I found that the normals are rendered because of the code below (specifically, because of the flag pyrender.constants.RenderFlags.FACE_NORMALS):

https://github.com/JRyanShue/NFD/blob/main/nfd/neural_field_diffusion/metrics/render_utils.py#L183

What I want to ask is: why do we need normals when computing FID scores?
Is it correct to render the images as above and compute the FID score with them?

Thanks
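For reference, a minimal standalone sketch (assumed camera, lighting, and mesh path, not the repo's render_utils.py) that renders a mesh offscreen with pyrender both with and without the FACE_NORMALS flag, so the two kinds of images going into the FID computation can be compared:

import numpy as np
import trimesh
import pyrender

tm = trimesh.load('sample.obj', force='mesh')    # hypothetical mesh path
scene = pyrender.Scene()
scene.add(pyrender.Mesh.from_trimesh(tm))
cam_pose = np.eye(4)
cam_pose[2, 3] = 2.0                             # pull the camera back along +z
scene.add(pyrender.PerspectiveCamera(yfov=np.pi / 3.0), pose=cam_pose)
scene.add(pyrender.DirectionalLight(intensity=3.0), pose=cam_pose)

r = pyrender.OffscreenRenderer(viewport_width=512, viewport_height=512)
with_normals, _ = r.render(scene, flags=pyrender.constants.RenderFlags.FACE_NORMALS)
plain, _ = r.render(scene)                       # default shading, no normal overlay
r.delete()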

Training Code

Hi, thank you for your great work.

I plan to train your model on some other categories such as vessels. I wonder if you are still going to release the training code. I saw some discussion posted four months ago.

Thank you.

Visualizing triplane features

How did you visualize the triplane features as in Fig. 7 (ablation over regularized triplanes) of the paper? Was it just an average over the feature channels? Thanks!
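For reference, a minimal sketch of the channel-averaging visualization the question suggests (hypothetical plane shape; not necessarily how the paper's figure was produced):

import torch
import matplotlib.pyplot as plt

plane = torch.randn(32, 128, 128)   # hypothetical [C, H, W] triplane feature plane
img = plane.mean(dim=0)             # average over the feature channels
plt.imshow(img.numpy(), cmap='viridis')
plt.axis('off')
plt.savefig('triplane_xy.png', bbox_inches='tight')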

About triplane

I noticed that you use nn.Embedding to represent the triplanes for multi-scene fitting, but nn.ParameterList for single-scene fitting. Is there a reason for that, such as saving memory in the multi-scene case?
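For reference, a minimal sketch (toy shapes, not the repo's code) of the two ways of holding triplane parameters that the question refers to:

import torch
import torch.nn as nn

C, H = 8, 32   # toy feature channels and plane resolution

# Single-scene fitting: three standalone parameter tensors.
single = nn.ParameterList([nn.Parameter(torch.randn(C, H, H) * 0.1) for _ in range(3)])

# Multi-scene fitting: one embedding row per scene, reshaped into three planes on lookup.
num_scenes = 16
multi = nn.Embedding(num_scenes, 3 * C * H * H)
planes = multi(torch.tensor([5])).view(3, C, H, H)   # triplanes for scene index 5

Packing every scene's planes into a single nn.Embedding makes per-scene lookup by index straightforward and keeps all planes in one weight tensor; whether it was chosen to save memory is something only the authors can confirm.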
