ashawkey / segment-anything-nerf


Segment-anything interactively in NeRF.

License: Apache License 2.0

Python 68.70% C++ 0.39% Cuda 28.73% C 1.00% Shell 1.18%

segment-anything-nerf's Introduction

Segment-Anything NeRF

🎉🎉🎉 Welcome to the Segment-Anything NeRF GitHub repository! 🎉🎉🎉

Segment-Anything NeRF is a novel approach for performing segmentation within a Neural Radiance Fields (NeRF) framework. Our approach renders the semantic features of a given view directly, which removes the need for a forward pass through the segmentation model's backbone. By leveraging the lightweight SAM decoder, we achieve interactive, 3D-consistent segmentation at 5 FPS (rendering a 512x512 image) on a V100.
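Rendering a SAM feature map in place of running the ViT-H encoder can be pictured as standard NeRF alpha compositing applied to a feature vector in place of a color. The sketch below is a minimal NumPy illustration under that assumption; composite_features and the toy sample counts are hypothetical, and the repository's CUDA ray marcher is not reproduced here. The 256 channels match the [1, 256, 64, 64] SAM embedding shape quoted in the trainer excerpt in the Issues section.

```python
import numpy as np


def composite_features(sigmas, deltas, feats):
    """Alpha-composite per-sample values along a single ray (standard NeRF quadrature).

    sigmas: (N,) densities at the N samples
    deltas: (N,) distances between consecutive samples
    feats:  (N, C) per-sample payload, e.g. C=3 for RGB or C=256 for a SAM-style feature
    Returns the composited (C,) vector and the (N,) weights.
    Assumption: features share the same weights as RGB; this is an illustration only.
    """
    alphas = 1.0 - np.exp(-sigmas * deltas)  # opacity of each sample
    # Transmittance T_i = prod_{j<i} (1 - alpha_j), with T_0 = 1.
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1]]))
    weights = trans * alphas  # w_i = T_i * alpha_i
    return weights @ feats, weights


# Toy ray: the same density samples carry both an RGB color and a 256-d feature.
rng = np.random.default_rng(0)
n_samples = 64
sigmas = rng.uniform(0.0, 5.0, n_samples)
deltas = np.full(n_samples, 0.02)
rgb = rng.uniform(0.0, 1.0, (n_samples, 3))
feat = rng.normal(size=(n_samples, 256))

pixel_rgb, w_rgb = composite_features(sigmas, deltas, rgb)
pixel_feat, w_feat = composite_features(sigmas, deltas, feat)  # identical weights, different payload
```

The point of the design is that density, and therefore the weights, is shared: once geometry is learned for RGB, the same ray-marching pass can produce a feature per pixel at small extra cost.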

(Demo videos: interactive_seg.mp4, open_vocabulary_seg.mp4)

News

[2023/4/29] Added a demo of open-vocabulary segmentation in NeRF based on X-Decoder.

Key features

  • Learn 3D-consistent SAM backbone features along with RGB and density, so that the ViT-Huge encoder can be bypassed and SAM features produced efficiently by ray marching.
  • Online distillation with camera augmentation and caching for robust and fast training (~1 hour per scene for two stages on a V100).
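In online distillation, the SAM teacher (the ViT-H encoder) runs on views rendered during training rather than on a fixed set of precomputed features, as the trainer excerpt in the Issues section shows; caching lets some of those expensive teacher outputs be reused. The sketch below is a hypothetical illustration of that idea only: FeatureCache, next_training_entry, the random-replacement policy and use_cache_prob are all assumptions, not the repository's code.

```python
import random


class FeatureCache:
    """Hypothetical fixed-size cache of training entries that carry a teacher feature.

    Illustrative only; the repository's own cache may use a different policy.
    """

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.entries = []
        self.rng = random.Random(seed)

    def insert(self, entry):
        if self.capacity <= 0:
            return  # caching disabled
        if len(self.entries) < self.capacity:
            self.entries.append(entry)
        else:
            # Overwrite a random slot so old views are gradually refreshed.
            self.entries[self.rng.randrange(self.capacity)] = entry

    def sample(self):
        return self.rng.choice(self.entries)


def next_training_entry(cache, pose, encode_view, use_cache_prob, rng):
    """Reuse a cached (pose, feature) entry with probability use_cache_prob;
    otherwise run the expensive teacher on the requested pose and cache the result."""
    if cache.entries and rng.random() < use_cache_prob:
        return cache.sample()
    entry = {"pose": pose, "feature": encode_view(pose)}
    cache.insert(entry)
    return entry


# Usage with a stand-in teacher that only records how often it is called.
calls = []


def fake_teacher(pose):
    calls.append(pose)
    return [float(pose)] * 4  # placeholder for a [1, 256, 64, 64] SAM embedding


cache = FeatureCache(capacity=2)
rng = random.Random(0)
for pose in range(10):
    next_training_entry(cache, pose, fake_teacher, use_cache_prob=0.5, rng=rng)
```

Each cache hit skips one ViT-H forward pass, which is where most of the training-time saving of such a scheme would come from.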

NOTE: This is a work in progress; more demonstrations (e.g., open-vocabulary segmentation) and a technical report are on the way!

Install

git clone https://github.com/ashawkey/Segment-Anything-NeRF.git
cd Segment-Anything-NeRF

# download SAM ckpt
mkdir pretrained && cd pretrained
wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth

Install with pip

pip install -r requirements.txt

Build extension (optional)

By default, we use load (from torch.utils.cpp_extension) to build the extensions just-in-time at runtime. Since this can sometimes be inconvenient, we also provide a setup.py to build each extension ahead of time:

# install all extension modules
bash scripts/install_ext.sh

# if you want to install manually, here is an example:
cd gridencoder
python setup.py build_ext --inplace # build ext only, do not install (can only be used from the parent directory)
pip install . # install to python path (you still need the gridencoder/ folder, since this only installs the built extension.)
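The traceback quoted in the Issues section shows the pattern behind these two options: gridencoder/grid.py first tries to import a prebuilt _gridencoder module and, if that fails, falls back to compiling the extension with load. A generic sketch of that fallback follows; load_backend and the stand-in builder are hypothetical names, and in practice jit_build would wrap torch.utils.cpp_extension.load(...).

```python
import importlib


def load_backend(prebuilt_name, jit_build):
    """Prefer a prebuilt extension module; otherwise build it at runtime.

    prebuilt_name: module name installed by `pip install .` (e.g. "_gridencoder")
    jit_build: zero-argument callable that compiles and returns the module
               (hypothetical; e.g. a wrapper around torch.utils.cpp_extension.load)
    """
    try:
        return importlib.import_module(prebuilt_name)
    except ImportError:
        # No usable prebuilt module: compile on first use (slow; needs nvcc and a host compiler).
        return jit_build()


# A missing module triggers the fallback builder.
backend = load_backend("_no_such_prebuilt_ext", lambda: "jit-built backend")
```

Catching ImportError (not only ModuleNotFoundError) means a prebuilt module that exists but fails to import, for example after a PyTorch upgrade, also falls back to a fresh build; that is a design choice of this sketch.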

Tested environments

  • Ubuntu 22 with torch 1.12 & CUDA 11.6 on a V100.

Usage

We mainly support COLMAP-format datasets such as Mip-NeRF 360. Please download them and put them under ./data.

For custom datasets:

# prepare your video or images under ./data/custom, then run colmap (assumed to be installed):
python scripts/colmap2nerf.py --video ./data/custom/video.mp4 --run_colmap # if using a video
python scripts/colmap2nerf.py --images ./data/custom/images/ --run_colmap # if using images
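The two colmap2nerf.py commands differ only in the input flag. A small Python helper can pick the flag from the directory layout; colmap2nerf_cmd and the video.mp4 / images/ naming convention are assumptions based on the example paths above, not a script shipped with the repository.

```python
import sys
from pathlib import Path


def colmap2nerf_cmd(scene_dir):
    """Build the colmap2nerf.py command for a custom scene (hypothetical helper).

    Uses --video if <scene_dir>/video.mp4 exists, otherwise --images <scene_dir>/images/.
    """
    scene = Path(scene_dir)
    video = scene / "video.mp4"
    images = scene / "images"
    if video.is_file():
        source = ["--video", str(video)]
    elif images.is_dir():
        source = ["--images", str(images) + "/"]
    else:
        raise FileNotFoundError(f"expected {video} or {images}/")
    return [sys.executable, "scripts/colmap2nerf.py", *source, "--run_colmap"]


# Usage (run from the repository root):
#   import subprocess
#   subprocess.run(colmap2nerf_cmd("./data/custom"), check=True)
```

If both a video and an images folder are present, this sketch prefers the video; swap the branches if you would rather reuse already-extracted frames.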

The first run will take some time to compile the CUDA extensions.

### train rgb
python main.py data/garden/ --workspace trial_garden --enable_cam_center --downscale 4

### train sam features
# --with_sam: enable sam prediction
# --init_ckpt: specify the latest checkpoint from rgb training
python main.py data/garden/ --workspace trial2_garden --enable_cam_center --downscale 4 --with_sam --init_ckpt trial_garden/checkpoints/ngp.pth --iters 5000

### test sam (interactive GUI, recommended!)
# left drag & middle drag & wheel scroll: move camera
# right click: add/remove point marker
# NOTE: only square images are supported for now!
python main.py data/garden/ --workspace trial2_garden --enable_cam_center --downscale 4 --with_sam --init_ckpt trial_garden/checkpoints/ngp.pth --test --gui

### test sam (without GUI, queries random points)
python main.py data/garden/ --workspace trial2_garden --enable_cam_center --downscale 4 --with_sam --init_ckpt trial_garden/checkpoints/ngp.pth --test
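The right-click add/remove behaviour can be thought of as maintaining a list of point prompts that is converted into SAM's prompt format (point_coords as an N x 2 array of (x, y) pixels, point_labels with 1 for foreground) before each decoder call. This is a hypothetical sketch; toggle_point, to_sam_prompt and the 10-pixel removal radius are illustrative, not the GUI's actual code.

```python
import numpy as np


def toggle_point(points, xy, radius=10.0):
    """Remove an existing marker within `radius` pixels of the click, else add a new one.

    points: list of (x, y) pixel coordinates already placed. Returns a new list.
    """
    for i, (px, py) in enumerate(points):
        if (px - xy[0]) ** 2 + (py - xy[1]) ** 2 <= radius ** 2:
            return points[:i] + points[i + 1:]
    return points + [tuple(xy)]


def to_sam_prompt(points):
    """Convert markers to SAM-style prompt arrays, treating every marker as foreground."""
    coords = np.asarray(points, dtype=np.float32).reshape(-1, 2)  # (N, 2) as (x, y)
    labels = np.ones(len(points), dtype=np.int32)  # 1 = foreground point in SAM
    return coords, labels


pts = []
for click in [(100, 120), (300, 40), (103, 118)]:  # the third click lands near the first
    pts = toggle_point(pts, click)
coords, labels = to_sam_prompt(pts)
```

With the official segment_anything package, arrays of this shape are what SamPredictor.predict(point_coords=..., point_labels=...) takes; per the description above, this project instead feeds rendered features to the SAM decoder, so only the prompt format is assumed to carry over.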

Please check the scripts directory for more examples for common datasets, and main.py for all options.
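The usage steps chain together through workspace and checkpoint paths: the SAM stage is initialised from the checkpoint written by the RGB stage. A hypothetical Python helper that rebuilds the README's commands as argument lists makes that dependency explicit; pipeline_cmds is an illustrative name, and the --init_ckpt of the test command is kept exactly as written in the README.

```python
import sys


def pipeline_cmds(scene, rgb_ws, sam_ws, downscale=4, sam_iters=5000):
    """Rebuild the README's train-RGB, train-SAM and GUI-test commands as argv lists."""
    def base(workspace):
        return [sys.executable, "main.py", scene, "--workspace", workspace,
                "--enable_cam_center", "--downscale", str(downscale)]

    rgb_ckpt = f"{rgb_ws}/checkpoints/ngp.pth"  # written by stage 1, read by stage 2
    train_rgb = base(rgb_ws)
    train_sam = base(sam_ws) + ["--with_sam", "--init_ckpt", rgb_ckpt, "--iters", str(sam_iters)]
    test_gui = base(sam_ws) + ["--with_sam", "--init_ckpt", rgb_ckpt, "--test", "--gui"]
    return train_rgb, train_sam, test_gui


train_rgb, train_sam, test_gui = pipeline_cmds("data/garden/", "trial_garden", "trial2_garden")
print(" ".join(train_sam[1:]))  # same stage-2 command as in the README
# To execute: import subprocess; subprocess.run(train_rgb, check=True); subprocess.run(train_sam, check=True)
```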

Acknowledgement

  • Segment-Anything:
    @article{kirillov2023segany,
        title={Segment Anything},
        author={Kirillov, Alexander and Mintun, Eric and Ravi, Nikhila and Mao, Hanzi and Rolland, Chloe and Gustafson, Laura and Xiao, Tete and Whitehead, Spencer and Berg, Alexander C. and Lo, Wan-Yen and Doll{\'a}r, Piotr and Girshick, Ross},
        journal={arXiv:2304.02643},
        year={2023}
    }
    
  • X-Decoder:
    @article{zou2022generalized,
      title={Generalized Decoding for Pixel, Image, and Language},
      author={Zou, Xueyan and Dou, Zi-Yi and Yang, Jianwei and Gan, Zhe and Li, Linjie and Li, Chunyuan and Dai, Xiyang and Behl, Harkirat and Wang, Jianfeng and Yuan, Lu and others},
      journal={arXiv preprint arXiv:2212.11270},
      year={2022}
    }
    

Citation

If you find this work useful, please consider citing it:

@misc{segment-anything-nerf,
    Author = {Jiaxiang Tang and Xiaokang Chen and Diwen Wan and Jingbo Wang and Gang Zeng},
    Year = {2023},
    Note = {https://github.com/ashawkey/Segment-Anything-NeRF},
    Title = {Segment-Anything NeRF}
}

segment-anything-nerf's People

Contributors

ashawkey, charlescxk

segment-anything-nerf's Issues

ninja: build stopped: subcommand failed

Environment: Ubuntu 22.04, Torch 1.12.0+cu116, ninja 1.11.1, installed with pip
Dataset: 360_v2.zip
Command: python main.py data/garden/ --workspace trial_garden --enable_cam_center --downscale 4
Full Error message:
Traceback (most recent call last):
  File "/home/shike/Segment-Anything-NeRF/gridencoder/grid.py", line 10, in <module>
    import _gridencoder as _backend
ModuleNotFoundError: No module named '_gridencoder'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/shike/SAM-NeRF-env/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1808, in _run_ninja_build
    subprocess.run(
  File "/home/zhangyh/ENTER/lib/python3.9/subprocess.py", line 528, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/shike/Segment-Anything-NeRF/main.py", line 99, in <module>
    model = NeRFNetwork(opt).to(device)
  File "/home/shike/Segment-Anything-NeRF/nerf/network.py", line 94, in __init__
    self.grid, self.grid_in_dim = get_encoder("hashgrid", input_dim=3, level_dim=2, num_levels=16, log2_hashmap_size=19, desired_resolution=2048 * self.bound)
  File "/home/shike/Segment-Anything-NeRF/encoding.py", line 69, in get_encoder
    from gridencoder import GridEncoder
  File "/home/shike/Segment-Anything-NeRF/gridencoder/__init__.py", line 1, in <module>
    from .grid import GridEncoder
  File "/home/shike/Segment-Anything-NeRF/gridencoder/grid.py", line 12, in <module>
    from .backend import backend
  File "/home/shike/Segment-Anything-NeRF/gridencoder/backend.py", line 31, in <module>
    backend = load(name='grid_encoder',
  File "/home/shike/SAM-NeRF-env/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1202, in load
    return _jit_compile(
  File "/home/shike/SAM-NeRF-env/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1425, in _jit_compile
    _write_ninja_file_and_build_library(
  File "/home/shike/SAM-NeRF-env/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1537, in _write_ninja_file_and_build_library
    _run_ninja_build(
  File "/home/shike/SAM-NeRF-env/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1824, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error building extension 'grid_encoder': [1/2] /usr/bin/nvcc -DTORCH_EXTENSION_NAME=grid_encoder -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="gcc" -DPYBIND11_STDLIB="libstdcpp" -DPYBIND11_BUILD_ABI="cxxabi1013" -isystem /home/shike/SAM-NeRF-env/lib/python3.9/site-packages/torch/include -isystem /home/shike/SAM-NeRF-env/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /home/shike/SAM-NeRF-env/lib/python3.9/site-packages/torch/include/TH -isystem /home/shike/SAM-NeRF-env/lib/python3.9/site-packages/torch/include/THC -isystem /home/zhangyh/ENTER/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_61,code=compute_61 -gencode=arch=compute_61,code=sm_61 --compiler-options '-fPIC' -O3 -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -c /home/shike/Segment-Anything-NeRF/gridencoder/src/gridencoder.cu -o gridencoder.cuda.o
FAILED: gridencoder.cuda.o
/usr/bin/nvcc -DTORCH_EXTENSION_NAME=grid_encoder -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="gcc" -DPYBIND11_STDLIB="libstdcpp" -DPYBIND11_BUILD_ABI="cxxabi1013" -isystem /home/shike/SAM-NeRF-env/lib/python3.9/site-packages/torch/include -isystem /home/shike/SAM-NeRF-env/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /home/shike/SAM-NeRF-env/lib/python3.9/site-packages/torch/include/TH -isystem /home/shike/SAM-NeRF-env/lib/python3.9/site-packages/torch/include/THC -isystem /home/zhangyh/ENTER/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_61,code=compute_61 -gencode=arch=compute_61,code=sm_61 --compiler-options '-fPIC' -O3 -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -c /home/shike/Segment-Anything-NeRF/gridencoder/src/gridencoder.cu -o gridencoder.cuda.o
/usr/include/c++/11/bits/std_function.h:435:145: error: parameter packs not expanded with ‘...’:
  435 |         function(_Functor&& __f)
/usr/include/c++/11/bits/std_function.h:435:145: note: ‘_ArgTypes’
/usr/include/c++/11/bits/std_function.h:530:146: error: parameter packs not expanded with ‘...’:
  530 |         operator=(_Functor&& __f)
/usr/include/c++/11/bits/std_function.h:530:146: note: ‘_ArgTypes’
ninja: build stopped: subcommand failed.
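The failing step is nvcc (here /usr/bin/nvcc, which is not necessarily the CUDA 11.6 toolkit this torch build targets) parsing GCC 11's std_function.h. A mismatch between the nvcc version and the host GCC version is a commonly reported cause of this particular "parameter packs not expanded" error, although this thread does not confirm it. A small diagnostic sketch for collecting the relevant versions follows; nvcc_release, gcc_major and run_tool are hypothetical helper names.

```python
import re
import shutil
import subprocess


def nvcc_release(text):
    """Parse 'release X.Y' from `nvcc --version` output; None if absent."""
    m = re.search(r"release (\d+)\.(\d+)", text)
    return (int(m.group(1)), int(m.group(2))) if m else None


def gcc_major(text):
    """Parse the major version from the first line of `gcc --version` output; None if absent."""
    m = re.search(r"(\d+)\.\d+\.\d+", text.splitlines()[0]) if text else None
    return int(m.group(1)) if m else None


def run_tool(cmd):
    """Return a tool's stdout, or "" if the tool is not on PATH."""
    exe = shutil.which(cmd[0])
    if exe is None:
        return ""
    return subprocess.run([exe, *cmd[1:]], capture_output=True, text=True).stdout


if __name__ == "__main__":
    print("nvcc:", shutil.which("nvcc"), nvcc_release(run_tool(["nvcc", "--version"])))
    print("gcc major:", gcc_major(run_tool(["gcc", "--version"])))
    try:
        import torch
        print("torch built with CUDA:", torch.version.cuda)
    except ImportError:
        print("torch not importable")
```

Comparing the three printed values shows whether the nvcc on PATH matches the CUDA version torch expects and which host compiler it will use.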

Typo in README? Or am I misunderstanding something...

Thanks for this nice project!

In the readme Usage section, you save the checkpoints of the phase 2 (SAM features) training to trial2_garden, as follows:

python main.py data/garden/ --workspace trial2_garden --enable_cam_center --downscale 4 --with_sam --init_ckpt trial_garden/checkpoints/ngp.pth --iters 5000

However, for testing you then load from trial_garden:

python main.py data/garden/ --workspace trial2_garden --enable_cam_center --downscale 4 --with_sam --init_ckpt trial_garden/checkpoints/ngp.pth --test --gui

Is this a typo? Shouldn't it be --init_ckpt trial2_garden/checkpoints/ngp.pth ? Or maybe I am misunderstanding how the code is working...

Thanks again!

The training steps may be unclear: got a runtime error during the with_sam stage

I tried the training steps from the README on my own data:
### train rgb
python main.py my_data --workspace exp_mydata --enable_cam_center

### train sam
python main.py my_data --workspace exp_mydata --enable_cam_center --with_sam --init_ckpt exp_mydata/checkpoints/ngp.pth --iters 5000

I got a runtime error during second-stage training, as well as some warnings:

[screenshot: issue]

It seems the second stage did not start training, and some keys are missing when loading the init_ckpt. Could you please look into it? Thanks.
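One way to see exactly which keys are missing when --init_ckpt is loaded is to diff the model's parameter names against those in the checkpoint; this is the same missing/unexpected split that PyTorch's Module.load_state_dict(..., strict=False) reports. A framework-free sketch follows; the key names in the usage are hypothetical, and whether some missing keys are expected for the SAM stage (for example, newly added SAM-feature layers) is an assumption not confirmed by the repository.

```python
def diff_state_dict_keys(model_keys, ckpt_keys):
    """Split parameter names into (missing, unexpected), like load_state_dict(strict=False)."""
    model_keys, ckpt_keys = set(model_keys), set(ckpt_keys)
    missing = sorted(model_keys - ckpt_keys)  # in the model, absent from the checkpoint
    unexpected = sorted(ckpt_keys - model_keys)  # in the checkpoint, absent from the model
    return missing, unexpected


# Hypothetical key names for illustration only.
model_keys = ["grid.embeddings", "sigma_net.0.weight", "sam_head.0.weight"]
ckpt_keys = ["grid.embeddings", "sigma_net.0.weight"]
missing, unexpected = diff_state_dict_keys(model_keys, ckpt_keys)

# With real objects (assumption: the .pth may wrap the weights under a sub-key):
#   state = torch.load("exp_mydata/checkpoints/ngp.pth", map_location="cpu")
#   missing, unexpected = diff_state_dict_keys(model.state_dict().keys(), state.keys())
```

If only layers that exist solely in the SAM stage show up as missing, the warnings are plausibly benign; if RGB-stage layers are missing too, the checkpoint likely came from a different configuration.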

Segmented model extraction

Hi,

First of all, congratulations on the awesome paper!
I tried out your code and got it working up to the segmentation part. I have a few questions:
1. How do you extract the segmented mesh from the GUI?
2. Where can I find prompt-based segmentation in the GUI?

Thank you

Curious about the benefit of using online distillation for SAM prediction?

Hello author! I would like to understand your reasoning when implementing the code: why is gt_samvit predicted from the rendered image, instead of feeding gt_image directly to SAM?

with torch.no_grad():
    if use_cache:
        gt_samvit = data['gt_samvit']
    else:
        # render high-res RGB
        outputs = self.model.render(rays_o, rays_d, staged=True, index=index, bg_color=bg_color, perturb=True, cam_near_far=cam_near_far, update_proposal=False, return_feats=0)
        pred_rgb = outputs['image'].reshape(H, W, 3)
        # encode SAM ground truth
        image = (pred_rgb.detach().cpu().numpy() * 255).astype(np.uint8)
        self.sam_predictor.set_image(image)
        gt_samvit = self.sam_predictor.features # [1, 256, 64, 64]
        # write to cache
        if self.opt.cache_size > 0:
            data['gt_samvit'] = gt_samvit
            self.cache.insert(data)

I noticed that when generating gt_samvit to supervise the network's semantic features, the code feeds the rendered image to sam_predictor. Why not feed the GT image to SAM to get the ground-truth label? Is there any benefit to this online self-distillation?
