bennyguo / instant-nsr-pl

Neural Surface reconstruction based on Instant-NGP. Efficient and customizable boilerplate for your research projects. Train NeuS in 10min!

License: MIT License

Python 100.00%

instant-nsr-pl's People

mmarcoo sunshineywz123 zhixiongzuo pablovela5620 arterms yukangcao dfqytcom joon4323 2017devil chikayan ldyang694 universewill cwchenwang xuxumiao777 finninmunich bwzhao kuui24 fangtiancheng ge35tay marissachanlatte rspezialetti bruinxiong linzhenyuyuchen aipenguin mvwouden riga27527 igzat1no qianqian121 fractal128 ariajamili yyeboah wangyida xu-jiayou yec22 pooncs jackzhousz yuechengithub dexin-qi chengyi-xun jiaxililearn martellz amughrabi mshnb xyp8023 congdc00 cubantonystark yuancaimaiyi kestrelm b32xn0 zbqq kite-hz marcostrinca abhishekmonogram 1172100122 xuyaojian123 camenduru alexandor91 mobiuslqm nehamjain10 liuxiaozhu01 vcadillog xuanlinli17 zzliekkas thumbmaswalker calvinren iamleon121 summertight pawtingdev serdarhelli hirotong panziqiai shuhezhang-mumc kacperkan shimomurakei eddie-cui qianfanshen fenghuayumo

instant-nsr-pl's Issues

Is it possible to share the config.yml for models?

Hi,

I'm wondering whether the chair showcase uses the same configuration as the neus-blender.yml you provided in the repo, and whether you could share the training parameters for several scenes as guidance for hyperparameter tuning. Thanks in advance!

Currently, the chair model I trained with the provided .yml does not produce a mesh as good as the one in your showcase. Here is my test result (256-resolution mesh).

How to extract high resolution mesh

I was wondering whether it is feasible to extract a very high-resolution mesh with the Marching Cubes algorithm, say at 2048^3. In theory it shouldn't require excessive memory, but we run into difficulties on both the CPU and the GPU, resulting in CUDA out-of-memory errors. Could you help us with this problem?
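Some back-of-the-envelope arithmetic helps here: a 2048^3 grid has about 8.6 × 10^9 vertices, so storing just one float32 SDF value per vertex takes 2^35 bytes = 32 GiB, before counting query coordinates, network activations, or the marching-cubes output. The sketch below shows the usual mitigation of evaluating the grid in slabs so only one slab of query points is alive at a time. It assumes an SDF callable on NumPy arrays; `sphere_sdf` is a hypothetical stand-in for the trained network, not the repository's API. Note that the returned grid itself still needs the full 32 GiB at this resolution, so one would likely also have to run marching cubes block by block and stitch the pieces.

```python
import numpy as np


def grid_memory_bytes(resolution: int, bytes_per_value: int = 4) -> int:
    """Bytes needed to store one scalar (e.g. a float32 SDF value) per grid vertex."""
    return resolution ** 3 * bytes_per_value


def sphere_sdf(points: np.ndarray, radius: float = 0.5) -> np.ndarray:
    """Hypothetical stand-in for the learned SDF: signed distance to a sphere."""
    return np.linalg.norm(points, axis=-1) - radius


def eval_sdf_chunked(sdf_fn, resolution: int, bound: float = 1.0, chunk: int = 8) -> np.ndarray:
    """Evaluate sdf_fn on a resolution^3 grid over [-bound, bound]^3, one x-slab at a time.

    Only one slab of query points exists at once (chunk * R * R * 3 floats),
    but the returned grid itself still holds resolution^3 values.
    """
    xs = np.linspace(-bound, bound, resolution, dtype=np.float32)
    sdf = np.empty((resolution, resolution, resolution), dtype=np.float32)
    for start in range(0, resolution, chunk):
        x = xs[start:start + chunk]
        gx, gy, gz = np.meshgrid(x, xs, xs, indexing="ij")
        points = np.stack([gx, gy, gz], axis=-1)  # shape (len(x), R, R, 3)
        sdf[start:start + len(x)] = sdf_fn(points)
    return sdf


print(grid_memory_bytes(2048) / 2**30)  # 32.0 (GiB for a float32 grid at 2048^3)
```

With `chunk=8` at resolution 2048, each slab holds 8 × 2048 × 2048 × 3 float32 coordinates, about 384 MiB, which is the knob to turn if the query step itself runs out of memory.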

OOM during backward

I tried to train on the DTU dataset using this implementation, but GPU memory runs out during the backward pass after a few steps.

If I remove the RGB loss, the OOM disappears. The shape of the rgb_ground_truth tensor looks correct.

Any idea why this happened? Thanks in advance!

/lib/python3.10/site-packages/torch/autograd/__init__.py", line 197, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.54 GiB (GPU 0; 39.45 GiB total capacity; 21.52 GiB already allocated; 1.53 GiB free; 27.09 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
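The message itself points at the allocator setting `max_split_size_mb` when reserved memory is much larger than allocated memory. In this log the gap is 27.09 − 21.52 ≈ 5.6 GiB, so fragmentation may explain part of the pressure. The numbers also suggest something else: 39.45 GiB total minus 1.53 GiB free is about 37.9 GiB in use, of which PyTorch reports only 27.09 GiB reserved, so roughly 10.8 GiB appears to be held outside PyTorch's caching allocator (for example by the CUDA context, another process, or a library with its own allocator). That is an inference from the numbers, not something the log states. A small sketch, with hypothetical helper names, that reads the gap from such a message and sets the allocator option (it must be set before the first CUDA allocation; the simplest place is before importing torch):

```python
import os
import re


def reserved_minus_allocated_gib(oom_message: str) -> float:
    """Return 'reserved' minus 'already allocated' (GiB) from a PyTorch CUDA OOM message."""
    allocated = float(re.search(r"([\d.]+) GiB already allocated", oom_message).group(1))
    reserved = float(re.search(r"([\d.]+) GiB reserved in total by PyTorch", oom_message).group(1))
    return reserved - allocated


def set_max_split_size(mb: int) -> str:
    """Configure PyTorch's CUDA caching allocator via its environment variable."""
    value = f"max_split_size_mb:{mb}"
    os.environ["PYTORCH_CUDA_ALLOC_CONF"] = value
    return value
```

The value passed to `set_max_split_size` (for example 128) is an arbitrary starting point to experiment with, not a recommended setting.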

[error] tiny-cuda-nn/cutlass_matmul.h:332 status failed with error Error Internal

Hi guys, thanks for the excellent work!

When I run the training code, I encounter the following error after epoch 0 finishes:

Global seed set to 42
Warning: FullyFusedMLP is not supported for the selected architecture 70. Falling back to CutlassMLP. For maximum performance, raise the target GPU architecture to 75+.
Warning: FullyFusedMLP is not supported for the selected architecture 70. Falling back to CutlassMLP. For maximum performance, raise the target GPU architecture to 75+.
Using 16bit native Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
`Trainer(limit_train_batches=1.0)` was configured so 100% of the batches per epoch will be used..
Global seed set to 42
Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/1
----------------------------------------------------------------------------------------------------
distributed_backend=nccl
All distributed processes registered. Starting with 1 processes
----------------------------------------------------------------------------------------------------

LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

  | Name  | Type      | Params
------------------------------------
0 | model | NeuSModel | 12.6 M
------------------------------------
12.6 M    Trainable params
0         Non-trainable params
12.6 M    Total params
25.220    Total estimated model params size (MB)
Epoch 0: : 10000it [06:59, 23.83it/s, loss=0.000965, train/inv_s=1.47e+3, train/num_rays=8192.0]
Traceback (most recent call last):
  File "/home/zhouyiren/code/instant-nsr-pl/launch.py", line 123, in <module>
    main()
  File "/home/zhouyiren/code/instant-nsr-pl/launch.py", line 112, in main
    trainer.fit(system, datamodule=dm)
  File "/home/zhouyiren/anaconda3/envs/instant-nsr/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 696, in fit
    self._call_and_handle_interrupt(
  File "/home/zhouyiren/anaconda3/envs/instant-nsr/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 648, in _call_and_handle_interrupt
    return self.strategy.launcher.launch(trainer_fn, *args, trainer=self, **kwargs)
  File "/home/zhouyiren/anaconda3/envs/instant-nsr/lib/python3.10/site-packages/pytorch_lightning/strategies/launchers/subprocess_script.py", line 93, in launch
    return function(*args, **kwargs)
  File "/home/zhouyiren/anaconda3/envs/instant-nsr/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 735, in _fit_impl
    results = self._run(model, ckpt_path=self.ckpt_path)
  File "/home/zhouyiren/anaconda3/envs/instant-nsr/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1166, in _run
    results = self._run_stage()
  File "/home/zhouyiren/anaconda3/envs/instant-nsr/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1252, in _run_stage
    return self._run_train()
  File "/home/zhouyiren/anaconda3/envs/instant-nsr/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1283, in _run_train
    self.fit_loop.run()
  File "/home/zhouyiren/anaconda3/envs/instant-nsr/lib/python3.10/site-packages/pytorch_lightning/loops/loop.py", line 200, in run
    self.advance(*args, **kwargs)
  File "/home/zhouyiren/anaconda3/envs/instant-nsr/lib/python3.10/site-packages/pytorch_lightning/loops/fit_loop.py", line 271, in advance
    self._outputs = self.epoch_loop.run(self._data_fetcher)
  File "/home/zhouyiren/anaconda3/envs/instant-nsr/lib/python3.10/site-packages/pytorch_lightning/loops/loop.py", line 201, in run
    self.on_advance_end()
  File "/home/zhouyiren/anaconda3/envs/instant-nsr/lib/python3.10/site-packages/pytorch_lightning/loops/epoch/training_epoch_loop.py", line 241, in on_advance_end
    self._run_validation()
  File "/home/zhouyiren/anaconda3/envs/instant-nsr/lib/python3.10/site-packages/pytorch_lightning/loops/epoch/training_epoch_loop.py", line 299, in _run_validation
    self.val_loop.run()
  File "/home/zhouyiren/anaconda3/envs/instant-nsr/lib/python3.10/site-packages/pytorch_lightning/loops/loop.py", line 200, in run
    self.advance(*args, **kwargs)
  File "/home/zhouyiren/anaconda3/envs/instant-nsr/lib/python3.10/site-packages/pytorch_lightning/loops/dataloader/evaluation_loop.py", line 155, in advance
    dl_outputs = self.epoch_loop.run(self._data_fetcher, dl_max_batches, kwargs)
  File "/home/zhouyiren/anaconda3/envs/instant-nsr/lib/python3.10/site-packages/pytorch_lightning/loops/loop.py", line 200, in run
    self.advance(*args, **kwargs)
  File "/home/zhouyiren/anaconda3/envs/instant-nsr/lib/python3.10/site-packages/pytorch_lightning/loops/epoch/evaluation_epoch_loop.py", line 143, in advance
    output = self._evaluation_step(**kwargs)
  File "/home/zhouyiren/anaconda3/envs/instant-nsr/lib/python3.10/site-packages/pytorch_lightning/loops/epoch/evaluation_epoch_loop.py", line 240, in _evaluation_step
    output = self.trainer._call_strategy_hook(hook_name, *kwargs.values())
  File "/home/zhouyiren/anaconda3/envs/instant-nsr/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1704, in _call_strategy_hook
    output = fn(*args, **kwargs)
  File "/home/zhouyiren/anaconda3/envs/instant-nsr/lib/python3.10/site-packages/pytorch_lightning/strategies/ddp.py", line 358, in validation_step
    return self.model(*args, **kwargs)
  File "/home/zhouyiren/anaconda3/envs/instant-nsr/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/zhouyiren/anaconda3/envs/instant-nsr/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1040, in forward
    output = self._run_ddp_forward(*inputs, **kwargs)
  File "/home/zhouyiren/anaconda3/envs/instant-nsr/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1000, in _run_ddp_forward
    return module_to_run(*inputs[0], **kwargs[0])
  File "/home/zhouyiren/anaconda3/envs/instant-nsr/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/zhouyiren/anaconda3/envs/instant-nsr/lib/python3.10/site-packages/pytorch_lightning/overrides/base.py", line 90, in forward
    return self.module.validation_step(*inputs, **kwargs)
  File "/home/zhouyiren/code/instant-nsr-pl/systems/neus.py", line 137, in validation_step
    out = self(batch)
  File "/home/zhouyiren/anaconda3/envs/instant-nsr/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/zhouyiren/code/instant-nsr-pl/systems/neus.py", line 46, in forward
    return self.model(batch['rays'])
  File "/home/zhouyiren/anaconda3/envs/instant-nsr/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/zhouyiren/code/instant-nsr-pl/models/neus.py", line 167, in forward
    out = chunk_batch(self.forward_, self.config.ray_chunk, rays)
  File "/home/zhouyiren/code/instant-nsr-pl/models/utils.py", line 22, in chunk_batch
    out_chunk = func(*[arg[i:i+chunk_size] if isinstance(arg, torch.Tensor) else arg for arg in args], **kwargs)
  File "/home/zhouyiren/code/instant-nsr-pl/models/neus.py", line 137, in forward_
    rgb, opacity, depth = rendering(
  File "/home/zhouyiren/anaconda3/envs/instant-nsr/lib/python3.10/site-packages/nerfacc/vol_rendering.py", line 115, in rendering
    rgbs, alphas = rgb_alpha_fn(t_starts, t_ends, ray_indices.long())
  File "/home/zhouyiren/code/instant-nsr-pl/models/neus.py", line 121, in rgb_alpha_fn
    rgb = self.texture(feature, t_dirs, normal)
  File "/home/zhouyiren/anaconda3/envs/instant-nsr/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/zhouyiren/code/instant-nsr-pl/models/texture.py", line 26, in forward
    color = self.network(network_inp).view(*features.shape[:-1], self.n_output_dims).float()
  File "/home/zhouyiren/anaconda3/envs/instant-nsr/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/zhouyiren/anaconda3/envs/instant-nsr/lib/python3.10/site-packages/tinycudann/modules.py", line 145, in forward
    output = _module_function.apply(
  File "/home/zhouyiren/anaconda3/envs/instant-nsr/lib/python3.10/site-packages/tinycudann/modules.py", line 57, in forward
    native_ctx, output = native_tcnn_module.fwd(input, params)
RuntimeError: /tmp/pip-req-build-8gaiqjyg/include/tiny-cuda-nn/cutlass_matmul.h:332 status failed with error Error Internal
Epoch 0: : 10000it [07:05, 23.49it/s, loss=0.000965, train/inv_s=1.47e+3, train/num_rays=8192.0]
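The traceback shows the failure happens at the start of validation, inside the `chunk_batch` frame, which splits the rays into slices of `self.config.ray_chunk` before they reach the tiny-cuda-nn MLP. The following is a simplified NumPy sketch of that chunking pattern, not the repository's code (the real function may return a different structure), to make clear what `ray_chunk` controls. Since training steps ran fine and only validation failed, checking whether a different `ray_chunk` changes the behavior is one debugging idea, not a confirmed fix.

```python
import numpy as np


def chunk_batch(func, chunk_size, *args, **kwargs):
    """Simplified sketch: apply func to chunk_size-row slices of every array argument.

    Non-array arguments are passed through unchanged; per-chunk results are
    concatenated along the first axis.
    """
    n = next(a.shape[0] for a in args if isinstance(a, np.ndarray))
    outputs = []
    for i in range(0, n, chunk_size):
        sliced = [a[i:i + chunk_size] if isinstance(a, np.ndarray) else a for a in args]
        outputs.append(func(*sliced, **kwargs))
    return np.concatenate(outputs, axis=0)


# Usage: scale 10 hypothetical "rays" in chunks of 4 (chunk sizes 4, 4, 2).
rays = np.arange(30, dtype=np.float32).reshape(10, 3)
scaled = chunk_batch(lambda r, s: r * s, 4, rays, 2.0)
```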

About Defining the Region of Interest in COLMAP format data

I followed the instructions and trained the model on COLMAP-format data, but got this result.

I note that the preprocess in NeuS(https://github.com/Totoro97/NeuS/tree/main/preprocess_custom_data) has another step:

So how can I define the region of interest in this repo?
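As far as I understand NeuS's convention, the region of interest is assumed to lie inside the unit sphere, and its custom-data preprocessing produces a scale matrix mapping that sphere to the world-space region the user selects. I don't know exactly how this repository's COLMAP loader normalizes scenes, so the following is only a sketch of the idea, assuming the region of interest is given as an axis-aligned box in COLMAP world coordinates: move the box center to the origin and divide by the half-diagonal, then apply the same transform to the camera centers (camera rotations stay unchanged because the scaling is uniform). Function names are hypothetical.

```python
import numpy as np


def roi_normalization(bbox_min, bbox_max):
    """Return (center, scale) mapping an axis-aligned ROI box into the unit sphere.

    The box's half-diagonal becomes the sphere radius, so every box corner lands
    exactly on the unit sphere and the whole box fits inside it.
    """
    bbox_min = np.asarray(bbox_min, dtype=np.float64)
    bbox_max = np.asarray(bbox_max, dtype=np.float64)
    center = (bbox_min + bbox_max) / 2.0
    scale = np.linalg.norm(bbox_max - bbox_min) / 2.0
    return center, scale


def normalize_points(points, center, scale):
    """Apply the same similarity transform to 3D points or camera centers."""
    return (np.asarray(points, dtype=np.float64) - center) / scale
```

For example, a box from (-1, 0, 2) to (3, 2, 6) has center (1, 1, 4) and half-diagonal 3, so its corner (3, 2, 6) maps to (2/3, 1/3, 2/3), which lies on the unit sphere.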

Program freezes when training

Hi, thanks for the great work.

As I was trying to train on the synthetic drums data with your framework (for the first time), the program froze like this:

I understand that some scripts need to be compiled the first time the code runs, but it has now been frozen for more than 2 hours.
Any advice would be much appreciated. Thanks in advance.
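When a run appears frozen, Python's standard `faulthandler` module can show where it is stuck. A minimal sketch with hypothetical helper names, meant to be called near the top of the launch script: it only shows Python frames, so if the process is stuck inside a compiler or CUDA call you will see the Python call that entered it, which is usually enough to tell a long first-time compilation from a genuine hang.

```python
import faulthandler
import sys


def arm_hang_dump(seconds: float) -> None:
    """Dump every thread's Python stack if the process is still running after `seconds`.

    With repeat=True the dump recurs every `seconds`: a stack that changes between
    dumps suggests slow progress, an identical one suggests a real hang.
    sys.__stderr__ is used because faulthandler needs a real file descriptor.
    """
    faulthandler.dump_traceback_later(seconds, repeat=True, file=sys.__stderr__)


def disarm_hang_dump() -> None:
    """Cancel a pending dump, e.g. once training has clearly started."""
    faulthandler.cancel_dump_traceback_later()
```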

ImportError: /home/anaconda3/envs/instant-nsr-pl/lib/python3.8/site-packages/tinycudann_bindings_61/_C.cpython-38-x86_64-linux-gnu.so: undefined symbol: _ZNK2at6Tensor6deviceEv

I get this error after following the README instructions, and I couldn't find a similar problem on Google or in the issues. Have you encountered it?
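The missing symbol is a mangled C++ name, and decoding it shows what the extension expects: `_ZNK2at6Tensor6deviceEv` is `at::Tensor::device() const`, a PyTorch (ATen) function. An undefined PyTorch symbol commonly suggests the tiny-cuda-nn bindings were compiled against a different PyTorch build than the one being imported, and rebuilding tiny-cuda-nn inside the same environment is the usual thing to try; treat that as a likely cause rather than a certain one. The standard tool for decoding is `c++filt`; the toy decoder below (a hypothetical helper) handles only the simple no-argument member-function form used by the symbols on this page.

```python
def demangle_nested(symbol: str) -> str:
    """Toy decoder for Itanium-mangled names of the form _ZN[K][St]<len><id>...Ev.

    Covers only no-argument member functions; use `c++filt` for anything else.
    """
    if not (symbol.startswith("_ZN") and symbol.endswith("Ev")):
        raise ValueError(f"unsupported symbol: {symbol}")
    body = symbol[3:-2]              # strip "_ZN" and the trailing "E" (end of name) + "v" (void params)
    is_const = body.startswith("K")  # K marks a const-qualified member function
    if is_const:
        body = body[1:]
    parts = []
    if body.startswith("St"):        # St abbreviates the std:: namespace
        parts.append("std")
        body = body[2:]
    i = 0
    while i < len(body):
        j = i
        while j < len(body) and body[j].isdigit():
            j += 1
        if j == i:
            raise ValueError(f"unsupported encoding at {body[i:]!r}")
        length = int(body[i:j])      # each identifier is prefixed by its length
        if j + length > len(body):
            raise ValueError("truncated identifier")
        parts.append(body[j:j + length])
        i = j + length
    return "::".join(parts) + "()" + (" const" if is_const else "")
```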

pytorch error

Hi, bennyguo
I got this error when running your code.

ImportError: /home/eason/anaconda3/envs/nerf/lib/python3.10/site-packages/tinycudann_bindings/_86_C.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZNSt15__exception_ptr13exception_ptr9_M_addrefEv
System versions:
Linux 20.04
CUDA 11.7
Python 3.10
PyTorch 1.13.1

Do you have any ideas to fix it? Thanks very much.
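Here the missing symbol `_ZNSt15__exception_ptr13exception_ptr9_M_addrefEv` decodes to `std::__exception_ptr::exception_ptr::_M_addref()`, a libstdc++ symbol rather than a PyTorch one. That pattern often means the extension was built with a newer GCC than the libstdc++ loaded at runtime (for instance an older copy inside the conda environment), but it is a hypothesis to check, not a diagnosis. One quick check, sketched with the standard `ctypes` module (the helper name is hypothetical), is whether a given shared library can resolve the symbol:

```python
import ctypes.util  # also binds the `ctypes` package


def exports_symbol(library: str, symbol: str) -> bool:
    """Return True if the shared library resolves `symbol` (dlsym succeeds)."""
    lib = ctypes.CDLL(library)
    try:
        getattr(lib, symbol)  # ctypes raises AttributeError when the lookup fails
    except AttributeError:
        return False
    return True
```

For example, `exports_symbol("libstdc++.so.6", "_ZNSt15__exception_ptr13exception_ptr9_M_addrefEv")`. Which file a bare name like `libstdc++.so.6` resolves to depends on the loader search path (e.g. `LD_LIBRARY_PATH`), so passing absolute paths to the conda copy and the system copy lets you compare the two.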

Is the NeuS mesh from instant-nsr-pl worse than the one from the original NeuS, or is it a problem with how I ran it?

Textured mesh

Hello, thanks for your great code! I am using your code to get textured meshes. I have low PSNR and the rendered RGB images are of very high quality. With your previous version I was able to get very good colors for my mesh, but the colors from the current version seem to be wrong. Could this be related to the alphas you are calculating? Should I multiply the RGB values by some factor to get the correct colors (similar to the rendered RGB)?
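One thing worth checking (a hypothesis, not a diagnosis of the repository's code): a rendered pixel is a weighted sum of sample colors along the ray plus the background weighted by the leftover transmittance, whereas a vertex color is typically a single query of the color network at the surface. In standard volume rendering the weights come from the alphas as w_i = alpha_i * prod_{j<i}(1 - alpha_j). The sketch below, with hypothetical names, computes these weights and composites one pixel; when accumulated opacity is well below 1, the background leaks into the pixel, so a single constant scale on the RGB values would not generally reproduce the rendered color.

```python
import numpy as np


def composite(alphas, colors, background):
    """Alpha-composite samples ordered front to back along one ray.

    weights[i] = alphas[i] * prod_{j<i} (1 - alphas[j]); the background
    receives the remaining transmittance 1 - sum(weights).
    """
    alphas = np.asarray(alphas, dtype=np.float64)
    colors = np.asarray(colors, dtype=np.float64)
    # Transmittance before each sample: 1, (1-a0), (1-a0)(1-a1), ...
    transmittance = np.concatenate([[1.0], np.cumprod(1.0 - alphas)[:-1]])
    weights = alphas * transmittance
    opacity = weights.sum()
    rgb = (weights[:, None] * colors).sum(axis=0) + (1.0 - opacity) * np.asarray(background, dtype=np.float64)
    return rgb, weights, opacity
```

For two samples with alpha 0.5 (red, then green) over a white background, the weights are 0.5 and 0.25, the opacity is 0.75, and the pixel is (0.75, 0.5, 0.25): a quarter of it comes from the background.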

ZeroDivisionError: division by zero error

Hi, I ran both NeuS and NeRF and got the same ZeroDivisionError in systems\neus.py and systems\nerf.py.
Here's the console output when running NeuS:
Global seed set to 42
Using 16bit native Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
`Trainer(limit_train_batches=1.0)` was configured so 100% of the batches per epoch will be used..
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

| Name | Type | Params

0 | model | NeuSModel | 12.6 M

12.6 M Trainable params
0 Non-trainable params
12.6 M Total params
25.221 Total estimated model params size (MB)
Traceback (most recent call last):
File "G:\GitHub\instant-nsr-pl\launch.py", line 123, in <module>
main()
File "G:\GitHub\instant-nsr-pl\launch.py", line 112, in main
trainer.fit(system, datamodule=dm)
File "C:\Users\halbe\AppData\Local\Programs\Python\Python310\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 696, in fit
self._call_and_handle_interrupt(
File "C:\Users\halbe\AppData\Local\Programs\Python\Python310\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 650, in _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)
File "C:\Users\halbe\AppData\Local\Programs\Python\Python310\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 735, in _fit_impl
results = self._run(model, ckpt_path=self.ckpt_path)
File "C:\Users\halbe\AppData\Local\Programs\Python\Python310\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1166, in _run
results = self._run_stage()
File "C:\Users\halbe\AppData\Local\Programs\Python\Python310\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1252, in _run_stage
return self._run_train()
File "C:\Users\halbe\AppData\Local\Programs\Python\Python310\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1283, in _run_train
self.fit_loop.run()
File "C:\Users\halbe\AppData\Local\Programs\Python\Python310\lib\site-packages\pytorch_lightning\loops\loop.py", line 200, in run
self.advance(*args, **kwargs)
File "C:\Users\halbe\AppData\Local\Programs\Python\Python310\lib\site-packages\pytorch_lightning\loops\fit_loop.py", line 271, in advance
self._outputs = self.epoch_loop.run(self._data_fetcher)
File "C:\Users\halbe\AppData\Local\Programs\Python\Python310\lib\site-packages\pytorch_lightning\loops\loop.py", line 200, in run
self.advance(*args, **kwargs)
File "C:\Users\halbe\AppData\Local\Programs\Python\Python310\lib\site-packages\pytorch_lightning\loops\epoch\training_epoch_loop.py", line 203, in advance
batch_output = self.batch_loop.run(kwargs)
File "C:\Users\halbe\AppData\Local\Programs\Python\Python310\lib\site-packages\pytorch_lightning\loops\loop.py", line 200, in run
self.advance(*args, **kwargs)
File "C:\Users\halbe\AppData\Local\Programs\Python\Python310\lib\site-packages\pytorch_lightning\loops\batch\training_batch_loop.py", line 87, in advance
outputs = self.optimizer_loop.run(optimizers, kwargs)
File "C:\Users\halbe\AppData\Local\Programs\Python\Python310\lib\site-packages\pytorch_lightning\loops\loop.py", line 200, in run
self.advance(*args, **kwargs)
File "C:\Users\halbe\AppData\Local\Programs\Python\Python310\lib\site-packages\pytorch_lightning\loops\optimization\optimizer_loop.py", line 201, in advance
result = self._run_optimization(kwargs, self._optimizers[self.optim_progress.optimizer_position])
File "C:\Users\halbe\AppData\Local\Programs\Python\Python310\lib\site-packages\pytorch_lightning\loops\optimization\optimizer_loop.py", line 248, in _run_optimization
self._optimizer_step(optimizer, opt_idx, kwargs.get("batch_idx", 0), closure)
File "C:\Users\halbe\AppData\Local\Programs\Python\Python310\lib\site-packages\pytorch_lightning\loops\optimization\optimizer_loop.py", line 358, in _optimizer_step
self.trainer._call_lightning_module_hook(
File "C:\Users\halbe\AppData\Local\Programs\Python\Python310\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1550, in _call_lightning_module_hook
output = fn(*args, **kwargs)
File "C:\Users\halbe\AppData\Local\Programs\Python\Python310\lib\site-packages\pytorch_lightning\core\module.py", line 1705, in optimizer_step
optimizer.step(closure=optimizer_closure)
File "C:\Users\halbe\AppData\Local\Programs\Python\Python310\lib\site-packages\pytorch_lightning\core\optimizer.py", line 168, in step
step_output = self._strategy.optimizer_step(self._optimizer, self._optimizer_idx, closure, **kwargs)
File "C:\Users\halbe\AppData\Local\Programs\Python\Python310\lib\site-packages\pytorch_lightning\strategies\strategy.py", line 216, in optimizer_step
return self.precision_plugin.optimizer_step(model, optimizer, opt_idx, closure, **kwargs)
File "C:\Users\halbe\AppData\Local\Programs\Python\Python310\lib\site-packages\pytorch_lightning\plugins\precision\native_amp.py", line 85, in optimizer_step
closure_result = closure()
File "C:\Users\halbe\AppData\Local\Programs\Python\Python310\lib\site-packages\pytorch_lightning\loops\optimization\optimizer_loop.py", line 146, in __call__
self._result = self.closure(*args, **kwargs)
File "C:\Users\halbe\AppData\Local\Programs\Python\Python310\lib\site-packages\pytorch_lightning\loops\optimization\optimizer_loop.py", line 132, in closure
step_output = self._step_fn()
File "C:\Users\halbe\AppData\Local\Programs\Python\Python310\lib\site-packages\pytorch_lightning\loops\optimization\optimizer_loop.py", line 407, in _training_step
training_step_output = self.trainer._call_strategy_hook("training_step", *kwargs.values())
File "C:\Users\halbe\AppData\Local\Programs\Python\Python310\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1704, in _call_strategy_hook
output = fn(*args, **kwargs)
File "C:\Users\halbe\AppData\Local\Programs\Python\Python310\lib\site-packages\pytorch_lightning\strategies\dp.py", line 134, in training_step
return self.model(*args, **kwargs)
File "C:\Users\halbe\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "C:\Users\halbe\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\parallel\data_parallel.py", line 169, in forward
return self.module(*inputs[0], **kwargs[0])
File "C:\Users\halbe\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "C:\Users\halbe\AppData\Local\Programs\Python\Python310\lib\site-packages\pytorch_lightning\overrides\data_parallel.py", line 65, in forward
output = super().forward(*inputs, **kwargs)
File "C:\Users\halbe\AppData\Local\Programs\Python\Python310\lib\site-packages\pytorch_lightning\overrides\base.py", line 79, in forward
output = self.module.training_step(*inputs, **kwargs)
File "G:\GitHub\instant-nsr-pl\systems\neus.py", line 86, in training_step
train_num_rays = int(self.train_num_rays * (self.train_num_samples / out['num_samples'].sum().item()))
ZeroDivisionError: division by zero
Epoch 0: : 0it [01:22, ?it/s]
[W ..\torch\csrc\CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
[W CUDAGuardImpl.h:46] Warning: CUDA warning: driver shutting down (function uncheckedGetDevice)
[W CUDAGuardImpl.h:62] Warning: CUDA warning: driver shutting down (function uncheckedSetDevice)
[W CUDAGuardImpl.h:46] Warning: CUDA warning: driver shutting down (function uncheckedGetDevice)
[W CUDAGuardImpl.h:62] Warning: CUDA warning: invalid device ordinal (function uncheckedSetDevice)
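The failing line in systems\neus.py rescales the ray count by (target samples / samples actually taken), `int(self.train_num_rays * (self.train_num_samples / out['num_samples'].sum().item()))`, so the error means that batch produced zero samples, presumably because no ray hit a region the sampler considered occupied. A standalone sketch of the same update with a guard (the function name is hypothetical, and the `max(1, ...)` floor is an addition not present in the original line) avoids the crash, but a zero sample count likely points to a data or scene-bounds problem that is still worth investigating.

```python
def next_train_num_rays(current_rays: int, target_samples: int, num_samples: int) -> int:
    """Rescale the ray budget so the next batch draws roughly target_samples samples.

    Keeps the current ray count when the last batch produced no samples
    instead of dividing by zero.
    """
    if num_samples <= 0:
        # No samples were taken (e.g. every ray missed the occupied region).
        return current_rays
    return max(1, int(current_rays * (target_samples / num_samples)))


# Usage with illustrative numbers: half the target samples -> double the rays.
print(next_train_num_rays(256, 262144, 131072))  # 512
print(next_train_num_rays(256, 262144, 0))       # 256
```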

Performance on geometry reconstruction

Hi @bennyguo, thanks for your great code! Have you evaluated this repo on the geometry reconstruction task and compared it with the original NeuS or Instant-NSR?
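For context, DTU geometry results are commonly reported as a Chamfer distance: the mean of accuracy (prediction to ground truth) and completeness (ground truth to prediction). As I understand it, the standard DTU evaluation additionally downsamples the point clouds, applies observation masks, and discards distances beyond a threshold, none of which the brute-force sketch below does; it only illustrates the core metric.

```python
import numpy as np


def chamfer_distance(pred, gt):
    """Symmetric Chamfer distance between point sets of shape (N, 3) and (M, 3).

    accuracy: mean distance from each predicted point to its nearest GT point.
    completeness: mean distance from each GT point to its nearest predicted point.
    Brute force with O(N * M) memory; use a KD-tree for large point clouds.
    """
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    dists = np.linalg.norm(pred[:, None, :] - gt[None, :, :], axis=-1)  # (N, M)
    accuracy = dists.min(axis=1).mean()
    completeness = dists.min(axis=0).mean()
    return (accuracy + completeness) / 2.0, accuracy, completeness
```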

How can I export textured mesh?

The exported .obj file contains only the shape, not the corresponding material and color.
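If per-vertex colors are enough, one simple option is the widely used but unofficial OBJ extension that appends `r g b` to each `v` line, which viewers such as MeshLab read; a true texture map would instead need UV coordinates plus an .mtl file and an image. A minimal writer with a hypothetical name, assuming you already have vertices, 0-based triangle indices, and per-vertex colors in [0, 1] (for example from querying the color network at each vertex, which is an assumption about your pipeline):

```python
import io


def write_obj_with_vertex_colors(f, vertices, faces, colors):
    """Write a triangle mesh as OBJ with per-vertex RGB appended to each 'v' line.

    `faces` use 0-based indices; OBJ indices are 1-based, hence the +1.
    """
    for (x, y, z), (r, g, b) in zip(vertices, colors):
        f.write(f"v {x:.6f} {y:.6f} {z:.6f} {r:.6f} {g:.6f} {b:.6f}\n")
    for i, j, k in faces:
        f.write(f"f {i + 1} {j + 1} {k + 1}\n")


# Usage: a single red/green/blue triangle written to an in-memory buffer.
buf = io.StringIO()
write_obj_with_vertex_colors(
    buf,
    vertices=[[0, 0, 0], [1, 0, 0], [0, 1, 0]],
    faces=[[0, 1, 2]],
    colors=[[1, 0, 0], [0, 1, 0], [0, 0, 1]],
)
```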

how to save the trained model and the hashmap used in Hashencoding

Hi, bennyguo
I hope to find a simple way to save the trained model and the hashmap. Do you know how to do that?

Error when running neus

Thank you for the great work.
I can run NeRF but not NeuS:
The terminal output is as follows:
Epoch 0: : 0it [00:00, ?it/s]
Traceback (most recent call last):
  File "launch.py", line 123, in <module>
    main()
  File "launch.py", line 112, in main
    trainer.fit(system, datamodule=dm)
  File "/root/miniconda3/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 696, in fit
    self._call_and_handle_interrupt(
  File "/root/miniconda3/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 648, in _call_and_handle_interrupt
    return self.strategy.launcher.launch(trainer_fn, *args, trainer=self, **kwargs)
  File "/root/miniconda3/lib/python3.8/site-packages/pytorch_lightning/strategies/launchers/subprocess_script.py", line 93, in launch
    return function(*args, **kwargs)
  File "/root/miniconda3/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 735, in _fit_impl
    results = self._run(model, ckpt_path=self.ckpt_path)
  File "/root/miniconda3/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1166, in _run
    results = self._run_stage()
  File "/root/miniconda3/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1252, in _run_stage
    return self._run_train()
  File "/root/miniconda3/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1283, in _run_train
    self.fit_loop.run()
  File "/root/miniconda3/lib/python3.8/site-packages/pytorch_lightning/loops/loop.py", line 200, in run
    self.advance(*args, **kwargs)
  File "/root/miniconda3/lib/python3.8/site-packages/pytorch_lightning/loops/fit_loop.py", line 271, in advance
    self._outputs = self.epoch_loop.run(self._data_fetcher)
  File "/root/miniconda3/lib/python3.8/site-packages/pytorch_lightning/loops/loop.py", line 200, in run
    self.advance(*args, **kwargs)
  File "/root/miniconda3/lib/python3.8/site-packages/pytorch_lightning/loops/epoch/training_epoch_loop.py", line 203, in advance
    batch_output = self.batch_loop.run(kwargs)
  File "/root/miniconda3/lib/python3.8/site-packages/pytorch_lightning/loops/loop.py", line 200, in run
    self.advance(*args, **kwargs)
  File "/root/miniconda3/lib/python3.8/site-packages/pytorch_lightning/loops/batch/training_batch_loop.py", line 87, in advance
    outputs = self.optimizer_loop.run(optimizers, kwargs)
  File "/root/miniconda3/lib/python3.8/site-packages/pytorch_lightning/loops/loop.py", line 200, in run
    self.advance(*args, **kwargs)
  File "/root/miniconda3/lib/python3.8/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 201, in advance
    result = self._run_optimization(kwargs, self._optimizers[self.optim_progress.optimizer_position])
  File "/root/miniconda3/lib/python3.8/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 248, in _run_optimization
    self._optimizer_step(optimizer, opt_idx, kwargs.get("batch_idx", 0), closure)
  File "/root/miniconda3/lib/python3.8/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 358, in _optimizer_step
    self.trainer._call_lightning_module_hook(
  File "/root/miniconda3/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1550, in _call_lightning_module_hook
    output = fn(*args, **kwargs)
  File "/root/miniconda3/lib/python3.8/site-packages/pytorch_lightning/core/module.py", line 1705, in optimizer_step
    optimizer.step(closure=optimizer_closure)
  File "/root/miniconda3/lib/python3.8/site-packages/pytorch_lightning/core/optimizer.py", line 168, in step
    step_output = self._strategy.optimizer_step(self._optimizer, self._optimizer_idx, closure, **kwargs)
  File "/root/miniconda3/lib/python3.8/site-packages/pytorch_lightning/strategies/ddp.py", line 289, in optimizer_step
    optimizer_output = super().optimizer_step(optimizer, opt_idx, closure, model, **kwargs)
  File "/root/miniconda3/lib/python3.8/site-packages/pytorch_lightning/strategies/strategy.py", line 216, in optimizer_step
    return self.precision_plugin.optimizer_step(model, optimizer, opt_idx, closure, **kwargs)
  File "/root/miniconda3/lib/python3.8/site-packages/pytorch_lightning/plugins/precision/native_amp.py", line 85, in optimizer_step
    closure_result = closure()
  File "/root/miniconda3/lib/python3.8/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 146, in __call__
    self._result = self.closure(*args, **kwargs)
  File "/root/miniconda3/lib/python3.8/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 141, in closure
    self._backward_fn(step_output.closure_loss)
  File "/root/miniconda3/lib/python3.8/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 304, in backward_fn
    self.trainer._call_strategy_hook("backward", loss, optimizer, opt_idx)
  File "/root/miniconda3/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1704, in _call_strategy_hook
    output = fn(*args, **kwargs)
  File "/root/miniconda3/lib/python3.8/site-packages/pytorch_lightning/strategies/strategy.py", line 191, in backward
    self.precision_plugin.backward(self.lightning_module, closure_loss, optimizer, optimizer_idx, *args, **kwargs)
  File "/root/miniconda3/lib/python3.8/site-packages/pytorch_lightning/plugins/precision/precision_plugin.py", line 80, in backward
    model.backward(closure_loss, optimizer, optimizer_idx, *args, **kwargs)
  File "/root/miniconda3/lib/python3.8/site-packages/pytorch_lightning/core/module.py", line 1450, in backward
    loss.backward(*args, **kwargs)
  File "/root/miniconda3/lib/python3.8/site-packages/torch/_tensor.py", line 363, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/root/miniconda3/lib/python3.8/site-packages/torch/autograd/__init__.py", line 173, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: trying to differentiate twice a function that was marked with @once_differentiable
Exception ignored in: <function tqdm.__del__ at 0x7fb7ef681af0>
Traceback (most recent call last):
  File "/root/miniconda3/lib/python3.8/site-packages/tqdm/std.py", line 1152, in __del__
  File "/root/miniconda3/lib/python3.8/site-packages/tqdm/std.py", line 1306, in close
  File "/root/miniconda3/lib/python3.8/site-packages/tqdm/std.py", line 1499, in display
  File "/root/miniconda3/lib/python3.8/site-packages/tqdm/std.py", line 1155, in __str__
  File "/root/miniconda3/lib/python3.8/site-packages/tqdm/std.py", line 1457, in format_dict
TypeError: cannot unpack non-iterable NoneType object
Thank you in advance. ^ ^

PSNR degrades when setting sdf_bias=-0.5

Hi, when I change sdf_bias from 0 to -0.5 following your latest updates, the PSNR degrades. Could you explain the reason?

Want to be linked to nerfacc repo?

Awesome job Yuanchen! I was thinking of doing an example in nerfacc with NeuS, but you already got it working, even with pytorch-lightning!!

Do you want to be linked to nerfacc repo for showcase?

BTW, I noticed you are relying on alpha marching & rendering, which currently comes from your fork of nerfacc. I just added support for this in nerfacc>=0.2.2, along with a relaxed PyTorch requirement. Check it out if interested! nerfstudio-project/nerfacc#94

Problem with PSNR and SSIM

Hi @bennyguo, thank you for your excellent work. During my experiments, I found that PSNR and SSIM are very low, even lower than NeuS. I then found that the rendered images and the original images are seen from different viewpoints, i.e., the two images are not aligned.
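A minimal PSNR sketch (assuming float images in [0, 1]; `psnr` is a hypothetical helper, not the repo's metric code) illustrates why misalignment alone can make the scores collapse: shifting a detailed image by a single pixel already drives PSNR very low, even though the content is identical.

```python
import numpy as np

def psnr(pred, gt):
    """PSNR in dB for images with values in [0, 1] (peak value 1)."""
    diff = np.asarray(pred, dtype=np.float64) - np.asarray(gt, dtype=np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(-10.0 * np.log10(mse))

# A high-frequency test image and a copy shifted right by one pixel (with wrap-around)
rng = np.random.default_rng(0)
img = rng.random((64, 64, 3))
shifted = np.roll(img, shift=1, axis=1)

print(psnr(img, img))      # inf
print(psnr(img, shifted))  # well under 10 dB for this random image
```

Real renders are smoother than random noise, so the drop is smaller in practice, but any pose mismatch between the rendered and ground-truth views penalizes PSNR and SSIM regardless of reconstruction quality.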

"materials" scene cannot extract the mesh out after training on NeuS

Hi,

Thanks for your great work!

I just noticed that at test time, the "materials" scene cannot export a mesh after training with NeuS. Here is the error message:

File "/local-scratch/localhome/***/models/geometry.py", line 90, in isosurface
    vmin, vmax = mesh_coarse['v_pos'].amin(dim=0), mesh_coarse['v_pos'].amax(dim=0)
IndexError: amin(): Expected reduction dim 0 to have non-zero size.
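This IndexError means the coarse marching-cubes pass returned zero vertices, which happens when the sampled SDF grid never changes sign at the iso-level (for example, when every value is positive). A minimal pre-check, assuming the SDF has already been evaluated on a dense grid as a NumPy array; both helper names are hypothetical and not part of the repo:

```python
import numpy as np

def has_zero_crossing(sdf_grid, level=0.0):
    """Necessary condition for marching cubes to output a surface:
    the grid must contain values on both sides of `level`."""
    sdf_grid = np.asarray(sdf_grid)
    return bool(sdf_grid.min() < level < sdf_grid.max())

def vertex_bounds(v_pos):
    """Axis-aligned bounds of mesh vertices, with an explicit error for an empty mesh."""
    v_pos = np.asarray(v_pos)
    if v_pos.shape[0] == 0:
        raise ValueError("empty mesh: the SDF has no zero crossing inside the bounding box")
    return v_pos.min(axis=0), v_pos.max(axis=0)

# Example: a sphere SDF crosses zero, the same field shifted by +10 does not
g = np.linspace(-1.0, 1.0, 16)
X, Y, Z = np.meshgrid(g, g, g, indexing="ij")
sphere = np.sqrt(X**2 + Y**2 + Z**2) - 0.5
print(has_zero_crossing(sphere))         # True
print(has_zero_crossing(sphere + 10.0))  # False
```

If the check fails after training, the geometry itself has collapsed (no surface inside the box), so the fix lies in training rather than in mesh extraction.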

Without masks, unable to extract mesh and low quality NeRF with DTU dataset (scan122 was tested)

I am testing NeuS on the DTU dataset (scan122 in particular). I am only using its RGB images, without masks, and computed new camera poses with COLMAP. However, mesh extraction fails and the NeRF quality is low. With masks, it converged fine. All other configs are defaults. What could I change to improve quality and extract the NeuS mesh without masks as input? Thanks

Problem in COLMAP pipeline

I have been trying out some scenes with the COLMAP-based pipeline you provide in the repo, and I keep running into an issue where the model learns only the input views and does not generalize at all.
I first suspected an issue with my data, but even on synthetic NeRF scenes that work very well with the corresponding pipeline (e.g. the lego bulldozer or the chair), I face the same issue.

It looks like this:

Synthetic NeRF version (trained without the mask loss by setting the respective lambda to zero)
COLMAP version (view from the training dataset - the views not in the train dataset show nothing useful)

One thing that I noticed is that the generated masks tend to be inverted compared to what one would expect - one extreme example for this is the drums scene (image from the training dataset again):

Generally, I have experienced these issues with a wide variety of scenes (both from synthetic-NeRF & various real-world samples such as the dog used in the original NeuS publication) and with both your COLMAP preprocessing script & our internal one. I'm using the latest state of the repo with default configs except for the fused MLP being replaced with the vanilla one and the image resolutions being adjusted if needed.

This issue could also be related to #17, as some (but not all) of the resulting meshes I get look quite similar.

Have you been able to get some good samples out of that pipeline in your testing?
It would be awesome if you had some insights into this issue.

I can't replicate your results on my own dataset. What could be the cause?

Origin of magic constants within code

Hi,
Where does the 1.732 * 2 in the line below come from?

instant-nsr-pl/models/neus.py

Line 58 in 6ab0c3d

 self.render_step_size = 1.732 * 2 * self.config.radius / self.config.num_samples_per_ray 

I'm guessing that this is sqrt(3) * 2, but how was it derived?
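A plausible derivation (not confirmed by the author): if the scene is bounded by the cube [-radius, radius]^3, the longest chord through it is the space diagonal, of length 2·sqrt(3)·radius ≈ 1.732·2·radius. Dividing by num_samples_per_ray yields the step size at which a ray crossing that diagonal receives num_samples_per_ray samples; shorter chords get fewer. A quick numeric check with illustrative values:

```python
import math

def render_step_size(radius, num_samples_per_ray):
    # Space diagonal of the cube [-radius, radius]^3 divided by the sample budget
    diagonal = 2.0 * math.sqrt(3.0) * radius
    return diagonal / num_samples_per_ray

radius, n = 1.0, 1024  # illustrative values, not necessarily the config defaults
step = render_step_size(radius, n)
print(step)  # ≈ 0.00338
# 1.732 is sqrt(3) rounded to three decimals, so the code's constant matches closely
print(abs(1.732 * 2 * radius / n - step) < 1e-6)  # True
```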

Training NeuS with large scale scenes

Thanks for your nice work. My reconstruction targets are real outdoor scenes spanning about 10 to 50 meters in each dimension. NeRF trains properly with hash encoding, but NeuS does not train correctly: train/inv_s is only 5 when the model has converged. I have changed model.radius to 10 so that rays pass through the correct region.

Should I change other hyperparameters accordingly to make the NeuS model fit large scenes? Should I increase sphere_init_radius accordingly as well?

RuntimeError: CUDA out of memory. How can I adjust the parameters to reduce GPU memory consumption?

When I train the model with 2160x3840 images, I get the following error. How can I tweak the parameters to reduce GPU memory consumption?
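For scale, a back-of-the-envelope estimate, assuming each image is held as a dense float32 RGB array (3 channels × 4 bytes); whether the dataset loader actually stores images this way on the GPU is an assumption here. Image memory grows with the pixel count, so downscaling each side by a factor k cuts it by k².

```python
def image_bytes(height, width, channels=3, bytes_per_value=4):
    """Memory for one image stored as a dense float32 (4-byte) array."""
    return height * width * channels * bytes_per_value

full = image_bytes(2160, 3840)
print(full / 2**20)        # 94.921875 MiB per image
print(100 * full / 2**30)  # ≈ 9.27 GiB for 100 such images

quarter = image_bytes(2160 // 4, 3840 // 4)
print(full // quarter)     # 16: downscaling each side by 4 cuts memory 16x
```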

Run on Windows

Hi there, thank you for sharing, good work.

I want to run the code on Windows, but it reports an NCCL error.

So I changed the backend from NCCL to Gloo, and then an invalid scalar type error popped up.

Do you have any idea why? What environment do you run the code in? Mine is Python 3.10, CUDA Toolkit 11.3, and torch 1.12.1+cu113.

Appreciate!

Could you share some pretrained models?

Hello, thanks for the work.
I cloned the code and ran it on my PC. Since my graphics card is a GTX 1660 Super, I had to reduce 'train_num_rays' to 128 to avoid OOM errors (other parameters remain default). Then I extracted the mesh and found that it has some noise and holes. I suspect it's hard for my card to train a good result :(
So, would you mind sharing some pretrained models, for example on the NeRF synthetic datasets? Or sharing some ideas for tuning those parameters so that a 6GB card can get good results? Thanks!

NeuRIS extension

Thanks so much for this project!
Any chance to add an implementation of NeuRIS?
https://jiepengwang.github.io/NeuRIS/
https://arxiv.org/pdf/2206.13597.pdf

The basic premise is to incorporate a surface normal prior to generate higher-fidelity meshes. The currently available implementation https://github.com/jiepengwang/NeuRIS takes many hours to train (around 9 hours on an A6000), so using the optimizations you've added here would be amazing.

I would love to know your thoughts!

Question about spherical initialization and training

Hi, thanks for sharing your code.

I've been trying out several things and found something weird.
When using sphere initialization of the vanilla MLP, I expected the initial shape to be a sphere.
If you render the outputs of the initialized model by setting val_check_interval=1, the images (rgb, normal, depth) indeed resemble a sphere.

However, marching cubes fails with the following error message:

vmin, vmax = mesh_coarse['v_pos'].amin(dim=0), mesh_coarse['v_pos'].amax(dim=0)
IndexError: amin(): Expected reduction dim 0 to have non-zero size.

I guess this means that the aabb cube is empty.

When I looked into the code, I found that the VanillaMLP does not initialize the layer biases, which differs from the initialization in the paper "SAL: Sign Agnostic Learning of Shapes from Raw Data".

I think the make_linear function should be as follows

import math

import torch
import torch.nn as nn

def make_linear(self, dim_in, dim_out, bias, is_first, is_last):
    # `bias` is the sphere-init radius: the initial SDF approximates ||x|| - bias
    layer = nn.Linear(dim_in, dim_out)
    if self.sphere_init:
        if is_last:
            # Output layer (SAL geometric init): constant offset -bias, near-constant weights
            torch.nn.init.constant_(layer.bias, -bias)
            torch.nn.init.normal_(layer.weight, mean=math.sqrt(math.pi) / math.sqrt(dim_in), std=0.0001)
        elif is_first:
            # First layer: only the xyz inputs (first 3 columns) get non-zero weights
            torch.nn.init.constant_(layer.bias, 0.0)
            torch.nn.init.constant_(layer.weight[:, 3:], 0.0)
            torch.nn.init.normal_(layer.weight[:, :3], 0.0, math.sqrt(2) / math.sqrt(dim_out))
        else:
            # Hidden layers: zero bias, Gaussian weights
            torch.nn.init.constant_(layer.bias, 0.0)
            torch.nn.init.normal_(layer.weight, 0.0, math.sqrt(2) / math.sqrt(dim_out))
    else:
        torch.nn.init.kaiming_uniform_(layer.weight, nonlinearity='relu')

    if self.weight_norm:
        layer = nn.utils.weight_norm(layer)
    return layer

Also, in the forward and forward_level methods of class VolumeSDF:

if 'sdf_activation' in self.config:
    sdf = get_activation(self.config.sdf_activation)(sdf + float(self.config.sdf_bias))

This condition is True even when sdf_activation is simply set to None, because the key is still present in the config. I found that this makes the SDF values all positive at the start of training, so I simply removed sdf_activation from the config.
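The pitfall is that a membership test checks whether a key exists, not whether it holds a usable value. A minimal sketch with a plain dict standing in for the config object (assuming the config behaves like a mapping; `apply_sdf_activation` and the `ACTIVATIONS` table are hypothetical, not the repo's `get_activation`):

```python
import math

ACTIVATIONS = {
    "softplus": lambda x: math.log1p(math.exp(x)),  # hypothetical entry: log(1 + e^x)
}

def apply_sdf_activation(sdf, config):
    # Buggy pattern: `'sdf_activation' in config` is True even when the value is None.
    # Safer: require the key to be present *and* non-None before applying anything.
    name = config.get("sdf_activation")
    if name is None:
        return sdf
    return ACTIVATIONS[name](sdf + float(config.get("sdf_bias", 0.0)))

cfg = {"sdf_activation": None, "sdf_bias": 0.6}
print("sdf_activation" in cfg)          # True, despite the None value
print(apply_sdf_activation(-0.3, cfg))  # -0.3 (activation and bias skipped)
```

As in the original snippet, the bias is only added together with the activation; whether it should also be applied without one is a separate design decision.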

After changing this part and setting the bias of the SDF to 0.6, the initial model output is as follows:

And the result of marching cubes is indeed a sphere.

However, I found that changing the model like this leads to very poor training results.

After 1000 iterations,

Also, the mesh is completely broken.

So I guess you had a reason for this design choice? Otherwise, I think this might be why training the model on my custom dataset fails.

Runtime error in cudaGraphExecUpdate() from tiny-cuda-nn

Hi, thanks for your awesome work. I get a weird error during validation:
terminate called after throwing an instance of 'std::runtime_error'
  what():  /tmp/pip-req-build-z4954kz1/include/tiny-cuda-nn/cuda_graph.h:124 cudaGraphExecUpdate(m_graph_instance, m_graph, &error_node, &update_result) failed with error the graph update was not performed because it included changes which violated constraints specific to instantiated graph update
Aborted
Do you know what causes this problem and how to solve it? Thank you in advance!

Can I run the original version of NeuS in this code?

Hi, thanks for your great code! I am wondering if I can run the original version of NeuS (which is much slower) in this repo? Are there any configs that can achieve this?

Question about nerfacc for Unbounded Scene

def contract_to_unisphere(x, radius, contraction_type):
    if contraction_type == ContractionType.AABB:
        x = scale_anything(x, (-radius, radius), (0, 1))
    elif contraction_type == ContractionType.UN_BOUNDED_SPHERE:
        x = scale_anything(x, (-radius, radius), (0, 1))
        x = x * 2 - 1  # aabb is at [-1, 1]
        mag = x.norm(dim=-1, keepdim=True)
        mask = mag.squeeze(-1) > 1
        x[mask] = (2 - 1 / mag[mask]) * (x[mask] / mag[mask])
        x = x / 4 + 0.5  # [-inf, inf] is at [0, 1]
    else:
        raise NotImplementedError
    return x

I don't understand why you apply x = x / 4 + 0.5  # [-inf, inf] is at [0, 1] here. Could you explain it a little? :)
Thank you!
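One way to see it: after the AABB is mapped to [-1, 1], every point outside the unit ball is contracted to norm 2 - 1/|x|, which lies in (1, 2). So all of R^3 ends up inside the ball of radius 2, i.e. inside [-2, 2]^3. Dividing by 4 gives [-0.5, 0.5]^3 and adding 0.5 gives [0, 1]^3, presumably the normalized input range the grid encoding expects. This is the mip-NeRF 360 style contraction. A NumPy re-implementation for checking; `scale_linear` is a hypothetical stand-in for the repo's `scale_anything`, assumed to be a plain linear range remap:

```python
import numpy as np

def scale_linear(x, src, dst):
    # Hypothetical stand-in for scale_anything: linear map from [src0, src1] to [dst0, dst1]
    (lo, hi), (a, b) = src, dst
    return (x - lo) / (hi - lo) * (b - a) + a

def contract_unbounded_sphere(x, radius):
    x = scale_linear(np.asarray(x, dtype=np.float64), (-radius, radius), (0.0, 1.0))
    x = x * 2 - 1  # the AABB [-radius, radius]^3 now occupies [-1, 1]^3
    mag = np.linalg.norm(x, axis=-1, keepdims=True)
    mask = mag[..., 0] > 1
    # Points outside the unit ball get norm 2 - 1/|x|, which lies in (1, 2)
    x[mask] = (2 - 1 / mag[mask]) * (x[mask] / mag[mask])
    return x / 4 + 0.5  # ball of radius 2 -> [-0.5, 0.5]^3 -> [0, 1]^3

pts = np.array([[0.0, 0.0, 0.0], [1e6, 0.0, 0.0], [-1e6, -1e6, 1e6]])
print(contract_unbounded_sphere(pts, radius=1.0))  # every coordinate lies in [0, 1]
```

A side effect of this mapping is that the original AABB occupies only the central [0.25, 0.75]^3 of the normalized range, leaving the outer shell for the contracted background.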

bad result on custom colmap data

Hi,

I found that the results of neus-colmap are really bad. I ran COLMAP on nerf_synthetic data, then trained the model, and the result is very poor.

colmap on nerf_synthetic/chair, radius: 1

I also tried changing radius in the neus-colmap.yaml file, as in #20. I tried 0.5, 1, and 1.5, all of which gave bad results.

colmap on nerf_synthetic/chair, radius: 0.5

colmap on nerf_synthetic/chair, radius: 1.5

I tested nerf-colmap, which seems to be working, but neus-colmap isn't working on the nerf_synthetic data. Any idea what went wrong here?

Thanks

NeuS+HashEncoding Not so Good on DTU24

Hi, Benny. Have you faced the ghost floater problem when using NeuS+HashEncoding?

The rendered image is good and converges to the GT, but the mesh/normals bump up or sink in some areas, and there are many floaters in the air.
encoding_config={
"otype": "HashGrid",
"n_levels": 16,
"n_features_per_level": 2,
"log2_hashmap_size": 19,
"base_resolution": 16,
"per_level_scale": 1.447269237440378,
"include_xyz": True,
}
SDF Network is nn.Linear(encoding.n_output_dims, 65)
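For reading this config: following the formula in the Instant-NGP paper, per_level_scale = exp((ln N_max - ln N_min) / (L - 1)). With base_resolution 16 and 16 levels, 1.447269237440378 corresponds to a finest grid resolution of about 4096, and log2_hashmap_size 19 allows 2^19 hash-table entries per level. The exact per-level rounding inside tiny-cuda-nn may differ slightly; the helper names are illustrative.

```python
import math

def per_level_scale(base_resolution, finest_resolution, n_levels):
    # Geometric growth factor between consecutive levels (Instant-NGP paper's formula)
    return math.exp((math.log(finest_resolution) - math.log(base_resolution)) / (n_levels - 1))

def finest_resolution(base_resolution, scale, n_levels):
    # Resolution of the last level implied by a given growth factor
    return base_resolution * scale ** (n_levels - 1)

print(finest_resolution(16, 1.447269237440378, 16))  # ≈ 4096
print(per_level_scale(16, 4096, 16))                 # ≈ 1.4472692374
print(2 ** 19)                                       # 524288 entries per level
```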
Any idea is welcome~

Code gets stuck when training begins; see the output below

Global seed set to 42
Using 16bit native Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
`Trainer(limit_train_batches=1.0)` was configured so 100% of the batches per epoch will be used..
Global seed set to 42
Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/1

distributed_backend=nccl
All distributed processes registered. Starting with 1 processes

LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

  | Name  | Type      | Params
0 | model | NeRFModel | 12.6 M

12.6 M Trainable params
0 Non-trainable params
12.6 M Total params
25.220 Total estimated model params size (MB)
Epoch 0: : 0it [00:00, ?it/s]

After this line, the code gets stuck.

Code gets stuck when training begins

Global seed set to 42
Using 16bit native Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
`Trainer(limit_train_batches=1.0)` was configured so 100% of the batches per epoch will be used..
Global seed set to 42
Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/1

distributed_backend=nccl
All distributed processes registered. Starting with 1 processes

LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

  | Name  | Type      | Params
0 | model | NeRFModel | 12.6 M

12.6 M Trainable params
0 Non-trainable params
12.6 M Total params
25.220 Total estimated model params size (MB)
Epoch 0: : 0it [00:00, ?it/s]

Val set overfitting (due to incorrect dataset switching)

Hi Yuanchen,

When I train the model on my own data, I noticed that it overfits to my validation set. So I printed out the split source of each batch and found that self.dataset didn't switch back to train_dataloader().dataset after the first validation run finished. In other words, after the first validation run, self.dataset remains val_dataloader().dataset, so the model keeps training on my validation set, which causes the overfitting.

I think the reason is that you manually switch self.dataset inside on_train_start(), which is called only once at the very beginning, outside the training loop. So the training loop has no chance to switch back to the training set once on_val_start() has been called.

For now, my quick fix is moving self.dataset = self.trainer.datamodule.train_dataloader().dataset to on_train_batch_start(), but it is not so efficient. Otherwise, I think we need to refactor datamodule somehow (move the self.preprocess_data() to the dataloader) to avoid a manual switch.
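The hook-ordering problem can be reproduced with a toy simulation. This is plain Python, not PyTorch Lightning: the class, the simplified hook order, and the validation interval are all hypothetical. Selecting the dataset once in on_train_start leaves it pointing at the validation split after the first validation run, while re-selecting it before every training batch keeps training on the training split.

```python
class ToySystem:
    """Hypothetical stand-in for a LightningModule whose batch preprocessing reads self.dataset."""

    def __init__(self, fix_per_batch):
        self.fix_per_batch = fix_per_batch
        self.dataset = None
        self.sampled_from = []  # split used by each training step

    def on_train_start(self):
        self.dataset = "train"  # original pattern: runs once, before the loop

    def on_validation_start(self):
        self.dataset = "val"

    def on_train_batch_start(self):
        if self.fix_per_batch:
            self.dataset = "train"  # quick fix: re-select the split before every batch

    def training_step(self):
        self.sampled_from.append(self.dataset)


def simulate(system, steps=6, val_every=2):
    # Simplified hook order; the real Lightning loop calls many more hooks
    system.on_train_start()
    for step in range(1, steps + 1):
        system.on_train_batch_start()
        system.training_step()
        if step % val_every == 0:
            system.on_validation_start()
    return system.sampled_from


print(simulate(ToySystem(fix_per_batch=False)))  # ['train', 'train', 'val', 'val', 'val', 'val']
print(simulate(ToySystem(fix_per_batch=True)))   # ['train', 'train', 'train', 'train', 'train', 'train']
```

The per-batch switch costs only an attribute assignment per step in this toy; in the real system the cost depends on what `train_dataloader()` does when called.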

Support for the LLFF data format or custom data?

Could you please let me know whether it supports LLFF data for custom images?

Why does the estimated SDF use opposite signs in occ_eval_fn() and get_alpha()?

Hi, thanks for this great project!

I have a question about the estimated SDF in NeusModel. Why are the signs of prev/next opposite in occ_eval_fn() and get_alpha()?

instant-nsr-pl/models/neus.py

Line 53 in 215274e

estimated_next_sdf = sdf[...,None] - self.render_step_size * 0.5

instant-nsr-pl/models/neus.py

Line 81 in 215274e

estimated_next_sdf = sdf[...,None] + iter_cos * dists.reshape(-1, 1) * 0.5
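A likely explanation, assuming iter_cos is computed as in the original NeuS code, where it is constructed to be non-positive: the two lines actually move the estimate in the same direction. Since iter_cos ≤ 0, the term iter_cos * dists * 0.5 is ≤ 0, just like -render_step_size * 0.5. occ_eval_fn() has no ray direction, so it appears to assume the worst case cos = -1 with the fixed step size. A scalar sketch of both estimates (function names are illustrative):

```python
def relu(v):
    return max(v, 0.0)

def iter_cos_neus(true_cos, cos_anneal_ratio):
    """Annealed cosine as in the original NeuS renderer; the result is always <= 0."""
    return -(relu(-true_cos * 0.5 + 0.5) * (1.0 - cos_anneal_ratio)
             + relu(-true_cos) * cos_anneal_ratio)

def next_sdf_alpha(sdf, iter_cos, dist):
    # get_alpha(): step half an interval along the ray using the (non-positive) cosine
    return sdf + iter_cos * dist * 0.5

def next_sdf_occ(sdf, step_size):
    # occ_eval_fn(): no ray direction available, so assume the worst case cos = -1
    return sdf - step_size * 0.5

# With cos = -1 and dist = step_size, the two estimates coincide
print(next_sdf_alpha(0.2, -1.0, 0.01), next_sdf_occ(0.2, 0.01))  # both ≈ 0.195
```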

Training the lego example with neus_blender.yml raises a non-zero size error

Thanks for the great work. When I train the lego example with neus_blender.yml, I get the non-zero size error. I tried changing the seed, but that did not help.
vmin, vmax = mesh_coarse['v_pos'].amin(dim=0), mesh_coarse['v_pos'].amax(dim=0)
IndexError: amin(): Expected reduction dim 0 to have non-zero size.

Quality issues

Here are some findings about improving the reconstruction quality.

Bias terms in the geometry MLP matter

In my original implementation, I omitted the bias terms in the geometry MLP for simplicity, as they are initialized to 0. However, I found that these bias terms are important for producing high-quality surfaces, especially in detailed regions. A possible reason is that the "shifting" introduced by these biases acts as a form of normalization, making high-frequency signals easier to learn. Thanks to @terryryu for mentioning this problem in #22. Fixed in the latest commits.

MSE v.s. L1

Although the original NeuS paper adopts L1 as the photometric loss for its "robustness to outliers", we found that L1 can lead to suboptimal results in certain cases, like the Lego bulldozer:

L1, 20k iters	MSE, 20k iters

Therefore, we adopt both L1 and MSE losses in NeuS training.
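A minimal sketch of a combined photometric loss, assuming a simple weighted sum; the weights `lambda_l1` and `lambda_mse` are hypothetical and not the repo's config keys or defaults:

```python
import numpy as np

def photometric_loss(pred_rgb, gt_rgb, lambda_l1=1.0, lambda_mse=1.0):
    """Weighted sum of L1 and MSE photometric losses (weights are hypothetical)."""
    diff = np.asarray(pred_rgb, dtype=np.float64) - np.asarray(gt_rgb, dtype=np.float64)
    l1 = np.mean(np.abs(diff))   # constant-magnitude gradient, robust to outliers
    mse = np.mean(diff ** 2)     # gradient grows with the error, punishing large residuals
    return float(lambda_l1 * l1 + lambda_mse * mse)

print(photometric_loss([0.5], [0.0]))  # 0.5 (L1) + 0.25 (MSE) = 0.75
```

The design intuition: L1 keeps outliers from dominating, while the MSE term adds a stronger pull on large residuals, which is one plausible reason the combination helps on scenes like Lego.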

Floater problem

Training NeuS without a background model can lead to floaters (uncontrolled surfaces) in free space. Floaters that take on the background color do no harm to rendering quality, so they cannot be removed when training with the photometric loss alone. We alleviate this problem with random background augmentation (masks needed) and the sparsity loss proposed in SparseNeuS (no masks needed).
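Both remedies can be sketched in a few lines. The sparsity term follows the form described in SparseNeuS, the mean of exp(-τ·|s|) over SDF values s at sampled points, which is largest where sampled SDF values sit near zero (i.e. where spurious surfaces would form); τ (`scale` below) is a hyperparameter with an illustrative default. The augmentation composites the ground-truth image over a random color using the mask, and the same color must then be used when compositing the rendering, so a floater can no longer hide by matching a fixed background. Function names are hypothetical:

```python
import numpy as np

def sparsity_loss(sdf_samples, scale=10.0):
    """Mean of exp(-scale * |sdf|): penalizes near-zero SDF values at sampled points."""
    s = np.abs(np.asarray(sdf_samples, dtype=np.float64))
    return float(np.mean(np.exp(-scale * s)))

def random_background_target(rgb, mask, rng):
    """Composite a ground-truth image (H, W, 3) over a random color using its (H, W) mask."""
    bg = rng.random(3)                   # one random RGB color per call
    m = np.asarray(mask, dtype=np.float64)[..., None]
    return rgb * m + bg * (1.0 - m), bg  # reuse `bg` when compositing the rendered image

rng = np.random.default_rng(0)
target, bg = random_background_target(np.zeros((2, 2, 3)), np.array([[1, 0], [0, 1]]), rng)
print(sparsity_loss(np.zeros(4)))  # 1.0: all samples sit on a surface
```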

For NeRF, we adopt the distortion loss proposed in Mip-NeRF 360 to alleviate the floater problem when training unbounded 360 scenes.
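For reference, the Mip-NeRF 360 distortion loss for one ray is the sum over pairs of intervals of w_i·w_j·|m_i - m_j| (m = interval midpoints) plus one third of the sum of w_i²·(s_{i+1} - s_i). It is small when the weight mass is concentrated in a few short intervals, which is what suppresses floaters. A quadratic-time reference sketch, not the repo's (or nerfacc's) implementation:

```python
import numpy as np

def distortion_loss(weights, s_vals):
    """Mip-NeRF 360 distortion loss for one ray (O(n^2) reference version).

    weights: (n,) rendering weights of the n intervals
    s_vals:  (n + 1,) interval edges along the ray (normalized distances)
    """
    w = np.asarray(weights, dtype=np.float64)
    s = np.asarray(s_vals, dtype=np.float64)
    mids = 0.5 * (s[1:] + s[:-1])
    deltas = s[1:] - s[:-1]
    # Pairwise term: penalizes weight mass spread far apart along the ray
    pairwise = np.sum(w[:, None] * w[None, :] * np.abs(mids[:, None] - mids[None, :]))
    # Self term: penalizes weight carried by wide intervals
    unary = np.sum(w ** 2 * deltas) / 3.0
    return float(pairwise + unary)

edges = [0.0, 0.1, 0.9, 1.0]
print(distortion_loss([0.5, 0.0, 0.5], edges))  # ≈ 0.4667: two separated blobs
print(distortion_loss([0.0, 1.0, 0.0], edges))  # ≈ 0.2667: one concentrated blob
```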

Complex cases require long training

In the given config files, the number of training steps is set to 20000 by default. This works well for objects of simple geometries, like the chair scene in the NeRF-Synthetic dataset. However, for more complicated cases where many thin structures occur, more training iterations are needed to get high quality results. This could simply be done by setting trainer.max_steps to a higher value, like 50000 or 100000.

Cases where training does not converge

A simple way to tell whether training is converging is to check the value of inv_s (shown in the progress bar by default). If inv_s increases steadily (often ending up above 1000), training is on track. If training diverges, inv_s typically drops below its initial value and gets stuck. Many factors can lead to divergence; to alleviate divergence caused by unstable optimization, the latest commits adopt a learning rate warm-up strategy following the original NeuS.
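For reference, the warm-up schedule in the original NeuS code, as I understand it, is a linear ramp followed by cosine decay to a floor `alpha`. The sketch below reproduces that shape with illustrative step counts; whether instant-nsr-pl uses exactly this form is not confirmed here.

```python
import math

def neus_lr_factor(step, warm_up_end, end_step, alpha=0.05):
    """Learning-rate multiplier: linear warm-up, then cosine decay down to `alpha`."""
    if step < warm_up_end:
        return step / warm_up_end
    progress = (step - warm_up_end) / (end_step - warm_up_end)
    return (math.cos(math.pi * progress) + 1.0) * 0.5 * (1.0 - alpha) + alpha

# Illustrative schedule: 500 warm-up steps out of 20000
for s in (0, 250, 500, 10250, 20000):
    print(s, round(neus_lr_factor(s, 500, 20000), 4))
```

Starting from a near-zero learning rate keeps early updates small while the SDF is still close to its initialization, which is where divergence tends to begin.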

Differences w.r.t. Instant-NSR

Hi Yuanchen,
Great work on developing this. Could you point out the differences between this and Instant-NSR?

No CUDA toolkit found

In the conda environment, I still haven't solved this problem after executing conda install tookit=11.3. What may be the cause?

Obj file has no colour

Hi,
This is some great work!
One thing I would like to understand: the Wavefront (.obj) file I obtained has no texture. How do I get the texture?
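One lightweight option is per-vertex color rather than a texture map. A sketch of writing such an OBJ, assuming colors are already available per vertex (for example by evaluating the trained radiance network at each vertex, which is an assumption; how to choose the view direction is left open). It uses the widely supported but unofficial `v x y z r g b` extension; `obj_with_vertex_colors` is a hypothetical helper, not a repo function:

```python
def obj_with_vertex_colors(vertices, faces, colors):
    """Serialize a triangle mesh as Wavefront OBJ text with per-vertex RGB.

    Uses the unofficial "v x y z r g b" extension (colors as floats in [0, 1]).
    Faces are given 0-based and written 1-based, as OBJ requires.
    """
    lines = []
    for (x, y, z), (r, g, b) in zip(vertices, colors):
        lines.append(f"v {x:.6f} {y:.6f} {z:.6f} {r:.4f} {g:.4f} {b:.4f}")
    for i, j, k in faces:
        lines.append(f"f {i + 1} {j + 1} {k + 1}")
    return "\n".join(lines) + "\n"

text = obj_with_vertex_colors(
    vertices=[(0, 0, 0), (1, 0, 0), (0, 1, 0)],
    faces=[(0, 1, 2)],
    colors=[(1, 0, 0), (0, 1, 0), (0, 0, 1)],
)
print(text)
```

A proper UV texture would additionally require mesh parameterization and baking, which is a much larger step.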

Problem on calculating comp_rgb

Hello! In your code, I see: comp_rgb = comp_rgb + self.background_color * (1.0 - opacity)
I thought it should be comp_rgb = comp_rgb*opacity + self.background_color * (1.0 - opacity)
Did I miss something? Could somebody explain this?
Thank you in advance!
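A worked example, assuming comp_rgb is the standard volume-rendering sum Σ w_i c_i with weights w_i = T_i α_i. That sum is already premultiplied by the ray's opacity (opacity = Σ w_i), so multiplying by opacity again would count it twice. Illustrative values for a single ray:

```python
import numpy as np

# One ray with three samples: per-sample alphas and colors (illustrative values)
alphas = np.array([0.2, 0.5, 0.3])
colors = np.array([[1.0, 0.0, 0.0]] * 3)  # every sample is pure red
background = np.array([1.0, 1.0, 1.0])    # white background

# Transmittance T_i = prod_{j<i} (1 - alpha_j); weight w_i = T_i * alpha_i
trans = np.concatenate([[1.0], np.cumprod(1.0 - alphas)[:-1]])
weights = trans * alphas
opacity = weights.sum()                         # 0.72 accumulated along the ray
comp_rgb = (weights[:, None] * colors).sum(0)   # [0.72, 0, 0]: already scaled by opacity

final = comp_rgb + background * (1.0 - opacity)               # formula used in the code
double = comp_rgb * opacity + background * (1.0 - opacity)    # proposed formula
print(final)   # ≈ [1.0, 0.28, 0.28] = opacity * red + (1 - opacity) * white
print(double)  # ≈ [0.7984, 0.28, 0.28]: red is attenuated twice
```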

Code needs updating to support pytorch-lightning 1.9.0

Thanks to the author! When I tried it with pytorch-lightning 1.9.0, I ran into several bugs caused by changes in pytorch-lightning. I would like to share my revised code here:

First, "_get_rank" has moved, so all such imports should be modified like this:

# from pytorch_lightning.utilities.rank_zero import _get_rank
# -->
from lightning_fabric.utilities.rank_zero import _get_rank  #corrected by yy

Second, line 6 of /utils/callbacks.py should be changed to:

# from pytorch_lightning.callbacks.base import Callback
# -->
from pytorch_lightning.callbacks import Callback

Note that this also fixes the error ValueError("Expected a parent").

Is the input data format different between the instant-nsr-pl version of NeuS and the original NeuS?

NeuS:
shufujia-1
    cameras_sphere.npz  image  mask
Ours:
ls ./load/nerf_synthetic/lego*
    test  train  transforms_test.json  transforms_train.json  transforms_val.json  val
Starting from the COLMAP output, how do I construct my input?

Data format?

Hi, can you help me convert the camera format in IDR's .npz file into the transformation.json file?
Actually, I do not understand the relationship between these two formats.
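The two formats encode the same cameras differently. Assuming the IDR/NeuS convention (each world_mat_i is a 4x4 whose top three rows form a projection P = K[R|t] in the OpenCV camera convention, and NeuS decomposes P = (world_mat_i @ scale_mat_i)[:3, :4]), and assuming the JSON follows the NeRF-synthetic (OpenGL) convention where the camera looks along -z with y up, a conversion could look like the sketch below. NeuS itself uses OpenCV's decomposeProjectionMatrix; this version substitutes a NumPy RQ decomposition, and the function names are hypothetical. The returned c2w is what a per-frame transform_matrix entry would hold; intrinsics such as camera_angle_x would still need to be derived from K.

```python
import numpy as np

def rq3(M):
    """RQ decomposition of a 3x3 matrix: M = K @ R, K upper triangular (positive diagonal), R orthogonal."""
    E = np.flipud(np.eye(3))          # exchange (row-reversal) matrix, E == E.T == inv(E)
    Q, U = np.linalg.qr((E @ M).T)    # M.T @ E = Q @ U
    K = E @ U.T @ E                   # upper triangular
    R = E @ Q.T                       # orthogonal
    D = np.diag(np.sign(np.diag(K)))  # sign fix so diag(K) > 0; D @ D == I keeps K @ R unchanged
    return K @ D, D @ R

def projection_to_nerf(P):
    """Split a 3x4 OpenCV-style P = K [R | t] (positive overall scale) into
    normalized intrinsics and a NeRF-style (OpenGL) camera-to-world matrix."""
    K_raw, R = rq3(P[:, :3])
    t = np.linalg.solve(K_raw, P[:, 3])  # unnormalized K, so a global scale of P cancels
    K = K_raw / K_raw[2, 2]
    c2w = np.eye(4)
    c2w[:3, :3] = R.T                    # camera axes expressed in world coordinates
    c2w[:3, 3] = -R.T @ t                # camera center
    c2w[:3, 1:3] *= -1                   # OpenCV (y down, z forward) -> OpenGL (y up, z backward)
    return K, c2w
```

The final column flip is the step most often missed: without it, cameras point away from the object.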

Questions about radius

First of all, thank you so much for providing such wonderful work.

I have two questions.
First, what is sphere_init_radius in NeuS? What role exactly does it play, and how does it relate to radius?
Second, the NeuS implementation here only provides a cubic bounding box (defined by radius). Why not provide a rectangular (non-cubic) bounding box? Are there any issues with such an implementation?
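A sketch of what sphere initialization aims for, assuming the SAL-style geometric init shown in the spherical-initialization issue above (last-layer bias set to minus the init radius): the network starts out approximating the SDF of a sphere, sdf(x) ≈ ||x|| - sphere_init_radius. For that initial surface to lie strictly inside the cubic bounding box [-radius, radius]^3, sphere_init_radius must be smaller than radius. Helper names are hypothetical:

```python
import numpy as np

def init_sphere_sdf(points, sphere_init_radius):
    """Signed distance to a sphere centred at the origin: negative inside, positive outside."""
    return np.linalg.norm(np.asarray(points, dtype=np.float64), axis=-1) - sphere_init_radius

def sphere_inside_box(sphere_init_radius, radius):
    # The ball stays strictly inside [-radius, radius]^3 iff its radius is below the half-width
    return sphere_init_radius < radius

print(init_sphere_sdf([[0, 0, 0], [1, 0, 0]], 0.5))  # [-0.5  0.5]
print(sphere_inside_box(0.5, 1.0))                   # True
```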

Thanks!

Cannot load data

While debugging the code, I found that after loading the data, the image pixel values and the transform matrix values are all 0. Does anyone understand this problem?
