
nerfplusplus's People

Contributors

dkasuga, kai-46, maximevandegar


nerfplusplus's Issues

How to visualise the cameras?

Thank you for the script you have provided for visualizing the cameras in the scene.

Could you please share the usage instructions for that script too?

Thanks in advance.

What is the main point of the shape-radiance ambiguity?

Dear author,

While reading your NeRF++ paper, I couldn't fully understand the shape-radiance ambiguity (Section 3 of the paper).

  1. Is the purpose of the Figure 2 experiment illustrating the ambiguity to show that the NeRF model can fit training data for an arbitrary 3D shape?
    If so (as verified by the Figure 2 experiment), how is this fact related to Factor 1 ("c" must become a high-frequency function as "sigma" deviates from the correct shape)?

  2. Why does Factor 2 (the NeRF MLP structure implicitly regularizes "c" toward a smooth BRDF prior w.r.t. "d") help NeRF avoid the shape-radiance ambiguity?

  3. How are Factors 1 and 2 logically related? They seem unrelated, since Factor 1 argues the NeRF MLP has limited capacity to model high complexity given an incorrect shape, while Factor 2 argues the NeRF MLP implicitly regularizes "c" to be smooth w.r.t. "d" at any given "x".

Thank you.

Best regards,
YJHong.

How to understand the intersect_sphere in your code?

def intersect_sphere(ray_o, ray_d):
    '''
    ray_o, ray_d: [..., 3]
    compute the depth of the intersection point between this ray and unit sphere
    '''
    # note: d1 becomes negative if this mid point is behind camera
    d1 = -torch.sum(ray_d * ray_o, dim=-1) / torch.sum(ray_d * ray_d, dim=-1)
    p = ray_o + d1.unsqueeze(-1) * ray_d
    # consider the case where the ray does not intersect the sphere
    ray_d_cos = 1. / torch.norm(ray_d, dim=-1)
    p_norm_sq = torch.sum(p * p, dim=-1)
    if (p_norm_sq >= 1.).any():
        raise Exception('Not all your cameras are bounded by the unit sphere; please make sure the cameras are normalized properly!')
    d2 = torch.sqrt(1. - p_norm_sq) * ray_d_cos

    return d1 + d2
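
In case it helps anyone reading this, here is a quick numeric check (my own sketch, assuming the intersect_sphere definition above and torch are available) that the returned depth matches the closed-form root of ||ray_o + t * ray_d|| = 1:

    import torch

    ray_o = torch.tensor([[0.1, 0.2, -0.3]])   # camera origin inside the unit sphere
    ray_d = torch.tensor([[0.0, 0.0, 1.0]])    # ray direction (need not be unit length)

    t = intersect_sphere(ray_o, ray_d)

    # solve ||o + t*d||^2 = 1 directly: ||d||^2 t^2 + 2 (o.d) t + ||o||^2 - 1 = 0
    a = (ray_d * ray_d).sum(-1)
    b = 2.0 * (ray_o * ray_d).sum(-1)
    c = (ray_o * ray_o).sum(-1) - 1.0
    t_quad = (-b + torch.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)

    print(torch.allclose(t, t_quad))                              # True
    print(torch.norm(ray_o + t.unsqueeze(-1) * ray_d, dim=-1))    # ~1.0, point lies on the sphere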

General use case

Does this method only work for 360° unbounded scenes? Does it work on, for example, forward-facing scenes as in NeRF? Has anyone tested this?
I recently tried applying it to a driving scene, where the images are photos taken from a forward-moving car. I defined the sphere center as the last camera position and the radius as 8 times the distance travelled (as for the T&T dataset); the poses look like the image below.

image

When I use NeRF, it works well with the NDC setting since everything lies inside the frustum in front of camera 0. However, with NeRF++ it fails to distinguish the foreground (fg) and background (bg): when I check the training output, it learns everything as fg and the bg is all black. And since the faraway scenery is bg, it learns it very badly. So I wonder whether this only works for 360° unbounded scenes, where the fg/bg split is easier to make?

Training on my own dataset, the loss is NaN

Hi @Kai-46, after using COLMAP to get the poses and intrinsics of my own dataset and training from scratch, the loss becomes NaN; when I print the network's output, the values in ret['rgb'] are all NaN.

I wonder whether the poses and intrinsics are wrong (data[key]['K'] stores the intrinsics and data[key]['W2C'] the pose, right?) or whether I need to adjust some training hyperparameters.

Hope you can help, thanks~
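
One sanity check worth trying here (a sketch of my own, assuming data[key]['W2C'] is a flattened 4x4 world-to-camera matrix as the question suggests) is to confirm that every camera center lands inside the unit sphere after normalization, since the foreground sampling assumes that:

    import numpy as np

    def camera_center_norms(cam_dict):
        norms = {}
        for name, cam in cam_dict.items():
            # assumes 'W2C' is a flattened 4x4 world-to-camera matrix [R | t; 0 1]
            W2C = np.array(cam['W2C'], dtype=np.float64).reshape(4, 4)
            R, t = W2C[:3, :3], W2C[:3, 3]
            norms[name] = np.linalg.norm(-R.T @ t)   # camera center in world coordinates
        return norms

    # for name, n in camera_center_norms(data).items():
    #     if n >= 1.0:
    #         print(name, 'camera center lies outside the unit sphere:', n)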

About shape-radiance ambiguity

Thanks for your work!

I have some questions about the solution you construct to demonstrate the shape-radiance ambiguity in Paragraph 2 of Section 3:

To illustrate this ambiguity, imagine that for a given scene we represent the geometry as a unit
sphere. In other words, let us fix NeRF’s opacity field to be 1 at the surface of the unit sphere,
and 0 elsewhere. Then, for each pixel in each training image, we intersect a ray through that pixel
with the sphere, and define the radiance value at the intersection point (and along the ray direction)
to be the color of that pixel. This artificially constructed solution is a valid NeRF reconstruction
that perfectly fits the input images.

1. Does this mean that the opacity field inside the unit sphere is fixed to 0?
2. If the opacity field is 1 only at the surface, the integral in Eq. (2) should be zero, since there are at most two non-zero points along the ray.
3. Or do you set $dt$ (the step size of the numerical integration) to 1?

So I cannot figure out why this is a valid solution... Can you help me?
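
Not the author, but one way to read that construction is that "opacity 1 at the surface" means a surface-like (delta) density rather than a finite sigma at isolated points, so the transmittance drops from 1 to 0 exactly at the sphere and the quadrature in Eq. (2) returns the radiance assigned at the intersection point. A minimal numeric sketch (my own illustration, not code from the repo):

    import torch

    t = torch.linspace(0.0, 2.0, 1001)             # sample depths along one ray
    dt = t[1:] - t[:-1]
    t_mid = 0.5 * (t[1:] + t[:-1])

    surface_depth = 1.2                            # where this ray hits the sphere
    sigma = torch.where((t_mid - surface_depth).abs() < 0.002,
                        torch.tensor(1e4), torch.tensor(0.0))      # delta-like shell at the surface
    color = torch.full_like(t_mid, 0.7)            # radiance assigned at the surface

    alpha = 1.0 - torch.exp(-sigma * dt)
    T = torch.cumprod(torch.cat([torch.ones(1), 1.0 - alpha[:-1]]), dim=0)
    print((T * alpha * color).sum())               # ~0.7: the pixel color is reproduced exactly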

question on camera position

Thanks for open-sourcing this great repo!

In my situation, I sample camera positions on the surface of a unit sphere centered at the world origin. The sampled cameras are distributed along the x-axis and look at the world origin, and the object/scene is supposed to be around the world origin. Then, with these camera positions, I use the look-at rule to calculate camera-to-world transform matrices (a minimal look-at sketch is included at the end of this question). My question is whether this camera setting is compatible with the requirements of nerf++, because I noticed that "Opencv camera coordinate system is adopted, i.e., x--->right, y--->down, z--->scene." Here's an example of 32 sampled camera positions in (x, y, z) format.

[[ 0. , 0. , 1. ],
[-0.011, 0. , 1. ],
[-0.016, 0. , 1. ],
[ 0.02 , 0. , 1. ],
[ 0.023, 0. , 1. ],
[ 0.025, 0. , 1. ],
[ 0.028, 0. , 1. ],
[-0.03 , 0. , 1. ],
[-0.032, 0. , 0.999],
[ 0.034, 0. , 0.999],
[-0.036, 0. , 0.999],
[ 0.038, 0. , 0.999],
[-0.039, 0. , 0.999],
[-0.041, 0. , 0.999],
[-0.042, 0. , 0.999],
[ 0.044, 0. , 0.999],
[-0.045, 0. , 0.999],
[ 0.047, 0. , 0.999],
[-0.048, 0. , 0.999],
[ 0.049, 0. , 0.999],
[ 0.051, 0. , 0.999],
[ 0.052, 0. , 0.999],
[ 0.053, 0. , 0.999],
[ 0.054, 0. , 0.999],
[-0.056, 0. , 0.998],
[ 0.057, 0. , 0.998],
[ 0.058, 0. , 0.998],
[ 0.059, 0. , 0.998],
[-0.06 , 0. , 0.998],
[-0.061, 0. , 0.998],
[ 0.062, 0. , 0.998],
[ 0.063, 0. , 0.998]]

Thank you so much!
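
For reference, here is a minimal look-at sketch in the OpenCV convention quoted above (x right, y down, z into the scene); the world-up vector and the helper name are my own assumptions, not code from the repo:

    import numpy as np

    def lookat_c2w(cam_pos, target=np.zeros(3), world_up=np.array([0.0, 1.0, 0.0])):
        # OpenCV convention: x -> right, y -> down, z -> into the scene.
        z = target - cam_pos
        z = z / np.linalg.norm(z)              # forward (toward the scene)
        x = np.cross(z, world_up)
        x = x / np.linalg.norm(x)              # right
        y = np.cross(z, x)                     # roughly opposite world_up, i.e. "down"
        c2w = np.eye(4)
        c2w[:3, 0], c2w[:3, 1], c2w[:3, 2], c2w[:3, 3] = x, y, z, cam_pos
        return c2w

    print(lookat_c2w(np.array([0.0, 0.0, 1.0])))   # camera on the unit sphere looking at the origin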

Running run_colmap.py failed in the last step.

image

image

Hi,

Great work!

When I ran run_colmap.py, I encountered this error:
NameError: name 'mesh' is not defined.

Besides, I find that in your current code, tf is also not defined, because both in_geometry_file and out_geometry_file are None.

Could you fix these problems?

Run on custom data

I have run COLMAP on my own data. Then I convert it to json format by running extract_sfm.py, and then I run normalize_cam_dict.py to get the normalized json. But when I am training the model, it requires intrinsics.txt and other files that are not generated by my pipeline. How can I run your code on our custom dataset?

Process 1 terminated with the following error

2022-11-18 23:46:13,697 [INFO] root: tat_training_Truck step: 0 resolution: 1.000000 level_0/loss: 0.064675 level_0/pnsr: 11.892565 level_1/loss: 0.064430 level_1/pnsr: 11.909071 iter_time: 0.250360
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/~~/anaconda3/envs/nerfplusplus/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/~~/anaconda3/envs/nerfplusplus/lib/python3.6/site-packages/tensorboardX/event_file_writer.py", line 202, in run
    data = self._queue.get(True, queue_wait_duration)
  File "/~~/anaconda3/envs/nerfplusplus/lib/python3.6/multiprocessing/queues.py", line 108, in get
    res = self._recv_bytes()
  File "/~~/anaconda3/envs/nerfplusplus/lib/python3.6/multiprocessing/connection.py", line 216, in recv_bytes
    buf = self._recv_bytes(maxlength)
  File "/~~/anaconda3/envs/nerfplusplus/lib/python3.6/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/~~/anaconda3/envs/nerfplusplus/lib/python3.6/multiprocessing/connection.py", line 383, in _recv
    raise EOFError
EOFError

Traceback (most recent call last):
  File "ddp_train_nerf.py", line 604, in <module>
    train()
  File "ddp_train_nerf.py", line 599, in train
    join=True)
  File "/~~/anaconda3/envs/nerfplusplus/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 200, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/~~/anaconda3/envs/nerfplusplus/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 158, in start_processes
    while not context.join():
  File "/~~/anaconda3/envs/nerfplusplus/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 119, in join
    raise Exception(msg)
Exception:

-- Process 1 terminated with the following error:
Traceback (most recent call last):
  File "/~~/anaconda3/envs/nerfplusplus/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 20, in _wrap
    fn(i, *args)
  File "/~~/nerfplusplus-master/ddp_train_nerf.py", line 488, in ddp_train_nerf
    idx = what_val_to_log % len(val_ray_samplers)
ZeroDivisionError: integer division or modulo by zero

How can I resolve this problem? Could you help me?

Model output for demo examples are blurry

After training with the default configs listed in the repo, I got much blurrier rendered images than what has been demonstrated. In particular, the truck scene was trained for 500K iterations; same situation for the train scene. I might be missing something important in the training phase.

What are the training params for the best model performance with high resolutions?

For reference, see the two images below: the first is rendered, the second is the ground truth.

image

image

Question about the background net

Hi, amazing work! I have a small question about the model:
At L119:

bg_dists = torch.cat((bg_dists, HUGE_NUMBER * torch.ones_like(bg_dists[..., 0:1])), dim=-1) # [..., N_samples]

why is "dists" the inverse distance instead of the real distance? It seems wrong for volume rendering, or am I missing something?
Thanks so much!
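
Not the author, but my reading is that the background network is integrated in the inverted-sphere coordinate, where samples are placed linearly in inverse depth 1/r in (0, 1], so "dists" are spacings in that coordinate (with HUGE_NUMBER standing in for the final interval out toward infinity) rather than spacings in metric depth. A small illustration of the correspondence (my own sketch, not repo code):

    import torch

    N_samples = 8
    u = torch.linspace(1.0, 1.0 / N_samples, N_samples)   # inverse depth 1/r: 1 -> 1/8
    r = 1.0 / u                                            # corresponding real depth: 1 -> 8

    du = u[:-1] - u[1:]    # uniform spacing in the inverted coordinate
    dr = r[1:] - r[:-1]    # spacing in metric depth grows quickly
    print(du)              # constant steps of 0.125
    print(dr)              # 0.14, 0.19, 0.27, 0.40, 0.67, 1.33, 4.00

Uniform steps in 1/r keep the quadrature bounded all the way out to infinity, whereas metric-depth steps would blow up.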

About run_colmap

Thanks for your work, first of all. I am now working on my own outdoor datasets and have had bad results with previous NeRF methods. I want to try your code, but I fail to build the dataset in your format because of run_colmap.py. I wonder where install/bin/colmap is supposed to be under the colmap/build directory. I only know how to use cmake to compile COLMAP and cannot find the above path. Could you give me some advice?

What LICENSE is used for nerfplusplus?

Thanks for a great repository !!!

I'd like to use the nerfplusplus code for daily tasks at my job, if possible.
So, what kind of license applies to this repository?

Code for initializing NeRF's geometry as a sphere

Hi! Thanks for the elaborate analysis and informative experiments!

I noticed that you conducted an experiment where the geometry (opacity field) of NeRF is fixed (or initialized) as a sphere. Could you please also share the code for that experiment?

Thanks in advance!

COLMAP pipeline gives faulty results on custom and vanilla data

Thanks for all the hard work on this!

Summary

The provided COLMAP pipeline is giving apparently faulty results, making it difficult to use my own custom data. I've confirmed this by running your given data through the pipeline with minimal changes to the code.

Overview

I'm attempting to run NeRF++ on my own custom dataset and I ran into very blurry and unusable results after running it through the COLMAP pipeline and training step. To isolate the issue, I ran the full dataset conversion on your dataset, specifically tat_training_Truck in your provided tanks and temples dataset.

I ran run_colmap.py on a new directory with just the rgb images of tat_training_Truck. Multiple issues arose when I visualized the results.

1. Focal point is out of frame in epipolar geometry visualization

I'm not terribly familiar with epipolar geometry, but I assume that the epipolar lines should converge within the view of the given frame (I assume this is the focal point? Please correct me if I'm wrong). This does not occur in the given dataset despite the camera pose pointing at the object of interest, which tells me that the outputted intrinsic matrix is incorrect.
image
green camera is visible on left side of image, seemingly oriented and positioned correctly
Screenshot from 2020-12-15 12-16-25
visualization of epipolar geometry of this pose

This tells me that there's some bug in the run_colmap.py pipeline that results in a bad intrinsic matrix.

2. Camera path not fully normalized to unit sphere

This was not an issue with my custom dataset, but it seems to be here. I visualized the automatic normalization that your script performed and the camera track did not get bound to the unit sphere. Additionally, there seems to be no built-in support for normalizing the kai_points.ply pointcloud. You seemed to have successfully normalized it in the example you gave, so I have two questions on this point:

  1. How do you successfully normalize these camera poses within the unit sphere?
  2. How do you normalize the kai_points.ply pointcloud and convert it to a mesh like you did in your example?

image
This comes straight out of the vanilla COLMAP pipeline, which is very different from the posted example

3. Blurry training results

I figure that this is a consequence of issue 1. However, I can't demonstrate this for the vanilla data, since its poses aren't successfully normalized per issue 2. Here's a sample of the blur experienced after training on a chair for many, many hours:
Screenshot from 2020-12-15 12-55-58

I also wrote my own converter that takes this outputted COLMAP data and transforms it into NeRF++-readable format. I figure no bugs from there are present here since this is before that conversion even takes place. On that note, if you have official code for this process I'd also love to take a look.

End

Since I performed minimal modifications upon the code and I'm using vanilla data, I figure there's either a bug in the system or I'm doing this fundamentally improperly. Do you have any suggestions on how to fix this so that I can use my own custom data without running into these same issues?

COLMAP code providing bad camera poses on provided data

Hi,
I am having trouble working with the camera geometries generated by the run_colmap.py script, and thus have not generated any usable model for nerfplusplus so far. I have downloaded the tanks and temples data and rerun the COLMAP generation on just the images in the training_truck directory, and obtained the following image:
Screenshot from 2021-05-07 22-26-51

This was after several guesses in the code, such as commenting out this line here, which gave me an error because 'mesh' didn't exist, and guessing that my generated file at $DATASET_PATH/posed_images/kai_cameras_normalized.json is the file expected as train/cam_dict_norm.json. So please let me know if I've done anything wrong in that regard.

I cannot even run the camera visualisation code as I get this error:
GLX: Failed to create context: BadValue (integer parameter out of range for operation)

I appreciate any help you can provide.

Colmap creates 0 and 1 directory under sfm/sparse

I just have a scene captured from multiple views and placed all 39 images in one directory...

Once I run the script run_colmap.py, I see two folders being created under sfm/sparse, namely 0 and 1. The "0" folder contains info for 35 of the images, and the "1" folder contains the rest.

May I know the reason behind the creation of the 0 and 1 folders?

For MVS, which folder path do I need to give?

LPIPS version

Hi,

Thanks for the great work. Could you please tell me what version of LPIPS was used to obtain the results as stated in the paper?
i.e. AlexNet, VGG or SqueezeNet?

Thanks in advance

What is autoexposure?

Hi, I am YJHong and thanks for your great work!

I wonder what the autoexposure option for NeRF is.

Is it a necessary option for running the NeRF++ code?

Thank you,
YJHong.

about run_colmap.py

The cameras, images, and points3D files do not exist at “...”\output\sfm\sparse.
How can I solve this problem?

run_colmap json files

Hi,

Thanks for sharing your code. I am running the COLMAP script on my own data and it produces json files, but all the example scenes you provide have the camera parameters and poses in txt files. Is there a utility to convert the jsons to txt files, or can the main training script understand both?

George
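
Not an official utility, but a conversion along these lines seems workable. It assumes each json entry stores flattened 4x4 'K' and 'W2C' matrices and that the training code expects per-image intrinsics/ and pose/ txt files holding flattened 4x4 matrices with camera-to-world poses, as in the released tanks-and-temples scenes; please double-check those assumptions against a released scene before training:

    # hypothetical converter sketch: json (from run_colmap.py) -> per-image txt files
    import json, os
    import numpy as np

    def json_to_txt(json_path, out_dir):
        with open(json_path) as f:
            cam_dict = json.load(f)
        os.makedirs(os.path.join(out_dir, 'intrinsics'), exist_ok=True)
        os.makedirs(os.path.join(out_dir, 'pose'), exist_ok=True)
        for img_name, cam in cam_dict.items():
            base = os.path.splitext(img_name)[0]
            K = np.array(cam['K']).reshape(4, 4)                       # assumed flattened 4x4
            C2W = np.linalg.inv(np.array(cam['W2C']).reshape(4, 4))    # world-to-camera -> camera-to-world
            np.savetxt(os.path.join(out_dir, 'intrinsics', base + '.txt'), K.reshape(1, 16))
            np.savetxt(os.path.join(out_dir, 'pose', base + '.txt'), C2W.reshape(1, 16))

    # json_to_txt('posed_images/kai_cameras_normalized.json', 'my_scene/train')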

about scene normalization

In your implementation, scene normalization is just camera position normalization. In my understanding, this is equivalent to scaling the size of the world. So should the intrinsics of the camera also be scaled?
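
Not the author, but a quick way to convince yourself either way is to check whether a uniform world scaling changes the projected pixel; a minimal sketch (my own, not from the repo) suggests the intrinsics can stay as they are:

    import numpy as np

    s = 0.1                                          # normalization scale
    K = np.array([[500., 0., 320.], [0., 500., 240.], [0., 0., 1.]])
    R = np.eye(3)
    C = np.array([0., 0., -4.])                      # camera center
    X = np.array([0.5, -0.2, 2.0])                   # a scene point

    def project(K, R, C, X):
        x = K @ (R @ (X - C))
        return x[:2] / x[2]

    print(project(K, R, C, X))                       # original world
    print(project(K, R, s * C, s * X))               # uniformly scaled world: same pixel

Since the scaling multiplies the camera-space coordinates uniformly, the perspective division cancels it out.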

Explanation of intersect_sphere and a faster implementation

This function computes the intersection depth, but there is no explanation either in the paper or in the code.

def intersect_sphere(ray_o, ray_d):
    '''
    ray_o, ray_d: [..., 3]
    compute the depth of the intersection point between this ray and unit sphere
    '''
    # note: d1 becomes negative if this mid point is behind camera
    d1 = -torch.sum(ray_d * ray_o, dim=-1) / torch.sum(ray_d * ray_d, dim=-1)
    p = ray_o + d1.unsqueeze(-1) * ray_d
    # consider the case where the ray does not intersect the sphere
    ray_d_cos = 1. / torch.norm(ray_d, dim=-1)
    p_norm_sq = torch.sum(p * p, dim=-1)
    if (p_norm_sq >= 1.).any():
        raise Exception('Not all your cameras are bounded by the unit sphere; please make sure the cameras are normalized properly!')
    d2 = torch.sqrt(1. - p_norm_sq) * ray_d_cos
    return d1 + d2

So in case it's not clear to somebody, I'd like to provide some insight into how it is calculated, plus a faster implementation based on my approach:
We have the origin o and the direction d, and we want the intersection depth with the unit sphere.
A straightforward method is to find t such that ||o + t*d|| = 1.
Squaring both sides gives a quadratic equation in t:

||d||^2 * t^2 + 2*(o·d)*t + ||o||^2 - 1 = 0

Then we can solve for t using the quadratic formula.

It results in the following implementation:

def intersect_sphere(rays_o, rays_d):
    odotd = torch.sum(rays_o*rays_d, 1)
    d_norm_sq = torch.sum(rays_d**2, 1)
    o_norm_sq = torch.sum(rays_o**2, 1)
    determinant = odotd**2+(1-o_norm_sq)*d_norm_sq
    assert torch.all(determinant>=0), \
        'Not all your cameras are bounded by the unit sphere; please make sure the cameras are normalized properly!'
    return (torch.sqrt(determinant)-odotd)/d_norm_sq

which I have verified to yield the same result (epsilon-close) as the original implementation, but 5-10x faster (11ms vs 2ms for 100k rays on my PC, not that significant though).

Another possible code optimization is to normalize rays_d from the beginning; that way we can get rid of d_norm_sq in intersect_sphere, and also here:

ray_d_norm = torch.norm(ray_d, dim=-1, keepdim=True) # [..., 1]
viewdirs = ray_d / ray_d_norm # [..., 3]
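
For completeness, here is a sketch of that simplification (my own, assuming rays_d has been normalized upstream so that d_norm_sq == 1):

    import torch

    def intersect_sphere_unit_dirs(rays_o, rays_d_unit):
        # same as the quadratic-formula version above, with d_norm_sq == 1 dropped;
        # rays_d_unit must be unit-length
        odotd = torch.sum(rays_o * rays_d_unit, dim=-1)
        o_norm_sq = torch.sum(rays_o ** 2, dim=-1)
        determinant = odotd ** 2 + 1.0 - o_norm_sq
        assert torch.all(determinant >= 0), \
            'Not all your cameras are bounded by the unit sphere; please make sure the cameras are normalized properly!'
        return torch.sqrt(determinant) - odotd

    # usage: rays_d_unit = rays_d / torch.norm(rays_d, dim=-1, keepdim=True)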

How do I use camera poses I already have?

I have a new set of custom images. I also have the camera parameters.

Can you please let me know what the structure of the camera poses should be, both the intrinsic and extrinsic parameters, so that I can run the code on them?

question about unit sphere

During rendering, must the camera position (i.e., ray_o) lie within the unit sphere? What if we want to render from beyond the sphere?
image

Camera path meaning and using of poses

Hello!

I've got two questions for you, hope it's fine.

First one: what exactly is the camera path and how do I obtain it? I'm thinking that you need a video forming a path, and for that you need to extract camera poses and intrinsics? Correct me if I'm wrong; I want to understand the concept.

Second one: do we need to operate with normalized or unnormalized poses? The two .json files both store the cameras, but I don't know which one to choose and use for my custom dataset.

Thank you and stay safe!

Add a Google Colab

Hi,

does it run on Google Colab?
Will add a version myself, if I have the time. 👍

split_size error when training

Thank you for your perfect work. When I train on my own dataset, I get the following error:
Screenshot 2021-03-17, 1:18:58 AM

Could you please help me deal with it? Thanks so much!

Colmap

Hello,

I am having problems running the script: I can't find the proper path for colmap_bin in the run_colmap.py script when using COLMAP 3.6 for Windows. I tried to reproduce your path, but without success. Can you please help with the colmap_bin path?
Thanks in advance!

Preprocessing data

Hello!

First of all, nice job! I was wondering how we can preprocess new data and build our own dataset with generated poses and intrinsics. Thanks in advance!

about inverted sphere parameterization

To do volume rendering, we need to get x', y', z'.
image

And in the paper, in order to find x', y', z', it is said that they are obtained by rotating point a in the figure.

If you just divide x, y, z by r, isn't that x', y', z'? Why do I have to compute it the hard way, as in the figure? Am I misunderstanding something?

Tanks and Temples pretrained models

Hello, it seems that the pretrained models don't correspond to the code, because when I load the train scene the PSNR at testing time is 7.

Could you please provide the pretrained models again? Thanks!!! I really appreciate it!!!

Sara
