chenhsuanlin / bundle-adjusting-nerf
BARF: Bundle-Adjusting Neural Radiance Fields 🤮 (ICCV 2021 oral)
License: MIT License
Hi Chen-Hsuan,
Thanks for the great repo and a great piece of research! I think it would have even more impact if any user who wants to try it could launch training in the cloud, with automated hyperparameter sampling.
I have just made a PR (#14) that includes all the infrastructure needed to launch experiments in AzureML. This should enable faster and easier experimentation with your code for anyone who wants to use it. Let me know if you have any suggestions for improving the PR.
Best wishes,
Stan
Hello Chen-Hsuan,
Thank you again for releasing the code.
I have tried to train BARF on the Freiburg Cars dataset (please see the attached mp4 file).
As you can see, it is a real dataset with a spherical sequence (a hybrid between LLFF and NeRF-Synthetic).
I ran COLMAP to obtain the camera poses and tried to train the model, but it was not successful.
I have tried
Separately, NeRF could be trained with this dataset successfully.
Please advise me on how I should proceed.
Cheers,
Wonbong
In the example in data/iphone.py, the focal length is computed as self.focal = self.raw_W * 4.2 / (12.8 / 2.55). I am wondering what the specific constants mean?
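One plausible reading of these constants, offered as an assumption rather than a confirmed answer: 4.2 is the lens focal length in millimetres and 12.8 / 2.55 is the sensor width in millimetres, so the expression is the usual pinhole conversion from a metric focal length to a focal length in pixels. The helper name, parameter names, and image width below are illustrative only.

```python
def focal_length_in_pixels(image_width_px, focal_length_mm, sensor_width_mm):
    """Convert a metric focal length to pixels for a pinhole camera."""
    return image_width_px * focal_length_mm / sensor_width_mm

# Assumed interpretation of the constants in data/iphone.py (not confirmed by the source).
raw_W = 1920                     # hypothetical image width in pixels
focal_mm = 4.2                   # assumed lens focal length in mm
sensor_width_mm = 12.8 / 2.55    # assumed sensor width in mm
focal = focal_length_in_pixels(raw_W, focal_mm, sensor_width_mm)
print(focal)
```

If this reading is right, photos from a different camera would need that camera's own focal length and sensor width in place of the two constants.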
Hi, thank you for releasing the code.
I want to train BARF with photos I took myself, for which there are no camera poses.
Please advise me on how I should proceed.
Thank you very much.
Hello, thanks for sharing!
I'm trying to train BARF with my custom dataset. I don't know the camera intrinsics.
The camera is placed at 3 positions on a sphere (pitch 0, 30, and 60 degrees), the object sits on a turntable, and I took a picture of it every 15 degrees.
So I have 72 pictures: yaw 15, 30, 45, ..., 360 at pitch 0, at pitch 30, and at pitch 60.
So even though the camera is fixed at each pitch, this is similar to Blender-style camera movement.
BARF supports Blender and LLFF, but I failed to provide camera pose information (.json or .npz) in the form the Blender or LLFF data loaders need, so I tried the iPhone configuration.
It seems to train to some degree, but it is still quite far from succeeding after 200000 steps.
I tried to initialize the camera poses spherically by hand, but that gave worse results.
Please give me some hints on how to solve this.
Thanks ^^
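For a turntable capture like the one described, the nominal camera positions can be written down directly. The sketch below builds 72 camera-to-world poses on a sphere, all looking at the origin. The function name, the unit radius, world z as the up axis, and the camera axis convention (x right, y up, z backward) are all assumptions made for illustration; they would need converting to whatever convention the chosen data loader expects.

```python
import numpy as np

def turntable_pose(pitch_deg, yaw_deg, radius=1.0):
    """Camera-to-world [3,4] pose on a sphere, looking at the origin.

    Assumed convention: world z is up; camera axes are x right, y up,
    z backward (the camera looks along its -z axis).
    """
    pitch, yaw = np.deg2rad(pitch_deg), np.deg2rad(yaw_deg)
    center = radius * np.array([np.cos(pitch) * np.cos(yaw),
                                np.cos(pitch) * np.sin(yaw),
                                np.sin(pitch)])
    backward = center / np.linalg.norm(center)        # camera +z, away from the object
    right = np.cross(np.array([0.0, 0.0, 1.0]), backward)
    right /= np.linalg.norm(right)
    up = np.cross(backward, right)
    R = np.stack([right, up, backward], axis=1)       # columns are the camera axes
    return np.concatenate([R, center[:, None]], axis=1)

# 3 pitches x 24 yaw steps of 15 degrees = 72 poses.
poses = np.stack([turntable_pose(p, y)
                  for p in (0, 30, 60)
                  for y in range(15, 361, 15)])
print(poses.shape)  # (72, 3, 4)
```

The true camera-to-object distance and the direction in which the turntable rotates are not known from the description, so the radius and the sign of the yaw are guesses that would have to be checked against the images.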
When I use my own dataset to train the model, how can I export the camera pose file? Or where can I find it?
Hi, @chenhsuanlin
Thanks for your great work. I'm trying barf on other scenes. However, the training behavior seems weird. As you can see from the image below, the rotation error is decreasing, but the translation error keeps increasing.
When I took a look at the synthesized validation image, the result seemed to be shifted by several pixels relative to the original image, and the scale was not consistent with the original image either.
For my experimental setting, I used COLMAP to compute the ground-truth camera poses and intrinsics. The initial camera poses for BARF are not identities; instead, they are perturbed by a small pose noise of 0.15. I wonder if there are any parameters we need to fine-tune?
Figure 5 in the paper shows the result for NeRF with naive positional encoding, displaying the GT poses and the refined poses under full positional encoding.
I looked into the NeRF part of the code, but found nothing relevant.
bundle-adjusting-NeRF/model/nerf.py
Line 89 in 803291b
The visualization function in nerf.py only supports TensorBoard and does not include Visdom support as barf.py does.
It would be great if you could provide some suggestions on this issue. Thanks a lot.
Hello Chenhsuan!
Thanks for sharing your code!
I have a problem running your experiments on my own dataset, which was shot on an iPhone.
I saw your video introducing BARF on YouTube. At the end, you show results on your own real-life sequences, such as the living room or kitchen.
Could you tell me how I can apply your approach to my dataset, so that I can try novel view synthesis by just providing an image folder with no pre-computed camera poses?
I would like to ask about the difference between test and evaluate. I notice that in the "test-optim" mode you still perform some refinement, while in the "eval" part you evaluate the results. If I have trained a model and want to test it and regenerate the figures, should I use the "test-optim" mode or the "eval" mode? From my understanding, I should use the "test-optim" mode. I'd appreciate it if you could explain the difference between the two.
By the way, great work, and thank you in advance!
Hi @chenhsuanlin,
Thanks for the great work. I have some initial estimates of camera parameters for some of my own images. I want to use the Blender dataset format, and I was wondering whether I understand correctly that the transform_matrix in transforms_train.json for each frame is the camera extrinsic parameter in the form [R | t]?
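As far as I know, in the Blender scenes released with the original NeRF, transform_matrix is a 4x4 homogeneous camera-to-world matrix (rotation plus camera position), and camera_angle_x is the horizontal field of view in radians. Since "extrinsics" often means world-to-camera, the direction matters. The sketch below reads one frame and splits it into R and t; the sample values and helper names are hypothetical, and how BARF's own loader converts the pose afterwards is not covered here.

```python
import json
import math

# Hypothetical one-frame excerpt in the layout of transforms_train.json.
sample = json.loads("""
{
  "camera_angle_x": 0.6911112070083618,
  "frames": [
    {"file_path": "./train/r_0",
     "transform_matrix": [[1.0, 0.0, 0.0, 0.5],
                          [0.0, 1.0, 0.0, -1.0],
                          [0.0, 0.0, 1.0, 4.0],
                          [0.0, 0.0, 0.0, 1.0]]}
  ]
}
""")

def split_pose(transform_matrix):
    """Split a 4x4 homogeneous matrix into its 3x3 rotation and 3-vector translation."""
    R = [row[:3] for row in transform_matrix[:3]]
    t = [row[3] for row in transform_matrix[:3]]
    return R, t

def focal_from_fov(image_width_px, camera_angle_x):
    """Focal length in pixels from the horizontal field of view in radians."""
    return 0.5 * image_width_px / math.tan(0.5 * camera_angle_x)

R, t = split_pose(sample["frames"][0]["transform_matrix"])
focal = focal_from_fov(800, sample["camera_angle_x"])   # 800 px width is an assumed value
print(R, t, focal)
```

Under the camera-to-world reading, t is the camera position in world coordinates; a world-to-camera matrix would instead be the inverse of this one.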
Hi Chen-Hsuan,
I successfully created an experiment in the iphone.py style, that is, without any camera pose information. Is it possible to create a 3D model from the output?
Thanks in advance,
Hey,
I am trying to optimize the camera positions given a non-textured mesh. Right now the NeRF implementation does not consider this prior and samples random rays. I was wondering how I can provide my mesh as initial input to BARF. Since I already have the mesh, fixing the camera poses should be nearly instant!
I've tried to use PyTorch3D as differentiable renderer with BARF's additional parameters for camera pose optimisation, but it doesn't work. The cameras drift away and loss becomes NaN.
I tried to use the built-in initialization (Kaiming init), and the network cannot converge. Why is the Xavier init necessary?
I use nuScenes, an autonomous-driving dataset that provides camera-to-world transform matrices, to train BARF, and I normalize the translations to between 1 and 10. I used TensorBoard to visualize the training process and found that the training loss converged but the validation loss went up. Do you have any idea what the reason might be?
After training a BARF model and an image alignment model, I get a .ckpt file. How can I use or test the model?
Could you please share the code for the "Planar Image Alignment" experiment? Thanks.
Hi Chen-Hsuan
First of all thank you very much for uploading your work here.
I cloned your repo to try to run the experiments on my local machine (Ubuntu 20.04). I tried to run the chair dataset with BARF with these 2 lines:
python3 train.py --group=G0 --model=barf --yaml=barf_blender --name=Test1 --data.scene=chair --barf_c2f=[0.1,0.5] --max_iter=2000 --visdom!
python3 evaluate.py --group=G0 --model=barf --yaml=barf_blender --name=Test1 --data.scene=chair --data.val_sub= --resume
But when running the evaluate.py file I got an error regarding MAGMA. Do you know the cause of this error? How can I fix it?
Thanks in advance,
Hello, what is your experimental environment like?
Hi, I am experimenting with the BARF code so that I can load another data format.
I want to load train_data (for example, data other than the camera poses or images, which I added in iphone.py) and use it in def forward in nerf.py.
Could you tell me how I can load the data I want?
Hi @chenhsuanlin great work!
I'm trying BARF on my custom data, and it shows promising results. However, I'm wondering whether the performance would be better if we optimized the intrinsics as well. Do you think it's doable? If so, could you please point me to where I should modify the code? Thanks!
Thank you for sharing this nice work. I'm just curious whether you happen to have multi-GPU training code on hand. I was trying to train BARF with multiple GPUs but got stuck on a weird OOM issue: GPU memory explodes to over 50G, while your original codebase takes less than 10G on blender/lego.
Here's the edit I made: Ir1d@904228c
The command to run: CUDA_VISIBLE_DEVICES=0,1 python train.py --group=blender --model=barf --yaml=barf_blender --name=lego_baseline --data.scene=lego --gpu=1 --visdom! --batch_size=2
Do you know what might be leading to the OOM here?
Thank you!
Hi Chenhsuan!
Thank you for sharing the code! I tried BARF on my own sequence with the following scripts:
train script: python train.py --group='barf' --model=barf --yaml=barf_iphone --name='iphone' --data.scene='cats' --arch.posenc!
test script: python evaluate.py --group='barf' --model=barf --yaml=barf_iphone --name='iphone' --data.scene='cats' --resume
Some errors occurred at test time while loading the state_dict for NeRF. Have you encountered this problem before? Thank you.
Errors:
restoring nerf...
Traceback (most recent call last):
  File "evaluate.py", line 34, in <module>
    main()
  File "evaluate.py", line 28, in main
    m.restore_checkpoint(opt)
  File "/data1/lvxinbi/barf/model/base.py", line 53, in restore_checkpoint
    epoch_start,iter_start = util.restore_checkpoint(opt,self,resume=opt.resume)
  File "/data1/lvxinbi/barf/util.py", line 131, in restore_checkpoint
    child.load_state_dict(child_state_dict)
  File "/home/hotel_ai/anaconda3/envs/torch1.7/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1407, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for NeRF:
    size mismatch for mlp_feat.0.weight: copying a param with shape torch.Size([256, 3]) from checkpoint, the shape in current model is torch.Size([256, 63]).
    size mismatch for mlp_feat.4.weight: copying a param with shape torch.Size([256, 259]) from checkpoint, the shape in current model is torch.Size([256, 319]).
    size mismatch for mlp_rgb.0.weight: copying a param with shape torch.Size([128, 259]) from checkpoint, the shape in current model is torch.Size([128, 283]).
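The shapes in this error are consistent with a checkpoint trained without positional encoding being loaded into a model built with it. The training command passes --arch.posenc! while the test command does not, so the two runs may simply have been configured differently; this is a guess from the numbers, not a confirmed diagnosis. The arithmetic below assumes the standard NeRF choice of 10 frequencies for 3D points and 4 for view directions, with the raw input concatenated, and a hidden width of 256. The function name is illustrative.

```python
def posenc_dim(in_dim, num_freqs, include_input=True):
    """Size of a sin/cos positional encoding of an in_dim-dimensional vector."""
    encoded = in_dim * 2 * num_freqs          # one sin and one cos per frequency
    return encoded + (in_dim if include_input else 0)

hidden = 256                                   # assumed MLP width
with_posenc = dict(point=posenc_dim(3, 10), view=posenc_dim(3, 4))
without_posenc = dict(point=3, view=3)         # raw xyz and raw view direction

for name, dims in [("with", with_posenc), ("without", without_posenc)]:
    # first layer input, skip-connection input, RGB head input
    print(name, dims["point"], hidden + dims["point"], hidden + dims["view"])
# with 63 319 283
# without 3 259 259
```

The "with" row matches the shapes of the current model and the "without" row matches the shapes stored in the checkpoint, which suggests trying the test command with the same positional encoding setting used for training.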
I set barf_c2f = [0.1,0.5], but I can't reproduce the results in the paper. Is this a matter of hyperparameters?
Thank you.
Hi,
I wish to replicate the results from the original NeRF paper, but when I train with nerf_blender_repr.yaml, I only get a PSNR of 26.38 on the lego scene. That is lower than with nerf_blender.yaml, which reaches a PSNR of 29.19.
Is there a mistake in nerf_blender_repr.yaml? I look forward to your response.
I hope this is not too dumb of a question, but I'm having trouble understanding how evaluate.py can correctly derive the mean rotation and translation errors from a model checkpoint file if the camera noise is not saved as part of the checkpoint?
As I understand it, self.graph.se3_refine is learning pose corrections from the identity (for non-blender models) or from randomly-initialized noise (for blender models). In the blender case, when evaluating a trained model, wouldn't the camera noise be different between training and evaluation?
Thanks for your help!
Hi! Thank you for sharing your fantastic work!
When I train the network on a 3090 in BARF mode with positional encoding on the Blender dataset, it always shows the following error after validation:
RuntimeError: CUDA out of memory. Tried to allocate 5.25 GiB (GPU 0; 23.70 GiB total capacity; 15.75 GiB already allocated; 4.71 GiB free; 15.77 GiB reserved in total by PyTorch)
Could you please tell me which parameter I should change to avoid this problem? Thank you very much!
Dear Chen-Hsuan Lin, thanks for sharing your great work!
I have some difficulty understanding the main idea of the "camera pose registration" part of your paper. Is it right to understand it as optimizing the initial camera poses through backpropagation? And where can I find this part in your codebase?
(You can answer in Chinese if that's more comfortable. I only saw your name, so I'm not sure. If not, ignore this line.)
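As a toy illustration of the idea being asked about, a pose can be treated as a set of learnable parameters and updated by gradient descent on a reconstruction loss. The sketch below registers a 2D rotation angle and translation with hand-written gradients. It is a deliberately simplified stand-in, not BARF's implementation: the point set, learning rate, and iteration count are arbitrary choices, and BARF itself optimizes 3D pose parameters jointly with the NeRF through automatic differentiation.

```python
import numpy as np

def rot2d(theta):
    """2x2 rotation matrix for an angle in radians."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

# Fixed 2D points and their observations under an unknown "true" pose.
X = np.array([[-1.0, -1.0], [1.0, -1.0], [1.0, 1.0], [-1.0, 1.0], [0.0, 0.5]])
theta_true, t_true = 0.3, np.array([0.5, -0.2])
Y = X @ rot2d(theta_true).T + t_true

# Start from the identity pose and descend the mean squared residual.
theta, t = 0.0, np.zeros(2)
lr = 0.1
for _ in range(500):
    residual = X @ rot2d(theta).T + t - Y              # [N,2]
    c, s = np.cos(theta), np.sin(theta)
    dR = np.array([[-s, -c], [c, -s]])                 # derivative of rot2d w.r.t. theta
    grad_theta = 2.0 * np.mean(np.sum(residual * (X @ dR.T), axis=1))
    grad_t = 2.0 * residual.mean(axis=0)
    theta -= lr * grad_theta
    t -= lr * grad_t

print(theta, t)
```

Here the "rendering" is just a rigid transform of known points, which is why the problem is easy; in BARF the scene itself is unknown and learned at the same time.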
Hi @chenhsuanlin,
Thanks for sharing the great work. I have a question: why are the camera poses perturbed after every iteration during training?
In data/llff.py, in the parse_cameras_and_bounds method, we are converting the poses_bounds.npy and ingesting it to use the given camera poses. Per LLFF's specification, it seems like the convention of this dataset has the transformation matrix for axes [down, right, backward] (i.e. positive x is down, positive y is right, positive z is backward).
Per line 49 in the aforementioned file, it seems like we are swapping these axes to switch to a new convention
poses_raw[...,0],poses_raw[...,1] = poses_raw[...,1],-poses_raw[...,0]
moving from [down, right, backward] to [right, up, backward] (i.e. positive x is right, positive y is up, positive z is backward).
However, the translation vector doesn't seem to be receiving the same modification. Is this behavior intended, as it seems inconsistent with the rest of the change?
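One way to look at the quoted line, assuming poses_raw holds camera-to-world matrices [R | t] of shape [N,3,4] with the last index selecting the column: the update only relabels the camera axes, which are the first three columns, and leaves the fourth column alone. Under that assumption the translation is the camera position in world coordinates, which does not depend on how the camera's own axes are named. The sketch below checks this on random data; the array is a hypothetical stand-in for the real poses.

```python
import numpy as np

rng = np.random.default_rng(0)
poses_raw = rng.normal(size=(4, 3, 4))     # hypothetical [N,3,4] camera-to-world poses
before = poses_raw.copy()

# Same update as the quoted line, written with explicit copies for clarity.
col0, col1 = poses_raw[..., 0].copy(), poses_raw[..., 1].copy()
poses_raw[..., 0], poses_raw[..., 1] = col1, -col0

# The update equals right-multiplying the rotation block by a fixed matrix P.
P = np.array([[0.0, -1.0, 0.0],
              [1.0,  0.0, 0.0],
              [0.0,  0.0, 1.0]])
rotation_matches = np.allclose(poses_raw[..., :3], before[..., :3] @ P)
translation_unchanged = np.array_equal(poses_raw[..., 3], before[..., 3])
print(rotation_matches, translation_unchanged)  # True True
```

If the matrices were world-to-camera instead, the same change of camera axes would act on the rows and would modify the translation as well, so the answer hinges on which direction the stored poses map.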
Thanks for the wonderful project.
I wonder what these numbers mean, and whether I should change them when testing my own dataset?
bundle-adjusting-NeRF/data/iphone.py
Line 65 in 803291b
Thank you for your code. I tried to use it in a Docker environment. I chose to disable Visdom visualization and ran the code with the following command:
CUDA_VISIBLE_DEVICES=1 python3 train.py --group=llff --model=barf --yaml=barf_llff --name=orchids --data.scene=orchids --barf_c2f=[0.1,0.5] --visdom!
Then I ran into the following problem:
Could you please help me? What did I do wrong?
Hello,
Thank you for the great work! I really like the neat project structure so I am experimenting with NeRF on this codebase. However, I find the training is very sensitive to network initialization.
First, if I train on the hotdog scene for 2000 iterations using the default configs in options/nerf_blender_repr.yaml, I get a normal render from self.nerf but an empty render from self.nerf_fine.
The command is
python train.py --group=nerf-debug --model=nerf --yaml=nerf_blender_repr --name=debug-hotdog-nerf-9 --data.scene=hotdog
The result shown in tensorboard is
After some investigation, I found that the reason lay in the different initialization weights of the two networks. Then I tried changing the random seed from 0 to 233, this time both networks rendered an empty scene.
The command is
python train.py --group=nerf-debug --model=nerf --yaml=nerf_blender_repr --name=debug-hotdog-nerf-10 --data.scene=hotdog --seed=233
Finally, I tried copying the weights of self.nerf to self.nerf_fine, using the following code
    if opt.nerf.fine_sampling:
        self.nerf_fine = NeRF(opt)
        self.nerf_fine.load_state_dict(self.nerf.state_dict())  # here
and set the random seed back to 0. This time the result was fine.
The command is
python train.py --group=nerf-debug --model=nerf --yaml=nerf_blender_repr --name=debug-hotdog-nerf-11 --data.scene=hotdog
Here, I want to post the results to discuss the general stability of NeRF training. I wonder if other NeRF repositories all have this kind of sensitivity to network initialization, or are we missing some initialization trick in this repo?
I believe this can relate to a more fundamental nature of NeRF. Do you have any ideas about this phenomenon? Thank you!
Thanks for your great work! I would like to know which GPU type you used and how long training takes for BARF and NeRF.
Hi, thanks for sharing such good work!
I have a question regarding custom data. I used custom data to train my network, but I found that there is no code to render images or extract a mesh from the trained network.
Would you mind giving an introduction regarding how to represent custom data? Thank you.
Any help will be greatly appreciated!
Thanks for your great work. What computer configuration are you running on?
Hi Chen-Hsuan,
There is an issue with multiprocessing on Windows (see screenshot below). Wrapping the code in train.py and evaluate.py in a main() function resolves it. I made a PR that fixes this issue: #11
Let me know if you think you can merge it into main :)
Hi Chen-Hsuan,
Thanks for the great work.
Every time the validation code is run, the resulting renders seem to move away from the camera (see images).
The underlying reason is that the arguments passed to preprocess_camera.py are passed by reference, so preprocess_camera.py changes the intrinsics of the underlying camera. This change occurs every time the validation code is run, so the intrinsics change, and the validation images become smaller and smaller. To avoid this issue, I have created a PR that detaches and clones the pose before preprocessing, therefore passing a copy of the intrinsics to be modified appropriately. #12
Let me know if we can pull this into main - it avoids the objects 'moving away', making the validation images easier to analyse.
Kind regards,
Stan
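The aliasing described in this report can be reproduced in isolation. The sketch below uses plain Python lists and a hypothetical preprocess_intrinsics function standing in for the repository's preprocessing step; it is not the code from the PR, only an illustration of why passing a copy stops the drift.

```python
import copy

def preprocess_intrinsics(intr, scale):
    """Hypothetical stand-in for a preprocessing step that edits its input in place."""
    intr[0][0] *= scale   # fx
    intr[1][1] *= scale   # fy
    return intr

def make_intrinsics():
    """A made-up 3x3 intrinsics matrix with focal length 100."""
    return [[100.0, 0.0, 50.0], [0.0, 100.0, 50.0], [0.0, 0.0, 1.0]]

# Problem pattern: the caller's matrix is modified on every validation pass.
shared = make_intrinsics()
for _ in range(3):
    preprocess_intrinsics(shared, 0.5)
drifted_fx = shared[0][0]          # focal length has shrunk three times

# Safer pattern: preprocess a copy, so the stored intrinsics never change.
stored = make_intrinsics()
for _ in range(3):
    used = preprocess_intrinsics(copy.deepcopy(stored), 0.5)

print(drifted_fx, stored[0][0], used[0][0])  # 12.5 100.0 50.0
```

With tensors, cloning the argument before the call plays the same role as the deep copy here.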
Thanks for sharing your code.
I followed all the tips in the README and tried to reproduce the results of your paper.
However, the rotation error in my test is much higher than the one reported in the paper.
| llff:fern | rotation | translation (x100) |
|---|---|---|
| paper | 0.191 | 0.192 |
| reproduce | 0.689 | 0.193 |
Can you give some advice?
Besides, the depth map and the rendered RGB seem good.
Thanks!
Hi, Chen-Hsuan Lin.
Thank you for sharing the great work!
I have been reading the code, and I do not fully understand the camera pose transformation applied when calling the __getitem__ method for the LLFF dataset:
https://github.com/chenhsuanlin/bundle-adjusting-NeRF/blob/main/data/llff.py#L104
In my understanding, the camera pose returned from parse_cameras_and_bounds is a camera-to-world matrix, and its coordinate system is [right, up, backwards].
https://github.com/chenhsuanlin/bundle-adjusting-NeRF/blob/main/data/llff.py#L42
Then, the camera pose is transformed by parse_raw_camera when calling __getitem__, but I could not follow what the transformation does:
https://github.com/chenhsuanlin/bundle-adjusting-NeRF/blob/main/data/llff.py#L104
Could you please let me know?
def procrustes_analysis(X0,X1): # [N,3]
    # translation
    t0 = X0.mean(dim=0,keepdim=True)
    t1 = X1.mean(dim=0,keepdim=True)
    X0c = X0-t0
    X1c = X1-t1
    # scale
    s0 = (X0c**2).sum(dim=-1).mean().sqrt()
    s1 = (X1c**2).sum(dim=-1).mean().sqrt()
    X0cs = X0c/s0
    X1cs = X1c/s1
    # rotation (use double for SVD, float loses precision)
    U,S,V = (X0cs.t()@X1cs).double().svd(some=True)
    R = (U@V.t()).float()
    if R.det()<0: R[2] *= -1
    # align X1 to X0: X1to0 = (X1-t1)/s1@R.t()*s0+t0
    sim3 = edict(t0=t0[0],t1=t1[0],s0=s0,s1=s1,R=R)
    return sim3
These are lines 278 to 295 of camera.py.
@chenhsuanlin
Hello, I have a question: why do we take the average rather than the square when calculating s0 and s1? Here s represents the scale of the scene, so it seems more reasonable to square first and then average.
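Reading the quoted snippet term by term, the scale is obtained by squaring the centered coordinates, summing over x, y, z, averaging over the N points, and then taking the square root, so it is the RMS distance of the points from their centroid. The NumPy sketch below mirrors the quoted function under that reading and checks it on synthetic data. The function name align_points and the test transform are illustrative assumptions, not part of the repository.

```python
import numpy as np

def align_points(X0, X1):
    """Align X1 [N,3] to X0 [N,3] with a similarity transform (Procrustes)."""
    t0 = X0.mean(axis=0, keepdims=True)
    t1 = X1.mean(axis=0, keepdims=True)
    X0c, X1c = X0 - t0, X1 - t1
    # scale: RMS distance of the centered points from the origin
    s0 = np.sqrt((X0c ** 2).sum(axis=-1).mean())
    s1 = np.sqrt((X1c ** 2).sum(axis=-1).mean())
    # rotation from the SVD of the 3x3 cross-covariance of the normalized sets
    U, _, Vt = np.linalg.svd((X0c / s0).T @ (X1c / s1))
    R = U @ Vt
    if np.linalg.det(R) < 0:
        R[2] *= -1                     # same reflection handling as the quoted snippet
    return (X1 - t1) / s1 @ R.T * s0 + t0

# Synthetic check: X1 is a rotated, scaled, and shifted copy of X0.
rng = np.random.default_rng(0)
X0 = rng.normal(size=(20, 3))
a = 0.7
R_true = np.array([[np.cos(a), -np.sin(a), 0.0],
                   [np.sin(a),  np.cos(a), 0.0],
                   [0.0,        0.0,       1.0]])
X1 = 2.5 * X0 @ R_true.T + np.array([1.0, -2.0, 0.5])
X1_to_0 = align_points(X0, X1)
print(np.abs(X1_to_0 - X0).max())
```

In this noise-free case the aligned points coincide with X0 up to floating-point error, and the ratio s1 / s0 recovers the scale factor of 2.5 used to build X1.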
Hi @zhenpeiyang, and thanks for your interesting paper and code!
A detail that is bothering me is that when you start with pose_GT=pose, you get a non-zero rotation loss, and it decreases during training!
Matthieu.
Thank you @chenhsuanlin for sharing this impressive work. I wanted to ask whether it is possible to start BARF training from initial camera poses instead of identity matrices by changing only the configuration, or whether I have to adapt the code.
Thanks for the code! That's really nice work!
But there seems to be little description of how to use my own images for training. Could you tell me more about it?