bundle-adjusting-nerf's People

Contributors

chenhsuanlin, martinarroyo, szymanowiczs


bundle-adjusting-nerf's Issues

Launching experiments in AzureML

Hi Chen-Hsuan,

Thanks for the great repo and a great piece of research! I think it would have even more impact if any user who wants to try it could launch training in the cloud, with automated hyperparameter sampling.

I have just made a PR (#14) which includes all the infrastructure needed to launch experiments in AzureML. This should enable faster and easier experimentation with your code for anyone who might want to use it. Let me know if you have any suggestions on how to improve the PR.

Best wishes,
Stan

Training BARF with other real data

Hello Chen-Hsuan,

Thank you again for releasing the code.
I have tried to train BARF on the Freiburg Cars dataset (please see the attached mp4 file).
As you can see, it is a real dataset with a spherical capture sequence (a hybrid between LLFF and NeRF-Synthetic).

I ran COLMAP to obtain camera poses and tried to train the model, but it was not successful.
I have tried:

  1. Starting from zero 6-dimensional se3 vectors (as in LLFF), but the model could not find the proper poses.
  2. Adding noise (0.15) to the GT poses, but the rotation and translation errors keep increasing (a sketch of this perturbation is shown after this list). As far as I understand, the model should first find the correct poses and then optimize the NeRF parameters, but neither works here. Please see the images: the validation error keeps increasing because the pose estimation was not successful.
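
For reference, the perturbation in case 2 was set up roughly as follows; a minimal sketch using what I believe are the repo's camera.lie and camera.pose helpers (treat the exact calls as an assumption, not my exact script):

    import torch
    import camera  # the repo's camera.py

    def perturb_poses(pose_GT, noise_std=0.15):
        # one random se(3) vector per camera, scaled by the noise level
        se3_noise = torch.randn(pose_GT.shape[0], 6) * noise_std
        pose_noise = camera.lie.se3_to_SE3(se3_noise)      # [N,3,4]
        # compose the noise with the ground-truth poses
        return camera.pose.compose([pose_noise, pose_GT])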

Separately, NeRF could be trained with this dataset successfully.
Please advise me on how I should proceed.

Cheers,
Wonbong

car001.mp4

Screenshot from 2022-04-27 17-50-05
Screenshot from 2022-04-27 17-55-06

Training BARF w/o camera pose

Hi, thank you for releasing the code.
I want to train BARF with photos taken by myself, for which there are no camera poses.
Please advise me on how I should proceed.

Thank you very much.

How to train for the case of a real-life object and a blender-like camera setting

Hello, thanks for sharing!
I'm trying to train BARF with my custom dataset. I don't know the camera intrinsics. The camera is placed at 3 elevations (0, 30, and 60 degrees) on a sphere, the object sits on a turntable, and I took a picture every 15 degrees,
so I have 72 pictures (pitch 0 with yaw 15, 30, 45, ..., 360; pitch 30 with yaw 15, 30, 45, ..., 360; and pitch 60 with yaw 15, 30, 45, ..., 360).
So even though the camera is fixed for each pitch, the setup is similar to a blender-style camera movement.
BARF supports blender and llff, but I failed to produce applicable camera pose information (.json or .npz) in the form the blender or llff dataloader.py needs, so I tried to use the iphone configuration instead.

It seems to train to some extent, but it is still quite far from succeeding after 200,000 steps.

I tried to manually initialize the camera poses on a sphere, but that gives worse results (the sketch below shows roughly what I did).
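
The spherical initialization was roughly the following look-at construction (my own sketch, not code from this repo; the axis convention would still need to be matched to the dataloader):

    import numpy as np

    def look_at_pose(pitch_deg, yaw_deg, radius=4.0):
        # camera-to-world [R|t] for a camera on a sphere looking at the origin
        pitch, yaw = np.deg2rad(pitch_deg), np.deg2rad(yaw_deg)
        center = radius * np.array([np.cos(pitch) * np.sin(yaw),
                                    np.sin(pitch),
                                    np.cos(pitch) * np.cos(yaw)])
        forward = -center / np.linalg.norm(center)           # camera +z looks at the origin
        right = np.cross(np.array([0.0, 1.0, 0.0]), forward)
        right /= np.linalg.norm(right)
        up = np.cross(forward, right)
        R = np.stack([right, up, forward], axis=1)           # columns are the camera axes
        return np.concatenate([R, center[:, None]], axis=1)  # [3,4]

    # 3 pitches x 24 yaw steps = 72 poses, matching the capture described above
    poses = [look_at_pose(p, y) for p in (0, 30, 60) for y in range(15, 361, 15)]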

Please give me some hints on how to solve this.

Thanks!

distorted visualization in visdom

Hi Chenhsuan!

Thank you for sharing the code! I ran into a small issue with the visdom visualization: the camera meshes are sometimes distorted, as shown in the figure below. Have you encountered this problem before? Thank you!
Screen Shot 2021-09-21 at 3 55 36 PM

Weird training behavior of BARF

Hi, @chenhsuanlin

Thanks for your great work. I'm trying BARF on other scenes. However, the training behavior seems weird: as you can see from the image below, the rotation error is decreasing, but the translation error keeps increasing.
image

When I took a look at the synthesized validation image, the result seems shifted by several pixels relative to the original image, and the scale is also inconsistent with the original image.
image

For my experimental setting, I used COLMAP to compute the ground-truth camera poses and intrinsics. The initial camera poses for BARF are not identities; instead, they are the ground-truth poses perturbed by a small pose with noise of 0.15. I wonder if there are any parameters we need to fine-tune?

Part of the scene looks like this:
P1000686

And the reconstructed scene:
image

Result does not seem right

Hello, thanks for your great work!
I ran the BARF code on the chair scene of the blender dataset, but the validation result does not look right on tensorboard. Is this normal?
Train: [screenshot]

Test: [loss screenshot]

how to visualize positional encoding in visdom?

Figure 5 in the paper shows the result of NeRF with naive positional encoding, displaying the GT poses and the refined poses under full positional encoding.

I looked into the NeRF part of the code, but there is nothing relevant there.

def visualize(self,opt,var,step=0,split="train",eps=1e-10):

The visualization function in nerf.py only supports tensorboard and does not include visdom support like barf.py does.

It would be great if you could provide some suggestions about this issue. Thanks a lot!
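
For anyone looking at the same thing, below is a minimal sketch of pushing a rendered image to visdom using the generic visdom API (not the repo's own visualization helpers; the server, port, and window names are placeholders):

    import numpy as np
    import visdom

    vis = visdom.Visdom(server="http://localhost", port=8097, env="nerf-debug")
    image = np.random.rand(3, 200, 200)   # stand-in for a rendered RGB image, [C,H,W] in [0,1]
    vis.image(image, win="rgb_pred", opts=dict(title="predicted RGB"))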

questions about how to run the experiments on my own dataset

Hello Chenhsuan!
Thanks for sharing your code!
I have a question about running your experiments on my own dataset, which was shot on an iPhone.
I watched your video introducing BARF on YouTube. At the end, you show results on real-life sequences such as a living room or a kitchen.
Could you tell me how I can apply your approach to my dataset, so that I can try novel view synthesis by just providing an image folder with no pre-computed camera poses?

Question about test and evaluate

I would like to ask about the difference between test and evaluate. I notice that the "test-optim" mode still performs some refinement, while the "eval" part evaluates the results. If I have trained a model and want to test it and regenerate the figures, should I use the "test-optim" mode or the "eval" mode? From my understanding, I should use "test-optim". I would appreciate it if you could clarify the difference between test and evaluate.

By the way, great work, and thank you in advance!

initial camera parameters

Hi @chenhsuanlin,

Thanks for the great work. I have some initial estimates of camera parameters for some of my own images. I wanted to use the blender dataset format, and I was wondering whether I understand correctly that the transform_matrix in transforms_train.json for each frame is the camera extrinsic in the form of [R | t]?
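
For context, this is how I am reading the file on my side; a minimal sketch assuming the standard transforms_train.json layout of the NeRF synthetic (blender) data:

    import json
    import numpy as np

    with open("transforms_train.json") as f:
        meta = json.load(f)

    for frame in meta["frames"]:
        c2w = np.array(frame["transform_matrix"])   # 4x4 camera-to-world matrix
        R, t = c2w[:3, :3], c2w[:3, 3]              # rotation and camera center
    # meta["camera_angle_x"] is the horizontal field of view used for the intrinsics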

Camera Pose Optimization with Geometry Prior

Hey,

I am trying to optimize the camera poses given a non-textured mesh. Right now the NeRF implementation does not consider this prior and samples random rays. I was wondering how I could provide my mesh as an initial input to BARF? Since I already have the mesh, it should be quick to fix the camera poses!

I've tried using PyTorch3D as a differentiable renderer with BARF's additional parameters for camera pose optimization, but it doesn't work: the cameras drift away and the loss becomes NaN.

train loss converged, but val loss does not

I use nuScenes data, an autonomous-driving dataset that provides camera-to-world transform matrices, to train BARF, and I normalize the translations to lie between 1 and 10 (a sketch of the rescaling is below). I used tensorboard to visualize the training process and found that the train loss converged, but the val loss went up. Do you have any ideas about the reason for this?
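
The kind of rescaling I mean is a single global scale applied to all camera centers; a minimal sketch of my own preprocessing, not part of the repo (near/far bounds have to be scaled consistently):

    import numpy as np

    def rescale_poses(poses, target_radius=10.0):
        # poses: [N,3,4] camera-to-world matrices; apply one uniform scale so the
        # farthest camera center lies at target_radius from the origin
        poses = poses.copy()
        scale = target_radius / np.linalg.norm(poses[:, :3, 3], axis=-1).max()
        poses[:, :3, 3] *= scale
        # remember to multiply the near/far depth bounds by the same scale
        return poses, scale
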
Screenshot from 2022-06-20 18-24-10
Screenshot from 2022-06-20 18-28-35

A question

After I train a BARF model and an image alignment model, I get a .ckpt file. How can I use or test the model?

experiment

Could you please share the code for the "Planar Image Alignment" experiment? Thanks!

Error in magma_getdevice_arch: MAGMA not initialized (call magma_init() first) or bad device

Hi Chen-Hsuan

First of all thank you very much for uploading your work here.

I cloned your repo to try to run the experiments on my local machine (Ubuntu 20.04). I tried to run the chair dataset with BARF using these two commands:

python3 train.py --group=G0 --model=barf --yaml=barf_blender --name=Test1 --data.scene=chair --barf_c2f=[0.1,0.5] --max_iter=2000 --visdom!

python3 evaluate.py --group=G0 --model=barf --yaml=barf_blender --name=Test1 --data.scene=chair --data.val_sub= --resume

But when running evaluate.py I got an error regarding MAGMA. Do you know the cause of this error? How can I fix it?

Screenshot from 2021-10-19 17-44-29

Thanks in advance,

How to use train data in class Graph

Hi, we are experimenting with the BARF code so that we can load another data format.

I want to load and use train_data (for example, data other than the camera poses or images that I added in iphone.py) inside the forward method in nerf.py.
Could you tell me how I can load the data I want?

How to optimize intrinsics as well?

Hi @chenhsuanlin great work!

I'm trying BARF on my custom data, and it shows promising results. However, I'm wondering whether the performance would be better if we optimized the intrinsics as well. Do you feel it's doable? If so, could you please point me to where I should make modifications? Thanks!

Multi-GPU training (DataParallel)

Hi @chenhsuanlin

Thank you for sharing this nice work. I'm just curious whether you happen to have multi-GPU training code at hand. I was trying to train BARF
with multiple GPUs, but got stuck on a weird OOM issue: the GPU memory explodes to over 50 GB, while your original code base takes less than 10 GB on blender/lego.

Here's the edit I made: Ir1d@904228c
The command to run: CUDA_VISIBLE_DEVICES=0,1 python train.py --group=blender --model=barf --yaml=barf_blender --name=lego_baseline --data.scene=lego --gpu=1 --visdom! --batch_size=2
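
For reference, the pattern I am using is essentially plain DataParallel wrapping; a self-contained toy sketch (illustrative only, not my actual edit):

    import torch

    class TinyNeRFLike(torch.nn.Module):
        def __init__(self):
            super().__init__()
            self.net = torch.nn.Sequential(torch.nn.Linear(3, 256), torch.nn.ReLU(),
                                           torch.nn.Linear(256, 4))
        def forward(self, points):            # points: [B, num_rays, 3]
            return self.net(points)

    # DataParallel splits the batch dimension across the listed GPUs...
    model = torch.nn.DataParallel(TinyNeRFLike().cuda(), device_ids=[0, 1])
    out = model(torch.rand(2, 4096, 3).cuda())
    # ...but gathers every tensor returned by forward() back onto GPU 0, so keeping
    # large per-ray intermediates in the returned outputs inflates memory on GPU 0.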

Do you know what might be leading to the OOM here?
Thank you!

AttributeError: 'Model' object has no attribute 'train_data'

When I run extract_mesh.py, it seems that no data is loaded. What could be the problem?
image
The command-line arguments I run with are as follows:
“python extract_mesh.py --group=G1 --model=barf --yaml=barf_blender --name=First --data.scene=chair --data.val_sub= --resume”

When I try BARF on my own sequence, some error(s) occur at test time while loading state_dict for NeRF

Hi Chenhsuan!

Thank you for sharing the code! When I try BARF on my own sequence,

Train script: python train.py --group='barf' --model=barf --yaml=barf_iphone --name='iphone' --data.scene='cats' --arch.posenc!
Test script: python evaluate.py --group='barf' --model=barf --yaml=barf_iphone --name='iphone' --data.scene='cats' --resume

some errors occurred at test time while loading the state_dict for NeRF. Have you encountered this problem before? Thank you!

Errors:
restoring nerf...
Traceback (most recent call last):
  File "evaluate.py", line 34, in <module>
    main()
  File "evaluate.py", line 28, in main
    m.restore_checkpoint(opt)
  File "/data1/lvxinbi/barf/model/base.py", line 53, in restore_checkpoint
    epoch_start,iter_start = util.restore_checkpoint(opt,self,resume=opt.resume)
  File "/data1/lvxinbi/barf/util.py", line 131, in restore_checkpoint
    child.load_state_dict(child_state_dict)
  File "/home/hotel_ai/anaconda3/envs/torch1.7/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1407, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for NeRF:
    size mismatch for mlp_feat.0.weight: copying a param with shape torch.Size([256, 3]) from checkpoint, the shape in current model is torch.Size([256, 63]).
    size mismatch for mlp_feat.4.weight: copying a param with shape torch.Size([256, 259]) from checkpoint, the shape in current model is torch.Size([256, 319]).
    size mismatch for mlp_rgb.0.weight: copying a param with shape torch.Size([128, 259]) from checkpoint, the shape in current model is torch.Size([128, 283]).

Unreasonable training result with nerf_blender_repr.yaml

Hi,
I wish to replicate the results from the original NeRF paper, but when I train with nerf_blender_repr.yaml, I only get PSNR 26.38 on the lego scene. This is lower than with nerf_blender.yaml, which gets PSNR 29.19.
Is there a mistake in nerf_blender_repr.yaml? I hope for your response.

graph.pose_noise not saved for models with blender dataset?

I hope this is not too dumb of a question, but I'm having trouble understanding how evaluate.py can correctly derive the mean rotation and translation errors from a model checkpoint file if the camera noise is not saved as part of the checkpoint?

As I understand it, self.graph.se3_refine is learning pose corrections from the identity (for non-blender models) or from randomly-initialized noise (for blender models). In the blender case, when evaluating a trained model, wouldn't the camera noise be different between training and evaluation?
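
One pattern that would explain this without saving the noise is drawing it under a fixed seed at start-up, so the same perturbations are re-created at evaluation time; a toy sketch of that idea (my own guess, not necessarily the repo's actual mechanism):

    import torch

    def synthetic_pose_noise(num_cameras, noise_std=0.15, seed=0):
        # the same seed reproduces the same se(3) noise in a later evaluation run,
        # so nothing needs to be checkpointed
        gen = torch.Generator().manual_seed(seed)
        return torch.randn(num_cameras, 6, generator=gen) * noise_std

    assert torch.equal(synthetic_pose_noise(100), synthetic_pose_noise(100))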

Thanks for your help!

Out of memory problem

Hi! Thank you for sharing your fantastic work!
When I try to train the network on a 3090 in BARF mode with positional encoding on the blender datasets, it always shows the following after validation:
RuntimeError: CUDA out of memory. Tried to allocate 5.25 GiB (GPU 0; 23.70 GiB total capacity; 15.75 GiB already allocated; 4.71 GiB free; 15.77 GiB reserved in total by PyTorch)
Could you please tell me which parameter I should change to avoid this problem? Thank you very much!
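
For reference, the usual workaround I know of is to render validation images in smaller ray chunks rather than all rays at once; a generic sketch (the corresponding option or function name in this repo may differ):

    import torch

    def render_image_in_chunks(render_rays, rays, chunk=4096):
        # render_rays: a callable mapping [M,6] rays to [M,3] RGB (hypothetical name);
        # rendering the validation image in chunks bounds the peak GPU memory
        outputs = []
        with torch.no_grad():                 # validation only, no graph is kept
            for i in range(0, rays.shape[0], chunk):
                outputs.append(render_rays(rays[i:i + chunk]))
        return torch.cat(outputs, dim=0)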

implement on registration

Dear Chen-Hsuan Lin, thanks for sharing your great work!
I am having some difficulty understanding the main idea of the "camera pose registration" part of your article. Am I right to understand that the initial camera poses are optimized through backpropagation? And where can I find this part in your codebase? (A toy sketch of what I have in mind is below.)

(You can reply in Chinese if that is more comfortable; I just guessed from your name but I'm not sure. If not, please ignore this line.)
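
To make the question concrete, here is the kind of joint optimization I have in mind: a learnable pose correction per camera updated by backpropagation together with the network weights (a toy 2D analogue of my own, not this repo's code; I suspect graph.se3_refine plays this role in the repo):

    import torch

    num_cams = 5
    net = torch.nn.Sequential(torch.nn.Linear(2, 16), torch.nn.ReLU(), torch.nn.Linear(16, 3))
    pose_refine = torch.nn.Parameter(torch.zeros(num_cams, 2))   # stand-in for an se(3) correction

    optim = torch.optim.Adam([{"params": net.parameters(), "lr": 1e-3},
                              {"params": [pose_refine], "lr": 1e-3}])
    points = torch.rand(num_cams, 64, 2)
    target = torch.rand(num_cams, 64, 3)
    for _ in range(100):
        pred = net(points + pose_refine[:, None, :])   # pose correction applied to the inputs
        loss = (pred - target).pow(2).mean()
        optim.zero_grad(); loss.backward(); optim.step()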

LLFF Fern produces zoomed-in views

Thank you for sharing the code!

We are trying to reproduce the paper results and ran into an issue where the LLFF Fern scene seems to produce zoomed-in views. Is this expected, or should we change something in our procedure?

Thanks!

Output:
image (1)

Reference:
image (2)

Misaligned axes when converting LLFF data format to BARF coordinate frame

In data/llff.py, in the parse_cameras_and_bounds method, we convert poses_bounds.npy and ingest it to use the given camera poses. Per LLFF's specification, it seems that the convention of this dataset has the transformation matrix in the axes [down, right, backward] (i.e. positive x is down, positive y is right, positive z is backward).

Per line 49 in the aforementioned file, it seems we are swapping these axes to switch to a new convention:

poses_raw[...,0],poses_raw[...,1] = poses_raw[...,1],-poses_raw[...,0]

moving from [down, right, backward] to [right, up, backward] (i.e. positive x is right, positive y is up, positive z is backward).

However, the translation vector does not seem to receive the same modification. Is this behavior intended? It seems inconsistent with the rest of the change. (A small numerical illustration of the line in question is below.)
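
For concreteness, a small numerical toy of mine showing exactly what that line does to a 3x4 camera-to-world matrix (an illustration of the question, not an answer):

    import numpy as np

    # a toy 3x4 camera-to-world pose: identity rotation, translation (1, 2, 3)
    c2w = np.concatenate([np.eye(3), np.array([[1.0], [2.0], [3.0]])], axis=1)

    swapped = c2w.copy()
    # the swap from data/llff.py, applied over the last (column) axis
    swapped[..., 0], swapped[..., 1] = c2w[..., 1], -c2w[..., 0]
    # columns 0 and 1 of the rotation are exchanged and negated, while column 3
    # (the translation) is left untouched by this line
    print(swapped)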

running the code in docker and hitting a CUDA error

Thank you for your code. I tried to run your code in a docker environment. I disabled the visdom visualization and ran the code with the following command:
CUDA_VISIBLE_DEVICES=1 python3 train.py --group=llff --model=barf --yaml=barf_llff --name=orchids --data.scene=orchids --barf_c2f=[0.1,0.5] --visdom!

Then I ran into the following problem:
image
Could you please help me? What did I do wrong?

Training is very sensitive to network initialization

Hello,
Thank you for the great work! I really like the neat project structure so I am experimenting with NeRF on this codebase. However, I find the training is very sensitive to network initialization.

First, if I train on the hotdog scene for 2000 iterations using the default configs in options/nerf_blender_repr.yaml, I get a normal render from self.nerf, but an empty render from self.nerf_fine.

The command is

python train.py --group=nerf-debug --model=nerf --yaml=nerf_blender_repr --name=debug-hotdog-nerf-9 --data.scene=hotdog

The result shown in tensorboard is
image

After some investigation, I found that the reason lay in the different initialization weights of the two networks. I then tried changing the random seed from 0 to 233; this time both networks rendered an empty scene.

The command is

python train.py --group=nerf-debug --model=nerf --yaml=nerf_blender_repr --name=debug-hotdog-nerf-10 --data.scene=hotdog --seed=233

The result is
image

Finally, I tried copying self.nerf's weights to self.nerf_fine using the following code

    if opt.nerf.fine_sampling:
        self.nerf_fine = NeRF(opt)
        self.nerf_fine.load_state_dict(self.nerf.state_dict())  # copy the coarse network's weights here

and set the random seed back to 0. This time the result was fine.

The command is

python train.py --group=nerf-debug --model=nerf --yaml=nerf_blender_repr --name=debug-hotdog-nerf-11 --data.scene=hotdog

The result is
image

I am posting these results here to discuss the general stability of NeRF training. I wonder whether other NeRF repositories have this kind of sensitivity to network initialization, or whether we are missing some initialization trick in this repo?

I believe this may relate to a more fundamental property of NeRF. Do you have any ideas about this phenomenon? Thank you!

Extract mesh or rendering on Custom Data?

Hi, thanks for sharing such a good work!

I have a question regarding custom data. I use custom data to train my network, but I found that there is no code to render images or extract a mesh from the trained network.

Would you mind giving an introduction on how to represent custom data? Thank you.

Any help will be greatly appreciated!

Multiprocessing issue on Windows

Hi Chen-Hsuan,

There is an issue with multiprocessing on Windows (see the screenshot below). Wrapping the code in train.py and evaluate.py in a main() function resolves it. I made a PR that fixes this issue: #11

Let me know if you think you can merge it into main :)

Kind regards,
Stan
issue_mp

Image from validation cameras 'moves away'

Hi Chen-Hsuan,

Thanks for the great work.
Every time the validation code is run, the resulting renders seem to move away from the camera (see images).
MicrosoftTeams-image (3)
MicrosoftTeams-image (4)
MicrosoftTeams-image (5)
MicrosoftTeams-image (6)

The underlying reason is that the arguments passed to preprocess_camera.py are passed by reference, so it modifies the intrinsics of the underlying camera. This happens every time the validation code is run, so the intrinsics keep changing and the validation images become smaller and smaller. To avoid this, I have created a PR (#12) that detaches and clones the pose before preprocessing, thereby passing a copy of the intrinsics to be modified.
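
Here is a self-contained toy example of the aliasing effect I mean (not the repo's actual function, just an illustration of in-place edits on a tensor passed by reference):

    import torch

    def preprocess(intr, scale=0.5):
        intr[:2] *= scale        # in-place edit mutates the caller's tensor
        return intr

    intr = torch.tensor([[100.0, 0.0, 50.0],
                         [0.0, 100.0, 50.0],
                         [0.0, 0.0, 1.0]])
    preprocess(intr)                     # the caller's intrinsics shrink on every call
    preprocess(intr.detach().clone())    # a detached copy leaves the caller's intrinsics untouched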

Let me know if we can pull this into main - it avoids the objects 'moving away', making the validation images easier to analyse.

Kind regards,
Stan

Documentation on the LLFF data scene?

Hi Chen-Hsuan,

Do you have any documentation on how to set up an experiment with another scene? I did not find anything in the README or in the paper.

I'm referring to these folders and files:
Data_LLFF

Thanks in advance,

Cannot reproduce results on llff:fern

Thanks for your sharing of codes.

I followed all your tips in the README and tried to reproduce the results from your paper.
However, the rotation error in my run is much higher than the one reported in the paper.

llff:fern     rotation    translation (x100)
paper         0.191       0.192
reproduce     0.689       0.193

Could you give some advice?
The depth map and the rendered RGB do look good, though.
image

Thanks!

Question about camera pose transformation for LLFF

Hi, Chen-Hsuan Lin.
Thank you for sharing the great work!

I have been reading the code, and I do not fully understand the camera pose transformation applied when the __getitem__ method is called for the LLFF dataset:
https://github.com/chenhsuanlin/bundle-adjusting-NeRF/blob/main/data/llff.py#L104

In my understanding, the camera poses returned by parse_cameras_and_bounds are camera-to-world matrices in a [right, up, backward] coordinate convention.
https://github.com/chenhsuanlin/bundle-adjusting-NeRF/blob/main/data/llff.py#L42

The camera pose is then transformed by parse_raw_camera when __getitem__ is called, but I could not follow what this transformation does:
https://github.com/chenhsuanlin/bundle-adjusting-NeRF/blob/main/data/llff.py#L104
Could you please let me know?

scale calculation of sim3

def procrustes_analysis(X0,X1): # [N,3]
    # translation
    t0 = X0.mean(dim=0,keepdim=True)
    t1 = X1.mean(dim=0,keepdim=True)
    X0c = X0-t0
    X1c = X1-t1
    # scale
    s0 = (X0c**2).sum(dim=-1).mean().sqrt()
    s1 = (X1c**2).sum(dim=-1).mean().sqrt()
    X0cs = X0c/s0
    X1cs = X1c/s1
    # rotation (use double for SVD, float loses precision)
    U,S,V = (X0cs.t()@X1cs).double().svd(some=True)
    R = (U@V.t()).float()
    if R.det()<0: R[2] *= -1
    # align X1 to X0: X1to0 = (X1-t1)/s1@R.t()*s0+t0
    sim3 = edict(t0=t0[0],t1=t1[0],s0=s0,s1=s1,R=R)
    return sim3

It's lines 278-295 of camera.py.
@chenhsuanlin 
Hello, I am wondering why we take the average instead of the square when calculating s0 and s1. The s here represents the scale of the scene, and it seems more reasonable to me to square first and then average.
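
For context, a minimal usage sketch of the function as I read it (assuming torch and from easydict import EasyDict as edict are imported as in camera.py); s0 and s1 come out as the root-mean-square distance of the centered points from their centroid:

    import torch

    X0 = torch.rand(100, 3)
    X1 = X0 * 2.0 + torch.tensor([1.0, 0.0, 0.0])   # X1 is X0 uniformly scaled and translated
    sim3 = procrustes_analysis(X0, X1)
    print(sim3.s1 / sim3.s0)                        # recovered scale ratio, close to 2.0
    # align X1 back to X0 following the comment in the function
    X1to0 = (X1 - sim3.t1) / sim3.s1 @ sim3.R.t() * sim3.s0 + sim3.t0
    print((X1to0 - X0).abs().max())                 # close to zero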

rotation and translation losses

Hi @zhenpeiyang, and thanks for your interesting paper and code!
A detail that is puzzling me is that when you start with pose_GT=pose, you still get a non-zero rotation loss, and it decreases during training!

Matthieu.

How to train with my own images?

Thanks for the code! That's really nice work!
But there seems to be little description of how to use my own images for training. Could you tell me more about it?
