zhengqili / neural-scene-flow-fields
PyTorch implementation of the paper "Neural Scene Flow Fields for Space-Time View Synthesis of Dynamic Scenes"
License: MIT License
I get the following error while loading the pre-trained ResNet when I run run_midas.py
on a remote server. On my local machine it worked (with Python 3.9.12).
On the server I use Python 3.7.4, but I also tried Python 3.8.5 with the same result.
Traceback (most recent call last):
  File "run_midas.py", line 267, in <module>
    args.resize_height)
  File "run_midas.py", line 158, in run
    model = MidasNet(model_path, non_negative=True)
  File "/cluster/project/infk/courses/252-0579-00L/group34_nerf/CloudNeRF/other_papers/Neural-Scene-Flow-Fields/nsff_scripts/models/midas_net.py", line 30, in __init__
    self.pretrained, self.scratch = _make_encoder(features, use_pretrained)
  File "/cluster/project/infk/courses/252-0579-00L/group34_nerf/CloudNeRF/other_papers/Neural-Scene-Flow-Fields/nsff_scripts/models/blocks.py", line 6, in _make_encoder
    pretrained = _make_pretrained_resnext101_wsl(use_pretrained)
  File "/cluster/project/infk/courses/252-0579-00L/group34_nerf/CloudNeRF/other_papers/Neural-Scene-Flow-Fields/nsff_scripts/models/blocks.py", line 26, in _make_pretrained_resnext101_wsl
    resnet = torch.hub.load("facebookresearch/WSL-Images", "resnext101_32x8d_wsl")
  File "/cluster/project/infk/courses/252-0579-00L/group34_nerf/CloudNeRF/other_papers/Neural-Scene-Flow-Fields/nsff_venv/lib64/python3.7/site-packages/torch/hub.py", line 403, in load
    repo_or_dir = _get_cache_or_reload(repo_or_dir, force_reload, verbose, skip_validation)
  File "/cluster/project/infk/courses/252-0579-00L/group34_nerf/CloudNeRF/other_papers/Neural-Scene-Flow-Fields/nsff_venv/lib64/python3.7/site-packages/torch/hub.py", line 170, in _get_cache_or_reload
    repo_owner, repo_name, branch = _parse_repo_info(github)
  File "/cluster/project/infk/courses/252-0579-00L/group34_nerf/CloudNeRF/other_papers/Neural-Scene-Flow-Fields/nsff_venv/lib64/python3.7/site-packages/torch/hub.py", line 124, in _parse_repo_info
    with urlopen(f"https://github.com/{repo_owner}/{repo_name}/tree/main/"):
  File "/cluster/apps/nss/python/3.7.4/x86_64/lib64/python3.7/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "/cluster/apps/nss/python/3.7.4/x86_64/lib64/python3.7/urllib/request.py", line 525, in open
    response = self._open(req, data)
  File "/cluster/apps/nss/python/3.7.4/x86_64/lib64/python3.7/urllib/request.py", line 543, in _open
    '_open', req)
  File "/cluster/apps/nss/python/3.7.4/x86_64/lib64/python3.7/urllib/request.py", line 503, in _call_chain
    result = func(*args)
  File "/cluster/apps/nss/python/3.7.4/x86_64/lib64/python3.7/urllib/request.py", line 1360, in https_open
    context=self._context, check_hostname=self._check_hostname)
  File "/cluster/apps/nss/python/3.7.4/x86_64/lib64/python3.7/urllib/request.py", line 1319, in do_open
    raise URLError(err)
urllib.error.URLError: <urlopen error [Errno 111] Connection refused>
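The traceback shows that torch.hub contacts github.com to resolve the repo's default branch even when the repo is already cached, so on a cluster without internet access the load fails. One possible workaround (a sketch, not tested on this cluster; the paths and the checkpoint URL are assumptions to verify against your torch version) is to fetch everything on a machine with internet access and then load from a local clone:

```shell
# On a machine WITH internet access: clone the hub repo and pre-download the weights.
git clone https://github.com/facebookresearch/WSL-Images ~/WSL-Images
mkdir -p ~/.cache/torch/hub/checkpoints
wget -P ~/.cache/torch/hub/checkpoints \
    https://download.pytorch.org/models/ig_resnext101_32x8-c38310e5.pth
# Copy ~/WSL-Images and ~/.cache/torch to the cluster, then in
# nsff_scripts/models/blocks.py load from the local clone so no request is made:
#   resnet = torch.hub.load("/path/to/WSL-Images", "resnext101_32x8d_wsl", source="local")
```

The `source="local"` argument exists in recent PyTorch releases (1.7+); on older versions, loading the cached repo without it may still trigger the network check seen in the traceback.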
Hello,
first of all, thank you for releasing the implementation for your amazing project.
My question is: how does one adapt NSFF to support reconstruction in Euclidean space, thereby extending it to non-forward-facing scenes?
In other words, which parts of the codebase would I need to modify to run on such scenes? I'm guessing that just setting the "no_ndc" flag to "True" in the config file wouldn't be enough.
Hi,
How many training images do you use on each time instance?
Hi,
I just want to quickly inquire about the MiDaS depth prediction. The original MiDaS approach seems to predict disparity (inverse depth) rather than Euclidean depth: isl-org/MiDaS#42.
However, as far as I can tell, the rendered depth map is based on z-distance, not inverse depth. So is your MiDaS model pretrained on depth? Thanks.
Hello,
Do you have any suggestions for making the training faster given GPUs with more memory?
I'm working with 2 A6000s and would like to fully leverage the memory capacity.
I trained on a new dataset (the full 500,000 iterations) but had to reduce netwidth from 256 to 128 to fit on a 16 GB GPU. See the attached video; the result is poor. The scene is obviously complex, with forward motion into moving windmills. Do you have any advice? Is the 128-wide MLP the cause of the poor results, or is the dataset too complex for the approach?
thanks
Here the network only takes points and view directions as input and predicts the forward and backward flow to neighboring frames. But in Eq. (4) of the paper, the network additionally takes the time i. Is i not useful in practice?
If that's the case, I'm wondering how the network can distinguish different frames when some points and view directions happen to coincide across frames.
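For reference, if one did want to condition the flow MLP on the frame index, the usual NeRF-style trick is to append a positional encoding of a normalized time value to the input features. A minimal sketch (the function names and frequency counts here are illustrative, not the repo's actual values):

```python
import numpy as np

def posenc(x, num_freqs):
    # Standard NeRF-style positional encoding: [sin(2^k x), cos(2^k x)] per dim.
    freqs = 2.0 ** np.arange(num_freqs)
    ang = x[..., None] * freqs  # (..., dims, num_freqs)
    enc = np.concatenate([np.sin(ang), np.cos(ang)], axis=-1)
    return enc.reshape(*x.shape[:-1], -1)

# Hypothetical conditioning: concatenate an encoded, normalized frame index t
# to the encoded 3D point so identical (point, direction) pairs at different
# times map to different network inputs.
xyz = np.random.rand(4, 3)          # 4 sample points
t = np.full((4, 1), 5 / 30.0)       # frame 5 of a 30-frame sequence
net_input = np.concatenate([posenc(xyz, 10), posenc(t, 4)], axis=-1)
```

With 10 frequencies over 3 dims plus 4 frequencies over the scalar time, each input row has 3·20 + 1·8 = 68 features.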
#local run
colmap feature_extractor \
--database_path ./database.db --image_path ./dense/images/
colmap exhaustive_matcher \
--database_path ./database.db
colmap mapper \
--database_path ./database.db \
--image_path ./dense/images \
--output_path ./dense/sparse
colmap image_undistorter \
--image_path ./dense/images \
--input_path ./dense/sparse/0 \
--output_path ./dense \
--output_type COLMAP \
--max_image_size 2000
#colab run
%cd /content/drive/MyDrive/neural-net
!git clone https://github.com/zl548/Neural-Scene-Flow-Fields
%cd Neural-Scene-Flow-Fields
!pip install configargparse
!pip install matplotlib
!pip install opencv-python
!pip install scikit-image
!pip install scipy
!pip install cupy
!pip install imageio
!pip install tqdm
!pip install kornia
My images are 288x512 pixels.
%cd /content/drive/MyDrive/neural-net/Neural-Scene-Flow-Fields/nsff_scripts/
# create camera intrinsics/extrinsics in the NSFF format, same as original NeRF, which uses the imgs2poses.py script from the LLFF code: https://github.com/Fyusion/LLFF/blob/master/imgs2poses.py
!python save_poses_nerf.py --data_path "/content/drive/MyDrive/neural-net/Neural-Scene-Flow-Fields/nerf_data/bolli/dense"
# Resize input images and run the single-view depth model.
# resize_height: resized image height for model training; width is derived from the original aspect ratio
!python run_midas.py --data_path "/content/drive/MyDrive/neural-net/Neural-Scene-Flow-Fields/nerf_data/bolli/dense" --resize_height 512
!bash ./download_models.sh
# Run optical flow model
!python run_flows_video.py --model models/raft-things.pth --data_path /content/drive/MyDrive/neural-net/Neural-Scene-Flow-Fields/nerf_data/bolli/dense
Error:
Traceback (most recent call last):
File "run_flows_video.py", line 448, in <module>
run_optical_flows(args)
File "run_flows_video.py", line 350, in run_optical_flows
images = load_image_list(images)
File "run_flows_video.py", line 257, in load_image_list
images = torch.stack(images, dim=0)
RuntimeError: stack expects each tensor to be equal size, but got [3, 512, 288] at entry 0 and [3, 512, 287] at entry 31
So input_w is not consistent, even though my images all have dimensions 288x512.
Even if I modify the script:
def load_image(imfile):
    long_dim = 512
    img = np.array(Image.open(imfile)).astype(np.uint8)
    # Portrait orientation
    if img.shape[0] > img.shape[1]:
        input_h = long_dim
        input_w = 288
The dimension error is gone, but another error appears:
...
flow input w 288 h 512
0
/usr/local/lib/python3.7/dist-packages/torch/functional.py:445: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:2157.)
return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
Traceback (most recent call last):
File "run_flows_video.py", line 448, in <module>
run_optical_flows(args)
File "run_flows_video.py", line 363, in run_optical_flows
(img_train.shape[1], img_train.shape[0]),
AttributeError: 'NoneType' object has no attribute 'shape'
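On the width-inconsistency crash: it happens when the short side is derived per-image by floating-point scaling, so rounding can yield 287 for one frame and 288 for another. A defensive fix (a sketch with assumed names, not the repo's actual code) is to compute both sides once from the aspect ratio and snap them to a common multiple; RAFT pads inputs to multiples of 8, so snapping to 8 also avoids padding surprises:

```python
def resize_dims(h, w, long_dim=512, multiple=8):
    # Scale so the long side equals long_dim, then snap both sides to a common
    # multiple so every frame in the sequence gets identical dimensions.
    scale = long_dim / max(h, w)
    new_h, new_w = round(h * scale), round(w * scale)
    snap = lambda v: max(multiple, int(round(v / multiple)) * multiple)
    return snap(new_h), snap(new_w)

# A 511x287 frame in a nominally 512x288 sequence now resizes consistently:
dims = resize_dims(511, 287)  # -> (512, 288)
```

The follow-up `'NoneType' object has no attribute 'shape'` error usually means an image failed to load (e.g. a wrong path after resizing), which is worth checking separately from the dimension fix.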
Hi, I am wondering if there is a standard implementation for SSIM and LPIPS.
For SSIM: I see you use the scikit-image implementation. When I use kornia's implementation with window size 11 (I don't know what size scikit-image uses when it's not set), it seems to yield different results... Do you have an idea of what other authors use?
For LPIPS: you pass normalize=True (Neural-Scene-Flow-Fields/nsff_exp/models/__init__.py, lines 26 to 34 at 62a1770).
This makes me think that either there is no common standard for these metrics, so they differ from one implementation to another, or authors sometimes make mistakes in the evaluation process; in that case only the PSNR score is credible...
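For what it's worth, scikit-image's structural_similarity defaults to a 7x7 window when gaussian_weights is off, and the data_range argument also shifts the score, so numbers from different libraries are not directly comparable. A toy global-statistics SSIM (no sliding window at all; purely to illustrate how such definitional choices move the number, not any library's implementation):

```python
import numpy as np

def ssim_global(x, y, data_range=1.0):
    # SSIM computed from whole-image statistics instead of local windows.
    # Window size, weighting, and data_range all change the final score,
    # which is why skimage and kornia disagree out of the box.
    c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx**2 + my**2 + c1) * (vx + vy + c2))

rng = np.random.default_rng(0)
img = rng.random((32, 32))
score = ssim_global(img, img)  # identical images score 1.0
```

Reporting the exact library, window size, and data_range alongside SSIM numbers is the only way to make them reproducible.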
Hi,
I found that evaluation.py actually uses the training images. Could you please share how to get the exact numbers in Table 3 of your paper? More specifically, how do I know which are the remaining 11 held-out images per time instance for evaluation?
Hi, I was wondering why

sf_sm_loss += args.w_sm * compute_sf_lke_loss(ret['raw_pts_ref'],
                                              ret['raw_pts_post'],
                                              ret['raw_pts_prev'],
                                              H, W, focal)

is called twice, at Neural-Scene-Flow-Fields/nsff_exp/run_nerf.py line 589 (d400175) and line 593 (86ad6dd)?
Should compute_sf_lke_loss
compute the
Thank you!
Hi, thank you for sharing this amazing work!
I would like to try a personal video using your model and wonder if you can share more detailed instructions on how to use COLMAP to get the training data. For instance, do you use only sparse reconstruction to get the data? If so, what do you set for the data type and shared intrinsics options? Furthermore, how do you deal with merging iterations of sparse reconstruction?
Thanks
Hi,
I've been trying to work with your code (very cool project, by the way). Is there any plan to make the instructions crystal clear on how to use your code to infer scenes from one's own video or images? Is there any way you could, for example, create a Google Colab for it? I've given it a shot and couldn't complete COLMAP's installation.
I'm really looking forward to being able to experiment with your project!
Thanks in advance
Hi, I want to reproduce the original quality.
Is running the kids config in the configs folder the same as the original?
How many iterations should I train:
2,000,000 or 360,000?
Hi team, amazing paper!
I am trying to adapt your model to regularise the volume-rendered scene flow against a monocular scene flow estimator from a different paper. The third-party scene flow estimator produces results in world coordinates (not normalised) so for me to compare against this model's scene flow, which is in NDC space, I need to transform one coordinate system to the other. This has me confused by some of the code you use to do coordinate system conversions.
For instance, converting (o) to NDC space (o'): Neural-Scene-Flow-Fields/nsff_exp/run_nerf_helpers.py, lines 534 to 541 and lines 546 to 563 (5620494).
se3_transform_points converts from world space to camera space, is that correct? Generally, it would be very helpful to me if you could point me to where you obtained the operations for perspective_projection, se3_transform_points, and NDC2Euclidean.
My graphics knowledge is limited so apologies if these questions are trivial. Your help is greatly appreciated :)
Hi all. I'm trying to run this method on custom data, with mixed success so far. I was wondering if you had any insight about what might be happening. I'm attaching some videos and images to facilitate discussion.
Even though the data driven monocular depth / flow losses are phased out during training, I wonder if monocular depth is perhaps causing these issues? Although again the monocular depth and flow both look reasonable.
Let me know if you have any insights about what might be going on, and how we can improve quality here -- I'm a bit stumped at the moment. I'm also happy to send you the short dataset / video sequence that we're using if you'd like to take a look.
All the best,
~Ben
Hi! I was wondering about the method for novel time synthesis (average splatting). Is this a mathematical transformation or a method using a pretrained model?
NDC2Euclidean appears to be attempting to prevent a divide-by-zero error by the addition of an epsilon value:
def NDC2Euclidean(xyz_ndc, H, W, f):
    z_e = 2. / (xyz_ndc[..., 2:3] - 1. + 1e-6)
    x_e = -xyz_ndc[..., 0:1] * z_e * W / (2. * f)
    y_e = -xyz_ndc[..., 1:2] * z_e * H / (2. * f)
    xyz_e = torch.cat([x_e, y_e, z_e], -1)
    return xyz_e
However, since the coordinates have scene flow field vectors added to them, and the scene flow field output ranges over (-1.0, 1.0), it is possible for xyz_ndc to land significantly outside the normal range. This means that a divide-by-zero can still happen in the above code if the z value hits (1.0 - 1e-6), which it does in our training.
We suggest clamping to valid NDC values to the range (-1.0, 0.99), with 0.99 chosen to prevent the Euclidean far plane from getting too large. This choice of clamping has significantly stabilized our training in early iterations:
z_e = 2./ (torch.clamp(xyz_ndc[..., 2:3], -1.0, 0.99) - 1.0)
https://github.com/zhengqili/Neural-Scene-Flow-Fields/blob/main/nsff_exp/run_nerf_helpers.py#L535
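The effect of the suggested clamp can be checked numerically. A NumPy re-implementation of the function above (a sketch for illustration, not the repo's torch code) shows that a point pushed to the NDC far plane stays finite:

```python
import numpy as np

def ndc2euclidean_clamped(xyz_ndc, H, W, f, z_max=0.99):
    # Clamp z to (-1, z_max] BEFORE the division so z_e stays finite even when
    # scene-flow offsets push points to or past the NDC far plane at z = 1.
    z_ndc = np.clip(xyz_ndc[..., 2:3], -1.0, z_max)
    z_e = 2.0 / (z_ndc - 1.0)
    x_e = -xyz_ndc[..., 0:1] * z_e * W / (2.0 * f)
    y_e = -xyz_ndc[..., 1:2] * z_e * H / (2.0 * f)
    return np.concatenate([x_e, y_e, z_e], axis=-1)

# The exact failure case from training: z = 1.0 + 1e-6 cancels the epsilon in
# the original code; with the clamp, z_e is capped at 2 / (0.99 - 1) = -200.
pt = np.array([[0.1, 0.2, 1.0 + 1e-6]])
out = ndc2euclidean_clamped(pt, 512, 288, 500.0)
```

The 0.99 cap bounds the Euclidean depth at |z_e| = 200, which is the stabilizing effect described above.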
Hi, I have originally mentioned this issue in #1, but it seems to deviate from the original question, so I decided to open a new issue.
As discussed in #1, I tried setting raw_blend_w to either 0 or 1 to create "static only" and "dynamic only" images that theoretically would look like Fig. 5 in the paper and in the video. However, this approach seems to be wrong: from the result, the static part looks ok-ish, but the dynamic part contains almost everything, which is not good at all (we want only the moving part, e.g. only the running kid).
I have been testing this for a week while waiting for a response, but still to no avail. @zhengqili @sniklaus @snavely @owang Sorry for bothering, but could any of the authors kindly clarify what's wrong with my approach of separating static/dynamic by setting the blending weight to either 0 or 1? I also tried blending the sigmas (opacity in the code) instead of the alphas as in the paper, or directly using rgb_map_ref_dy as the output image, but neither helped.
See Neural-Scene-Flow-Fields/nsff_exp/render_utils.py, lines 804 to 809 (7d8a336).
I have applied the above approach to other pretrained scenes, but none of them produces good results.
Left: static (raw_blend_w=0). Right: dynamic (raw_blend_w=1).
Left: static (raw_blend_w=0). Right: dynamic (raw_blend_w=1).
I believe there's something wrong with my approach, but I cannot figure out what. I would really appreciate it if the authors could kindly point out the correct approach. Thank you very much.
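Not an authoritative answer, but one thing worth checking: in volume rendering, forcing the blend weight to 0/1 at the blending step is not the same as removing the other branch, because any sample with nonzero density still attenuates transmittance for everything behind it. A minimal single-ray compositing sketch (standard NeRF math, not the repo's exact code) makes this concrete:

```python
import numpy as np

def composite(rgb, sigma, dists):
    # Standard NeRF volume rendering along one ray.
    alpha = 1.0 - np.exp(-sigma * dists)
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))
    weights = alpha * trans
    return (weights[:, None] * rgb).sum(axis=0)

# Two samples on a ray: an opaque red "static" sample in front of a green
# "dynamic" one. Rendering as-is shows only the front sample; zeroing the
# front sample's SIGMA (not merely its color contribution) reveals the back.
rgb = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
sigma = np.array([50.0, 50.0])
dists = np.array([1.0, 1.0])
front_only = composite(rgb, sigma, dists)                      # ~red
revealed = composite(rgb, sigma * np.array([0.0, 1.0]), dists)  # ~green
```

So if the goal is a clean "dynamic only" image, zeroing the static branch's density before compositing may behave differently from forcing raw_blend_w to 1 after the fact.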
I really appreciate your great work!!
And I have a question.
I think there are no multi-view images for evaluation in the broom and curls scene datasets.
Could you let me know how to evaluate them?
I guess the only way is to create a validation set from the original Nerfies dataset, or to load the original one using their data format.
Am I right, or are there other ways?
If you give me the answer, it would be really helpful!! :)
thank you!
I would like to replicate the CVD experiment you have on the project page.
I ran the example training on an RTX 3090 and it took 41 hours to complete. Are there any recommended settings for maximum quality on a 24 GB card, other than trial and error, if training time isn't a concern? I noticed that when I run the example, the quality is much lower than the GIFs on your project page. Are you using more samples in your configuration?
Also, can this generate depth-map output, or am I misunderstanding what the video was describing? It looked like you had a per-frame depth visualization. Is extracting that information easily supported in the released implementation?
I wanted to use the data provided in nvidia_data_full.zip for evaluating my own methods. While doing so, I noticed that when I load the poses directly with np.load, the array has shape (24, 17). This leads me to believe that these poses (probably) correspond only to the images directory, which would mean no poses are available for the images in the mv_images directory, since 24 corresponds to the number of time steps but not to the number of cameras, which would be 12.
I am not sure what this implies for the provided evaluation script, which loads the images from the mv_images directory but relies on the poses output by the load_nvidia_data function. In the evaluation script, the camera ids are used to index the first dimension of the poses array, together with the start_frame and end_frame parameters. Since the number of time steps (24) is higher than the number of cameras (12), this does not throw an error as long as start_frame and end_frame are chosen accordingly.
I would really appreciate clarification on this, even if the confusion is just due to my misunderstanding of the code. Also, please let me know if there is another way to obtain the poses for the images in the mv_images directory.
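If the loaded array follows the LLFF poses_bounds convention (an assumption worth verifying for this particular zip), each 17-vector is a flattened 3x5 block, i.e. a 3x4 camera-to-world matrix with an appended [H, W, focal] column, followed by two near/far depth bounds. A sketch of unpacking it (using a zero array as a stand-in for the real np.load result):

```python
import numpy as np

# Stand-in for np.load("poses_bounds.npy"): 24 rows of 17 values each.
poses_arr = np.zeros((24, 17))

poses = poses_arr[:, :15].reshape(-1, 3, 5)  # 3x4 c2w matrix + [H, W, f] column
bounds = poses_arr[:, 15:]                   # near / far depth bounds per row
```

Under this reading, 24 rows would indeed mean one pose per time step, consistent with the suspicion that no per-camera poses for mv_images are included.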
I'm running your program from Windows and I noticed a few things.
Very minor, but I needed to install tensorboard, a dependency not listed in the README, in order to train.
On lines https://github.com/zhengqili/Neural-Scene-Flow-Fields/blob/main/nsff_scripts/run_midas.py#L84 and 104 you have linux commands for copying and removing files.
I don't know Python, but importing shutil and doing this seemed right:
src_files = os.listdir(imgdir_orig)
for file_name in src_files:
    full_file_name = os.path.join(imgdir_orig, file_name)
    if os.path.isfile(full_file_name):
        shutil.copy(full_file_name, imgdir)
Also, https://github.com/zhengqili/Neural-Scene-Flow-Fields/blob/main/nsff_scripts/run_flows_video.py#L218 and 219 use rm; these can be replaced with:
shutil.rmtree(mask_dir)
shutil.rmtree(semantic_dir)
When run_flows_video.py runs, it puts the motion masks into the image directory. If I copy them to the motion_masks folder and copy the images_512x288 images into the images directory, I can begin training (assuming that's where they're supposed to be). I'm training on a single 3090, so this might take a while. (I turned off the NaN detection from the recent pull request, which seems to have sped things up quite a bit.)
In the README you say training takes 2 days on 2 V100 GPUs, but I don't see any option for setting the number of GPUs in run_nerf.py. Does this mean the code only supports single-GPU training?
I am trying to train on my own dataset, a small one of 12 images (the COLMAP and LLFF preprocessing all seems fine up to training), and I encounter this error, which seems to come from freeing memory or something similar. Any advice? I am happy to share my image data. I'm on Linux (I tried gcc 6.3, 7.5, and 9.1, same result) with CUDA 11.0 and PyTorch 1.8.0. See the attached error file.
Thank you for your great work!!
How did you visualize the 3D scene flow as in the demo video?
Could you share code or a resource?
Were any experiments conducted with 360° captured scenes? Curious to know whether there is any difference between training on forward-facing vs. 360° scenes.
Hello, I really appreciate your awesome work!!
However, I get an error when I try to run the evaluation.py file with our trained model.
I think the --chain_sf argument is not declared in that file.
I used the given configuration file for the kid-running scene.
Here is the script I ran:
python evaluation.py --datadir /data1/dogyoon/neural_sceneflow_data/nerf_data/kid-running/dense/ --expname Default_Test --config configs/config_kid-running.txt
Should I add the argument to the evaluation.py file, or is there another way to solve this problem?
Thanks!!
Hi,
I believe there is an issue with the demo as outlined in the README. Specifically, when trying to use the pre-trained model to render any of the three interpolation methods (time, viewpoint, or both), the resulting images (found in nsff_exp/logs/kid-running_ndc_5f_sv_of_sm_unify3_testing_F00-30/<interpolation_dependent_name>/images) come out looking like this:
or
depending on which form of interpolation I run.
I have a Colab file set up that reproduces the above result. It pulls from a forked repository with only two changes: I added a requirements.txt file and updated data_dir in nsff_exp/configs/config_kid-running.txt.
Have I missed something, or am I correct in saying the demo is broken? Thanks!
I downloaded the kid-running data, but when I run run_nerf.py
it raises a FileNotFoundError
because I don't have the motion mask data. How can I get the mask data? Was it created by this model, or is another method used to obtain the motion masks?
Hi, warping plus a blending weight that predicts how to merge static and dynamic results is a good idea; however, I have been confused by some operations in the implementation. Initially I was not sure whether these confusing parts lead to bad results, but now that I see several issues concerning performance on users' own datasets, I decided to post this issue, hoping to provide some insights that could potentially resolve some of them.
See Neural-Scene-Flow-Fields/nsff_exp/render_utils.py, lines 1019 to 1022 and lines 1055 to 1057 (4cb2ef4).
I would like to know @zhengqili 's opinion on these points, and maybe suggest the users to try these modifications to see if it solves some problem.
Hi, thanks for the code! Do you plan to publish the full data (running kid, and other data you used in the paper other than the NVIDIA ones) as well?
In fact, the thing I'd like to check most is your motion masks' accuracy. I'd like to know whether it's really possible for the network to learn to separate background and foreground given only the "coarse mask" you mention in the supplementary.
For example for the bubble scene on the project page, how accurate should the mask be to clearly separate the bubbles from the background like you showed? Have you also experimented on the influence of the mask quality, i.e. if masks are more coarse (larger), then how well can the model separate bg/fg?