
banmo's Introduction

BANMo

[Project page] [Paper] [Colab for NVS]

This repo provides scripts to reproduce experiments in the paper. For the latest updates on the software, please check out lab4d.

Changelog

  • 11/21: Remove eikonal loss to align with paper results, #36
  • 08/09: Fix eikonal loss that regularizes surface (resulting in smoother mesh).
  • 06/18: Add a colab demo for novel view synthesis.
  • 04/11: Replace matching loss with feature rendering loss; fix bugs in LBS; stabilize optimization.
  • 03/20: Add mesh color option (canonical mapping vs radiance) during surface extraction. See --ce_color flag.
  • 02/23: Improve NVS with Fourier light code, improve uncertainty MLP, add long schedule, minor speed up.
  • 02/17: Add adaptation to a new video, optimization with known root poses, and pose code visualization.
  • 02/15: Add motion-retargeting, quantitative evaluation and synthetic data generation/eval.

Install

Build with conda

We provide two versions.

[A. torch1.10+cu113 (1.4x faster on V100)]
# clone repo
git clone git@github.com:facebookresearch/banmo.git --recursive
cd banmo
# install conda env
conda env create -f misc/banmo-cu113.yml
conda activate banmo-cu113
# install pytorch3d (takes minutes), kmeans-pytorch
pip install -e third_party/pytorch3d
pip install -e third_party/kmeans_pytorch
# install detectron2
python -m pip install detectron2 -f \
  https://dl.fbaipublicfiles.com/detectron2/wheels/cu113/torch1.10/index.html
[B. torch1.7+cu110]
# clone repo
git clone git@github.com:facebookresearch/banmo.git --recursive
cd banmo
# install conda env
conda env create -f misc/banmo.yml
conda activate banmo
# install kmeans-pytorch
pip install -e third_party/kmeans_pytorch
# install detectron2
python -m pip install detectron2 -f \
  https://dl.fbaipublicfiles.com/detectron2/wheels/cu110/torch1.7/index.html
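
After either install, a quick optional sanity check (these one-liners only verify that the key packages import and that CUDA is visible) can catch environment problems early:

# verify torch, CUDA, pytorch3d and detectron2 are importable
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
python -c "import pytorch3d, detectron2; print('pytorch3d and detectron2 OK')"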

Data

We provide two ways to obtain data. The easiest way is to download and unzip the pre-processed data as follows.

[Download pre-processed data]

We provide preprocessed data for cat and human. Download the pre-processed rgb/mask/flow/densepose images as follows

# (~8G for each)
bash misc/processed/download.sh cat-pikachiu
bash misc/processed/download.sh human-cap
[Download raw videos]

Download raw videos to ./raw/ folder

bash misc/vid/download.sh cat-pikachiu
bash misc/vid/download.sh human-cap
bash misc/vid/download.sh dog-tetres
bash misc/vid/download.sh cat-coco

To use your own videos, or pre-process raw videos into banmo format, please follow the instructions here.

PoseNet weights


Download pre-trained PoseNet weights for human and quadrupeds

mkdir -p mesh_material/posenet && cd "$_"
wget $(cat ../../misc/posenet.txt); cd ../../
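
If the download succeeded, the weights should now sit under mesh_material/posenet/ (a minimal check; the exact file names depend on the URLs listed in misc/posenet.txt):

# optional: confirm the PoseNet weights were downloaded
ls -lh mesh_material/posenet/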

Demo

This example shows how to reconstruct a cat from 11 videos and a human from 10 videos. For more examples, see here.

Hardware/time for running the demo

The short schedule takes 4 hours on two V100 GPUs (+SSD storage). To reach higher quality, the full schedule takes 12 hours. We provide a script that uses gradient accumulation to support experiments on fewer GPUs or GPUs with less memory.

Setting good hyper-parameters for videos of various lengths

When optimizing videos of different lengths, we found it useful to scale the batch size with the number of frames. A rule of thumb is to set "num gpus" x "batch size" x "accu steps" ~= num frames; a worked example is sketched below. This means more video frames need more GPU memory, but the optimization time stays roughly the same.
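
As a concrete sketch of this rule of thumb (the numbers below are hypothetical, and how the accumulation steps are passed to the training script is not shown here):

# e.g., ~1000 frames on 2 GPUs with a per-step batch size of 256
num_frames=1000; num_gpus=2; batch_size=256
# accu_steps ~= num_frames / (num_gpus * batch_size), rounded up
accu_steps=$(( (num_frames + num_gpus*batch_size - 1) / (num_gpus*batch_size) ))
echo "use roughly $accu_steps gradient-accumulation steps"  # prints 2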

Try pre-optimized models

We provide pre-optimized models and scripts to run novel view synthesis and mesh extraction (results saved at tmp/*all.mp4). Also see this Colab for NVS.

# download pre-optimized models
mkdir -p tmp && cd "$_"
wget https://www.dropbox.com/s/qzwuqxp0mzdot6c/cat-pikachiu.npy
wget https://www.dropbox.com/s/dnob0r8zzjbn28a/cat-pikachiu.pth
wget https://www.dropbox.com/s/p74aaeusprbve1z/opts.log # flags used at opt time
cd ../

seqname=cat-pikachiu
# render novel views
bash scripts/render_nvs.sh 0 $seqname tmp/cat-pikachiu.pth 5 0
# argv[1]: gpu id
# argv[2]: sequence name
# argv[3]: path to the weights
# argv[4]: video id used for pose traj
# argv[5]: video id used for root traj

# Extract articulated meshes and render
bash scripts/render_mgpu.sh 0 $seqname tmp/cat-pikachiu.pth \
        "0 5" 64
# argv[1]: gpu id
# argv[2]: sequence name
# argv[3]: weights path
# argv[4]: video id separated by space
# argv[5]: resolution of running marching cubes (use 256 to get higher-res mesh)
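
For example, to extract a higher-resolution (but slower) mesh from the same pre-optimized weights, rerun the command above with 256 as the last argument:

bash scripts/render_mgpu.sh 0 $seqname tmp/cat-pikachiu.pth \
        "0 5" 256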

1. Optimization

[cat-pikachiu]
seqname=cat-pikachiu
# To speed up data loading, we store images as lines of pixels.
# This only needs to be run once per sequence; the processed data are stored on disk.
python preprocess/img2lines.py --seqname $seqname

# Optimization
bash scripts/template.sh 0,1 $seqname 10001 "no" "no"
# argv[1]: gpu ids separated by comma 
# argv[2]: sequence name
# argv[3]: port for distributed training
# argv[4]: use_human, pass "" for human cse, "no" for quadruped cse
# argv[5]: use_symm, pass "" to force x-symmetric shape

# Extract articulated meshes and render
bash scripts/render_mgpu.sh 0 $seqname logdir/$seqname-e120-b256-ft2/params_latest.pth \
        "0 1 2 3 4 5 6 7 8 9 10" 256
# argv[1]: gpu id
# argv[2]: sequence name
# argv[3]: weights path
# argv[4]: video id separated by space
# argv[5]: resolution of running marching cubes (256 by default)
cat-pikachiu-.0.-all.mp4
[human-cap]
seqname=adult7
python preprocess/img2lines.py --seqname $seqname
bash scripts/template.sh 0,1 $seqname 10001 "" ""
bash scripts/render_mgpu.sh 0 $seqname logdir/$seqname-e120-b256-ft2/params_latest.pth \
        "0 1 2 3 4 5 6 7 8 9" 256
adult7-.8.-all.mp4

2. Visualization tools

[Tensorboard]
# You may need to set up ssh tunneling to view the tensorboard monitor locally.
screen -dmS "tensorboard" bash -c "tensorboard --logdir=logdir --bind_all"
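
For the ssh tunnel mentioned in the comment above, something like the following sketch works (user@remote-server is a placeholder and 6006 is TensorBoard's default port; adjust both to your setup):

# on your local machine: forward local port 6006 to the remote machine running tensorboard
ssh -N -L 6006:localhost:6006 user@remote-server
# then open http://localhost:6006 in a local browser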
[Root pose, rest mesh, bones]

To draw root pose trajectories (+rest shape) over epochs

# logdir
logdir=logdir/$seqname-e120-b256-init/
# first_idx, last_idx specifies what frames to be drawn
python scripts/visualize/render_root.py --testdir $logdir --first_idx 0 --last_idx 120

Find the output at $logdir/mesh-cam.gif. During optimization, the rest mesh and bones at each epoch are saved at $logdir/*rest.obj.

pose-20.mp4
[Correspondence/pose code]

To visualize 2d-2d and 2d-3d matchings of the latest epoch weights

# 2d matches between frame 0 and 100 via 2d->feature matching->3d->geometric warp->2d
bash scripts/render_match.sh $logdir/params_latest.pth "0 100" "--render_size 128"

2d-2d matches will be saved to tmp/match_%03d.jpg. 2d-3d feature matches of frame 0 will be saved to tmp/match_line_pred.obj. 2d-3d geometric warps of frame 0 will be saved to tmp/match_line_exp.obj. The near plane of frame 0 will be saved to tmp/match_plane.obj. Pose code visualization will be saved to tmp/code.mp4.
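
To visualize matches for several frame pairs in one go, the same interface can be looped over (a sketch; the frame indices are arbitrary examples, and outputs under tmp/ may be overwritten between iterations, so copy them out if needed):

# render 2d-2d / 2d-3d matches for a few frame pairs against frame 0
for pair in "0 50" "0 100" "0 150"; do
  bash scripts/render_match.sh $logdir/params_latest.pth "$pair" "--render_size 128"
done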

pose-code.mp4
[Render novel views]

Render novel views at the canonical camera coordinate

bash scripts/render_nvs.sh 0 $seqname logdir/$seqname-e120-b256-ft2/params_latest.pth 5 0
# argv[1]: gpu id
# argv[2]: sequence name
# argv[3]: path to the weights
# argv[4]: video id used for pose traj
# argv[5]: video id used for root traj

Results will be saved at logdir/$seqname-e120-b256-ft2/nvs*.mp4.

nvs-pikachiu.mp4
[Render canonical view over iterations]

Render depth and color of the canonical view over optimization iterations

bash scripts/visualize/nvs_iter.sh 0 logdir/$seqname-e120-b256-init/
# argv[1]: gpu id
# argv[2]: path to the logdir

Results will be saved at logdir/$seqname-e120-b256-init/vis-iter*.mp4.

cat-pikachiu-vis-iter-iter-dph.mp4
cat-pikachiu-vis-iter-iter-rgb.mp4

Common install issues

  • Q: pyrender reports ImportError: Library "GLU" not found.
    • A: install it with sudo apt install freeglut3-dev
  • Q: ffmpeg reports libopenh264.so.5 not found
    • A: reinstall ffmpeg in conda: conda install -c conda-forge ffmpeg

Note on arguments

  • use --use_human for human reconstruction, otherwise it assumes quadruped animals
  • use --full_mesh to disable visibility check at mesh extraction time
  • use --noce_color at mesh extraction time to assign radiance instead of canonical mapping as vertex colors.
  • use --queryfw at mesh extraction time to extract forward articulated meshes, which only needs to run marching cubes once.
  • use --use_cc to keep the largest connected component of the rest mesh when setting the object bounds and near-far plane (turned on by default). Turn it off with --nouse_cc for disconnected objects such as hands.
  • use --debug to print out the rough time each component takes.

Acknowledgement


Volume rendering code is borrowed from Nerf_pl. Flow estimation code is adapted from VCN-robust. Other external repos:

License


banmo's People

Contributors

gengshan-y


banmo's Issues

batch load or line load

Hi Gengshan,

Thank you so much for your great work.

I noticed that you use batch_load for evaluation and line_load for training in BANMo. I want to learn information from the whole image, so I tried to use batch_load in training, but the performance is degraded compared to the original line_load training. Do you have any suggestions? Thanks a lot!!

some problems when I run ./script/template.sh

Thanks for your great work.
When I run your code to fit my own video, a problem occurs in train_one_epoch().
I debugged it and found that the error happens at

self._num_faces_per_mesh.unique() == 1 (line 403 in pytorch3d/meshes.py)

where I get a RuntimeError: std::bad_alloc.
I don't know why; all the environments are installed according to your yaml.

I use 2x 3090 GPUs when running your code.

Sharing pretrained weights on Hugging Face

Hello there!

First of all, thank you for open-sourcing your work! I really enjoyed reading your paper and learning about your work, plus I'm a big fan of the coauthors Pikachiu, Tetres, Haru, Coco, and Socks 🐕🐈 Would you be interested in sharing your models on the Hugging Face Hub?

The Hub makes it easy to freely download and upload models, and it can make models more accessible and visible to the rest of the ML community. It's a good way to share useful metadata and metrics, and we also support features like TensorBoard visualizations and PapersWithCode integrations. Since models are hosted as Git repos, they're also automatically versioned with a commit history and diffs. You could even upload it to the already-existing Facebook AI organization.

We have a step-by-step guide that explains the process for uploading the model to the Hub, in case you're interested. We also have a library for programmatic access to uploading and downloading models, which includes features like caching for downloaded models.

Please let us know if you have any questions, and we'd be happy to guide you through the process!

Nima and the Hugging Face team

cc @osanseviero @lhoestq

technical problem

Hello! I'm having some problems reproducing the work in the paper:
Example: Motion retargeting


Traceback (most recent call last):
  File "preprocess/img2lines.py", line 111, in <module>
    app.run(main)
  File "/opt/anaconda3/envs/banmo/lib/python3.8/site-packages/absl/app.py", line 303, in run
    _run_main(main, args)
  File "/opt/anaconda3/envs/banmo/lib/python3.8/site-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "preprocess/img2lines.py", line 53, in main
    data_info = trainer.init_dataset()
  File "/tmp/pycharm_project_829/./nnutils/train_utils.py", line 129, in init_dataset
    self.dataloader = frameloader.data_loader(opts_dict)
  File "/tmp/pycharm_project_829/./dataloader/frameloader.py", line 38, in data_loader
    data_inuse = config_to_dataloader(opts_dict)
  File "/tmp/pycharm_project_829/./utils/io.py", line 311, in config_to_dataloader
    dataset = torch.utils.data.ConcatDataset(datalist)
  File "/opt/anaconda3/envs/banmo/lib/python3.8/site-packages/torch/utils/data/dataset.py", line 199, in __init__
    assert len(datasets) > 0, 'datasets should not be an empty iterable'  # type: ignore
AssertionError: datasets should not be an empty iterable


opts_dict={} in "./nnutils/train_utils.py" seems to require adding paths manually.

When I modify the path, the following error occurs:


Traceback (most recent call last):
  File "preprocess/img2lines.py", line 111, in <module>
    app.run(main)
  File "/opt/anaconda3/envs/banmo/lib/python3.8/site-packages/absl/app.py", line 303, in run
    _run_main(main, args)
  File "/opt/anaconda3/envs/banmo/lib/python3.8/site-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "preprocess/img2lines.py", line 53, in main
    data_info = trainer.init_dataset()
  File "/tmp/pycharm_project_829/./nnutils/train_utils.py", line 113, in init_dataset
    opts_dict['n_data_workers'] = '1'
TypeError: 'str' object does not support item assignment


How should the path be modified?

update_delta_rts

Hi, I have a question about the following function, update_delta_rts.

def update_delta_rts(self, rays):

Why do you compute rays['bone_rts'] by multiplying (bone_rts_rst)^(-1) * bone_rts_fw in the correct_rest_pose function?
Is it right that you intend to apply bone_rts_fw first and then the inverse of the bone rest transformation?

What I understand is that you first transform bones using bone_rts_rst, which results in restpose bones.
(restpose_bones = bones_rts_rst * bones)

Then, you construct the time-t transformation using bone_rts_fw and bone_rts_rst in the correct_rest_pose function.
(bone_rts_fw = (bone_rts_rst)^(-1) * (bone_rts_fw) )

As shown in the rendering code, bones_dfm is computed using those two values.
(bone_dfm = bone_rts_fw * bones_rest
          = (bone_rts_rst)^(-1) * bone_rts_fw * bone_rts_rst * bones)

What I'm curious about is why you multiply by the inverse of bone_rts_rst and by bone_rts_rst before and after the bone_rts_fw matrix. What is the meaning of this multiplication?

Hope to hear from you soon.
Thank you.

Question about results: a difference between the results in the paper and my results

Thank you for nice work.

I have a question about the results in the paper.
I just followed your instructions for data processing, training, rendering, and evaluation, as described in scripts/README.md.

The desired results should be the same as those reported in the paper (image below).

But my models show poor results.
I describe my results below.

AMA

  1. training model by using T_swing and T_samba simultaneously --> evaluation
    :: ave 11.4 chamfer distance about T_swing
    :: ave 10.7 chamfer distance about T_samba
  2. download some files by cmd wget https://www.dropbox.com/sh/n9eebife5uovg2m/AAA1BsADDzCIsTSUnJyCTRp7a -O tmp.zip --> evaluation (=same as your instruction)
    :: ave 9.1 chamfer distance about T_swing
  3. train model by using only T_swing videos --> evaluation (same process about T_samba, too)
    :: ave 8.6 chamfer distance about T_swing
    :: ave 8.2 chamfer distance about T_samba

Synthetic (hand, eagle)
:: ave 6.4 chamfer distance about eagle
:: ave 5.3 chamfer distance about hands

I used 2xA100 for training, and same conda environment as yours.

As you can see, all my results show poor performance.
My question is: how can I get the same results as described in the paper?

Best regards.

Canonical mesh degenerates after epoch 1

Hi Gengshan,

Thanks for your great work!

After epoch 1 in the last training stage, I noticed that the canonical mesh always breaks down and needs a few epochs to (roughly) get back to the correct shape. This happens in training on all the datasets (cat, AMA, etc.). Do you know what causes this effect at epoch 1? Thanks!

For example, these are the canonical meshes after epoch 0 to epoch 3 in the last training stage on AMA human dataset (notice that after epoch 1 the mesh degenerates):

mesh_rest-0
epoch 0 canonical mesh in last training stage
mesh_rest-1
epoch 1 canonical mesh in last training stage
mesh_rest-2
epoch 2 canonical mesh in last training stage
mesh_rest-3
epoch 3 canonical mesh in last training stage

no camera path and keypoints.json in database

In io.py, you have the variables 'rtklist' and 'kplist', which contain paths like 'database/DAVIS/Cameras/Full-Resolution/cat-pikachiu00/00000.txt' and 'database/DAVIS/KP/Full-Resolution/cat-pikachiu00/00000_keypoints.json'. But no corresponding files are found after unzipping the dropbox files.

RuntimeError: std::bad_alloc

Optimization fails even though memory is available.
Our setup: 2x RTX 2080 Ti, Intel(R) Core(TM) i9-10900X CPU @ 3.70GHz, 32GB memory.
Log file attached.

/home/hyang/.conda/envs/banmo-cu113/lib/python3.9/site-packages/numpy/core/fromnumeric.py:3440: RuntimeWarning: Mean of empty slice.
  return _methods._mean(a, axis=axis, dtype=dtype,
/home/hyang/.conda/envs/banmo-cu113/lib/python3.9/site-packages/numpy/core/_methods.py:189: RuntimeWarning: invalid value encountered in double_scalars
  ret = ret.dtype.type(ret / rcount)
/home/hyang/.conda/envs/banmo-cu113/lib/python3.9/site-packages/numpy/core/fromnumeric.py:3440: RuntimeWarning: Mean of empty slice.
  return _methods._mean(a, axis=axis, dtype=dtype,
/home/hyang/.conda/envs/banmo-cu113/lib/python3.9/site-packages/numpy/core/_methods.py:189: RuntimeWarning: invalid value encountered in double_scalars
  ret = ret.dtype.type(ret / rcount)
Traceback (most recent call last):
  File "/mnt/banmo/main.py", line 42, in <module>
    app.run(main)
  File "/home/hyang/.conda/envs/banmo-cu113/lib/python3.9/site-packages/absl/app.py", line 312, in run
    _run_main(main, args)
  File "/home/hyang/.conda/envs/banmo-cu113/lib/python3.9/site-packages/absl/app.py", line 258, in _run_main
    sys.exit(main(argv))
  File "/mnt/banmo/main.py", line 39, in main
    trainer.train()
  File "/mnt/banmo/nnutils/train_utils.py", line 684, in train
    self.train_one_epoch(epoch, log)
  File "/mnt/banmo/nnutils/train_utils.py", line 922, in train_one_epoch
    total_loss,aux_out = self.model(batch)
  File "/home/hyang/.conda/envs/banmo-cu113/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/hyang/.conda/envs/banmo-cu113/lib/python3.9/site-packages/torch/nn/parallel/distributed.py", line 886, in forward
    output = self.module(*inputs[0], **kwargs[0])
  File "/home/hyang/.conda/envs/banmo-cu113/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/banmo/nnutils/banmo.py", line 650, in forward_default
    mesh_rest = pytorch3d.structures.meshes.Meshes(
  File "/mnt/banmo/third_party/pytorch3d/pytorch3d/structures/meshes.py", line 406, in __init__
    if len(self._num_faces_per_mesh.unique()) == 1:
  File "/home/hyang/.conda/envs/banmo-cu113/lib/python3.9/site-packages/torch/_tensor.py", line 530, in unique
    return torch.unique(self, sorted=sorted, return_inverse=return_inverse, return_counts=return_counts, dim=dim)
  File "/home/hyang/.conda/envs/banmo-cu113/lib/python3.9/site-packages/torch/_jit_internal.py", line 422, in fn
    return if_false(*args, **kwargs)
  File "/home/hyang/.conda/envs/banmo-cu113/lib/python3.9/site-packages/torch/_jit_internal.py", line 422, in fn
    return if_false(*args, **kwargs)
  File "/home/hyang/.conda/envs/banmo-cu113/lib/python3.9/site-packages/torch/functional.py", line 821, in _return_output
    output, _, _ = _unique_impl(input, sorted, return_inverse, return_counts, dim)
  File "/home/hyang/.conda/envs/banmo-cu113/lib/python3.9/site-packages/torch/functional.py", line 735, in _unique_impl
    output, inverse_indices, counts = torch._unique2(
RuntimeError: std::bad_alloc
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 32696 closing signal SIGTERM
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 1 (pid: 32697) of binary: /home/hyang/.conda/envs/banmo-cu113/bin/python
Traceback (most recent call last):
  File "/home/hyang/.conda/envs/banmo-cu113/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/hyang/.conda/envs/banmo-cu113/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/hyang/.conda/envs/banmo-cu113/lib/python3.9/site-packages/torch/distributed/launch.py", line 193, in <module>
    main()
  File "/home/hyang/.conda/envs/banmo-cu113/lib/python3.9/site-packages/torch/distributed/launch.py", line 189, in main
    launch(args)
  File "/home/hyang/.conda/envs/banmo-cu113/lib/python3.9/site-packages/torch/distributed/launch.py", line 174, in launch
    run(args)
  File "/home/hyang/.conda/envs/banmo-cu113/lib/python3.9/site-packages/torch/distributed/run.py", line 710, in run
    elastic_launch(
  File "/home/hyang/.conda/envs/banmo-cu113/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 131, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/hyang/.conda/envs/banmo-cu113/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 259, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
log.log

Question about CSE and root pose

Hi there! Thanks for sharing this amazing work!

I have been reading your work for many days and I'm still confused about CSE:

  1. You have shown good results for the hand, eagle, and Laikago robot. Did you apply the CSE model trained for quadruped animals to these non-quadruped objects? If not, how did you train the CSE model for these objects (especially for hands)? Did you modify the hands dataset so that it can be used to train CSE?
  2. As in "CSE embedding for other object categories" and "banmo for reconstruction of cars", does it mean that if I want to reconstruct a rigid object such as a cup or a car, I could get good results as long as a good enough initial root pose is given? So CSE and PoseNet are not necessary as long as I can get a good root pose by other means?

Any response will be greatly appreciated!

some bugs, when I render the results

Thanks for your great work.
There are some bugs when I render the results using pyrender.
My Ubuntu version is 20.04, and I installed all libraries according to your banmo.yml.
The backtrace of the error log is:

libEGL warning: DRI2: failed to create dri screen
libEGL warning: DRI2: failed to create dri screen
Traceback (most recent call last):
  File "scripts/visualize/render_vis.py", line 537, in <module>
    main()
  File "scripts/visualize/render_vis.py", line 329, in main
    r = OffscreenRenderer(img_size, img_size)
  File "/home/linxia/anaconda3/envs/banmo/lib/python3.8/site-packages/pyrender/offscreen.py", line 31, in __init__
    self._create()
  File "/home/linxia/anaconda3/envs/banmo/lib/python3.8/site-packages/pyrender/offscreen.py", line 149, in _create
    self._platform.init_context()
  File "/home/linxia/anaconda3/envs/banmo/lib/python3.8/site-packages/pyrender/platforms/egl.py", line 177, in init_context
    assert eglInitialize(self._egl_display, major, minor)
  File "/home/linxia/anaconda3/envs/banmo/lib/python3.8/site-packages/OpenGL/platform/baseplatform.py", line 402, in __call__
    return self( *args, **named )
  File "/home/linxia/anaconda3/envs/banmo/lib/python3.8/site-packages/OpenGL/error.py", line 228, in glCheckError
    raise GLError(
OpenGL.error.GLError: GLError(
        err = 12289,
        baseOperation = eglInitialize,
        cArguments = (
                <OpenGL._opaque.EGLDisplay_pointer object at 0x7fad0deeb040>,
                c_long(0),
                c_long(0),
        ),
        result = 0
)

I don't know how to fix it.
If I change 'egl' to 'osmesa', it is too slow since it is CPU-only.

Canonical embeddings matching

Hi, thanks for your impressive work.

I am trying to understand your 2d-2d matching part. However, when I print the learned matching matrix (prob_vol), all the elements seem very close. As far as I know, this may mean that BANMo does not learn a matching between 2d and 3d features. So what does this part really do? Thanks!

Question about the Calculation of the Mahalanobis Distance

Hello, thanks for your great work!
I have one question.
In

mdis = mdis*100*log_scale.exp() # TODO accound for scaled near-far plane

and
mdis = (-10 * mdis.sum(3)) # bs,N,B

when you calculate the Mahalanobis distance, what are you doing here?
Why does mdis first get multiplied by 100 and log_scale.exp(), and then by -10 after mdis.sum(3)?
What is the log_scale from skin_aux?

Thanks! :)

Data preprocess

Thanks for your excellent work!
An error is displayed during data preprocessing. The environment is configured according to the "banmo.yml" you provided. I tested with a video of my own data, but could not preprocess it. Using the provided video ("cat-pikachiu.mov"), it still reports an error.
The backtrace of the error log is:
bash preprocess/preprocess.sh "cat-pikachiu" .MOV n 10
image

Training BANMo using my own videos

Hi, thanks for your work!

I am trying to train BANMo using my own videos, and I preprocessed them as you instructed. However, when I look at the segmentation results in DAVIS/Annotations, they do not seem to be preprocessed like adult7 and cat-pikachiu. As in the following image, there are different colors on the cat. When I check my own videos, there is only one color and the detection result is not good. Did I miss something? Thanks!
vis-00097

What is the warmup stage doing exactly?

Hi, thanks for releasing the code!

I attempted training on the human video, and saw that nerf_coarse is trained for 5 epochs to warm up using an SMPL mesh loaded from the mesh_material folder.

However, when I visualized it from tmp/smpl_27554.obj in Blender, it just seemed to be an ellipsoid (image below). Is this expected?

Appreciate your thoughts, thanks!

image

left multiply or right multiply?

Thanks for your great work!

May I ask why you sometimes apply left multiplication on Rmat and sometimes right multiplication? Why not use the same formulation? What's the difference? I'm really confused about that.

Pre-trained model?

Hi Gengshan,

First of all, great work! Do you plan to release pretrained model on the demo sequences? I was trying to understand your code better by running some demo (for example, NVS on cat-pikachiu as you suggested on the README page). It will make it much easier to use your code!

Thank you!

technical discussion

I want to apply banmo to car modeling, but I find it takes too much time to retrain for every reconstruction, even if there is only a slight difference between the two cars. Is there a way to make banmo train only once, and then run inference on various car videos to get 3D models of different cars? I can provide a car video and hope to have a category-level end-to-end 3D reconstruction method. If you have relevant research papers, can you share them? Thank you.

Details on bone initialization, skinning function and training schedule

Hi Gengshan,

Following up on my last issue, I have some trouble understanding the way you initialize the "bones" and your skinning function. It would be wonderful if you could shed some light on a few details.

  1. Bone initialization
    I am looking at your generate_bones method and it seems that bones are always initialized at the origin. Is that intended? And if so, how do you center/scale the original cat-pikachiu scene? In other words, what is the "unit" of the PoseNet output? My understanding is that it is in object space, since it is trained on synthetic data with 360-degree camera sampling.

  2. Bone reinitialization
    It seems that bones are reinitialized at 2/3 of num_epochs, where bones are resampled by K-Means based on the canonical shape. I am wondering how you reinitialize the bones at each video frame. Thank you!

  3. Skinning function
    I am a bit confused by the skinning function here:

    banmo/nnutils/geom_utils.py

    Lines 226 to 236 in ff5df1d

    mdis = center.view(bs,1,B,3) - pts.view(bs,N,1,3) # bs,N,B,3
    if True: #B<50:
        mdis = axis_rotate(orient.view(bs,1,B,3,3), mdis[...,None])
        #mdis = orient.view(bs,1,B,3,3).matmul(mdis[...,None]) # bs,N,B,3,1
        mdis = mdis[...,0]
        mdis = scale.view(bs,1,B,3) * mdis.pow(2)
    else:
        # for efficiency considerations
        mdis = mdis.pow(2)
    mdis = mdis*100*log_scale.exp() # TODO accound for scaled near-far plane
    mdis = (-10 * mdis.sum(3)) # bs,N,B

Specifically, I am wondering why there is a 100 and a -10 in the blending. Is it effectively the same if we initialize log_scale to log(1000) and keep the -1 part of the RBF weight?

  4. Training schedule
    Thank you for kindly sharing your training script. I would really appreciate it if you could explain it a bit more; I cannot find any description in the original paper. Here is what I am looking at:

# mode: line load
savename=${model_prefix}-init
bash scripts/template-mgpu.sh $gpus $savename \
$seqname $addr --num_epochs $num_epochs \
--pose_cnn_path $pose_cnn_path \
--warmup_shape_ep 5 --warmup_rootmlp \
--lineload --batch_size $batch_size\
--${use_symm}symm_shape \
--${use_human}use_human
# mode: pose correction
# 0-80% body pose with proj loss, 80-100% gradually add all loss
# freeze shape/feature etc
loadname=${model_prefix}-init
savename=${model_prefix}-ft1
num_epochs=$((num_epochs/4))
bash scripts/template-mgpu.sh $gpus $savename \
$seqname $addr --num_epochs $num_epochs \
--pose_cnn_path $pose_cnn_path \
--model_path logdir/$loadname/params_latest.pth \
--lineload --batch_size $batch_size \
--warmup_steps 0 --nf_reset 1 --bound_reset 1 \
--dskin_steps 0 --fine_steps 1 --noanneal_freq \
--freeze_proj --proj_end 1\
--${use_symm}symm_shape \
--${use_human}use_human
# mode: fine tunning without pose correction
loadname=${model_prefix}-ft1
savename=${model_prefix}-ft2
num_epochs=$((num_epochs/2))
bash scripts/template-mgpu.sh $gpus $savename \
$seqname $addr --num_epochs $num_epochs \
--pose_cnn_path $pose_cnn_path \
--model_path logdir/$loadname/params_latest.pth \
--lineload --batch_size $batch_size \
--warmup_steps 0 --nf_reset 0 --bound_reset 0 \
--dskin_steps 0 --fine_steps 0 --noanneal_freq \
--${use_symm}symm_shape \
--${use_human}use_human
# mode: final tunning with larger rgb loss wt and reset beta
loadname=${model_prefix}-ft2
savename=${model_prefix}-ft3
bash scripts/template-mgpu.sh $gpus $savename \
$seqname $addr --num_epochs $num_epochs \
--pose_cnn_path $pose_cnn_path \
--model_path logdir/$loadname/params_latest.pth \
--lineload --batch_size $batch_size \
--warmup_steps 0 --nf_reset 0 --bound_reset 0 \
--dskin_steps 0 --fine_steps 0 --noanneal_freq \
--img_wt 1 --reset_beta --eikonal_loss \
--${use_symm}symm_shape \
--${use_human}use_human

Why are there four stages, and what do these four stages do exactly? Specifically, I would like to understand how bones are reinitialized in this context. I can roughly see that you are trying to fine-tune the root motion, cameras, and bones, respectively.

I realize these are a lot of questions (sorry!), but I guess other folks might share similar confusion, since it is quite a chunk of code :) I have not managed to run the code so far, so all the questions come purely from code review; apologies if I have missed anything!

Thank you!!!
Hang

Minimum number of cameras/videos?

Hello! Thanks a lot for sharing this!
I'm very curious: what is the minimum number of cameras/videos needed to recreate an animated human shape? (I really hope it is okay to use 4x GoPro.)

near_far, obj_scale, bound

Hi, thanks again for your awesome work!

Could you please tell me the meanings and relationships of the parameters 'near_far', 'obj_bound', 'obj_scale', 'bound', and 'bound_factor'? I am quite confused about these parameters.

Also, I understand that you place the object center at z=0.3 in world space and design the 'warmup_shape' stage to initialize the object as a small sphere by training the SDF. But why is near_far (initialized to 0-0.6) a learnable parameter (reset_nf) instead of a fixed hyperparameter?

Questions about the synthetic datasets

Hi Gengshan,

Thanks for the great work!

I have a couple of questions regarding the synthetic datasets (Eagle and Hands) and the other results on your website:

  1. The instructions on synthetic datasets use the ground truth camera poses in training. However, the paths to the rtk files are commented out in the config. If I directly use this config, it won't use the ground truth camera poses in the training right?

  2. I followed the same instructions for Eagle dataset preparation, but it does not save the rtk files to the locations specified in the config, should I manually change the paths?

  3. Have you tried running BANMo optimizations on Eagle and Hands without the ground truth camera poses? And if so, how's the result visually and quantitatively (in terms of Chamfer Distance and F-scores)?

  4. I noticed that you have results of more objects such as Penguins, Robot-Laikago etc. on your website. Do you know where I can get access to these datasets as well?

Adaptation to a new video

Hi, thanks for your work!

I am trying to adapt a pre-trained model to a new video. But if I do not use template-prior-model.sh and just use the pre-trained model, I cannot load cameras from init-cam, and the camera parameters are wrong. Do you have any suggestions if I do not want to retrain the pre-trained models? Thanks!

Does rest shape have to fit any specific frame in video ?

Hello, thank you for your great work.

I am wondering whether the rest shape has to fit the pose in a specific frame of the video, e.g., the first or last frame?
If so, can you point me to the part of the source code that does that?
If not, how do you control the rest shape to be in a rest pose, like the human or cat standing straight as in the visualization?

Question regarding using pre-trained models.

Dear GengShan,
Hi! I am trying to use your pre-optimized models for evaluation. Unfortunately when I call the scripts, it says

FATAL Flags parsing error: ERROR:: Unable to open flagfile: [Errno 2] No such file or directory: 'tmp/cat-pikachiu/opts.log

I don't know how the argument parsing works for the absl system; do you have any suggestions to bypass the need for opts.log?

Thanks.

Problems about synthetic data

Hi. I'm trying to use the synthetic generator. I want to know what's the unit of the focal length and depth in render_synthetic.py?

bones_dfm or bones_rst when using gauss_mlp_skinning?

Hi, many thanks for your excellent work!

May I ask (1) why 'bones_dfm' is used instead of 'bones_rst' in the line below? (2) whether 'skin_backward' is the LBS weight applied to sampled points in camera coordinates to transfer them to root coordinates? and (3) whether 'xyz_coarse_sampled' is the sampled points in root coordinates? If so, it seems like using 'bones_rst' here would make sense.

bones_dfm, time_embedded, nerf_skin, skin_aux=skin_aux)

ext_utils not found problem

Hello there! Thanks for sharing the amazing work!

I have a problem with the file ext_utils: I can't find it in the repo, so I can't run the code. I would like to know where I can get this file. Any response will be greatly appreciated!

Question for Nerfies experiment

Thanks for your great work!
I have a question about the Nerfies experiment mentioned in your paper.
Nerfies uses colmap to conduct camera registration and scene-related calculation (scene scale and scene center), but banmo doesn't use colmap.
I also want to conduct the experiment mentioned in your paper. Is the related code released? Or any other guidance?

Question about rendering

Hello,

Thank you for your brilliant work and for sharing your code.

I am trying to run your project on my own computer. Up to optimization, it works well under the auto-built environment. But in the rendering process, I get lots of "no mesh found" messages and the error No such file or directory: 'logdir/cat-pikachiu-e120-b256-ft3/opts.log'.

I can see that the problem is that the rendering command, bash scripts/render_mgpu.sh 0 $seqname logdir/$seqname-e120-b256-ft3/params_latest.pth "0 1 2 3 4 5 6 7 8 9 10" 256, needs files in the folder with suffix "-ft3". But only the "-ft1" and "-ft2" folders are generated in the optimization step; according to template.sh, the "-ft3" folder is likely not generated.

By the way, searching globally in the whole project, "-ft3" only appears in scripts/abalations/template-human-noactive.sh and scripts/abalations/template-human-nosymm.sh. Since I haven't fully understood your code yet, I am not sure when and where the "ft3" folder would be generated. If it is convenient, could you briefly explain the meaning of "ft"?

Could you please give me some advice on the problem I met now in rendering step? Thank you!

About the fraction occupied

Hi, thanks for your great work.

I want to ask about the meaning of the fraction occupied. Could you please introduce the relationship between the fraction occupied and the quality of the reconstructed result? It doesn't look like the smaller the better. Thanks!

No "from pytorch3d import _C"

I have run your colab demos, but it seems there is no _C under /content.banmo/third_party/pytorch3d/pytorch3d/.

Can you please tell me how to fix that? Thanks.

some problems when I try pre-optimized models according to the suggestion

When I enter "bash scripts/render_nvs.sh 0 $seqname tmp/cat-pikachiu.pth 5 0", the following log comes out. The error causes the command to exit abnormally. I am sorry to bother you again.

[logs screenshot attached]

I have uploaded the logs to the attachment (Untitled 2.odt).

[command screenshot attached]

Questions by using pre-optimized models

Dear GengShan,

Hi! Many thanks for your awesome job! I am trying to use your pre-optimized models for evaluation.

bash scripts/render_mgpu.sh 0 human-cap logdir/human-cap/human-cap.pth "0" 64

However, the results seem very strange, as shown below, including the viewing directions and model motions.

human-cap-{0}-all
Could you please tell me what is going wrong here?

Thanks!

Questions on the evaluation on root pose prediction

Hi Gengshan,

Thanks for the great work!

I was wondering how the rotation error in degrees is computed in Table 4 in Appendix C.1. The caption says "Rotations are aligned to the ground-truth by a global rotation under chordal L2 distances"; does this mean that the rotation error reported in the table is the angle of the "global rotation" about a single axis from prediction to ground-truth?

It would be super helpful for me to understand it if you could point me to the script that computes this rotation error. Thanks!

CSE embedding for other object categories

Hi, thanks for sharing the code. I really like your work. I want to try BANMo on objects that are neither humans nor four-legged animals, but it seems like BANMo assumes one of those. Would the results degenerate if the CSE embeddings are not pre-trained? I noticed Tab. 5 mentions "pre-trained embedding" and suggests the pre-training is not too important as long as the initial pose is good.

Generation of synthetic datasets

Hi Gengshan,

Thanks for your great work!

I have a couple of questions about the synthetic datasets (Eagle).

  • How do you generate the 120-frame time sequence of obj files from the single-instance model?
  • Is there any motion sequence used?
  • I want to build synthetic cat datasets to evaluate results. Have you tried this, and do you have these kinds of datasets?

Training pipeline of PoseNet

Hello there!

Thanks a lot for sharing your work!

I have a couple of questions:

1. What is the dataset you used to train the PoseNet for root pose initialization?
2. What is the occ output from the optical flow model? It seems that it is loaded in the dataloader but not used anywhere in training.

Visualize of articulated shape

Many thanks for your great job!

Could you tell me how to generate dynamic 3d models like the "Articulated shape:" of cat-pikachiu on this page?

Bone reinitialization

Hi, thank you for sharing your code.

I have a question on the following line, correct_bones in the process of bone reinitialization.

bones,_ = correct_bones(model, bones, inverse=True)

Why do you correct bones using the inverse of rest pose transformation?
Then, are the bones in the rest pose or another state?

As far as I know, you transform bones again into the rest pose bones using the rest pose transformation afterwards.
(bones_rst, bone_rts_rst = correct_bones(self, self.nerf_models['bones']))

I'm wondering: what is the purpose of the rest pose?
Also, why didn't you use the default bones (before multiplying by the inverse of the rest pose transformation) as the rest pose?

Thank you.
Hope to hear from you soon!

About the 2D keypoint transfer metric

Hi, thanks for publishing this wonderful code.

I want to ask whether it is possible to evaluate the reconstructed 3D model using the 2D keypoint transfer metric defined in ViSER and LASR. Because BANMo can register correspondences between frames, I thought it would be possible. But since the main paper uses Chamfer distance instead of 2D keypoint transfer, I am also wondering why we can't use the 2D keypoint transfer metric; is it inappropriate in this case?
Thank you very much.

Question about mesh of Viser in Banmo

Thank you for your extremely good work.
I had a question while reproducing ViSER: the mesh generated for the cat is smoother but more poorly shaped than the mesh you include in BANMo (compare below; left is the mesh in BANMo, right is the mesh I generated by rerunning ViSER).
image
The same holds for ama-female, where the mesh we generated tends to have no hands (as shown in the comparison below).
image
The images above all use n_faces=8000 in the code and are fully trained with the official ViSER script.
