GART: Gaussian Articulated Template Models

Home Page: https://www.cis.upenn.edu/~leijh/projects/gart/

License: MIT License

Jupyter Notebook 8.24% Shell 0.55% Python 77.29% CMake 0.18% C++ 2.72% Cuda 10.90% C 0.12%

gart's Introduction

GART: Gaussian Articulated Template Models

Project Page; arXiv; YouTube video; Bilibili video

  • 2023.Nov.27: Code pre-release.

teaser

Install

Our code is tested on Ubuntu 20.04 with CUDA 11.8; you may have to adapt the following install script to your system. To install, run:

bash install.sh

Prepare Data

Template Models and Poses

SMPL for human body

Download SMPL v1.1 (SMPL_python_v.1.1.0.zip) from the official SMPL website, then move and rename SMPL_python_v.1.1.0/smpl/models/*.pkl to PROJECT_ROOT/data/smpl_model so that you get:

PROJECT_ROOT/data/smpl_model
    ├── SMPL_FEMALE.pkl
    ├── SMPL_MALE.pkl
    ├── SMPL_NEUTRAL.pkl

# to check the version of SMPL, here is the checksum of female pkl we are using
cksum SMPL_FEMALE.pkl
3668678829 247530000 SMPL_FEMALE.pkl
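
Optionally, you can sanity-check that the models load (a sketch assuming the smplx package set up by the install step; not part of the official instructions):

# Optional sanity check: load the neutral SMPL model from the folder above.
import torch
import smplx

model = smplx.SMPL("data/smpl_model", gender="neutral")
out = model(betas=torch.zeros(1, 10))   # mean shape, rest pose
print(out.vertices.shape)               # expect torch.Size([1, 6890, 3])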
AMASS (generation only)

Download the SMPL-X N package of the BMLrub subset from AMASS, unzip it, and put it into PROJECT_ROOT/data/amass/BMLrub.

For the generation application, you only need to download the SMPL models and AMASS poses. You can go to the generation section directly and skip the remaining data-download steps.

D-SMAL for dog body

If you want to work with dogs, please download smal_data from BITE and place the folder at lib_gart/smal/smal_data so you get:

lib_gart/smal/smal_data
    ├── mean_dog_bone_lengths.txt
    ├── my_smpl_data_SMBLD_v3.pkl
    ├── my_smpl_SMBLD_nbj_v3.pkl
    ├── new_dog_models
    │   ├── 39dogsnorm_newv3_dog_dog_0.obj
    ...
    │   └── X_scaled_39dogsnorm_newv3_dog.npy
    └── symmetry_inds.json

Real Videos

ZJU-MoCap

We use the data from Instant-nvr. Note that the poses from Instant-nvr differ from the original ZJU-MoCap; please follow the instructions in Instant-nvr to download their data: download ZJU-MoCap and their SMPL model smpl-meta.tar.gz. Link the unzipped data so you have:

PROJECT_ROOT/data/
    ├── smpl-meta
    │   ...
    │   └── SMPL_NEUTRAL.pkl
    └── zju_mocap
        ├── my_377
        ├── ...
        └── my_394
PeopleSnapshot

We use the data from InstantAvatar, including their pre-processed poses (already included when you clone this repo). First download the data from PeopleSnapshot. Then run DATA_ROOT=PATH_TO_UNZIPPED_PEOPLESNAPSHOT bash utils/preprocess_people_snapshot.sh to prepare the data.

Dog

Download our preprocessed data (pose estimated via BITE):

cd PROJECT_ROOT/data
gdown 1mPSnyLClyTwITPFrWDE70YOmJMXdnAZm
unzip dog_data_official.zip
UBC-Fashion

Download our preprocessed data (6 videos with pose estimated via ReFit):

cd PROJECT_ROOT/data
gdown 18byTvRqOqRyWOQ3V7lFOSV4EOHlKcdHJ
unzip ubc_release.zip

All the released data and logs can be downloaded from Google Drive.

Fit Real Videos

We provide examples for the four datasets in example.ipynb.

All fitting and testing scripts can be found under script.

For example, to fit all PeopleSnapshot videos, run:

bash script/fit_people_30s.sh

You may find other useful scripts under script. The running time of each configuration was measured on a laptop with an RTX 3080 GPU (16 GB VRAM) and an Intel i9-11950H processor with 4x16 GB RAM. We observe that, for some unknown reason, training is slower on our A40 cluster.

Text2GART

Please see text2gart.ipynb for an example.

Acknowledgement

Our code is based on several interesting and helpful projects:

TODO

  • clean the code
  • add bibtex

gart's Issues

About dL_depth and dL_mean

Hi Jiahui

Thanks for releasing your code!

I am trying to add depth supervision on top of your lib_render and noticed that you have modified the differentiable_renderer in lib_render.

dL_dmean

It seems that the gradients of means3D do not include the depth and alpha terms in backpropagation. I would like to know if this is intentional on your part (i.e., does it make the results better?). Based on my understanding, the gradients from dL_ddepths and dL_dalphas should also be included in the gradient of means3D.
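
To make the concern concrete, here is a toy autograd sketch (plain PyTorch, not the repo's CUDA rasterizer; all values are illustrative): when a loss term depends on a Gaussian's depth, the chain rule sends dL_ddepths back into means3D, which is exactly the term in question.

# Toy illustration: a depth loss yields a non-zero gradient on the 3D mean
# via the chain rule (this is autograd, not the hand-written CUDA backward).
import torch

mean3D = torch.randn(3, requires_grad=True)
R = torch.eye(3)                      # placeholder world-to-camera rotation
t = torch.tensor([0.0, 0.0, 2.0])     # placeholder camera translation

depth = (R @ mean3D + t)[2]           # per-Gaussian depth in the camera frame
loss = (depth - 1.5) ** 2             # toy depth-supervision term

loss.backward()
print(mean3D.grad)                    # non-zero: dL/ddepth flows into mean3D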

Results on UBC data is noisy

Hi, thanks for this excellent work.
The results on people_snapshot_public are quite good, but when I run on UBC data, the results are quite noisy, as shown below:
cano-pose
May I ask what the problem is? Thanks

Meaning of (A dot A0_inv) of SMPLTemplate

First and foremost, I'd like to express my appreciation for the outstanding work. I have a query regarding a specific part of the implementation.

In the forward function of SMPLTemplate, there's a line where the transformation matrix A is calculated as follows:
A = torch.einsum("bnij, njk->bnik", A, self.A0_inv)
I'm seeking clarification on the conceptual meaning behind the operation A dot A0_inv. As I understand it, A0 represents the relative transformation from the SMPL default pose joints (J) to the DaPose joints (J_dapose). Consequently, inv(A0) should denote the transformation from J_dapose back to J.

If my understanding is correct, then A in this context represents the transformation from J_dapose to the joints in the theta pose (J_theta). Could you please confirm if my interpretation is accurate or provide further explanation of A dot A0_inv if necessary?

Thank you in advance for your assistance!

The relevant code is attached here for convenience.

init_smpl_output = self._template_layer(
    betas=init_beta[None],
    body_pose=can_pose[None, 1:],
    global_orient=can_pose[None, 0],
    return_full_pose=True,
)
# A0: per-joint transforms from the rest pose to the canonical (DaPose) frame
J_canonical, A0 = init_smpl_output.J, init_smpl_output.A
A0_inv = torch.inverse(A0)

def forward(self, theta=None, xyz_canonical=None):
    # skinning: A maps the rest pose to the theta pose
    _, A = batch_rigid_transform(
        axis_angle_to_matrix(theta),
        self.J_canonical.expand(nB, -1, -1),
        self._template_layer.parents,
    )
    # compose with A0^{-1}: the result maps canonical (DaPose) points to theta
    A = torch.einsum("bnij, njk->bnik", A, self.A0_inv)
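
If it helps, here is a toy check of that composition (illustrative rigid transforms, not real SMPL output), matching your reading: with A0 mapping the rest pose to the DaPose canonical frame and A mapping the rest pose to the theta pose, A @ inverse(A0) takes canonical points directly to the theta pose.

# Toy verification of the composition A @ A0^{-1} (illustrative transforms)
import torch

A0 = torch.eye(4); A0[:3, 3] = torch.tensor([0.1, 0.0, 0.0])  # rest -> DaPose
A = torch.eye(4); A[:3, 3] = torch.tensor([0.0, 0.2, 0.0])    # rest -> theta

x_rest = torch.tensor([0.3, 0.4, 0.5, 1.0])  # homogeneous rest-pose point
x_canonical = A0 @ x_rest                    # the point in the DaPose frame
x_theta = A @ x_rest                         # the same point in the theta pose

# A @ A0^{-1} maps canonical points straight to the theta pose
assert torch.allclose((A @ torch.inverse(A0)) @ x_canonical, x_theta)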

VRAM usage and mesh generation

Hi, thanks for developing a fascinating algorithm!

I have just attempted running it on an Nvidia RTX 4070 with 12 GB VRAM, and get a CUDA out of memory error when running bash script/fit_people_30s.sh (it attempts to allocate ~13GB). I now wonder if there are any simple ways of reducing the VRAM usage during the fitting of the model — for example, which parameters in profiles/people/people_30s.yaml have the largest effect on memory?

Additionally, what would be the preferred way of generating a textured mesh from the learned avatar?

Thanks,
Filip

Meaning of A (rot) and A (t)

First and foremost, I want to express my admiration for the incredible work!
I am a bit confused: why do we use the top-left 3 × 3 block and the right 3 × 1 block rather than other settings?
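
For context, a minimal sketch of the standard reading (toy values, not the repo's tensors): A is a 4 × 4 homogeneous rigid transform, so its top-left 3 × 3 block is the rotation applied to points and its right 3 × 1 column is the translation; the bottom row is always [0, 0, 0, 1] and carries no information.

# A homogeneous rigid transform acts on a point x as R @ x + t
import torch

A = torch.eye(4)
A[:3, 3] = torch.tensor([0.1, 0.2, 0.3])  # toy translation

R, t = A[:3, :3], A[:3, 3]                # rotation block, translation column
x = torch.randn(3)
x_h = torch.cat([x, torch.ones(1)])       # homogeneous coordinates

assert torch.allclose((A @ x_h)[:3], R @ x + t)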

Exported .ply doesn't look like the rendered images

I used the save_gauspl_ply function in lib_gart/model_utils.py to export the Gaussians for the training view, but the result doesn't look the same as the exported renders, and I'm trying to figure out why.

Is it a bug in the save function or something you do differently in the renderer?

Here's an example using the lab scene from the Neumann dataset with InstantAvatar format

first-frame

It's not the best looking, but you can still see the facial details in the gif; they seem to get lost somehow in the .ply export. Here's the .ply visualized in antimatter's web viewer:
neumann_lab_exported_ply

When using custom data, the result is bad

How can I get cameras.npz and train_pose.npz? When I use data processed by InstantAvatar, the result after ./scripts/fit.sh is bad, like this. Can you give me some suggestions?
test

Rendering from different views

If I want to render around the object in a circular path, how can I achieve that? Looking forward to your reply, thank you!
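
One common approach (a sketch under assumptions: the function and names below are illustrative, not the repo's API, and it presumes the renderer accepts arbitrary world-to-camera extrinsics in place of the test cameras): sample camera positions on a circle around the subject and build a look-at extrinsic for each.

# Hypothetical helper: world-to-camera extrinsics on a circular orbit
# (OpenCV convention: x right, y down, z forward).
import numpy as np

def orbit_extrinsics(n_views=60, radius=3.0, height=0.0, center=np.zeros(3)):
    T_cw_list = []
    for ang in np.linspace(0.0, 2.0 * np.pi, n_views, endpoint=False):
        eye = center + np.array([radius * np.cos(ang), height, radius * np.sin(ang)])
        forward = center - eye
        forward /= np.linalg.norm(forward)
        right = np.cross(forward, np.array([0.0, 1.0, 0.0]))
        right /= np.linalg.norm(right)
        down = np.cross(forward, right)
        R_wc = np.stack([right, down, forward], axis=1)  # camera axes in world
        T_cw = np.eye(4)
        T_cw[:3, :3] = R_wc.T                            # world-to-camera rotation
        T_cw[:3, 3] = -R_wc.T @ eye                      # world-to-camera translation
        T_cw_list.append(T_cw)
    return T_cw_list

Each returned T_cw can then be passed to the renderer as the camera pose for one frame of the orbit.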

Code release of Banana

I recently read your paper titled "Banana" and found it extremely insightful. I'm interested in experimenting with the techniques you've described. Could you please let me know when you plan to release the code?

AttributeError: 'SMPLOutput' object has no attribute 'J'

When I run this program, the following error occurs.
s8maolab | INFO | Dec-13-16:37:57 | Optimization with ./profiles/zju/zju_30s.yaml [solver.py:1389]
WARNING: You are using a SMPL model, with only 10 shape coefficients.
Using predefined pose: da_pose
python-BaseException
Traceback (most recent call last):
  File "/.pycharm_helpers/pydev/pydevd.py", line 1500, in _exec
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "/.pycharm_helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents + "\n", file, 'exec'), glob, loc)
  File "/GART/solver.py", line 1390, in <module>
    _, optimized_seq = solver.run(data_provider)
  File "/GART/solver.py", line 786, in run
    ) = self._get_model_optimizer(betas=init_beta, add_bones_total_t=total_t)
  File "/GART/solver.py", line 181, in _get_model_optimizer
    template = get_template(
  File "/GART/lib_gart/templates.py", line 29, in get_template
    template = SMPLTemplate(
  File "/GART/lib_gart/templates.py", line 69, in __init__
    J_canonical, A0 = init_smpl_output.J, init_smpl_output.A
AttributeError: 'SMPLOutput' object has no attribute 'J'

smpl data correction

I noticed that in zju_mocap.py you correct the SMPL data like this.

        # Use camera extrinsics to rotate the SMPL to each camera coordinate frame!

        # the R,t is used like this, stored in cam
        # i.e. the T stored in cam is actually p_c = T_cw @ p_w
        # def get_rays(H, W, K, R, T):
        #     # calculate the camera origin
        #     rays_o = -np.dot(R.T, T).ravel()
        #     # calculate the world coodinates of pixels
        #     i, j = np.meshgrid(
        #         np.arange(W, dtype=np.float32), np.arange(H, dtype=np.float32), indexing="xy"
        #     )
        #     xy1 = np.stack([i, j, np.ones_like(i)], axis=2)
        #     pixel_camera = np.dot(xy1, np.linalg.inv(K).T)
        #     pixel_world = np.dot(pixel_camera - T.ravel(), R)
        #     # calculate the ray direction
        #     rays_d = pixel_world - rays_o[None, None]
        #     rays_d = rays_d / np.linalg.norm(rays_d, axis=2, keepdims=True)
        #     rays_o = np.broadcast_to(rays_o, rays_d.shape)
        #     return rays_o, rays_d

        # ! the cams R is stored in very low precision; use SVD to project it back to SO(3)
        for cid in range(num_cams):
            _R = self.cams["R"][cid]
            u, s, vh = np.linalg.svd(_R)
            new_R = u @ vh
            self.cams["R"][cid] = new_R

        # this is copied
        smpl_layer = SMPLLayer(osp.join(osp.dirname(__file__), "../data/smpl-meta/SMPL_NEUTRAL.pkl"))

        # * Load smpl to camera frame
        self.smpl_theta_list, self.smpl_trans_list, smpl_beta_list = [], [], []
        self.meta = []
        for img_fn in self.ims:
            cam_ind = int(img_fn.split("/")[-2])
            frame_idx = int(img_fn.split("/")[-1].split(".")[0])
            self.meta.append({"cam_ind": cam_ind, "frame_idx": frame_idx})
            smpl_fn = osp.join(root, "smpl_params", f"{frame_idx}.npy")
            smpl_data = np.load(smpl_fn, allow_pickle=True).item()
            T_cw = np.eye(4)
            T_cw[:3, :3], T_cw[:3, 3] = (
                np.array(self.cams["R"][cam_ind]),
                np.array(self.cams["T"][cam_ind]).squeeze(-1) / 1000.0,
            )

            smpl_theta = smpl_data["poses"].reshape((24, 3))
            assert np.allclose(smpl_theta[0], 0)
            smpl_rot, smpl_trans = smpl_data["Rh"][0], smpl_data["Th"]
            smpl_R = axangle2mat(
                smpl_rot / (np.linalg.norm(smpl_rot) + 1e-6), np.linalg.norm(smpl_rot)
            )

            T_wh = np.eye(4)
            T_wh[:3, :3], T_wh[:3, 3] = smpl_R.copy(), smpl_trans.squeeze(0).copy()

            T_ch = T_cw.astype(np.float64) @ T_wh.astype(np.float64)

            smpl_global_rot_d, smpl_global_rot_a = mat2axangle(T_ch[:3, :3])
            smpl_global_rot = smpl_global_rot_d * smpl_global_rot_a
            smpl_trans = T_ch[:3, 3]  # 3
            smpl_theta[0] = smpl_global_rot
            beta = smpl_data["shapes"][0][:10]

            # ! Because SMPL global rot is rot around joint-0, have to correct this in the global translation!!
            _pose = axis_angle_to_matrix(torch.from_numpy(smpl_theta)[None])
            so = smpl_layer(
                torch.from_numpy(beta)[None],
                body_pose=_pose[:, 1:],
            )
            j0 = (so.joints[0, 0]).numpy()
            t_correction = (_pose[0, 0].numpy() - np.eye(3)) @ j0
            smpl_trans = smpl_trans + t_correction

            self.smpl_theta_list.append(smpl_theta)
            smpl_beta_list.append(beta)
            self.smpl_trans_list.append(smpl_trans)

Does this correction have any reference? Since the camera pose is used during the correction, one frame has different SMPL data in different views. It feels quite strange...
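
For reference, my reading of the algebra behind t_correction (an interpretation, not confirmed by the authors): the ZJU Rh/Th convention applies the global rotation about the origin, p = R x + t, while SMPL's global_orient rotates about joint 0, p = R (x - j_0) + j_0 + t'. Equating the two gives

$$R(x - j_0) + j_0 + t' = R x + t \;\Longrightarrow\; t' = t + (R - I)\,j_0,$$

which is exactly smpl_trans + (_pose[0, 0] - eye(3)) @ j0 in the code above. Since T_ch folds in the camera extrinsics, the per-camera SMPL parameters describe the same body, just expressed in each camera's frame.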

training condition for own dataset

Hello,
I tried the NeuMan dataset using the UBC-Fashion training configuration and the PeopleSnapshot training configuration, but the results were not so good.
Should we fine-tune the hyperparameters ourselves?
I'd like to know how to optimize this, or whether this is an expected result.
da-pose

May I know how to get the LPIPS* score?

Hi, thanks for the excellent work. In Table 3 of your paper, you use "LPIPS*" (with a star) as a metric instead of the normal "LPIPS" (which is usually zero point something). May I know how to calculate the "LPIPS*" score? Thanks
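Not an official answer, but several related avatar papers report LPIPS scaled by 1000 under a starred name; a sketch assuming that convention (using the standard lpips package):

# Assumption: LPIPS* = LPIPS x 1000 (a common convention in related work;
# not confirmed by the GART authors).
import lpips
import torch

loss_fn = lpips.LPIPS(net="alex")            # standard LPIPS backbone
img0 = torch.rand(1, 3, 256, 256) * 2 - 1    # LPIPS expects inputs in [-1, 1]
img1 = torch.rand(1, 3, 256, 256) * 2 - 1

lpips_star = loss_fn(img0, img1).item() * 1000.0
print(f"LPIPS*: {lpips_star:.2f}")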

Custom data

Congratulations on this excellent work!
I wonder how to run this work on my own data. For example, after capturing a monocular video, how do I run your method? How should I process the data for training?
Many thanks!
