caizhongang / smpler-x


Official Code for "SMPLer-X: Scaling Up Expressive Human Pose and Shape Estimation"

Home Page: https://caizhongang.github.io/projects/SMPLer-X/

License: Other

Python 92.01% C++ 2.17% C 1.74% Cuda 3.81% Shell 0.27%

smpler-x's Issues

How to directly output the BVH file?

Hi, thanks for your great work.
As described in the README, we can get the obj file and the mesh jpg, but my goal is to obtain a BVH file that contains the pose keypoints.
I assume the project already has a step that extracts the pose keypoints, like:

So what should we do if we want the BVH file?
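
For context, a BVH file is plain text: a HIERARCHY section that declares joints with offsets and rotation channels, followed by a MOTION section with one line of channel values per frame. The sketch below only illustrates that layout for a hypothetical two-joint chain. The joint names, offsets, and angles are placeholders, and converting the model's predicted rotations into per-joint Euler angles is assumed to happen elsewhere.

```python
# Minimal BVH writer sketch. The skeleton and motion values are hypothetical
# placeholders, not SMPLer-X output.

def write_joint(lines, joint, depth, is_root=False):
    """Append the HIERARCHY entry for `joint` (a dict) and its children."""
    pad = "  " * depth
    lines.append(f"{pad}{'ROOT' if is_root else 'JOINT'} {joint['name']}")
    lines.append(pad + "{")
    ox, oy, oz = joint["offset"]
    lines.append(f"{pad}  OFFSET {ox:.4f} {oy:.4f} {oz:.4f}")
    if is_root:
        # The root carries translation plus rotation channels.
        lines.append(f"{pad}  CHANNELS 6 Xposition Yposition Zposition "
                     "Zrotation Xrotation Yrotation")
    else:
        lines.append(f"{pad}  CHANNELS 3 Zrotation Xrotation Yrotation")
    children = joint.get("children", [])
    for child in children:
        write_joint(lines, child, depth + 1)
    if not children:
        # Leaf joints get an End Site so the last bone has a length.
        ex, ey, ez = joint.get("end", (0.0, 0.0, 0.0))
        lines.append(f"{pad}  End Site")
        lines.append(pad + "  {")
        lines.append(f"{pad}    OFFSET {ex:.4f} {ey:.4f} {ez:.4f}")
        lines.append(pad + "  }")
    lines.append(pad + "}")


def build_bvh(root, frames, fps=30.0):
    """Return BVH text; each frame lists channel values in hierarchy order."""
    lines = ["HIERARCHY"]
    write_joint(lines, root, 0, is_root=True)
    lines += ["MOTION", f"Frames: {len(frames)}", f"Frame Time: {1.0 / fps:.6f}"]
    for frame in frames:
        lines.append(" ".join(f"{value:.4f}" for value in frame))
    return "\n".join(lines) + "\n"


skeleton = {
    "name": "pelvis", "offset": (0.0, 0.0, 0.0),
    "children": [{"name": "spine", "offset": (0.0, 10.0, 0.0), "end": (0.0, 10.0, 0.0)}],
}
# 6 root channels + 3 spine channels = 9 values per frame (angles in degrees).
frames = [[0.0] * 9, [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 15.0, 0.0]]
bvh_text = build_bvh(skeleton, frames)
print(bvh_text)
```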

Could find no file with path '../demo/result/xxx/img/%06d.jpg' and index in the range 0-4

Following the Docker support section, when I run:

docker run  --gpus all -v <vid_input_folder>:/smplerx_inference/vid_input \
        -v <vid_output_folder>:/smplerx_inference/vid_output \
        wcwcw/smplerx_inference:v0.2 --vid <video_name>.mp4

It shows "No adapter used", but my folder structure seems to be OK. Is there anything wrong?

Could find no file with path '../demo/results/walking/img/%06d.jpg' and index in the range 0-4

When I try to run inference, it shows these errors. Can someone tell me why this happens and how to deal with it?
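
This message is what ffmpeg's image-sequence reader prints when no file matches the numbered pattern for any start index in the range it tries (0 to 4 here). A quick way to narrow it down is to check whether the frame folder exists and whether any of those frames were actually written. The sketch below is a hypothetical diagnostic helper, not part of the repository, and the example pattern is copied from the error message.

```python
import os


def first_existing_frame(pattern, start_range=5):
    """Return the first existing path among pattern % 0 .. pattern % (start_range - 1)."""
    for index in range(start_range):
        candidate = pattern % index
        if os.path.isfile(candidate):
            return candidate
    return None


def diagnose(pattern):
    """Describe why an image-sequence pattern such as 'img/%06d.jpg' may not resolve."""
    folder = os.path.dirname(pattern)
    if not os.path.isdir(folder):
        return f"folder missing: {folder}"
    hit = first_existing_frame(pattern)
    if hit is None:
        count = len(os.listdir(folder))
        return f"no frame with index 0-4 in {folder} ({count} files present)"
    return f"ok: {hit}"


print(diagnose("../demo/results/walking/img/%06d.jpg"))
```

If the folder is missing or empty, the step that renders the per-frame images most likely failed earlier, so the log before this message is the place to look.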

About hand token

Hello, I just found that the hand_token variable is not used during inference. Is this intentional? I ask because the hand results I get are not good.

ModuleNotFoundError: No module named 'vit_adapter_utils'

When running inference with adapter_name="vit_adapter" specified, I keep running into this error.

About Paper

Thanks for sharing this amazing work! I am wondering whether the corresponding paper has been released.

Described installation protocol does not work?!

We were not able to get the project to run at all with the provided instructions (maybe fixing the requirements would help).

The only thing that worked for us was to follow the Dockerfile within the provided image, which boils down to the commands below (if in doubt, look at the Dockerfile). NOTE: we only tested this for inference:

conda create -n smplerx python=3.9.12 -y
conda activate smplerx
sudo apt-get update && sudo apt-get install -y ffmpeg
sudo apt-get install -y python3-opengl libosmesa6
pip install mmcv-full==1.7.1 -f https://download.openmmlab.com/mmcv/dist/cu113/torch1.12.0/index.html
conda install pytorch==1.12.0 torchvision==0.13.0 torchaudio==0.12.0 cudatoolkit=11.3 -c pytorch -y
cd SMPLer-X/main/transformer_utils/
pip install -v -e .
pip install torchgeometry --find-links .
pip install -r {PATH_TO_FILE}/requirements_fixed.txt
conda update ffmpeg

Extract config.py and conversions.py from the Docker image (needed because of version conflicts) and place them wherever the respective packages mmcv and torchgeometry are installed, replacing the existing files. The following are the commands from within the Docker container (assuming it runs on the same machine).

{COPY} smplerx/main/transformer_utils/mmcv/utils/config.py {WHEREVER_YOUR_CONDA_PACKAGES_ARE}/site-packages/mmcv/utils/config.py
{COPY} smplerx/main/transformer_utils/torchgeometry/core/conversions.py {WHEREVER_YOUR_CONDA_PACKAGES_ARE}/site-packages/torchgeometry/core/conversions.py
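
If it is unclear where the installed packages live, the target paths can be looked up from the active environment. The sketch below uses the standard library's importlib to locate a file inside an installed top-level package without importing it. The helper name is hypothetical; the two package-relative paths are the ones named in the copy commands above.

```python
import importlib.util
import os


def installed_file(package, relative_path):
    """Return the on-disk path of relative_path inside an installed package, or None."""
    spec = importlib.util.find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        return None  # package not installed, or not a regular package
    package_dir = list(spec.submodule_search_locations)[0]
    return os.path.join(package_dir, *relative_path.split("/"))


for pkg, rel in (("mmcv", "utils/config.py"),
                 ("torchgeometry", "core/conversions.py")):
    print(pkg, "->", installed_file(pkg, rel))
```

Run it with the smplerx environment activated so that the paths point into that environment's site-packages.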

If you are not familiar with Docker, this is how you get into their container (you may leave out the nvidia arg):
sudo docker run -it --rm --runtime=nvidia --entrypoint="" {CONTAINER_ID} bash

requirements_fixed.txt

Is it possible to get 3D pose under a global coordinate?

Currently, the reconstruction is done on the image patch of the detected human bounding box. I tried directly rendering the reconstructed poses of a video with fixed camera parameters, but the result kept jittering. Is there an easy way to transfer from the predicted local 3D pose (with respect to the image patch) to a global 3D pose based on the bounding box parameters?
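
One geometric relation that is relevant here: if the predicted camera is a pinhole model defined on the resized crop, the equivalent camera on the full image is obtained by scaling the focal length and principal point by the bounding-box size and shifting by the box origin. The sketch below shows only that relation. The function name, argument layout, and example numbers are assumptions for illustration, not the repository's confirmed method.

```python
def crop_to_image_intrinsics(focal_crop, princpt_crop, crop_shape, bbox_xywh):
    """Map pinhole intrinsics defined on a resized crop to the full image.

    focal_crop, princpt_crop: (x, y) in crop pixels
    crop_shape: (height, width) of the network input crop
    bbox_xywh: (x, y, w, h) of the box in full-image pixels
    """
    crop_h, crop_w = crop_shape
    x, y, w, h = bbox_xywh
    sx, sy = w / crop_w, h / crop_h  # crop pixel -> image pixel scale
    focal_img = (focal_crop[0] * sx, focal_crop[1] * sy)
    princpt_img = (princpt_crop[0] * sx + x, princpt_crop[1] * sy + y)
    return focal_img, princpt_img


# Hypothetical numbers: a 256x192 (h x w) crop taken from a 384x512 (w x h) box.
focal, princpt = crop_to_image_intrinsics(
    (5000.0, 5000.0), (96.0, 128.0), (256, 192), (100.0, 50.0, 384.0, 512.0))
print(focal, princpt)
```

Because these intrinsics change with every bounding box, rendering with one fixed camera additionally requires re-expressing the per-frame translation for that camera.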

Results flipped occasionally

Hi! I found that some frames of the results are flipped. I checked and realized it was due to my wrong modifications to the pytorch conversions.py script. I used this one as a reference to make sure the math is right: https://github.com/camenduru/SMPLer-X-colab/blob/main/SMPLer_X_colab.ipynb

Skeleton joint data of the human body

This is really exciting work. Can I directly apply the skeleton joint data from the model's inference results to a self-built human body model? I visualized the joint data from the inference results, but the output did not look like a skeleton. @caizhongang

Custom Data

Thanks for your nice work! I have some data whose format differs from that of the datasets in your project. Can I run this model on my own data? How could I do this? I would be thankful for any advice!

🦒 colab

Thanks for the project ❤️ I made a colab. 🥳 I hope you like it. https://github.com/camenduru/SMPLer-X-colab

Unity model binding

Hello, I saw a related video on YouTube, and I want to ask how to bind the model's inference results to a character model in Unity. This should be very interesting. Is there any expert who can offer some help? Thank you.

What version of mmdet?

There are some errors when using mmdet. The version of mmcv is 1.7.1, and it only supports mmdet 2.27.2. However, simple code like this
model = init_detector(config_file, checkpoint_file, device='cuda:0') does not work. So which version of mmdet should be used?
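
When reporting a version problem like this, it helps to list what is actually installed in the active environment. The sketch below reads versions from package metadata with the standard library, so it works even when importing the package itself fails. The helper name and the default package list are illustrative choices.

```python
from importlib import metadata


def installed_versions(names=("mmcv-full", "mmdet", "torch", "torchvision")):
    """Return {distribution name: version string, or None if not installed}."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


for name, version in installed_versions().items():
    print(f"{name}: {version or 'not installed'}")
```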

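How to animate a virtual character

I would like to ask how to convert the results into an animation. I am a beginner and just want to know the general path. It does not need to be very detailed; a rough outline of the steps would be enough. Thank you very much.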

Can not find SMPLX_to_J14.pkl in links

Dear Author:
Thanks a lot for your work. When I download the files listed in your README to build human_models_files, I cannot find the file "SMPLX_to_J14.pkl" in smplx. Could you please help me with that?

The link in the README is https://smpl-x.is.tue.mpg.de/download.php

It seems that joint_image can't be utilized as 2D joints

Hi, SMPLer-X already provides 3D information, but I still want to obtain 2D joints for my work (I would rather not add another 2D joint detector). Here is my modification (based on InterWild).
SMPLer_X.py

def project_to_body_space(part_name, bbox):
    hand_bbox_w = bbox[:, None, 2] - bbox[:, None, 0]
    hand_bbox_h = bbox[:, None, 3] - bbox[:, None, 1]
    joint_img[:, smpl_x.pos_joint_part[part_name], 0] *= (
        (hand_bbox_w / cfg.output_hand_hm_shape[2]))
    joint_img[:, smpl_x.pos_joint_part[part_name], 1] *= (
        (hand_bbox_h / cfg.output_hand_hm_shape[1]))
    joint_img[:, smpl_x.pos_joint_part[part_name], 0] += bbox[:, None, 0]
    joint_img[:, smpl_x.pos_joint_part[part_name], 1] += bbox[:, None, 1]
        
for part_name, bbox in (('lhand', lhand_bbox), ('rhand', rhand_bbox)):
    project_to_body_space(part_name, bbox)

joint_img[:,smpl_x.pos_joint_part['body'],0] *= (cfg.input_body_shape[1]/ cfg.output_hm_shape[2])
joint_img[:,smpl_x.pos_joint_part['body'],1] *= (cfg.input_body_shape[0]/ cfg.output_hm_shape[1])

joint_img[:,:,0] *= (cfg.input_img_shape[1] / cfg.input_body_shape[1])
joint_img[:,:,1] *= (cfg.input_img_shape[0] / cfg.input_body_shape[0])

inference.py

joint_img = out['joint_img'].cpu().numpy()[0]
joint_img_xy1 = np.concatenate((joint_img[:,:2], np.ones_like(joint_img[:,:1])),1)
joint_img = np.round(np.dot(bb2img_trans, joint_img_xy1.transpose(1,0)).transpose(1,0)).astype(int)

lhand_bbox = out['lhand_bbox'].cpu().numpy().reshape(2,2)
hand_bbox_xyl = np.concatenate((lhand_bbox, np.ones_like(lhand_bbox[:,:1])),1)
lhand_bbox = np.round(np.dot(bb2img_trans, hand_bbox_xyl.transpose(1,0)).transpose(1,0)).astype(int)
lhand_bbox = [lhand_bbox[0,0], lhand_bbox[0,1], lhand_bbox[1,0], lhand_bbox[1,1]]  

rhand_bbox = out['rhand_bbox'].cpu().numpy().reshape(2,2)
hand_bbox_xyr = np.concatenate((rhand_bbox, np.ones_like(rhand_bbox[:,:1])),1)
rhand_bbox = np.round(np.dot(bb2img_trans, hand_bbox_xyr.transpose(1,0)).transpose(1,0)).astype(int)  
rhand_bbox = [rhand_bbox[0,0], rhand_bbox[0,1], rhand_bbox[1,0], rhand_bbox[1,1]]

When I use the default parameters, the results seem weird: the joints are not accurate.

After modifying the code as below:

# lhand_bbox = restore_bbox(lhand_bbox_center, lhand_bbox_size, cfg.input_hand_shape[1] / cfg.input_hand_shape[0], 2).detach()  
# rhand_bbox = restore_bbox(rhand_bbox_center, rhand_bbox_size, cfg.input_hand_shape[1] / cfg.input_hand_shape[0], 2).detach()
lhand_bbox = restore_bbox(lhand_bbox_center, lhand_bbox_size, cfg.input_hand_shape[1] / cfg.input_hand_shape[0], 1.5).detach()  
rhand_bbox = restore_bbox(rhand_bbox_center, rhand_bbox_size, cfg.input_hand_shape[1] / cfg.input_hand_shape[0], 1.5).detach()
# bbox = process_bbox(mmdet_box_xywh, original_img_width, original_img_height)
bbox = process_bbox(mmdet_box_xywh, original_img_width, original_img_height, ratio=1.1)

Things get better, but still not accurate.

How to explain this?
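
For reference, the coordinate chain that the snippets above implement for body joints can be written out in isolation: heat-map units are scaled to the input crop, then a 2x3 affine (bb2img_trans in the inference code above) maps crop pixels to original-image pixels. The sketch below is a plain-Python illustration with hypothetical shapes and a hypothetical affine. It does not explain the inaccuracy, but it makes each step easy to check by hand.

```python
def heatmap_to_image_xy(joints_hm, hm_shape, crop_shape, bb2img_trans):
    """Map (x, y) joints from heat-map units to original-image pixels.

    joints_hm: iterable of (x, y) in heat-map units
    hm_shape, crop_shape: (height, width)
    bb2img_trans: 2x3 affine given as two rows of three numbers
    """
    hm_h, hm_w = hm_shape
    crop_h, crop_w = crop_shape
    (a, b, c), (d, e, f) = bb2img_trans
    result = []
    for x, y in joints_hm:
        # Step 1: heat-map units -> crop pixels.
        cx = x / hm_w * crop_w
        cy = y / hm_h * crop_h
        # Step 2: crop pixels -> image pixels via the affine transform.
        result.append((a * cx + b * cy + c, d * cx + e * cy + f))
    return result


# Hypothetical example: an 8x6 heat map, a 512x384 crop, and a crop that was
# downscaled by 2 from an image region starting at (10, 20).
print(heatmap_to_image_xy([(3.0, 4.0)], (8, 6), (512, 384),
                          ((2.0, 0.0, 10.0), (0.0, 2.0, 20.0))))
```

Hand joints need one extra step before this chain, because their coordinates are relative to the predicted hand box rather than to the body crop.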

Recover mesh

During inference, an npz file is written to the output for each image. How can I recover the 3D model image from the npz file?

FPS comparison with other SOTAs?

Hi!

Do you have any comparison of inference speed (FPS) between your code and other SOTA methods such as OSX?

Thanks!

About HumanData

Great work! When will the training data in HumanData format be released?

ValueError: buffer size does not match array size

python inference.py --num_gpus 1 --exp_name output/demo_inference_单人半身 --pretrained_model smpler_x_h32 --agora_benchmark agora_model --img_path ../demo/images/单人半身 --start 1 --end 514 --output_folder ../demo/results/单人半身 --show_verts --show_bbox --save_mesh
/home/bruce/anaconda3/envs/smplerx/lib/python3.8/site-packages/mmcv/__init__.py:20: UserWarning: On January 1, 2023, MMCV will release v2.0.0, in which it will remove components related to the training process and add a data transformation module. In addition, it will rename the package names mmcv to mmcv-lite and mmcv-full to mmcv. See https://github.com/open-mmlab/mmcv/blob/master/docs/en/compatibility.md for more details.
  warnings.warn(
Traceback (most recent call last):
  File "inference.py", line 297, in <module>
    main()
  File "inference.py", line 78, in main
    from base import Demoer
  File "/media/bruce/3696C50196C4C31B/zxx/SMPLer-X/main/../common/base.py", line 11, in <module>
    from SMPLer_X import get_model
  File "/media/bruce/3696C50196C4C31B/zxx/SMPLer-X/main/../main/SMPLer_X.py", line 4, in <module>
    from nets.smpler_x import PositionNet, HandRotationNet, FaceRegressor, BoxNet, HandRoI, BodyRotationNet
  File "/media/bruce/3696C50196C4C31B/zxx/SMPLer-X/main/../common/nets/smpler_x.py", line 6, in <module>
    from utils.human_models import smpl_x
  File "/media/bruce/3696C50196C4C31B/zxx/SMPLer-X/main/../common/utils/human_models.py", line 175, in <module>
    smpl_x = SMPLX()
  File "/media/bruce/3696C50196C4C31B/zxx/SMPLer-X/main/../common/utils/human_models.py", line 20, in __init__
    self.j14_regressor = pickle.load(f, encoding='latin1')
ValueError: buffer size does not match array size

Does anyone know how to solve this problem? Thank you very much!

I want to use this powerful algorithm for indoor scenes. Do I have to finetune this model?

I noticed that the pretrained model was trained on many datasets and has strong generalization capabilities. I want to apply it to real indoor scenes, but I don't have my own dataset. Do you think I still need to finetune using the 5 public datasets you used?

Thanks!

SMPLX_to_J14.pkl

SMPLX_to_J14.pkl was not found. Where can I get the file?

Image normalization and VIT

I noticed that there is only one transform, ToTensor(), in the DataLoader.
Why don't you apply image normalization (mean, std) before the first ViT layers?

Problem regarding estimation accuracy in Foot Region

Hi, I ran the project on other videos and it worked generally well. But I observed some differences between the pose estimation output and the actual human pose, especially in the foot area. The attached image shows a clear divergence, where the estimated pose points are not aligned with the physical position of the foot:

I tried models s to h and the same problem occurred in all models. Can you give your opinion on this matter? Is this a known issue or could you recommend any method to improve the performance?

Thanks in advance.

Any reason for choosing mmcv_full from sensetime?

Hi,

Do you have any reason for choosing mmcv_full from sensetime?

wget http://download.openmmlab.sensetime.com/mmcv/dist/cu113/torch1.12.0/mmcv_full-1.7.1-cp38-cp38-manylinux1_x86_64.whl

I couldn't connect to that link, so I chose to use openmmlab's but I'm curious about this.

Thanks!

Error when running "init_detector" in inference

Thanks for the nice work.

I followed the installation instructions but ran into an error when running inference. The error happened in init_detector, which then calls into mmdet, mmcv, and importlib.

Do you have any idea why this happens? Is it a package version issue?

Traceback (most recent call last):
  File "inference.py", line 188, in <module>
    main()
  File "inference.py", line 71, in main
    model = init_detector(config_file, checkpoint_file, device='cuda:0')  # or device='cuda:0'
  File "/home/anaconda3/envs/smplerx/lib/python3.8/site-packages/mmdet/apis/inference.py", line 33, in init_detector
    config = mmcv.Config.fromfile(config)
  File "/home/anaconda3/envs/smplerx/lib/python3.8/site-packages/mmcv/utils/config.py", line 340, in fromfile
    cfg_dict, cfg_text = Config._file2dict(filename,
  File "/home/anaconda3/envs/smplerx/lib/python3.8/site-packages/mmcv/utils/config.py", line 208, in _file2dict
    mod = import_module(temp_module_name)
  File "/home/anaconda3/envs/smplerx/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 843, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/tmp/tmpngpqtodm/tmpti3mss0b.py", line 1, in <module>
NameError: name 'false' is not defined
Exception ignored in: <function _TemporaryFileCloser.__del__ at 0x7fcdc5affa60>
Traceback (most recent call last):
  File "/home/anaconda3/envs/smplerx/lib/python3.8/tempfile.py", line 440, in __del__
    self.close()
  File "/home/anaconda3/envs/smplerx/lib/python3.8/tempfile.py", line 436, in close
    unlink(self.name)
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmpngpqtodm/tmpti3mss0b.py'
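
The traceback shows mmcv executing the config file as a Python module and failing on its first line because the name false is undefined. Lowercase true, false, and null are JSON literals rather than Python ones, so one plausible cause (an assumption, not confirmed in this thread) is that the file passed as config_file contains JSON-style content instead of the expected Python config. The sketch below scans a file's text for such names; the helper name and sample strings are hypothetical.

```python
import ast

JSON_ONLY_NAMES = {"true", "false", "null"}


def json_literals_in_python(source):
    """Return sorted (line, name) pairs where JSON-only literals appear as Python names."""
    tree = ast.parse(source)
    hits = [(node.lineno, node.id)
            for node in ast.walk(tree)
            if isinstance(node, ast.Name) and node.id in JSON_ONLY_NAMES]
    return sorted(hits)


# A JSON payload parses as a Python dict expression but fails when executed.
print(json_literals_in_python('{"payload": {"enabled": false}}'))
# A regular Python config uses True/False/None and produces no hits.
print(json_literals_in_python("model = dict(type='Detector', pretrained=None)"))
```

To check a real file, pass its contents, for example json_literals_in_python(open(config_file).read()), and inspect the first line if any hits are reported.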

ImportError: DLL load failed while importing _ext: The specified procedure could not be found.

I only want to use inference to get the npz files from a video, so I commented out lines 158 to 184 of inference.py. Then this error appeared. Does anyone have any clues?

TypeError: register_module() got an unexpected keyword argument 'froce'

I got this error during inference:

even if I installed the packages with correct versions:

I also can't find the keyword "froce" defined anywhere in the mmcv or mmdet source code.

The hand details are not extracted

Hi, thanks for your great work again.
But for me it seems that the hand details are not extracted. Why is that? I know that MANO, which is officially provided alongside SMPL, can capture hand details. Does your pre-trained model not include this module? I see that all of your YouTube videos have hand details.
The model that I use is "SMPLer-X-H32":

Looking forward to your early reply.

Error: libcuda.so: wrong ELF class: ELFCLASS32

When I run
sh slurm_inference.sh test_video mp4 24 smpler_x_s32
the following error occurs:
Could not load library libcudnn_cnn_infer.so.8. Error: libcuda.so: wrong ELF class: ELFCLASS32
How can I solve it?
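
"wrong ELF class: ELFCLASS32" means the loader picked up a 32-bit libcuda.so while the running process is 64-bit. Which copy gets picked up depends on the library search path, so it can help to check the candidates directly. The sketch below reads the class byte from the ELF header; the helper name is hypothetical and the paths to inspect are up to the reader.

```python
def elf_class(path):
    """Return '32-bit', '64-bit', 'unknown', or 'not ELF' for the file at path."""
    with open(path, "rb") as handle:
        header = handle.read(5)
    if len(header) < 5 or header[:4] != b"\x7fELF":
        return "not ELF"
    # Byte 4 (EI_CLASS) is 1 for ELFCLASS32 and 2 for ELFCLASS64.
    return {1: "32-bit", 2: "64-bit"}.get(header[4], "unknown")


if __name__ == "__main__":
    import sys
    for candidate in sys.argv[1:]:
        print(candidate, "->", elf_class(candidate))
```

Run it on each libcuda.so found on the machine; a 32-bit copy that appears earlier in the search path than the 64-bit one would produce this error.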

Use Pytorch3D to render visualization

Firstly, I would like to express my admiration for your impressive work. It has been a great resource for me.

I am currently trying to understand how to use PyTorch3D for 2D overlap visualization.

I have attempted to write my own code to achieve this, but the results have been unsatisfactory, with strange overlapping occurring. I suspect that I might be missing some crucial steps or misusing some parameters.

Could you possibly provide a detailed description or guide on how to use PyTorch3D for 2D overlap visualization, particularly with respect to the use of predicted camera parameters? Any examples or references would be greatly appreciated.

Thank you in advance for your time and assistance. I look forward to your response.

Best regards

Some question about the docker support

docker run --gpus all -v <vid_input_folder>:/smplerx_inference/vid_input \
        -v <vid_output_folder>:/smplerx_inference/vid_output \
        wcwcw/smplerx_inference:v0.2 --vid <video_name>.mp4

What do <vid_input_folder> and <vid_output_folder> mean? Are they the folders /SMPLer-X-main/demo/videos and /SMPLer-X-main/demo/results?
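
In docker run, each -v option has the form host_path:container_path, so the two placeholders are folders on the host machine that get mounted at the fixed paths inside the container. Presumably any host folder that contains the video works as input, and any writable host folder works as output. The sketch below assembles the same command from two host folders. The helper name and example folder names are hypothetical; the image tag and container paths are taken from the command above.

```python
import os
import shlex


def build_docker_command(host_input_dir, host_output_dir, video_name,
                         image="wcwcw/smplerx_inference:v0.2"):
    """Return the docker argv list that mounts two host folders into the container."""
    return [
        "docker", "run", "--gpus", "all",
        # host folder holding <video_name>.mp4 -> container input folder
        "-v", f"{os.path.abspath(host_input_dir)}:/smplerx_inference/vid_input",
        # host folder that receives results -> container output folder
        "-v", f"{os.path.abspath(host_output_dir)}:/smplerx_inference/vid_output",
        image, "--vid", f"{video_name}.mp4",
    ]


command = build_docker_command("demo/videos", "demo/results", "walking")
print(shlex.join(command))  # pass `command` to subprocess.run to execute it
```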

about training (video/photo) and different size of betas in datasets

I saw on the main page that the SMPLer-X inference scripts expect video data as input. Is it possible to modify the model to support single images?
Do you train your model using images as video (in strict sequence)?
BEDLAM dataset has 11 betas and only one neutral gender (according to the base model). AGORA has only 10 betas. How do you combine different numbers of shapes while training and in the model's head? Does SMPLX regression layer have dynamic size?

Thank you!

random output on inference with pretrained model

Hi,

I've tested inference with your pretrained model, but the results look like random values.
When I tried to debug this, the cropped input image looked fine.

I've tested all pretrained models but failed.

Could you check them again?

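Please describe the detailed file tree of the data folder

Hello, your work is excellent. Could you describe the detailed file tree layout under the data folder? Thank you very much.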

What is the difference between this work and OSX?

Hi,
These results are amazing. Other than the difference in training data, is there any difference between this work and OSX?

ModuleNotFoundError: No module named 'utils.inference_utils'

Hi! I'm having some trouble when trying to run inference on a video. I'm using the inference.py script, but in the console I get the following error:

/home/seba/anaconda3/envs/smplerx/lib/python3.8/site-packages/mmcv/__init__.py:20: UserWarning: On January 1, 2023, MMCV will release v2.0.0, in which it will remove components related to the training process and add a data transformation module. In addition, it will rename the package names mmcv to mmcv-lite and mmcv-full to mmcv. See https://github.com/open-mmlab/mmcv/blob/master/docs/en/compatibility.md for more details.
  warnings.warn(
Traceback (most recent call last):
  File "main/inference.py", line 18, in <module>
    from utils.inference_utils import process_mmdet_results, non_max_suppression
ModuleNotFoundError: No module named 'utils.inference_utils'

I followed the instructions in the README to set up the environment.
The command I used is as follows:
python main/inference.py --num_gpus 1 --pretrained_model smpler_x_s32 --agora_benchmark agora_model --image_path ./demo/images --start 1 --end 453 --output_folder ./demo/results/salida.mp4 --show_verts --show_bbox

Thanks for your help!

RuntimeError: Caught RuntimeError in replica 0 on device 0.

When I run

sh slurm_inference.sh test_video mp4 24 smpler_x_h32

the following error occurs:

RuntimeError: Caught RuntimeError in replica 0 on device 0.

I used

num_gpus =1

and in /main/common/base.py I used

ckpt = torch.load(cfg.pretrained_model_path,map_location=torch.device('cpu'))

Pretrained model?

Hi, thanks for sharing your great work!

Is there any pretrained model I can test?

Why output is different from repository

Hi, thanks for the great work. I am new to SMPL models. May I ask why the output I get looks like points rather than the blended mesh output shown in the README? How can I get that output? Thanks.

Data converting scripts

Thank you for your work! Do you have any plans to release converters to the human_data format? (Most interested in the BEDLAM converter)

Subtraction, the `-` operator, with a bool tensor is not supported.

The error in the title occurred while loading the model. Is this an issue with the PyTorch version or with the model? How can I solve it?

ImportError: ('Unable to load OpenGL library', "Could not find module 'OSMesa' (or one of its dependencies). Try using the full path with constructor syntax.", 'OSMesa', None)

The error above occurs when running inference on Windows 10. Please tell me how to solve it, thank you.

About Slurm

How do I run Slurm on Ubuntu?

Inference fails and I cannot find where to input the driven image

Hi, thanks for your excellent project! But when I try to run inference with your model on my video, it fails:

I don't know why. And as you say in the README:

Is only one video given as input? If so, where do the driven images go?
