garyzhao / semgcn

The PyTorch implementation of "Semantic Graph Convolutional Networks for 3D Human Pose Regression" (CVPR 2019).

Home Page: https://arxiv.org/abs/1904.03345

License: Apache License 2.0

Python 99.63% MATLAB 0.37%
human-pose-estimation 3d-pose-estimation graph-convolutional-networks

semgcn's Introduction

Semantic Graph Convolutional Networks for 3D Human Pose Regression (CVPR 2019)

This repository holds the PyTorch implementation of Semantic Graph Convolutional Networks for 3D Human Pose Regression by Long Zhao, Xi Peng, Yu Tian, Mubbasir Kapadia and Dimitris N. Metaxas. If you find our code useful in your research, please consider citing:

@inproceedings{zhaoCVPR19semantic,
  author    = {Zhao, Long and Peng, Xi and Tian, Yu and Kapadia, Mubbasir and Metaxas, Dimitris N.},
  title     = {Semantic Graph Convolutional Networks for 3D Human Pose Regression},
  booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  pages     = {3425--3435},
  year      = {2019}
}

Introduction

We propose Semantic Graph Convolutional Networks (SemGCN), a novel graph convolutional network architecture that operates on regression tasks with graph-structured data. The code of training and evaluating our approach for 3D human pose estimation on the Human3.6M Dataset is provided in this repository.

In this repository, 3D human poses are predicted according to Configuration #1 in our paper: we only leverage 2D joints of the human pose as inputs. We utilize the method described in Pavllo et al. [2] to normalize the 2D and 3D poses in the dataset, which differs from the original implementation in our paper. To be specific, 2D poses are scaled according to the image resolution and normalized to [-1, 1], and 3D poses are aligned with respect to the root joint. Please refer to the corresponding part of Pavllo et al. [2] for more details. We predict 16 joints (the skeleton used in Martinez et al. [1], without the 'Neck/Nose' joint). We also provide the results of Martinez et al. [1] in the same setting for comparison.
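As a reference, here is a minimal sketch of this normalization convention (illustrative code only; function names and shapes are assumptions, not the repository's exact implementation):

import numpy as np

def normalize_screen_coordinates(X, w, h):
    # Map pixel coordinates so that x spans [-1, 1] and y is scaled by the same
    # factor (the convention described in Pavllo et al. [2]).
    return X / w * 2.0 - np.array([1.0, h / w])

def root_center(poses_3d, root_index=0):
    # Align 3D poses with respect to the root joint by subtracting its position.
    return poses_3d - poses_3d[:, root_index:root_index + 1, :]

# Example with random data: a batch of 16-joint 2D poses in a 1000x1002 image
# and the corresponding 3D poses in millimetres.
joints_2d = np.random.uniform(0, 1000, size=(8, 16, 2))
joints_3d = np.random.uniform(-1000, 1000, size=(8, 16, 3))
joints_2d_norm = normalize_screen_coordinates(joints_2d, w=1000, h=1002)
joints_3d_root_relative = root_center(joints_3d)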

Results on Human3.6M

Under Protocol 1 (mean per-joint position error) and Protocol 2 (mean per-joint position error after rigid alignment).

| Method | 2D Detections | # of Epochs | # of Parameters | MPJPE (P1) | P-MPJPE (P2) |
| --- | --- | --- | --- | --- | --- |
| Martinez et al. [1] | Ground truth | 200 | 4.29M | 44.40 mm | 35.25 mm |
| SemGCN | Ground truth | 50 | 0.27M | 42.14 mm | 33.53 mm |
| SemGCN (w/ Non-local) | Ground truth | 30 | 0.43M | 40.78 mm | 31.46 mm |
| Martinez et al. [1] | SH (fine-tuned) | 200 | 4.29M | 63.48 mm | 48.15 mm |
| SemGCN (w/ Non-local) | SH (fine-tuned) | 100 | 0.43M | 61.24 mm | 47.71 mm |

Results using two different 2D detections (Ground truth and Stacked Hourglass detections fine-tuned on Human3.6M) are reported.

References

[1] Martinez et al. A simple yet effective baseline for 3d human pose estimation. ICCV 2017.

[2] Pavllo et al. 3D human pose estimation in video with temporal convolutions and semi-supervised training. CVPR 2019.

Quick start

This repository is built on Python v2.7 and PyTorch v1.1.0, tested on Ubuntu 16.04. NVIDIA GPUs are needed for training and testing. See requirements.txt for other dependencies. We recommend installing Python v2.7 from Anaconda, and installing PyTorch (>= 1.1.0) following the official instructions for your specific CUDA version. Then you can install the remaining dependencies with the following commands.

git clone git@github.com:garyzhao/SemGCN.git
cd SemGCN
pip install -r requirements.txt

Dataset setup

You can find instructions for setting up the Human3.6M dataset and the 2D detection results in data/README.md. The code for data preparation is borrowed from VideoPose3D.

Evaluating our pretrained models

The pretrained models can be downloaded from Google Drive. Put the checkpoint directory in the project root directory.

To evaluate Martinez et al. [1], run:

python main_linear.py --evaluate checkpoint/pretrained/ckpt_linear.pth.tar
python main_linear.py --evaluate checkpoint/pretrained/ckpt_linear_sh.pth.tar --keypoints sh_ft_h36m

To evaluate SemGCN without non-local blocks, run:

python main_gcn.py --evaluate checkpoint/pretrained/ckpt_semgcn.pth.tar

To evaluate SemGCN with non-local blocks, run:

python main_gcn.py --non_local --evaluate checkpoint/pretrained/ckpt_semgcn_nonlocal.pth.tar
python main_gcn.py --non_local --evaluate checkpoint/pretrained/ckpt_semgcn_nonlocal_sh.pth.tar --keypoints sh_ft_h36m

Note that the error is calculated in an action-wise manner.
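In other words, the MPJPE is computed separately for each Human3.6M action and the reported number is the average over actions, so actions with more frames do not dominate. A small illustrative sketch (hypothetical numbers and data, not the repository's evaluation code):

import numpy as np

def mpjpe(pred, gt):
    # Mean per-joint position error for arrays of shape (N, J, 3), in mm.
    return np.mean(np.linalg.norm(pred - gt, axis=-1))

# Hypothetical per-action predictions and ground truth (random data for illustration).
rng = np.random.default_rng(0)
data = {action: (rng.normal(size=(100, 16, 3)), rng.normal(size=(100, 16, 3)))
        for action in ('Walking', 'Sitting', 'Eating')}

# Action-wise protocol: first the error per action, then the mean over actions.
per_action = {action: mpjpe(pred, gt) for action, (pred, gt) in data.items()}
overall = np.mean(list(per_action.values()))
print(per_action, overall)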

Training from scratch

If you want to reproduce the results of our pretrained models, run the following commands.

For Martinez et al. [1]:

python main_linear.py

For SemGCN without non-local blocks:

python main_gcn.py --epochs 50

By default the application runs in training mode. This will train a new model for 50 epochs without non-local blocks, using ground truth 2D detections. You may change the value of num_layers (4 by default) and hid_dim (128 by default) if you want to try different network settings. Please refer to main_gcn.py for more details.
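For example, assuming the corresponding command-line flags are --num_layers and --hid_dim (these names appear in the settings printed by main_gcn.py), a smaller network could be trained with:

python main_gcn.py --epochs 50 --num_layers 3 --hid_dim 64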

For SemGCN with non-local blocks:

python main_gcn.py --non_local --epochs 30

This will train a new model with non-local blocks for 30 epochs, using ground truth 2D detections.

For training and evaluating models using 2D detections generated by Stacked Hourglass, add --keypoints sh_ft_h36m to the commands:

python main_gcn.py --non_local --epochs 100 --keypoints sh_ft_h36m
python main_gcn.py --non_local --evaluate ${CHECKPOINT_PATH} --keypoints sh_ft_h36m

Visualization

You can generate visualizations of the model predictions by running:

python viz.py --architecture gcn --non_local --evaluate checkpoint/pretrained/ckpt_semgcn_nonlocal.pth.tar --viz_subject S11 --viz_action Walking --viz_camera 0 --viz_output output.gif --viz_size 3 --viz_downsample 2 --viz_limit 60

The script can also export MP4 videos, and supports a variety of parameters (e.g. downsampling/FPS, size, bitrate). See viz.py for more details.
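For example, reusing only the flags shown above, an MP4 can be produced (assuming ffmpeg is installed) simply by changing the output extension:

python viz.py --architecture gcn --non_local --evaluate checkpoint/pretrained/ckpt_semgcn_nonlocal.pth.tar --viz_subject S11 --viz_action Walking --viz_camera 0 --viz_output output.mp4 --viz_size 3 --viz_downsample 2 --viz_limit 60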

Acknowledgement

Part of our code is borrowed from other open-source repositories.

We thank the authors for releasing their code. Please also consider citing their works.


semgcn's Issues

About the global offset

Hello, I noticed that you remove the global offset in data_utils.py, but this discards the depth information. It makes the evaluation differ from other works, because you take the hip joint as the origin rather than the camera, so the estimation becomes easier: you do not need to estimate the depth. I do not understand this choice. Can you help me?

The correspondence between 2D joints and 3D joints

Hello! I have two questions about the correspondence between 2D joints and 3D joints. When training the model, is it correct that the type and order of the input 2D joints are the same as those of the output 3D joints? Do both the 2D and the 3D joints exclude the neck/nose joint? @garyzhao

Question

#[-1 0 1 2 3 4 0 6 7 8 9 0 11 12 13 14 12 16 17 18 19 20 19 22 12 24 25 26 27 28 27 30]
I would like to ask: on what basis are the parent nodes in this array determined?

It seems that the code for "SemGraphConv" is different from the paper

Hi, thank you for releasing the code, it's great.
I have a question about the graph convolution operation.

It seems that Eq. 2 in the paper performs F.softmax(M * A), whereas the released code performs F.softmax(A) * M. Is there any difference, and which one is better?
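For readers weighing the two orderings, a toy sketch (hypothetical values; it ignores the masking of non-edges that the actual implementation applies before the softmax) showing that they are not equivalent in general:

import torch
import torch.nn.functional as F

# A toy 3-node adjacency A and a learnable weighting M (made-up values).
A = torch.tensor([[1., 1., 0.],
                  [1., 1., 1.],
                  [0., 1., 1.]])
M = torch.tensor([[0.5, 2.0, 0.0],
                  [1.0, 0.5, 2.0],
                  [0.0, 2.0, 0.5]])

eq2_reading = F.softmax(M * A, dim=1)     # softmax over the weighted adjacency
code_reading = F.softmax(A, dim=1) * M    # softmax over A, then weighted by M

print(eq2_reading)   # rows sum to 1
print(code_reading)  # generally different values; rows no longer sum to 1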

How to concatenate the RoI feature with the pose

I am very interested in your work. In your paper, you say that you concatenate the RoI feature with the pose. Do you compress the feature to d dimensions?
My understanding of your approach is:
for example, pose (2D) + feature (d-dimensional) == a (2 + d)-dimensional input per joint?
Is that right? And what is your feature dimension?
Looking forward to your reply.

Confused about 'View-Disentangled Human Pose Representation' on arXiv

Hi Long,
Recently I have become very interested in your paper 'Learning View-Disentangled Human Pose Representation by Contrastive Cross-View Mutual Information Maximization', posted on arXiv. In particular, I am curious about the method you use to minimize mutual information to achieve disentanglement. I work on graph data and, due to project needs, I have a strong need for disentanglement via mutual information minimization. I am also interested in the mutual-information upper-bound approximation from ICML 2020's CLUB: A Contrastive Log-ratio Upper Bound of Mutual Information. Unfortunately, it does not converge well in my own model, and negative values often occur. So, sorry to bother you, but I would like to ask:
1. Did you make any adjustments or improvements to the source code released with the ICML 2020 paper? Or could you share your experience with, or views on, getting the mutual-information-minimization loss to converge?
2. Does the idea of minimizing mutual information to achieve disentanglement contribute significantly to the effectiveness of your model? Perhaps due to my limited background, I only have a general understanding of the ICML 2020 work on mutual information minimization, and I am still somewhat confused about how to apply it.
Please forgive me if I am disturbing you!
Best wishes

Reference:
CLUB: A Contrastive Log-ratio Upper Bound of Mutual Information. Pengyu Cheng et al. In ICML, 2020.

About the channel-wise implementation

I can't understand the channel-wise implementation: what is the difference between the files "sem_ch_graph_conv.py" and "sem_graph_conv.py"? I also wonder about the size of "W" in "sem_ch_graph_conv.py", line 19: how should I understand its first dimension of 2?
Looking forward to your guidance, thanks!

What is the Input format of the ckpt_semgcn_nonlocal_sh.pth.tar model ?

The ckpt_semgcn_nonlocal_sh.pth.tar model outputs 3D poses in H36M format.

But it is not clear to me from the description how to generate a correct input from an in-the-wild image.

Does it take 2D input poses in MPII format? (Stacked Hourglass)

Or does it take 2D input poses in (2D) H36M format?

If it is the latter, then how did you convert from MPII to 2D H36M format when training the ckpt_semgcn_nonlocal_sh.pth.tar model? Or did you train a special Stacked Hourglass model to output 2D H36M format directly?

Data set error

@garyzhao Hello, thank you very much for your work.
When I ran python prepare_data_h36m.py --from-archive h36m.zip, I got the following error:

Extracting Human3.6M dataset from h36m.zip
Converting...
Traceback (most recent call last):
File "prepare_data_h36m.py", line 66, in
positions = hf['3D_positions'].value.reshape(32, 3, -1).transpose(2, 0, 1)
AttributeError: 'Dataset' object has no attribute 'value'

Looking forward to your reply!
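A likely cause is a newer h5py release: the deprecated Dataset.value accessor was removed in h5py 3.0. A minimal sketch of the usual workaround is to replace the failing line in prepare_data_h36m.py with indexing syntax:

# hf is the already-open h5py.File from prepare_data_h36m.py;
# `[()]` reads the full dataset into a NumPy array, replacing the removed `.value`.
positions = hf['3D_positions'][()].reshape(32, 3, -1).transpose(2, 0, 1)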

Have you ever tried cropping around the joints?

Hello! Have you ever tried cropping the image into person-centered patches to reduce the influence of 2D joint bias? Would that bring any improvement? Thank you!

RuntimeError: mat1 dim 1 must match mat2 dim 0

I get this error, with the traceback below, when trying to train the non-local model on ground-truth H36M:

Traceback (most recent call last):

File "/mnt/466a95be-9c12-4e2c-9e95-7d037205b249/Amin/SemGCN/main_gcn.py", line 287, in
main(parse_args())

File "/mnt/466a95be-9c12-4e2c-9e95-7d037205b249/Amin/SemGCN/main_gcn.py", line 176, in main
epoch_loss, lr_now, glob_step = train(train_loader, model_pos, criterion, optimizer, device, args.lr, lr_now,

File "/mnt/466a95be-9c12-4e2c-9e95-7d037205b249/Amin/SemGCN/main_gcn.py", line 223, in train
outputs_3d = model_pos(inputs_2d)

File "/home/amin/anaconda3/envs/torch/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)

File "/mnt/466a95be-9c12-4e2c-9e95-7d037205b249/Amin/SemGCN/models/sem_gcn.py", line 93, in forward
out = self.gconv_input(x)

File "/home/amin/anaconda3/envs/torch/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)

File "/home/amin/anaconda3/envs/torch/lib/python3.8/site-packages/torch/nn/modules/container.py", line 117, in forward
input = module(input)

File "/home/amin/anaconda3/envs/torch/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)

File "/mnt/466a95be-9c12-4e2c-9e95-7d037205b249/Amin/SemGCN/models/sem_gcn.py", line 23, in forward
x = self.gconv(x).transpose(1, 2)

File "/home/amin/anaconda3/envs/torch/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)

File "/mnt/466a95be-9c12-4e2c-9e95-7d037205b249/Amin/SemGCN/models/sem_graph_conv.py", line 43, in forward
output = torch.matmul(adj * M, h0) + torch.matmul(adj * (1 - M), h1)

RuntimeError: mat1 dim 1 must match mat2 dim 0

and these are the input parameters:
Using settings Namespace(actions='*', batch_size=64, checkpoint='checkpoint', dataset='h36m', downsample=1, dropout=0.0, epochs=100, evaluate='', hid_dim=128, keypoints='gt', lr=0.001, lr_decay=100000, lr_gamma=0.96, max_norm=True, non_local=True, num_layers=4, num_workers=8, resume='', snapshot=5)

Apparently the matrix multiplication dimensions do not match:

adj is 16x16
M is 16x16

and h0 is 64x17x128.

Does anybody know how to fix this? Thanks in advance!

3D joints

Hello, what are the names and the corresponding order of the 16 joints in the output 3D coordinates?

Question

@garyzhao Hello, the variable parents appears in your code. What is its purpose in the code that follows? It seems that parents is never used later on.

2D pose estimation network

Hi,

Good work!
May I know the details of the 2D pose estimation network in your paper? Is it similar to 'Simple Baselines for Human Pose Estimation' or a modified ResNet-50?

Configuration 2?

Hi, nice paper! In the README you indicate that this repo implements Configuration #1 of the paper (2D joints of the human pose as inputs). Is there a way to train Configuration #2 (2D images as input)?

While evaluating using pretrained models (https://github.com/garyzhao/SemGCN#evaluating-our-pretrained-models)

For example, with SemGCN with non-local blocks:

RuntimeError: Error(s) in loading state_dict for SemGCN:
	size mismatch for gconv_input.0.gconv.e: copying a param with shape torch.Size([1, 46]) from checkpoint, the shape in current model is torch.Size([1, 49]).
	size mismatch for gconv_layers.0.gconv1.gconv.e: copying a param with shape torch.Size([1, 46]) from checkpoint, the shape in current model is torch.Size([1, 49]).
	size mismatch for gconv_layers.0.gconv2.gconv.e: copying a param with shape torch.Size([1, 46]) from checkpoint, the shape in current model is torch.Size([1, 49]).
	size mismatch for gconv_layers.2.gconv1.gconv.e: copying a param with shape torch.Size([1, 46]) from checkpoint, the shape in current model is torch.Size([1, 49]).
	size mismatch for gconv_layers.2.gconv2.gconv.e: copying a param with shape torch.Size([1, 46]) from checkpoint, the shape in current model is torch.Size([1, 49]).
	size mismatch for gconv_layers.4.gconv1.gconv.e: copying a param with shape torch.Size([1, 46]) from checkpoint, the shape in current model is torch.Size([1, 49]).
	size mismatch for gconv_layers.4.gconv2.gconv.e: copying a param with shape torch.Size([1, 46]) from checkpoint, the shape in current model is torch.Size([1, 49]).
	size mismatch for gconv_layers.6.gconv1.gconv.e: copying a param with shape torch.Size([1, 46]) from checkpoint, the shape in current model is torch.Size([1, 49]).
	size mismatch for gconv_layers.6.gconv2.gconv.e: copying a param with shape torch.Size([1, 46]) from checkpoint, the shape in current model is torch.Size([1, 49]).
	size mismatch for gconv_output.e: copying a param with shape torch.Size([1, 46]) from checkpoint, the shape in current model is torch.Size([1, 49]).

Similar issues with other models.

Converting H36M to OpenPose

Hi, thanks for your work. Is there a way to convert the resulting pose from H36M format to the OpenPose 18-keypoint format?
Thanks

Question

I would like to ask: what is the purpose of parents in the code?

Have you tested your method for 3D hand pose estimation?

Hi @garyzhao, thanks for your amazing work and sharing your codes!

I guess your method should also work for 3D hand pose estimation. Have you ever tested it on 3D hand pose datasets and compared it with some top performers such as A2J or V2V-PoseNet?

Thank you for your kind reply!

Converting Stacked Hourglass keypoints to h36m

Hi @garyzhao and others. I have been researching different kinds of 3D pose estimation models and really like your work. I do have a few questions.

I currently have 2D pose estimates (all desired keypoints) from this Stacked Hourglass model.
I am trying to convert them to the required H36M format using your converter, which can be found here. However, even though there is a README, I do not understand how to use the converter for a single (or multiple) 2D keypoint prediction.

Does anyone know how I can convert my 2D predictions to H36M format?

How to run inference on the model

Hi @garyzhao. Thanks a lot for your repo; it has been very helpful so far.
I do, however, have one question.

I converted my 2D prediction data to the required H36M format using your converter found here. I am unsure, though, how to use it as input data for inference with your pre-trained model. Is there any help you can give me to perform such an inference?

Kind regards

RuntimeError: Error(s) in loading state_dict for SemGCN

Thank you for your work. I tried running the code using

python main_linear.py --evaluate checkpoint/pretrained/ckpt_linear.pth.tar

python main_gcn.py --evaluate checkpoint/pretrained/ckpt_semgcn.pth.tar

These are working fine, but I get a missing keys error for this run:
python main_gcn.py --non_local --evaluate checkpoint/pretrained/ckpt_semgcn_nonlocal.pth.tar

$ python main_gcn.py --non_local --evaluate checkpoint/pretrained/ckpt_semgcn_nonlocal.pth.tar ==> Using settings Namespace(actions='*', batch_size=64, checkpoint='checkpoint', dataset='h36m', downsample=1, dropout=0.0, epochs=100, evaluate='checkpoint/pretrained/ckpt_semgcn_nonlocal.pth.tar', hid_dim=128, keypoints='gt', lr=0.001, lr_decay=100000, lr_gamma=0.96, max_norm=True, non_local=True, num_layers=4, num_workers=8, resume='', snapshot=5) ==> Loading dataset... ==> Preparing data... ==> Loading 2D detections... ==> Creating model... ==> Total parameters: 0.43M ==> Loading checkpoint 'checkpoint/pretrained/ckpt_semgcn_nonlocal.pth.tar' Traceback (most recent call last): File "main_gcn.py", line 320, in <module> main(parse_args()) File "main_gcn.py", line 138, in main model_pos.load_state_dict(ckpt['state_dict']) File "/home/kirk/.pyenv/versions/miniconda3-4.3.30/lib/python3.6/site-packages/torch/nn/modules/module.py", line 839, in load_state_dict self.__class__.__name__, "\n\t".join(error_msgs))) RuntimeError: Error(s) in loading state_dict for SemGCN: Missing key(s) in state_dict: "gconv_input.1.nonlocal1.g.0.weight", "gconv_input.1.nonlocal1.g.0.bias", "gconv_input.1.nonlocal1.theta.weight", "gconv_input.1.nonlocal1.theta.bias", "gconv_input.1.nonlocal1.phi.0.weight", "gconv_input.1.nonlocal1.phi.0.bias", "gconv_input.1.nonlocal1.concat_project.0.weight", "gconv_input.1.nonlocal1.W.0.weight", "gconv_input.1.nonlocal1.W.0.bias", "gconv_input.1.nonlocal1.W.1.weight", "gconv_input.1.nonlocal1.W.1.bias", "gconv_input.1.nonlocal1.W.1.running_mean", "gconv_input.1.nonlocal1.W.1.running_var", "gconv_layers.1.nonlocal1.g.0.weight", "gconv_layers.1.nonlocal1.g.0.bias", "gconv_layers.1.nonlocal1.theta.weight", "gconv_layers.1.nonlocal1.theta.bias", "gconv_layers.1.nonlocal1.phi.0.weight", "gconv_layers.1.nonlocal1.phi.0.bias", "gconv_layers.1.nonlocal1.concat_project.0.weight", "gconv_layers.1.nonlocal1.W.0.weight", "gconv_layers.1.nonlocal1.W.0.bias", "gconv_layers.1.nonlocal1.W.1.weight", "gconv_layers.1.nonlocal1.W.1.bias", "gconv_layers.1.nonlocal1.W.1.running_mean", "gconv_layers.1.nonlocal1.W.1.running_var", "gconv_layers.3.nonlocal1.g.0.weight", "gconv_layers.3.nonlocal1.g.0.bias", "gconv_layers.3.nonlocal1.theta.weight", "gconv_layers.3.nonlocal1.theta.bias", "gconv_layers.3.nonlocal1.phi.0.weight", "gconv_layers.3.nonlocal1.phi.0.bias", "gconv_layers.3.nonlocal1.concat_project.0.weight", "gconv_layers.3.nonlocal1.W.0.weight", "gconv_layers.3.nonlocal1.W.0.bias", "gconv_layers.3.nonlocal1.W.1.weight", "gconv_layers.3.nonlocal1.W.1.bias", "gconv_layers.3.nonlocal1.W.1.running_mean", "gconv_layers.3.nonlocal1.W.1.running_var", "gconv_layers.5.nonlocal1.g.0.weight", "gconv_layers.5.nonlocal1.g.0.bias", "gconv_layers.5.nonlocal1.theta.weight", "gconv_layers.5.nonlocal1.theta.bias", "gconv_layers.5.nonlocal1.phi.0.weight", "gconv_layers.5.nonlocal1.phi.0.bias", "gconv_layers.5.nonlocal1.concat_project.0.weight", "gconv_layers.5.nonlocal1.W.0.weight", "gconv_layers.5.nonlocal1.W.0.bias", "gconv_layers.5.nonlocal1.W.1.weight", "gconv_layers.5.nonlocal1.W.1.bias", "gconv_layers.5.nonlocal1.W.1.running_mean", "gconv_layers.5.nonlocal1.W.1.running_var", "gconv_layers.7.nonlocal1.g.0.weight", "gconv_layers.7.nonlocal1.g.0.bias", "gconv_layers.7.nonlocal1.theta.weight", "gconv_layers.7.nonlocal1.theta.bias", "gconv_layers.7.nonlocal1.phi.0.weight", "gconv_layers.7.nonlocal1.phi.0.bias", "gconv_layers.7.nonlocal1.concat_project.0.weight", "gconv_layers.7.nonlocal1.W.0.weight", 
"gconv_layers.7.nonlocal1.W.0.bias", "gconv_layers.7.nonlocal1.W.1.weight", "gconv_layers.7.nonlocal1.W.1.bias", "gconv_layers.7.nonlocal1.W.1.running_mean", "gconv_layers.7.nonlocal1.W.1.running_var". Unexpected key(s) in state_dict: "gconv_input.1.nonlocal.g.0.weight", "gconv_input.1.nonlocal.g.0.bias", "gconv_input.1.nonlocal.theta.weight", "gconv_input.1.nonlocal.theta.bias", "gconv_input.1.nonlocal.phi.0.weight", "gconv_input.1.nonlocal.phi.0.bias", "gconv_input.1.nonlocal.concat_project.0.weight", "gconv_input.1.nonlocal.W.0.weight", "gconv_input.1.nonlocal.W.0.bias", "gconv_input.1.nonlocal.W.1.weight", "gconv_input.1.nonlocal.W.1.bias", "gconv_input.1.nonlocal.W.1.running_mean", "gconv_input.1.nonlocal.W.1.running_var", "gconv_input.1.nonlocal.W.1.num_batches_tracked", "gconv_layers.1.nonlocal.g.0.weight", "gconv_layers.1.nonlocal.g.0.bias", "gconv_layers.1.nonlocal.theta.weight", "gconv_layers.1.nonlocal.theta.bias", "gconv_layers.1.nonlocal.phi.0.weight", "gconv_layers.1.nonlocal.phi.0.bias", "gconv_layers.1.nonlocal.concat_project.0.weight", "gconv_layers.1.nonlocal.W.0.weight", "gconv_layers.1.nonlocal.W.0.bias", "gconv_layers.1.nonlocal.W.1.weight", "gconv_layers.1.nonlocal.W.1.bias", "gconv_layers.1.nonlocal.W.1.running_mean", "gconv_layers.1.nonlocal.W.1.running_var", "gconv_layers.1.nonlocal.W.1.num_batches_tracked", "gconv_layers.3.nonlocal.g.0.weight", "gconv_layers.3.nonlocal.g.0.bias", "gconv_layers.3.nonlocal.theta.weight", "gconv_layers.3.nonlocal.theta.bias", "gconv_layers.3.nonlocal.phi.0.weight", "gconv_layers.3.nonlocal.phi.0.bias", "gconv_layers.3.nonlocal.concat_project.0.weight", "gconv_layers.3.nonlocal.W.0.weight", "gconv_layers.3.nonlocal.W.0.bias", "gconv_layers.3.nonlocal.W.1.weight", "gconv_layers.3.nonlocal.W.1.bias", "gconv_layers.3.nonlocal.W.1.running_mean", "gconv_layers.3.nonlocal.W.1.running_var", "gconv_layers.3.nonlocal.W.1.num_batches_tracked", "gconv_layers.5.nonlocal.g.0.weight", "gconv_layers.5.nonlocal.g.0.bias", "gconv_layers.5.nonlocal.theta.weight", "gconv_layers.5.nonlocal.theta.bias", "gconv_layers.5.nonlocal.phi.0.weight", "gconv_layers.5.nonlocal.phi.0.bias", "gconv_layers.5.nonlocal.concat_project.0.weight", "gconv_layers.5.nonlocal.W.0.weight", "gconv_layers.5.nonlocal.W.0.bias", "gconv_layers.5.nonlocal.W.1.weight", "gconv_layers.5.nonlocal.W.1.bias", "gconv_layers.5.nonlocal.W.1.running_mean", "gconv_layers.5.nonlocal.W.1.running_var", "gconv_layers.5.nonlocal.W.1.num_batches_tracked", "gconv_layers.7.nonlocal.g.0.weight", "gconv_layers.7.nonlocal.g.0.bias", "gconv_layers.7.nonlocal.theta.weight", "gconv_layers.7.nonlocal.theta.bias", "gconv_layers.7.nonlocal.phi.0.weight", "gconv_layers.7.nonlocal.phi.0.bias", "gconv_layers.7.nonlocal.concat_project.0.weight", "gconv_layers.7.nonlocal.W.0.weight", "gconv_layers.7.nonlocal.W.0.bias", "gconv_layers.7.nonlocal.W.1.weight", "gconv_layers.7.nonlocal.W.1.bias", "gconv_layers.7.nonlocal.W.1.running_mean", "gconv_layers.7.nonlocal.W.1.running_var", "gconv_layers.7.nonlocal.W.1.num_batches_tracked".

MPJPE result

Hi, thank you for sharing your code.

The MPJPE result in the original paper is 43.8 mm, but the result in this repo (README.md) is 40.78 mm.

What is the difference?

Bone constraints in loss

Hi Gary, thanks for the code. Did you implement the combination of joint and bone constraints as the loss in the code, as described in your paper?

A

Hi, in my reproduction work there is an error that I don't know how to solve:
models/sem_gcn.py", line 49
self.nonlocal = GraphNonLocal(hid_dim, sub_sample=group_size)
^
SyntaxError: invalid syntax

main_gcn.py raises an error

Hi there,
Thank you for sharing this great work.

When I run main_gcn.py, it raises this error, although the code itself seems correct.

Below is the error message:

Traceback (most recent call last):
File "main_gcn.py", line 22, in
from models.sem_gcn import SemGCN
File "/home/iis/SemGCN/models/sem_gcn.py", line 48
self.nonlocal = GraphNonLocal(hid_dim, sub_sample=group_size)
^
SyntaxError: invalid syntax

Thank you~
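The error above occurs because nonlocal is a reserved keyword in Python 3, while the repository targets Python 2.7, where it is a legal attribute name. A minimal sketch of one common workaround (not the authors' official fix) is to rename the attribute wherever it is assigned and used in models/sem_gcn.py, for example:

# In models/sem_gcn.py: rename the attribute so it no longer clashes with the
# Python 3 keyword `nonlocal` (apply the same rename everywhere it is accessed).
self.nonlocal1 = GraphNonLocal(hid_dim, sub_sample=group_size)

Note that after such a rename, checkpoints saved under the old attribute name will need their state_dict keys remapped accordingly (see the loading error reported earlier in this list).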

Is there any training trick? I can't reproduce the provided result.

I followed the instructions and got the result for SemGCN without the non-local part, but the result does not match yours in the table.
Here is the result:

Epoch: 45 | LR: 0.00066483
Train |################################| (24372/24372) Data: 0.000093s | Batch: 0.026s | Total: 0:10:38 | ETA: 0:00:01 | Loss: 0.0002
Eval |################################| (8490/8490) Data: 0.000082s | Batch: 0.008s | Total: 0:01:07 | ETA: 0:00:01 | MPJPE: 44.7153 | P-MPJPE: 35.0317

Epoch: 46 | LR: 0.00066483
Train |################################| (24372/24372) Data: 0.000130s | Batch: 0.026s | Total: 0:10:37 | ETA: 0:00:01 | Loss: 0.0002
Eval |################################| (8490/8490) Data: 0.000087s | Batch: 0.008s | Total: 0:01:07 | ETA: 0:00:01 | MPJPE: 45.4787 | P-MPJPE: 35.7501

Epoch: 47 | LR: 0.00063824
Train |################################| (24372/24372) Data: 0.000120s | Batch: 0.026s | Total: 0:10:36 | ETA: 0:00:01 | Loss: 0.0002
Eval |################################| (8490/8490) Data: 0.000078s | Batch: 0.008s | Total: 0:01:07 | ETA: 0:00:01 | MPJPE: 45.8799 | P-MPJPE: 35.6219

Epoch: 48 | LR: 0.00063824
Train |################################| (24372/24372) Data: 0.000094s | Batch: 0.026s | Total: 0:10:33 | ETA: 0:00:01 | Loss: 0.0002
Eval |################################| (8490/8490) Data: 0.000071s | Batch: 0.008s | Total: 0:01:07 | ETA: 0:00:01 | MPJPE: 47.1181 | P-MPJPE: 35.5356

Epoch: 49 | LR: 0.00063824
Train |################################| (24372/24372) Data: 0.000122s | Batch: 0.026s | Total: 0:10:40 | ETA: 0:00:01 | Loss: 0.0002
Eval |################################| (8490/8490) Data: 0.000128s | Batch: 0.008s | Total: 0:01:07 | ETA: 0:00:01 | MPJPE: 45.3596 | P-MPJPE: 35.5696

Epoch: 50 | LR: 0.00063824
Train |################################| (24372/24372) Data: 0.000105s | Batch: 0.026s | Total: 0:10:37 | ETA: 0:00:01 | Loss: 0.0001
Eval |################################| (8490/8490) Data: 0.000071s | Batch: 0.008s | Total: 0:01:07 | ETA: 0:00:01 | MPJPE: 43.9466 | P-MPJPE: 34.7108

Do you have any suggestions?

How to convert 2D poses of my own image into the network input?

Hi, recently I have been comparing different 3D human pose estimation methods on my own dataset. When applying SemGCN, it seems that the method needs 3D ground-truth points and camera parameters to generate the 2D pose input. I tried scaling my 2D human poses according to the image resolution and normalizing them to [-1, 1], but the 3D results are still really bad.
So is there any procedure I can follow to preprocess the 2D poses of my own images?
(A reply in Chinese is also fine.)

can't reproduce same performance

Hi, first of all thank you for sharing the code.

I am currently trying to reproduce MPJPE (P1) 40.78 with your SemGCN + non-local model, using the ground-truth pose as input.

I copied your model and adjacency-matrix parsing code. The optimizer, loss function and data preprocessing are all coded according to this repo.

The only differences are the preprocessed Human3.6M dataset and the dataloader code; the joint coordinates are in mm scale, but I am sure that is not the problem.

Do you have any clue what's going wrong? This is my training loss graph and error graph.

P.S.
The code uses a normalized adjacency matrix, which the paper describes as a mask ('A serves as a mask which forces that for node i in the graph...'). What is the point of normalizing it if it is just a mask? Is there any difference from just using the original adjacency matrix? Thanks.

Weird results when applying SemGCN to 2D pose from image

Inference on images in the wild using SemGCN has been partially covered in this thread and others, but only the overall process has been made clear. I.e.:

  • Step 1: Use a 2D pose estimation network to generate 2D pose in MPII format.
  • Step 2: Convert 2D pose from MPII format to H36M format as done here.
  • Step 3: Pre-process the 2D input pose as done here.
  • Step 4: Use the pre-processed 2D pose in H36M format as input to the SemGCN SH model. It outputs 3D pose in H36M format.

Below I follow each step, using a test image of size 300x600.

(test image and its estimated 2D pose)

For Step 1, I use EfficientPose to generate the MPII-format 2D pose of the test image shown above; here is the numeric output:

positions = [[[108. 512.]	# Right ankle
              [114. 428.]	# Right knee
              [124. 320.]	# Right hip
              [186. 324.]	# Left hip
              [178. 426.]	# Left knee
              [176. 512.]	# Left ankle
              [156. 322.]	# Pelvis
              [162. 152.]	# Thorax
              [164. 114.]	# Upper neck
              [166.  24.]	# Head top
              [ 60. 322.]	# Right wrist
              [ 78. 238.]	# Right elbow
              [ 96. 148.]	# Right shoulder
              [230. 154.]	# Left shoulder
              [240. 246.]	# Left elbow
              [224. 326.]]]	# Left wrist

For Step 2, I run this:

positions = positions[:, SH_TO_GT_PERM, :]

To get the output:

positions = [[[156. 322.]
              [124. 320.]
              [114. 428.]
              [108. 512.]
              [186. 324.]
              [178. 426.]
              [176. 512.]
              [162. 152.]
              [164. 114.]
              [166.  24.]
              [230. 154.]
              [240. 246.]
              [224. 326.]
              [ 96. 148.]
              [ 78. 238.]
              [ 60. 322.]]]

For Step 3, I run this:

positions[..., :2] = normalize_screen_coordinates(positions[..., :2], w=300, h=600)

To get the output:

positions = [[[ 0.0399  0.1466 ]
              [-0.1733  0.1333 ]
              [-0.2400  0.8533 ]
              [-0.2799  1.4133 ]
              [ 0.2400  0.1600 ]
              [ 0.1866  0.8399 ]
              [ 0.1733  1.4133 ]
              [ 0.0800 -0.9866 ]
              [ 0.0933 -1.2400 ]
              [ 0.1066 -1.8400 ]
              [ 0.5333 -0.9733 ]
              [ 0.6000 -0.3600 ]
              [ 0.4933  0.1733 ]
              [-0.3600 -1.0133 ]
              [-0.4800 -0.4133 ]
              [-0.6000  0.1466 ]]]

For Step 4, the above is used as input to the SemGCN SH model by running this:

inputs_2d = torch.from_numpy(positions)
inputs_2d = inputs_2d.to(device)
outputs_3d = model_pos(inputs_2d).cpu()
outputs = outputs_3d[:, :, :] - outputs_3d[:, :1, :]

Which gives the output:

outputs = [[[ 0.0000  0.0000  0.0000 ]
            [-0.0769 -0.6899 -0.2520 ]
            [ 0.0847 -0.4062 -0.0607 ]
            [ 0.4154  0.2318  0.4062 ]
            [ 0.2708 -0.5181 -0.0504 ]
            [ 0.3431 -0.7337  0.3018 ]
            [ 0.6379  0.6684  0.2033 ]
            [ 0.1650 -0.9141 -0.8496 ]
            [ 0.5825 -2.1341  0.2762 ]
            [ 1.1561 -1.5364 -0.6433 ]
            [ 1.1612 -1.1453 -0.2103 ]
            [ 0.9097 -0.6763  0.2361 ]
            [ 0.8202 -0.2971  0.2679 ]
            [ 0.8008 -1.1936 -0.1120 ]
            [ 0.2124 -1.3246  0.5563 ]
            [ 0.5093 -0.4762  0.3473 ]]]

When visualized, this looks completely wrong; see the image below. Can anyone shed light on where the problem lies? Is it a problem with the pre-processing, or with the model?

(visualization of the predicted 3D pose)

fine-tuned dataset

Thanks for sharing.
I applied for access to the fine-tuned dataset, but there was no response. Could you please send it to my e-mail ([email protected]) so that I can reproduce the results as soon as possible?

A question of the 2D input and 3D ground truth

Hi @garyzhao, thanks for your great work!
I have a question about the 2D input and the 3D ground truth.

    1. I find that you normalize the 2D pose positions to (-1, 1), which corresponds to (0, w). Why?
    2. Sorry, I have not read the code for the 3D part: are the 3D pose positions relative to the pivot (root joint)? If yes, can I consider them to be normalized?

Another question: can the semantic graph convolution learn position information, or does it only use neighbourhood node information to learn the relationship between the 2D and 3D positions?

I am confused and hope you can give me some suggestions. Thanks very much!

viz.py problem

Hi, I met an error when running viz.py following the steps you provided:
ValueError: outfile must be *.htm or *.html
It occurs after:
Rendering...
Generating 1621 poses...

Thanks.

Question

@garyzhao Hello, some information is not very clear to me: in your code, why is the loss function MSE?

a bug about CUDA

File "/home/dm/anaconda3/envs/py27/lib/python2.7/site-packages/torch/nn/functional.py", line 941, in relu
result = torch.relu_(input)
RuntimeError: CUDA error: invalid argument

Hi,
has anyone else run into this problem? My environment is Ubuntu 16.04, pytorch==1.1.0, python==2.7.

results

I downloaded the GitHub repo and obtained the SH 2D detections according to the instructions, but when I used the repository directly, the results were not as good as reported. I don't know why.

problem when running viz.py

python viz.py --architecture gcn --non_local --evaluate checkpoint/pretrained/ckpt_semgcn_nonlocal.pth.tar --viz_subject S11 --viz_action Walking --viz_camera 0 --viz_output output.gif --viz_size 3 --viz_downsample 2 --viz_limit 60

When I run the step above, I encounter the following problem and don't know how to solve it.

==> Loading dataset...
==> Preparing data...
==> Loading 2D detections...
==> Creating model...
==> Total parameters: 0.43M
==> Loading checkpoint 'checkpoint/pretrained/ckpt_semgcn_nonlocal.pth.tar'
==> Loaded checkpoint (Epoch: 16 | Error: 41.1463412364)
==> Rendering...
Generating 1621 poses...

Traceback (most recent call last):
  File "viz.py", line 192, in <module>
    main(parse_args())
  File "viz.py", line 141, in main
    input_video_skip=args.viz_skip)
  File "/root/SemGCN-master/common/visualization.py", line 180, in render_animation
    anim.save(output, dpi=80, writer='imagemagick')
  File "/usr/local/lib/python2.7/dist-packages/matplotlib/animation.py", line 1200, in save
    writer.grab_frame(**savefig_kwargs)
  File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
    self.gen.next()
  File "/usr/local/lib/python2.7/dist-packages/matplotlib/animation.py", line 241, in saving
    self.finish()
  File "/usr/local/lib/python2.7/dist-packages/matplotlib/animation.py", line 563, in finish
    .format(self._proc.returncode))
RuntimeError: Error creating movie, return code: 1

Handling missing values in 2D due to occlusions and other factors

Hi @garyzhao ,
Thank you for the repo.
If we have missing keypoints in the 2D predictions, due to occlusion by another object or only part of the body being visible,
is there any provision for handling this in SemGCN? Suppose the missing keypoints are encoded as '-1' or '0' in the 2D keypoint list; how will your model handle that? We find that the predictions are bad when there are missing values (due to occlusions).
Thank you.

training time

In order to confirm whether my laboratory's hardware is sufficient, I would like to ask: how long does it take to train the network, how many GPUs were used, and what is the GPU configuration?

Stacked Hourglass detections

Hello! You mention in your paper that you pre-train a network for 2D pose estimation, but it seems that this part of the network is not included in your code; the 2D estimation results are used directly. Could you please provide the Python version of this network? Thanks.

About input data of GCN

Hi,
how do you concatenate the 2D data (16, 2) with the perceptual feature?
What is the size of the perceptual feature?
What is the size of the input data for the GCN?
