stsgcn's Introduction

Space-Time-Separable Graph Convolutional Network for Pose Forecasting (Accepted to ICCV '21)

Theodoros Sofianos†, Alessio Sampieri†, Luca Franco and Fabio Galasso

Sapienza University of Rome, Italy

[Paper] [Website] [Talk]

Abstract

Human pose forecasting is a complex structured-data sequence-modelling task, which has received increasing attention, also due to numerous potential applications. Research has mainly addressed the temporal dimension as time series and the interaction of human body joints with a kinematic tree or by a graph. This has decoupled the two aspects and leveraged progress from the relevant fields, but it has also limited the understanding of the complex structural joint spatio-temporal dynamics of the human pose.

Here we propose a novel Space-Time-Separable Graph Convolutional Network (STS-GCN) for pose forecasting. For the first time, STS-GCN models the human pose dynamics only with a graph convolutional network (GCN), including the temporal evolution and the spatial joint interaction within a single-graph framework, which allows the cross-talk of motion and spatial correlations. Concurrently, STS-GCN is the first space-time-separable GCN: the space-time graph connectivity is factored into space and time affinity matrices, which bottlenecks the space-time cross-talk, while enabling full joint-joint and time-time correlations. Both affinity matrices are learnt end-to-end, which results in connections substantially deviating from the standard kinematic tree and the linear-time time series.

In experimental evaluation on three complex, recent and large-scale benchmarks, Human3.6M [Ionescu et al. TPAMI'14], AMASS [Mahmood et al. ICCV'19] and 3DPW [Von Marcard et al. ECCV'18], STS-GCN outperforms the state-of-the-art, surpassing the current best technique [Mao et al. ECCV'20] by over 32% on average in the most difficult long-term predictions, while only requiring 1.7% of its parameters. We explain the results qualitatively and illustrate the graph attention by the factored joint-joint and time-time learnt graph connections.
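
To make the separable factorization concrete, here is a minimal PyTorch sketch (an illustration only, not the authors' implementation; the class name and initialisation are made up): a graph convolution over a pose sequence of shape (batch, channels, T frames, J joints) applies a learnt T x T temporal affinity and a learnt J x J spatial affinity in sequence, instead of one full (T·J) x (T·J) adjacency.

 import torch
 import torch.nn as nn

 class SeparableGraphConv(nn.Module):
     # Sketch of a space-time-separable graph convolution: the full space-time
     # adjacency is replaced by a learnt time-time affinity A_t (T x T) and a
     # learnt joint-joint affinity A_s (J x J), followed by channel mixing.
     def __init__(self, in_channels, out_channels, T, J):
         super().__init__()
         self.A_t = nn.Parameter(torch.eye(T) + 0.01 * torch.randn(T, T))
         self.A_s = nn.Parameter(torch.eye(J) + 0.01 * torch.randn(J, J))
         self.W = nn.Conv2d(in_channels, out_channels, kernel_size=1)

     def forward(self, x):
         # x: (batch, channels, T, J)
         x = torch.einsum('nctj,tu->ncuj', x, self.A_t)  # mix across frames
         x = torch.einsum('nctj,jk->nctk', x, self.A_s)  # mix across joints
         return self.W(x)

For instance, SeparableGraphConv(3, 64, T=10, J=22) would map a batch of 10-frame, 22-joint 3D pose sequences to 64 feature channels.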

--------

⚠️ Update about results and evaluation metric [08/07/2022]

A problem arises because no prior human pose forecasting work has explicitly written down the MPJPE metric used at test time. [Mao et al., 2020; Mao et al., 2019] specify the MPJPE used as the learning loss and refer to the (same) MPJPE for testing, which is however different.

In [Mao et al., 2020], Eq. (6), they define MPJPE as

$$MPJPE = \frac{1}{J(M+T)}\sum_{t=1}^{M+T} \sum_{j=1}^J ||\hat{\textbf{p}}_{t,j} - \textbf{p}_{t,j} ||^2,$$

which averages the error over all frames up to the prediction horizon T, including the observed frames.

Also in [Ionescu et al., 2014], Eq. (8), they define the MPJPE as:

$$MPJPE(t) = \frac{1}{J} \sum_{j=1}^J ||\hat{\textbf{p}}_{t,j} - \textbf{p}_{t,j} ||^2,$$

and they state: "For a set of frames the error is the average over the MPJPEs of all frames."

We have therefore interpreted the test MPJPE to be:

$$MPJPE = \frac{1}{J T}\sum_{t=M+1}^{M+T} \sum_{j=1}^J ||\hat{\textbf{p}}_{t,j} - \textbf{p}_{t,j} ||^2,$$

which is implemented in our testing code. Note: coding has been done in good faith, and in good faith we have open-sourced the project here.

As noted in this thread, the code provided by [Mao et al., 2020] actually considers only the target temporal horizon, not the average up to that time.
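
To make the two readings of the test metric concrete, here is a minimal NumPy sketch (an illustration, not the repository's testing code), assuming predictions and ground truth of shape (T, J, 3) in millimetres over the T predicted frames, with the per-joint error taken as the Euclidean distance:

 import numpy as np

 def mpjpe_averaged(pred, gt):
     # Test MPJPE averaged over all T predicted frames (the interpretation
     # implemented in this repository): mean over frames and joints.
     per_joint_err = np.linalg.norm(pred - gt, axis=-1)  # (T, J)
     return per_joint_err.mean()

 def mpjpe_at_horizon(pred, gt, t):
     # Test MPJPE at a single target horizon t (the reading used by the test
     # code of [Mao et al., 2020]): mean over joints at frame t only.
     per_joint_err = np.linalg.norm(pred[t] - gt[t], axis=-1)  # (J,)
     return per_joint_err.mean()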

Running the test code of [Mao et al., 2020], the short-term (400ms) and long-term (1000ms) errors of STS-GCN on the Human3.6M dataset are:

Here we report this performance and specify the test MPJPE error, to avoid future discrepancies.

[Image: short-term (400ms) and long-term (1000ms) test MPJPE of STS-GCN on Human3.6M]


Install dependencies:

 $ pip install -r requirements.txt

Get the data

Human3.6M in exponential map format can be downloaded from here.

Directory structure:

H3.6m
|-- S1
|-- S5
|-- S6
|-- ...
`-- S11

AMASS can be downloaded from their official website.

Directory structure:

amass
|-- ACCAD
|-- BioMotionLab_NTroje
|-- CMU
|-- ...
`-- Transitions_mocap

3DPW can be downloaded from their official website.

Directory structure:

3dpw
|-- imageFiles
|   |-- courtyard_arguing_00
|   |-- courtyard_backpack_00
|   |-- ...
`-- sequenceFiles
    |-- test
    |-- train
    `-- validation

Put all the downloaded datasets in the ../datasets directory.
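
For reference, the resulting layout relative to this repository might look as follows (a sketch; the exact folder names expected by the code are set in parser.py, so adjust there if yours differ):

../datasets
|-- H3.6m
|-- amass
`-- 3dpw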

Train

The arguments for running the code are defined in parser.py. We have used the following commands for training the network on the different datasets and body pose representations (3D and Euler angles):

 python main_h36_3d.py --input_n 10 --output_n 25 --skip_rate 1 --joints_to_consider 22
 python main_h36_ang.py --input_n 10 --output_n 25 --skip_rate 1 --joints_to_consider 16
 python main_amass_3d.py --input_n 10 --output_n 25 --skip_rate 5 --joints_to_consider 18

Test

To test on the pretrained model, we have used the following commands:

python main_h36_3d.py --input_n 10 --output_n 25 --skip_rate 1 --joints_to_consider 22 --mode test --model_path ./checkpoints/CKPT_3D_H36M
python main_h36_ang.py --input_n 10 --output_n 25 --skip_rate 1 --joints_to_consider 16 --mode test --model_path ./checkpoints/CKPT_ANG_H36M
python main_amass_3d.py --input_n 10 --output_n 25 --skip_rate 5 --joints_to_consider 18 --mode test --model_path ./checkpoints/CKPT_3D_AMASS

Visualization

For visualizing from a pretrained model, we have used the following commands:

 python main_h36_3d.py --input_n 10 --output_n 25 --skip_rate 1 --joints_to_consider 22 --mode viz --model_path ./checkpoints/CKPT_3D_H36M --n_viz 5
 python main_h36_ang.py --input_n 10 --output_n 25 --skip_rate 1 --joints_to_consider 16 --mode viz --model_path ./checkpoints/CKPT_ANG_H36M --n_viz 5
 python main_amass_3d.py --input_n 10 --output_n 25 --skip_rate 5 --joints_to_consider 18 --mode viz --model_path ./checkpoints/CKPT_3D_AMASS --n_viz 5

Citing

If you use our code, please cite our work:

@misc{sofianos2021spacetimeseparable,
     title={Space-Time-Separable Graph Convolutional Network for Pose Forecasting}, 
     author={Theodoros Sofianos and Alessio Sampieri and Luca Franco and Fabio Galasso},
     year={2021},
     eprint={2110.04573},
     archivePrefix={arXiv},
     primaryClass={cs.CV}
}

Acknowledgments

Some of our code was adapted from HisRepsItself by Wei Mao.

The authors wish to acknowledge Panasonic for partially supporting this work and the project of the Italian Ministry of Education, Universities and Research (MIUR) “Dipartimenti di Eccellenza 2018-2022”.

License

MIT license


stsgcn's Issues

Evaluation metrics

Hello,
This is awesome work and a great achievement. Thank you for providing the code.
I would like to ask about the unit used in the evaluation. In the paper the errors are in mm, but if I am reading the code in utils/loss_funcs.py correctly, it does not show any unit in mm or pixels. Am I wrong, or do you work in pixels first and then convert to mm in the paper?
Thank you.
Best regards.

Complete full metrics table from paper

Hi,
I understand the issue with the metric; I have already read all the points you mentioned. However, it would be nice if you also posted somewhere (or updated the arXiv PDF) the full metric tables you have in the paper, because it is quite hard to compare values across the many papers that differ at 80ms, 160ms, and so on, for all the datasets you used.

Thank you for the effort and all corrections you made.

I failed to get the prediction results in the paper

I failed to reproduce the prediction results in the paper while using the proposed code and pretrained model. I suspect a possible reason is that I used my own test code, since it is absent from the public code.

My test method is as follows:
Calculate the average 3D joint error for each action per time step, as shown in the paper. For instance, in 'walking' I compute the mean 3D error at 40ms, 160ms, 320ms and 400ms. The prediction errors are significantly worse than the results published in the paper.

To address this issue, I think the best way would be to publish the test code used in the paper.

Which evaluation standard is correct?

  1. In your code, 80ms represents the average error of the first and second frames, but some people think that 80ms represents only the error of the second frame. Which is correct?
  2. Why does your code save the model every 10 iterations instead of saving the best model?

Models for training

Dear authors,

First of all, congratulations on the paper and the results.
From your "checkpoints" directory, it seems that there is one set of weights for each frame that we want to predict. Is that right? That is, for predicting frames from 4 to 25, do we need to use different weights, or can a single checkpoint reproduce all the results published in the paper?

Many thanks for your time.
