microsoft / sgn

This is the implementation of CVPR2020 paper “Semantics-Guided Neural Networks for Efficient Skeleton-Based Human Action Recognition”.

License: MIT License


sgn's Introduction

Semantics-Guided Neural Networks for Efficient Skeleton-Based Human Action Recognition (SGN)

Introduction

Skeleton-based human action recognition has attracted great interest thanks to the easy accessibility of human skeleton data. Recently, there has been a trend of using very deep feedforward neural networks to model the 3D coordinates of joints without considering the computational efficiency. In this work, we propose a simple yet effective semantics-guided neural network (SGN). We explicitly introduce the high-level semantics of joints (joint type and frame index) into the network to enhance the feature representation capability. Intuitively, semantic information, i.e., the joint type and the frame index, together with dynamics (i.e., 3D coordinates), reveals the spatial and temporal configuration/structure of human body joints and is very important for action recognition. In addition, we exploit the relationship of joints hierarchically through two modules, i.e., a joint-level module for modeling the correlations of joints in the same frame and a frame-level module for modeling the dependencies of frames by taking the joints in the same frame as a whole. A strong baseline is also proposed to facilitate the study of this field. With an order of magnitude smaller model size than most previous works, SGN achieves state-of-the-art performance.

Figure 1: Comparisons of different methods on NTU60 (CS setting) in terms of accuracy and the number of parameters. Among these methods, the proposed SGN model achieves the best performance with an order of magnitude smaller model size.

Framework


Figure 2: Framework of the proposed end-to-end Semantics-Guided Neural Network (SGN). It consists of a joint-level module and a frame-level module. In DR, we learn the dynamics representation of a joint by fusing the position and velocity information of a joint. Two types of semantics, i.e., joint type and frame index, are incorporated into the joint-level module and the frame-level module, respectively. To model the dependencies of joints in the joint-level module, we use three GCN layers. To model the dependencies of frames, we use two CNN layers.
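
To make the data flow in Figure 2 concrete, below is a minimal PyTorch sketch of an SGN-style network: position and velocity are embedded and fused into a dynamics representation (DR), joint-type embeddings feed the joint-level module (three graph-style layers), and frame-index embeddings feed the frame-level module (two CNN layers) after spatial pooling. Every layer size, the adjacency construction, and all names below are illustrative assumptions, not the authors' implementation.

# Illustrative SGN-style sketch; sizes, adjacency, and names are assumptions only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SGNSketch(nn.Module):
    def __init__(self, num_classes=60, num_joints=25, num_frames=20, dim=64):
        super().__init__()
        # Dynamics representation (DR): embed position and velocity, then fuse.
        self.pos_embed = nn.Sequential(nn.Linear(3, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.vel_embed = nn.Sequential(nn.Linear(3, dim), nn.ReLU(), nn.Linear(dim, dim))
        # Semantics: joint-type and frame-index embeddings.
        self.joint_embed = nn.Embedding(num_joints, dim)
        self.frame_embed = nn.Embedding(num_frames, dim)
        # Joint-level module: three graph-conv style layers with a learned adjacency.
        self.gcn = nn.ModuleList([nn.Linear(2 * dim if i == 0 else dim, dim) for i in range(3)])
        # Frame-level module: two temporal CNN layers after spatial max-pooling.
        self.cnn = nn.Sequential(
            nn.Conv1d(2 * dim, 2 * dim, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(2 * dim, 4 * dim, kernel_size=1), nn.ReLU(),
        )
        self.fc = nn.Linear(4 * dim, num_classes)

    def forward(self, x):
        # x: (batch, T, J, 3) 3D joint coordinates.
        b, t, j, _ = x.shape
        vel = torch.cat([torch.zeros_like(x[:, :1]), x[:, 1:] - x[:, :-1]], dim=1)
        dyn = self.pos_embed(x) + self.vel_embed(vel)                    # fuse position + velocity
        joint_sem = self.joint_embed(torch.arange(j, device=x.device))   # (J, dim) joint-type semantics
        h = torch.cat([dyn, joint_sem.expand(b, t, j, -1)], dim=-1)      # (b, T, J, 2*dim)
        # Content-based adjacency over joints within each frame (illustrative choice).
        adj = F.softmax(torch.einsum('btic,btjc->btij', h, h), dim=-1)
        for layer in self.gcn:
            h = F.relu(layer(torch.einsum('btij,btjc->btic', adj, h)))   # joint-level message passing
        h = h.max(dim=2).values                                          # spatial max-pool -> (b, T, dim)
        frame_sem = self.frame_embed(torch.arange(t, device=x.device))   # (T, dim) frame-index semantics
        h = torch.cat([h, frame_sem.expand(b, t, -1)], dim=-1)           # (b, T, 2*dim)
        h = self.cnn(h.transpose(1, 2)).max(dim=-1).values               # frame-level CNN + temporal pool
        return self.fc(h)

if __name__ == "__main__":
    logits = SGNSketch()(torch.randn(2, 20, 25, 3))
    print(logits.shape)  # torch.Size([2, 60])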

Prerequisites

The code is built with the following libraries:
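
Judging from the scripts and issues referenced below, the code needs at least PyTorch along with NumPy, h5py, and scikit-learn. An illustrative environment setup (the package set and versions are assumptions, not an official requirements list):

# Illustrative only; versions are not pinned by the authors
pip install torch numpy h5py scikit-learn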

Data Preparation

We use the NTU60 RGB+D dataset as an example for the description. We need to first download the NTU RGB+D dataset.

  • Extract the dataset to ./data/ntu/nturgb+d_skeletons/
  • Process the data
 cd ./data/ntu
 # Get skeleton of each performer
 python get_raw_skes_data.py
 # Remove bad skeletons
 python get_raw_denoised_data.py
 # Transform the skeleton to the center of the first frame
 python seq_transformation.py
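
The testing issue further below references ./data/ntu/NTU_CS.h5, so seq_transformation.py is expected to leave its output under ./data/ntu/. A hedged sanity check that the processed file exists and is readable (the file name and internal layout are inferred from that issue, not documented here):

# Hedged sanity check: verify the processed NTU file exists and is readable.
import os
import h5py

path = './data/ntu/NTU_CS.h5'  # path taken from an issue report; adjust if your run writes a different name
if not os.path.exists(path):
    raise FileNotFoundError(f'{path} not found; re-run the data preparation steps above.')
with h5py.File(path, 'r') as f:
    print('datasets in file:', list(f.keys()))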

Training

# For the CS (cross-subject) setting
python main.py --network SGN --train 1 --case 0
# For the CV (cross-view) setting
python main.py --network SGN --train 1 --case 1

Testing

  • Test the pre-trained models (./results/NTU/SGN/)
# For the CS setting
python main.py --network SGN --train 0 --case 0
# For the CV setting
python main.py --network SGN --train 0 --case 1

Reference

This repository holds the code for the following paper:

Semantics-Guided Neural Networks for Efficient Skeleton-Based Human Action Recognition. CVPR, 2020.

If you find our paper and repo useful, please cite our paper. Thanks!

@inproceedings{zhang2020semantics,
  title={Semantics-Guided Neural Networks for Efficient Skeleton-Based Human Action Recognition},
  author={Zhang, Pengfei and Lan, Cuiling and Zeng, Wenjun and Xing, Junliang and Xue, Jianru and Zheng, Nanning},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  year={2020},
}

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.

sgn's People

Contributors

lcl-2019, microsoft-github-operations[bot], microsoftopensource, shuidongliu


sgn's Issues

N-UCLA dataset

Thank you for such great work. I'd like to verify my model on the N-UCLA dataset. It would be great if you could provide the code to generate and preprocess the raw data. Thanks again!

Some questions about the baseline accuracy in the paper

First of all, thank you very much for your contribution. Your paper gave me a lot of inspiration.

However, I still have some questions. I ran the source code again in my environment, but I cannot reach the reported accuracy, especially on the NTU RGB+D dataset. My environment is a single GeForce RTX 2080 Ti with Python 3.6.5 and PyTorch 1.2.0, using a Python virtual environment instead of Anaconda.
Here are my experiment results:
[screenshot of experiment results, 2020-06-22]

Please 🙏, your reply is very important to me.

Great work!

self.spa = self.one_hot(bs, num_joint, self.seg)  # (bs, 25, 20) ---> [bs, 20, 25, 25]
self.spa = self.spa.permute(0, 3, 2, 1).cuda()    # [bs, 20, 25, 25] ---> [bs, 25, 25, 20]

Could the second line instead be the following?
self.spa = self.spa.permute(0, 2, 3, 1).cuda()    # [bs, 20, 25, 25] ---> [bs, 25, 25, 20]

The resulting dimensions are the same. Would this make any difference? Thank you.
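
Assuming one_hot builds, for every frame, a 25 x 25 identity-style matrix (each joint index paired with its own one-hot vector), that slice is symmetric, so swapping the last two axes leaves the tensor unchanged and the two permutations give identical results. A small check under that assumption:

# Under the assumption that one_hot yields an identity matrix over the joint
# dimensions (spa[b, t] == eye(25)), the two permutations agree.
import torch

bs, seg, num_joint = 4, 20, 25
spa = torch.eye(num_joint).expand(bs, seg, num_joint, num_joint)  # [bs, 20, 25, 25]
a = spa.permute(0, 3, 2, 1)  # [bs, 25, 25, 20]
b = spa.permute(0, 2, 3, 1)  # [bs, 25, 25, 20]
print(torch.equal(a, b))     # True, because eye(25) is symmetric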

CUDA_VISIBLE_DEVICES differs for each host system

Hello, and thank you for your contribution with this project.

I am currently trying to run your implementation and ran into an issue with the default CUDA_VISIBLE_DEVICES defined in main.py.

In my opinion, it would be useful to mention in the installation instructions that CUDA_VISIBLE_DEVICES may need to be changed depending on the host system (in my case, a change from '1' to '0' was necessary). Otherwise the project does not run out of the box.

Best regards,
Martin
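
A possible workaround, depending on how main.py handles the variable: if the script only reads CUDA_VISIBLE_DEVICES, it can be set in the shell before launching; if main.py assigns it itself (as this report suggests), the assignment inside the script has to be edited to a GPU index that exists on the host. Both options are sketched below; the exact line in main.py is an assumption.

# Option 1: set the mask in the shell (only works if main.py does not overwrite it):
#   CUDA_VISIBLE_DEVICES=0 python main.py --network SGN --train 1 --case 0
# Option 2: edit the assignment inside main.py to match the available GPU, e.g.:
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'  # must be set before any CUDA context is created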

Inference on data

Hi,
I'd like to use this code with the pre-trained models to run inference on my own data.
In other words, I'd like to feed a video to the framework and get the predicted action back as output.
Is this possible, and how can I do it?
Thanks
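
The released pipeline tests on preprocessed skeleton sequences rather than raw video, so running it on your own video would first require a pose estimator to produce per-frame 3D joints. Once a skeleton clip is available, single-sample inference could look roughly like the sketch below; the class name, constructor arguments, input layout, and checkpoint file name are all assumptions to be checked against the repo's model code and main.py.

# Rough inference sketch. Everything here (the import, the constructor signature,
# the checkpoint path and input layout) is an assumption about this repo's code.
import torch
from model import SGN  # hypothetical import; check the actual module/class name

skeleton = torch.randn(1, 20, 25, 3)   # placeholder (batch, frames, joints, xyz) clip from a pose estimator
net = SGN(num_classes=60, seg=20)      # hypothetical constructor signature
state = torch.load('./results/NTU/SGN/best.pth', map_location='cpu')  # hypothetical checkpoint file name
net.load_state_dict(state)
net.eval()
with torch.no_grad():
    pred = net(skeleton).argmax(dim=1)
print('predicted action id:', int(pred))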

Where is the h5 file? I want to run the testing program

OSError: Unable to open file (unable to open file: name = './data/ntu/NTU_CS.h5', errno = 2, error message = 'No such file or directory', flags = 0, o_flags = 0)
There is no h5 file in ./data/ntu.

Memory issue

I would like to ask how to reduce the memory overhead; loading the dataset consumes too much memory.
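
If the bottleneck is that the whole processed dataset is read into RAM at once, one common mitigation is to keep the HDF5 file on disk and fetch samples lazily from a Dataset. A hedged sketch, assuming the processed data sits in an HDF5 file such as ./data/ntu/NTU_CS.h5 with array-like datasets for samples and labels (the key names 'x' and 'y' are placeholders, not the repo's actual layout):

# Hedged sketch: read samples lazily from the HDF5 file instead of loading it all at once.
import h5py
import torch
from torch.utils.data import Dataset

class LazySkeletonDataset(Dataset):
    def __init__(self, h5_path, x_key='x', y_key='y'):
        self.h5_path, self.x_key, self.y_key = h5_path, x_key, y_key
        self.h5 = None
        with h5py.File(h5_path, 'r') as f:
            self.length = len(f[x_key])

    def __len__(self):
        return self.length

    def __getitem__(self, idx):
        if self.h5 is None:                  # open lazily, once per DataLoader worker
            self.h5 = h5py.File(self.h5_path, 'r')
        x = torch.from_numpy(self.h5[self.x_key][idx]).float()
        y = int(self.h5[self.y_key][idx])
        return x, y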

How to understand the SS setting for SYSU?

Thanks for your great work!
From your paper, we know that in the SYSU 3D Human-Object Interaction dataset (SYSU), each subject performs each action one time. For the Same Subject (SS) setting, half of the samples of each activity are used for training and the rest for testing. Does that mean that, for the same subject, half of the frames of each action are used for training and the rest for testing?
Looking forward to your reply!
Hello, author. Does the SS setting mean that, for the same action video performed by the same subject, the first half of the frames are used for training and the second half for testing? Looking forward to your reply; thank you very much.

ImportError: No module named 'sklearn'

Hello, I got the following error when running the seq_transformation.py file:
Traceback (most recent call last):
File "seq_transformation.py",line 9,in
ImportError:No module named 'sklearn'.
How can I solve it?
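
seq_transformation.py imports scikit-learn, which is distributed under the package name scikit-learn rather than sklearn. Installing it into the same environment that runs the script should resolve the error:

pip install scikit-learn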

SYSU dataset

Could you please release the preprocessing code for the SYSU dataset?
The NTU60 dataset is rather large to train on due to computational resource limitations.
It would be very kind of you to provide code for a smaller dataset like SYSU.
Thanks a lot!
