microsoft / sgn

This is the implementation of CVPR2020 paper “Semantics-Guided Neural Networks for Efficient Skeleton-Based Human Action Recognition”.

License: MIT License


sgn's Introduction

Semantics-Guided Neural Networks for Efficient Skeleton-Based Human Action Recognition (SGN)

Introduction

Skeleton-based human action recognition has attracted great interest thanks to the easy accessibility of human skeleton data. Recently, there has been a trend of using very deep feedforward neural networks to model the 3D coordinates of joints without considering the computational efficiency. In this work, we propose a simple yet effective semantics-guided neural network (SGN). We explicitly introduce the high-level semantics of joints (joint type and frame index) into the network to enhance the feature representation capability. Intuitively, semantic information, i.e., the joint type and the frame index, together with dynamics (i.e., 3D coordinates), reveals the spatial and temporal configuration/structure of human body joints and is very important for action recognition. In addition, we exploit the relationship of joints hierarchically through two modules, i.e., a joint-level module for modeling the correlations of joints in the same frame and a frame-level module for modeling the dependencies of frames by taking the joints in the same frame as a whole. A strong baseline is also proposed to facilitate the study of this field. With an order of magnitude smaller model size than most previous works, SGN achieves state-of-the-art performance.

Figure 1: Comparisons of different methods on NTU60 (CS setting) in terms of accuracy and the number of parameters. Among these methods, the proposed SGN model achieves the best performance with an order of magnitude smaller model size.

Framework


Figure 2: Framework of the proposed end-to-end Semantics-Guided Neural Network (SGN). It consists of a joint-level module and a frame-level module. In DR, we learn the dynamics representation of a joint by fusing the position and velocity information of a joint. Two types of semantics, i.e., joint type and frame index, are incorporated into the joint-level module and the frame-level module, respectively. To model the dependencies of joints in the joint-level module, we use three GCN layers. To model the dependencies of frames, we use two CNN layers.
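
To make the data flow in Figure 2 concrete, below is a minimal PyTorch sketch of an SGN-style network: position and velocity are embedded and fused into a dynamics representation (DR), joint-type embeddings feed the joint-level module (three graph-style layers), and frame-index embeddings feed the frame-level module (two CNN layers) after spatial pooling. Every layer size, the adjacency construction, and all names below are illustrative assumptions, not the authors' implementation.

# Illustrative SGN-style sketch; sizes, adjacency, and names are assumptions only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SGNSketch(nn.Module):
    def __init__(self, num_classes=60, num_joints=25, num_frames=20, dim=64):
        super().__init__()
        # Dynamics representation (DR): embed position and velocity, then fuse.
        self.pos_embed = nn.Sequential(nn.Linear(3, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.vel_embed = nn.Sequential(nn.Linear(3, dim), nn.ReLU(), nn.Linear(dim, dim))
        # Semantics: joint-type and frame-index embeddings.
        self.joint_embed = nn.Embedding(num_joints, dim)
        self.frame_embed = nn.Embedding(num_frames, dim)
        # Joint-level module: three graph-conv style layers with a learned adjacency.
        self.gcn = nn.ModuleList([nn.Linear(2 * dim if i == 0 else dim, dim) for i in range(3)])
        # Frame-level module: two temporal CNN layers after spatial max-pooling.
        self.cnn = nn.Sequential(
            nn.Conv1d(2 * dim, 2 * dim, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(2 * dim, 4 * dim, kernel_size=1), nn.ReLU(),
        )
        self.fc = nn.Linear(4 * dim, num_classes)

    def forward(self, x):
        # x: (batch, T, J, 3) 3D joint coordinates.
        b, t, j, _ = x.shape
        vel = torch.cat([torch.zeros_like(x[:, :1]), x[:, 1:] - x[:, :-1]], dim=1)
        dyn = self.pos_embed(x) + self.vel_embed(vel)                    # fuse position + velocity
        joint_sem = self.joint_embed(torch.arange(j, device=x.device))   # (J, dim) joint-type semantics
        h = torch.cat([dyn, joint_sem.expand(b, t, j, -1)], dim=-1)      # (b, T, J, 2*dim)
        # Content-based adjacency over joints within each frame (illustrative choice).
        adj = F.softmax(torch.einsum('btic,btjc->btij', h, h), dim=-1)
        for layer in self.gcn:
            h = F.relu(layer(torch.einsum('btij,btjc->btic', adj, h)))   # joint-level message passing
        h = h.max(dim=2).values                                          # spatial max-pool -> (b, T, dim)
        frame_sem = self.frame_embed(torch.arange(t, device=x.device))   # (T, dim) frame-index semantics
        h = torch.cat([h, frame_sem.expand(b, t, -1)], dim=-1)           # (b, T, 2*dim)
        h = self.cnn(h.transpose(1, 2)).max(dim=-1).values               # frame-level CNN + temporal pool
        return self.fc(h)

if __name__ == "__main__":
    logits = SGNSketch()(torch.randn(2, 20, 25, 3))
    print(logits.shape)  # torch.Size([2, 60])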

Prerequisites

The code is built with the following libraries:
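
Judging from the scripts and issues referenced below, the code needs at least PyTorch along with NumPy, h5py, and scikit-learn. An illustrative environment setup (the package set and versions are assumptions, not an official requirements list):

# Illustrative only; versions are not pinned by the authors
pip install torch numpy h5py scikit-learn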

Data Preparation

We use the NTU60 RGB+D dataset as an example for the description. We need to first download the NTU RGB+D dataset.

  • Extract the dataset to ./data/ntu/nturgb+d_skeletons/
  • Process the data
 cd ./data/ntu
 # Get skeleton of each performer
 python get_raw_skes_data.py
 # Remove bad skeletons
 python get_raw_denoised_data.py
 # Transform the skeleton to the center of the first frame
 python seq_transformation.py
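
The testing issue further below references ./data/ntu/NTU_CS.h5, so seq_transformation.py is expected to leave its output under ./data/ntu/. A hedged sanity check that the processed file exists and is readable (the file name and internal layout are inferred from that issue, not documented here):

# Hedged sanity check: verify the processed NTU file exists and is readable.
import os
import h5py

path = './data/ntu/NTU_CS.h5'  # path taken from an issue report; adjust if your run writes a different name
if not os.path.exists(path):
    raise FileNotFoundError(f'{path} not found; re-run the data preparation steps above.')
with h5py.File(path, 'r') as f:
    print('datasets in file:', list(f.keys()))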

Training

# For the CS (cross-subject) setting
python main.py --network SGN --train 1 --case 0
# For the CV (cross-view) setting
python main.py --network SGN --train 1 --case 1

Testing

  • Test the pre-trained models (./results/NTU/SGN/)
# For the CS setting
python main.py --network SGN --train 0 --case 0
# For the CV setting
python main.py --network SGN --train 0 --case 1

Reference

This repository holds the code for the following paper:

Semantics-Guided Neural Networks for Efficient Skeleton-Based Human Action Recognition. CVPR, 2020.

If you find our paper and repo useful, please cite our paper. Thanks!

@inproceedings{zhang2020semantics,
  title={Semantics-Guided Neural Networks for Efficient Skeleton-Based Human Action Recognition},
  author={Zhang, Pengfei and Lan, Cuiling and Zeng, Wenjun and Xing, Junliang and Xue, Jianru and Zheng, Nanning},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  year={2020},
}

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.

sgn's People

Contributors

lcl-2019, microsoft-github-operations[bot], microsoftopensource, shuidongliu


sgn's Issues

N-UCLA dataset

Thank you for such great work. I'd like to verify my model on the N-UCLA dataset. It would be great if you could provide the code to generate and preprocess the raw data. Thanks again!

Some questions about the baseline accuracy in the paper

First of all, thank you very much for your contribution. Your paper gave me a lot of inspiration.

However, I still have some questions. I ran the source code again in my environment, but I cannot reach the reported accuracy, especially on the NTU RGB+D dataset. My environment is a single GeForce RTX 2080 Ti with Python 3.6.5 and PyTorch 1.2.0, using a Python virtual environment instead of Anaconda.
Here are my experiment results:
[screenshot of experiment results, 2020-06-22]

Please 🙏, your reply is very important to me.

Great work!

self.spa = self.one_hot(bs, num_joint, self.seg)  # (bs, 25, 20) ---> [bs, 20, 25, 25]
self.spa = self.spa.permute(0, 3, 2, 1).cuda()    # [bs, 20, 25, 25] ---> [bs, 25, 25, 20]

Could the second line instead be the following?
self.spa = self.spa.permute(0, 2, 3, 1).cuda()    # [bs, 20, 25, 25] ---> [bs, 25, 25, 20]

The resulting dimensions are the same. Would this make any difference? Thank you.
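
Assuming one_hot builds, for every frame, a 25 x 25 identity-style matrix (each joint index paired with its own one-hot vector), that slice is symmetric, so swapping the last two axes leaves the tensor unchanged and the two permutations give identical results. A small check under that assumption:

# Under the assumption that one_hot yields an identity matrix over the joint
# dimensions (spa[b, t] == eye(25)), the two permutations agree.
import torch

bs, seg, num_joint = 4, 20, 25
spa = torch.eye(num_joint).expand(bs, seg, num_joint, num_joint)  # [bs, 20, 25, 25]
a = spa.permute(0, 3, 2, 1)  # [bs, 25, 25, 20]
b = spa.permute(0, 2, 3, 1)  # [bs, 25, 25, 20]
print(torch.equal(a, b))     # True, because eye(25) is symmetric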

CUDA_VISIBLE_DEVICES differs for each host system

Hello, and thank you for your contribution with this project.

I am currently trying to run your implementation and ran into an issue with the default CUDA_VISIBLE_DEVICES defined in main.py.

In my opinion, it would be useful to mention in the installation instructions that CUDA_VISIBLE_DEVICES may need to be changed depending on the host system (in my case, a change from '1' to '0' was necessary). Otherwise the project does not run out of the box.

Best regards,
Martin
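
A possible workaround, depending on how main.py handles the variable: if the script only reads CUDA_VISIBLE_DEVICES, it can be set in the shell before launching; if main.py assigns it itself (as this report suggests), the assignment inside the script has to be edited to a GPU index that exists on the host. Both options are sketched below; the exact line in main.py is an assumption.

# Option 1: set the mask in the shell (only works if main.py does not overwrite it):
#   CUDA_VISIBLE_DEVICES=0 python main.py --network SGN --train 1 --case 0
# Option 2: edit the assignment inside main.py to match the available GPU, e.g.:
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'  # must be set before any CUDA context is created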

Inference on data

Hi,
I'd like to use this code with the pre-trained models to run inference on my own data.
In other words, I'd like to feed a video to the framework and get the predicted action back as output.
Is this possible, and how can I do it?
Thanks
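
The released pipeline tests on preprocessed skeleton sequences rather than raw video, so running it on your own video would first require a pose estimator to produce per-frame 3D joints. Once a skeleton clip is available, single-sample inference could look roughly like the sketch below; the class name, constructor arguments, input layout, and checkpoint file name are all assumptions to be checked against the repo's model code and main.py.

# Rough inference sketch. Everything here (the import, the constructor signature,
# the checkpoint path and input layout) is an assumption about this repo's code.
import torch
from model import SGN  # hypothetical import; check the actual module/class name

skeleton = torch.randn(1, 20, 25, 3)   # placeholder (batch, frames, joints, xyz) clip from a pose estimator
net = SGN(num_classes=60, seg=20)      # hypothetical constructor signature
state = torch.load('./results/NTU/SGN/best.pth', map_location='cpu')  # hypothetical checkpoint file name
net.load_state_dict(state)
net.eval()
with torch.no_grad():
    pred = net(skeleton).argmax(dim=1)
print('predicted action id:', int(pred))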

Where is the h5 file? I want to run the testing program

OSError: Unable to open file (unable to open file: name = './data/ntu/NTU_CS.h5', errno = 2, error message = 'No such file or directory', flags = 0, o_flags = 0)
There is no h5 file in ./data/ntu.

Memory issue

I would like to ask how to reduce the memory overhead; loading the dataset consumes too much memory.
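
If the bottleneck is that the whole processed dataset is read into RAM at once, one common mitigation is to keep the HDF5 file on disk and fetch samples lazily from a Dataset. A hedged sketch, assuming the processed data sits in an HDF5 file such as ./data/ntu/NTU_CS.h5 with array-like datasets for samples and labels (the key names 'x' and 'y' are placeholders, not the repo's actual layout):

# Hedged sketch: read samples lazily from the HDF5 file instead of loading it all at once.
import h5py
import torch
from torch.utils.data import Dataset

class LazySkeletonDataset(Dataset):
    def __init__(self, h5_path, x_key='x', y_key='y'):
        self.h5_path, self.x_key, self.y_key = h5_path, x_key, y_key
        self.h5 = None
        with h5py.File(h5_path, 'r') as f:
            self.length = len(f[x_key])

    def __len__(self):
        return self.length

    def __getitem__(self, idx):
        if self.h5 is None:                  # open lazily, once per DataLoader worker
            self.h5 = h5py.File(self.h5_path, 'r')
        x = torch.from_numpy(self.h5[self.x_key][idx]).float()
        y = int(self.h5[self.y_key][idx])
        return x, y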

How to understand the SS setting for SYSU?

Thanks for your great work!
From your paper, we know that in the SYSU 3D Human-Object Interaction dataset (SYSU), each subject performs each action one time. For the Same Subject (SS) setting, half of the samples of each activity are used for training and the rest for testing. Does that mean that, for the same subject, half of the frames of each action are used for training and the rest for testing?
Looking forward to your reply!
Hello, author. Does the SS setting mean that, for the same action video performed by the same subject, the first half of the frames are used for training and the second half for testing? Looking forward to your reply; thank you very much.

ImportError: No module named 'sklearn'

Hello, I got the following error when running the seq_transformation.py file:
Traceback (most recent call last):
File "seq_transformation.py",line 9,in
ImportError:No module named 'sklearn'.
How can I solve it?
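
seq_transformation.py imports scikit-learn, which is distributed under the package name scikit-learn rather than sklearn. Installing it into the same environment that runs the script should resolve the error:

pip install scikit-learn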

SYSU dataset

Could you please release the preprocessing code for the SYSU dataset?
The NTU60 dataset is rather large to train on due to computational resource limitations.
It would be very kind of you to provide code for a smaller dataset like SYSU.
Thanks a lot!
