Self-Supervised Representation Learning with Spatial-Temporal Consistency for Sign Language Recognition
Weichao Zhao, Hezhen Hu, Wengang Zhou, Min Wang and Houqiang Li
This repository contains the Python (PyTorch) implementation of this paper.
Under review at IEEE TIP, 2024.
```
python==3.8.13
torch==1.8.1+cu111
torchvision==0.9.1+cu111
tensorboard==2.9.0
scikit-learn==1.1.1
tqdm==4.64.0
numpy==1.22.4
```
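One way to install these dependencies (a sketch, not an official setup script: it assumes a CUDA 11.1 machine to match the `+cu111` wheels listed above, and uses PyTorch's standard wheel index for those builds):

```shell
# Install the CUDA 11.1 builds of torch/torchvision from the PyTorch wheel index
# (assumption: CUDA 11.1 is available, matching the +cu111 versions above).
pip install torch==1.8.1+cu111 torchvision==0.9.1+cu111 \
    -f https://download.pytorch.org/whl/torch_stable.html

# Remaining packages from the list above.
pip install tensorboard==2.9.0 scikit-learn==1.1.1 tqdm==4.64.0 numpy==1.22.4
```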
Please refer to the provided bash scripts.
- Download the original datasets: SLR500, NMFs_CSL, WLASL, and MSASL.
- Use the off-the-shelf pose estimator MMPose with the Topdown Heatmap + HRNet + Dark setting on COCO-WholeBody to extract 2D keypoints for the sign language videos.
- Organize the final data as follows:
```
Data
├── NMFs_CSL
├── SLR500
├── WLASL
└── MSASL
    ├── Video
    ├── Pose
    └── Annotations
```
You can download the pretrained model from this link: pretrained model on four ISLR datasets
The framework of our code is based on skeleton-contrast.