This project is forked from verashira/tspnet.
License: MIT License


TSPNet

TSPNet: Hierarchical Feature Learning via Temporal Semantic Pyramid for Sign Language Translation

By Dongxu Li*, Chenchen Xu*, Xin Yu, Kaihao Zhang, Benjamin Swift, Hanna Suominen and Hongdong Li

(* Authors contributed equally.)

The repository contains the implementation of TSPNet. The preprocessed dataset, video features, and inference results are available on Google Drive.

We thank the authors of fairseq for their efforts.

Requirements

  • PyTorch version >= 1.4.0
  • Python version >= 3.6
  • For training new models, you will also need an NVIDIA GPU and (optionally) NCCL
  • (optional) BPEmb, if you prepare the datasets yourself (see below)
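To fail fast on an incompatible environment, the version requirements above can be checked with a small standard-library sketch (`meets_minimum` is a helper introduced here for illustration, not part of the repository; the `torch` import is wrapped in try/except since PyTorch may not be installed yet):

```python
import sys

def meets_minimum(version: str, minimum: str) -> bool:
    """Compare dotted version strings numerically, e.g. '1.10.0' >= '1.4.0'."""
    parse = lambda v: tuple(int(p) for p in v.split("."))
    return parse(version) >= parse(minimum)

# Python >= 3.6
assert sys.version_info >= (3, 6), "TSPNet requires Python >= 3.6"

# PyTorch >= 1.4.0 (skipped if torch is not installed yet)
try:
    import torch
    # strip local build tags such as '+cu101' before comparing
    assert meets_minimum(torch.__version__.split("+")[0], "1.4.0"), \
        "TSPNet requires PyTorch >= 1.4.0"
except ImportError:
    pass
```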

Install from source

Install the project from source and develop locally:

cd TSPNet/
pip install --editable .

Getting started

Preprocessing

Download the preprocessed dataset and arrange the files as:

TSPNet/
├── i3d-features/
│   ├── span=8_stride=2
│   ├── span=12_stride=2
│   └── span=16_stride=2
├── data-bin/
│   └── phoenix2014T/
│       └── sp25000/
├── README.md
├── run-scripts/
└── test-scripts/
  • i3d-features: the I3D output features of the input videos
  • data-bin: the preprocessed translation texts

Training

Go to the run-scripts folder and start training:

cd TSPNet/run-scripts
SAVE_DIR=CHECKPOINT_PATH bash run_phoenix_pos_embed_sp_test_3lvl.sh

Testing

After training, you can run inference on the test set by specifying a checkpoint file.

Note: CHECKPOINT_FILE_PATH points to a saved checkpoint file, rather than the checkpoint folder.

cd TSPNet/test-scripts
CHECKPOINT=CHECKPOINT_FILE_PATH bash test_phoenix_pos_embed_sp_test_3lvl.sh

The script reports multiple metrics, including ROUGE-L and BLEU-{n} as reported in the paper.
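The actual scripts rely on fairseq's scorers; for readers unfamiliar with the metric, a minimal pure-Python sketch of single-reference BLEU-{n} with a brevity penalty (illustrative only, with simple smoothing, not the exact implementation used for the reported numbers):

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(hypothesis: str, reference: str, max_n: int = 4) -> float:
    """Sentence-level BLEU-{max_n} against a single reference, with brevity penalty."""
    hyp, ref = hypothesis.split(), reference.split()
    log_prec = 0.0
    for n in range(1, max_n + 1):
        hyp_counts = Counter(ngrams(hyp, n))
        ref_counts = Counter(ngrams(ref, n))
        # clipped n-gram matches: each hypothesis n-gram counts at most
        # as often as it appears in the reference
        clipped = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total = max(sum(hyp_counts.values()), 1)
        # simple smoothing so an absent n-gram match does not zero the score
        prec = clipped / total if clipped > 0 else 1.0 / (2.0 * total)
        log_prec += math.log(prec)
    # brevity penalty: punish hypotheses shorter than the reference
    bp = 1.0 if len(hyp) >= len(ref) else math.exp(1.0 - len(ref) / max(len(hyp), 1))
    return bp * math.exp(log_prec / max_n)
```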

Alternative instructions for preparing the datasets yourself

  1. Text

Install the German subword embeddings BPEmb with pip install bpemb.

Preprocess the translation texts into BPE tokens with preprocess_sign.py, once for each split, for example:

python preprocess_sign.py --save-vecs data/processed/emb data/ori/phoenix2014T.train.de data/processed/train.de

python preprocess_sign.py data/ori/phoenix2014T.test.de data/processed/test.de
  2. Vocabulary

Generate the dictionary file dict.de.txt.

fairseq-preprocess --source-lang de --target-lang de --trainpref data/processed/train --testpref data/processed/test --destdir data-bin/ --dataset-impl raw
  3. Video

Prepare sign videos and the corresponding video features (e.g., from pretrained I3D networks), and create a JSON file for each split (e.g., train.sign-de.sign). The JSON file should follow the format below. It must have the same number of entries as the text file, where each entry corresponds to the sentence on the same line number in the prepared text file.
[
    {
        "ident": "VIDEO_ID",
        "size": 64
    },
    ...
]

Here size is the length of the video feature sequence.
  4. Finally, arrange the text files, video JSON files, word embeddings, and vocabulary files into a folder as below:
data-bin/
├── train.sign-de.sign
├── train.sign-de.de
│
├── test.sign-de.sign
├── test.sign-de.de
│
├── emb
└── dict.de.txt

Citations

Please cite our paper and the WLASL dataset (used for pre-training) as:

@inproceedings{li2020tspnet,
	title        = {TSPNet: Hierarchical Feature Learning via Temporal Semantic Pyramid for Sign Language Translation},
	author       = {Li, Dongxu and Xu, Chenchen and Yu, Xin and Zhang, Kaihao and Swift, Benjamin and Suominen, Hanna and Li, Hongdong},
	year         = 2020,
	booktitle    = {Advances in Neural Information Processing Systems},
	volume       = 33
}

@inproceedings{li2020word,
	title        = {Word-level Deep Sign Language Recognition from Video: A New Large-scale Dataset and Methods Comparison},
	author       = {Li, Dongxu and Rodriguez, Cristian and Yu, Xin and Li, Hongdong},
	year         = 2020,
	booktitle    = {The IEEE Winter Conference on Applications of Computer Vision},
	pages        = {1459--1469}
}

Other works you might be interested in:

@inproceedings{li2020transferring,
	title        = {Transferring Cross-domain Knowledge for Video Sign Language Recognition},
	author       = {Li, Dongxu and Yu, Xin and Xu, Chenchen and Petersson, Lars and Li, Hongdong},
	year         = 2020,
	booktitle    = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
	pages        = {6205--6214}
}

