Home Page: https://signavatars.github.io/

Topics: human-pose-estimation, mano, motion-generation, sign-language, sign-language-datasets, smpl-x, smplx, vqvae

SignAvatars: A Large-scale 3D Sign Language Holistic Motion Dataset and Benchmark

Zhengdi Yu (1,2) · Shaoli Huang (2) · Yongkang Cheng (2) · Tolga Birdal (1)

(1) Imperial College London · (2) Tencent AI Lab



SignAvatars is the first large-scale 3D sign language holistic motion dataset with mesh annotations. It comprises 8.34M precise 3D whole-body SMPL-X annotations covering 70K motion sequences; a corresponding MANO hand version is also provided.

News 🚩

  • [2023/11/2] Paper is now available. ⭐

TODO

  • Initial release of annotations.
  • Release the visualization code.
  • Release videos, pending agreement from the video owners.
  • Enrich the dataset.

Application examples on SLP

[Blender renderings: SLP from HamNoSys · SLP from Word · SLP from ASL · SLP from GSL]

Instructions 📜

Dataset description

Dataset download

For annotations, please fill out this form to request access to SignAvatars for non-commercial research purposes. By submitting the form, you confirm that you have read and agree to the terms of the Data license. You will then receive an email with links to download the motion and text labels.

We do not distribute the original RGB videos due to licensing restrictions; we provide only the high-quality 3D motion labels annotated by our team. To download the original videos for the four subsets, please follow the instructions below:

  1. For the ASL subset, download the Green Screen RGB clips from the how2sign dataset and place them in language2motion/.
  2. For the HamNoSys subset, download the original videos listed in the downloaded HamNoSys/data.json (see the sketch after this list).
  3. For the GSL subset, follow the official instructions to download the videos and place them in language2motion/.
  4. For the Word subset, follow the official instructions to download the videos and place them in word2motion/.
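
Since the exact schema of data.json is not documented here, the following is only a minimal download sketch; the dict layout and the "url" key are assumptions that should be adjusted to match the actual file:

import json
import os
import urllib.request

# Fetch the HamNoSys source videos listed in data.json.
with open("dataset/hamnosys2motion/data.json") as f:
    entries = json.load(f)

os.makedirs("dataset/hamnosys2motion/videos", exist_ok=True)
for name, entry in entries.items():   # assumed: dict keyed by video name
    url = entry["url"]                # assumed: key holding the source URL
    out_path = os.path.join("dataset/hamnosys2motion/videos", f"{name}.mp4")
    if not os.path.exists(out_path):
        urllib.request.urlretrieve(url, out_path)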

Dataset Structure

After downloading the data, please construct the layout of dataset/ as follows:

|-- dataset
|   |-- hamnosys2motion/  
|   |   |-- images/
|   |   |   |-- <video_name>/
|   |   |   |   |-- <frame_number.jpg>   [ starts from 000000.jpg ]
|   |   |-- videos/
|   |   |   |-- <video_name>/  [ ..... ]   
|   |   |-- annotations/
|   |   |   |-- <annotation_type>  [ SMPL-X, MANO, ...]
|   |   |   |   |-- <video_name.pkl>
|   |   |-- data.json  [Text annotations]
|   |   |-- split.pkl
|   |   |
|   |-- language2motion/  
|   |   |-- images/
|   |   |   |-- <video_name>/
|   |   |   |   |-- <frame_number.jpg>   [ starts from 000000.jpg ]
|   |   |-- videos/
|   |   |   |-- <video_name>/  [ ..... ]   
|   |   |-- annotations/
|   |   |   |-- <annotation_type>  [ SMPL-X, MANO, ...]
|   |   |   |   |-- <video_name.pkl>
|   |   |-- text/
|   |   |   |-- how2sign_train.csv   [Text annotations]
|   |   |   |-- how2sign_test.csv    [Text annotations]
|   |   |   |-- how2sign_val.csv     [Text annotations]
|   |   |   |-- PHOENIX-2014-T.train.corpus.csv     [Text annotations]
|   |   |   |-- PHOENIX-2014-T.test.corpus.csv     [Text annotations]
|   |   |
|   |-- word2motion/  
|   |   |-- images/
|   |   |   |-- <video_name>/
|   |   |   |   |-- <frame_number.jpg>   [ starts from 000000.jpg ]
|   |   |-- videos/
|   |   |   |-- <video_name>/  [ ..... ]   
|   |   |-- annotations/
|   |   |   |-- <annotation_type>  [ SMPL-X, MANO, ...]
|   |   |   |   |-- <video_name.pkl>
|   |   |-- text/
|   |   |   |-- WLASL_v0.3.json   [Text annotations]
|   |   |
|-- common
|   |-- utils
|   |   |-- human_model_files
|   |   |   |-- smpl
|   |   |   |   |-- SMPL_NEUTRAL.pkl
|   |   |   |   |-- SMPL_MALE.pkl
|   |   |   |   |-- SMPL_FEMALE.pkl
|   |   |   |-- smplx
|   |   |   |   |-- MANO_SMPLX_vertex_ids.pkl
|   |   |   |   |-- SMPL-X__FLAME_vertex_ids.npy
|   |   |   |   |-- SMPLX_NEUTRAL.pkl
|   |   |   |   |-- SMPLX_to_J14.pkl
|   |   |   |   |-- SMPLX_NEUTRAL.npz
|   |   |   |   |-- SMPLX_MALE.npz
|   |   |   |   |-- SMPLX_FEMALE.npz
|   |   |   |-- mano
|   |   |   |   |-- MANO_LEFT.pkl
|   |   |   |   |-- MANO_RIGHT.pkl
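
The images/<video_name>/ folders hold per-frame JPEGs numbered from 000000.jpg. If you only have the source videos, a sketch like the following (using OpenCV; the paths are illustrative) reproduces that layout:

import os

import cv2  # pip install opencv-python

def extract_frames(video_path, out_dir):
    """Dump every frame of a video as 000000.jpg, 000001.jpg, ..."""
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        cv2.imwrite(os.path.join(out_dir, "%06d.jpg" % idx), frame)
        idx += 1
    cap.release()

# e.g. extract_frames("dataset/word2motion/videos/<video_name>.mp4",
#                     "dataset/word2motion/images/<video_name>")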

In the common/ folder, human_model_files contains the SMPL, SMPL-X, and MANO 3D model files (including the FLAME vertex-ID mapping for SMPL-X). Download the files from [SMPL_NEUTRAL] [SMPL_MALE.pkl and SMPL_FEMALE.pkl] [smplx] [SMPLX_to_J14.pkl] [mano]. Alternatively, you can download our packed model files from Dropbox and unzip them into human_model_files/.

Data Description

SMPL-X Annotation

Each .pkl file is a dictionary whose keys and array shapes are:

width, height: (1,), (1,) (the video width and height)
focal: (num_frames, 2)
princpt: (num_frames, 2)
2d: (num_frames, 106, 3)
pred2d: (num_frames, 106, 3)
total_valid_index: (num_frames,)
left_valid: (num_frames,)
right_valid: (num_frames,)
bb2img_trans: (num_frames, 2, 3)
smplx: (num_frames, 182)
unsmooth_smplx: (num_frames, 169)

For motion generation and motion-prior learning tasks, you should use the data in smplx for better stability, whilst unsmooth_smplx can be used for pose estimation tasks. Please refer to the code for more details. For example, you can extract the smplx parameters as follows:

import pickle

# Load one annotation file (the path follows the dataset layout above).
with open("dataset/hamnosys2motion/annotations/SMPL-X/<video_name>.pkl", "rb") as f:
    results_dict = pickle.load(f)

# Smoothed parameters, (num_frames, 182):
# [root(3) | body(63) | lhand(45) | rhand(45) | jaw(3) | shape(10) | expression(10) | cam_trans(3)]
all_parameters = results_dict['smplx']
root_pose, body_pose, left_hand_pose, right_hand_pose, jaw_pose, shape, expression, cam_trans = \
    all_parameters[:, :3], all_parameters[:, 3:66], all_parameters[:, 66:111], all_parameters[:, 111:156], \
    all_parameters[:, 156:159], all_parameters[:, 159:169], all_parameters[:, 169:179], all_parameters[:, 179:182]

# Unsmoothed parameters, (num_frames, 169): no jaw pose or expression.
all_parameters = results_dict['unsmooth_smplx']
root_pose, body_pose, lhand_pose, rhand_pose, shape, cam_trans = \
    all_parameters[:, :3], all_parameters[:, 3:66], all_parameters[:, 66:111], all_parameters[:, 111:156], \
    all_parameters[:, 156:166], all_parameters[:, 166:169]

The resulting parameter shapes are:

root_pose: (num_frames, 3)
body_pose: (num_frames, 63)
expression: (num_frames, 10)
jaw_pose: (num_frames, 3)
betas: (num_frames, 10)
left_hand_pose: (num_frames, 45)
right_hand_pose: (num_frames, 45)
cam_trans: (num_frames, 3)

Please note that transl is set to 0 in these subsets, as there is no root-position change in the videos.
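
To recover posed meshes from the smoothed parameters extracted above, a minimal sketch using the smplx Python package might look as follows; the use_pca=False, num_betas=10, and num_expression_coeffs=10 settings are assumptions chosen to match the slice sizes above:

import smplx
import torch

num_frames = root_pose.shape[0]

# Neutral SMPL-X model loaded from common/utils/human_model_files/.
# use_pca=False lets the 45-dim axis-angle hand poses be passed in directly.
model = smplx.create(
    "common/utils/human_model_files", model_type="smplx", gender="neutral",
    use_pca=False, num_betas=10, num_expression_coeffs=10,
    batch_size=num_frames,
)

t = lambda a: torch.as_tensor(a, dtype=torch.float32)
output = model(
    global_orient=t(root_pose), body_pose=t(body_pose),
    left_hand_pose=t(left_hand_pose), right_hand_pose=t(right_hand_pose),
    jaw_pose=t(jaw_pose), betas=t(shape), expression=t(expression),
)
vertices = output.vertices   # (num_frames, 10475, 3) mesh vertices per frame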

Text Annotations

HamNoSys2Motion

  • The signers are standing and doing a single sign.
  • Each video is annotated with a HamNoSys glyph and HamNoSys text:
    • "hamsymmlr,hamflathand,hamextfingero,hampalml"
  • The average video length is 60 frames at 24 fps.

Language2Motion

  • The signers are sitting and doing multiple signs.
  • Each video is annotated with natural language translations:
    • "So we're going to start again on this one."
  • The average video length is 162 frames at 24 fps.

Word2Motion

  • The signers are standing and doing a single sign.
  • Each video is annotated with a word-level English gloss.
  • The average video length is 57 frames at 24 fps.
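
As a quick sanity check on the text annotations, a sketch like the following can be used; the tab separator for the How2Sign CSVs and the list-of-glosses layout of WLASL_v0.3.json are assumptions to verify against the downloaded files:

import json

import pandas as pd

# How2Sign sentence-level labels (assumed tab-separated; inspect df.columns).
df = pd.read_csv("dataset/language2motion/text/how2sign_train.csv", sep="\t")
print(len(df), "rows with columns:", df.columns.tolist())

# WLASL word-level labels (assumed: a list of entries, one per gloss).
with open("dataset/word2motion/text/WLASL_v0.3.json") as f:
    wlasl = json.load(f)
print(len(wlasl), "glosses; first entry keys:", list(wlasl[0].keys()))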

Citation

@article{yu2023signavatars,
  title   = {SignAvatars: A Large-scale 3D Sign Language Holistic Motion Dataset and Benchmark},
  author  = {Yu, Zhengdi and Huang, Shaoli and Cheng, Yongkang and Birdal, Tolga},
  journal = {arXiv preprint arXiv:2310.20436},
  month   = {November},
  year    = {2023}
}

Contact

For technical questions, please contact [email protected] or [email protected]. For license questions, please contact [email protected].

