
LSA-T: The first continuous LSA dataset

LSA-T is the first continuous Argentinian Sign Language (LSA) dataset. It contains ~22 hours of video extracted from the CN Sordos YouTube channel, with Spanish subtitles, joint keypoints for each signer, and the inferred signer when more than one person appears in a clip. Videos are full HD (1920x1080) at 30 FPS. We have also developed Seni.ar, a platform for exploring, validating and augmenting LSA-T.

Files and format

The labels.csv file contains labels and extra metadata for the 8459 clips. It has the following columns:

  • id: id of the clip. Used as the name of the mp4 file and as the key for its keypoints.
  • label: the Spanish translation of the clip.
  • video: title of the video the clip belongs to.
  • playlist: title of the playlist the clip belongs to.
  • start: time in seconds at which the clip starts (relative to the full video).
  • end: time in seconds at which the clip ends (relative to the full video).
  • duration: duration of the clip in seconds.
  • splits: clips are cropped by sentence, and a sentence may span several subtitle pieces. These splits are kept as a list of (piece_of_subtitle, start, end) tuples, with start and end in seconds relative to the full video.
  • prev_delta: since cropping clips exactly at the subtitle timestamps sometimes cut off signs, a time delta is added before each clip. It defaults to 0.5 seconds, unless the previous clip ends closer than that; in that case, the gap between the previous clip's end and the current clip's start is used as the delta.
  • post_delta: same as prev_delta, but at the end of each clip.
  • signers_amount: number of people (potential signers) present in the clip, detected with YOLOv8.
  • infered_signer: id of the person inferred to be the signer (used to identify their keypoints in the keypoints file).
  • infered_signer_confidence: confidence of the semi-automatic signer inference (from 0 to 1).
  • movement_per_signer: amount of movement computed for each signer.
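
The columns above can be loaded with pandas. The exact serialization of `splits` is an assumption here (a Python literal string is typical for list-of-tuple columns in CSV); the one-row excerpt below is made up for illustration:

```python
import ast
import io

import pandas as pd

# Made-up one-row excerpt; a real labels.csv has all the columns listed above.
csv_text = io.StringIO(
    "id,label,start,end,duration,splits\n"
    "clip_0001,hola,12.0,15.5,3.5,\"[('hola', 12.0, 15.5)]\"\n"
)

df = pd.read_csv(csv_text)

# Assuming `splits` is stored as a Python literal, parse it back into tuples.
df["splits"] = df["splits"].apply(ast.literal_eval)

row = df.iloc[0]
piece, start, end = row["splits"][0]
print(piece, end - start)  # subtitle piece and its length in seconds
```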

The mp4 files contain the clips in full HD at 30 FPS. Each file name matches the clip's id in the labels.csv file.

The hdf5 file contains the joints and bounding boxes for each person in each clip. It has a group for each clip (accessed by the clip's id), which in turn contains a group for each signer (with ids of the form signer_i).

Each keypoint is represented as (x, y, z, confidence).
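
The layout above can be sketched with h5py. Only the clip-id/signer-id group structure and the (x, y, z, confidence) layout come from the description; the dataset name `keypoints` and the (frames, joints, 4) shape are assumptions for illustration:

```python
import h5py
import numpy as np

# Build a tiny HDF5 file mirroring the described layout:
# one group per clip id, one subgroup per signer (signer_0, signer_1, ...).
with h5py.File("keypoints_demo.h5", "w") as f:
    clip = f.create_group("clip_0001")
    signer = clip.create_group("signer_0")
    # Hypothetical dataset name and shape: frames x joints x (x, y, z, confidence).
    signer.create_dataset("keypoints", data=np.zeros((10, 33, 4), dtype=np.float32))

# Reading follows the clip-id -> signer-id path.
with h5py.File("keypoints_demo.h5", "r") as f:
    kps = f["clip_0001"]["signer_0"]["keypoints"][:]
    print(kps.shape)  # last axis holds x, y, z, confidence
```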

Statistics and comparison with other DBs (original paper version)

|                    | LSA-T     | PHOENIX* | SIGNUM  | CSL       | GSL     | KETI      |
|--------------------|-----------|----------|---------|-----------|---------|-----------|
| language           | Spanish   | German   | German  | Chinese   | Greek   | Korean    |
| sign language      | LSA       | GSL      | GSL     | CSL       | GSL     | KLS       |
| real life          | Yes       | Yes      | No      | No        | No      | No        |
| signers            | 103       | 9        | 25      | 50        | 7       | 14        |
| duration (h)       | 21.78     | 10.71    | 55.3    | 100+      | 9.51    | 28        |
| # samples          | 14,880    | 7096     | 33,210  | 25,000    | 10,295  | 14,672    |
| # unique sentences | 14,254    | 5672     | 780     | 100       | 331     | 105       |
| % unique sentences | 95.79%    | 79.93%   | 2.35%   | 0.4%      | 3.21%   | 0.71%     |
| vocab. size (w)    | 14,239    | 2887     | N/A     | 178       | N/A     | 419       |
| # singletons (w)   | 7150      | 1077     | 0       | 0         | 0       | 0         |
| % singletons (w)   | 50.21%    | 37.3%    | 0%      | 0%        | 0%      | 0%        |
| vocab. size (gl)   | -         | 1066     | 450     | -         | 310     | 524       |
| # singletons (gl)  | -         | 337      | 0       | -         | 0       | 0         |
| % singletons (gl)  | -         | 31.61%   | 0%      | -         | 0%      | 0%        |
| resolution         | 1920x1080 | 210x260  | 776x578 | 1920x1080 | 848x480 | 1920x1080 |
| fps                | 30        | 25       | 30      | 30        | 30      | 30        |

Usage

Deprecated, working on a new version of the loader

This repository can be installed via pip and provides the LSA_Dataset class (in the lsat.dataset.LSA_Dataset module). The class inherits from the PyTorch Dataset class and implements all the methods needed to use it with a PyTorch DataLoader. It also manages downloading and extracting the database.

Useful transforms for the clips and keypoints are also provided in lsat.dataset.transforms.
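
Since the bundled loader is deprecated, any replacement follows the same PyTorch Dataset/DataLoader pattern. A minimal stand-in sketch, where the clip ids and labels are made up and the real work (mp4 decoding, keypoint lookup) is replaced with placeholders:

```python
from torch.utils.data import DataLoader, Dataset


class LSATSketch(Dataset):
    """Illustrative stand-in, not the deprecated LSA_Dataset class."""

    def __init__(self, records):
        # `records` stands in for parsed labels.csv rows: (clip_id, label) pairs.
        self.records = records

    def __len__(self):
        return len(self.records)

    def __getitem__(self, idx):
        clip_id, label = self.records[idx]
        # A real implementation would decode the clip's mp4 and fetch its
        # keypoints from the HDF5 file here, keyed by clip_id.
        return clip_id, label


loader = DataLoader(
    LSATSketch([("clip_0001", "hola"), ("clip_0002", "chau")]), batch_size=2
)
ids, labels = next(iter(loader))
print(list(ids), list(labels))
```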

Citation

@inproceedings{dal2022lsa,
    title={{LSA-T}: The first continuous {Argentinian} sign language dataset for sign language translation},
    author={Dal Bianco, Pedro and R{\'\i}os, Gast{\'o}n and Ronchetti, Franco and Quiroga, Facundo and Stanchi, Oscar and Hasperu{\'e}, Waldo and Rosete, Alejandro},
    booktitle={Ibero-American Conference on Artificial Intelligence},
    pages={293--304},
    year={2022},
    organization={Springer}
}


Issues

Add the playlist each video belongs to

Add "playlist" and "playlist_url" fields to each clip's metadata, with the playlist the video originally belongs to. Since some playlists have much more polished production than others, and more consistent interpreters, some playlists may be harder than others.

What sign language label?

Am I understanding your labels correctly: only subtitles in the spoken language? Is every gloss aligned with every sign? Are phonological features of the signs annotated?

Inquiry about frames alignment

I recently came across your repository for the LSA-T sign language dataset.
However, I noticed that the dataset does not explicitly include start and end times for each gloss (frame alignment), which is crucial for my current project.

I was wondering if such frame alignment data is available, even if not currently included in the repository. If so, would it be possible to access it? Any additional information or guidance on aligning glosses with their respective video timestamps would be incredibly helpful.

Thank you for your time and for sharing your work with the community. I look forward to your response!

2D or 3D annotations

I noticed that you included landmarks for some features, such as the hands. I was wondering whether they are 2D or 3D. I haven't downloaded the dataset yet, since it would take a while, and was hoping to get an answer before I start.

Downloading the dataset

Hello there, I installed the repository with `pip install lsat`. Then I tried running `python PytorchDataset.py`, but I get the error `ModuleNotFoundError: No module named 'lsat.typing'`.

Are there any instructions we can follow to download the dataset? Thank you very much!
