📝 ⇝ 🧏 Transcription

Repository for sign language transcription related models.

Ideally, pose-based models should use a shared large pose language model that can encode pose sequences of arbitrary length and is pre-trained on non-autoregressive reconstruction.
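As a rough illustration of that pre-training idea, the sketch below assumes "non-autoregressive reconstruction" means masked-frame reconstruction. Variable-length pose sequences are padded into one batch with a validity mask, a random subset of real frames is hidden, and the loss is computed only on the hidden frames, which a model would predict in a single pass rather than frame by frame. The function names, the flattened per-frame keypoint layout, and the masking ratio are hypothetical, and no actual model is included.

```python
import random
from typing import List, Tuple

Frame = List[float]         # flattened keypoint coordinates of one frame (hypothetical layout)
PoseSequence = List[Frame]  # one pose sequence; lengths differ between videos


def pad_batch(seqs: List[PoseSequence]) -> Tuple[List[PoseSequence], List[List[bool]]]:
    """Pad sequences to the longest length and return a mask marking the real frames."""
    max_len = max(len(seq) for seq in seqs)
    dim = len(seqs[0][0])
    padded, valid = [], []
    for seq in seqs:
        missing = max_len - len(seq)
        padded.append(list(seq) + [[0.0] * dim for _ in range(missing)])
        valid.append([True] * len(seq) + [False] * missing)
    return padded, valid


def mask_frames(valid: List[List[bool]], ratio: float, rng: random.Random) -> List[List[bool]]:
    """Randomly hide about `ratio` of the real frames; padding frames are never selected."""
    return [[is_real and rng.random() < ratio for is_real in row] for row in valid]


def masked_mse(pred: List[PoseSequence], target: List[PoseSequence], hidden: List[List[bool]]) -> float:
    """Mean squared error over the coordinates of hidden frames only."""
    total, count = 0.0, 0
    for pred_seq, target_seq, hidden_row in zip(pred, target, hidden):
        for p, t, is_hidden in zip(pred_seq, target_seq, hidden_row):
            if is_hidden:
                total += sum((a - b) ** 2 for a, b in zip(p, t))
                count += len(t)
    return total / count if count else 0.0


if __name__ == "__main__":
    rng = random.Random(0)
    batch = [[[0.1, 0.2]] * 5, [[0.3, 0.4]] * 2]  # two sequences of different lengths
    padded, valid = pad_batch(batch)
    hidden = mask_frames(valid, ratio=0.5, rng=rng)
    # Placeholder: a real encoder would see only the unmasked frames and predict
    # every frame at once; here every prediction is simply zero.
    prediction = [[[0.0, 0.0] for _ in seq] for seq in padded]
    print(masked_mse(prediction, padded, hidden))
```

The validity mask is what lets one batch hold sequences of any length; restricting the loss to hidden frames keeps the model from being rewarded for copying visible input.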

_shared - includes shared utilities for all models
video_to_pose - performs pose estimation on a video
pose_to_segments - segments pose sequences
text_to_pose - animates poses using text
pose_to_text - generates text from poses

Installation

pip install git+https://github.com/sign-language-processing/transcription.git

Development Setup

# Update conda
conda update -n base -c defaults conda

# Create environment
conda create -y --name sign python=3.10
conda activate sign

# Install all dependencies (this may cause a segmentation fault)
pip install ".[dev]"

# Use the pure-Python protobuf implementation
export PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python

Example Usage: Video-to-Text

Let's start with a video file of a sign language sentence, word, or conversation.

curl https://media.spreadthesign.com/video/mp4/13/93875.mp4 --output sign.mp4

Next, we'll use video_to_pose to extract the human pose from the video.

pip install mediapipe # depends on mediapipe
video_to_pose -i sign.mp4 --format mediapipe -o sign.pose

Now let's create an ELAN file with sign and sentence segments. (To try this on a longer file, you can download a large pose file from here.)

pip install pympi-ling # pose_to_segments depends on pympi to create ELAN files
pose_to_segments -i sign.pose -o sign.eaf --video sign.mp4
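To inspect the result outside ELAN, the sketch below reads the time-aligned annotations of every tier from an .eaf file using only the Python standard library. It relies on the EAF XML layout (TIME_ORDER/TIME_SLOT, TIER, ALIGNABLE_ANNOTATION, ANNOTATION_VALUE); the function name is hypothetical, and the tier names that pose_to_segments actually writes are not stated in this README, so none are assumed here.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple, Union

Segment = Tuple[int, int, str]  # (start_ms, end_ms, annotation text)


def read_eaf_segments(eaf_xml: Union[str, bytes]) -> Dict[str, List[Segment]]:
    """Collect the time-aligned annotations of every tier in an ELAN .eaf document."""
    root = ET.fromstring(eaf_xml)
    # TIME_ORDER maps slot IDs such as "ts1" to times in milliseconds;
    # unaligned slots have no TIME_VALUE and are left out.
    slots = {
        slot.get("TIME_SLOT_ID"): int(slot.get("TIME_VALUE"))
        for slot in root.iter("TIME_SLOT")
        if slot.get("TIME_VALUE") is not None
    }
    tiers: Dict[str, List[Segment]] = {}
    for tier in root.iter("TIER"):
        segments = []
        for annotation in tier.iter("ALIGNABLE_ANNOTATION"):
            start = slots.get(annotation.get("TIME_SLOT_REF1"))
            end = slots.get(annotation.get("TIME_SLOT_REF2"))
            if start is None or end is None:
                continue  # skip annotations whose boundaries are not aligned to a time
            segments.append((start, end, annotation.findtext("ANNOTATION_VALUE", default="")))
        tiers[tier.get("TIER_ID")] = sorted(segments)  # order segments by start time
    return tiers
```

For example, `read_eaf_segments(open("sign.eaf", "rb").read())` returns a dictionary keyed by tier ID. Passing bytes leaves the XML encoding declaration to the parser. Reference annotations (REF_ANNOTATION), which take their times from a parent tier, are ignored by this sketch.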

Next Steps

After reviewing the ELAN file and adjusting the segments where needed, we'll transcribe every sign segment into HamNoSys or SignWriting:

pose_to_text --notation=signwriting --pose=sign.pose --eaf=sign.eaf
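Transcribing a sign segment presumably means pairing it with the matching frames of sign.pose. ELAN stores segment boundaries in milliseconds, so some conversion to frame indices is needed; the sketch below shows one, assuming the frame rate is known (for instance from the pose file). Whether pose_to_text works this way is not stated here, and both helper names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def segment_to_frames(start_ms: int, end_ms: int, fps: float, num_frames: int) -> range:
    """Indices of every frame whose span [i / fps, (i + 1) / fps) overlaps [start_ms, end_ms)."""
    first = math.floor(start_ms * fps / 1000)
    stop = math.ceil(end_ms * fps / 1000)  # exclusive upper bound
    return range(max(first, 0), min(stop, num_frames))


def slice_segments(frames: Sequence, fps: float, segments: List[Tuple[int, int, str]]) -> List[Sequence]:
    """Cut one sub-sequence out of the full pose sequence per (start_ms, end_ms, text) segment."""
    clips = []
    for start_ms, end_ms, _ in segments:
        frame_range = segment_to_frames(start_ms, end_ms, fps, len(frames))
        clips.append(frames[frame_range.start:frame_range.stop])
    return clips
```

Rounding the start down and the end up keeps every frame that overlaps the segment, and clamping to `num_frames` guards against segments that run past the end of the video.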

After reviewing the ELAN file again and fixing any mistakes, we finally translate each sentence segment into spoken language text:

text_to_text --sign_language=us --spoken_language=en --eaf=sign.eaf
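The README does not say what text_to_text reads for each sentence. One plausible input is the sequence of SignWriting transcriptions of the sign segments inside that sentence. The sketch below, with hypothetical names, assigns each sign to the sentence segment containing the sign's midpoint and joins their texts with spaces; segments are (start_ms, end_ms, text) tuples as they might be read from the ELAN tiers.

```python
from typing import List, Tuple

Segment = Tuple[int, int, str]  # (start_ms, end_ms, text)


def signs_per_sentence(signs: List[Segment], sentences: List[Segment]) -> List[List[Segment]]:
    """Assign each sign to the sentence whose [start, end) interval contains the sign's midpoint.

    A sign whose midpoint falls outside every sentence is not assigned.
    """
    grouped: List[List[Segment]] = [[] for _ in sentences]
    for sign in sorted(signs):
        midpoint = (sign[0] + sign[1]) / 2
        for index, (start, end, _) in enumerate(sentences):
            if start <= midpoint < end:
                grouped[index].append(sign)
                break
    return grouped


def sentence_inputs(signs: List[Segment], sentences: List[Segment]) -> List[str]:
    """Join the transcribed signs of each sentence with spaces, one string per sentence."""
    return [" ".join(text for _, _, text in group) for group in signs_per_sentence(signs, sentences)]
```

Using the midpoint rather than requiring full containment tolerates sign boundaries that slightly overlap a sentence boundary.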

Example Usage: Text-to-Video

Let's start with a spoken language word or sentence, for example "Hello World".

Next Steps

First, we'll translate it into sign language text, in SignWriting format:

text_to_text --spoken_language=en --sign_language=us \
  --notation=signwriting --text="Hello World" > sign.txt
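Assuming sign.txt holds Formal SignWriting in ASCII (FSW), where signs are separated by spaces and each sign lists its symbols with x/y positions inside a box, the permissive regex sketch below splits such a string into signs and positioned symbols. The function name is hypothetical, the example string in the note below is an arbitrary FSW sign rather than the real translation of "Hello World", and punctuation and other FSW features are not handled.

```python
import re
from typing import List, Tuple

# One positioned symbol: "S" + base (3 hex) + fill (0-5) + rotation (hex), then "XXXxYYY".
SYMBOL = re.compile(r"S([123][0-9a-f]{2}[0-5][0-9a-f])([0-9]{3})x([0-9]{3})")
# One sign: optional "A..." sorting prefix, a box marker (B, L, M or R) with its
# coordinate, then any number of positioned symbols.
SIGN = re.compile(
    r"(?:A(?:S[123][0-9a-f]{2}[0-5][0-9a-f])+)?"
    r"[BLMR][0-9]{3}x[0-9]{3}"
    r"(?:S[123][0-9a-f]{2}[0-5][0-9a-f][0-9]{3}x[0-9]{3})*"
)
BOX = re.compile(r"[BLMR][0-9]{3}x[0-9]{3}")

Symbol = Tuple[str, int, int]  # (symbol key, x, y)


def parse_fsw(text: str) -> List[List[Symbol]]:
    """Return the positioned symbols of each sign found in an FSW string."""
    signs = []
    for match in SIGN.finditer(text):
        body = match.group(0)
        # Skip the optional sorting prefix and the box so only positioned symbols remain.
        box = BOX.search(body)
        symbols = [("S" + key, int(x), int(y)) for key, x, y in SYMBOL.findall(body[box.end():])]
        signs.append(symbols)
    return signs
```

For example, `parse_fsw("M518x529S14c20481x471S27106503x489")` returns `[[("S14c20", 481, 471), ("S27106", 503, 489)]]`.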

Next, we'll animate the sign language text into a pose sequence:

text_to_pose --notation=signwriting --text="$(cat sign.txt)" --pose=sign.pose

Finally, we'll render the pose sequence as a video:

# Using Pix2Pix
pose_to_video --model=pix2pix --pose=sign.pose --video=sign.mp4 --upscale=true
# OR Using StyleGAN3
pose_to_video --model=stylegan3 --pose=sign.pose --video=sign.mp4 --upscale=true
# OR Using Mixamo
pose_to_video --model=mixamo --pose=sign.pose --video=sign.mp4
