"LipNet: End-to-End Sentence-level Lipreading" in PyTorch

License: BSD 3-Clause "New" or "Revised" License

Python 100.00%

lipreading deep-learning visual-speech-recognition pytorch-implementation cnn-architecture

An unofficial PyTorch implementation of the model described in "LipNet: End-to-End Sentence-level Lipreading" by Yannis M. Assael, Brendan Shillingford, Shimon Whiteson, and Nando de Freitas. Based on the official Torch implementation.

Usage

First, create symbolic links to where you store images and alignments in the data folder:

mkdir data
ln -s PATH_TO_ALIGNS data/align
ln -s PATH_TO_IMAGES data/images

Then run the program:

python3 train_lipnet.py

This trains on the "unseen speakers" split. To train on the "overlapped speakers" split:

python3 train_lipnet.py --test_overlapped

The overlapped speakers file list we use (list_overlapped.json) is exported directly from the authors' Torch implementation release.

To monitor training progress:

tensorboard --logdir logs

The images folder should be organised as:

├── s1
│   ├── bbaf2n
│   │   ├── mouth_000.png
│   │   ├── mouth_001.png
...

And the align folder:

├── s1
│   ├── bbaf2n.align
│   ├── bbaf3s.align
│   ├── bbaf4p.align
...
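
Because the two trees must mirror each other (one frame folder per clip under images, one .align file per clip under align), it can help to check the pairing before training. The following is a minimal sketch, not part of this repository: the function name find_unpaired_clips is hypothetical, and it assumes only the layout shown above (frames named mouth_*.png, alignments named CLIP.align under the same speaker folder).

```python
from pathlib import Path


def find_unpaired_clips(images_root, align_root):
    """List clips whose frame folder is empty or has no matching .align file.

    Hypothetical helper: assumes images_root/SPEAKER/CLIP/mouth_*.png and
    align_root/SPEAKER/CLIP.align, as in the layout described above.
    """
    images_root, align_root = Path(images_root), Path(align_root)
    problems = []
    for speaker_dir in sorted(p for p in images_root.iterdir() if p.is_dir()):
        for clip_dir in sorted(p for p in speaker_dir.iterdir() if p.is_dir()):
            frames = list(clip_dir.glob("mouth_*.png"))
            align_file = align_root / speaker_dir.name / (clip_dir.name + ".align")
            if not frames:
                problems.append((speaker_dir.name, clip_dir.name, "no frames"))
            elif not align_file.is_file():
                problems.append((speaker_dir.name, clip_dir.name, "no alignment"))
    return problems
```

Calling find_unpaired_clips("data/images", "data/align") after creating the symbolic links should return an empty list when every clip has both frames and an alignment.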

That's it! To choose which GPU to use, edit the line in the program that sets the CUDA_VISIBLE_DEVICES environment variable. Feel free to play around with the parameters.
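
For reference, setting CUDA_VISIBLE_DEVICES from Python generally looks like the sketch below. This is an illustration, not the repository's own code: select_gpus is a hypothetical helper, and the only firm requirement is that the variable is set before CUDA is initialised in the process.

```python
import os


def select_gpus(device_ids):
    """Restrict CUDA to the given device indices (hypothetical helper).

    Must run before the first CUDA call in this process; setting the
    variable later has no effect on an already initialised runtime.
    """
    value = ",".join(str(i) for i in device_ids)
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


# Expose only the first GPU; import torch and build the model after this line.
select_gpus([0])
```

The same effect can be had without editing any code by prefixing the command, for example CUDA_VISIBLE_DEVICES=0 python3 train_lipnet.py.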

Dependencies

Python 3.x
PyTorch 1.1+ (for native CTCLoss and TensorBoard support; we highly recommend nightly builds, because PyTorch's CTC implementation is quite buggy and fixes often take a while to reach stable releases)
tensorboardX (only if you are not using PyTorch 1.1+, or if your TensorFlow version is incompatible with PyTorch's native TensorBoard support)
ctcdecode (for beam search decoding)
torchsummary
progressbar2
editdistance
scikit-image
torchvision
pillow
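
A quick way to see which of these packages are missing from the current environment is sketched below. It is an assumption-laden convenience, not part of the repository: missing_dependencies is a hypothetical name, and the mapping reflects the usual import names, which differ from the package names in a few cases (progressbar2 imports as progressbar, scikit-image as skimage, pillow as PIL). tensorboardX is left out because it is only needed in the cases noted above.

```python
import importlib.util

# Package name (as installed) -> module name (as imported).
REQUIRED_MODULES = {
    "torch": "torch",
    "ctcdecode": "ctcdecode",
    "torchsummary": "torchsummary",
    "progressbar2": "progressbar",
    "editdistance": "editdistance",
    "scikit-image": "skimage",
    "torchvision": "torchvision",
    "pillow": "PIL",
}


def missing_dependencies(modules=REQUIRED_MODULES):
    """Return the package names whose modules cannot be found (hypothetical helper)."""
    missing = []
    for package, module in modules.items():
        try:
            found = importlib.util.find_spec(module) is not None
        except (ImportError, ValueError):
            # find_spec can raise for partially initialised or broken modules.
            found = False
        if not found:
            missing.append(package)
    return missing
```

Running print(missing_dependencies()) lists whatever still needs to be installed; an empty list means every module above can be located. It does not check versions, so the PyTorch 1.1+ requirement still has to be verified separately.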

Results

TODO

Pending

Add saliency visualisation
Add preprocessing code

lipnet-pytorch's Issues

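Question about generating the .npy files for end-to-end audiovisual speech recognition

I am a graduate student at Tianjin University, and I am currently trying to reproduce the experimental results of the end-to-end audiovisual speech recognition paper. I saw the comment you left on the author's GitHub repository, in which you pointed out an error, and I would like to ask you how the .npy files for that paper are generated.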

About the code

@sailordiary Hi, thank you for your code!
I want to run it on my computer, but I don't know how to get started.
Can you tell me how to run this code? For example, how should the dataset be structured, and where should I put the dataset and the pretrained weights?
Thank you very much!
Could I add you on WeChat or QQ, if that is convenient for you?

When will the pending code be released?

Sample data to test on

Hi! I want to test this code on some sample data. I found GRID but their videos and alignments aren't organized the way your code wants it. Do you have any sample data that is organized correctly?

Time spent on training

@sailordiary Hi, thank you for your code!
I would like to know whether the model converges quickly. I am training a Dense3D model, but the loss decreases slowly. Do you know of any good lipreading models that take little time to train? Thank you.

Running the model on Google Colab

Hello, can you please tell me the steps to run this model in Google Colab?
