GithubHelp home page GithubHelp logo

sailordiary / lipnet-pytorch Goto Github PK

View Code? Open in Web Editor NEW
62.0 4.0 20.0 16.28 MB

"LipNet: End-to-End Sentence-level Lipreading" in PyTorch

License: BSD 3-Clause "New" or "Revised" License

Python 100.00%
lipreading deep-learning visual-speech-recognition pytorch-implementation cnn-architecture

lipnet-pytorch's Introduction

"LipNet: End-to-End Sentence-level Lipreading" in PyTorch

An unofficial PyTorch implementation of the model described in "LipNet: End-to-End Sentence-level Lipreading" by Yannis M. Assael, Brendan Shillingford, Shimon Whiteson, and Nando de Freitas. Based on the official Torch implementation.

LipNet video

Usage

First, create symbolic links to where you store images and alignments in the data folder:

mkdir data
ln -s PATH_TO_ALIGNS data/align
ln -s PATH_TO_IMAGES data/images

Then run the program:

python3 train_lipnet.py

This trains on the "unseen speakers" split. To train on the "overlapped speakers" split:

python3 train_lipnet.py --test_overlapped

The overlapped speakers file list we use (list_overlapped.json) is exported directly from the authors' Torch implementation release here.

To monitor training progress:

tensorboard --logdir logs

The images folder should be organised as:

├── s1
│   ├── bbaf2n
│   │   ├── mouth_000.png
│   │   ├── mouth_001.png
...

And the align folder:

├── s1
│   ├── bbaf2n.align
│   ├── bbaf3s.align
│   ├── bbaf4p.align
...

That's it! You can specify the GPU to use in the program where the environment variable CUDA_VISIBLE_DEVICES is set. Feel free to play around with the parameters.

Dependencies

  • Python 3.x
  • PyTorch 1.1+ (for native CTCLoss and TensorBoard support; we highly recommend using nightly builds, because PyTorch CTC is quite buggy and often fixes are not reflected in due course.)
  • tensorboardX (if you are not using PyTorch 1.1+, or your TensorFlow version is incompatible with native PyTorch Tensorboard support)
  • ctcdecode (for beam search decoding)
  • torchsummary
  • progressbar2
  • editdistance
  • scikit-image
  • torchvision
  • pillow

Results

TODO

Pending

  • Add saliency visualisation
  • Add preprocessing code

lipnet-pytorch's People

Contributors

sailordiary avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar

lipnet-pytorch's Issues

About the code

@sailordiary Hi, thank you for your code!
I want to run it on my computer, but I don't know how to start my work.
Can you tell me something about running this code? For example, what structure should the dataset be and where should I put the datasets and pretrained weight?
Thank you very much!
Can I add your wechat or QQ if you are convenient?

Sample data to test on

Hi! I want to test this code on some sample data. I found GRID but their videos and alignments aren't organized the way your code wants it. Do you have any sample data that is organized correctly?

spend times on training

@sailordiary hi, thank you for your code!
i want to know whether the model have a quick convergence speed? i use Dense3D model to train but find loss cannot quickly decrease .Do you know some good lipreading models spend little time on training,thank you

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.