pranoot / speech_separation

This project forked from bill9800/speech_separation

Includes core functions and a model for speech separation.

License: MIT License

Python 100.00%

speech_separation

This is a repository for speech separation tasks.

This project is heavily inspired by the paper [1]; work to improve its performance is ongoing.

Data

AVSpeech dataset : contains 4700 hours of video segments from a total of 290k YouTube videos.

Customized video and audio downloaders are provided in data/ (based on youtube-dl, sox, and ffmpeg).

Preprocessing

The lib contains several preprocessing functions, including STFT, iSTFT, power-law compression, and complex masks.
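As a rough illustration of power-law compression, the sketch below applies a sign-preserving power to the real and imaginary parts of one STFT bin. The function names and the default exponent of 0.3 are assumptions for illustration, not necessarily what the lib uses.

```python
import math

def power_law(x: float, power: float = 0.3) -> float:
    """Compress the magnitude of x while keeping its sign: sign(x) * |x|**power."""
    return math.copysign(abs(x) ** power, x)

def inverse_power_law(y: float, power: float = 0.3) -> float:
    """Undo power_law by raising the magnitude to 1/power."""
    return math.copysign(abs(y) ** (1.0 / power), y)

def compress_bin(z: complex, power: float = 0.3) -> complex:
    """Apply the compression separately to the real and imaginary parts of one bin."""
    return complex(power_law(z.real, power), power_law(z.imag, power))

# Hypothetical STFT bin: large values shrink, the sign pattern is preserved.
print(compress_bin(complex(100.0, -0.01)))
```

Compressing before training reduces the dynamic range of the spectrogram, so loud bins dominate the loss less than they otherwise would.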

MTCNN [2] is applied to detect faces, and each detection is corrected by checking it against the provided face center.

The visual frames are transformed into 1792-dimensional face embeddings (taken from the average pooling layer) with a pre-trained FaceNet model [3].
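One way to check detections against the provided face center is sketched below. It assumes the center is given as normalized (x, y) coordinates in [0, 1] and that the detector returns pixel boxes as (x, y, width, height); `pick_face` is a hypothetical helper, not a function from this repository.

```python
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height) in pixels

def pick_face(boxes: List[Box],
              center_xy: Tuple[float, float],
              frame_size: Tuple[int, int]) -> Optional[Box]:
    """Return the detected box that contains the annotated face center.

    If several boxes contain it, the one whose own center is closest wins.
    Returns None when no detection matches, so the frame can be skipped.
    """
    cx = center_xy[0] * frame_size[0]  # normalized -> pixel coordinates
    cy = center_xy[1] * frame_size[1]
    containing = [
        b for b in boxes
        if b[0] <= cx <= b[0] + b[2] and b[1] <= cy <= b[1] + b[3]
    ]
    if not containing:
        return None
    return min(
        containing,
        key=lambda b: (b[0] + b[2] / 2 - cx) ** 2 + (b[1] + b[3] / 2 - cy) ** 2,
    )

# Two hypothetical detections in a 280x280 frame; the annotation points at the second.
boxes = [(0.0, 0.0, 50.0, 50.0), (100.0, 100.0, 80.0, 80.0)]
print(pick_face(boxes, (0.5, 0.5), (280, 280)))
```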

Model

Audio part : Dilated CNN + Bidirectional LSTM.

Video part : (pretrained MTCNN + Facenet) + dilated CNN + Bidirectional LSTM.

Loss function : modified discriminative loss function inspired by paper [4].
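For orientation, the unmodified two-speaker discriminative loss from [4] can be sketched as follows: each prediction is pulled toward its own target and pushed away from the other speaker's target, weighted by gamma. The modification used in this project is not described here, and the value gamma=0.1 and the function names are assumptions.

```python
from typing import Sequence

def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equally long sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def discriminative_loss(pred1: Sequence[float], pred2: Sequence[float],
                        true1: Sequence[float], true2: Sequence[float],
                        gamma: float = 0.1) -> float:
    """||p1 - t1||^2 - gamma*||p1 - t2||^2 + ||p2 - t2||^2 - gamma*||p2 - t1||^2."""
    return (sq_dist(pred1, true1) - gamma * sq_dist(pred1, true2)
            + sq_dist(pred2, true2) - gamma * sq_dist(pred2, true1))

# Toy targets: correct assignments score lower than swapped ones.
t1, t2 = [1.0, 0.0], [0.0, 1.0]
print(discriminative_loss(t1, t2, t1, t2))  # perfect separation
print(discriminative_loss(t2, t1, t1, t2))  # speakers swapped
```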

Prediction

A complex ratio mask (cRM) is applied to enhance the phase spectrum as well as the magnitude. A hyperbolic tangent function keeps the mask values bounded so that quality is maintained during the transformation. [5]
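A minimal sketch of the idea for a single time-frequency bin, following the formulation in [5]: the ideal mask is the complex ratio of clean to mixture, and its real and imaginary parts are squashed with a hyperbolic tangent. The constants K=10 and C=0.1 and all function names are assumptions here, not values confirmed by this repository.

```python
import math

K = 10.0  # compression bound (assumed value)
C = 0.1   # compression steepness (assumed value)

def complex_ratio_mask(mixture: complex, clean: complex, eps: float = 1e-8) -> complex:
    """Ideal mask M for one bin, so that M * mixture is approximately clean."""
    denom = abs(mixture) ** 2 + eps  # eps guards against silent bins
    return clean * mixture.conjugate() / denom

def compress(mask: complex) -> complex:
    """Bound both mask components to (-K, K) with K * tanh(C * m / 2)."""
    return complex(K * math.tanh(C * mask.real / 2.0),
                   K * math.tanh(C * mask.imag / 2.0))

def decompress(comp: complex) -> complex:
    """Invert compress; values are clipped just inside (-K, K) first."""
    limit = K * (1.0 - 1e-7)

    def inv(v: float) -> float:
        v = max(-limit, min(limit, v))
        return (2.0 / C) * math.atanh(v / K)

    return complex(inv(comp.real), inv(comp.imag))

# Hypothetical bin: recover the clean value from the mixture via the mask.
mixture, clean = complex(3.0, 4.0), complex(1.0, 2.0)
mask = complex_ratio_mask(mixture, clean)
print(decompress(compress(mask)) * mixture)
```

Because the mask is complex, multiplying it with the mixture changes phase as well as magnitude, which a real-valued magnitude mask cannot do.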

The model will be evaluated by signal-to-distortion ratio.
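In its simplest form, the signal-to-distortion ratio compares the energy of the reference signal with the energy of the estimation error, in decibels. The sketch below uses that plain definition; toolkits such as BSS Eval compute a more elaborate decomposition, and which variant this project uses is not stated, so treat the function as illustrative.

```python
import math
from typing import Sequence

def sdr(reference: Sequence[float], estimate: Sequence[float],
        eps: float = 1e-12) -> float:
    """10 * log10(||reference||^2 / ||reference - estimate||^2), in dB."""
    signal = sum(r * r for r in reference)
    error = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    return 10.0 * math.log10((signal + eps) / (error + eps))

# Toy signals: an estimate with 10% amplitude error gives roughly 20 dB.
reference = [1.0, -1.0, 1.0, -1.0]
estimate = [0.9, -0.9, 0.9, -0.9]
print(round(sdr(reference, estimate), 2))
```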

Reference

[1] Looking to Listen at the Cocktail Party: A Speaker-Independent Audio-Visual Model for Speech Separation, A. Ephrat et al., arXiv:1804.03619v2 [cs.SD] 9 Aug 2018

[2] MTCNN face detection

[3] FaceNet Pretrained model

[4] Joint Optimization of Masks and Deep Recurrent Neural Networks for Monaural Source Separation, P. Huang et al., arXiv:1502.04149v4 [cs.SD] 1 Oct 2015

[5] Complex Ratio Masking for Monaural Speech Separation

Contributors

bill9800