
izilotti / punctuation-restoration


This project forked from k9luo/punctuation-restoration


A TensorFlow Implementation of Punctuation Restoration.

License: MIT License

Shell 0.16% Python 52.89% Jupyter Notebook 46.95%


Punctuation Restoration

Requirements

Imagine that you are building software for transcribing speech to text. The speech transcription part works perfectly but cannot produce punctuation. The task is to train a predictive model that ingests a sequence of text and inserts punctuation (period, comma, or question mark) at the appropriate locations. Punctuated output is important for the downstream data processing jobs that consume the transcripts.

Example input:

this is a string of text with no punctuation this is a new sentence

Example output:

this is a string of text with no punctuation <period> this is a new sentence <period>
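The example implies a simple data format: the model sees the unpunctuated words and must predict, for each word, which punctuation mark (if any) follows it. The sketch below is a minimal Python illustration, not the repository's actual preprocessing code; it turns a punctuated token stream into (word, label) pairs. Only <period> appears in this README, so the <comma> and <question> tokens and all label names are hypothetical.

```python
from typing import List, Tuple

# Hypothetical token names: only "<period>" appears in the README example.
PUNCT_TOKENS = {"<period>": "PERIOD", "<comma>": "COMMA", "<question>": "QUESTION"}
NO_PUNCT = "O"  # hypothetical label for "no punctuation after this word"


def to_word_labels(tokens: List[str]) -> List[Tuple[str, str]]:
    """Turn a punctuated token stream into (word, label) pairs.

    Each label names the punctuation mark that follows the word, or NO_PUNCT.
    """
    pairs: List[Tuple[str, str]] = []
    for tok in tokens:
        if tok in PUNCT_TOKENS:
            if pairs:  # attach the mark to the preceding word
                word, _ = pairs[-1]
                pairs[-1] = (word, PUNCT_TOKENS[tok])
        else:
            pairs.append((tok, NO_PUNCT))
    return pairs


target = "this is a string of text with no punctuation <period> this is a new sentence <period>"
pairs = to_word_labels(target.split())
words = [w for w, _ in pairs]   # model input: punctuation stripped
labels = [l for _, l in pairs]  # per-word targets
print(" ".join(words))  # this is a string of text with no punctuation this is a new sentence
print(labels)           # "PERIOD" at positions 8 and 13, "O" everywhere else
```

Stripping the labels recovers exactly the example input, so training pairs can be generated from any punctuated transcript.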

Solution

My solution is largely based on the paper "Bidirectional Recurrent Neural Network with Attention Mechanism for Punctuation Restoration."

The architecture is defined as follows:

  1. Obtain word embeddings from pretrained GloVe vectors.
  2. The word embeddings are then processed by densely connected Bi-LSTM layers.
  3. These Bi-LSTM layers are followed by an RNN with an attention mechanism and a conditional random field (CRF) log-likelihood loss.
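The CRF in step 3 scores entire label sequences rather than each word independently, so at inference time the best sequence is typically found with Viterbi decoding. The following is a minimal pure-Python sketch of Viterbi decoding for a linear-chain CRF, not the repository's code: it assumes per-word emission scores (for example, the RNN's outputs) and a tag-to-tag transition matrix are already available, and the numbers in the usage lines are made up.

```python
from typing import List, Sequence, Tuple


def viterbi_decode(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
) -> Tuple[List[int], float]:
    """Return the highest-scoring tag sequence and its score.

    emissions[t][j]: score of tag j at position t (e.g. from the RNN layer).
    transitions[i][j]: score of moving from tag i to tag j.
    """
    num_tags = len(transitions)
    # score[j]: best score of any path that ends in tag j at the current step
    score = list(emissions[0])
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_score: List[float] = []
        bp: List[int] = []
        for j in range(num_tags):
            # Best previous tag i for arriving at tag j
            best_i = max(range(num_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            bp.append(best_i)
        score = new_score
        backpointers.append(bp)
    # Trace back from the best final tag
    last = max(range(num_tags), key=lambda j: score[j])
    best_score = score[last]
    path = [last]
    for bp in reversed(backpointers):
        path.append(bp[path[-1]])
    path.reverse()
    return path, best_score


# Usage with made-up scores: tag 0 = no punctuation, tag 1 = period.
emissions = [[2.0, 0.5], [1.0, 1.2], [0.3, 2.0]]
transitions = [[0.0, 0.0], [0.0, -2.0]]  # penalize two periods in a row
path, best = viterbi_decode(emissions, transitions)
print(path, best)  # [0, 0, 1] 5.0
```

Note that the middle word's emission alone slightly favors a period (1.2 vs 1.0), but the sequence-level score prefers placing the only period at the end, which is the kind of dependency a CRF output layer is meant to capture.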

The experiments are performed on the IWSLT dataset, which consists of TED Talk transcripts.

A detailed analysis can be found in the project's Jupyter notebook.

Setup and Installation

First step, clone the repo:

git clone https://github.com/k9luo/Punctuation-Restoration.git

Second step, download the pretrained GloVe word embeddings and create a new conda virtual environment with setup.sh, or perform these steps manually. Note that running setup.sh installs the GPU version of TensorFlow:

sh setup.sh

Third step, activate the virtual environment:

conda activate restore_punct

Fourth step, add the new virtual environment to Jupyter Notebook:

python -m ipykernel install --user --name=restore_punct

Training and Inference

Please run python main.py.
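The README does not document the output format of main.py. Assuming predictions come back in the token format of the example output above, a small post-processing step such as the hypothetical render function below turns them into conventional punctuated text. As before, only <period> is confirmed by the README; the other tokens are assumptions.

```python
from typing import List

# Hypothetical mapping: only "<period>" appears in the README example.
TOKEN_TO_MARK = {"<period>": ".", "<comma>": ",", "<question>": "?"}


def render(tagged: str) -> str:
    """Turn 'word word <period> word' output into conventionally punctuated text."""
    out: List[str] = []
    for tok in tagged.split():
        if tok in TOKEN_TO_MARK:
            if out:  # glue the mark onto the preceding word
                out[-1] += TOKEN_TO_MARK[tok]
        else:
            out.append(tok)
    return " ".join(out)


print(render("this is a string of text with no punctuation <period> this is a new sentence <period>"))
# this is a string of text with no punctuation. this is a new sentence.
```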
