

Syllabify

Automatically convert plain text into phonemes (US English pronunciation) and syllabify them.

Adapted from the repository set up by Anthony Evans with some key changes, itemised below:

  • Ported to Python 3 from Evans' Python 2 code;
  • Correction of key onset and coda rules which affect consonant clusters and involve the 'maximise onsets principle';
  • Removal of all ambisyllabicity from onset and coda rules, since it's not uncontroversial;
  • Removal of 'test' (demo) option from syllable script.
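The 'maximise onsets principle' assigns the consonants between two vowels to the following syllable's onset wherever the onset rules allow, leaving the rest as the preceding syllable's coda. The following is a minimal sketch of that idea, not the rule set in syllable3.py. It assumes CMUdict-style phones, where vowels carry a stress digit, and a tiny hypothetical LEGAL_ONSETS set that stands in for a full inventory of English onsets.

```python
from __future__ import annotations

# Hypothetical, deliberately tiny onset inventory for illustration only;
# a real syllabifier needs the full set of legal English onsets.
LEGAL_ONSETS = {
    (), ("L",), ("G",), ("W",), ("S",), ("T",), ("K",),
    ("G", "W"), ("S", "T"), ("T", "R"), ("S", "T", "R"),
}


def is_vowel(phone: str) -> bool:
    """In CMUdict, ARPAbet vowels end in a stress digit (0, 1 or 2)."""
    return phone[-1].isdigit()


def syllabify(phones: list[str]) -> list[list[str]]:
    """Split phones into syllables using the maximal onset principle."""
    vowel_idx = [i for i, p in enumerate(phones) if is_vowel(p)]
    if not vowel_idx:
        raise ValueError("no nucleus found")
    boundaries = [0]  # index at which each syllable starts
    for prev, nxt in zip(vowel_idx, vowel_idx[1:]):
        cluster = phones[prev + 1:nxt]  # consonants between two nuclei
        # Find the longest suffix of the cluster that is a legal onset;
        # the empty suffix is always legal, so the loop always breaks.
        for start in range(len(cluster) + 1):
            if tuple(cluster[start:]) in LEGAL_ONSETS:
                break
        boundaries.append(prev + 1 + start)
    boundaries.append(len(phones))
    return [phones[a:b] for a, b in zip(boundaries, boundaries[1:])]


print(syllabify("L IH0 NG G W IH1 S T IH0 K S".split()))
# → [['L', 'IH0', 'NG'], ['G', 'W', 'IH1'], ['S', 'T', 'IH0', 'K', 'S']]
```

With these onsets, NG stays in the coda because it cannot begin an English syllable, while G W and S T are pushed into the following onsets. This matches the grouping in the example output for linguistics later in this README.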

Please see Anthony Evans' README file for a detailed background to the project.

Set up

Requires Python 3 (Anthony Evans used Python 2: if that's what you prefer, see his repo).

Clone or download this repo and you're good to go!

Usage

One word at a time:

python3 syllable3.py linguistics

Or several (space-separated):

python3 syllable3.py colourless green ideas

Or, as preprocessing for the wordseg program, `wordseg_prep.py` takes a CHILDES corpus (e.g. Brown) and syllabifies the infant-directed speech (i.e. all utterances except those by CHI, the target child). It writes the result in phonemic format, with phone, syllable and word delimiters that follow the wordseg defaults:

python3 wordseg_prep.py $CORPUSPATH
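The two steps of that preprocessing, filtering out the child's utterances and writing delimited output, can be sketched as follows. The sketch makes several assumptions. CHAT main-tier lines are taken to have the form `*SPK:`, a tab, and then the utterance. The separators are assumed to be wordseg's defaults, ';esyll' and ';eword', so check them against your installed wordseg. The helpers `non_child_utterances` and `to_wordseg` are hypothetical and are not the functions in wordseg_prep.py. Real CHAT transcripts also contain annotation codes that this sketch does not clean.

```python
from __future__ import annotations

# Assumed wordseg default separators (verify against your wordseg version).
PHONE_SEP = " "
SYLL_SEP = ";esyll"
WORD_SEP = ";eword"


def non_child_utterances(chat_lines: list[str]) -> list[str]:
    """Keep main-tier CHAT lines (starting with '*') not spoken by CHI."""
    kept = []
    for line in chat_lines:
        if line.startswith("*") and not line.startswith("*CHI:"):
            # Drop the speaker code and keep the transcribed text
            kept.append(line.split(":", 1)[1].strip())
    return kept


def to_wordseg(words: list[list[list[str]]]) -> str:
    """Render words -> syllables -> phones as one delimited utterance."""
    out = []
    for word in words:
        for syllable in word:
            out.append(PHONE_SEP.join(syllable) + " " + SYLL_SEP)
        out.append(WORD_SEP)
    return " ".join(out)


lines = ["@Begin", "*CHI:\tmore juice .", "*MOT:\tmore juice ?", "%mor:\tqn|more n|juice ?"]
print(non_child_utterances(lines))  # → ['more juice ?']
print(to_wordseg([[["HH", "AH0"], ["L", "OW1"]]]))
# → HH AH0 ;esyll L OW1 ;esyll ;eword
```

Dependent tiers such as %mor start with '%', so the main-tier check already skips them.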

Output

If the input word is found in the dictionary, a phonemic, syllabified transcript is returned. For example, for the word linguistics:

{o: L , n: IH [st:0 ln:short], c: NG }
{o: G W , n: IH [st:1 ln:short], c: empty}
{o: S T , n: IH [st:0 ln:short], c: K S }

There is one syllable per line. Each syllable consists of an onset ('o'), a nucleus ('n') and a coda ('c'). Phonemes are space-separated, upper-case ARPAbet symbols. In line with phonological theory, the nucleus must have content, whereas the onset and coda may be empty (shown as 'empty'). The nucleus also carries two annotations: whether the syllable is stressed ('st': 0 or 1) and whether the vowel length ('ln') is short or long.
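If another program needs to consume this output, each printed syllable can be parsed back into its parts. The following is a minimal sketch. The regular expression is written against the example lines above and is an assumption about the output format, not an API provided by syllable3.py.

```python
from __future__ import annotations

import re

# Matches one printed syllable, e.g. "{o: G W , n: IH [st:1 ln:short], c: empty}"
SYLLABLE_RE = re.compile(
    r"\{o:\s*(?P<onset>.*?)\s*,\s*n:\s*(?P<nucleus>\S+)\s*"
    r"\[st:(?P<stress>\d)\s+ln:(?P<length>\w+)\]\s*,\s*c:\s*(?P<coda>.*?)\s*\}"
)


def parse_syllable(line: str) -> dict:
    """Turn one output line into onset/nucleus/coda fields."""
    m = SYLLABLE_RE.fullmatch(line.strip())
    if m is None:
        raise ValueError(f"unrecognised syllable line: {line!r}")

    def phones(field: str) -> list[str]:
        # 'empty' marks an unfilled onset or coda
        return [] if field == "empty" else field.split()

    return {
        "onset": phones(m["onset"]),
        "nucleus": m["nucleus"],
        "stress": int(m["stress"]),
        "length": m["length"],
        "coda": phones(m["coda"]),
    }


print(parse_syllable("{o: G W , n: IH [st:1 ln:short], c: empty}"))
# → {'onset': ['G', 'W'], 'nucleus': 'IH', 'stress': 1, 'length': 'short', 'coda': []}
```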

CMU Pronouncing Dictionary

Syllabify depends on the CMU Pronouncing Dictionary of North American English word pronunciations. Version 0.7b was current at the time of writing, but it throws a UnicodeDecodeError, so we are still using version 0.7a (amended to remove the erroneous 'G' from SUGGEST and related words). To use a newer version, obtain it from the dictionary download website, add the cmudict-N.nx, cmudict-N.nx.phones and cmudict-N.nx.symbols files to the CMU_dictionary directory, remove their '.txt' suffixes, and update the line VERSION = 'cmudict-N.nx' in cmuparser3.py.
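In the 0.7a format, each dictionary entry is a headword followed by its ARPAbet phones. Alternate pronunciations are marked WORD(1), WORD(2) and so on, and comment lines start with ';;;'. The following is a minimal sketch of loading such lines into a lookup table. `parse_cmudict` is a hypothetical helper for illustration, not the loader in cmuparser3.py.

```python
from __future__ import annotations

import re


def parse_cmudict(lines) -> dict[str, list[list[str]]]:
    """Map each headword to its list of pronunciations (lists of phones)."""
    entries: dict[str, list[list[str]]] = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith(";;;"):
            continue  # skip blank lines and comments
        word, *phones = line.split()
        # Fold alternates such as TOMATO(1) into the base headword
        word = re.sub(r"\(\d+\)$", "", word)
        entries.setdefault(word, []).append(phones)
    return entries


sample = [
    ";;; comment line",
    "LINGUISTICS  L IH0 NG G W IH1 S T IH0 K S",
    "TOMATO  T AH0 M EY1 T OW2",
    "TOMATO(1)  T AH0 M AA1 T OW2",
]
cmu = parse_cmudict(sample)
print(len(cmu["TOMATO"]))  # → 2
```

The function accepts any iterable of strings, so an open file can be passed directly. The file's encoding still has to be chosen when opening it, and that choice is where the 0.7b decode error arises.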

To do

Offer the option to 'translate' US pronunciations into UK ones, for instance handling the lack of rhoticity by converting AXR and ER phones to UK equivalents (I know, which UK variety?! Cross that bridge etc.)

Contact

If you have queries or feedback please contact cainesap at gmail.com

Andrew Caines, September 2017
