plumpmath / initialisms

This project forked from wiseman/initialisms

Guess sentences from initial letters of each word

License: MIT License

Clojure 50.42% Python 49.58%

initialisms

From http://ask.metafilter.com/255675/Decoding-cancer-addled-ramblings:

In my grandmother's final days battling brain cancer, she became unable to speak and she filled dozens of index cards with random letters of the alphabet. I'm beginning to think that they are the first letters in the words of song lyrics, and would love to know what song this was. This is a crazy long shot, but I've seen Mefites pull off some pretty impressive code-breaking before!

This program guesses sentences from the initial letters of each word, using the unreasonable effectiveness of data. For example, given the right seed texts, it can decode the input "OFWAIHHBTNTKCTWBDOEAIIIHFUTDODBAFUOT" into "our father who ascended into heaven hallowed be thy name thy kingdom come thy will be done on earth as it is in heaven for us this day our daily bread and forgive us of the".

Inspired by http://norvig.com/ngrams/ch14.pdf.
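To make the idea concrete, here is a minimal sketch of the kind of data a seed text provides: bigram counts, plus an index of words by their first letter. It is not the code in decode.py; the names `tokenize` and `build_model` and the "$" start marker inside the model are assumptions made for illustration.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into runs of letters."""
    return re.findall(r"[a-z]+", text.lower())


def build_model(text):
    """Return (bigram_counts, words_by_initial) for one seed text.

    Hypothetical helper: "$" is used here as a start-of-text marker.
    """
    words = ["$"] + tokenize(text)
    bigrams = Counter(zip(words, words[1:]))  # counts of adjacent word pairs
    by_initial = defaultdict(set)
    for word in words[1:]:
        by_initial[word[0]].add(word)  # candidate words for each letter
    return bigrams, by_initial


bigrams, by_initial = build_model(
    "Our Father who art in heaven, hallowed be thy name"
)
print(sorted(by_initial["h"]))      # ['hallowed', 'heaven']
print(bigrams[("our", "father")])   # 1
```

Each input letter narrows the guess to the words in `by_initial`, and the bigram counts say which of those words tend to follow one another.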

Getting ready to use it

It's probably easiest to use virtualenv:

$ virtualenv env

Then install nltk and python-gflags:

$ env/bin/pip install nltk python-gflags

Example use

Start the program, feeding it some prayer texts:

$ env/bin/python decode.py apostles-creed.txt athanasian-creed.txt nicene-creed.txt order-of-morning.txt

Once it says "Enter initials:", type "OFWAIHHBTN" and press Enter. It will output something like this:

5.99453913576e-12 our father who ascended into heaven hallowed be thy name

This means that its best guess for "OFWAIHHBTN" is "our father who ascended into heaven hallowed be thy name". The number in front is the probability the program assigns to that guess, which is typically very small (here about 6e-12).
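One way to see why the number is so small, assuming the score is the joint probability under a bigram language model combined with the per-letter error model described under "Command line flags" (an assumption based on this README, not on reading decode.py):

```latex
P(w_1,\dots,w_n,\; l_1,\dots,l_n)
  = \prod_{i=1}^{n} P(w_i \mid w_{i-1}) \, P(l_i \mid w_i),
\qquad w_0 = \text{start of sentence}
```

Here $w_i$ is the $i$-th guessed word and $l_i$ the $i$-th input letter. Every factor is at most 1, so a ten-letter input multiplies twenty such factors and the product shrinks quickly.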

It is case-insensitive: "OFWAIHHBTN" is treated the same as "ofwaihhbtn".

You can use "$" to indicate the start of a sentence, for example "$OFWAIHHBTN".

Corpora

The program makes its guesses based on text you feed it. I've included 8 pieces of text, all in the corpora subdirectory:

| Filename | Description |
| --- | --- |
| apostles-creed.txt | The Apostles Creed |
| athanasian-creed.txt | The Athanasian Creed |
| bible-kjv.txt | The King James Bible |
| hymnprayerbo00kunz_djvu.txt | Hymn and prayer book: for the use of such Lutheran churches as use the English language (1795) |
| nicene-creed.txt | The Nicene Creed |
| order-of-morning.txt | The Order of Morning Service |
| prayerbookreligi00lasauoft_djvu.txt | Prayer-book for religious: a complete manual of prayers and devotions for the use of the members of all religious communities : a practical guide to the particular examen and to the methods of meditation (1914, c1904) |
| tlh.txt | The Lutheran Hymnal |

To use corpora, supply them as arguments on the command line. For example:

$ env/bin/python decode.py bible-kjv.txt nicene-creed.txt

If you want to use other texts, put them in the corpora subdirectory. Then you can specify their filenames on the command line.

Adding more or larger corpora slows the program down tremendously. For example, with only the King James Bible loaded, decoding just three letters, such as "ofw", can take 10-20 seconds. Decoding four, five, or more letters can take minutes or even hours.

Command line flags

The program uses Viterbi decoding and assumes a "noisy channel", meaning that each letter you give it as input has some chance of being wrong. By default it assumes a 0.1% chance of an error. To change that, use the --error_prob flag. For example, this tells it there is a 50% chance of an error per letter:

$ env/bin/python decode.py --error_prob=0.5 bible-kjv.txt
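The following is a minimal sketch of Viterbi decoding over a bigram model with a noisy channel, to illustrate how an error probability can enter the search. It is not the implementation in decode.py. The add-one smoothing, the uniform spread of errors over the 25 other letters, the fixed "$" start state, and the names `train` and `decode` are all assumptions made for this example.

```python
import math
from collections import Counter


def train(words):
    """Count unigrams and bigrams; "$" is a hypothetical start marker."""
    seq = ["$"] + words
    return Counter(seq), Counter(zip(seq, seq[1:]))


def safe_log(x):
    """Log that maps a zero probability to -inf instead of raising."""
    return math.log(x) if x > 0 else float("-inf")


def decode(initials, words, error_prob=0.001):
    """Return (probability, sentence) for the best path through the model."""
    unigrams, bigrams = train(words)
    vocab = sorted(set(words))

    def transition(prev, word):
        # Add-one smoothed bigram probability P(word | prev) (assumption).
        return (bigrams[(prev, word)] + 1) / (unigrams[prev] + len(vocab))

    def emission(word, letter):
        # Noisy channel: the letter is the word's initial unless an error
        # occurred; errors are spread evenly over 25 letters (assumption).
        return 1 - error_prob if word[0] == letter else error_prob / 25

    # best[w] = (log probability, words so far) of the best path ending in w
    best = {"$": (0.0, [])}
    for letter in initials.lower():
        nxt = {}
        for w in vocab:
            e = safe_log(emission(w, letter))
            score, path = max(
                (s + safe_log(transition(p, w)) + e, seq)
                for p, (s, seq) in best.items()
            )
            nxt[w] = (score, path + [w])
        best = nxt
    score, path = max(best.values())
    return math.exp(score), " ".join(path)


words = "our father who art in heaven hallowed be thy name".split()
prob, sentence = decode("OFWAIHHBTN", words)
print(sentence)  # our father who art in heaven hallowed be thy name
```

A larger `error_prob` makes mismatched words cheaper, so the language model can override more of the input letters. The two nested loops over the vocabulary for every input letter also suggest one plausible reason why large corpora are slow.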

Final notes

Of course there's no reason this code is limited to interpreting religious codes. It is limited only by its corpora (and its bigram model, and its slowness, and...).

Contributors

wiseman
