gengram

A lightweight n-gram random text generator written in Python. It was developed for automating Happy Hour weekly reminder emails at the Cambridge Computer Lab.

An n-gram is a contiguous sequence of n symbols, where a symbol is a word or a punctuation mark. For the text

The cat sat on the mat.

the bigrams (n-grams of length 2) are: The cat, cat sat, sat on, on the, the mat, mat .
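
As a rough illustration, the bigrams above can be reproduced with a few lines of Python. This is a minimal sketch, not gengram's own code: the helper names tokenize and ngrams, and the regular expression used to split off punctuation, are assumptions made for this example.

```python
import re

def tokenize(text):
    # Assumed tokenization: words and single punctuation marks are separate symbols,
    # so "mat." becomes the two symbols "mat" and ".".
    return re.findall(r"\w+|[^\w\s]", text)

def ngrams(symbols, n):
    # Slide a window of length n over the list of symbols.
    return [tuple(symbols[i:i + n]) for i in range(len(symbols) - n + 1)]

symbols = tokenize("The cat sat on the mat.")
bigrams = ngrams(symbols, 2)
print([" ".join(b) for b in bigrams])
# ['The cat', 'cat sat', 'sat on', 'on the', 'the mat', 'mat .']
```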

How it works

We arrange n-grams in a table of symbol_sequence : (next_symbol, frequency) to record which sequences are most common. From a given symbol_sequence we can make a random weighted choice for the next_symbol.

An excerpt from the example Happy Hour corpus, using trigrams (3-long n-grams):

symbol sequence   next symbol   frequency
yet another       conference    1
                  paper         1
                  friday        1
                  happy         2
have a            keg           6
                  recent        1
                  nice          1

So if we have a symbol sequence that currently ends with have a, we are more likely to choose keg as the next symbol to output, rather than recent or nice. By iterating this we can generate long sequences of symbols.
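
The table and the weighted choice might be sketched as follows. This is an illustration under stated assumptions rather than the actual implementation: build_table and choose_next are hypothetical names, and the toy corpus is invented for the example (it is not the Happy Hour corpus).

```python
import random
from collections import Counter, defaultdict

def build_table(symbols, n=3):
    # Map each (n-1)-symbol sequence to a Counter of the symbols that follow it.
    table = defaultdict(Counter)
    for i in range(len(symbols) - n + 1):
        seq = tuple(symbols[i:i + n - 1])
        table[seq][symbols[i + n - 1]] += 1
    return table

def choose_next(table, seq, rng=random):
    # Weighted random choice: a follower seen 6 times is 6x as likely as one seen once.
    counts = table.get(tuple(seq))
    if not counts:
        raise KeyError(f"sequence {seq!r} never appears in the corpus")
    candidates = list(counts)
    weights = [counts[c] for c in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]

# Invented toy corpus, already split into symbols.
symbols = "we have a keg . we have a keg . we have a nice keg .".split()
table = build_table(symbols, n=3)
print(table[("have", "a")])            # Counter({'keg': 2, 'nice': 1})
print(choose_next(table, ("have", "a")))  # 'keg' about two times out of three
```

Storing raw counts rather than probabilities keeps the table easy to update, and random.choices accepts the counts directly as relative weights.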

Once we output one of the punctuation symbols . ! ?, we record that a sentence has been generated.
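
Putting the iteration and the sentence count together, a generation loop could look like the sketch below. The function name generate, the max_symbols safety limit, the handling of dead ends, and the hand-written table are all assumptions for illustration; gengram's own loop may differ.

```python
import random

SENTENCE_END = {".", "!", "?"}

def generate(table, start_seq, sentence_count, rng=random, max_symbols=1000):
    # table maps a tuple of symbols to a dict {next_symbol: frequency}.
    out = list(start_seq)
    k = len(start_seq)
    sentences = 0
    while sentences < sentence_count and len(out) < max_symbols:
        followers = table.get(tuple(out[-k:]))
        if not followers:
            break  # dead end: this sequence has no recorded follower
        candidates = list(followers)
        weights = [followers[c] for c in candidates]
        nxt = rng.choices(candidates, weights=weights, k=1)[0]
        out.append(nxt)
        if nxt in SENTENCE_END:
            sentences += 1  # a sentence has been completed
    return out

# Hand-written toy table (hypothetical, not taken from the real corpus).
table = {
    ("have", "a"): {"keg": 6, "nice": 1},
    ("a", "keg"): {".": 1},
    ("a", "nice"): {"keg": 1},
    ("nice", "keg"): {".": 1},
    ("keg", "."): {"we": 1},
    (".", "we"): {"have": 1},
    ("we", "have"): {"a": 1},
}
print(" ".join(generate(table, ("have", "a"), sentence_count=2)))
```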

Usage

  1. Place source text corpus into corpus.txt
  2. Run the gengram.py script

The script first preprocesses the text by normalizing whitespace, and then calls the main method gengram_sentence. Its arguments are:

  • corpus - preprocessed text corpus
  • N - how long the n-grams are (default: 3)
  • sentence_count - how many sentences to generate (default: 5)
  • start_seq - seed start sequence (default: None)

If no seed start_seq is given, gengram chooses a random one. After symbol sequences are generated, they are postprocessed to correct whitespace and capitalization.
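
The whitespace and capitalization steps could be approximated as below. This is a sketch only: the function names preprocess and postprocess and the exact regular expressions are assumptions, not gengram's actual code.

```python
import re

def preprocess(text):
    # Collapse every run of whitespace (newlines, tabs, repeated spaces) into one space.
    return re.sub(r"\s+", " ", text).strip()

def postprocess(symbols):
    text = " ".join(symbols)
    # Remove the space that joining leaves before punctuation: "hour ." -> "hour."
    text = re.sub(r"\s+([.,;:!?])", r"\1", text)
    # Capitalize the first letter of the text and of each following sentence.
    return re.sub(
        r"(^|[.!?]\s+)([a-z])",
        lambda m: m.group(1) + m.group(2).upper(),
        text,
    )

print(preprocess("  Join  us\n for\tdrinks "))
# Join us for drinks
print(postprocess("join us for the happy hour . rawr !".split()))
# Join us for the happy hour. Rawr!
```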

Sample output

start_seq = "Join us"

Join us for the happy hour. We have a recent repeat. Cyclops. Rawr! We apologize for the usual collection of snacks.

start_seq = "Once again"

Once again we invite you to our usual selection of soft drinks, bottled ales, and a keg of sparta, crisps, dips, bins, napkins, paper plates and chopping boards. Now is our keg filled with sparta 4.3%; our packets of nuts for the happy hour! Join us for the crisp eaters, there will be crisps, dips, bins, napkins, paper plates and chopping boards. Now is our keg filled with sparta 4.3%; our packets of crisps. What more could you want? Join us for the usual selection of snacks, juice, snacks, juice, snacks, beer, snacks, beer, a keg of beer!

start_seq = "There will"

There will be a keg of justinian 3.9%, bottled ales, lagers, ales, lagers, ales, lagers, ales, lagers, ciders, a selection of soft drinks and snacks available. Our usual selection. Roses are red, happy hour. We considered having happy hour and replacing the keg, lots of lagers, crisps, bottled beers and cider, and the usual. Join us at 5pm this afternoon.

Dependencies
