mcmarkov

MC Markov is a Python application for generating new text from a corpus of observed text (like the ebooks Twitter accounts).

The Markov chain trainer makes a number of improvements over other out-of-the-box Markov-model Python applications (at least the ones that I've seen):

  1. The trainer can fit higher-order models that look for chains of 2, 3, or 4 words (n-grams) in the corpus.
  2. The model is built on numpy arrays, so it's very fast.
  3. The probability matrix used for the model contains only observed n-grams, so it's feasible to train high-order models on large corpora.
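As a rough illustration of points 1 and 3 (not the package's actual implementation), the sketch below counts only the n-grams that occur in the corpus and stores each context's next-word probabilities in a numpy array. The function names `fit_ngram_model` and `generate`, and the list-of-token-lists corpus layout, are assumptions made for the example.

```python
from collections import Counter, defaultdict

import numpy as np


def fit_ngram_model(corpus, order=2):
    """Count observed (context -> next word) transitions.

    `corpus` is assumed to be a list of token lists.
    Returns {context_tuple: (next_words, probabilities)}.
    """
    counts = defaultdict(Counter)
    for tokens in corpus:
        for i in range(len(tokens) - order):
            context = tuple(tokens[i:i + order])
            counts[context][tokens[i + order]] += 1

    model = {}
    for context, counter in counts.items():
        words = list(counter)
        freqs = np.array([counter[w] for w in words], dtype=float)
        # Only contexts that were actually seen get an entry.
        model[context] = (words, freqs / freqs.sum())
    return model


def generate(model, seed, length=10, rng=None):
    """Extend `seed` (a tuple of `order` words) until `length` words
    are produced or the current context was never observed."""
    rng = rng or np.random.default_rng()
    order = len(seed)
    output = list(seed)
    while len(output) < length:
        context = tuple(output[-order:])
        if context not in model:
            break
        words, probs = model[context]
        output.append(words[rng.choice(len(words), p=probs)])
    return output


corpus = [["the", "cat", "sat", "on", "the", "mat"],
          ["the", "cat", "ate", "the", "fish"]]
model = fit_ngram_model(corpus, order=2)
sample = generate(model, ("the", "cat"), length=4)
```

Because the model is keyed by observed contexts rather than by every possible word combination, its size grows with the corpus and not with the vocabulary size raised to the model order.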

TO-DO:

###Efficiency:

  3. Parallelize the model-fitting/counting procedure
  4. Use broadcasting for normalizing the numpy array
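The broadcasting item could look something like the following minimal sketch. The `counts` matrix is made up for the example (rows are contexts, columns are next words); the real model's array layout may differ.

```python
import numpy as np

# Hypothetical count matrix: rows are contexts, columns are next words.
counts = np.array([[2.0, 1.0, 1.0],
                   [0.0, 3.0, 1.0],
                   [0.0, 0.0, 0.0]])  # a context with no observed continuation

# keepdims=True gives shape (3, 1), which broadcasts across each row.
row_sums = counts.sum(axis=1, keepdims=True)

# Divide only where the row sum is non-zero; all-zero rows stay zero.
probs = np.divide(counts, row_sums,
                  out=np.zeros_like(counts),
                  where=row_sums > 0)
```

This replaces an explicit Python loop over rows with a single vectorized division, and the `where` mask avoids division-by-zero warnings for empty rows.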

###UI:

  1. Improve handling of mis-formatted arguments (e.g., single list instead of list-of-lists for the corpus)
  2. Write documentation!
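For the mis-formatted corpus case mentioned in item 1, one possible approach is to normalize the argument before training. This is only a sketch: `normalize_corpus` is a hypothetical helper, and it assumes tokens are strings and that a flat list should be treated as a single document.

```python
def normalize_corpus(corpus):
    """Return the corpus as a list of token lists.

    Accepts either a list of lists of strings or a single flat list of
    strings, which is wrapped so that it becomes a one-document corpus.
    """
    if isinstance(corpus, str):
        raise TypeError("corpus must be a list of token lists, not a string")
    corpus = list(corpus)
    if not corpus:
        return []
    if all(isinstance(item, str) for item in corpus):
        # A flat list of tokens: treat it as a single document.
        return [corpus]
    if all(isinstance(item, (list, tuple)) for item in corpus):
        return [list(doc) for doc in corpus]
    raise TypeError(
        "corpus must be a list of token lists or a flat list of tokens"
    )
```

Raising a `TypeError` for mixed input is a design choice here; silently guessing at the intended structure would hide the mistake.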

###New Features

  5. Write song-writing module that includes the following components

  • Ability to specify the ending word of a line (for rhyming)
  • Ability to end the next line with a word that rhymes with the last word of the previous line
    • Create dictionary of rhymes for last-words that are used as 'seeds'
      • How to handle frequency? Should the list be unique, or should it reflect observed frequency?
    • Write a method for choosing a word that rhymes with the last word of the previous line but is not the same word
    • Write a method for building raps using couplets
  • Option to 'clean' the corpus by removing certain punctuation
  • Ability to specify the syllable count of a line
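The rhyme-dictionary idea above might be prototyped as follows. This is a sketch under a loud assumption: real rhyming needs pronunciation data, and the example fakes it by comparing the last two letters of each word. All names (`rhyme_key`, `build_rhyme_dict`, `pick_rhyme`) are hypothetical. The `unique` flag corresponds to the open question about frequency: with `unique=False`, repeated line endings stay in the list, so a random pick reflects observed frequency.

```python
import random
from collections import defaultdict

PUNCTUATION = ".,!?;:'\""


def clean(word):
    """Lowercase a word and strip surrounding punctuation."""
    return word.lower().strip(PUNCTUATION)


def rhyme_key(word, suffix_len=2):
    """Crude stand-in for a rhyme class: the word's last few letters."""
    return clean(word)[-suffix_len:]


def build_rhyme_dict(lines, unique=True):
    """Group the last word of each line (a token list) by rhyme key."""
    rhymes = defaultdict(list)
    for tokens in lines:
        last = clean(tokens[-1]) if tokens else ""
        if not last:
            continue
        bucket = rhymes[rhyme_key(last)]
        if not unique or last not in bucket:
            bucket.append(last)
    return dict(rhymes)


def pick_rhyme(rhymes, previous_word, rng=random):
    """Pick a word that rhymes with previous_word but is a different word.

    Returns None when no such word is known.
    """
    previous = clean(previous_word)
    candidates = [w for w in rhymes.get(rhyme_key(previous), [])
                  if w != previous]
    return rng.choice(candidates) if candidates else None


lines = [["the", "cat", "sat"],
         ["on", "the", "mat"],
         ["a", "big", "hat."],
         ["in", "the", "sun"]]
rhymes = build_rhyme_dict(lines)
next_ending = pick_rhyme(rhymes, "sat")
```

The chosen word could then serve as the fixed ending ('seed') of the next generated line when building couplets.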

Testing:

From the installation directory, run `python -m unittest discover -s . -p 'test.py'`
