This project forked from universal-automata/liblevenshtein-coffeescript

Various utilities regarding Levenshtein transducers. (CoffeeScript / JavaScript / Node.js)

License: MIT License

liblevenshtein

CoffeeScript / JavaScript / Node.js

A library for generating Finite State Transducers based on Levenshtein Automata.

Join the chat at https://gitter.im/universal-automata/liblevenshtein-coffeescript

Levenshtein transducers accept a query term and return all terms in a dictionary that are within n spelling errors of it. They constitute a highly efficient (in both space and time) class of spelling correctors that work very well when you do not require context while making suggestions. Forget about performing a linear scan over your dictionary to find all terms that are sufficiently close to the user's query, using a quadratic implementation of the Levenshtein distance or Damerau-Levenshtein distance. These babies find all the matching terms from your dictionary in time linear in the length of the query term (not in the size of the dictionary).
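For comparison, here is a minimal sketch of the linear-scan baseline that the paragraph above argues against. It is not part of liblevenshtein: the names levenshteinDistance and linearScan are hypothetical, and the distance function is the textbook quadratic dynamic-programming one. Every query pays for a full pass over the dictionary, which is the cost a transducer is meant to avoid.

```javascript
// Baseline for illustration only; NOT liblevenshtein's API.
// levenshteinDistance and linearScan are hypothetical names.

// Textbook dynamic-programming Levenshtein distance: O(|a| * |b|) time,
// keeping only two rows of the table in memory.
function levenshteinDistance(a, b) {
  var prev = [], curr = [], i, j, cost, tmp;
  for (j = 0; j <= b.length; j += 1) {
    prev[j] = j; // distance from the empty prefix of a
  }
  for (i = 1; i <= a.length; i += 1) {
    curr[0] = i; // distance to the empty prefix of b
    for (j = 1; j <= b.length; j += 1) {
      cost = a.charAt(i - 1) === b.charAt(j - 1) ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost  // substitution (or match)
      );
    }
    tmp = prev; prev = curr; curr = tmp; // reuse the rows
  }
  return prev[b.length];
}

// Compares the query against every dictionary term, so the work grows with
// the size of the dictionary.
function linearScan(dictionary, term, maxDistance) {
  return dictionary.filter(function (candidate) {
    return levenshteinDistance(term, candidate) <= maxDistance;
  });
}

var matches = linearScan(["cat", "cart", "dog", "coat"], "cat", 1);
// matches → ["cat", "cart", "coat"]
```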

If you need context, then take the candidates generated by the transducer as a starting place, and plug them into whatever model you're using for context (such as by selecting the sequence of terms that have the greatest probability of appearing together).
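As a rough illustration of that second stage, the sketch below reorders a list of candidates by how often each one follows the previous word. Everything in it is an assumption made for the example: rankByContext is a hypothetical helper, and the bigram counts are made-up numbers standing in for whatever context model you actually use.

```javascript
// Illustration only; none of this is provided by liblevenshtein.
// The counts are invented; a real model would be estimated from a corpus.
var bigramCounts = {
  "peanut butter": 120,
  "peanut batter": 3,
  "peanut better": 1
};

// Orders candidates by the count of "<previousTerm> <candidate>", highest
// first. Unseen pairs score 0.
function rankByContext(previousTerm, candidates, counts) {
  return candidates
    .map(function (candidate) {
      var key = previousTerm + " " + candidate;
      var seen = Object.prototype.hasOwnProperty.call(counts, key);
      return { term: candidate, score: seen ? counts[key] : 0 };
    })
    .sort(function (a, b) { return b.score - a.score; })
    .map(function (entry) { return entry.term; });
}

// Suppose the transducer returned these candidates for the query "buter":
var ranked = rankByContext("peanut", ["batter", "better", "butter"], bigramCounts);
// ranked → ["butter", "batter", "better"]
```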

For a quick demonstration, please visit the GitHub Page, here.

The library is currently written in Java, CoffeeScript, and JavaScript, but I will be porting it to other languages soon. If you have a specific language you would like to see it in, or a package-management system you would like it deployed to, let me know.

Basic Usage:

Node.js

Install the module via npm:

% npm install liblevenshtein
info trying registry request attempt 1 at 12:59:16
http GET https://registry.npmjs.org/liblevenshtein
http 304 https://registry.npmjs.org/liblevenshtein
liblevenshtein@<version> node_modules/liblevenshtein

Then, you may require it to do whatever you need:

var levenshtein = require('liblevenshtein');

// Assume "completion_list" is a list of terms you want to match against in
// fuzzy queries.
var builder = new levenshtein.Builder()
  .dictionary(completion_list, false)  // generate spelling candidates from unsorted completion_list
  .algorithm("transposition")          // use Levenshtein distance extended with transposition
  .sort_candidates(true)               // sort the spelling candidates before returning them
  .case_insensitive_sort(true)         // ignore character-casing while sorting terms
  .include_distance(false)             // just return the ordered terms (drop the distances)
  .maximum_candidates(10);             // only want the top-10 candidates

// Maximum number of spelling errors we will allow the spelling candidates to
// have, with regard to the query term.
var MAX_EDIT_DISTANCE = 2;

var transducer = builder.build();

// Assume "term" corresponds to some query term. Once invoking
// transducer.transduce(term, MAX_EDIT_DISTANCE), candidates will contain a list
// of all spelling candidates from the completion list that are within
// MAX_EDIT_DISTANCE units of error from the query term.
var candidates = transducer.transduce(term, MAX_EDIT_DISTANCE);
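To make the "transposition" option above more concrete, here is a small sketch of a Levenshtein distance extended with transposition, in which swapping two adjacent characters counts as one error instead of two. It uses the optimal string alignment (restricted Damerau-Levenshtein) variant; whether the library's notion of transposition matches this variant exactly is an assumption, and the library does not compute distances with a table like this. transpositionDistance is a hypothetical name.

```javascript
// Illustration only; NOT how liblevenshtein computes its results.
// Optimal string alignment distance: insertions, deletions, substitutions,
// and swaps of two adjacent characters each cost 1.
function transpositionDistance(a, b) {
  var d = [], i, j, cost;
  for (i = 0; i <= a.length; i += 1) {
    d[i] = [i]; // first column: delete i characters of a
  }
  for (j = 1; j <= b.length; j += 1) {
    d[0][j] = j; // first row: insert j characters of b
  }
  for (i = 1; i <= a.length; i += 1) {
    for (j = 1; j <= b.length; j += 1) {
      cost = a.charAt(i - 1) === b.charAt(j - 1) ? 0 : 1;
      d[i][j] = Math.min(
        d[i - 1][j] + 1,        // deletion
        d[i][j - 1] + 1,        // insertion
        d[i - 1][j - 1] + cost  // substitution (or match)
      );
      if (i > 1 && j > 1 &&
          a.charAt(i - 1) === b.charAt(j - 2) &&
          a.charAt(i - 2) === b.charAt(j - 1)) {
        // the last two characters are swapped: count it as a single error
        d[i][j] = Math.min(d[i][j], d[i - 2][j - 2] + 1);
      }
    }
  }
  return d[a.length][b.length];
}

var swapped = transpositionDistance("teh", "the");
// swapped → 1 (plain Levenshtein distance would give 2)
```

With a maximum edit distance of 1, a query such as "teh" would therefore be expected to reach "the" under a transposition-aware distance, while it would not under the plain one.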

In the Browser

To use the library on your website, reference the desired file from the <head/> of your document, like so:

<!DOCTYPE html>
<html>
  <head>
    <!-- stuff ... -->
    <script type="text/javascript"
      src="http://universal-automata.github.com/liblevenshtein/javascripts/2.0.4/levenshtein-transducer.min.js">
    </script>
    <!-- more stuff ... -->
  </head>
  <body>
    <!-- yet another fancy document ... -->
  </body>
</html>

Once the script loads, you should construct a transducer via the Builder API:

$(function ($) {
  "use strict";

  // Maximum number of spelling errors we will allow the spelling candidates to
  // have, with regard to the query term.
  var MAX_EDIT_DISTANCE = 2;

  var completion_list = getCompletionList(); // fictitious method

  var builder = new levenshtein.Builder()
    .dictionary(completion_list, false)  // generate spelling candidates from unsorted completion_list
    .algorithm("transposition")          // use Levenshtein distance extended with transposition
    .sort_candidates(true)               // sort the spelling candidates before returning them
    .case_insensitive_sort(true)         // ignore character-casing while sorting terms
    .include_distance(false)             // just return the ordered terms (drop the distances)
    .maximum_candidates(10);             // only want the top-10 candidates

  var transducer = builder.build();

  var $queryTerm = $('#query-term-input-field');
  $queryTerm.keyup(function (event) {
    var candidates, term = $.trim($queryTerm.val());

    if (term) {
      candidates = transducer.transduce(term, MAX_EDIT_DISTANCE);
      printAutoComplete(candidates); // print the list of completions
    } else {
      clearAutoComplete(); // user has cleared the search box
    }

    return true;
  });
});

This will give the user autocompletion hints as they type in the search box.

Reference

This library is based largely on the work of Stoyan Mihov, Klaus Schulz, and Petar Nikolaev Mitankin: "Fast String Correction with Levenshtein-Automata". For more details, please see the wiki.

Contributors

dylon, gitter-badger
