retext's Introduction

See Browser Support for more information on the browsers retext is tested in.


Hey all! First, thanks a lot for watching, starring, and forking retext! Secondly, I wanted to invite you all to leave any feedback or issues you might have, to help me make retext even cooler 😄.


retext is an extensible natural language system, by default using parse-latin to transform natural language into a TextOM object model. Retext provides a pluggable system for analysing and manipulating natural language in JavaScript, both in Node.js and in the browser. Tests provide 100% coverage.

Rather than being a do-all library for Natural Language Processing (e.g., NLTK or OpenNLP), retext aims to be useful for more practical use cases (such as censoring profane words or decoding emoticons, but the possibilities are endless) instead of more academic goals (research purposes). retext is inherently modular—it uses plugins (similar to rework for CSS) instead of providing everything out of the box (e.g., Natural). This makes retext a viable tool for use on the web.

Installation

NPM:

$ npm install retext

Component.js:

$ component install wooorm/retext

Usage

var Retext = require('retext'),
    emoji = require('retext-emoji'),
    smartypants = require('retext-smartypants'),
    input;

// Modified first paragraph from: 
//   http://en.wikipedia.org/wiki/Three_wise_monkeys
input = 'The three wise monkeys [. . .] sometimes called the ' +
        'three mystic apes--are a pictorial maxim. Together ' +
        'they embody the proverbial principle to ("see no evil, ' +
        'hear no evil, speak no evil"). The three monkeys are ' +
        'Mizaru (:see_no_evil:), covering his eyes, who sees no ' +
        'evil; Kikazaru (:hear_no_evil:), covering his ears, ' +
        'who hears no evil; and Iwazaru (:speak_no_evil:), ' +
        'covering his mouth, who speaks no evil.';

var text = new Retext()
  .use(emoji({
      'convert' : 'encode'
  }))
  .use(smartypants())
  .parse(input)
  .toString();
// The three wise monkeys […] sometimes called the three
// mystic apes—are a pictorial maxim. Together they
// embody the proverbial principle to (“see no evil,
// hear no evil, speak no evil”). The three monkeys are
// Mizaru (🙈), covering his eyes, who sees no evil;
// Kikazaru (🙉), covering his ears, who hears no evil;
// and Iwazaru (🙊), covering his mouth, who speaks no evil.

Plugins used: retext-emoji and retext-smartypants.

API

Retext(parser)

var Retext = require('retext'),
    ParseEnglish = require('parse-english');

var retext = new Retext(new ParseEnglish()).parse(/* ...some english... */);

Return a new Retext instance with the given parser (defaults to parse-latin).

Retext.prototype.use(plugin)

Takes a plugin, which is just a function. When Retext#parse is called, the plugin is invoked with the parsed tree and the Retext instance as arguments. Returns self.
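As a rough illustration of that contract, the sketch below writes a plugin as a plain function that receives the tree and the instance. So that it runs without installing anything, `MiniRetext` is a hypothetical stand-in that only mimics the `use`/`parse` behaviour described here. It is not retext's implementation, and its "tree" is a plain object holding an array of words rather than a TextOM tree. The `shout` plugin is likewise made up for the example.

```javascript
// Hypothetical stand-in for Retext, only to make the plugin contract runnable.
// The real library builds a TextOM tree; here the "tree" is a plain object.
function MiniRetext() {
    this.plugins = [];
}

// Register a plugin and return the instance, so calls can be chained.
MiniRetext.prototype.use = function (plugin) {
    this.plugins.push(plugin);
    return this;
};

// "Parse" the source, then invoke every plugin with (tree, instance).
MiniRetext.prototype.parse = function (source) {
    var self = this,
        tree = { words: source.split(' ') };

    this.plugins.forEach(function (plugin) {
        plugin(tree, self);
    });

    return tree;
};

// A plugin is just a function: it receives the tree and the instance.
// This one (an assumption for illustration) upper-cases every word.
function shout(tree, retext) {
    tree.words = tree.words.map(function (word) {
        return word.toUpperCase();
    });
}

var tree = new MiniRetext().use(shout).parse('see no evil');
var result = tree.words.join(' ');
console.log(result); // SEE NO EVIL
```

With the real library, the same `shout`-style function would be passed to `use` in the same way, but it would have to walk and modify TextOM nodes instead of an array of strings.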

Retext.prototype.parse(source)

Parses the given source and returns the tree, as modified by any plugins in use.

Plugins

Desired Plugins

Hey! Want to create one of the following, or any other plugin, for retext but not sure where to start? I suggest reading retext-visit’s source code first to see how it’s built (it’s probably the most straightforward one to learn from), and going from there. If you still have any questions, go ahead and send me feedback or raise an issue.

  • retext-date: Detect time and date in text;
  • retext-language: Detect the language of text;
  • retext-live: Detect changes in a textarea (contenteditable?), sync the diffs over to a retext tree, let plugins modify the content, and sync the diffs back to the textarea;
  • retext-profanity: Censor profane words;
  • retext-punctuation-pair: Detect which opening or initial punctuation mark belongs to which closing or final punctuation mark (and vice versa);
  • retext-sentiment: Detect sentiment;
  • retext-summary: Summarise text;
  • retraverse: Like Estraverse.

Parsers

Browser Support

Pretty much every browser (available through BrowserStack) runs all retext unit tests.

Benchmark

Run the benchmark yourself:

$ npm run benchmark

On a MacBook Air, it parses about 2 big articles, 24 sections, or 218 paragraphs per second.

              retext.parse(source);
 218 op/s » A paragraph (5 sentences, 100 words)
  24 op/s » A section (10 paragraphs, 50 sentences, 1,000 words)
   2 op/s » An article (100 paragraphs, 500 sentences, 10,000 words)
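To compare the three rows, it can help to convert operations per second into words per second. The short calculation below uses only the figures quoted in the benchmark output; the variable names are illustrative.

```javascript
// Figures copied from the benchmark output above.
var results = [
    { name: 'paragraph', opsPerSecond: 218, words: 100 },
    { name: 'section', opsPerSecond: 24, words: 1000 },
    { name: 'article', opsPerSecond: 2, words: 10000 }
];

// Words per second = operations per second * words per operation.
var throughput = results.map(function (result) {
    return {
        name: result.name,
        wordsPerSecond: result.opsPerSecond * result.words
    };
});

throughput.forEach(function (row) {
    console.log(row.name + ': ' + row.wordsPerSecond + ' words/s');
});
// paragraph: 21800 words/s
// section: 24000 words/s
// article: 20000 words/s
```

By this rough measure, throughput stays between about 20,000 and 24,000 words per second across the three input sizes, which suggests that parse time grows roughly in proportion to the amount of text. The reported op/s figures are rounded, so the comparison is only approximate.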

Related

License

MIT
