See Browser Support for more information (a.k.a. don’t worry about those grey icons above).
Hey all! First, thanks a lot for watching, starring, and forking retext! Secondly, I wanted to invite you all to leave any feedback or issues you might have, to help me make retext even cooler 😄.
retext is an extensible natural language system, by default using parse-latin to transform natural language into a TextOM object model. retext provides a pluggable system for analysing and manipulating natural language in JavaScript, both in Node.js and in the browser. Tests provide 100% coverage.
Rather than being a do-all library for Natural Language Processing (e.g., NLTK or OpenNLP), retext aims to be useful for more practical use cases (such as censoring profane words or decoding emoticons, but the possibilities are endless) instead of more academic goals (research purposes). retext is inherently modular—it uses plugins (similar to rework for CSS) instead of providing everything out of the box (e.g., Natural). This makes retext a viable tool for use on the web.
NPM:
$ npm install retext
Component.js:
$ component install wooorm/retext
var Retext = require('retext'),
    emoji = require('retext-emoji'),
    smartypants = require('retext-smartypants'),
    input;
// Modified first paragraph from:
// http://en.wikipedia.org/wiki/Three_wise_monkeys
input = 'The three wise monkeys [. . .] sometimes called the ' +
    'three mystic apes--are a pictorial maxim. Together ' +
    'they embody the proverbial principle to ("see no evil, ' +
    'hear no evil, speak no evil"). The three monkeys are ' +
    'Mizaru (:see_no_evil:), covering his eyes, who sees no ' +
    'evil; Kikazaru (:hear_no_evil:), covering his ears, ' +
    'who hears no evil; and Iwazaru (:speak_no_evil:), ' +
    'covering his mouth, who speaks no evil.';
var text = new Retext()
    .use(emoji({
        'convert' : 'encode'
    }))
    .use(smartypants())
    .parse(input)
    .toString();
// The three wise monkeys […] sometimes called the three
// mystic apes—are a pictorial maxim. Together they
// embody the proverbial principle to (“see no evil,
// hear no evil, speak no evil”). The three monkeys are
// Mizaru (🙈), covering his eyes, who sees no evil;
// Kikazaru (🙉), covering his ears, who hears no evil;
// and Iwazaru (🙊), covering his mouth, who speaks no evil.
Plugins used: retext-emoji and retext-smartypants.
var Retext = require('retext'),
    ParseEnglish = require('parse-english');
var retext = new Retext(new ParseEnglish()).parse(/* ...some english... */);
Return a new Retext instance with the given parser (defaults to parse-latin).
Takes a plugin, which is a humble function. When Retext#parse is called, the plugin is invoked with the parsed tree and the Retext instance as arguments. Returns self.
Parses the given source and returns the tree (modified by any plugins in use).
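The contract described above (constructor with an optional parser, a chainable `use`, and a `parse` that hands the tree to each plugin) can be sketched roughly as follows. This is a minimal stand-in written for illustration, not retext's actual source: `TinyRetext`, the `shout` plugin, the order in which plugins run, and the `{ words: [...] }` tree shape are all assumptions, and the real tree is a TextOM object model.

```javascript
// Hypothetical stand-in for the contract described above; NOT retext itself.
function TinyRetext(parser) {
    // Assumption: a parser is any object with a `parse(source)` method.
    // The default here splits on spaces; retext defaults to parse-latin.
    this.parser = parser || {
        'parse' : function (source) {
            return { 'words' : String(source).split(' ') };
        }
    };
    this.plugins = [];
}

// Store the plugin and return self, so calls can be chained.
TinyRetext.prototype.use = function (plugin) {
    if (typeof plugin !== 'function') {
        throw new TypeError('Expected a plugin function');
    }

    this.plugins.push(plugin);

    return this;
};

// Parse the source, then pass the tree and this instance to each plugin.
TinyRetext.prototype.parse = function (source) {
    var tree = this.parser.parse(source),
        index = -1;

    while (++index < this.plugins.length) {
        this.plugins[index](tree, this);
    }

    return tree;
};

// A plugin is a plain function that may modify the tree in place.
function shout(tree) {
    tree.words = tree.words.map(function (word) {
        return word.toUpperCase();
    });
}

var tree = new TinyRetext().use(shout).parse('see no evil');

console.log(tree.words.join(' ')); // SEE NO EVIL
```

Because `use` returns the instance, plugins can be chained in one expression, which matches the shape of the usage example earlier in this README.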
- retext-ast — Encoding and decoding between AST (JSON) and TextOM object model;
- retext-content — Append, prepend, remove, and replace content into/from Retext nodes;
- retext-directionality — (demo) — Detect the direction text is written in;
- retext-dom — (demo) — Create a (living) DOM tree from a TextOM tree;
- retext-double-metaphone — (demo) — Implementation of the Double Metaphone algorithm;
- retext-emoji — (demo) — Encode or decode Gemojis;
- retext-keywords — Extract keywords and keyphrases;
- retext-link — (demo) — Detect links in text;
- retext-metaphone — (demo) — Implementation of the Metaphone algorithm;
- retext-porter-stemmer — (demo) — Implementation of the Porter stemming algorithm;
- retext-pos — Part-of-speech tagger;
- retext-range — Sequences of content within a TextOM tree between two points;
- retext-search — (demo) — Search in a TextOM tree;
- retext-smartypants — (demo) — Implementation of SmartyPants;
- retext-visit — (demo) — Visit nodes, optionally by type;
Hey! Want to create one of the following, or any other plugin, for retext but not sure where to start? I suggest reading retext-visit’s source code first to see how it’s built (it’s probably the most straightforward to learn from), and going from there. If you still have questions, send me feedback or raise an issue.
- retext-date — Detect time and date in text;
- retext-language — Detect the language of text;
- retext-live — Detect changes in a textarea (contenteditable?), sync the diffs over to a retext tree, let plugins modify the content, and sync the diffs back to the textarea;
- retext-profanity — Censor profane words;
- retext-punctuation-pair — Detect which opening or initial punctuation mark belongs to which closing or final punctuation mark (and vice versa);
- retext-sentiment — Detect sentiment;
- retext-summary — Summarise text;
- retraverse — Like Estraverse;
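For anyone picking up one of the ideas above, here is a rough sketch of the factory shape that the usage example hints at with `emoji({...})` and `smartypants()`: a function that takes options and returns the plugin function. Everything in it is hypothetical: the `censor` name, the `words` option, and the plain `{ words: [...] }` tree are illustration only, and a real plugin would walk TextOM nodes instead.

```javascript
// Hypothetical plugin factory; the `words` tree shape is an assumption,
// not the TextOM API.
function censor(options) {
    var banned = (options && options.words) || [];

    // The returned function is the plugin: it receives the parsed tree
    // and the Retext instance (unused in this sketch).
    return function (tree, retext) {
        tree.words = tree.words.map(function (word) {
            var isBanned = banned.indexOf(word.toLowerCase()) !== -1;

            // Replace every character of a banned word with an asterisk.
            return isBanned ? word.replace(/./g, '*') : word;
        });
    };
}

var plugin = censor({ 'words' : ['darn'] }),
    tree = { 'words' : ['Well', 'darn', 'it'] };

// Invoke the plugin directly, the way `parse` would after parsing.
plugin(tree, null);

console.log(tree.words.join(' ')); // Well **** it
```

Keeping the options in the factory's closure means one module can produce differently configured plugins without sharing state between them.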
- parse-latin (default);
- parse-english — Specifically for English;
- parse-dutch — Specifically for Dutch;
Pretty much every browser (available through BrowserStack) runs all retext unit tests.
Run the benchmark yourself:
$ npm run benchmark
On a MacBook Air, it parses about 2 big articles, 24 sections, or 218 paragraphs per second.
retext.parse(source);
218 op/s » A paragraph (5 sentences, 100 words)
24 op/s » A section (10 paragraphs, 50 sentences, 1,000 words)
2 op/s » An article (100 paragraphs, 500 sentences, 10,000 words)
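To compare the three rows above on one scale, the figures can be converted to words per second by multiplying each rate by the fixture's word count. The snippet below is only that arithmetic; the variable names are made up here, and since the reported rates are rounded, the results are rough estimates.

```javascript
// Rates and word counts copied from the benchmark output above.
var results = [
    { 'name' : 'paragraph', 'opsPerSecond' : 218, 'words' : 100 },
    { 'name' : 'section', 'opsPerSecond' : 24, 'words' : 1000 },
    { 'name' : 'article', 'opsPerSecond' : 2, 'words' : 10000 }
];

// words per second = operations per second * words per operation
var wordsPerSecond = results.map(function (result) {
    return result.opsPerSecond * result.words;
});

results.forEach(function (result, index) {
    console.log(
        result.name + ': about ' + wordsPerSecond[index] + ' words per second'
    );
});
// paragraph: about 21800 words per second
// section: about 24000 words per second
// article: about 20000 words per second
```

All three land in the same range of roughly 20,000 to 24,000 words per second, which suggests parsing time grows about linearly with input size on that machine.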
MIT