This project forked from retextjs/retext

Extensible system for analysing and manipulating natural language

License: MIT License

Retext

retext is an extensible natural language system. By default it uses parse-latin to transform natural language into a TextOM object model. It provides a pluggable system for analysing and manipulating natural language in JavaScript, in both Node.js and the browser. Tests provide 100% coverage.

Rather than being a do-all library for Natural Language Processing (such as NLTK or OpenNLP), retext targets practical use cases (such as censoring profane words or decoding emoticons) over academic goals (research purposes). retext is inherently modular: it uses plugins (similar to rework for CSS) instead of providing everything out of the box (as Natural does). This makes retext a viable tool for use on the web.

Installation

npm:

npm install retext

Component.js:

component install wooorm/retext

Bower:

bower install retext

Duo:

var Retext = require('wooorm/retext');

UMD (globals/AMD/CommonJS) (uncompressed and compressed):

<script src="path/to/retext.js"></script>
<script>
  var retext = new Retext();
</script>

Usage

The following example uses retext-emoji (to show emoji) and retext-smartypants (for smart punctuation).

/* Require dependencies. */
var Retext = require('retext');
var emoji = require('retext-emoji');
var smartypants = require('retext-smartypants');

/* Create an instance using retext-emoji and -smartypants. */
var retext = new Retext()
    .use(emoji, {
        'convert' : 'encode'
    })
    .use(smartypants);

/* Read a document. */
retext.parse(
    'The three wise monkeys [. . .] sometimes called the ' +
    'three mystic apes--are a pictorial maxim. Together ' +
    'they embody the proverbial principle to ("see no evil, ' +
    'hear no evil, speak no evil"). The three monkeys are ' +
    'Mizaru (:see_no_evil:), covering his eyes, who sees no ' +
    'evil; Kikazaru (:hear_no_evil:), covering his ears, ' +
    'who hears no evil; and Iwazaru (:speak_no_evil:), ' +
    'covering his mouth, who speaks no evil.',
    function (err, tree) {
        /* Handle errors. */
        if (err) {
            throw err;
        }

        /* Log the text content of the tree (the transformed input). */
        console.log(tree.toString());
        /**
         * This logs the following:
         *   The three wise monkeys […] sometimes called the three
         *   mystic apes—are a pictorial maxim. Together they
         *   embody the proverbial principle to (“see no evil,
         *   hear no evil, speak no evil”). The three monkeys are
         *   Mizaru (🙈), covering his eyes, who sees no evil;
         *   Kikazaru (🙉), covering his ears, who hears no evil;
         *   and Iwazaru (🙊), covering his mouth, who speaks no evil.
         */
    }
);

API

Retext(parser?)

var Retext = require('retext');
var ParseEnglish = require('parse-english');
var retext = new Retext(new ParseEnglish());

/* There, ol’ chap. */
retext.parse('Some English', function (err, tree) {/* ... */});

Return a new Retext instance with the given parser (defaults to an instance of parse-latin).

Retext#use(plugin, options?)

Takes a plugin—a humble function to transform the object model. Optionally takes an options object, but it’s up to plugin authors to support settings.

Retext#parse(value, options?, done(err, tree))

Parses the given source and, when done, invokes the callback with either an error (the first argument) or the document as modified by the attached plugins (the second argument).

plugin

A plugin is simply a function with the signature function(retext, options?). The first argument is the Retext instance the user attached the plugin to. The plugin is invoked when the user calls Retext#use() (not when a document is parsed), which lets it modify the internal object model (retext.TextOM) or the parser (retext.parser).

The plugin can return another function: function(NLCSTNode, options, next?). This function is invoked when a document is parsed. It’s given the document as created by Retext#parse() before it’s given to the user.
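The two phases described above (attach once, then run on every parse) can be sketched as follows. This is a minimal illustration, not code from retext: the `logLength` plugin and its `onLength` option are invented for the example, and `fakeRetext` and `fakeTree` are stand-ins so the sketch runs without dependencies. The only tree method used is `toString()`, as in the usage example. The optional `next` argument is ignored, so the sketch assumes synchronous operation.

```javascript
/*
 * The plugin: called once, when a user attaches it with
 * `retext.use(logLength, options)`. The `onLength` option is
 * hypothetical and exists only for this sketch.
 */
function logLength(retext, options) {
    var onLength = (options && options.onLength) || console.log;

    /*
     * The returned function is called for every parsed document.
     * It reports the length of the document's text content.
     */
    return function (tree) {
        onLength(tree.toString().length);
    };
}

/*
 * Stand-ins for a Retext instance and a parsed tree.
 * Neither object is part of retext.
 */
var fakeRetext = {};
var fakeTree = {
    'toString' : function () {
        return 'Some English';
    }
};

var lengths = [];
var onParse = logLength(fakeRetext, {
    'onLength' : function (length) {
        lengths.push(length);
    }
});

/* Attaching alone reports nothing; only a parse does. */
console.log(lengths.length); /* 0 */
onParse(fakeTree);
console.log(lengths[0]); /* 12 */
```

With a real instance, the same function would presumably be attached with `new Retext().use(logLength, {'onLength' : ...})`, and retext would call the returned function during `retext.parse()`.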

Plugins

Desired Plugins

Hey! Want to create one of the following, or any other plugin, for retext but not sure where to start? I suggest reading retext-visit’s source code first to see how it’s built (it’s probably the most straightforward to learn from), and going from there. If you still have questions, send me feedback or raise an issue.

  • retext-date — Detect time and date in text;

  • retext-frequent-words — Like retext-keywords, but based on frequency and stop-words instead of a POS-tagger;

  • retext-hyphen — Insert soft-hyphens where needed; this might have to be implemented with some sort of node which doesn’t stringify;

  • retext-location — Track the position of nodes (line, column);

  • retext-no-pants — Opposite of retext-smartypants;

  • retext-no-break — Inserts non-breaking spaces between things like “100 km”;

  • retext-profanity — Censor profane words;

  • retext-punctuation-pair — Detect which opening or initial punctuation mark belongs to which closing or final punctuation mark (and vice versa);

  • retext-summary — Summarise text;

  • retext-sync — Detect changes in a textarea (or contenteditable?), sync the diffs over to a retext tree, let plugins modify the content, and sync the diffs back to the textarea;

  • retext-typography — Applies typographic enhancements, like (or using?) retext-smartypants and retext-hyphen;

  • retraverse — Like Estraverse.

Parsers

Benchmark

On a MacBook Air, it parses about 2 big articles, 25 sections, or 230 paragraphs per second.

           retext.parse(value, callback);
  230 op/s » A paragraph (5 sentences, 100 words)
   25 op/s » A section (10 paragraphs, 50 sentences, 1,000 words)
    2 op/s » An article (100 paragraphs, 500 sentences, 10,000 words)
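To compare the three rows, the rates can be converted to words per second and milliseconds per parse. The sketch below only does arithmetic on the figures quoted above. Those figures are rounded, so the derived numbers are approximate.

```javascript
/* Figures copied from the benchmark above. */
var benchmarks = [
    {'name' : 'paragraph', 'opsPerSecond' : 230, 'words' : 100},
    {'name' : 'section', 'opsPerSecond' : 25, 'words' : 1000},
    {'name' : 'article', 'opsPerSecond' : 2, 'words' : 10000}
];

/* Derive words per second and milliseconds per parse for each row. */
var throughput = benchmarks.map(function (benchmark) {
    return {
        'name' : benchmark.name,
        'wordsPerSecond' : benchmark.opsPerSecond * benchmark.words,
        'msPerParse' : 1000 / benchmark.opsPerSecond
    };
});

throughput.forEach(function (row) {
    console.log(
        row.name + ': ' + row.wordsPerSecond + ' words/s, ' +
        row.msPerParse.toFixed(1) + ' ms per parse'
    );
});
/*
 * paragraph: 23000 words/s, 4.3 ms per parse
 * section: 25000 words/s, 40.0 ms per parse
 * article: 20000 words/s, 500.0 ms per parse
 */
```

Throughput stays between roughly 20,000 and 25,000 words per second across all three sizes, which suggests that parse time grows about linearly with input length on that machine.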

Related

License

MIT © Titus Wormer
