
rlugojr / classify.js


This project forked from sbyrnes/classify.js


Naive Bayesian classifier in pure JavaScript.

License: MIT License



classify.js


Description

Classify.js is a Naive Bayesian classifier for JavaScript applications. After you train it with examples, it can classify new inputs into the groups you defined in the training set.

An example use of such a classifier is an email spam filter, where the text of incoming emails is classified as either spam or not spam.

Installation

npm install classify.js

Usage

To use Classify.js, require the classify.js module and follow the 2 steps below.

  var Classifier = require('classify.js');

STEP 1. Train your classifier.

First, create a new classifier.

  var classifier = new Classifier();

And then provide a series of training examples that specify the classification group and the input that matches that group. You should provide as many examples per group as possible. Note that all inputs for a given group should use exactly the same group name.

  classifier.train("GROUP-A", "Some input that belongs in GROUP-A");
  classifier.train("GROUP-A", "Some other input that belongs in GROUP-A");
  classifier.train("GROUP-B", "Some input that belongs in GROUP-B");

STEP 2. Classify.

To classify, simply provide an input and the return value will be the name of the group that best matches the input.

  var group = classifier.classify("Some input that should be GROUP-B");

  // group = 'GROUP-B'

Working With Files

Training and classification data will rarely be simple strings in memory. To read and classify files, use the following equivalents of the functions described above:

To train from a file:

  classifier.trainFromFile("GROUP-A", filename);

To classify a file:

  var group = classifier.classifyFile(filename);

You can train the same classifier using both files and strings, as well as use the same classifier to classify both strings and files.
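When the training examples for a group live in a directory, the file-based call above can be wrapped in a small loop. The following is a minimal sketch, assuming only the documented `trainFromFile(group, filename)` method; the helper name `trainFromDirectory` is hypothetical and not part of classify.js.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper (not part of classify.js): trains one group from every
// regular file in a directory, using the documented trainFromFile method.
function trainFromDirectory(classifier, group, dir) {
  const trained = [];
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    if (!entry.isFile()) continue; // skip subdirectories and other non-files
    const filename = path.join(dir, entry.name);
    classifier.trainFromFile(group, filename);
    trained.push(filename);
  }
  return trained; // the list of files that were used for training
}
```

A call would then look like `trainFromDirectory(classifier, "GROUP-A", dir)`, where `dir` is whatever directory holds the GROUP-A examples.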

Advanced

The classifier works by calculating the probability that a given input matches the patterns seen in the training examples. The classification group that has the highest probability of matching the input is considered the classification of the input.
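To make the idea concrete, here is a minimal sketch of a word-count Naive Bayes classifier. It is not the classify.js implementation: the class name `TinyNaiveBayes`, the tokenizer, and the add-one smoothing are all assumptions chosen for illustration, and the library may tokenize, smooth, and score differently.

```javascript
// Illustrative sketch only; this is NOT how classify.js is necessarily built.
class TinyNaiveBayes {
  constructor() {
    this.groups = new Map(); // group name -> { docs, total, counts: Map(word -> count) }
    this.vocabulary = new Set();
    this.totalDocs = 0;
  }

  // Assumed tokenizer: lowercase, then split on anything that is not a letter or digit.
  static tokenize(text) {
    return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
  }

  train(group, text) {
    if (!this.groups.has(group)) {
      this.groups.set(group, { docs: 0, total: 0, counts: new Map() });
    }
    const g = this.groups.get(group);
    g.docs += 1;
    this.totalDocs += 1;
    for (const word of TinyNaiveBayes.tokenize(text)) {
      g.counts.set(word, (g.counts.get(word) || 0) + 1);
      g.total += 1;
      this.vocabulary.add(word);
    }
  }

  // Returns every group with its log score, sorted from best to worst match.
  rankGroups(text) {
    const words = TinyNaiveBayes.tokenize(text);
    const v = this.vocabulary.size;
    const ranked = [];
    for (const [group, g] of this.groups) {
      let score = Math.log(g.docs / this.totalDocs); // log of the group's prior
      for (const word of words) {
        const count = g.counts.get(word) || 0;
        score += Math.log((count + 1) / (g.total + v)); // add-one smoothing
      }
      ranked.push({ group, score });
    }
    return ranked.sort((a, b) => b.score - a.score);
  }

  // The best-scoring group is the classification; null if nothing was trained.
  classify(text) {
    const ranked = this.rankGroups(text);
    return ranked.length > 0 ? ranked[0].group : null;
  }
}
```

Summing logarithms instead of multiplying raw probabilities keeps the scores from underflowing to zero on long inputs, which is why the scores in this sketch are negative numbers rather than values between 0 and 1.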

However, in some cases it might be useful to retrieve the rank order of all possible groups along with their probabilities. This can be helpful when creating tools such as auto-complete text boxes on websites. To retrieve a rank ordered list of the groups for a given input (along with probabilities) you can do the following.

  var groupList = classifier.rankGroups("Some input that should be GROUP-B");

  // groupList = [ { group: 'GROUP-B', probability: -0.45 }, { group: 'GROUP-A', probability: -0.75 } ]

Note that the probabilities listed should not be considered accurate on their own; they are only useful in comparison to one another. They are not actual probabilities: in many cases the true values would be too small to represent numerically, so the values returned are the logarithms of the calculated probability weights.

For example, a value of -0.45 does not mean a group is 45% likely; it only means that the group is a better match than one with a value of -0.75. Because the values are logarithms, it is the difference between two values, not their ratio, that reflects how much more likely one group is than the other. This is due to the nature of Naive Bayesian statistics, where the form of the distribution is not known.
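If relative shares are easier to work with than raw log values, the list can be rescaled so the values sum to 1. The sketch below assumes the `{ group, probability }` shape shown above and assumes the values are natural logarithms; the helper name `normalizeScores` is hypothetical and not part of classify.js. The resulting shares are still only a way to compare groups, not calibrated probabilities.

```javascript
// Hypothetical helper (not part of classify.js): converts log scores into
// shares that sum to 1. Assumes the scores are natural logarithms.
function normalizeScores(groupList) {
  if (groupList.length === 0) return [];
  // Subtract the largest score first so Math.exp never underflows to 0 for all groups.
  const max = Math.max(...groupList.map((g) => g.probability));
  const weights = groupList.map((g) => Math.exp(g.probability - max));
  const sum = weights.reduce((a, b) => a + b, 0);
  return groupList.map((g, i) => ({ group: g.group, share: weights[i] / sum }));
}
```

With this helper, two groups whose log values differ by 0.3 end up with shares in the ratio `Math.exp(0.3)`, about 1.35 to 1, regardless of how negative the individual values are.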

About Bayesian Statistics

Bayesian classifiers use a statistical tool known as Bayes' Theorem, which allows you to calculate the conditional probability of one event given another, based on other evidence. The theorem is written as:

P(A|B) = P(B|A) * P(A) / P(B)

The probability of A given B is equal to the probability of B given A times the probability of A divided by the probability of B. In the case of this classifier, this formula is used to compute the probability of a given classification group (A) given an input (B).
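As a worked example of the formula, consider a spam filter where A is "the email is spam" and B is "the email contains a particular word". All of the numbers below are made up for illustration, and the function name `bayes` is hypothetical.

```javascript
// P(A|B) = P(B|A) * P(A) / P(B)
function bayes(pBGivenA, pA, pB) {
  return (pBGivenA * pA) / pB;
}

// Made-up numbers for illustration only.
const pSpam = 0.2;              // P(A): share of all emails that are spam
const pWordGivenSpam = 0.5;     // P(B|A): share of spam emails containing the word
const pWordGivenNotSpam = 0.05; // P(B|not A): share of non-spam emails containing it

// P(B) via the law of total probability: spam and non-spam emails combined.
const pWord = pWordGivenSpam * pSpam + pWordGivenNotSpam * (1 - pSpam);

// P(A|B): probability that an email containing the word is spam (roughly 0.71).
const pSpamGivenWord = bayes(pWordGivenSpam, pSpam, pWord);
```

Even though only 20% of emails are spam in this example, seeing the word raises the probability of spam to roughly 71%, because the word is ten times more common in spam than in other mail.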

'Naive Bayes' refers to the simplifying assumption that the features of an input (for example, the words in a text) are independent of one another given the group. The classifier also has no prior knowledge of the inputs before training begins. While this is a very general tool, it is often the case that adjusting the model based on information known about the groups ahead of time can produce better results.

One of the drawbacks of this method is that the probabilities computed are not always reliable. This is because the distribution is not known ahead of time, and the model is so simple that it may learn incorrect distributions.

