
15863004186sunchi / mongodb-datamining-shell


This project forked from selvinsource/mongodb-datamining-shell


MongoDB shell implementation of the data mining algorithms

License: Apache License 2.0

JavaScript 100.00%


## MongoDB Data Mining Shell

MongoDB shell implementations of data mining algorithms.

## Installation

```shell
git clone https://github.com/selvinsource/mongodb-datamining-shell.git
cd mongodb-datamining-shell
mongoimport --db mongodbdm --collection weatherData --type csv --headerline --file dataset/weatherData.csv
mongo mongodbdm --eval "var inputCollectionName = \"weatherData\", target = \"play\"" datamining/classification/oner.js
mongoimport --db mongodbdm --collection iris --type csv --headerline --file dataset/iris.csv
mongo mongodbdm --eval "var inputCollectionName = \"iris\", k = 3" datamining/clustering/kmeans.js
```

Follow this tutorial to compare the results with those produced by the Weka data mining software.

## Documentation

Data mining, also called knowledge discovery, is a set of activities aimed at analyzing large databases and extracting information that is meaningful for decision making or problem solving.

### Classification

Classification is one of the most common knowledge discovery tasks. It consists of building a model that predicts a target class from explanatory variables.

#### OneR

OneR is a simple yet accurate classification algorithm that produces a one-level decision tree.
For a visual description of the algorithm, see OneR pseudocode.
Its MongoDB implementation, oner.js, takes two input parameters:

  • inputCollectionName - the collection used as training dataset
  • target - the target class of the collection
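To make the idea concrete, here is a minimal sketch of OneR in plain JavaScript. It assumes the training documents are already in an array (for example, obtained with `db.yourcollection.find().toArray()` in the shell). It is not the project's oner.js: the function names `oneR` and `predictOneR` are hypothetical, every explanatory field is treated as a categorical string, and the Yes/No restriction on the target is not enforced. For each attribute, the sketch builds one rule per attribute value (predict the majority class), counts the misclassified instances, and keeps the attribute with the fewest errors.

```javascript
// Minimal OneR sketch over an array of plain documents (hypothetical helper).
// Assumes docs is non-empty and every document has the same fields.
function oneR(docs, target) {
  // Candidate attributes: all fields except the target and MongoDB's _id.
  const attributes = Object.keys(docs[0]).filter(
    (name) => name !== target && name !== "_id"
  );
  let best = null;
  for (const attr of attributes) {
    // counts[value][cls] = number of documents with attr === value and target === cls
    const counts = {};
    for (const doc of docs) {
      const value = String(doc[attr]);
      const cls = String(doc[target]);
      counts[value] = counts[value] || {};
      counts[value][cls] = (counts[value][cls] || 0) + 1;
    }
    // One rule per value: predict the majority class, count the rest as errors.
    // Ties between classes go to the class whose key comes first.
    const rules = {};
    let errors = 0;
    for (const value of Object.keys(counts)) {
      let majority = null;
      let majorityCount = -1;
      let total = 0;
      for (const cls of Object.keys(counts[value])) {
        const n = counts[value][cls];
        total += n;
        if (n > majorityCount) {
          majority = cls;
          majorityCount = n;
        }
      }
      rules[value] = majority;
      errors += total - majorityCount;
    }
    // Keep the attribute with the fewest errors (the first one examined wins ties).
    if (best === null || errors < best.errors) {
      best = { attribute: attr, rules: rules, errors: errors, total: docs.length };
    }
  }
  return best;
}

// Classify a new document with the selected rule set.
// Returns undefined for attribute values never seen during training.
function predictOneR(model, doc) {
  return model.rules[String(doc[model.attribute])];
}
```

The returned object records the chosen attribute, its value-to-class rules, and the training error count, so `errors / total` gives the training error rate of the one-level tree.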

Usage:

```shell
mongo yourdatabase --eval "var inputCollectionName = \"yourcollection\", target = \"yourtargetclass\"" datamining/classification/oner.js
```

Example of a collection with target class `play`: weather data.

Limitations:

  • the target class must be a categorical variable with values Yes and No
  • the explanatory variables must be categorical; numerical variables should be discretized into a small number of distinct ranges before running the algorithm
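One common way to meet the discretization requirement is equal-width binning. The sketch below is an assumption, since the project does not prescribe a method, and `discretizeEqualWidth` is a hypothetical name. It splits the observed range of one numeric field into `bins` intervals of equal width and replaces each value with a string label for its interval, returning new documents rather than modifying the input.

```javascript
// Equal-width binning of one numeric field (hypothetical helper).
// Returns new documents in which `field` holds a string label such as "[64, 71)";
// the last interval is closed on the right. The input documents are not modified.
function discretizeEqualWidth(docs, field, bins) {
  const values = docs.map((doc) => Number(doc[field]));
  // reduce avoids spreading very large arrays into Math.min/Math.max.
  const min = values.reduce((a, b) => Math.min(a, b), Infinity);
  const max = values.reduce((a, b) => Math.max(a, b), -Infinity);
  const width = (max - min) / bins;

  return docs.map((doc, i) => {
    // If all values are equal (width 0), every document falls into bin 0.
    let index = width === 0 ? 0 : Math.floor((values[i] - min) / width);
    // The maximum value would land in bin `bins`; clamp it into the last bin.
    if (index >= bins) index = bins - 1;
    const lo = min + index * width;
    const hi = lo + width;
    const label = "[" + lo + ", " + hi + (index === bins - 1 ? "]" : ")");
    return Object.assign({}, doc, { [field]: label });
  });
}
```

For example, a numeric `temperature` field could be reduced to three ranges this way before the documents are stored in the collection passed to oner.js. Equal-width bins are easy to read but sensitive to outliers; equal-frequency binning is a common alternative.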

### Clustering

Clustering is the task of identifying and segmenting the instances into a finite number (k) of groups (clusters) that, unlike the classes in classification, are not predefined.

#### K-Means

K-Means is the classic clustering technique that partitions the instances into k clusters, where k is fixed in advance.
For a high-level description of the algorithm, see K-Means pseudocode.
Its MongoDB implementation, kmeans.js, takes two input parameters:

  • inputCollectionName - the collection used as training dataset
  • k - the number of predefined clusters
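As an illustration, here is a minimal sketch of the standard Lloyd-style k-means iteration in plain JavaScript: assign each point to its nearest centroid, move each centroid to the mean of its points, and repeat until no assignment changes. It is not the project's kmeans.js, and the names `kMeans` and `squaredDistance` are hypothetical. Assumptions: points are equal-length arrays of numbers, distance is squared Euclidean, and the first k points seed the centroids (the project may initialize differently). Because k-means only converges to a local optimum, the result depends on this seeding.

```javascript
// Squared Euclidean distance between two equal-length numeric arrays.
function squaredDistance(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    sum += d * d;
  }
  return sum;
}

// Lloyd-style k-means (hypothetical helper). Assumes points.length >= k and
// that every point has the same number of numeric dimensions.
function kMeans(points, k, maxIterations = 100) {
  // Assumption: seed the centroids with copies of the first k points.
  let centroids = points.slice(0, k).map((p) => p.slice());
  const assignments = new Array(points.length).fill(-1);
  const dims = points[0].length;

  for (let iter = 0; iter < maxIterations; iter++) {
    // Assignment step: attach each point to its nearest centroid.
    let changed = false;
    for (let i = 0; i < points.length; i++) {
      let best = 0;
      let bestDist = squaredDistance(points[i], centroids[0]);
      for (let c = 1; c < k; c++) {
        const dist = squaredDistance(points[i], centroids[c]);
        if (dist < bestDist) {
          best = c;
          bestDist = dist;
        }
      }
      if (assignments[i] !== best) {
        assignments[i] = best;
        changed = true;
      }
    }
    // Converged: no point switched cluster during this pass.
    if (!changed) break;

    // Update step: move each centroid to the mean of its assigned points.
    const sums = centroids.map(() => new Array(dims).fill(0));
    const counts = new Array(k).fill(0);
    for (let i = 0; i < points.length; i++) {
      const c = assignments[i];
      counts[c] += 1;
      for (let d = 0; d < dims; d++) sums[c][d] += points[i][d];
    }
    // An empty cluster keeps its previous centroid.
    centroids = centroids.map((old, c) =>
      counts[c] === 0 ? old : sums[c].map((s) => s / counts[c])
    );
  }
  return { centroids, assignments };
}
```

The returned `assignments[i]` is the cluster index of `points[i]`, and `centroids[c]` is the mean of the points in cluster `c` after the last update.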

Usage:

```shell
mongo yourdatabase --eval "var inputCollectionName = \"yourcollection\", k = numberofclusters" datamining/clustering/kmeans.js
```

Example of a collection: iris data.

Limitation:

  • all variables must be numerical

Note:

  • If the collection has a field named "class", it is excluded from the computation; instead, it is printed in the result alongside the assigned cluster
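The note about the "class" field can be illustrated with a small preprocessing sketch. The helpers `toFeatureVectors` and `labelResults` are hypothetical and are not taken from kmeans.js. The sketch drops `_id` and `class` from the features, rejects non-numeric values to mirror the all-numerical limitation, and pairs each cluster index with the document's original `class` value for reporting. It assumes every document has the same fields as the first one.

```javascript
// Turn documents into numeric feature vectors (hypothetical helper).
// Excludes _id and the optional "class" label; throws on non-numeric values.
function toFeatureVectors(docs) {
  const fields = Object.keys(docs[0]).filter((f) => f !== "_id" && f !== "class");
  const vectors = docs.map((doc) =>
    fields.map((f) => {
      const x = doc[f];
      if (typeof x !== "number" || !isFinite(x)) {
        throw new Error("field " + f + " is not numeric: " + x);
      }
      return x;
    })
  );
  return { fields, vectors };
}

// Pair each document's cluster index with its "class" value for reporting.
// `class` is undefined when the collection has no such field.
function labelResults(docs, assignments) {
  return docs.map((doc, i) => ({ cluster: assignments[i], class: doc.class }));
}
```

The `vectors` array can be passed to any k-means routine, and `labelResults` then shows how the known classes are distributed across the clusters found.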

### References

  • Hartigan, J. A. (1975) Clustering Algorithms, Probability & Mathematical Statistics, John Wiley & Sons Inc.
  • Holte, R. C. (1993) Very simple classification rules perform well on most commonly used datasets, Machine Learning, 11, pp. 63-91
  • Selvaggio, V. (2011) Customer Churn prediction for an Automotive Dealership using computational Data Mining, MSc dissertation, City University London
  • UCI Machine Learning Repository, University of California, School of Information and Computer Science

