xiaoanshi / xia-nb
This project forked from rxiacn/libnb
A C++ implementation of naive Bayes model
License: Other
Table of Contents
=================

- Introduction
- Installation
- Data Format
- Usage
- Examples
- Additional Information

Introduction
============

XIA-NB is a C++ implementation of the Naive Bayes classifier, a well-known generative classification algorithm for applications such as text classification.

The Naive Bayes models implemented here require the feature distribution to be discrete. XIA-NB uses the multinomial event model for representation, and maximum likelihood estimation with Laplace smoothing for learning parameters. Feature vectors are stored in a sparse data structure for higher computational speed.

Installation
============

On Linux, type `make' to build the `nb_learn' and `nb_classify' programs. Run either program without arguments to show its usage.

On Windows, refer to `Makefile' to build them, or use the pre-built binaries (in the directory `windows').

Data Format
===========

The format of the training and testing data files is:

<label> <index1>:<value1> <index2>:<value2> ...
.
.
.

Each line contains one instance and is ended by a '\n' character.

<label> is an integer indicating the class id. Class ids should range from 1 to the number of classes. For example, the class ids are 1, 2, 3 and 4 for a 4-class classification problem.

<label> and <index>:<value> are separated by a '\t' character.

<index> is a positive integer denoting the feature id. Feature ids should range from 1 to the size of the feature set. For example, the feature ids are 1, 2, ..., 9 or 10 if the dimension of the feature set is 10. Indices must be in ASCENDING order.

<value> is a number denoting the feature value. The value must be an INTEGER, since the Naive Bayes models used here require the probabilistic distribution to be discrete. If a feature value equals 0, the <index>:<value> pair should preferably be omitted to save storage space and computation time.
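The data format described above can be read into a sparse representation with a few lines of C++. The following is a minimal sketch only, not code from the XIA-NB sources: the `Instance` struct and the `parse_instance` function are hypothetical names chosen for illustration. It assumes values are written as plain integer literals (so `2.0` is rejected as well as `1.5`), and it is lenient about separators, accepting any whitespace between the label and the feature pairs.

```cpp
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical type for illustration; not taken from the XIA-NB sources.
struct Instance {
    int label = 0;                              // class id, starting at 1
    std::vector<std::pair<int, int>> features;  // (feature id, integer value)
};

// Parse one line of the form "<label>\t<index1>:<value1> <index2>:<value2> ...".
// Throws std::invalid_argument when the line breaks the rules listed above.
Instance parse_instance(const std::string& line) {
    std::istringstream in(line);
    Instance inst;
    if (!(in >> inst.label) || inst.label < 1) {
        throw std::invalid_argument("label must be a positive integer");
    }

    std::string token;
    int previous_index = 0;  // feature ids start at 1 and must ascend
    while (in >> token) {
        const std::size_t colon = token.find(':');
        if (colon == std::string::npos) {
            throw std::invalid_argument("expected <index>:<value>, got: " + token);
        }
        const std::string index_text = token.substr(0, colon);
        const std::string value_text = token.substr(colon + 1);

        std::size_t used = 0;
        const int index = std::stoi(index_text, &used);
        if (used != index_text.size()) {
            throw std::invalid_argument("index must be an integer: " + token);
        }
        const int value = std::stoi(value_text, &used);
        if (used != value_text.size() || value < 0) {
            throw std::invalid_argument("value must be a non-negative integer: " + token);
        }
        if (index <= previous_index) {
            throw std::invalid_argument("indices must be positive and ascending: " + token);
        }
        previous_index = index;

        // Zero-valued features carry no information, so they are not stored.
        if (value != 0) {
            inst.features.emplace_back(index, value);
        }
    }
    return inst;
}
```

Calling `parse_instance("2\t1:3 4:1 10:2")` yields an instance with label 2 and three stored (index, value) pairs. Storing only the non-zero pairs is what keeps memory use proportional to the number of features that actually occur in an instance rather than to the size of the feature set.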
Labels in the testing file are only used to calculate accuracy or errors. If they are unknown, just fill the first column with any class label.

Usage
=====

XIA-NB learning module

usage: nb_learn [options] training_file model_file

options:
  -h         -> help
  -e [0,1]   -> 0: multi-variate Bernoulli event model
             -> 1: multinomial event model (default)
  -s [0]     -> Laplace smoothing (default)

XIA-NB classification module

usage: nb_classify [options] testing_file model_file output_file

options:
  -h         -> help
  -e [0,1]   -> 0: multi-variate Bernoulli event model
             -> 1: multinomial event model (default)
  -f [0..2]  -> 0: only output class label (default)
             -> 1: output class label with log-likelihood
             -> 2: output class label with probability

Examples
========

The "data" directory contains a dataset for a text classification task. This dataset has six class labels and more than 250,000 features.

For learning with the default multinomial event model:

> nb_learn data/train.samp data/nb.mod

For learning with the multi-variate Bernoulli event model:

> nb_learn -e 0 data/train.samp data/nb0.mod

For classifying with the default multinomial event model and the default output format:

> nb_classify data/test.samp data/nb.mod data/nb.out

For classifying with the multi-variate Bernoulli event model and the log-likelihood output:

> nb_classify -e 0 -f 1 data/test.samp data/nb0.mod data/nb0.out

Additional Information
======================

For any questions and comments, please email [email protected].