
limdu's Issues

Question

Firstly, this library looks great!

Secondly, I'm experimenting with some ML for a little side project of mine to help me learn, and I'm wondering what I could achieve with this library. Could you point me in the right direction, as you seem extremely knowledgeable on the subject? =]

[
  'Where could i get a hot drink in Manchester?',
  'whats the best coffee shop in manchestr',
  'coolest cafe in Manc?'
]

So in my examples you can see the various ways (spelling mistakes intentional) of asking about a cafe in a named location.

As you can see, there are lots of ways activity X can be asked about. At most I'd have, say, 6 different types of activities (e.g. going to a cafe, a park, visiting a gym), so the options there are limited, but there are lots of ways of describing them.

The second part of the query is a location, sometimes an abbreviated location. I have a list of all the possible locations and their abbreviations/aliases. How could I extract these while also handling spelling mistakes/typos, given that multiple locations could be mentioned in the same sentence (at most 3-4 locations)?

The intent classifier seemed like a good start and basic tests seemed to work, but I'm unsure how to handle the issues above, i.e. spelling (string distance perhaps?) and named locations.
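To make the question concrete, here is the rough direction I was imagining (a sketch only: the alias list and the editDistance helper are made up for illustration, not limdu API):

// Rough sketch, not limdu API: map aliases/typos to canonical locations with
// a small edit-distance check, then classify the activity with limdu as usual.
var locationAliases = { manchester: 'Manchester', manc: 'Manchester', leeds: 'Leeds' }; // my own list

function editDistance(a, b) { // plain Levenshtein distance
	var d = [];
	for (var i = 0; i <= a.length; i++) d[i] = [i];
	for (var j = 0; j <= b.length; j++) d[0][j] = j;
	for (i = 1; i <= a.length; i++)
		for (j = 1; j <= b.length; j++)
			d[i][j] = Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1,
			                   d[i - 1][j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1));
	return d[a.length][b.length];
}

function extractLocations(sentence) {
	return sentence.toLowerCase().split(/\W+/).map(function (word) {
		var best = null, bestDistance = 3; // tolerate up to 2 typos; short aliases may need a tighter limit
		Object.keys(locationAliases).forEach(function (alias) {
			var distance = editDistance(word, alias);
			if (distance < bestDistance) { bestDistance = distance; best = locationAliases[alias]; }
		});
		return best;
	}).filter(Boolean);
}

extractLocations('whats the best coffee shop in manchestr'); // -> [ 'Manchester' ]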

Any pointers, examples etc would be appreciated

Thanks,
Mike

Feature suggestions

  • Randomising the partitions (e.g. like how train-test-split does it for training/test sets, or tvt-split for training/validation/test sets)? (a shuffle sketch is below)
  • SVG/Canvas representation of the model (or any other way to visualise the model being used)
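A sketch of the randomisation idea (shuffling is not currently part of limdu; the partitions call is the one from the README cross-validation example):

// Fisher-Yates shuffle of the dataset before partitioning it into folds.
function shuffle(dataset) {
	for (var i = dataset.length - 1; i > 0; i--) {
		var j = Math.floor(Math.random() * (i + 1));
		var tmp = dataset[i]; dataset[i] = dataset[j]; dataset[j] = tmp;
	}
	return dataset;
}

limdu.utils.partitions.partitions(shuffle(dataset), numOfFolds, function(trainSet, testSet) {
	// train and test each fold as in the README cross-validation example
});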

Meaning of Retraining?

// Initialize a classifier with a feature extractor:
return new limdu.classifiers.EnhancedClassifier({
    classifierType: TextClassifier,
    featureExtractor: WordExtractor,
    pastTrainingSamples: [], // to enable retraining
});

What is the meaning of "retraining"?
What are the benefits of enabling "retraining"?

Thanks :-)

Example Result

The result from the "Feature extraction - converting an input sample into feature-value pairs" example differs from the documented output:

result:
[ expansioned: {},
features: { I: 1, want: 1, an: 1, apple: 1, and: 1, a: 1, banana: 1 } ]
[ expansioned: {},
features: { I: 1, WANT: 1, AN: 1, APPLE: 1, AND: 1, A: 1, BANANA: 1 } ]

Dataset mutation

After running unit tests on a project that uses limdu, I noticed that once the classifier's train method is called, the dataset is mutated.
What I mean by this is that I have a class Learner that has a dataset field and a classifier field (similar to intentClassifier in the examples).
dataset has the [ { input: 'string', output: 'category string' }, ...] structure; after train() is called on, say, Learner.classifier, the dataset (so both the training and testing sets) ends up with outputs that are arrays containing the strings.

I'm not sure if it's intended or if the format (post-mutation) is what should be used instead of what's in the docs.

Ref: https://github.com/all-contributors/ac-learn/tree/limdu
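As a workaround (my own sketch, not an official limdu recommendation), handing the classifier a copy keeps the original dataset intact:

// Shallow-copy each sample so train() mutates the copy, not my dataset field.
var trainingCopy = this.dataset.map(function (sample) {
	return { input: sample.input, output: sample.output };
});
this.classifier.trainBatch(trainingCopy);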

How to handle a lot of input data?

Thanks for the amazing module!

What's the right way to train using a lot of input data objects (in my case 45k objects, but let's assume it can be even bigger)?
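For what it's worth, the approach I'm considering (a sketch, assuming the classifier supports incremental training the way trainOnline does in the examples on this page; loadSamples is a hypothetical loader):

var samples = loadSamples(); // hypothetical: returns [{input: ..., output: ...}, ...]
samples.forEach(function (sample) {
	intentClassifier.trainOnline(sample.input, sample.output);
});
// A single trainBatch over the full array may still give better results for
// batch-style learners, but feeding samples one by one keeps each call small.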

work in browser?

Can limdu work in a browser?
I get this error inside webpack:

ERROR Failed to compile with 10 errors 14:49:43

These dependencies were not found:

  • fs in C:/Users/franc/Documents/pror//limdu/classifiers/kNN/kNN.js, C:/Users/franc/Documents/pror//limdu/classifiers/svm/SvmPerf.js and 5 others
  • child_process in C:/Users/franc/Documents/pror//limdu/classifiers/svm/SvmPerf.js, C:/Users/franc/Documents/pror//limdu/classifiers/svm/SvmLinear.js and 1 other

To install them, you can run: npm install --save fs child_process
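For reference, a sketch of a webpack workaround (webpack 4 syntax, so this is an assumption about the webpack version in use). It only silences the bundling errors; the classifiers that shell out to external binaries (SvmPerf, SvmLinear) or touch the file system still won't work in a browser:

// webpack.config.js
module.exports = {
	// ...the rest of the config
	node: {
		fs: 'empty',            // stub out the Node core modules limdu requires
		child_process: 'empty'
	}
	// webpack 5 equivalent: resolve: { fallback: { fs: false, child_process: false } }
};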

Label Classification Result correct?

Hi,
I have a question, as below:

Here is my code:

var limdu = require('limdu');

// First, define our base classifier type (a multi-label classifier based on winnow):
var TextClassifier = limdu.classifiers.multilabel.BinaryRelevance.bind(0, {
	binaryClassifierType: limdu.classifiers.Winnow.bind(0, {retrain_count: 10})
});

// Now define our feature extractor - a function that takes a sample and adds features to a given features set:
var WordExtractor = function(input, features) {
	input.split(" ").forEach(function(word) {
		features[word]=1;
	});
};

// Initialize a classifier with the base classifier type and the feature extractor:
var intentClassifier = new limdu.classifiers.EnhancedClassifier({
	classifierType: TextClassifier,
	featureExtractor: WordExtractor,
	normalizer: limdu.features.LowerCaseNormalizer,
	pastTrainingSamples: [], // to enable retraining
});

// Train and test:
intentClassifier.trainBatch([
	{input: "I want an apple", output: "apl"},
	{input: "I want a banana", output: "bnn"},
	{input: "I want chips", output:    "cps"},
]);

console.log( intentClassifier.classify("I want chips and a doughnut") );
intentClassifier.trainOnline("I want", "req");
intentClassifier.trainOnline("You want", "req");
intentClassifier.trainOnline("We want", "req");
intentClassifier.trainOnline("They want", "req");
intentClassifier.trainOnline("He want", "req");
intentClassifier.trainOnline("She want", "req");
intentClassifier.trainOnline("It want", "req");
intentClassifier.trainOnline("I want a doughnut", "dnt");
console.log( intentClassifier.classify("I want chips and a doughnut") );
intentClassifier.retrain();
console.log( intentClassifier.classify("I want chips and a doughnut") );

The actual results are:

[ 'cps' ]
[ 'dnt', 'cps' ]
[ 'dnt', 'cps' ]

Are these results correct, or should they be like this:

[ 'cps' ]
[ 'dnt', 'cps' ]
[ 'req', 'dnt', 'cps' ]

Security Notice & Bug Bounty - Command Injection - huntr.dev

Overview

limdu is a machine learning framework for Node.js. It supports multi-label classification and online learning.

This package is vulnerable to Command Injection through the trainBatch function.

Bug Bounty

We have opened up a bounty for this issue on our bug bounty platform. Want to solve this vulnerability and get rewarded 💰? Go to https://huntr.dev/

We will submit a pull request directly to your repository with the fix as soon as possible. Want to learn more? Go to https://github.com/418sec/huntr 📚

Automatically generated by @huntr-helper...

Incorrect Accuracy

After doing some checks on my limdu wrapper, I noticed that the Accuracy fields don't return what one would expect given the values of TP, TN and count (for both microAverage and macroAverage).
I was wondering: why is the TRUE field used for calculating Accuracy when it's not equal to TP? That means the standard (TP + TN) / count formula, which applies to both 2-class and multi-class classification, is not being used.
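To make the discrepancy concrete, a small illustration using the macroAvg numbers quoted in the cross-validation output further down this page (count = 79, TP = 69, TN = 0, TRUE = 68.2):

var count = 79, TP = 69, TN = 0, TRUE_ = 68.2;

var standardAccuracy = (TP + TN) / count; // 0.8734... - what I would expect
var reportedAccuracy = TRUE_ / count;     // 0.8633... - matches the reported Accuracy field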

Languages supported?

What languages are supported by the limdu classifiers?
I assume all languages using the a-z alphabet.

How about Hebrew, Chinese, Hindi, Korean... which don't use the a-z alphabet?
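For the non-space-delimited case, a sketch (my own, not limdu API) of a character-bigram feature extractor that could replace the README's space-splitting WordExtractor:

// Emits character bigrams as features, so it works for unsegmented text
// (e.g. Chinese) as well as for space-delimited languages.
var BigramExtractor = function (input, features) {
	var chars = input.replace(/\s+/g, '').split('');
	for (var i = 0; i < chars.length - 1; i++) {
		features[chars[i] + chars[i + 1]] = 1;
	}
};

var classifier = new limdu.classifiers.EnhancedClassifier({
	classifierType: TextClassifier, // as defined in the README example
	featureExtractor: BigramExtractor
});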

Thanks :-)

Big multi label classifier on db

Hi,
Can a multi-label classifier trained on big data be written to a SQL database and streamed during the classify phase, or must the classifier always work in memory? For a big classifier, is it best to split the data across many classifiers, serialize them, and call them sequentially or in parallel?

Confusion in README

I've noticed that the cross-validation example uses macroAverage:

var microAverage = new limdu.utils.PrecisionRecall();
var macroAverage = new limdu.utils.PrecisionRecall();

limdu.utils.partitions.partitions(dataset, numOfFolds, function(trainSet, testSet) {
	console.log("Training on "+trainSet.length+" samples, testing on "+testSet.length+" samples");
	var classifier = new IntentClassifier();
	classifier.trainBatch(trainSet);
	limdu.utils.test(classifier, testSet, /* verbosity = */0,
		microAverage, macroAverage);
});

macroAverage.calculateMacroAverageStats(numOfFolds);
console.log("\n\nMACRO AVERAGE:"); console.dir(macroAverage.fullStats());

But utils.testAndTrain's test function uses macroSum, which is confusing.
Is the README meant to say macroSum, or is the function not using the right term?

Also, not related to this, but would it be a good idea to add (optional) randomisation to the partitions (e.g. like how train-test-split does it)?

Explanations

Could you please provide a more detailed overview of how the explanations system works, and of the math behind it? I am having a little difficulty understanding the maths of certain explanations.

npm install error: "Native code compile failed!!"

  • npm version 2.7.4
  • node version v0.12.2
> execSync@1.0.2 install <path>/node_modules/limdu/node_modules/execSync
> node install.js

[execsync v1.0.2] Attempting to compile native extensions.
[execSync v1.0.2]
    Native code compile failed!!
[email protected] node_modules/limdu
├── [email protected]
├── [email protected]
├── [email protected]
├── [email protected]
├── [email protected]
├── [email protected] ([email protected])
├── [email protected] ([email protected])
└── [email protected] ([email protected])

How do I know the progress of Training?

First of all, great work with this library. While it works well for small data sets, I wanted to do the same for large datasets. It looks like it gets stuck in trainBatch, and I am not sure how much of the training has completed so far.

How do I know how many records have been trained so far, or get some feedback that it's processing?
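A sketch of the workaround I have in mind: split the dataset into chunks and log after each one (assumption: repeated trainBatch calls accumulate rather than reset the model for the classifier in use; otherwise trainOnline per sample gives the same kind of feedback):

var chunkSize = 1000;
for (var i = 0; i < dataset.length; i += chunkSize) {
	intentClassifier.trainBatch(dataset.slice(i, i + chunkSize));
	console.log('trained ' + Math.min(i + chunkSize, dataset.length) + ' of ' + dataset.length + ' records');
}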

This package is no longer usable because "brain" library has been removed

Hey !

First thank you for your awesome module :)

I'm the founder of Gladys, an open-source home automation assistant written in Node.js.

We are using limdu as a dependency, and since this week the installation of Gladys has been broken because the brain dependency used by limdu has been deprecated and can no longer be installed.

I saw that there is a community fork called brain.js. I don't know if it supports all the features used in limdu, but if it does, maybe limdu could switch from the deprecated "brain" to "brain.js".

If you think that's a good idea, I can submit a PR to help you make this module work again :)

Node.js version

Limdu currently runs on Node.js 12 and later versions.

Should probably be:

Limdu currently runs on Node.js 0.12 and later versions.

Current node version is 8.x

Missed fields on cross-validation

After following the cross-validation example in a project, I've noticed that both microAverage and macroAverage have some fields left empty (or at whatever the default value in PrecisionRecall's constructor is).

Here's an example (taken from ac-learn)

const Learner = require('./ac-learn')
const learner = new Learner(); // creates an object where the classifier is the `intentClassifier` from the examples
learner.crossValidate(/* folds= */ 5);

Which outputs this (seemingly incomplete) object:

{
  macroAvg: {
    count: 79,
    TP: 69,
    TN: 0,
    FP: 3.6,
    FN: 10,
    TRUE: 68.2,
    startTime: 2019-05-02T11:17:15.451Z,
    dep: {},
    confusion: {},
    macroPrecision: 0,
    macroRecall: 0,
    macroF1: 0,
    Accuracy: 0.8632911392405064,
    HammingLoss: 0.17215189873417722,
    HammingGain: 0.8278481012658228,
    Precision: 0.9235869565217393,
    Recall: 0.8734177215189874,
    F1: 0.89154213836478,
    endTime: 'Thu May 02 2019 11:17:15 GMT+0000 (GMT)', //Shouldn't this be in the EPOCH time format like the rest?
    timeMillis: 8.6,
    timePerSampleMillis: 0.1088607594936709,
    shortStatsString: 'Accuracy=86% HammingGain=83% Precision=92% Recall=87% F1=89% timePerSample=0[ms]'
  },
  microAvg: {
    count: 395,
    TP: 345,
    TN: 0,
    FP: 18,
    FN: 50,
    TRUE: 341,
    startTime: 2019-05-02T11:17:15.451Z,
    dep: {},
    confusion: {},
    macroPrecision: 0,
    macroRecall: 0,
    macroF1: 0,
    Accuracy: 0.8632911392405064,
    HammingLoss: 0.17215189873417722,
    HammingGain: 0.8278481012658228,
    Precision: 0.9504132231404959,
    Recall: 0.8734177215189873,
    F1: 0.9102902374670185,
    endTime: 2019-05-02T11:17:16.325Z,
    timeMillis: 874,
    timePerSampleMillis: 2.212658227848101,
    shortStatsString: 'Accuracy=341/395=86% HammingGain=1-68/395=83% Precision=95% Recall=87% F1=91% timePerSample=2[ms]'
  }
}

I had a glance at the code and it seems that labels, dep, confusion, macroPrecision, macroRecall and macroF1 should be filled in instead of being {} or 0, so I was wondering whether this is a bug?

liblinear question

Sorry for the noob questions.
LibLinear 2.1 does not seem to ship a test binary.
It compiles train and predict; is predict the replacement for test?

        train_command: "liblinear_train",
        test_command: "liblinear_test",

Command failed

Error: Command failed: liblinear_train -c 20 tempfiles/SvmLinearDemo_1479941584474_7224.learn tempfiles/SvmLinearDemo_1479941584474_7224.model
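A sketch of what I think the fix is (assuming the train_command/test_command options shown above are honoured, and that SvmLinear is exposed under limdu.classifiers): point them at the binaries LibLinear 2.1 actually builds, or symlink liblinear_train/liblinear_test to them.

var SvmLinear = require('limdu').classifiers.SvmLinear; // assumption about the export path

var classifier = new SvmLinear({
	train_command: '/usr/local/bin/train',  // LibLinear 2.1's training binary
	test_command: '/usr/local/bin/predict'  // LibLinear 2.1's prediction binary
});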

Continuous learning

Hi!

How can I make my ML continue learning? I cannot put every possible input in the world into the training set. How can limdu learn about new inputs, for example new categories?

Here is an example:

I have 3 documents that I need to parse and categorize. The first talks about Node, the second about JS, but the last talks about Cook! My training data doesn't have any examples about Cook.

How can I add a new input so that limdu understands what Cook is, and that Cook is the right category for it?
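A sketch using only the calls already shown on this page (the sample sentences are made up; the assumption is that retrain() re-runs training over the stored pastTrainingSamples):

// When a new category such as "cook" shows up, add labelled samples for it
// and retrain so the old samples are replayed together with the new ones.
intentClassifier.trainOnline("how long should I roast a chicken", "cook");
intentClassifier.trainOnline("best recipe for pasta carbonara", "cook");
intentClassifier.retrain();

console.log(intentClassifier.classify("how do I cook rice"));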

Serialization & Deserialization

I'm having issues with serialisation and deserialisation using the combination of the limdu + serialization npm packages.

Normally, we would serialise into a .json file.
Using serialize.toString(), I don't know where the serialized .json file is.

Q1. Maybe limdu/serialization does not generate the .json file?

Can anyone enlighten me?
Thanks.
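For reference, a sketch of what I would expect to do (assumption: serialize.toString just returns a string and never writes a file itself, so the .json file has to be written explicitly; the second argument to toString is whatever factory/context the serialization package expects):

var fs = require('fs');
var serialize = require('serialization');

var classifierString = serialize.toString(intentClassifier, classifierFactory);
fs.writeFileSync('classifier.json', classifierString, 'utf8');

// Later, read the file back and revive the classifier with the package's
// counterpart call:
var revived = serialize.fromString(fs.readFileSync('classifier.json', 'utf8'), __dirname);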

svmlinear fix

SvmLinear doesn't work.
Since I don't have permission to push the change, I'm posting the diff here:

diff --git a/classifiers/svm/SvmLinear.js b/classifiers/svm/SvmLinear.js
index 4688a71..8d2d272 100644
--- a/classifiers/svm/SvmLinear.js
+++ b/classifiers/svm/SvmLinear.js
@@ -16,46 +16,41 @@
  *  <li>multiclass - if true, the 'classify' function returns an array [label,score]. If false (default), it returns only a score.
  */
 
+var util  = require('util')
+  , child_process = require('child_process')
+  , exec = require('child_process').exec
+  , fs   = require('fs')
+  , svmcommon = require('./svmcommon')
+  , _ = require('underscore')._
+
+var FIRST_FEATURE_NUMBER=1;  // in lib linear, feature numbers start with 1
+
 function SvmLinear(opts) {
-	if (!SvmLinear.isInstalled()) {
-		var msg = "Cannot find the executable 'liblinear_train'. Please download it from the LibLinear website, and put a link to it in your path.";
-		console.error(msg)
-		throw new Error(msg);
-	}
 	this.learn_args = opts.learn_args || "";
 	this.model_file_prefix = opts.model_file_prefix || null;
 	this.bias = opts.bias || 1.0;
 	this.multiclass = opts.multiclass || false;
 	this.debug = opts.debug||false;
-  	this.train_command = opts.train_command || 'liblinear_train'
-  	this.test_command = opts.test_command || 'liblinear_test'
-  	this.timestamp = ""
+	this.train_command = opts.train_command || 'liblinear_train'
+	this.test_command = opts.test_command || 'liblinear_test'
+	this.timestamp = ""
 
 	if (!SvmLinear.isInstalled()) {
-                var msg = "Cannot find the executable 'liblinear_train'. Please download it from the LibLinear website, and put a link to it in your path.";
-                console.error(msg)
-                throw new Error(msg);
-        }
+			var msg = "Cannot find the executable 'liblinear_train'. Please download it from the LibLinear website, and put a link to it in your path.";
+			console.error(msg)
+			throw new Error(msg);
+	}
 }
 
 SvmLinear.isInstalled = function() {
 	try {
-	    var result = execSync(this.train_command);
+	    var result = child_process.execSync(this.train_command);
 	} catch (err) {
 	    return false
 	}
 	return true
 };
 
-var util  = require('util')
-  , child_process = require('child_process')
-  , exec = require('child_process').exec
-  , fs   = require('fs')
-  , svmcommon = require('./svmcommon')
-  , _ = require('underscore')._
-
-var FIRST_FEATURE_NUMBER=1;  // in lib linear, feature numbers start with 1
-
 
 SvmLinear.prototype = {
 		trainOnline: function(features, expected) {

SVM wrapper doesn't work

Hi.
I'm trying to use external SVM wrappers, as suggested by you, like svm_perf_learn, but even though I put it on my PATH, I get this issue:
Cannot find the executable 'svm_perf_learn'. Please download it from the SvmPerf website, and put a link to it in your path.
C:\Users\a\node_modules\limdu\classifiers\svm\SvmPerf.js:29
throw new Error(msg);

It's quite weird because in SvmPerf.js the check is always true.
Thanks
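One quick diagnostic sketch (my own): check whether the Node process itself can find the binary, since the PATH Node sees can differ from the one in the shell where it was tested, especially on Windows:

var execSync = require('child_process').execSync;
var lookup = process.platform === 'win32' ? 'where svm_perf_learn' : 'which svm_perf_learn';
try {
	console.log(execSync(lookup).toString());
} catch (err) {
	console.log('svm_perf_learn is not on the PATH that Node sees');
}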

limdu.utils.test does not exist.

The cross-validation example uses this, but utils/index.js has it commented out and trainAndTest.js has never been committed. Could you commit a copy of it (or post it here) so I can try to add it to the library?

Visualize Correlations between data?

Hi, I'm using the Online Learning module.
I want to see somehow how the data is connected. I'm happy to make charts/diagrams in D3 etc., but I'm just working out what kind of output I can get.

Any advice would be great,
Thank you.
Vince.

Online Learning

Hi, although I still owe you an unfinished pull request (I cannot find the time to finish it), I want to ask you something.

How does "trainOnline" learn in real time? From what I noticed, the whole network is stored in memory and updated each time a new sample is added?

With each new word the network learns more slowly, and after some time, once a certain amount of memory is used up, new words no longer fit and a console error about running out of memory appears.

Do you have any ideas on how to solve this? Could streams be used?

online training behaviour for missclassification

I'm making a simple search where the user can evaluate each result as useful or not.
Each time a result is evaluated, I call trainOnline on the server. If the user changes their mind, it's possible to change the evaluation.

The problem is that sometimes the classify function returns an empty value...
Why does this happen? It looks like the online classifier is confused... (just guessing)

Like:

birdClassifier.trainOnline({'wings': 1, 'flight': 1, 'beak': 1, 'eagle': 1}, 1); 
birdClassifier.trainOnline({'wings': 0, 'flight': 0, 'beak': 0, 'dog': 1}, 0); 

Then:

birdClassifier.trainOnline({'wings': 1, 'flight': 1, 'beak': 1, 'eagle': 1},0); 
birdClassifier.trainOnline({'wings': 0, 'flight': 0, 'beak': 0, 'dog': 1}, 1);  

It starts to get confused, obviously, and sometimes classify returns nothing...
I want to know why.
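One thing I tried in order to understand it better (assumption: classify accepts an explanation level, as in limdu's README examples; if this is a multi-label setup, an empty result presumably just means no label scored above the positive threshold after the contradictory updates):

console.log(birdClassifier.classify({'wings': 1, 'flight': 1, 'beak': 1, 'eagle': 1}, /* explanation level = */ 4));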
