mljs / pca

Principal component analysis

Home Page: https://mljs.github.io/pca/

License: MIT License

TypeScript 100.00%

hacktoberfest

pca's Introduction

ml-pca

Principal component analysis (PCA).

Maintained by Zakodium

Installation

$ npm install ml-pca

Usage

const { PCA } = require('ml-pca');
const dataset = require('ml-dataset-iris').getNumbers();
// dataset is a two-dimensional array where rows represent the samples and columns the features
const pca = new PCA(dataset);
console.log(pca.getExplainedVariance());
/*
[ 0.9246187232017269,
  0.05306648311706785,
  0.017102609807929704,
  0.005212183873275558 ]
*/
const newPoints = [
  [4.9, 3.2, 1.2, 0.4],
  [5.4, 3.3, 1.4, 0.9],
];
console.log(pca.predict(newPoints)); // project new points into the PCA space
/*
[
  [ -2.830722471866897,
    0.01139060953209596,
    0.0030369648815961603,
    -0.2817812120420965 ],
  [ -2.308002707614927,
    -0.3175048770719249,
    0.059976053412802766,
    -0.688413413360567 ]]
*/

License

MIT

pca's Issues

PCA crash if the data is only one element

Check the view

http://visualizer.epfl.ch/c/home/index.html?viewURL=%2Fc%2F322d373f937cb95a6706334d60a2c544%2Fview.json

And use as data

1,1

When I construct the PCA object from a dataset and run getExplainedVariance() I get a number[] of variance values, sorted in descending order. I'd like to know which feature relates to each variance value. I could do this if the report specified the original index of the feature in the sample or allowed me to pass in labels.

I've read the API documentation and pulled down the source code, but I really haven't found much explanation there. Am I missing how to do this? I'd be happy to help improve documentation or add this code if you agree it's missing and necessary.

Use case here (see console)
code

Getting different results than numeric.js

Trying to visualize eigenvectors of a point set:

I compared the results with numeric.js using this code:
http://davywybiral.blogspot.co.uk/2012/11/numeric-javascript.html

Noob question: How do I project my data into the PCA space?

For instance check out how this simple ruby library works.

https://github.com/gbuesing/pca

It outputs the dataset in the desired number of dimensions, ready to graph. Is this easy with your library and I am just missing something obvious?

Thanks!
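
For readers with the same question: the Usage section above projects points with `pca.predict(...)`. Conceptually, and assuming the data is only centered (not scaled), projecting a point means subtracting the per-feature mean and taking dot products with the component directions. The sketch below illustrates that arithmetic in plain JavaScript. `projectPoints`, the `means` vector and the `components` array are hypothetical names with made-up values; they are not part of ml-pca and this is not its implementation.

```javascript
// Hypothetical helper: project points onto the first k principal directions.
// `means` holds one mean per feature; each entry of `components` is a
// unit-length direction with one weight per feature.
function projectPoints(points, means, components, k = components.length) {
  return points.map((point) => {
    const centered = point.map((value, j) => value - means[j]);
    return components
      .slice(0, k)
      .map((direction) =>
        direction.reduce((sum, weight, j) => sum + weight * centered[j], 0)
      );
  });
}

// Made-up 2D example: the first direction points along (0.6, 0.8).
const means = [1, 2];
const components = [
  [0.6, 0.8],
  [-0.8, 0.6],
];
const scores2d = projectPoints([[4, 6]], means, components); // keep both
const scores1d = projectPoints([[4, 6]], means, components, 1); // keep one
console.log(scores2d, scores1d);
```

Keeping only the first `k` directions is what reduces the dimensionality: each output row has `k` values, ready to graph when `k` is 2.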

Add missing typescript types and move documentation to typescript

The documentation is published but can be improved (before release maybe):

see https://mljs.github.io/pca/classes/PCA.html

Dimension of eigenvectors

Hi, sorry in advance if my question is stupid, as I'm not an expert in statistics.

I'm trying to implement a SOM algorithm, and the best way to initialize the neurons' vectors is to use PCA.

I'm trying to understand the behavior of the generated eigenvectors, but I can't figure out why I get this in my unit tests:

  const dataSetSize = 10;
  const numDimensions1 = 3;
  const numDimensions2 = 11;

  // Matrix 1 generation: dataSetSize rows of numDimensions1 random values
  const data = _.range(0, dataSetSize).map(() => _.range(0, numDimensions1).map(() => Math.random()));

  // Matrix 2 generation: dataSetSize rows of numDimensions2 random values
  const data2 = _.range(0, dataSetSize).map(() => _.range(0, numDimensions2).map(() => Math.random()));

  const pca = new PCA(data, {
    center: true,
    scale: false,
  });

  const pca2 = new PCA(data2, {
    center: true,
    scale: false,
  });

  describe('eigenvectors:', () => {
    const eigenvectors = pca.getEigenvectors();
    const eigenvectors2 = pca2.getEigenvectors();
    it('should have as many eigenvectors as the number of dimensions in the dataset', () => {
      assert.strictEqual(eigenvectors.length, numDimensions1);
      assert.strictEqual(eigenvectors2.length, numDimensions2);
    });
    it('an eigenvector should have as many dimensions as a vector from the dataset', () => {
      assert.strictEqual(eigenvectors[0].length, numDimensions1);
      // FAIL:
      // assert.strictEqual(eigenvectors2[0].length, numDimensions2);
    });
  });

Question - Plotting Ability

Hello, first of all I want to give my thanks for the existence of this library. I don't have a lot of experience with multivariate analysis so forgive me if this is a dumb question but I was wondering if it's possible to plot the data generated from the analysis? For example, if you take a look at this Python Plotly graph, they are plotting using the data generated from sklearn. Is this something that is achievable with the data generated from this library? Thank you.
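
Plotting should be possible, because the projected points are plain numbers. As a sketch, the first two columns of the `pca.predict(...)` output can be reshaped into the `x`/`y` arrays that a scatter plot expects. `toScatterTrace` is a hypothetical helper; the trace shape follows Plotly's scatter convention, and the scores are the two projected points printed in the Usage section. The sketch assumes nested arrays; if `predict` returns a Matrix object instead, it would need converting first (ml-matrix exposes `to2DArray()` for that).

```javascript
// Hypothetical helper: build a Plotly-style scatter trace from PCA scores.
function toScatterTrace(scores, name = 'PCA scores') {
  return {
    x: scores.map((row) => row[0]), // first principal component
    y: scores.map((row) => row[1]), // second principal component
    mode: 'markers',
    type: 'scatter',
    name,
  };
}

// Projected points copied from the Usage section output.
const scores = [
  [-2.830722471866897, 0.01139060953209596, 0.0030369648815961603, -0.2817812120420965],
  [-2.308002707614927, -0.3175048770719249, 0.059976053412802766, -0.688413413360567],
];
const trace = toScatterTrace(scores);
console.log(trace);
```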

Documentation breaks TravisCI

When the package is installed, it pushes to docs, which throws an error in TravisCI: fatal: empty ident name (for <travis@testing-worker-linux-docker-d81d9b9d-3397-linux-15.prod.travis-ci.org>) not allowed

Reference to the feature names in getExplainedVariance

Hello,

When using new PCA(dataset[]), regardless of the order of my observations (for instance dataset = [[1, 2], [100, 2000]] or dataset = [[2, 1], [2000, 100]]), the method getExplainedVariance always seems to return the same array of values. Is there a way to know which column/feature each score refers to?

Thank you!
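
One way to see why both datasets give the same values: explained variance belongs to the principal components, not to the original columns. Each ratio is an eigenvalue of the covariance matrix divided by the sum of the eigenvalues, and swapping columns does not change those eigenvalues. The sketch below checks this for two-feature data using the closed-form eigenvalues of a 2x2 symmetric matrix. `explainedVariance2D` is a hypothetical helper written for this illustration; it is not the library's implementation and it assumes the data has non-zero total variance.

```javascript
// Hypothetical helper: explained variance ratios for two-feature data.
function explainedVariance2D(rows) {
  const n = rows.length;
  const mean = [0, 1].map((j) => rows.reduce((s, r) => s + r[j], 0) / n);
  // Entries of the symmetric sample covariance matrix [[a, b], [b, d]].
  let a = 0;
  let b = 0;
  let d = 0;
  for (const row of rows) {
    const x = row[0] - mean[0];
    const y = row[1] - mean[1];
    a += (x * x) / (n - 1);
    b += (x * y) / (n - 1);
    d += (y * y) / (n - 1);
  }
  // Eigenvalues of a symmetric 2x2 matrix from its trace and determinant.
  const trace = a + d;
  const det = a * d - b * b;
  const root = Math.sqrt(Math.max(0, trace * trace - 4 * det));
  const eigenvalues = [(trace + root) / 2, (trace - root) / 2];
  return eigenvalues.map((value) => value / trace);
}

const original = explainedVariance2D([[1, 2], [100, 2000]]);
const swapped = explainedVariance2D([[2, 1], [2000, 100]]);
console.log(original, swapped);
```

Both calls return the same pair of ratios, largest first, which matches the behavior described in the report.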

NIPALS returns nans

From the documentation, I understand that specifying nComponents only works with options.method = 'NIPALS'.

However

const options = { method: 'NIPALS', nCompNIPALS: 2 };

const pca = new PCA(dataset, options);
const embedding = pca.predict(dataset, options);

always returns arrays of NaNs: [NaN, NaN]. I've tried it with 'SVD' and there are no NaNs; however, the other methods don't support nComponents. So how do you use this for dimensionality reduction?

How to use `getExplainedVariance` results?

I ran getExplainedVariance on my dataset:

const pca = new PCA(students);
console.log(pca.getExplainedVariance());

It worked fine, and I got a list of variances back - however, how do I tell which variance corresponds to which feature?
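
A note on this recurring question: the values are ordered by principal component, not by feature. Every component is a weighted combination of all features, so there is no one-to-one mapping between a variance value and a column. What can be inspected is how strongly each feature weighs in a given component. The sketch below ranks features by absolute weight within one component. `rankFeatures`, the labels and the weights are illustrative assumptions; the weight vector is assumed to hold one value per feature for a single component, for instance taken from the matrix returned by `pca.getLoadings()` (check its orientation in the API documentation).

```javascript
// Hypothetical helper: order features by their absolute weight in a component.
function rankFeatures(weights, labels) {
  return weights
    .map((weight, index) => ({ label: labels[index], weight }))
    .sort((p, q) => Math.abs(q.weight) - Math.abs(p.weight));
}

// Illustrative weights of one component for four labelled features.
const labels = ['sepalLength', 'sepalWidth', 'petalLength', 'petalWidth'];
const firstComponent = [0.36, -0.08, 0.86, 0.35];
const ranked = rankFeatures(firstComponent, labels);
console.log(ranked.map((entry) => entry.label));
```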

Optimize predictions when nComponents is specified

number of components ?

I don't understand... can't we choose the number of components of our PCA?
See how it is done with scikit-learn:
http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html

A PCA enables dimensionality reduction, so we have to choose our new dimensionality.
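
One common way to choose the new dimensionality is to keep the smallest number of components whose cumulative explained variance reaches a target. The sketch below applies that rule to the iris values printed in the Usage section. `componentsForVariance` is a hypothetical helper, and the assumption is that the resulting count can then be passed as the `nComponents` option of `predict`, as another report on this page does.

```javascript
// Hypothetical helper: smallest number of leading components whose
// cumulative explained variance reaches `target` (a fraction between 0 and 1).
function componentsForVariance(explainedVariance, target) {
  let cumulative = 0;
  for (let i = 0; i < explainedVariance.length; i++) {
    cumulative += explainedVariance[i];
    if (cumulative >= target) return i + 1;
  }
  return explainedVariance.length;
}

// Values copied from the getExplainedVariance() output in the Usage section.
const irisVariance = [
  0.9246187232017269, 0.05306648311706785, 0.017102609807929704,
  0.005212183873275558,
];
console.log(componentsForVariance(irisVariance, 0.95)); // 2
```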

Migrate project to typescript

Before you start, please have a look at:

fix DOI in citation.md once published on Zenodo

Mixed ES6 and CJS causes issues in some environments

Thank you for ml-pca, we are using it in a browser project and so far looks great. One issue we are having, however, is that the JS files in ml-pca and ml-matrix use ES6 features within a CJS file. When using JSPM/SystemJS in the browser this means that the files are detected as CJS and not transpiled.

Here is a relevant SystemJS issue: systemjs/systemjs#811

On your end I can see three approaches:

1. Use babel to transpile on publish.
2. Use ESM import/export syntax (this may force you to do #1 also).
3. Ignore it and wait for it to be fixed by JSPM/SystemJS (maybe next version).

Add warning that this is ES6 only package (as it depends on ml-matrix)

Crashes when a feature variance is 0

It would be more desirable if it can skip features with zero variance instead of crashing.

I have written some code to do the check myself, but it feels very inefficient. Suggestions for improvement are welcome.

        let mat = new Matrix(input).transpose();
        let mat2 = [];
        mat.forEach((vec, idx) => {
            let mean = this.mean(vec);
            let variance = this.variance(vec, mean);
            if (variance > 1e-7) {
                let svec = this.standardize(vec, mean, variance);
                mat2.push(svec);
            } else {
                // consider 0 variance
            }
        })
        mat2 = new Matrix(mat2);
        mat2 = mat2.transpose();

        // scaled myself to avoid 0-division (caused by 0-variance) problem;
        let pca = new Stat.PCA(mat2, {mean: false, scale: false});
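
A possibly lighter alternative to the workaround above is to filter constant columns out of the plain array before constructing the PCA, and to remember which columns were kept so results can be mapped back. This is a sketch in plain JavaScript; `dropConstantColumns` and its `epsilon` threshold are hypothetical and not part of ml-pca.

```javascript
// Hypothetical helper: remove features whose sample variance is (near) zero.
function dropConstantColumns(rows, epsilon = 1e-7) {
  const nFeatures = rows[0].length;
  const keptIndices = [];
  for (let j = 0; j < nFeatures; j++) {
    const mean = rows.reduce((sum, row) => sum + row[j], 0) / rows.length;
    const variance =
      rows.reduce((sum, row) => sum + (row[j] - mean) ** 2, 0) /
      Math.max(1, rows.length - 1);
    if (variance > epsilon) keptIndices.push(j);
  }
  // Keep only the surviving columns, in their original order.
  const data = rows.map((row) => keptIndices.map((j) => row[j]));
  return { data, keptIndices };
}

const filtered = dropConstantColumns([
  [1, 5, 10],
  [2, 5, 20],
  [3, 5, 30],
]);
console.log(filtered.keptIndices); // [ 0, 2 ]
```

The filtered `data` can then be passed to the PCA constructor with scaling enabled, since no remaining column has zero variance.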

Test

Mono dimensional arrays: TypeError: First argument must be a positive number or an array

I have a dataset of M samples, each with N features. Each sample is an N-dimensional vector of hashes representing the fingerprint of an audio file. I would like to run PCA on it using this library. Assuming my dataset is M x N, how do I pass it to this library?
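
As the Usage section notes, the dataset is a two-dimensional array with one row per sample, so an M x N dataset is an array of M arrays of length N. The TypeError in the title is consistent with passing a flat array, though that is an assumption about this report. The sketch below normalizes and validates the shape before the data reaches the PCA constructor; `toSampleRows` is a hypothetical helper.

```javascript
// Hypothetical helper: return the input as an array of equally long rows.
function toSampleRows(input) {
  if (!Array.isArray(input) || input.length === 0) {
    throw new TypeError('expected a non-empty array');
  }
  // A flat array of numbers is treated as one sample with N features.
  const rows = Array.isArray(input[0]) ? input : [input];
  const width = rows[0].length;
  if (!rows.every((row) => Array.isArray(row) && row.length === width)) {
    throw new RangeError('every sample needs the same number of features');
  }
  return rows;
}

console.log(toSampleRows([1, 2, 3])); // one sample, three features
console.log(toSampleRows([[1, 2], [3, 4]])); // two samples, two features
```

Wrapping a single vector only fixes the shape; a PCA still needs several samples to have any variance to analyze.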

Add Reconstructed loadings

Scores

A getScores() method would be useful.
To do this I used:

let scores = pca.predict(dataset)

It took me several attempts to get what I needed (the scores), though admittedly I'm not a statistician. Even a mention of the scores in the documentation would help those who are not in the field, like me.

update readme once Matrix has a custom inspect function

Test doesn't run in Node.js 4.1

I'm getting an error while trying to call new PCA(Matrix(dataset)):

TypeError: Class constructors cannot be invoked without 'new'

It should be new PCA(new Matrix(dataset))

What engine (node, iojs, babel) is this package targeting?

RangeError: Submatrix indices are out of range

Usage:

const pca = new PCA([vector]);
const result = pca.predict([vector], { nComponents: 2 });

...where vector is an array with 1,536 elements, such as an OpenAI embedding, e.g. const vector = [0.00728, -0.0181, 0.014, ....]

The goal is to reduce/project the 1,536-element vector into 2d space for graphing related vectors.

However, when I call pca.predict, it throws this error:

(PID 60093) 2023-07-03T22:51:51.304Z  [ERROR] script-cli.js:127: RangeError: Submatrix indices are out of range
    at checkRange (/Users/josiahbryan/devel/rubber/backend/node_modules/ml-matrix/matrix.js:1038:11)
    at Matrix.subMatrix (/Users/josiahbryan/devel/rubber/backend/node_modules/ml-matrix/matrix.js:2455:5)
    at PCA.predict (/Users/josiahbryan/devel/rubber/backend/node_modules/ml-pca/lib/pca.js:123:28)
    [snipped]
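
A likely cause, stated as an assumption rather than a confirmed diagnosis: the PCA here is fitted on a single vector, and one sample has no variance to decompose, so fewer than two components are available when `predict` asks for two. Centered data with n samples and p features supports at most min(n - 1, p) components with non-zero variance. The sketch below checks that bound before fitting; `maxComponents` and `assertEnoughSamples` are hypothetical helpers, not part of ml-pca.

```javascript
// Upper bound on components with non-zero variance for centered data.
function maxComponents(nSamples, nFeatures) {
  return Math.max(0, Math.min(nSamples - 1, nFeatures));
}

// Hypothetical guard: fail early with a descriptive message.
function assertEnoughSamples(rows, nComponents) {
  const available = maxComponents(rows.length, rows[0].length);
  if (nComponents > available) {
    throw new RangeError(
      `requested ${nComponents} components but ${rows.length} sample(s) ` +
        `support at most ${available}`
    );
  }
}

console.log(maxComponents(1, 1536)); // 0: a single embedding is not enough
console.log(maxComponents(10, 11)); // 9
```

In the scenario above, the PCA would be fitted on the whole collection of related embedding vectors, and each vector would then be projected with that fitted model.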