pca's Introduction

ml-pca

Principal component analysis (PCA).

Maintained by Zakodium

Installation

$ npm install ml-pca

Usage

const { PCA } = require('ml-pca');
const dataset = require('ml-dataset-iris').getNumbers();
// dataset is a two-dimensional array where rows represent the samples and columns the features
const pca = new PCA(dataset);
console.log(pca.getExplainedVariance());
/*
[ 0.9246187232017269,
  0.05306648311706785,
  0.017102609807929704,
  0.005212183873275558 ]
*/
const newPoints = [
  [4.9, 3.2, 1.2, 0.4],
  [5.4, 3.3, 1.4, 0.9],
];
console.log(pca.predict(newPoints)); // project new points into the PCA space
/*
[
  [ -2.830722471866897,
    0.01139060953209596,
    0.0030369648815961603,
    -0.2817812120420965 ],
  [ -2.308002707614927,
    -0.3175048770719249,
    0.059976053412802766,
    -0.688413413360567 ]]
*/

License

MIT

pca's People

Contributors

jajoe, jwist, lpatiny, mljs-bot, ppierzc, redhaam, sebastien-ahkrin, targos

pca's Issues

Feature Labels

When I construct the PCA object from a dataset and run getExplainedVariance() I get a number[] of variance values, sorted in descending order. I'd like to know which feature relates to each variance value. I could do this if the report specified the original index of the feature in the sample or allowed me to pass in labels.

I've read the API documentation and pulled down the source code, but I really haven't found much explanation there. Am I missing how to do this? I'd be happy to help improve documentation or add this code if you agree it's missing and necessary.

Use case here (see console)
code
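A point that may help here: each explained-variance value belongs to a principal component, and a component is a weighted combination of all original features, not a single one. The sketch below pairs each component with the features that weigh most in it. It assumes the loadings are available as a plain two-dimensional array with one row per component and one column per feature (with ml-pca, something like pca.getLoadings().to2DArray() may provide this, but treat that as an assumption). The helper describeComponents, the labels, and the numbers are hypothetical.

```javascript
// Hypothetical helper: pairs each principal component with the original
// features that weigh most heavily in it.
// Assumption: loadings[i][j] is the weight of feature j in component i.
function describeComponents(loadings, explainedVariance, labels, top = 2) {
  return loadings.map((row, i) => {
    const ranked = row
      .map((weight, j) => ({ feature: labels[j], weight }))
      // Rank by absolute weight: the sign only indicates direction.
      .sort((a, b) => Math.abs(b.weight) - Math.abs(a.weight))
      .slice(0, top);
    return {
      component: i + 1,
      explainedVariance: explainedVariance[i],
      topFeatures: ranked,
    };
  });
}

// Made-up loadings for illustration only (not output from ml-pca).
const report = describeComponents(
  [
    [0.8, 0.6],
    [-0.6, 0.8],
  ],
  [0.9, 0.1],
  ['height', 'weight'],
  1,
);
console.log(report[0].topFeatures[0].feature); // height
console.log(report[1].topFeatures[0].feature); // weight
```

Passing labels in the same column order as the dataset keeps the mapping back to the original features explicit.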

Dimension of eigenvectors

Hi, sorry in advance if my question is stupid, as I'm not an expert in statistics.

I'm trying to implement a SOM algorithm, and the best way to initialize the neurons' vectors is by using PCA.

I'm trying to understand the behavior of the generated eigenvectors, but I can't figure out why I get this in my unit tests:

  const dataSetSize = 10;
  const numDimensions1 = 3;
  const numDimensions2 = 11;

  // Matrix 1 generation
  const data = _.range(0, dataSetSize).map(vec => _.range(0, numDimensions1).map(val => Math.random()));

  // Matrix 2 generation
  const data2 = _.range(0, dataSetSize).map(vec => _.range(0, numDimensions2).map(val => Math.random()));

  const pca = new PCA(data, {
    center: true,
    scale: false,
  });

  const pca2 = new PCA(data2, {
    center: true,
    scale: false,
  });

  describe('eigenvectors:', () => {
    const eigenvectors = pca.getEigenvectors();
    const eigenvectors2 = pca2.getEigenvectors();
    it('should have as many eigenvectors as the number of dimensions in the dataset', () => {
      assert.strictEqual(eigenvectors.length, numDimensions1);
      assert.strictEqual(eigenvectors2.length, numDimensions2);
    });
    it('an eigenvector should have as many dimensions as a vector from the dataset', () => {
      assert.strictEqual(eigenvectors[0].length, numDimensions1);
      // FAIL:
      // assert.strictEqual(eigenvectors2[0].length, numDimensions2);
    });
  });

Question - Plotting Ability

Hello, first of all I want to give my thanks for the existence of this library. I don't have a lot of experience with multivariate analysis so forgive me if this is a dumb question but I was wondering if it's possible to plot the data generated from the analysis? For example, if you take a look at this Python Plotly graph, they are plotting using the data generated from sklearn. Is this something that is achievable with the data generated from this library? Thank you.
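As a rough sketch of how this could work: a scatter plot of a PCA result usually puts the first component on the x axis and the second on the y axis. The helper below (toScatterTrace, a hypothetical name) reshapes projected points into the x/y arrays that a Plotly.js scatter trace expects. It assumes the scores are a plain two-dimensional array with one row per sample; the sample values are the README's projected points, rounded.

```javascript
// Hypothetical helper: turns PCA scores into a Plotly.js-style scatter trace.
// Assumption: scores[i][k] is the coordinate of sample i on component k.
function toScatterTrace(scores, xComponent = 0, yComponent = 1) {
  return {
    x: scores.map((row) => row[xComponent]),
    y: scores.map((row) => row[yComponent]),
    mode: 'markers',
    type: 'scatter',
  };
}

// Rounded values taken from the README's predict() output.
const scores = [
  [-2.83, 0.011, 0.003, -0.282],
  [-2.31, -0.318, 0.06, -0.688],
];
const trace = toScatterTrace(scores);
console.log(trace.x); // [ -2.83, -2.31 ]
console.log(trace.y); // [ 0.011, -0.318 ]
```

In a browser with Plotly.js loaded, a trace of this shape could be passed to Plotly.newPlot inside an array of traces; any other charting library would only need the x and y arrays.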

Documentation breaks TravisCI

When the package is installed, it pushes to docs, which throws an error in TravisCI: fatal: empty ident name (for <travis@testing-worker-linux-docker-d81d9b9d-3397-linux-15.prod.travis-ci.org>) not allowed

Reference to the feature names in getExplainedVariance

Hello,

When using new PCA(dataset), regardless of the order of the values in my observations (for instance dataset = [[1, 2], [100, 2000]] or dataset = [[2, 1], [2000, 100]]), the method getExplainedVariance always seems to return the same array of values. Is there a way to know which column/feature each score refers to?

Thank you!

NIPALS returns nans

From the documentation, I understand that specifying nComponents only works with options.method = 'NIPALS'.

However,

const options = { method: 'NIPALS', nCompNIPALS: 2 };

const pca = new PCA(dataset, options);
const embedding = pca.predict(dataset, options);

always returns arrays of NaNs: [NaN, NaN]. I've tried it with 'SVD' and there are no NaNs, but the other methods don't support nComponents. So how do you use this for dimensionality reduction?

How to use `getExplainedVariance` results?

I ran getExplainedVariance on my dataset:

const pca = new PCA(students);
console.log(pca.getExplainedVariance());

It worked fine, and I got a list of variances back - however, how do I tell which variance corresponds to which feature?

Mixed ES6 and CJS causes issues in some environments

Thank you for ml-pca, we are using it in a browser project and so far looks great. One issue we are having, however, is that the JS files in ml-pca and ml-matrix use ES6 features within a CJS file. When using JSPM/SystemJS in the browser this means that the files are detected as CJS and not transpiled.

Here is a relevant SystemJS issue: systemjs/systemjs#811

On your end I can see three approaches:

  1. Use babel to transpile on publish.
  2. Use ESM import/export syntax (this may force you to do #1 also).
  3. Ignore it and wait for it to be fixed by JSPM/SystemJS (maybe next version).

Crashes when a feature variance is 0

It would be preferable to skip features with zero variance instead of crashing.

I have written some code to do the check myself, but it feels very inefficient. Suggestions for improvement are welcome.

        let mat = new Matrix(input).transpose();
        let mat2 = [];
        mat.forEach((vec, idx) => {
            let mean = this.mean(vec);
            let variance = this.variance(vec, mean);
            if (variance > 1e-7) {
                let svec = this.standardize(vec, mean, variance);
                mat2.push(svec);
            } else {
                // consider 0 variance
            }
        })
        mat2 = new Matrix(mat2);
        mat2 = mat2.transpose();

        // scaled myself to avoid 0-division (caused by 0-variance) problem;
        let pca = new Stat.PCA(mat2, {mean: false, scale: false});
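One possible lighter-weight variant, as a sketch only: compute each column's variance directly on the row-major array and keep the indices of the non-constant columns, which avoids the transposes and intermediate Matrix objects. The helper name dropConstantColumns and the tolerance default are assumptions chosen to mirror the 1e-7 threshold above; this version only filters and leaves standardization to a later step.

```javascript
// Hypothetical helper: removes columns whose sample variance is (near) zero.
// Returns the filtered rows plus the indices of the columns that were kept,
// so results can be mapped back to the original features.
function dropConstantColumns(rows, tolerance = 1e-7) {
  const nRows = rows.length;
  const nCols = rows[0].length;
  const kept = [];
  for (let j = 0; j < nCols; j++) {
    let sum = 0;
    for (let i = 0; i < nRows; i++) sum += rows[i][j];
    const mean = sum / nRows;
    let squares = 0;
    for (let i = 0; i < nRows; i++) squares += (rows[i][j] - mean) ** 2;
    // Unbiased sample variance; a single row has no variance to speak of.
    const variance = nRows > 1 ? squares / (nRows - 1) : 0;
    if (variance > tolerance) kept.push(j);
  }
  return { kept, data: rows.map((row) => kept.map((j) => row[j])) };
}

const { kept, data } = dropConstantColumns([
  [1, 5, 10],
  [2, 5, 20],
  [3, 5, 30],
]);
console.log(kept); // [ 0, 2 ]
console.log(data); // [ [ 1, 10 ], [ 2, 20 ], [ 3, 30 ] ]
```

The filtered data could then be handed to the PCA constructor with scaling enabled, since no remaining column has zero variance.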

Scores

A getScores() method would be useful.
To do this I used:

let scores = pca.predict(dataset)

I made several attempts before I got what I needed (the scores), but admittedly I'm not a statistician. Even a mention of the scores in the documentation would help those who are not in the field, like me.

Test doesn't run in Node.js 4.1

I'm getting an error while trying to call new PCA(Matrix(dataset)):

TypeError: Class constructors cannot be invoked without 'new'

It should be new PCA(new Matrix(dataset))

What engine (node, iojs, babel) is this package targeting?

RangeError: Submatrix indices are out of range

Usage:

const pca = new PCA([vector]);
const result = pca.predict([vector], { nComponents: 2 });

...where vector is an array with 1,536 elements, such as an OpenAI embedding. E.g. const vector = [0.00728, -0.0181, 0.014, ....]

The goal is to reduce/project the 1,536-element vector into 2d space for graphing related vectors.

However, when I call pca.predict, it throws this error:

(PID 60093) 2023-07-03T22:51:51.304Z  [ERROR] script-cli.js:127: RangeError: Submatrix indices are out of range
    at checkRange (/Users/josiahbryan/devel/rubber/backend/node_modules/ml-matrix/matrix.js:1038:11)
    at Matrix.subMatrix (/Users/josiahbryan/devel/rubber/backend/node_modules/ml-matrix/matrix.js:2455:5)
    at PCA.predict (/Users/josiahbryan/devel/rubber/backend/node_modules/ml-pca/lib/pca.js:123:28)
    [snipped]
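A possible explanation, offered as a sketch and not as a statement about ml-pca's internals: the PCA in the usage above is fitted on a single sample. After centering, a data matrix with n samples and p features has rank at most min(n - 1, p), so one vector cannot support two components. The hypothetical helper below computes that upper bound, which could be checked before asking for nComponents.

```javascript
// Hypothetical helper: upper bound on the number of meaningful principal
// components for a dataset with nSamples rows and nFeatures columns.
// Centering removes one degree of freedom from the samples.
function maxComponents(nSamples, nFeatures, centered = true) {
  const rows = centered ? nSamples - 1 : nSamples;
  return Math.max(0, Math.min(rows, nFeatures));
}

console.log(maxComponents(1, 1536)); // 0: a single embedding gives no components
console.log(maxComponents(10, 11)); // 9
console.log(maxComponents(150, 4)); // 4
```

Under this reading, the fit would need to run on the whole collection of related embeddings (at least three for a 2D projection), and each vector would be projected afterwards with predict.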
