saehm / druidjs

A JavaScript Library for Dimensionality Reduction

dimensionality-reduction clustering javascript-library unsupervised-learning matrix t-sne mds pca umap ltsa

druidjs's Introduction

DruidJS — A JavaScript Library for Dimensionality Reduction.

DruidJS is a JavaScript library for dimensionality reduction. Dimensionality reduction projects high-dimensional data into a lower-dimensional space while preserving method-specific properties of the data. DruidJS makes it easy to project a dataset with any of the implemented dimensionality reduction methods.


Resources

@inproceedings{cutura2020druid,
  title={{DRUIDJS — A JavaScript Library for Dimensionality Reduction}},
  author={Cutura, Rene and Kralj, Christoph and Sedlmair, Michael},
  booktitle={2020 IEEE Visualization Conference (VIS)},
  pages={111--115},
  year={2020},
  organization={IEEE}
}

Installation

If you use npm, install with npm install @saehrimnir/druidjs, and import it with

import * as druid from "@saehrimnir/druidjs";

Otherwise, download the files here, or load the library from a CDN such as unpkg:

<script src="https://unpkg.com/@saehrimnir/druidjs"></script>
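When loaded via the script tag, the bundle is assumed here to expose a browser global named druid, so a minimal usage sketch might look like:

<script>
    // assumes the bundle above exposes a global named `druid`
    const matrix = druid.Matrix.from([[1, 2, 3], [4, 5, 6]]);
</script>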

Matrix

Internally, DruidJS uses the Matrix class to store data. You can create a druid.Matrix object, for instance, with the static from function:

    import * as druid from '@saehrimnir/druidjs';

    let data = [[...], [...], ...];
    let matrix = druid.Matrix.from(data);

You can also create a druid.Matrix object programmatically by providing the number of rows and columns together with a function that computes the entry at each (row, col) position:

    let fn = (row, col) => (row === col ? 1 : 0);
    let matrix = new druid.Matrix(rows, columns, fn); // rows, columns: the desired dimensions

If rows == columns, then matrix would be an identity matrix. A shortcut for an identity matrix is:

    let matrix = new druid.Matrix(rows, columns, "I");
    // or
    let matrix = new druid.Matrix(rows, columns, "identity");

There are more shortcuts for creating matrices:

    let matrix = new druid.Matrix(3, 3, "zeros"); // matrix would be a 3x3 matrix with zeroes
    let matrix = new druid.Matrix(3, 3, "center"); // matrix would be a 3x3 center matrix;
    let number = 12;
    let matrix = new druid.Matrix(3, 3, number); // matrix would b a 3x3 matrix filled with 'number'

If you want to use a druid.Matrix object, for instance, with d3, you can use either the to2dArray property, the iterate_rows generator function, or just use the druid.Matrix object as an iterable (works with d3 since version 6).

    let data = await d3.csv("data.csv");
    let matrix = druid.Matrix.from(data);
    d3.selectAll("datapoints").data(matrix.to2dArray)//...
    d3.selectAll("datapoints").data(matrix.iterate_rows())//...
    d3.selectAll("datapoints").data(matrix)//...

DR methods

Transform

Example
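A minimal sketch of the transform workflow, assuming each DR class is constructed with the data and provides a transform method that returns the projected points (method-specific parameters are omitted here):

    let data = [[...], [...], ...];   // high-dimensional data points
    let dr = new druid.FASTMAP(data); // any of the implemented DR method classes
    let projection = dr.transform();  // the projected, low-dimensional points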

Generator

Example
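A sketch for the iterative methods, assuming they expose a generator function (called generator() here) that yields an intermediate projection after each optimization step; draw() is only a hypothetical placeholder for your rendering code:

    let dr = new druid.UMAP(data);
    for (let intermediate of dr.generator()) {
        // update e.g. a scatterplot with the current intermediate projection
        draw(intermediate); // draw() is a hypothetical rendering function
    }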

TopoMap Example

Example ...

druidjs's People

Contributors

dependabot[bot], fil, johkehrer, saehm


druidjs's Issues

Scaling up?

RangeError: Array buffer allocation failed

This is the error I get if I try to pass more than 30k points to UMAP. (I suppose DruidJS should accept and process data in a more compact format?)

LDA has fixed dimension d=2

When I pass d=3 or d=4, all the dimred methods work except:

  • LDA, which returns 2D vectors
  • LSE, which crashes with A.concat(B, "vertical"): A and B need same number of columns, A has 4 columns, B has 2 columns.

learn then transform?

If I understand correctly, the current approach learns and transforms at the same time. As a consequence you can't learn on a subset (train set), then transform the whole dataset. It would be nice to be able to train a model then apply it to incoming data.

(It's something that is very easy with some algorithms (PCA), but more difficult with others.)

API design

With the current API, if one wants to project into d=3, one has to know the exact number n of optional arguments before specifying 3 as the (n+1)th argument. This feels a bit awkward, and it means that a supplementary hyperparameter cannot be added to any method without it being a breaking change.

It seems to me that it would be nice to rethink the API "à la D3", so that:

  • all the algorithms can be called interchangeably
  • we could separate the training and transform phases (#11)
  • we could specify hyperparameters individually
  • we could serialize the model (in and out: save and load)

I would imagine that this could be structured as:

  • new Druid([method or model]) — create a druid
  • druid.values([accessor]) — sets the values accessor and returns the druid if specified; returns the values accessor if not
  • druid.dimensions([number]) — sets or returns the dimensions (default: 2)
  • druid.class([accessor]) — sets or returns the class accessor (for LDA)
  • druid.method([name or class]) — sets the current method (UMAP, FASTMAP, etc.) and returns the druid if specified; returns the method (as a class or function) if not
  • druid.fit(data) — trains the model on the data and returns the druid
  • druid.transform([data]) — transforms the data if specified; if data is not specified, returns the transformed train set
  • druid.model([model]) — returns the serialized model (JSON) if no model is specified; loads the model if one is specified

And for each hyperparameter, for example UMAP's min_dist:

  • druid.min_dist([min_dist]) — sets the min_dist hyperparameter and returns the druid if specified; returns it if not

With this we could say for example:

const dr = new Druid("LDA"); // dr
dr.dimensions(2).class(d => d.species).values(d => [+d.sepal_length, +d.petal_length, ]).fit(data); // dr
dr.transform(); // transformed data
const model = dr.model(); // JSON {}

const dr = new Druid(model); // dr
dr.transform([new data]); // apply the model to new data…

I wonder what should be done about NaN values; I suppose data points should be automatically ignored if the values accessor returns any NaN for them.

Note also that some methods such as UMAP can accept a distance matrix instead of a data array.

PS: Sorry for spamming your project :) The potential is very exciting.

FASTMAP and UMAP .transform are not reentrant

const data = [[0, 0], [1, 1], [2, 1], [2, 2], [1, 2.5]];
const dr = new druid.FASTMAP(data);
dr.transform(); // [[2.828…, 0.707…], [1.414…, 0.707…]…
dr.transform(); // [[0, 0], [0, 0]…

A similar issue occurs with UMAP, where the second transform returns "meaningless" values.

UMAP metrics?

I was trying to use the Jaccard distance as a metric for UMAP, but it seems that it knows only euclidean_squared and "precomputed". Is this something you'd be willing to add?

// Jaccard distance for binary (0/1) vectors:
// 1 minus the ratio of intersection size to union size.
function jaccard(a, b) {
  let c = 0, n = 0;
  for (let i = 0, l = a.length; i < l; ++i) {
    c += a[i] && b[i]; // counts positions where both entries are 1
    n += a[i] || b[i]; // counts positions where at least one entry is 1
  }
  return 1 - c / n;
}
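As a possible workaround in the meantime, one could precompute the pairwise Jaccard distances and feed them to UMAP via the "precomputed" metric mentioned above; how the metric is passed to the constructor depends on the version, so that part is only assumed in the sketch below:

// Workaround sketch: build an n x n Jaccard distance matrix for "precomputed" mode.
// `vectors` is a hypothetical array of binary (0/1) feature vectors.
const n = vectors.length;
const D = new druid.Matrix(n, n, (i, j) => jaccard(vectors[i], vectors[j]));
// assumed parameter form: new druid.UMAP(D, { metric: "precomputed" }).transform();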

API documentation?

Hello,

I've tried to upgrade my older notebook from 0.3.5 to 0.7.1, and several of the methods break (FASTMAP, PCA, etc.). I'm not sure where to find the new API documentation so that I can fix this. Thanks!

efficient SVD implementation

The current implementation of the singular value decomposition is naive and in most circumstances infeasible to use, since it computes for an input matrix $X$ both the covariance matrix $X^\top X$ and the Gram matrix (inner products) $XX^\top$.
This defeats the purpose of the SVD for a very important performance optimization use case of PCA.

For PCA of datasets with high dimensionality (e.g. d > 1000, or d >> n) it is quite costly to compute the covariance matrix and then compute the eigen decomposition of it. Instead we can compute the eigen decomposition implicitly through the SVD without computing the covariance matrix. Like so: $Q\Lambda Q^{-1} = X^\top X = (USV^\top)^\top (USV^\top) = V S U^\top U S V^\top = VS^2V^\top$. This means that the SVD $X = USV^\top$ suffices to get the eigen decomposition of $X^\top X$, i.e., $Q=V$ (orthonormal eigenvectors), $\Lambda = S^2$ (diagonal matrix of eigenvalues).

But for this to give a performance benefit, the SVD implementation cannot be based on computing the covariance and Gram matrices (since we want to avoid their computation). Instead, an implementation such as the one described by Golub and Reinsch would be needed.

I know that this is not the easiest algorithm to implement, and it seems that there has already been an attempt at implementing it. There is also svd-js, which solely implements this algorithm in JavaScript.
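A rough sketch of the idea above, assuming svd-js's SVD(a), which returns { u, q, v } with q holding the singular values:

import { SVD } from "svd-js";

// Eigen decomposition of X^T X via the SVD of X, without ever forming X^T X.
// X: 2D array of n mean-centered samples with d features (svd-js expects n >= d).
function eigsViaSVD(X) {
  const { v, q } = SVD(X);                 // X = U S V^T, q holds the singular values
  const eigenvalues = q.map((s) => s * s); // Lambda = S^2
  return { eigenvectors: v, eigenvalues }; // columns of V are the eigenvectors of X^T X
}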

I think druidjs would benefit a lot from a decent SVD implementation. But maybe it's also good enough to use another library to compute the SVD when needed. What do you think?

Add types

It would be really nice if type declarations were added so that the package is compatible with TypeScript.

Could not find a declaration file for module '@saehrimnir/druidjs' / module has no exports error

I am trying to incorporate your library into my project. I am using Yarn package manager and I installed the package with yarn add @saehrimnir/druidjs. The installation finished successfully and I can see the package in node_modules. However, upon importing it I get this message:
[screenshot of the error message]

Moreover, after yarn start I get an error in the console like this, even though the modules from druid import are clearly recognized when I try to use them:
[screenshots of the console errors]

License?

Hi,

on npm I see the license listed as BSD, but not in the repo. Could you please clarify?

The docs should state which type of MDS is implemented

Hi there, thank you for this project. I found it in search for a non-metric MDS implementation in JS, which brings me right to my point.

There are many different flavors of MDS around, see e.g. this survey paper, and they are also not always consistently named: {metric, non-metric, ordinal, generalized, weighted, classical, ...} MDS. Therefore it would be nice if the docs stated explicitly which one is used here.

Judging from the code, it seems your implementation is what's often called classical MDS, i.e., the flavor that is not weighted and assumes a Euclidean distance matrix?

Thanks again!

Experimental JavaScript features cause problems with babel or create-react-app

When trying to use DruidJS in a React application that uses (un-ejected) create-react-app, I came across the following error when importing druid using either

import * as druid from '@saehrimnir/druidjs';
or
import * as druid from '@saehrimnir/druidjs/dist/druid.js'; (same with .min)

Failed to compile.

./node_modules/@saehrimnir/druidjs/dist/druid.js
SyntaxError: <censored>\node_modules\@saehrimnir\druidjs\dist\druid.js: Support for the experimental syntax 'classProperties' isn't currently enabled (355:23):

  353 |      * Makes a {@link Matrix} object an iterable object.
  354 |      */
> 355 |     [Symbol.iterator] = this.iterate_rows;
      |                       ^
  356 | 
  357 |     /**
  358 |      * Sets the entries of {@link row}th row from the Matrix to the entries from {@link values}.

Add @babel/plugin-proposal-class-properties (https://git.io/vb4SL) to the 'plugins' section of your Babel config to enable transformation.
If you want to leave it as-is, add @babel/plugin-syntax-class-properties (https://git.io/vb4yQ) to the 'plugins' section to enable parsing.

I found a workaround that allows customizing the babel config without ejecting: https://devinschulz.com/modify-create-react-apps-babel-configuration-without-ejecting/
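For reference, the change essentially amounts to enabling the class-properties plugin (the one named in the error output) in the Babel config; a minimal sketch, where the "react-app" preset is only an assumption about the create-react-app setup:

// babel.config.js — sketch; the "react-app" preset is assumed from the CRA setup
module.exports = {
  presets: ["react-app"],
  plugins: ["@babel/plugin-proposal-class-properties"],
};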

This works for me, but I would prefer druid to 'just work' with React as well, since others might not be able to figure out a solution and might therefore abandon the library. Maybe there is a way to transpile the published code so that experimental features are avoided.
