saehm / druidjs
A JavaScript Library for Dimensionality Reduction
RangeError: Array buffer allocation failed
This is the error I get when I pass more than 30k points to UMAP. (Perhaps DruidJS should accept and process data in a more compact format?)
When I pass d=3 or d=4, all the dimred methods work except:
A.concat(B, "vertical"): A and B need same number of columns, A has 4 columns, B has 2 columns.
The current implementation of the singular value decomposition is naive and in most circumstances infeasible to use, since it computes for an input matrix
This defeats the purpose of the SVD for a very important performance optimization use case of PCA.
For PCA of datasets with high dimensionality (e.g. d > 1000, or d >> n) it is quite costly to compute the covariance matrix and then compute the eigen decomposition of it. Instead we can compute the eigen decomposition implicitly through the SVD without computing the covariance matrix. Like so:
But for this to give a performance benefit, the SVD implementation cannot be based on computing the covariance and Gram matrices (since we want to avoid their computation). Instead, an implementation such as the one described by Golub and Reinsch would be needed.
I know that this is not the easiest algorithm to implement, and it seems there has already been an attempt at implementing it. There is also svd-js, which implements only this algorithm in JavaScript.
I think druidjs would benefit a lot from a decent SVD implementation. But maybe it's also good enough to use another library to compute the SVD when needed. What do you think?
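To illustrate the covariance-free idea only, here is a minimal sketch that finds the leading principal direction by power iteration. This is a swapped-in, simpler technique, not the Golub and Reinsch SVD and not DruidJS code; the function name `topPrincipalComponent`, the iteration count, and the start vector are all assumptions. Each step multiplies by the centred data matrix and its transpose in turn, so the d x d covariance matrix is never formed.

```javascript
// Leading principal direction of X (rows = samples) by power iteration.
// Each step computes v <- Xc^T (Xc v), where Xc is the mean-centred data,
// so the cost per iteration is O(n * d) and no d x d matrix is allocated.
function topPrincipalComponent(X, iterations = 200) {
  const n = X.length;
  const d = X[0].length;

  // Column means, used to centre the rows on the fly.
  const mean = new Float64Array(d);
  for (const row of X) {
    for (let j = 0; j < d; ++j) mean[j] += row[j] / n;
  }

  // Deterministic start vector (arbitrary choice for this sketch).
  let v = Float64Array.from({ length: d }, (_, j) => 1 / (j + 1));

  for (let it = 0; it < iterations; ++it) {
    const next = new Float64Array(d);
    for (const row of X) {
      let s = 0; // s = (row - mean) . v
      for (let j = 0; j < d; ++j) s += (row[j] - mean[j]) * v[j];
      for (let j = 0; j < d; ++j) next[j] += s * (row[j] - mean[j]);
    }
    let norm = 0;
    for (let j = 0; j < d; ++j) norm += next[j] * next[j];
    norm = Math.sqrt(norm);
    if (norm === 0) break; // no variance along the current direction
    v = next.map((x) => x / norm);
  }
  return v;
}

// Points on the diagonal: the leading direction should be along (1, 1).
const direction = topPrincipalComponent([[0, 0], [1, 1], [2, 2], [3, 3]]);
```

This recovers only one component and converges slowly when the top two eigenvalues are close, which is part of why a proper SVD routine is still the better fit for the request above.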
I was trying to use the Jaccard distance as a metric for UMAP, but it seems that it knows only euclidian_squared and "precomputed". Is this something you'd be willing to add?
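// Jaccard distance between two binary (0/1 or boolean) vectors of equal length.
function jaccard(a, b) {
  let c = 0, n = 0; // c: size of the intersection, n: size of the union
  for (let i = 0, l = a.length; i < l; ++i) {
    c += a[i] && b[i] ? 1 : 0;
    n += a[i] || b[i] ? 1 : 0;
  }
  return n === 0 ? 0 : 1 - c / n; // two empty sets are treated as identical
}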
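Until a metric option exists, one route is the "precomputed" mode mentioned above: build the distance matrix yourself and hand that over. The sketch below shows only the matrix construction; `pairwiseDistances` and `jaccardDistance` are hypothetical helper names, and the exact call that passes the matrix to UMAP is deliberately left out because it is not shown in this thread.

```javascript
// Build a symmetric n x n distance matrix from any metric(a, b) function.
function pairwiseDistances(rows, metric) {
  const n = rows.length;
  const D = Array.from({ length: n }, () => new Float64Array(n));
  for (let i = 0; i < n; ++i) {
    for (let j = i + 1; j < n; ++j) {
      // The metric is assumed symmetric, so each pair is evaluated once.
      D[i][j] = D[j][i] = metric(rows[i], rows[j]);
    }
  }
  return D;
}

// Jaccard distance for binary vectors (hypothetical helper for this sketch).
function jaccardDistance(a, b) {
  let both = 0, either = 0;
  for (let i = 0; i < a.length; ++i) {
    if (a[i] && b[i]) both++;
    if (a[i] || b[i]) either++;
  }
  return either === 0 ? 0 : 1 - both / either;
}

const sets = [[1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 1]];
const D = pairwiseDistances(sets, jaccardDistance);
```

Note that the matrix grows quadratically with the number of points, so this is only practical for modest dataset sizes.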
Hi,
On npm the license is listed as BSD, but not in the repo. Could you please clarify?
It would be really nice if type declarations were added so that the package is compatible with TypeScript.
I notice in your conference talk, you present some examples in Observable.
It would be useful to link to them from the README here.
Great work!
I'm not sure if self-organizing maps are in the scope of this tool, but the algorithm is simple enough to consider. See https://observablehq.com/@fil/som-delaunay for an example. (I might do a PR at some point if you're open to the idea.)
If I understand correctly, the current approach learns and transforms at the same time. As a consequence you can't learn on a subset (train set), then transform the whole dataset. It would be nice to be able to train a model then apply it to incoming data.
(It's something that is very easy with some algorithms (PCA), but more difficult with others.)
I was going to ask about Sammon Mapping and just saw this JS implementation https://observablehq.com/@jonnydedwards/animated-sammon-map
static parameter_list = [];
breaks on Safari
See https://github.com/tc39/proposal-static-class-features
We can either wait for Safari to catch up, or use normal properties instead.
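The "normal properties" option could look like either of the two forms below. This is a sketch with toy classes: `DR` mirrors the name used elsewhere in these issues, `UMAPLike` is hypothetical, and "min_dist" is borrowed from the UMAP hyperparameter mentioned in another issue purely as a placeholder value.

```javascript
// Form 1: assign the property after the class body (plain ES2015).
class DR {}
DR.parameter_list = [];

// Form 2: a static getter, which is also ES2015 and needs no class-fields support.
class UMAPLike extends DR {
  static get parameter_list() {
    return ["min_dist"]; // placeholder value for illustration
  }
}
```

Both forms keep `SomeClass.parameter_list` readable exactly as before; the getter returns a fresh array on every access, while the assignment shares a single array.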
Using new DR(data, ...P, 3) with FASTMAP results in a planar diagram, with transformed[0] === transformed[2]
https://observablehq.com/d/f0ac59b22d266632
(I added plotly to render the maps in 3D)
I am trying to incorporate your library into my project. I am using the Yarn package manager and installed the package with yarn add @saehrimnir/druidjs. The installation finished successfully and I can see the package in node_modules. However, upon importing it I get this message:
Moreover, after yarn start I get an error in the console like this, even though the modules from the druid import are clearly recognized when I try to use them:
With the current API, if one wants to project in d=3, one has to know the exact number n of optional arguments before specifying 3 as the (n+1)th argument. This feels a bit awkward, and it means that we can't add a supplementary hyperparameter to any method without it being a breaking change.
It seems to me that it would be nice to rethink the API "à la D3", so that:
I would imagine that this could be structured as:
And for each hyperparameter, for example UMAP/min_dist
With this we could say for example:
const dr = new Druid("LDA"); // dr
dr.dimensions(2).class(d => d.species).values(d => [+d.sepal_length, +d.petal_length, …]).fit(data); // dr
dr.transform(); // transformed data
const model = dr.model(); // JSON {}
…
const dr = new Druid(model); // dr
dr.transform([new data]); // apply the model to new data…
I wonder what should be done for NaN; I suppose rows should be automatically ignored if the values accessor returns any NaN.
Note also that some methods such as UMAP can accept a distance matrix instead of a data array.
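For readers unfamiliar with the D3 convention, the getter/setter pattern behind such an API can be sketched as follows. Everything here is hypothetical (`reducer`, its methods, and the stand-in `transform` that merely truncates coordinates); it shows only how named, chainable hyperparameters avoid positional arguments, not how any DruidJS method works.

```javascript
// Hypothetical D3-style builder; none of these names exist in DruidJS.
function reducer() {
  let dimensions = 2;
  let values = (d) => d;

  const api = {
    // Getter when called without arguments, chainable setter otherwise.
    dimensions(_) {
      return arguments.length ? ((dimensions = +_), api) : dimensions;
    },
    values(_) {
      return arguments.length ? ((values = _), api) : values;
    },
    // Stand-in for a real projection: keeps the first `dimensions` coordinates.
    transform(data) {
      return data.map((d) => values(d).slice(0, dimensions));
    },
  };
  return api;
}

const dr = reducer()
  .dimensions(3)
  .values((d) => [d.a, d.b, d.c, d.e]);
const out = dr.transform([{ a: 1, b: 2, c: 3, e: 4 }]);
```

Because each hyperparameter has its own named method, adding a new one later does not shift the position of any existing argument.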
PS: Sorry for spamming your project :) The potential is very exciting.
It would be great to link to the conference talk in the README.
const data = [[0, 0], [1, 1], [2, 1], [2, 2], [1, 2.5]];
const dr = new druid.FASTMAP(data);
dr.transform(); // [[2.828…, 0.707…], [1.414…, 0.707…]…
dr.transform(); // [[0, 0], [0, 0]…
There is a similar issue with UMAP, where the second transform returns "meaningless" values.
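As a user-side stopgap, the first projection can be cached so that repeated calls return the same result. This is a sketch under the assumption that the first call is the correct one; `cacheTransform` is a hypothetical wrapper and the `flaky` object is a toy stand-in, not a DruidJS class.

```javascript
// Wraps any object with a transform() method so the projection is computed once.
function cacheTransform(dr) {
  let cached = null;
  return {
    transform() {
      if (cached === null) cached = dr.transform();
      return cached; // same array every time; copy it before mutating
    },
  };
}

// Toy stand-in for an object whose second transform() call misbehaves.
let calls = 0;
const flaky = { transform: () => (calls++ === 0 ? [[1, 2]] : [[0, 0]]) };

const stable = cacheTransform(flaky);
const first = stable.transform();
const second = stable.transform();
```

This hides the symptom rather than fixing the cause, which presumably lies in state that the first call leaves behind.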
When trying to use DruidJS in a React application that uses (un-ejected) create-react-app, I came across the following error when importing druid using either
import * as druid from '@saehrimnir/druidjs';
or
import * as druid from '@saehrimnir/druidjs/dist/druid.js';
(same with .min)
Failed to compile.
./node_modules/@saehrimnir/druidjs/dist/druid.js
SyntaxError: <censored>\node_modules\@saehrimnir\druidjs\dist\druid.js: Support for the experimental syntax 'classProperties' isn't currently enabled (355:23):
353 | * Makes a {@link Matrix} object an iterable object.
354 | */
> 355 | [Symbol.iterator] = this.iterate_rows;
| ^
356 |
357 | /**
358 | * Sets the entries of {@link row}th row from the Matrix to the entries from {@link values}.
Add @babel/plugin-proposal-class-properties (https://git.io/vb4SL) to the 'plugins' section of your Babel config to enable transformation.
If you want to leave it as-is, add @babel/plugin-syntax-class-properties (https://git.io/vb4yQ) to the 'plugins' section to enable parsing.
I found a workaround that allows customizing the babel config without ejecting: https://devinschulz.com/modify-create-react-apps-babel-configuration-without-ejecting/
This works for me, but I would prefer druid to 'just work' also with React, since others might not be able to figure out a solution and might therefore abandon the library. Maybe there is a way to transpile the code, such that experimental features are avoided.
Hello,
I've tried to upgrade my older notebook from 0.3.5 to 0.7.1, and several of the methods break (FASTMAP, PCA, etc). Where can I find the new API documentation so I can fix this? Thanks!
I took UMAP as an example, but note that many methods crash or return garbage on small datasets. We could probably limit the number of neighbors to n-1, etc?
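The n-1 limit could be applied with a small guard like the one below. `clampNeighbors` is a hypothetical helper, not part of DruidJS, and where such a guard would be called inside each method is left open.

```javascript
// Clamp a requested neighbour count to what a dataset of n points can supply.
function clampNeighbors(requested, n) {
  if (n < 2) {
    throw new RangeError("need at least two points to define neighbours");
  }
  // Each point has at most n - 1 other points, and at least one is required.
  return Math.max(1, Math.min(Math.floor(requested), n - 1));
}
```

For example, a request for 15 neighbours on a 5-point dataset would be reduced to 4, while a dataset with a single point is rejected outright instead of producing garbage.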
Hi there, thank you for this project. I found it while searching for a non-metric MDS implementation in JS, which brings me right to my point.
There are many different flavors of MDS around, see e.g. this survey paper, and they are also not always consistently named: {metric, non-metric, ordinal, generalized, weighted, classical, ...} MDS. Therefore it would be nice if the docs stated explicitly which one is used here.
Judging from the code, it seems your implementation is what's often called classical MDS, i.e., the flavor which is not weighted and assumes a Euclidean distance matrix?
Thanks again!
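For reference, the step that characterizes classical (Torgerson) MDS is double centring of the squared distance matrix; the embedding is then read off the leading eigenvectors of the result, scaled by the square roots of their eigenvalues. The sketch below shows only the centring step. `doubleCenter` is a hypothetical name, and whether DruidJS follows exactly this procedure is the open question raised above, not something this example confirms.

```javascript
// Double centring: B = -1/2 * J * D2 * J, where D2 holds squared distances
// and J = I - (1/n) * 1 1^T. For Euclidean input, B is the Gram matrix of
// the mean-centred points.
function doubleCenter(D2) {
  const n = D2.length;
  const rowMean = D2.map((row) => row.reduce((a, b) => a + b, 0) / n);
  const colMean = D2[0].map((_, j) => D2.reduce((a, row) => a + row[j], 0) / n);
  const grandMean = rowMean.reduce((a, b) => a + b, 0) / n;
  return D2.map((row, i) =>
    row.map((d, j) => -0.5 * (d - rowMean[i] - colMean[j] + grandMean))
  );
}

// Squared distances between the 1-D points 0 and 2 (centred: -1 and 1).
const B = doubleCenter([[0, 4], [4, 0]]);
```

Non-metric (ordinal) MDS does not use this step at all; it iteratively minimizes a stress function on the ranks of the dissimilarities, which is why the distinction matters for the docs.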
TopoMap: A 0-dimensional Homology Preserving Projection of High-Dimensional Data
Harish Doraiswamy, Julien Tierny, Paulo J. S. Silva, Luis Gustavo Nonato, Claudio Silva
See https://arxiv.org/abs/2009.01512 [Submitted on 3 Sep 2020]
The current reference implementation is in C++, but the code is really short: https://github.com/harishd10/TopoMap/blob/master/cpp/TopoMap.cpp