sonjageorgievska / embed-dive

A tool that performs 3D embedding of data and provides interactive visualization.

Home Page: https://sonjageorgievska.github.io/DiVE/

License: Other

Python 10.01% CSS 19.44% HTML 9.40% JavaScript 61.15%
embedding interactive-visualization dimensionality-reduction manifold-learning

embed-dive's Introduction

Please cite the software if you are using it in your scientific publication.

DOI

Please also cite the paper on LargeVis as stated here.

Embed-Dive

Embed-Dive is a pipeline consisting of LargeVis and DiVE. Please visit the respective repositories for more information, individual licensing, latest versions, etc.

The purpose of this repository is to provide a release of the pipeline for Linux users (for the moment) who want to explore their data, provided that the input is a graph of similarities.

Screenshot

Usage - for Linux users

Download the compiled version, Embed_Dive_linux, from the releases page (https://github.com/sonjageorgievska/Embed-Dive/releases) into the same folder where you have downloaded Embed-Dive.

For a quick test:

./Embed_Dive_linux -input test/sim.txt -output test/coord.txt -outdim 3 -samples 1

When the run finishes, your browser should open the visualization.

Required parameters:

  • -input: Input file of similarities (see the test folder for input format).
  • -output: Output file of low-dimensional representations.

Besides these two required parameters, the optional parameters are:

  • -threads: Number of threads. Default is 8. Recommended: cores x 2.

  • -outdim: The lower dimensionality LargeVis learns for visualization (2 or 3). Default is 2.

  • -samples: Number of edge samples for graph layout (in millions). Default is set to data size / 100 (million).

  • -prop: Number of neighbor propagation iterations in the K-NNG construction stage, usually less than 3. Default is 3.

  • -alpha: Initial learning rate. Default is 1.0.

  • -trees: Number of random-projection trees used for constructing the K-NNG. 50 is sufficient for most cases unless you are dealing with very large datasets (e.g. data size over 5 million), and fewer trees are suitable for smaller datasets. Default is set according to the data size.

  • -neg: Number of negative samples used for negative sampling. Default is 5.

  • -neigh: Number of neighbors (K) in the K-NNG, which is usually set to three times the perplexity. Default is 150.

  • -gamma: The weights assigned to negative edges. Default is 7.

  • -perp: The perplexity used for deciding edge weights in K-NNG. Default is 50.

  • -metaData: File containing meta information about the data. Format of each line: [id] [metadata]. Format of the metadata: "first_line" "second_line" "third_line" (the number of lines is not limited). Example line: 35 "A dog" "Age:2" "Color brown".

  • -dir: Base directory in which to store the output file.

  • -divedir: Directory where DiVE resides.

  • -np: A JSON file containing a list of property names. Example: ["Height", "Weight", "Place of birth"].

  • -json: Name of the JSON file that is the input to DiVE. It can be uploaded from DiVE at any later time.
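The -metaData and -np file formats described above can be produced with a few lines of Python. This is a minimal sketch, not part of Embed-Dive: the function names and any file paths you pass in are hypothetical, and it assumes the metadata text itself contains no double quotes, since the README does not say how embedded quotes should be handled.

```python
import json
from pathlib import Path


def format_metadata_line(item_id, lines):
    """Return one metadata row: the id followed by each text line in double quotes."""
    quoted = " ".join(f'"{text}"' for text in lines)
    return f"{item_id} {quoted}"


def write_metadata(path, records):
    """Write one row per item; `records` maps an id to its list of metadata lines."""
    rows = [format_metadata_line(item_id, lines) for item_id, lines in records.items()]
    Path(path).write_text("\n".join(rows) + "\n", encoding="utf-8")


def write_property_names(path, names):
    """Write the list of property names as a JSON array, as expected by -np."""
    Path(path).write_text(json.dumps(names), encoding="utf-8")


# Reproduces the example row given for -metaData.
print(format_metadata_line(35, ["A dog", "Age:2", "Color brown"]))
```

The resulting files would then be passed to the pipeline through -metaData and -np respectively.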

Best practices:

  • Do not use the -samples parameter; let the embedding use the default value. Use -samples 1 only if you want quick results.
  • Use -outdim 2 unless the visualization results are very different from those with -outdim 3.
  • The similarities graph does not need to contain duplicate edges (it does not need to be bi-directional).
  • Use the -json parameter so that you can easily load the coordinates in DiVE at any later stage. Computing the coordinates with LargeVis takes a lot of time, so it should only be done once.
  • If you forgot to include metadata before computing the coordinates, you can combine the coordinates with metadata later, as described in the DiVE documentation ("From output of LargeVis to input of DiVE").
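As an illustration of these practices, the sketch below assembles a command line that omits -samples unless a quick run is requested, sets -threads to twice the CPU count, keeps -neigh at three times the perplexity, and always passes -json. The helper `build_command` and the file name `dive_input.json` are hypothetical, and the sketch assumes every flag takes a single value, as in the quick-test command above.

```python
import os


def build_command(input_path, output_path, json_name,
                  outdim=2, perp=50, quick=False,
                  binary="./Embed_Dive_linux"):
    """Build the argument list for one Embed-Dive run (nothing is executed here)."""
    # os.cpu_count() reports logical CPUs, so this only approximates "cores x 2".
    cpus = os.cpu_count() or 4
    cmd = [
        binary,
        "-input", input_path,
        "-output", output_path,
        "-outdim", str(outdim),
        "-threads", str(cpus * 2),
        "-perp", str(perp),
        "-neigh", str(3 * perp),  # K is usually three times the perplexity
        "-json", json_name,       # keep the DiVE input so it can be reloaded later
    ]
    if quick:
        # Only for quick results; otherwise the default sample count is used.
        cmd += ["-samples", "1"]
    return cmd


cmd = build_command("test/sim.txt", "test/coord.txt", "dive_input.json")
print(" ".join(cmd))
```

To launch the run, the list can be handed to `subprocess.run(cmd, check=True)` from the folder that contains the binary.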

Please send any suggestions for improvement to the author.

Licence

The software is released under the Creative Commons Attribution-NoDerivatives licence. Contact the author if you would like a version with an Apache licence.
