dreadlord1984 / tsne-cuda

This project forked from cannylab/tsne-cuda

GPU Accelerated t-SNE for CUDA with Python bindings

License: BSD 3-Clause "New" or "Revised" License

tsne-cuda's Introduction

TSNE-CUDA

WARNING: This code is still in active development. While the core code is tested and working, some additional features need further testing.

This repo is an optimized CUDA version of Barnes-Hut t-SNE by L. Van der Maaten with associated python modules. We find that our implementation of t-SNE can be up to 1200x faster than Sklearn, or up to 50x faster than Multicore-TSNE when used with the right GPU. The paper describing our approach, as well as the results below, is available at https://arxiv.org/abs/1807.11824.

To begin, check out our wiki for install instructions and usage: https://github.com/CannyLab/tsne-cuda/wiki/

Benchmarks

Simulated Data

Time taken compared to other state-of-the-art algorithms on synthetic datasets with 50 dimensions and four clusters, for varying numbers of points. Note the log scale on both the points and time axes, and that the x-axis is in thousands of points (thus, the values on the x-axis range from 1K to 10M points). Dashed lines represent projected times. Projected scaling assumes an O(n log(n)) implementation.
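The projected (dashed) times can be approximated with a short extrapolation. The sketch below assumes only what is stated above, namely that run time grows in proportion to n log(n). The function name project_time and the sample timing are hypothetical and are not measurements from the paper.

```python
import math


def project_time(t_known, n_known, n_target):
    """Extrapolate a run time from n_known points to n_target points,
    assuming the cost is proportional to n * log(n)."""
    if n_known <= 1 or n_target <= 1:
        raise ValueError("point counts must be greater than 1")
    scale = (n_target * math.log(n_target)) / (n_known * math.log(n_known))
    return t_known * scale


# Hypothetical example: a run measured at 1M points, projected to 10M points.
measured_seconds = 100.0  # made-up value, for illustration only
projected = project_time(measured_seconds, 1_000_000, 10_000_000)
print(f"projected time at 10M points: {projected:.1f} s")
```

Under this assumption a tenfold increase in points costs slightly more than ten times the run time, because the log(n) factor also grows.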

MNIST

The performance of t-SNE-CUDA compared to other state-of-the-art implementations on the MNIST dataset. t-SNE-CUDA runs on the raw pixels of the MNIST dataset (60000 images x 784 dimensions) in under 7 seconds.

CIFAR

The performance of t-SNE-CUDA compared to other state-of-the-art implementations on the CIFAR-10 dataset. t-SNE-CUDA runs on the raw pixels of the CIFAR-10 training set (50000 images x 1024 dimensions x 3 channels) in under 12 seconds.
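To make the CIFAR-10 input size concrete, the sketch below works out the flattened feature count and an approximate memory footprint. It assumes each image is flattened into a single row and stored as 4-byte single-precision values. Both are assumptions made for illustration, and the helper names are hypothetical.

```python
def flattened_shape(n_images, pixels_per_channel, channels):
    """Shape of the input matrix when every image becomes one row."""
    return (n_images, pixels_per_channel * channels)


def footprint_mb(shape, bytes_per_value=4):
    """Approximate size in megabytes (10^6 bytes).
    bytes_per_value=4 assumes single-precision floats."""
    rows, cols = shape
    return rows * cols * bytes_per_value / 1e6


shape = flattened_shape(50_000, 1024, 3)
print(shape)                # (50000, 3072)
print(footprint_mb(shape))  # 614.4
```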

Comparison of Embedding Quality

The quality of the embeddings produced by t-SNE-CUDA does not differ significantly from that of state-of-the-art implementations. See below for a comparison of MNIST cluster outputs.

Left: MULTICORE-4 (501s), Middle: BH-TSNE (1156s), Right: t-SNE-CUDA (Ours, 6.98s).
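The timings in this caption translate into speedup factors with simple division. The snippet below only does that arithmetic on the three numbers given above; the helper name speedups is hypothetical.

```python
# Wall-clock times in seconds, copied from the caption above.
times = {"MULTICORE-4": 501.0, "BH-TSNE": 1156.0, "t-SNE-CUDA": 6.98}


def speedups(times, baseline="t-SNE-CUDA"):
    """Return how many times slower each entry is than the baseline."""
    base = times[baseline]
    return {name: t / base for name, t in times.items() if name != baseline}


for name, factor in speedups(times).items():
    print(f"t-SNE-CUDA is about {factor:.0f}x faster than {name}")
```

For this MNIST run, that works out to roughly 72x over MULTICORE-4 and roughly 166x over BH-TSNE.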

Installation

To install our library, follow the instructions in the installation section of the wiki.

Run

Like many of the available libraries, the Python wrappers follow the same API as sklearn.manifold.TSNE.

You can run it as follows:

from tsnecuda import TSNE
X_embedded = TSNE(n_components=2, perplexity=15, learning_rate=10).fit_transform(X)

Note that if n_components is 3 or greater, the program uses the naive O(n^2) method by default. If n_components is 2, you can use the heavily optimized Barnes-Hut implementation.
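The rule above can be written down as a small guard around the call. This is a minimal sketch, not part of the library: pick_method and embed_2d are hypothetical helpers, and rejecting n_components below 2 is an assumption, since the text does not cover that case. The tsnecuda import is deferred so the helpers can be defined on a machine without a GPU.

```python
def pick_method(n_components):
    """Name the method the text above describes for a given n_components."""
    if n_components < 2:
        # Assumption: this case is not described above, so the sketch rejects it.
        raise ValueError("n_components must be 2 or greater in this sketch")
    return "barnes-hut" if n_components == 2 else "naive"


def embed_2d(X, perplexity=15, learning_rate=10):
    """Run the optimized 2-D path, using the same call shown above."""
    from tsnecuda import TSNE  # imported lazily; needs a working GPU install
    return TSNE(
        n_components=2, perplexity=perplexity, learning_rate=learning_rate
    ).fit_transform(X)


print(pick_method(2))  # barnes-hut
print(pick_method(3))  # naive
```

Calling embed_2d(X) with an array of shape (n_samples, n_features) should return the 2-D embedding, provided tsnecuda is installed.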

For more information on running the library, or on using it as a C++ library, see the Python usage or C++ usage sections of the wiki.

Future work

  • Allow for double precision
  • Expand FMM methods
  • Add multi-threaded CPU version for those without a GPU

Known Bugs

  • Odd bug with some datasets that causes a hang or GPU memory error.

Citation

Please cite this repository if it was useful for your research:

@misc{cudatsne2018,
  author = {Chan, D. and Rao, R. and Huang, Z.},
  title = {TSNE-CUDA},
  year = {2018},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/CannyLab/tsne-cuda.git}},
}

This library is built on top of the following technology. Without it, none of this would be possible!

L. Van der Maaten's paper

Multicore-TSNE

BHTSNE

CUDA Utilities/Pairwise Distance

LONESTAR-GPU

FAISS

GTest

CXXopts

License

Our code is built using components from FAISS, the Lonestar GPU library, GTest, CXXopts, and OrangeOwl's CUDA utilities. Each of those portions of the code is governed by its respective license; our own code is governed by the BSD-3 license found in LICENSE.txt.

tsne-cuda's People

Contributors

davidmchan, rmrao, huang4fstudio
