
tsne-viz's Introduction

t-SNE Visualization

This repository is an easy-to-run t-SNE visualization tool for your dataset of choice. It currently supports 2D and 3D plots as well as an optional original image overlay on top of the 2D points.


Installation

Ubuntu Installation

First clone this repository, then install the TkInter package by running:

sudo apt-get install python3-tk

Optionally create a virtualenv for this project:

cd tsne-viz
virtualenv -p python3 venv
source venv/bin/activate

Then install the Python 3 dependencies:

cd tsne-viz
pip install -r requirements.txt

Usage

Example Command

python main.py --num_samples=5000 --num_dimensions=2 --compute_embeddings=False --with_images=False

This will produce a 2D t-SNE plot with no image overlay. Note that the example code uses the Fashion-MNIST dataset, which you can download by running:

chmod +x download_data.sh
./download_data.sh

You'll only need to modify the load_data method if you're planning on using your own dataset. Make sure it returns a set of numpy arrays: for example, if embedding grayscale images, you'll want to return an array of images and their associated labels with the following shapes:

X: (100, 32, 32)
y: (100,)
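
As a rough illustration of that return contract, here is a minimal sketch of a custom loader. The function signature, its parameters, and the random placeholder data are all assumptions made for this example; the repository's actual load_data may look different. Only the returned shapes follow the description above.

```python
import numpy as np


def load_data(num_images=100, height=32, width=32, num_classes=10, seed=0):
    """Hypothetical stand-in for a custom load_data.

    Returns random grayscale "images" and integer labels. Replace the
    random arrays with code that reads your own dataset.
    """
    rng = np.random.default_rng(seed)
    # X: one 2D grayscale image per sample, shape (N, H, W)
    X = rng.random((num_images, height, width), dtype=np.float32)
    # y: one integer class label per sample, shape (N,)
    y = rng.integers(0, num_classes, size=num_images)
    return X, y


X, y = load_data()
print("X:", X.shape)
print("y:", y.shape)
```

The first axis of X and y must have the same length, since each label belongs to the image at the same index.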

To see all possible command line options, run

python main.py --help

which will print:

usage: main.py [-h] [--num_samples NUM_SAMPLES]
               [--num_dimensions NUM_DIMENSIONS] [--shuffle SHUFFLE]
               [--compute_embeddings COMPUTE_EMBEDDINGS]
               [--with_images WITH_IMAGES] [--random_seed RANDOM_SEED]
               [--data_dir DATA_DIR] [--plot_dir PLOT_DIR]

t-SNE Visualizer

optional arguments:
  -h, --help            show this help message and exit

Setup:
  --num_samples NUM_SAMPLES
                        # of samples to compute embeddings on. Becomes slow if
                        very high.
  --num_dimensions NUM_DIMENSIONS
                        # of tsne dimensions. Can be 2 or 3.
  --shuffle SHUFFLE     Whether to shuffle the data before embedding.
  --compute_embeddings COMPUTE_EMBEDDINGS
                        Whether to compute embeddings. Do this once per sample
                        size.
  --with_images WITH_IMAGES
                        Whether to overlay images on data points. Only works
                        with 2D plots.
  --random_seed RANDOM_SEED
                        Seed to ensure reproducibility

Path Params:
  --data_dir DATA_DIR   Directory where data is stored
  --plot_dir PLOT_DIR   Directory where plots are saved
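
Flags such as --with_images take an explicit True or False value. The sketch below shows one way such options could be declared with argparse. The str2bool helper, the default values, and the subset of options shown are assumptions made for illustration, not necessarily how main.py defines them.

```python
import argparse


def str2bool(value):
    """Hypothetical helper: turn 'True'/'False'-style strings into a bool."""
    lowered = value.lower()
    if lowered in ("true", "t", "yes", "1"):
        return True
    if lowered in ("false", "f", "no", "0"):
        return False
    raise argparse.ArgumentTypeError("expected a boolean, got %r" % value)


parser = argparse.ArgumentParser(description="t-SNE Visualizer")
# Defaults below are placeholders chosen for this sketch.
parser.add_argument("--num_samples", type=int, default=5000)
parser.add_argument("--num_dimensions", type=int, default=2, choices=[2, 3])
parser.add_argument("--with_images", type=str2bool, default=False)

# Parse an explicit argument list instead of sys.argv so the sketch is self-contained.
config = parser.parse_args(
    ["--num_samples=5000", "--num_dimensions=2", "--with_images=False"]
)
print(config.num_samples, config.num_dimensions, config.with_images)
```

A plain type=bool would treat any non-empty string, including "False", as True, which is why a converter function is used here.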

Image Overlay

The overlay option only works for 2D plots and relies on matplotlib's AnnotationBbox class. Here's an example of what it outputs:

[Image: 2D t-SNE plot with the original images overlaid on the data points]
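
The overlay technique can be sketched roughly as follows. This is not the repository's plotting code: the random points and images are placeholders, and the zoom value and variable names are assumptions. It only shows how OffsetImage and AnnotationBbox from matplotlib.offsetbox combine to place a thumbnail at each 2D point.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.offsetbox import AnnotationBbox, OffsetImage

rng = np.random.default_rng(0)
points = rng.normal(size=(20, 2))   # placeholder for a 2D t-SNE embedding
images = rng.random((20, 32, 32))   # placeholder grayscale images, shape (N, H, W)

fig, ax = plt.subplots()
# Draw the points first so the axis limits cover the whole embedding.
ax.scatter(points[:, 0], points[:, 1], s=1)

boxes = []
for (x, y), img in zip(points, images):
    thumb = OffsetImage(img, zoom=0.5, cmap="gray")
    # Anchor the thumbnail at the data coordinates of its embedded point.
    box = AnnotationBbox(thumb, (x, y), frameon=False)
    ax.add_artist(box)
    boxes.append(box)

fig.canvas.draw()  # render once to confirm the figure composes without errors
```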

tsne-viz's People

Contributors

danielsnider, kevinzakka


tsne-viz's Issues

Visualization for the custom dataset

@danielsnider @kevinzakka Thanks for sharing the code. I have a few questions:

  1. Can this code be used to visualize a custom dataset?
  2. What changes to the code are needed for custom data?
  3. The custom dataset consists only of images. Will it work for that too?

Thanks in advance.

Image dimension for plotting

Hello, first let me congratulate you on excellent code. It's well documented and well written. All the other plotting tools that I tried were not good enough to understand the data; this one is very clear.
Anyway, to the point. I tried it with my own dataset. I'm able to plot in 3D and 2D, as well as compute the embeddings, but I cannot overlay the images. I get the following error:

Traceback (most recent call last):
  File "main.py", line 134, in <module>
    main(config)
  File "main.py", line 115, in main
    im = OffsetImage(X_sample[i], zoom=0.1, cmap='gray')
  File "/usr/local/lib/python3.5/dist-packages/matplotlib/offsetbox.py", line 1300, in __init__
    self.set_data(arr)
  File "/usr/local/lib/python3.5/dist-packages/matplotlib/offsetbox.py", line 1304, in set_data
    self.image.set_data(self._data)
  File "/usr/local/lib/python3.5/dist-packages/matplotlib/image.py", line 653, in set_data
    raise TypeError("Invalid dimensions for image data")
TypeError: Invalid dimensions for image data

My X_sample is of shape (5000, 3, 32, 32)
where 5000 is the data size
X_sample[i].shape is (3, 32, 32)

What is the correct dimension?

Thank you for your time and code
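
A likely cause, assuming the (3, 32, 32) arrays are channels-first RGB images: matplotlib accepts image data shaped (H, W), (H, W, 3), or (H, W, 4), so the channel axis has to come last. The sketch below is a possible fix, not a confirmed answer from the maintainers; it uses a small zero-filled placeholder array instead of the real 5000-sample dataset.

```python
import numpy as np

# Placeholder with the same layout as the reported data: (N, C, H, W).
# Only 8 samples are used here to keep the sketch light.
X_sample = np.zeros((8, 3, 32, 32), dtype=np.float32)

# Move the channel axis to the end: (N, C, H, W) -> (N, H, W, C).
X_hwc = np.transpose(X_sample, (0, 2, 3, 1))

print(X_hwc.shape)     # per-dataset shape after the transpose
print(X_hwc[0].shape)  # a single image is now channels-last
```

With RGB input the cmap='gray' argument has no effect, since colormaps only apply to 2D (single-channel) image data.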
