GithubHelp home page GithubHelp logo

jaidevshriram / finch-clustering Goto Github PK

View Code? Open in Web Editor NEW

This project forked from ssarfraz/finch-clustering

0.0 0.0 0.0 111 MB

Source Code for FINCH Clustering Algorithm

License: Other

Python 32.54% MATLAB 67.46%

finch-clustering's Introduction

First Integer Neighbor Clustering Hierarchy (FINCH) Algorithm

The repository contains our Python and Matlab code for the proposed FINCH clustering algorithm described in our Efficient Parameter-free Clustering Using First Neighbor Relations CVPR 2019 oral paper.

@inproceedings{finch,
    author    = {M. Saquib Sarfraz and Vivek Sharma and Rainer Stiefelhagen}, 
    title     = {Efficient Parameter-free Clustering Using First Neighbor Relations}, 
    booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
    pages = {8934--8943}
    year  = {2019}
}
Requirements

Python : numpy, scipy, sklearn

Matlab : 2017 or above: may run on earlier versions as well

Optional. Install PyNNDescent to get first neighbours for large data

Usage:

Python : typically you would run:

from finch import FINCH
c, num_clust, req_c = FINCH(data)

You can set options e.g., required number of cluster or distance etc,

c, num_clust, req_c = FINCH(data, initial_rank=None, req_clust=None, distance='cosine', verbose=True)

See also readme in python repo.

Matlab : Please go in the path where you copied this folder or add its path to your Matlab path.

[c, num_clust]= FINCH(data, initial_rank, verbose);

Input:

  • data: data Matrix (feature vecotrs in rows)
  • initial_rank [Optional]: Nx1 (1-neighbour) indices vector. Pass empty [] to compute the 1st neighbor via pdist or flann
  • verbos : printing some output

Output:

  • c: N x P matrix Each column vector contains cluster labels for each partition P
  • num_clust: shows total number of cluster in each partition P

In Matlab typically you would run:

[c, num_clust]= FINCH(data,[], 1);

Example: Cluster the STL-10 data (13000 images of 10 object classes. We provide the used 2048 CNN resnet features. Please load the data in Matlab from /data/STL_10/data.mat. This has 13000 vectors stored as a matrix of size (13000,2048), each vector is 2048 dimensional.

Now cluster it using FINCH, run the above command with tic toc to see the runtime. The run time includes computing first neighbours via exact distance and every thing.

alt text

On our machine it was < 3 seconds, it should be about the same depending upon if your machine has similar specs.

it returns the cluster labels for each partition in the variable c which will be of size (N x numPartitions) e.g., (13000, 5) in this case. Each column in array c provides cluster labels for that partition.

num_clust provides how many cluster it has produced in each partition or step of the run.

As you see: num_clust = [2061, 177, 37, 10, 2]

inidicating it found 2061 clusters in step 1, 177 in step 2 and so on to 10 clusters in step 4. You can pick the respective cluster labels for the data in the returned array c. For example, c(:,4) will provide labels for 10 clustering result.

[Evaluation]: The true labels for this data are also provided in the same repo in label.mat file. You can run any performance metric e.g., to compute NMI metric use nmi.mat function (provided in utils) and run nmi(labels, c(:,4)), or compute BCubed Fscore with b3(labels, c(:,4)).

Similarly you can run FINCH on other datasets e.g., we provide the used mnist10k and mice protein data.

Required number of Clusters

We provide a very simple approximation to refine one of the FINCH partition to come down to required number of clusters. However, please note this is not recommended and provided here just for completeness. One could use better ways to refine a partition, if needed.

Python: Set it in the req_clust input option.

Matlab: req_c = req_numclust(c, data,req_clust)

Example:

req_c= req_numclust(c(:,3), data, 15)

Provides the required 15 clusters by refining the respective FINCH partition (37 clusters), from FINCH returned c mat in the above example. See function help for more details.

Note:

To run it on large data (for which computing pairwise distance matrix won't fit in your memory), install pynndescent (pip install pynndescent) or flann library for matlab (In matlab, see flann_nn.m). Note that you can also change when to switch to using Approximate NN in python [finch.py] line 17 or in the matlab function [clustRank.m] line 10.

Finally, if you use FINCH on 2D toy data (for visualization) then use euclidean distance. In python [finch.py] as input option or in matlab, the file [clustRank.m] line 11. as here you have xy coordinates of each point to cluster.

The code and FINCH algorithm is not meant for commercial use. Please contact the author below for licensing information.

M. Saquib Sarfraz ([email protected])

finch-clustering's People

Contributors

ssarfraz avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.