GithubHelp home page GithubHelp logo

awc's Introduction

awc

Description

Python implementation of Adaptive Weights Clustering algorithm.

AWC is a novel non-parametric clustering technique based on adaptive weights. Weights are recovered using an iterative procedure based on statistical test of "no gap". Method does not require specifying number of clusters. It applies equally well to clusters of convex structure and different density. The procedure is numerically feasible and applicable for large high dimensional datasets. The paper with detailed description of the algorithm is available on arXiv: Adaptive Nonparametric Clustering.

Installation

Build the latest development version from source:

git clone https://github.com/larisahax/awc.git
cd awc
python setup.py install

Requirements

python 2.7

numpy
scipy
matplotlib

Usage

AWC class provides the API for performing clustering.

AWC_object = AWC(n_neigh=-1, effective_dim=-1, n_outliers=0, discrete=False, speed=1.5)
    n_neigh : int, optional, default: -1
        The number of closest neighbors to be connected on the initialization step.
        If not specified, n_neigh = max(6, min(2 * effective_dim + 2, 0.1 * n_samples))

    effective_dim : int, optional, default: -1
        Effective dimension of data X. 
        If not specified, effective_dim = true dimension of the data, 
        in case the true dimension is less than 7, otherwise 2.

    n_outliers : int, optional, default: 0
        Minimum number of points each cluster must contain.
        Points from clusters with smaller size will be connected to the closest cluster.

    discrete : boolean, optional, default: False
        Specifies if data X consists of only discrete values.

    speed : int, optional, default: 1.5
        Controls the number of iterations.
        Increase of the speed parameter decreases the number of steps.
AWC_object.awc(l, X, dmatrix=None)
    l : int
        The lambda parameter.
    
    X : array, shape (n_samples, n_features), optional, default: None 
        Input data. 
        awc works with distance matrix. 
        User must specify X or dmatrix.
        From X Euclidean distance matrix is computed and passed to awc.
        No need to specify X, if dmatrix is specified. 
    
    dmatrix : array, shape (n_samples, n_n_samples), optional, default: None
        Distance matrix.

Cluster structure found by AWC

clusters = AWC_object.get_clusters()

Cluster labels

labels = AWC_object.get_labels()

To tune the parameter \lambda, plot sum of the weights for \lambda 's from some interval and take a value at the end of plateau or before a huge jump.

AWC_object.plot_sum_of_weights(lambda_interval, X=None, dmatrix=None)
  lambda_interval : list
        Lambda parameters for which sum of weights will be computed.
        
    X : array, shape (n_samples, n_features), optional, default: None 
        Input data. 
        No need to specify X, if dmatrix is specified. 
    
    dmatrix : array, shape (n_samples, n_n_samples), optional, default: None
        Distance matrix. If not specified, the Euclidean distance matrix is used from X.  

Example

See example.py for an example running awc on iris dataset.

awc's People

Contributors

larisahax avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.