GithubHelp home page GithubHelp logo

rakesh4g / fastpair Goto Github PK

View Code? Open in Web Editor NEW

This project forked from carsonfarmer/fastpair

0.0 1.0 0.0 27 KB

FastPair: Data-structure for the dynamic closest-pair problem.

License: MIT License

Python 100.00%

fastpair's Introduction

FastPair

Data-structure for the dynamic closest-pair problem.

Overview

This project is an implementation of the FastPair dynamic closest-pair data-structure described in David Eppstein's Fast Hierarchical Clustering and Other Applications of Dynamic Closest Pairs. The data-structure is based on the observation that the conga line data- structure, in practice, does better the more subsets you give to it: even though the worst case time for $k$ subsets is $O(nk\log{(n/k)})$, that worst case seems much harder to reach than the nearest neighbor algorithm.

In the limit of arbitrarily many subsets, each new addition or point moved by a deletion will be in a singleton subset, and the algorithm will differ from nearest neighbors in only a couple of ways: (1) when we create the initial data structure, we use a conga line rather than all nearest neighbors, to keep the in-degree of each point low, and (2) when we insert a point, we don't bother updating other points' neighbors.

Total space: $20n$ bytes (could be reduced to 4n at some cost in update time). Time per insertion or single distance update: $O(n)$. Time per deletion or point update: $O(n)$ expected, $O(n^2)$ worst case. Time per closest pair: $O(n)$.

This Python version of the algorithm combines ideas and code from the closest-pair data structure testbed (C++) developed around a series of papers by Eppstein et al.

Installation

FastPairs has not yet been uploaded to PyPi, as we are currently at the 'pre-release' stage*. Having said that you should be able to install it via pip directly from the GitHub repository with:

pip install git+git://github.com/carsonfarmer/fastpair.git

You can also install FastPair by cloning the GitHub repository and using the setup script:

git clone https://github.com/carsonfarmer/fastpair.git
cd fastpair
python setup.py install

* This means the API is not set, and subject to crazy changes at any time!

Testing

Build StatusCoverage Status

FastPair comes with a comprehensive preliminary range of tests. To run the tests, you can use py.test (maybe also nosetests?), which can be installed via pip using the recommended.txt file (note, this will also install numpy, scipy, matplotlib, and IPython which are all great and useful for tests and examples). So far testing has been done with CPython 2.7 and 3.4.

pip install -r recommended.txt
py.test fastpair

Features

In the following examples we use the random module to generate data.

from fastpair import FastPair, interact
import random

def rand_tuple(dim=2):
    return tuple([random.random() for _ in range(dim)])

Basics

The simplest way to use a FastPair data-structure is to initialize one and then update it with data points (via the += operator). In this first example, we create a sequence of $50 \times 10$ uniform random points and add them to a FastPair object:

points = [rand_tuple(10) for _ in range(50)]
# Create empty data-structure with `min_points=10` and
# using a Euclidean distance metric
fp = FastPair()
fp.build(points)  # Add points all at once and build conga line to start

You can then add additional points, and start to query the data-structure for the closest pair of points. As points are added, the data-structure responds and updates accordingly (see this paper for details):

fp += rand_tuple(10)
fp += rand_tuple(10)

# This is the 'FastPair' algorithm, should be fast for large n
fp.closest_pair()
# There is also a brute-force version, can be fast for smaller n
fp.closest_pair_brute_force()

FastPair has several useful properties and methods, including checking the size of the data-structure (i.e., how many points are currently stored), testing for containment of a given point, various methods for computing the closest pair, finding the neighbor of a given point, computing multiple distances at once, and even merging points (clusters):

len(fp)
rando = rand_tuple(10)
points[0] in fp  # True
rando in fp  # False
fp()  # Compute closest pair
neigh = fp.find_neighbor(rando)  # Neighbor of 'outside' point
fp.sdist(rando)  # Compute distances from rando to all points in fp

To illustrate the mergeing methods, here is a simple example of hierarchical clustering, treating points as the 'centroids' of various clusters:

for i in range(len(fp)-1):
    # First method... do it manually:
    dist, (a, b) = fp.closest_pair()
    c = interact(a, b)  # Compute mean centroid
    fp -= b
    fp -= a
    fp += c
    # Alternatively... do it all in one step:
    # fp.merge_closest()
len(fp)  # 1

Finally, plotting should be pretty obvious to those familiar with matplotlib (or other Python plotting facilities).

import matplotlib.pyplot as plt

points = [rand_tuple(2) for _ in range(50)]  # 2D points
fp = FastPair().build(points)
dist, (a, b) = fp.closest_pair()

plt.figure()
plt.scatter(*zip(*fp.points))
plt.scatter(*zip(a, b), color="red")
plt.title("Closest pair is {:.2} units apart.".format(dist))
plt.show()

License

Copyright © 2016, Carson J. Q. Farmer
Copyright © 2002-2015, David Eppstein
Licensed under the MIT License.

fastpair's People

Contributors

cjqf avatar rakesh4g avatar carsonfarmer avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.