GithubHelp home page GithubHelp logo

pombredanne / minhashnearestneighbors Goto Github PK

View Code? Open in Web Editor NEW

This project forked from joachimwolff/sparse-neighbors-search

0.0 1.0 0.0 16.47 MB

A python/c++ implementation of an approximate nearest neighbor search for sparse data structures based on the idea of local sensitiv hashfunctions.

License: MIT License

Python 47.31% C++ 36.48% Cuda 15.33% C 0.88%

minhashnearestneighbors's Introduction

Approximate k-nearest neighbors search on sparse datasets

With MinHash and WTA-Hash from the bioinf-learn package it is possible to search the approximate k-nearest neighbors within a sparse data structure. It works best for very high dimensional and very sparse datasets, e.g. one million dimensions and 400 non-zero feature ids on average.

To use it:

from sparse_neighbors_search import MinHash
minHash = MinHash()
minHash.fit(X)
minHash.kneighbors(return_distance=False)

Disclaimer

With the update to version 0.3 'Charlie Brown' which needs support for SSE4.1 by your operating system and cpu macOS is no longer supported. Feel free to use it and or to get it run on this platform but I cannot test it there and probably it will not run.

Features

  • Efficient approximate k-nearest neighbors search
  • works only on sparse datasets

Installation

Install sparse_neighbors_search by running:

python setup.py install

In a user only context:

python setup.py install --user

On MAC OS X the default compiler doesn't support openmp, it is deactivated by default. If you want to compile with openmp support, add the flag "--openmp":

python setup.py install --user --openmp

On a Linux system openmp is default. If you don't want to use it set:

python setup.py install --user --noopenmp

GPU support is provided with Nvidias CUDA. If the setup detects a CUDA installation it is using it. If you want to force an installation without CUDA add the parameter: --nocuda

Instead of cloning the repository via git clone and than running the installation, you can do it in one step with pip:

pip install git+https://github.com/joachimwolff/minHashNearestNeighbors.git

The installation requires g++ and the C++11 libs, numpy, scikit-learn, cython and scipy. For the GPU support the CUDA framework with nvcc. The software was tested on Ubuntu 14.04 with g++ 4.8, CUDA 7.5, numpy 1.10.1, scikit-learn 0.17, Cython 0.23.4 and scipy 0.16.1.

Uninstall

To delete sparse-neighbors-search run the following command:

pip uninstall sparse-neighbors-search

If you have run the uninstall command and want to make sure everything is gone, look at your python installation directory. If you have used the --user flag the path in Ubuntu 14.04 is:

~/.local/lib/python2.7/site-packages

Contribute

Support

If you are having issues, please let me know. Mail address: wolffj[at]informatik[dot]uni-freiburg[dot]de

minhashnearestneighbors's People

Contributors

joachimwolff avatar usm441 avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.