
enayatullah / covertree


This project forked from manzilzaheer/covertree


Cover Tree implementation in C++ for k-Nearest Neighbours and range search

License: Apache License 2.0

Languages: C++ 98.15%, C 1.36%, CMake 0.22%, Python 0.15%, Makefile 0.11%, Shell 0.01%


Cover Trees

We present a distributed and parallel extension and implementation of the Cover Tree data structure for nearest neighbour search. The data structure was originally presented in the first of the following papers and improved in the second:

  1. Alina Beygelzimer, Sham Kakade, and John Langford. "Cover trees for nearest neighbor." Proceedings of the 23rd International Conference on Machine Learning. ACM, 2006.
  2. Mike Izbicki and Christian Shelton. "Faster cover trees." Proceedings of the 32nd International Conference on Machine Learning (ICML-15). 2015.
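To make the underlying data structure concrete, here is a simplified cover tree in the spirit of the second paper: insertion, exact nearest-neighbour search and range search under Euclidean distance. It is a hypothetical, single-threaded C++14 illustration, not this repository's distributed and parallel implementation, and every name in it (`Node`, `insert`, `nearestNeighbour`, `range` and so on) is made up for the example. Each node at level `l` covers its children, which sit at level `l - 1`, within distance `2^l`. The searches rely only on this covering property, so the separation property behind the cover tree's complexity guarantees is not checked here.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical sketch of a simplified cover tree (after Izbicki & Shelton,
// 2015); single-threaded, Euclidean distance, not this repository's code.
using Point = std::vector<double>;

double dist(const Point& a, const Point& b) {
    assert(a.size() == b.size());
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += (a[i] - b[i]) * (a[i] - b[i]);
    return std::sqrt(s);
}

struct Node {
    Point p;
    int level = 0;                               // covering radius is 2^level
    std::vector<std::unique_ptr<Node>> children; // children sit at level - 1
    double covdist() const { return std::ldexp(1.0, level); }
};

std::unique_ptr<Node> makeNode(const Point& p, int level) {
    auto n = std::make_unique<Node>();
    n->p = p;
    n->level = level;
    return n;
}

// Detach and return some leaf strictly below n (n must have children).
std::unique_ptr<Node> removeLeaf(Node& n) {
    std::unique_ptr<Node>& last = n.children.back();
    if (!last->children.empty()) return removeLeaf(*last);
    std::unique_ptr<Node> leaf = std::move(last);
    n.children.pop_back();
    return leaf;
}

// Precondition: dist(n.p, x) <= n.covdist(). Descend into the first child
// that covers x; otherwise x becomes a new child of n.
void insertBelow(Node& n, const Point& x) {
    for (auto& q : n.children)
        if (dist(q->p, x) <= q->covdist()) { insertBelow(*q, x); return; }
    n.children.push_back(makeNode(x, n.level - 1));
}

// Insert x and return the (possibly new) root.
std::unique_ptr<Node> insert(std::unique_ptr<Node> root, const Point& x) {
    if (!root) return makeNode(x, 0);
    if (dist(root->p, x) <= root->covdist()) { insertBelow(*root, x); return root; }
    while (dist(root->p, x) > 2 * root->covdist()) {
        if (root->children.empty()) { ++root->level; continue; }
        // Every descendant lies within 2 * covdist of the root, so a leaf
        // placed one level higher still covers the old root.
        std::unique_ptr<Node> leaf = removeLeaf(*root);
        leaf->level = root->level + 1;
        leaf->children.push_back(std::move(root));
        root = std::move(leaf);
    }
    std::unique_ptr<Node> top = makeNode(x, root->level + 1);
    top->children.push_back(std::move(root));
    return top;
}

// Exact 1-NN by branch and bound: the subtree rooted at q holds no point closer
// than dist(x, q.p) - 2 * q.covdist(), so it is skipped when that bound cannot
// beat the best distance found so far. Children are visited nearest first.
void nearest(const Node& n, const Point& x, const Point*& best, double& bestD) {
    double d = dist(n.p, x);
    if (d < bestD) { bestD = d; best = &n.p; }
    std::vector<std::pair<double, const Node*>> order;
    for (const auto& q : n.children) order.emplace_back(dist(q->p, x), q.get());
    std::sort(order.begin(), order.end(),
              [](const auto& a, const auto& b) { return a.first < b.first; });
    for (const auto& e : order)
        if (e.first - 2 * e.second->covdist() < bestD) nearest(*e.second, x, best, bestD);
}

const Point* nearestNeighbour(const Node& root, const Point& x) {
    const Point* best = nullptr;
    double bestD = std::numeric_limits<double>::infinity();
    nearest(root, x, best, bestD);
    return best;
}

// Range search: append every stored point within distance r of x to out.
void range(const Node& n, const Point& x, double r, std::vector<Point>& out) {
    if (dist(n.p, x) <= r) out.push_back(n.p);
    for (const auto& q : n.children)
        if (dist(q->p, x) - 2 * q->covdist() <= r) range(*q, x, r, out);
}
```

To build a tree, start from an empty `std::unique_ptr<Node>` and call `root = insert(std::move(root), p)` for each point, then query with `nearestNeighbour(*root, x)` or `range(*root, x, r, out)`. Bounding every subtree by twice its covering radius keeps the pruning test simple but conservative; caching each node's exact maximum distance to its descendants would prune more.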

Under active development

New: Python wrappers added

Just run python setup.py install; you can then import covertree in Python. The Python API is documented in API.pdf. If you do not have root privileges, install with python setup.py install --user and make sure the installation folder is on your path.

Organisation

  1. All code is under src, in the respective folders
  2. Dependencies are provided under the lib folder
  3. An example script for running the cover tree is provided under scripts
  4. data is a placeholder folder for the data
  5. The build and dist folders will be created to hold the executables

Requirements

  1. gcc >= 5.0 or Intel® C++ Compiler 2017, needed for C++14 features

How to use

We will show how to run our Cover Tree on a single machine using a synthetic dataset:

  1. First, compile by running make:

      make
  2. Generate a synthetic dataset:

      python data/generateData.py
  3. Run the cover tree:

      dist/cover_tree data/train_100d_1000k_1000.dat data/test_100d_1000k_10.dat

The makefile has some useful features:

  • if you have the Intel® C++ Compiler, you can instead run

      make intel
  • or, to use the Intel® C++ Compiler's interprocedural (cross-file) optimization (IPO), run

      make inteltogether
  • you can also compile individual modules selectively with

      make <module-name>
  • or clean them individually with

      make clean-<module-name>

Performance

In our evaluation the implementation scales easily and is efficient. For example, on an Amazon EC2 c4.8xlarge instance we inserted more than 1 million 1000-dimensional vectors (Euclidean space, L2 norm) in under 250 seconds. At query time we can process more than 300 queries per second per core.
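The reported figures translate into the following rough average rates (plain arithmetic on the numbers above, not additional measurements):

```latex
\text{insertion rate} > \frac{10^{6}\ \text{vectors}}{250\ \text{s}} = 4000\ \text{vectors/s},
\qquad
\text{average time per query on one core} < \frac{1\ \text{s}}{300} \approx 3.3\ \text{ms}.
```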

Troubleshooting

If the build fails with an error like "instruction not found", the system most probably does not support the AVX2 instruction set. To fix this, change march=core-avx2 to march=corei7 in both setup.py and src/cover_tree/makefile.
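Before editing the build files, one way to check whether the machine supports AVX2 is to ask the CPU directly. The helper below is a hypothetical sketch, not part of this repository; it relies on the GCC/Clang builtin `__builtin_cpu_supports`, which is available only for x86 targets, and reports no support elsewhere. It only reflects what the CPU advertises, not what the compiler or assembler accepts.

```cpp
#include <string>

// Hypothetical helper (not part of this repository): reports whether the CPU
// running this code advertises AVX2, via a GCC/Clang builtin that exists only
// for x86 targets. On other targets or compilers it returns false.
bool cpuHasAvx2() {
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
    return __builtin_cpu_supports("avx2") != 0;
#else
    return false;
#endif
}

// Suggests which march value to keep in setup.py and src/cover_tree/makefile.
std::string suggestedMarch() {
    return cpuHasAvx2() ? "core-avx2" : "corei7";
}
```

For example, `suggestedMarch()` returns `"core-avx2"` on an AVX2-capable CPU and `"corei7"` otherwise, matching the two march values mentioned above.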

Contributors

manzilzaheer
