vioshyvo / mrpt

Fast and lightweight header-only C++ library (with Python bindings) for approximate nearest neighbor search

License: MIT License

Python 0.39% C++ 98.67% CMake 0.01% C 0.92% Dockerfile 0.01%
approximate-nearest-neighbor-search k-nn nearest-neighbor-search random-projection mrpt knn-search similarity-search

mrpt's Introduction

MRPT - fast nearest neighbor search with random projection

Documentation

MRPT is a lightweight and easy-to-use library for approximate nearest neighbor search. It is written in C++11 and has Python bindings. The index building has an integrated hyperparameter tuning algorithm, so the only hyperparameter required to construct the index is the target recall level!

According to our experiments MRPT is one of the fastest libraries for approximate nearest neighbor search.

In the offline phase of the algorithm, MRPT indexes the data with a collection of random projection trees. In the online phase, the index structure allows queries to be answered quickly. A detailed description of the algorithm with its time and space complexities, as well as the aforementioned comparisons, can be found in our article that was published at the IEEE International Conference on Big Data 2016.
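To make the two phases concrete, here is a minimal pure-Python sketch of the general idea, not MRPT's actual implementation. It assumes dense Gaussian projection directions, median splits, and a simple vote count over leaves; the names build_tree, leaf_of, approx_knn and min_votes are hypothetical and do not come from the library.

```python
import random
from collections import Counter


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def build_tree(points, indices, depth, rng):
    """Recursively split `indices` at the median of a random projection."""
    if depth == 0 or len(indices) < 2:
        return {"leaf": indices}
    # Dense Gaussian direction for brevity (an assumption of this sketch).
    direction = [rng.gauss(0.0, 1.0) for _ in range(len(points[0]))]
    proj = {i: dot(points[i], direction) for i in indices}
    ordered = sorted(indices, key=proj.get)
    mid = len(ordered) // 2
    return {
        "direction": direction,
        "split": proj[ordered[mid]],  # median projection value
        "left": build_tree(points, ordered[:mid], depth - 1, rng),
        "right": build_tree(points, ordered[mid:], depth - 1, rng),
    }


def leaf_of(tree, q):
    """Route the query down one tree and return the indices in its leaf."""
    while "leaf" not in tree:
        side = "left" if dot(q, tree["direction"]) < tree["split"] else "right"
        tree = tree[side]
    return tree["leaf"]


def approx_knn(points, trees, q, k, min_votes=1):
    """Gather candidates from every tree, keep those with enough votes, rank them exactly."""
    votes = Counter(i for tree in trees for i in leaf_of(tree, q))
    candidates = [i for i, v in votes.items() if v >= min_votes]

    def sq_dist(i):
        return sum((a - b) ** 2 for a, b in zip(points[i], q))

    return sorted(candidates, key=sq_dist)[:k]


rng = random.Random(0)
points = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(200)]
# Offline phase: build several independent trees over all points.
trees = [build_tree(points, list(range(len(points))), 4, rng) for _ in range(5)]
# Online phase: query with one of the data points.
result = approx_knn(points, trees, points[7], k=3)
print(result)
```

Only the points that share a leaf with the query are compared exactly, which is where the speed-up over a full scan comes from; raising min_votes shrinks the candidate set further at the cost of recall.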

The algorithm for automatic hyperparameter tuning is described in detail in our new article that will be presented at the Pacific-Asia Conference on Knowledge Discovery and Data Mining 2019 (arXiv preprint).

Currently the Euclidean distance is supported as a distance metric.

The tests for MRPT are in a separate repo.

New

  • Release MRPT 1.1.1 : faster autotuning and bug fixes. (2018/12/07)

  • Release MRPT 1.1.0 : now autotuning works also without a separate set of test queries. (2018/11/24)

  • Release MRPT 1.0.0 (2018/11/22)

  • Add documentation for C++ API (2018/11/22)

  • Add index building with autotuning: no more manual hyperparameter tuning! (2018/11/21)

Python installation

A C++ compiler is needed to build the Python wrapper.

On macOS, LLVM is needed for compiling: brew install llvm libomp.

On Windows, you may use the MSVC compiler.

Install the module with pip install git+https://github.com/vioshyvo/mrpt/

Docker

An example Dockerfile is provided, which builds the MRPT Python wrapper in a Linux environment.

docker build -t mrpt .
docker run --rm -it mrpt

Minimal examples

Python

This example first generates a 200-dimensional data set of 10000 points and a single test query point. The exact_search function can be used to find the indices of the true 10 nearest neighbors of the query.

The build_autotune_sample function then builds an index for approximate k-nn search; it uses automatic parameter tuning, so only the target recall level (90% in this example) and the number of neighbors searched for have to be specified.

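import mrpt
import numpy as np

# n points of dimension d; search for the k nearest neighbors
n, d, k = 10000, 200, 10
target_recall = 0.9

# Random float32 data set and query point
data = np.random.rand(n, d).astype(np.float32)
q = np.random.rand(d).astype(np.float32)

# Exact search returns the true neighbors for comparison
index = mrpt.MRPTIndex(data)
print(index.exact_search(q, k, return_distances=False))

# Build the autotuned index, then run the approximate query
index.build_autotune_sample(target_recall, k)
print(index.ann(q, return_distances=False))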

The approximate nearest neighbors are then searched by the function ann; because the index was autotuned, no other arguments than the query point are required.

Here is a sample output:

[9738 5033 6520 2108 9216 9164  112 1442 1871 8020]
[9738 5033 6520 2108 9216 9164  112 1442 1871 6789]
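For intuition about what the first printed line represents, an exact search can be sketched as a brute-force scan over all points. This is a pure-Python illustration under the assumption of squared Euclidean distance; the function below is hypothetical and is not the library's exact_search.

```python
import heapq
import random


def exact_knn(data, q, k):
    """Return the indices of the k points in `data` closest to `q`, nearest first."""
    def sq_dist(i):
        # Squared Euclidean distance gives the same ordering as the true distance.
        return sum((a - b) ** 2 for a, b in zip(data[i], q))

    return heapq.nsmallest(k, range(len(data)), key=sq_dist)


rng = random.Random(0)
data = [[rng.random() for _ in range(5)] for _ in range(100)]
neighbors = exact_knn(data, data[42], 3)
print(neighbors)
```

Every query touches all n points, so the cost grows linearly with the data set; the approximate index trades a little recall for avoiding that full scan.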

C++

MRPT is a header-only library, so no compilation is required: just include the header cpp/Mrpt.h. The only dependency is the Eigen linear algebra library (Eigen 3.3.5 is bundled in cpp/lib), so when using g++, the following minimal example can be compiled for example as:

g++ -std=c++11 -Ofast -march=native -Icpp -Icpp/lib ex1.cpp -o ex1 -fopenmp -lgomp

Let's first generate a 200-dimensional data set of 10000 points, and a query point (row = dimension, column = data point). Then Mrpt::exact_knn can be used to find the indices of the true 10 nearest neighbors of the test query.

The grow_autotune function builds an index for approximate k-nn search; it uses automatic parameter tuning, so only the target recall level (90% in this example), and the number of neighbors searched for have to be specified. This version automatically samples a test set of 100 query points from the data set to tune the parameters, so no separate test set is required.

#include <iostream>
#include "Eigen/Dense"
#include "Mrpt.h"

int main() {
  int n = 10000, d = 200, k = 10;
  double target_recall = 0.9;
  Eigen::MatrixXf X = Eigen::MatrixXf::Random(d, n);
  Eigen::MatrixXf q = Eigen::VectorXf::Random(d);

  Eigen::VectorXi indices(k), indices_exact(k);

  Mrpt::exact_knn(q, X, k, indices_exact.data());
  std::cout << indices_exact.transpose() << std::endl;

  Mrpt mrpt(X);
  mrpt.grow_autotune(target_recall, k);

  mrpt.query(q, indices.data());
  std::cout << indices.transpose() << std::endl;
}

The approximate nearest neighbors are then searched by the function query; because the index was autotuned, no other arguments than a query point and an output buffer for indices are required.

Here is a sample output:

8108 1465 6963 2165   83 5900  662 8112 3592 5505
8108 1465 6963 2165   83 5900 8112 3592 5505 7992

The approximate nearest neighbor search found 9 of the 10 true nearest neighbors, so this time the observed recall happened to match the target recall exactly (results vary between runs because the algorithm is randomized).
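The recall figure can be checked by hand from the two printed lines. The short Python sketch below does that arithmetic; the helper name recall is illustrative and not part of MRPT.

```python
def recall(approx, exact):
    """Fraction of the true nearest neighbors that the approximate search returned."""
    exact_set = set(exact)
    return sum(1 for i in approx if i in exact_set) / len(exact)


# Index lists copied from the sample output above.
exact = [8108, 1465, 6963, 2165, 83, 5900, 662, 8112, 3592, 5505]
approx = [8108, 1465, 6963, 2165, 83, 5900, 8112, 3592, 5505, 7992]
print(recall(approx, exact))  # 0.9
```

The only true neighbor missing from the approximate result is 662, which was replaced by 7992.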

Citation

Automatic hyperparameter tuning:

@inproceedings{Jaasaari2019,
  title={Efficient Autotuning of Hyperparameters in Approximate Nearest Neighbor Search},
  author={J{\"a}{\"a}saari, Elias and Hyv{\"o}nen, Ville and Roos, Teemu},
  booktitle={Pacific-Asia Conference on Knowledge Discovery and Data Mining},
  pages={In press},
  year={2019},
  organization={Springer}
}

MRPT algorithm:

@inproceedings{Hyvonen2016,
  title={Fast nearest neighbor search through sparse random projections and voting},
  author={Hyv{\"o}nen, Ville and Pitk{\"a}nen, Teemu and Tasoulis, Sotiris and J{\"a}{\"a}saari, Elias and Tuomainen, Risto and Wang, Liang and Corander, Jukka and Roos, Teemu},
  booktitle={Big Data (Big Data), 2016 IEEE International Conference on},
  pages={881--888},
  year={2016},
  organization={IEEE}
}

MRPT for other languages

License

MRPT is available under the MIT License (see LICENSE.txt). Note that third-party libraries in the cpp/lib folder may be distributed under other open source licenses. The Eigen library is licensed under the MPL2.

mrpt's People

Contributors

ejaasaari, fralik, kylemcdonald, rjpower, teemupitkanen, vioshyvo

mrpt's Issues

Python installation failed on windows

I'm on Windows 10 and trying to install mrpt for Python (3.5).
pip install git+https://github.com/teemupitkanen/mrpt/ returns

c:\python35\include\pyconfig.h(68): fatal error C1083: Cannot open include file: 'io.h': No such file or directory
error: command 'C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\bin\cl.exe' failed with exit status 2

I tried to launch C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\vcvarsall.bat, as suggested here. It added the file io.h to the folder C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\include, which I added to the system path. But I still get the same error when launching pip install ...

I also tried to launch pip install ... from the Visual C++ 2015 x86 x64 Cross Build Tools Command Prompt, but now I get another error:

cpp/mrptmodule.cpp(14): fatal error C1083: Cannot open include file: 'sys/mman.h': No such file or directory
error: command 'C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\BIN\x86_amd64\cl.exe' failed with exit status 2

I don't know what to do. Any help?

unable to install on macos after brew install llvm

After brew install llvm worked fine, pip install git+https://github.com/teemupitkanen/mrpt/ ran into some problems. I hope you can help.

  Running setup.py install for mrpt ... error
    Complete output from command /Users/tianyizhuang/anaconda/bin/python -u -c "import setuptools, tokenize;__file__='/private/var/folders/66/rk12_30x5dvbpg0pk_glwg2c0000gn/T/pip-zmys9ilr-build/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /var/folders/66/rk12_30x5dvbpg0pk_glwg2c0000gn/T/pip-7__rozak-record/install-record.txt --single-version-externally-managed --compile:
    WARNING: '.' not a valid package name; please use only .-separated package names in setup.py
    running install
    running build
    running build_py
    package init file '__init__.py' not found (or not a regular file)
    creating build
    creating build/lib.macosx-10.7-x86_64-3.6
    copying setup.py -> build/lib.macosx-10.7-x86_64-3.6
    copying mrpt.py -> build/lib.macosx-10.7-x86_64-3.6
    copying demo.py -> build/lib.macosx-10.7-x86_64-3.6
    running build_ext
    building 'mrptlib' extension
    creating build/temp.macosx-10.7-x86_64-3.6
    creating build/temp.macosx-10.7-x86_64-3.6/cpp
    gcc -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/Users/tianyizhuang/anaconda/include -arch x86_64 -I/Users/tianyizhuang/anaconda/include -arch x86_64 -Icpp/lib -I/Users/tianyizhuang/anaconda/lib/python3.6/site-packages/numpy/core/include -I/Users/tianyizhuang/anaconda/include/python3.6m -c cpp/mrptmodule.cpp -o build/temp.macosx-10.7-x86_64-3.6/cpp/mrptmodule.o -std=c++11 -O3 -ffast-math -s -fno-rtti -fopenmp -DNDEBUG -march=native
    clang: error: unsupported option '-fopenmp'
    error: command 'gcc' failed with exit status 1
    
    ----------------------------------------
Command "/Users/tianyizhuang/anaconda/bin/python -u -c "import setuptools, tokenize;__file__='/private/var/folders/66/rk12_30x5dvbpg0pk_glwg2c0000gn/T/pip-zmys9ilr-build/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /var/folders/66/rk12_30x5dvbpg0pk_glwg2c0000gn/T/pip-7__rozak-record/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /private/var/folders/66/rk12_30x5dvbpg0pk_glwg2c0000gn/T/pip-zmys9ilr-build/

DeprecatedInstaller: setuptools.installer and fetch_build_eggs are deprecated.

Good afternoon,
I would like your help to resolve a problem regarding the mrpt installation.
When importing the libraries below, I received information that mrpt was not loaded.
"""
import pandas as pd
import numpy as np
import vectordb
import requests
import re
from langchain.text_splitter import MarkdownHeaderTextSplitter
"""

Warning: mprt could not be imported. Install with 'pip install git+https://github.com/vioshyvo/mrpt/'. Falling back to Faiss.

When running pip or conda install, I received the following error:

(base) PS C:\Users\Marcos> pip install git+https://github.com/vioshyvo/mrpt/
Collecting git+https://github.com/vioshyvo/mrpt/
Cloning https://github.com/vioshyvo/mrpt/ to c:\users\marcos\appdata\local\temp\pip-req-build-vi6wzb7j
Running command git clone --filter=blob:none --quiet https://github.com/vioshyvo/mrpt/ 'C:\Users\Marcos\AppData\Local\Temp\pip-req-build-vi6wzb7j'
Resolved https://github.com/vioshyvo/mrpt/ to commit 88cc6f4
Preparing metadata (setup.py) ... done
Requirement already satisfied: numpy>=1.10.0 in d:\anaconda3\lib\site-packages (from mrpt==1.0) (1.26.3)
Building wheels for collected packages: mrpt
Building wheel for mrpt (setup.py) ... error
error: subprocess-exited-with-error

× python setup.py bdist_wheel did not run successfully.
│ exit code: 1
╰─> [20 lines of output]
D:\Anaconda3\Lib\site-packages\setuptools\__init__.py:80: _DeprecatedInstaller: setuptools.installer and fetch_build_eggs are deprecated.
!!
********************************************************************************
Requirements should be satisfied by a PEP 517 installer.
If you are using pip, you can try pip install --use-pep517.
********************************************************************************
!!
dist.fetch_build_eggs(dist.setup_requires)
WARNING: '.' not a valid package name; please use only .-separated package names in setup.py
running bdist_wheel
running build
running build_py
creating build
creating build\lib.win-amd64-cpython-311
copying mrpt.py -> build\lib.win-amd64-cpython-311
running build_ext
building 'mrptlib' extension
error: Microsoft Visual C++ 14.0 or greater is required. Get it with "Microsoft C++ Build Tools": https://visualstudio.microsoft.com/visual-cpp-build-tools/
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for mrpt
Running setup.py clean for mrpt
Failed to build mrpt
ERROR: Could not build wheels for mrpt, which is required to install pyproject.toml-based projects

=======================================================================
I also installed 6 GB of vscode 2022, without success.

I hope you can help.

pip install setuptools
Requirement already satisfied: setuptools in d:\anaconda3\lib\site-packages (68.2.2)

pip install wheel
Requirement already satisfied: wheel in d:\anaconda3\lib\site-packages (0.41.2)

=====================================================================

I have also tried to create another conda environment, without success:

conda create -n ragteste python=3.11.5 setuptools==68.2.2

same error

Problem with non-autotuned indices

First of all, I would like to thank the authors of this library for their great work! In my tests mrpt is faster than FLANN and many other libraries.
I have a question regarding non-autotuned indices. I am using mrpt in my program Regard3D (a tool for structure from motion) for finding matches of LIOP image descriptors (144 floats).
First I tried creating autotuned indices, which works perfectly. However, creating the indices this way is slower, so I tried to take the parameters found by autotuning and use them to create non-autotuned indices. For some examples this works fine, while for others I sometimes receive -1 in the query result.
I should mention that Regard3D always searches for the 2 nearest neighbours, to distinguish "good" from "bad" matches. A "good" match is one where the second-best distance is x times larger than the best one, x being configurable by the user.
In my tests, I used the parameters found by autotuning on one image (a set of keypoint descriptors) for many different images with different numbers of samples. Is this allowed by the library? How should one go about creating a non-autotuned index, then successfully creating indices for other samples, and then querying the 2 nearest neighbours?
Of course the obvious work-around is to discard the results with -1, but is there a better way?
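The ratio test and the workaround of discarding -1 results, as described in this post, could be sketched in Python as follows. The function name and argument layout are assumptions made for illustration; they are not taken from Regard3D or MRPT.

```python
def good_match(indices, distances, ratio):
    """Return the best neighbour's index if it passes the ratio test, else None."""
    # Discard results where either of the two neighbours is missing (-1).
    if len(indices) < 2 or indices[0] < 0 or indices[1] < 0:
        return None
    best, second = distances[0], distances[1]
    # "Good": the second-best distance is at least `ratio` times the best one.
    return indices[0] if second >= ratio * best else None


print(good_match([3, 7], [1.0, 2.0], ratio=1.5))   # 3
print(good_match([3, 7], [1.0, 1.2], ratio=1.5))   # None
print(good_match([3, -1], [1.0, 0.0], ratio=1.5))  # None
```

Treating a missing second neighbour as "no match" is the conservative choice here, since the ratio cannot be evaluated without it.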

Out-of-bounds access with certain parameter combinations

Hi, I am trying to use MRPT in a structure-from-motion context, with LIOP keypoint descriptors (144 floats). With most parameters MRPT works very well, but with some it crashes with out-of-bounds access in Eigen Vectors.
One example is with n_trees=5, depth=13, sparsity=0.088 (taken from the mrpt-comparison for sift), where MRPT crashes with the call stack:

>	Regard3D.exe!Eigen::DenseCoeffsBase<Eigen::Matrix<int,-1,1,0,-1,1>,1>::operator()(__int64 index) Line 425	C++
 	Regard3D.exe!Mrpt::grow_subtree(const Eigen::Matrix<int,-1,1,0,-1,1> & indices, int tree_level, int i, int n_tree, const Eigen::Matrix<float,-1,-1,0,-1,-1> & tree_projections) Line 324	C++
 	Regard3D.exe!Mrpt::grow_subtree(const Eigen::Matrix<int,-1,1,0,-1,1> & indices, int tree_level, int i, int n_tree, const Eigen::Matrix<float,-1,-1,0,-1,-1> & tree_projections) Line 340	C++
...
the same for several levels of recursion
...

 	Regard3D.exe!Mrpt::grow_subtree(const Eigen::Matrix<int,-1,1,0,-1,1> & indices, int tree_level, int i, int n_tree, const Eigen::Matrix<float,-1,-1,0,-1,-1> & tree_projections) Line 339	C++
 	Regard3D.exe!Mrpt::grow$omp$1() Line 68	C++

The VectorXi is of size 1, and MRPT tries to access it with index 1.

I am using Visual Studio 2015 on Windows 7, compiling for x64.

My question: Is this a bug, or are certain combinations of the parameters not allowed? If it is the latter, which combinations are allowed and which are not?
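One plausible explanation, offered here only as an assumption since the post does not say how many descriptors the failing data set contained, is that the tree depth is too large for the number of points. A binary tree of depth 13 has 2^13 = 8192 leaves, so a data set with fewer points cannot fill them. A small Python sanity check along those lines (the helper names are hypothetical):

```python
def max_depth(n_points):
    """Deepest balanced binary tree whose 2**depth leaves can each hold a point."""
    return n_points.bit_length() - 1  # floor(log2(n_points)) for n_points >= 1


def depth_is_feasible(n_points, depth):
    return 1 <= depth <= max_depth(n_points)


print(2 ** 13)                       # 8192 leaves at depth 13
print(depth_is_feasible(5000, 13))   # False
print(depth_is_feasible(10000, 13))  # True
```

Running such a check before building a non-autotuned index would at least rule out this cause when parameters tuned on one data set are reused on a smaller one.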

ImportError: No module named mrpt

Hi, I followed the instructions to get the module file. But when I tried to run the demo, I still got the import error: No module named mrpt.

Could you help me figure it out?

Thanks very much.

Cannot load index from file

Hello!

As written in the docs it should be possible to load a saved index via the constructor:

def __init__(self, data, shape=None, mmap=False)
param data: Input data either as a NxDim numpy ndarray or as a filepath to a binary file containing the data.
param shape: Shape of the data as a tuple (N, dim). Needs to be specified only if loading the data from a file.

The following test script crashes (using python 3.7 and Windows 10) without further notice:

import mrpt, numpy as np
data = np.random.rand(10000,512).astype(np.float32)
index = mrpt.MRPTIndex(data)
index.build_autotune_sample(0.9, 5)
index.save('testindex.sav')
index2 = mrpt.MRPTIndex('testindex.sav',(10000,512))

Am I doing something wrong, or is there an error within the C++ code?

Unit Tests for Eigen version changes

Hi !

I am wondering if you have a suite of unit tests for this library? Specifically, I am working with a codebase that uses a different version of Eigen, and I would like to ensure that all functions continue to work as expected, and also to ensure compatibility as Eigen continues to change.

Dynamic index update

Hi!

Do I understand correctly that MRPT does not currently support dynamic index update / rebalancing?

compiling on clang / osx

to compile on osx i had to:

$ brew install gcc
$ CC=/usr/local/bin/gcc-6 CXX=/usr/local/bin/g++-6 pip install git+https://github.com/teemupitkanen/mrpt/

when i tried to install without gcc, first i had the problem of: WARNING: '.' not a valid package name; please use only .-separated package names in setup.py (strange).

i changed the setup.py import to add from setuptools import setup, find_packages and changed packages value to packages=find_packages(),.

next clang: error: unsupported option '-fopenmp' so i removed the -fopenmp flag.

next:

cpp/Mrpt.h:139:27: error: variable length array of non-POD element type 'Gap'
            Gap found_branches[n_trees * depth];

that's when i switched to gcc.

Auto tuned index always gives me 100% recall on 1st NN

I'm doing some research for my PhD thesis with various ANN algorithms in the context of augmented reality, from C++, using the latest release.

Using a subset of the SIFT1M dataset of around 200 000 elements, no matter what target recall I pass to grow_autotune (I tried 0.9, 0.7, 0.5, 0.2), the recall of the 1st NN is always 100%, and queries are thus quite slow (the 2nd NN recall does seem to go down).

Is this behavior expected depending on the data set?
Should I try setting the max_trees variable before running the auto tune?

Thx in advance.
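To separate the recall of the 1st neighbour from the recall over all k neighbours, the two can be measured independently over a set of test queries. The Python sketch below uses made-up toy index lists, and it assumes that the tuned quantity is the recall averaged over all k neighbours, which this post does not confirm.

```python
def recall_at_rank(approx_lists, exact_lists, rank):
    """Fraction of queries whose true neighbour at `rank` (0-based) was returned."""
    hits = sum(1 for a, e in zip(approx_lists, exact_lists) if e[rank] in a)
    return hits / len(exact_lists)


def mean_recall(approx_lists, exact_lists):
    """Recall averaged over all k neighbours of all queries."""
    found = sum(len(set(a) & set(e)) for a, e in zip(approx_lists, exact_lists))
    return found / sum(len(e) for e in exact_lists)


# Toy results for four queries with k = 2 (illustrative values only).
exact_lists = [[1, 2], [3, 4], [5, 6], [7, 8]]
approx_lists = [[1, 2], [3, 9], [5, 6], [7, 0]]
print(recall_at_rank(approx_lists, exact_lists, 0))  # 1.0
print(recall_at_rank(approx_lists, exact_lists, 1))  # 0.5
print(mean_recall(approx_lists, exact_lists))        # 0.75
```

In this toy case the 1st neighbour is always found even though the overall recall is only 75%, so a perfect 1st-NN recall is not by itself evidence that the target recall is being ignored.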
