
manzilzaheer / covertree

90.0 9.0 19.0 1.88 MB

Cover Tree implementation in C++ for k-Nearest Neighbours and range search

License: Apache License 2.0

Makefile 0.18% Python 0.13% CMake 0.01% C++ 98.76% C 0.92%

covertree's Issues

incorrect computation of nearest neighbor

Hello, I am finding that the nearestNeighbour queries are not giving the correct result.

In the attached test case, I have three 2D points in the cover tree. For two query points, I get the correct k nearest neighbours (k=3) with the correct distances, but the nearest neighbor is incorrect.

I will look into the code in detail to find the cause, but if you have any idea why this could be happening, I would appreciate a fix.

The attached program produces the following output.

Number of OpenMP threads: 1
 adding 3 points to cover tree!
Entered case 1: 3.31402 1 0
Requesting global lock!
 testing for nearest neighbors!

 query point 0 : (-0.944485 0.116473)
 doing 3-nearest neighbors using direct computation!
 	 nearest 0 : 1, 1.438655
 	 nearest 1 : 2, 1.651407
 	 nearest 2 : 0, 2.003489
 doing 3-nearest neighbors using cover_tree!
	 cover_tree 0 : 2, 1.438655
	 cover_tree 1 : 0, 1.651407
	 cover_tree 2 : 0, 2.003489
 nearest	 :: direct = (1, 1.438655)	 cover_tree = (0, 1.651407)
---------------------------------------------------------------------> mismatch

 query point 1 : (-0.931471 0.781848)
 doing 3-nearest neighbors using direct computation!
 	 nearest 0 : 1, 1.537557
 	 nearest 1 : 2, 2.114969
 	 nearest 2 : 0, 2.256513
 doing 3-nearest neighbors using cover_tree!
	 cover_tree 0 : 2, 1.537557
	 cover_tree 1 : 0, 2.114969
	 cover_tree 2 : 0, 2.256513
 nearest	 :: direct = (1, 1.537557)	 cover_tree = (0, 2.114969)
---------------------------------------------------------------------> mismatch

main.cpp.txt
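When tree results disagree with a direct computation, an exact reference makes the comparison easier. Below is a minimal brute-force sketch, assuming points are stored as std::vector<double> rather than the library's Eigen::VectorXd. The names bruteForceKnn and euclidean are hypothetical helpers and are not part of covertree:

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

using Point = std::vector<double>;  // assumption: plain vectors, not Eigen::VectorXd

// Euclidean distance between two points of equal dimension.
inline double euclidean(const Point& a, const Point& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double d = a[i] - b[i];
        s += d * d;
    }
    return std::sqrt(s);
}

// Exact k nearest neighbours of q by exhaustive search.
// Returns (index into data, distance) pairs sorted by increasing distance;
// equal distances are ordered by smaller index so the output is deterministic.
inline std::vector<std::pair<std::size_t, double>>
bruteForceKnn(const std::vector<Point>& data, const Point& q, std::size_t k) {
    std::vector<std::pair<std::size_t, double>> cand;
    cand.reserve(data.size());
    for (std::size_t i = 0; i < data.size(); ++i)
        cand.emplace_back(i, euclidean(data[i], q));
    k = std::min(k, cand.size());
    auto closer = [](const std::pair<std::size_t, double>& a,
                     const std::pair<std::size_t, double>& b) {
        return a.second < b.second || (a.second == b.second && a.first < b.first);
    };
    std::partial_sort(cand.begin(), cand.begin() + static_cast<std::ptrdiff_t>(k),
                      cand.end(), closer);
    cand.resize(k);
    return cand;
}
```

Comparing the distance columns (with a small tolerance) separately from the index columns helps distinguish a search error from an index-bookkeeping error. In the 3-nearest-neighbour output above, the distances agree but the indices do not.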

how to check the result from covertree

Hi,
I tried running the demo as a test, like this:

dist/cover_tree data/train_100d_1000k_1000.dat data/test_100d_1000k_10.dat
data/train_100d_1000k_1000.dat
data/test_100d_1000k_10.dat
Number of OpenMP threads: 8
Number of points: 1000000
Number of dims : 100
56.5687 85.2284 -26.9832
Build time: 62972
Number of OpenMP threads: 8
Number of points: 10000
Number of dims : 100
99.7609 -40.0263 99.2302
Querying serially
Quering parallely
k-NN serially
range serially
Query time: 4394231
sh: 1: pause: not found

How can I check the output of the cover tree, e.g. the neighbour indices and distances?

Thank you.
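The output above reports timings but not the neighbours themselves. Once the returned indices and distances for each query are collected, and an exhaustive-search ground truth is available for a sample of queries, two common checks are recall@k on the indices and the largest gap between the sorted distance lists. This is a sketch under those assumptions. The helpers recallAtK and maxDistanceGap are hypothetical and are not part of the covertree demo:

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Fraction of the true k nearest neighbours (by index) that also appear in
// the returned list. 1.0 means the two lists match as sets.
inline double recallAtK(const std::vector<std::size_t>& truth,
                        const std::vector<std::size_t>& returned) {
    if (truth.empty()) return 1.0;
    std::unordered_set<std::size_t> got(returned.begin(), returned.end());
    std::size_t hits = 0;
    for (std::size_t idx : truth)
        if (got.count(idx)) ++hits;
    return static_cast<double>(hits) / static_cast<double>(truth.size());
}

// Largest absolute difference between two ascending distance lists.
// Unlike index comparison, this is insensitive to ties that swap indices.
inline double maxDistanceGap(const std::vector<double>& truth,
                             const std::vector<double>& returned) {
    double gap = 0.0;
    const std::size_t n = std::min(truth.size(), returned.size());
    for (std::size_t i = 0; i < n; ++i)
        gap = std::max(gap, std::abs(truth[i] - returned[i]));
    return gap;
}
```

For an exact search, recall should be 1.0 and the distance gap should be at floating-point noise level. Anything else points to a search or bookkeeping problem.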

python wrapper ImportError: undefined symbol: _ZTINSt6thread6_StateE

I cloned the repository and installed the Python module using setup.py.

When I tried to import the library in a script using

import covertree

I get the following

File "../knn/agents.py", line 3, in
import covertree
File "/home/joe/anaconda3/envs/knnenv/lib/python2.7/site-packages/covertree/init.py", line 1, in
from covertree import CoverTree
File "/home/joe/anaconda3/envs/knnenv/lib/python2.7/site-packages/covertree/covertree.py", line 1, in
import covertreec
ImportError: /home/joe/anaconda3/envs/knnenv/lib/python2.7/site-packages/covertreec.so: undefined symbol: _ZTINSt6thread6_StateE
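The missing symbol is easier to read once demangled. The helper below is a sketch using the GCC/Clang function abi::__cxa_demangle, which the report itself does not mention. Running c++filt on the same string from a shell gives the same result, typeinfo for std::thread::_State:

```cpp
#include <cstdlib>
#include <cxxabi.h>
#include <memory>
#include <string>

// Demangle an Itanium-ABI symbol name (GCC/Clang). Returns the input
// unchanged if it cannot be demangled. The buffer from __cxa_demangle is
// malloc'ed, so it is released with std::free.
inline std::string demangle(const char* mangled) {
    int status = 0;
    std::unique_ptr<char, void (*)(void*)> out(
        abi::__cxa_demangle(mangled, nullptr, nullptr, &status), std::free);
    return (status == 0 && out) ? std::string(out.get()) : std::string(mangled);
}
```

That type belongs to libstdc++'s std::thread implementation. An undefined reference to it therefore usually means the extension was compiled against a newer libstdc++ than the one the interpreter loads at runtime. For example, an Anaconda environment can ship its own, older copy. This is a likely cause, not a confirmed diagnosis for this report.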

Python API segfaults on small datasets

I'm getting a segfault when using the Python API (very grateful for that addition, by the way =D). Am I using it correctly? The API reference PDF doesn't list the constructor. I'm using gcc/g++ 7.1.0 and Python 2.7; see the gdb trace below.

>>> ct = CoverTree.from_matrix(document_embeddings)
Faster Cover Tree with base 1.3
Max distance: 65.5565
Scale chosen: 18
100% [==================================================]
[New Thread 0x7ffff13fe700 (LWP 11853)]
[New Thread 0x7fffe8bfd700 (LWP 11854)]
Duplicate entry!!!

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffe8bfd700 (LWP 11854)]
run (src=..., dst=...) at lib/Eigen/src/Core/Assign.h:410
410 dst.template copyPacket<Derived2, dstAlignment, srcAlignment>(index, src);
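The log prints "Duplicate entry!!!" immediately before the crash. One thing worth ruling out, though it is an assumption and not a confirmed cause, is duplicate rows in document_embeddings. The sketch below removes exact duplicates before building the tree, with rows held as std::vector<double>. uniqueRowIndices is a hypothetical helper:

```cpp
#include <cstddef>
#include <set>
#include <vector>

// Return the indices of the first occurrence of each distinct row, in input
// order. Rows are compared by value, so only exact duplicates are dropped;
// near-duplicates are kept. Rows containing NaN are not handled.
inline std::vector<std::size_t>
uniqueRowIndices(const std::vector<std::vector<double>>& rows) {
    std::set<std::vector<double>> seen;
    std::vector<std::size_t> keep;
    for (std::size_t i = 0; i < rows.size(); ++i)
        if (seen.insert(rows[i]).second)  // true only for a row not seen before
            keep.push_back(i);
    return keep;
}
```

The returned indices also serve as a map from each tree point back to its original row.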

Question about setting maxdistUB on insert

I found that in bool CoverTree::insert(const pointType& p), several statements that set maxdistUB have been commented out. How does insertion still work without maxdistUB being set properly?
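For context, here is a common way such a per-node bound is used. This is a sketch under assumptions, not this library's code. Suppose maxdistUB is an upper bound on the distance from a node to every descendant. By the triangle inequality, a query can then skip the whole subtree whenever d(q, node) - maxdistUB exceeds the best distance found so far. That is only safe while the bound stays valid, which is why insertion would normally raise it along the insertion path. Node, insertAlongPath and canPrune are hypothetical names:

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <memory>
#include <vector>

using Vec = std::vector<double>;

inline double dist(const Vec& a, const Vec& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += (a[i] - b[i]) * (a[i] - b[i]);
    return std::sqrt(s);
}

// Hypothetical node layout; the real CoverTree node may differ.
struct Node {
    Vec pt;
    double maxdistUB = 0.0;  // upper bound on dist(pt, any descendant)
    std::vector<std::unique_ptr<Node>> children;
};

// Insert p as a child of path.back(); path (non-empty) runs from the root to
// the chosen parent. Every node on the path gains a new descendant, so each
// bound is raised to cover p.
inline Node* insertAlongPath(const std::vector<Node*>& path, const Vec& p) {
    for (Node* n : path)
        n->maxdistUB = std::max(n->maxdistUB, dist(n->pt, p));
    Node* parent = path.back();
    parent->children.push_back(std::make_unique<Node>());
    Node* child = parent->children.back().get();
    child->pt = p;
    return child;
}

// Every descendant d of n satisfies dist(q, d) >= dist(q, n.pt) - n.maxdistUB,
// so the subtree can be skipped when that lower bound exceeds `best`.
inline bool canPrune(const Node& n, const Vec& q, double best) {
    return dist(q, n.pt) - n.maxdistUB > best;
}
```

If the bound is never raised and stays too small, this test can discard a subtree that holds the true nearest neighbour. Whether this library instead derives the bound from a node's scale, which would make the stored field unnecessary, is something its source would have to confirm.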

allow different distance metrics and point classes

You hardcode Euclidean distance and Eigen::VectorXd points. I would prefer to have the options of

  1. using points that can be owned, or slices of (sparse or dense) matrices.
  2. using other distances.

i.e. I'd like to have

template<class Point>
class CoverTree {
   ...
}

with Point required to provide distance() and operator==, e.g.:

template<class Distance, class Vector>
class IndexedPoint {
private:
	Vector _vec;
	size_t _idx;
public:
	IndexedPoint(Vector v, size_t i) : _vec(v), _idx(i) {}
	const Vector& vec() const { return this->_vec; }
	size_t               idx() const { return this->_idx; }
	bool operator==(const IndexedPoint& p) const {
		return is_true(all(this->_vec == p.vec()));
	};
	double distance(const IndexedPoint& p) const {
		return Distance::distance(*this, p);
	};
};

class CosineDistance {
public:
	template<class Vector>
	static double distance(const IndexedPoint<CosineDistance, Vector>& p1, const IndexedPoint<CosineDistance, Vector>& p2) {
		return 1 - cor(p1.vec(), p2.vec());
	}
};

class EuclideanDistance { ... }

Then I can do:

CoverTree<IndexedPoint<EuclideanDistance, Eigen::VectorXd>> ct;
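Below is a compilable sketch of the proposed interface, assuming std::vector<double> as the Vector type; the issue leaves Vector open. It differs from the proposal in two ways. First, Distance::distance takes the raw vectors rather than points, which avoids the circular template dependency. Second, CosineDistance computes 1 - cos(a, b) directly rather than the correlation-based expression above. nearestIdx is a hypothetical brute-force stand-in, not a cover tree. It shows only that a search written against Point::distance() works unchanged for either distance:

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <utility>
#include <vector>

template <class Distance, class Vector>
class IndexedPoint {
public:
    IndexedPoint(Vector v, std::size_t i) : vec_(std::move(v)), idx_(i) {}
    const Vector& vec() const { return vec_; }
    std::size_t idx() const { return idx_; }
    bool operator==(const IndexedPoint& p) const { return vec_ == p.vec_; }
    double distance(const IndexedPoint& p) const { return Distance::distance(vec_, p.vec_); }
private:
    Vector vec_;
    std::size_t idx_;
};

struct EuclideanDistance {
    static double distance(const std::vector<double>& a, const std::vector<double>& b) {
        double s = 0.0;
        for (std::size_t i = 0; i < a.size(); ++i) s += (a[i] - b[i]) * (a[i] - b[i]);
        return std::sqrt(s);
    }
};

// 1 - cos(a, b); assumes neither vector is all zeros.
struct CosineDistance {
    static double distance(const std::vector<double>& a, const std::vector<double>& b) {
        double dot = 0.0, na = 0.0, nb = 0.0;
        for (std::size_t i = 0; i < a.size(); ++i) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return 1.0 - dot / (std::sqrt(na) * std::sqrt(nb));
    }
};

// Index of the point closest to q, using only Point::distance and Point::idx.
// Returns 0 for an empty input.
template <class Point>
std::size_t nearestIdx(const std::vector<Point>& pts, const Point& q) {
    std::size_t best = 0;
    double bestD = std::numeric_limits<double>::infinity();
    for (const Point& p : pts) {
        const double d = q.distance(p);
        if (d < bestD) { bestD = d; best = p.idx(); }
    }
    return best;
}
```

One design caveat: cover-tree pruning relies on the triangle inequality, and 1 - cos(a, b) does not satisfy it in general. A templated CoverTree would therefore either have to require a true metric or use one, such as the angular distance arccos(cos(a, b)).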

Query time for large K values

Hey, can you share an analysis of how query time changes as K increases? We tried K=200, but queries take a long time. There are 21 million test points.
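One general reason query time grows with K can be sketched as follows. The sketch assumes a branch-and-bound search that keeps the current K best candidates in a max-heap, which is a common design but not confirmed to be this library's. The search can only skip a subtree once it is provably farther than the current K-th best distance. That radius is larger for larger K, so fewer subtrees are pruned, and each heap update costs O(log K). KBest is a hypothetical helper:

```cpp
#include <cstddef>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

// Keeps the k smallest (distance, index) pairs offered so far in a max-heap.
class KBest {
public:
    explicit KBest(std::size_t k) : k_(k) {}

    void offer(double d, std::size_t idx) {
        if (k_ == 0) return;
        if (heap_.size() < k_) {
            heap_.emplace(d, idx);
        } else if (d < heap_.top().first) {  // beats the current k-th best
            heap_.pop();
            heap_.emplace(d, idx);
        }
    }

    // Current k-th best distance: the radius a tree search prunes against.
    // Infinite until k candidates have been seen.
    double worst() const {
        if (heap_.empty() || heap_.size() < k_)
            return std::numeric_limits<double>::infinity();
        return heap_.top().first;
    }

    // Results in increasing distance order (drains a copy of the heap).
    std::vector<std::pair<double, std::size_t>> sorted() const {
        auto h = heap_;
        std::vector<std::pair<double, std::size_t>> out(h.size());
        for (std::size_t i = out.size(); i-- > 0;) {
            out[i] = h.top();
            h.pop();
        }
        return out;
    }

private:
    std::size_t k_;
    std::priority_queue<std::pair<double, std::size_t>> heap_;
};
```

Under this model, the per-query cost depends on both the number of nodes visited, which rises with the pruning radius, and log K. With 21 million independent queries, spreading them across threads is the other main lever.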
