Comments (13)
Another resource: FLANN (http://www.cs.ubc.ca/~mariusm/index.php/FLANN/FLANN), which has Python bindings.
They don't mention scalability, but considering it's recent software specialized for approximate k-NN in high-dimensional spaces, this ought to be as good as it gets.
from gensim.
More resources re. approx sim search:
Just curious: considering the results in http://radimrehurek.com/2013/12/performance-shootout-of-nearest-neighbours-contestants/#survivors, do you feel there is a best library to integrate now, if this were to be done? (I'm thinking of tackling this issue.)
Hmm, I wonder if it makes sense to integrate some algo fully (as in, implement Annoy directly in Python/C/Cython). Not super difficult, but not trivial either.
Another option is to rely on Annoy as a 3rd party lib, with no deep integration, and just make it easier to use the Annoy API from gensim (or vice versa). I remember Annoy was picky about the type of input it accepts, and the API was a bit unintuitive, so working around that would be a plus. Also, reliance on C++ and Boost can make Annoy hard to install for many users.
Tackling this 4-year-old issue will be welcome :)
Yes, I see how Annoy is a bit hard to implement. I've also found https://github.com/ryanrhymes/panns, which might be a better fit. I'm going to get Annoy installed and try to do some comparisons. Alternatively, the Google Correlate algorithm doesn't seem that complicated to implement, so that could be promising as well.
I don't think the Annoy algo is that hard to implement. It's pretty straightforward IIRC.
I mean, Erik's C++ implementation is involved, because it's heavily optimized, goes for memory-mapping etc etc. But the algo itself is clean.
Either way, let me know how you progress. Would be great to finally have something efficient in gensim :)
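To illustrate why the core idea is considered clean, here is a rough sketch of an Annoy-style forest of random-projection trees in pure Python. It is a simplification under stated assumptions, not Erik's implementation: every name here (`build_tree`, `build_forest`, `query`, `leaf_size`) is made up for illustration, Euclidean distance is assumed, and the query visits only one leaf per tree, whereas the real library also explores nearby branches and memory-maps its index.

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def build_tree(vectors, ids, rng, leaf_size=8):
    """Recursively split ids with a hyperplane equidistant from two sampled points."""
    if len(ids) <= leaf_size:
        return ("leaf", ids)
    p, q = (vectors[i] for i in rng.sample(ids, 2))
    normal = [x - y for x, y in zip(p, q)]
    midpoint = [(x + y) / 2 for x, y in zip(p, q)]
    offset = dot(normal, midpoint)
    left = [i for i in ids if dot(normal, vectors[i]) < offset]
    right = [i for i in ids if dot(normal, vectors[i]) >= offset]
    if not left or not right:
        # Degenerate split (for example, the two sampled points coincide).
        return ("leaf", ids)
    return ("node", normal, offset,
            build_tree(vectors, left, rng, leaf_size),
            build_tree(vectors, right, rng, leaf_size))

def build_forest(vectors, n_trees=10, leaf_size=8, seed=0):
    """Build several independent trees; more trees trade memory for recall."""
    rng = random.Random(seed)
    ids = list(range(len(vectors)))
    return [build_tree(vectors, ids, rng, leaf_size) for _ in range(n_trees)]

def leaf_for(tree, v):
    """Follow the splits down to the single leaf that v falls into."""
    while tree[0] == "node":
        _, normal, offset, left, right = tree
        tree = left if dot(normal, v) < offset else right
    return tree[1]

def query(forest, vectors, v, k):
    """Union the candidate leaves from all trees, then rank them exactly."""
    candidates = set()
    for tree in forest:
        candidates.update(leaf_for(tree, v))
    return sorted(candidates, key=lambda i: sq_dist(vectors[i], v))[:k]
```

Because only the candidates from the visited leaves are ranked, the result is approximate: a true neighbour that lands on the other side of every split is missed, which is why a forest of several trees is used.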
This would be incredibly useful! Is there any update on approximate sim. search in gensim (i.e. is anyone working on it)?
The only update is, @erikbern (author of Annoy) left Spotify... but he still works on Annoy, somehow :)
On the other hand, Annoy has shed its dependency on Boost + got several cleanups, fixes and improvements recently, so it's become much more viable as a 3rd party lib.
I think I'd prefer to keep the brute force exact kNN in gensim (for small problems, <1M items) and integrate cleanly with Annoy's approximate kNN for larger datasets.
@jodaiber or do you have other ideas?
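For reference, the exact brute-force side of that split can be sketched in a few lines. This is a minimal illustration assuming cosine similarity over plain Python lists; the names `normalize` and `exact_knn` are hypothetical, and this is not gensim's actual code.

```python
import heapq
import math

def normalize(v):
    """Scale v to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)

def exact_knn(index, query, k):
    """Score the query against every indexed vector and keep the k best.

    `index` is assumed to hold unit-length vectors. Returns (id, similarity)
    pairs sorted from most to least similar.
    """
    q = normalize(query)
    scored = ((sum(a * b for a, b in zip(row, q)), i) for i, row in enumerate(index))
    return [(i, s) for s, i in heapq.nlargest(k, scored)]

# Toy index of four 2-d vectors, normalized once up front.
index = [normalize(v) for v in [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]]
result = exact_knn(index, [2.0, 0.1], k=2)
```

Every query touches every indexed vector, so the cost grows linearly with the number of items; that is the cost an approximate index avoids on larger datasets.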
I'm all for integrating Annoy. Obv I'm biased though :).
I'm currently running some benchmarks that could be relevant: https://github.com/erikbern/ann-benchmarks
Annoy has been integrated; see the tutorial notebook at https://github.com/RaRe-Technologies/gensim/blob/develop/docs/notebooks/annoytutorial.ipynb
nice!
@tmylk can we change the tutorial to use a more meaningful dataset?
How about the GoogleNews word2vec model (3,000,000 x 300 matrix)? Lots of people use that.
I agree that it's a more illustrative example to show benefits of Annoy. It would look great in a blog post. For the tutorial we chose something that easily runs on a laptop.