dataplayer12 / fly-lsh
An implementation of efficient LSH inspired by fruit fly brain
License: MIT License
Why is there a blog directory with duplicated files? Please consolidate the files and remove the duplicates.
elif self.name=='CIFAR10':
    for batch_num in [1,2,3,4,5]:
        filename=self.name+'/train_batch_'+str(batch_num)+'.p'
        with open(filename,mode='rb') as f:
            features,labels=pickle.load(f)
        for begin in range(0,len(features),batch_size):
            end=min(begin+batch_size,len(features))
            yield features[begin:end],labels[begin:end]
What does this code mean, and how should the downloaded CIFAR files be processed?
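In short, the quoted branch opens each of five pickled training files, unpacks a (features, labels) pair from it, and yields the data in slices of at most batch_size items. A minimal sketch of how raw downloads might be turned into that layout follows. It assumes the Python version of CIFAR-10, whose data_batch_N files unpickle to a dict with b'data' and b'labels' keys. The helper names convert_batch and iter_batches are hypothetical, and any normalization or reshaping the repository may expect is not shown.

```python
import pickle


def convert_batch(src_path, dst_path):
    """Rewrite one raw CIFAR-10 batch as a pickled (features, labels) tuple."""
    with open(src_path, mode='rb') as f:
        # encoding='bytes' is needed because the raw files were pickled under Python 2.
        raw = pickle.load(f, encoding='bytes')
    features, labels = raw[b'data'], raw[b'labels']
    with open(dst_path, mode='wb') as f:
        pickle.dump((features, labels), f)


def iter_batches(path, batch_size):
    """Yield (features, labels) slices of at most batch_size items, as the quoted loop does."""
    with open(path, mode='rb') as f:
        features, labels = pickle.load(f)
    for begin in range(0, len(features), batch_size):
        end = min(begin + batch_size, len(features))
        yield features[begin:end], labels[begin:end]
```

Under these assumptions, calling convert_batch once per raw file with a destination such as CIFAR10/train_batch_1.p would produce the file names the quoted loop builds.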
Hi!
What would be the license for your code?
I stumbled on your excellent article at https://medium.com/@jaiyamsharma/efficient-nearest-neighbors-inspired-by-the-fruit-fly-brain-6ef8fed416ee
Thanks!
Hello author! I found that the code uses only the distance between images as labels, so findmap does not meet the requirements of mAP (mean average precision) as used in image retrieval. I also tested on the CIFAR10 dataset: I extracted the hash values and saved them to a CSV file. With the usual mAP calculation, the retrieval accuracy was only 10%.
Why doesn't the implementation achieve the results expected from the paper? (China)
Dear author! I am trying to use DenseFly to find the k nearest neighbors and construct a nearest neighbor network. Do you have any suggestions for constructing this network efficiently? Or should I simply call the query function in the LSH class iteratively? Thank you so much!
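One possible way to build such a network, assuming the binary hash of every item is already available as a sequence of 0/1 values, is to compare hashes by Hamming distance and keep the k closest items per node. The sketch below is brute force and does not use the repository's LSH class, whose query signature is not shown here. The names pack_bits and knn_graph are hypothetical.

```python
import heapq


def pack_bits(bits):
    """Pack a sequence of 0/1 values into one Python int so XOR compares all bits at once."""
    value = 0
    for bit in bits:
        value = (value << 1) | (1 if bit else 0)
    return value


def knn_graph(hashes, k):
    """Return {node: [k nearest other nodes]} under Hamming distance between binary hashes."""
    packed = [pack_bits(h) for h in hashes]
    graph = {}
    for i, hi in enumerate(packed):
        # The popcount of the XOR counts the bits on which two hashes differ.
        candidates = ((bin(hi ^ hj).count('1'), j)
                      for j, hj in enumerate(packed) if j != i)
        # nsmallest orders by (distance, index), so ties go to the lower index.
        graph[i] = [j for _, j in heapq.nsmallest(k, candidates)]
    return graph
```

This costs a number of comparisons quadratic in the number of items. For large collections, the candidates for each node could instead come from an approximate index query, with the same top-k step applied to them.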
According to the definition of WTAHash in the paper, FlyHash should set the maximum in every block of length k to 1 and set the other values in the block to 0. FlyHash should therefore produce a binary vector of length mk in which m values are 1.
But in the implementation of flylsh, I found that the code actually does the opposite: the block length is m (hash_length), because the code finds the top hash_length elements in each row of the activation.
Line 446 in 23f8975
Am I misunderstanding the problem? If I am right, it is still not a very severe problem; you could fix it just by swapping the parameters' names.
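The two readings can be compared side by side. The sketch below is an illustration under stated assumptions, not the repository's code: blockwise_wta follows the definition quoted in this issue (one winner in each block of length k), while global_top_k keeps the top_k largest activations across the whole row, which is the behaviour the issue attributes to the implementation. Both function names are hypothetical.

```python
def blockwise_wta(activations, k):
    """Split the row into blocks of length k and set only each block's maximum to 1."""
    if len(activations) % k != 0:
        raise ValueError("row length must be a multiple of k")
    code = [0] * len(activations)
    for start in range(0, len(activations), k):
        block = activations[start:start + k]
        # index() returns the first maximum, so ties go to the earliest position.
        code[start + block.index(max(block))] = 1
    return code


def global_top_k(activations, top_k):
    """Set the top_k largest activations of the whole row to 1 and the rest to 0."""
    order = sorted(range(len(activations)),
                   key=lambda i: activations[i], reverse=True)
    code = [0] * len(activations)
    for i in order[:top_k]:
        code[i] = 1
    return code
```

For a row of length mk, both functions produce m ones when top_k equals m, but only the block-wise variant guarantees exactly one 1 per block.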
Dear author, I tried to apply the DenseFly algorithm to a large and very sparse dataset with 21612 rows x 28065 columns and a density of about 0.017. My purpose is to hash the rows and query their nearest neighbors efficiently. I notice that DenseFly first projects the data into a high dimensional space (self.hash) and then reduces the hash to a low dimensional space (self.lowd_hashes). Does that mean I have to set the embedding_size variable larger than 28065? (Or reduce the number of columns to fewer than embedding_size?)
Meanwhile, as far as I know, projecting the data into a high dimensional space first presupposes that the data matrix is dense, since the purpose of the projection is to make the elements stand out more clearly in a sparse high dimensional representation. Does that mean DenseFly is not appropriate for processing a sparse dataset?
Thank you very much.
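On the dimension question, a small sketch may help separate the two sizes involved. It assumes a FlyHash-style sparse binary projection in which each output unit sums a random subset of input columns; under that assumption the number of output units is a free parameter, and only the non-zero entries of a sparse row contribute to the sums. The names make_projection and project_sparse_row, and the dict-based row format, are hypothetical and are not taken from the repository.

```python
import random


def make_projection(input_dim, num_units, samples_per_unit, seed=0):
    """For each output unit, pick the set of input columns it sums over."""
    rng = random.Random(seed)
    return [frozenset(rng.sample(range(input_dim), samples_per_unit))
            for _ in range(num_units)]


def project_sparse_row(row, projection):
    """Project one sparse row, given as {column: value}, onto every output unit."""
    # Only stored (non-zero) entries are visited, so the cost scales with the row's density.
    return [sum(value for col, value in row.items() if col in columns)
            for columns in projection]
```

Whether a given number of output units preserves neighbors well on data this sparse is an empirical question that the sketch does not answer.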