Comments (10)

spkrka commented on July 25, 2024

There is some description of it in the readme, but I agree, it could be explained better and in more detail. I'll try to do that if I find the time.

from sparkey.

liangdong commented on July 25, 2024

Aha, I think I now understand how the hash algorithm works. Am I right? Thank you~

1, To insert a k/v pair:
2, Calculate hash(key).
3, Calculate hash(key) % capacity to find the slot where the key should go (the table is a linear-probing hash table with num_of_entry * 1.3 slots).
4, If the slot is unused, put the entry there.
5, Otherwise, step forward one slot at a time to find an empty slot.
6, As README.md says: "As soon as we reach a slot with a smaller displacement than our own, we shift the following slots up until the first empty slot one step and insert our own element."

If, while stepping, we find a slot whose displacement is smaller than our own, we first save that slot's hash and address, put our k/v into the slot, and then continue stepping to the next slot with the saved hash and address.

According to the algorithm, we can stop stepping as soon as we find a slot whose displacement is smaller than ours, because our key cannot be located after that position.

I guess the goal is to reduce the displacement of every key so that the displacements are balanced. Am I right?
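
The steps above can be sketched roughly as follows. This is a minimal illustration in Python, not Sparkey's actual C implementation: the table is a plain list, the function names `displacement` and `insert` are made up for the example, the hash values are small hypothetical integers, and duplicate keys are not handled.

```python
def displacement(pos, key_hash, capacity):
    """Slots between an entry's ideal position and pos, with wrap-around."""
    return (pos - key_hash % capacity) % capacity


def insert(table, key_hash, address):
    """Insert (key_hash, address) into table, a list of tuples or None.

    Assumption for this sketch: hashes are distinct, so duplicate-key
    handling is left out.
    """
    capacity = len(table)
    pos = key_hash % capacity
    entry = (key_hash, address)
    disp = 0  # displacement of the entry we are currently carrying
    for _ in range(capacity):
        resident = table[pos]
        if resident is None:
            table[pos] = entry
            return
        resident_disp = displacement(pos, resident[0], capacity)
        if resident_disp < disp:
            # The resident is closer to its ideal slot than we are:
            # take its slot and carry the resident onward instead.
            table[pos], entry = entry, resident
            disp = resident_disp
        pos = (pos + 1) % capacity
        disp += 1
    raise RuntimeError("hash table is full")


# Hypothetical hashes: 0, 8 and 16 all map to slot 0; 1 maps to slot 1.
table = [None] * 8
for key_hash, address in [(0, "a"), (8, "b"), (1, "c"), (16, "d")]:
    insert(table, key_hash, address)
```

In this run the entry with hash 16 takes slot 2 and pushes the entry with hash 1 to slot 3, so the maximum displacement is 2. Inserting at the nearest free slot without reordering would have left hash 16 in slot 3 with a displacement of 3.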

spkrka commented on July 25, 2024

Yes, that's correct. The reordering of hash entries is to balance the displacements, by minimizing the maximum. This does nothing to improve the average lookup, but the worst case lookup will be better. And by knowing the maximum displacement for all of the hash table, we know when to abort on a key-miss.
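
As a rough sketch of the key-miss abort (Python for illustration, not the actual Sparkey code): the function name `lookup`, the table layout and the hash values are assumptions made for the example, and matching on the hash alone is a simplification, since a real lookup would also have to compare the stored key.

```python
def lookup(table, key_hash, max_displacement):
    """Return the address stored for key_hash, or None on a miss.

    table is a list of (hash, address) tuples or None. No entry sits
    more than max_displacement slots past its ideal slot, so the probe
    can stop after max_displacement + 1 slots.
    """
    capacity = len(table)
    start = key_hash % capacity
    for disp in range(max_displacement + 1):
        slot = table[(start + disp) % capacity]
        if slot is None:
            return None  # an empty slot also ends the probe
        if slot[0] == key_hash:
            return slot[1]
    return None  # probed past the maximum displacement: key is absent


# Hypothetical table of 8 slots whose maximum displacement is 2.
table = [(0, "a"), (8, "b"), (16, "d"), (1, "c"), None, None, None, None]

found = lookup(table, 16, 2)   # stored two slots past its ideal slot
missing = lookup(table, 24, 2) # gives up after probing three slots
```

The miss for hash 24 is detected after three probes, before the scan ever reaches slot 3 or the first empty slot.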

liangdong commented on July 25, 2024

Thank you very, very much, you are so enthusiastic ^ ^. And the algorithm is so ingenious too ~

rohansingh commented on July 25, 2024

Renamed to better reflect actual issue.

nresare commented on July 25, 2024

Given that persistent data will be read in block-sized chunks (something like 512 bytes), the likelihood of a displaced hash entry resulting in more than one read request is very small. I think that hash entry reordering might have a complexity cost that is higher than the performance benefit, but now that we have it we might as well keep it :)

spkrka commented on July 25, 2024

It's actually the same complexity cost compared to just adding at the nearest free slot. There are some extra memory writes, but it's private memory that should be in cache already, so that should not be a big cost.

The average displacement (which you can get by running "sparkey info" on a file) is really low since we have 30% extra hash capacity; I think it evaluates to slightly less than 2. I've seen maximum displacements of around 50 slots, which would mean 800 bytes away, but then again, that's the extreme worst case.
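
As a rough check of these numbers, here is a small Python sketch. It assumes 16 bytes per slot (inferred from 50 slots being 800 bytes) and the 512-byte block size mentioned earlier in the thread; `blocks_touched` is a hypothetical helper, and real block sizes and alignment may differ.

```python
SLOT_SIZE = 16    # bytes per slot, inferred from 50 slots = 800 bytes
BLOCK_SIZE = 512  # "something like 512 bytes", an assumption

def blocks_touched(start_offset, displacement):
    """Number of blocks read when probing from the slot at start_offset
    through the slot displacement steps further on (both inclusive)."""
    first_block = start_offset // BLOCK_SIZE
    last_byte = start_offset + (displacement + 1) * SLOT_SIZE - 1
    return last_byte // BLOCK_SIZE - first_block + 1

worst_case_bytes = 50 * SLOT_SIZE           # 800
typical = blocks_touched(0, 2)              # probe stays inside one block
near_boundary = blocks_touched(496, 2)      # starts in a block's last slot
worst_case = blocks_touched(0, 50)          # 51 slots starting at a block edge
```

With a displacement of 2 and slot-aligned offsets, a probe crosses a block boundary only when it starts in one of the last two slots of a block, which is 2 of the 32 slot positions per block under these assumptions.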

liangdong commented on July 25, 2024

Thanks again for sharing, it helps a lot.

From: Kristofer Karlsson [mailto:[email protected]]
Sent: September 3, 2013 21:26
To: spotify/sparkey
Cc: Liang,Dong(Client-RD)
Subject: Re: [sparkey] Improve hash algorithm description in README (#4)

spkrka commented on July 25, 2024

I added a more visual example of the hash algorithm now - if you think it helps explain it I could close the issue.

liangdong commented on July 25, 2024

Hi, I have read through the example, and your description is easy to understand. I think this may help other people who are interested in your project, like me. Thanks again for your enthusiastic reply :)
