karanlyons / murmurhash3.js
MurmurHash3, in JavaScript.
License: MIT License
Hi there!
First of all, thank you for your work! I've been successfully using this library in production for a couple of years and it's been very useful.
Recently we've started using MurmurHash3 on other platforms. We need the results to match, and we noticed discrepancies between the output of the JS version and the other platforms when the input contained characters outside regular ASCII (i.e. `charCodeAt` is not between 0 and 127).
This is because in some places the code does `key.charCodeAt(i) & 0xff` and in other places just `key.charCodeAt(i)`. The byte representation of regular ASCII characters is identical to the character code, so for e.g. alphanumeric input this doesn't matter. If the input characters are outside this range, the results start to diverge from the reference implementation.
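To make the difference concrete, here is a minimal sketch (not code from the library) that prints three views of the same string: the raw UTF-16 code units from `charCodeAt`, those code units masked with `& 0xff`, and the UTF-8 bytes a bytes-based implementation would see. The helper name `views` is made up for illustration, and the sketch assumes a runtime that provides the standard `TextEncoder`.

```javascript
// Hypothetical helper: three views of one string.
function views(key) {
  const codeUnits = [];
  const masked = [];
  for (let i = 0; i < key.length; i++) {
    codeUnits.push(key.charCodeAt(i));      // raw UTF-16 code unit
    masked.push(key.charCodeAt(i) & 0xff);  // low byte only
  }
  // UTF-8 bytes, as a C++ caller would typically pass them
  const utf8 = Array.from(new TextEncoder().encode(key));
  return { codeUnits, masked, utf8 };
}

console.log(views('eel')); // all three views are [101, 101, 108]
console.log(views('吉'));  // codeUnits [21513], masked [9], utf8 [229, 144, 137]
```

For ASCII input the three views coincide, which is why the hashes only diverge once non-ASCII characters appear.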
All three variants have this problem. For example, here's the output for the x86 32-bit version:
Input | C++ result (reference) | murmurhash3js result |
---|---|---|
'My hovercraft is full of eels.' | 2953494853 | 2953494853 |
'My 🚀 is full of 🦎.' | 1818098979 | 2899624186 |
'吉 星 高 照' | 3435142074 | 4163339522 |
The string was UTF-8 encoded before being passed to the C++ reference, as that expects bytes. I think it's fair to expect that people using other implementations that ask for bytes will do the same.
I decided to change the signature of the function to make it expect bytes. I checked my implementation along with a few others against the reference C++ implementation. You can read more about it and try out an interactive version of the comparison here.
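As an illustration of what a bytes-based signature can look like, here is a minimal sketch of the x86 32-bit variant written against the public reference algorithm. It is not the code of either library; the function name `murmur3x86_32` and the default seed of 0 are assumptions made for this example.

```javascript
// Sketch only: MurmurHash3 x86 32-bit over a Uint8Array (or array of bytes).
function murmur3x86_32(bytes, seed = 0) {
  const c1 = 0xcc9e2d51;
  const c2 = 0x1b873593;
  const rotl = (x, r) => (x << r) | (x >>> (32 - r));
  const blockEnd = bytes.length - (bytes.length % 4);
  let h1 = seed | 0;

  // Body: consume four bytes at a time as a little-endian 32-bit word.
  for (let i = 0; i < blockEnd; i += 4) {
    let k1 = bytes[i] | (bytes[i + 1] << 8) | (bytes[i + 2] << 16) | (bytes[i + 3] << 24);
    k1 = Math.imul(k1, c1);
    k1 = rotl(k1, 15);
    k1 = Math.imul(k1, c2);
    h1 ^= k1;
    h1 = rotl(h1, 13);
    h1 = (Math.imul(h1, 5) + 0xe6546b64) | 0;
  }

  // Tail: the remaining 1 to 3 bytes; the cases deliberately fall through.
  let k1 = 0;
  switch (bytes.length % 4) {
    case 3: k1 ^= bytes[blockEnd + 2] << 16; // falls through
    case 2: k1 ^= bytes[blockEnd + 1] << 8;  // falls through
    case 1:
      k1 ^= bytes[blockEnd];
      k1 = Math.imul(k1, c1);
      k1 = rotl(k1, 15);
      k1 = Math.imul(k1, c2);
      h1 ^= k1;
  }

  // Finalization: mix in the length, then avalanche.
  h1 ^= bytes.length;
  h1 ^= h1 >>> 16;
  h1 = Math.imul(h1, 0x85ebca6b);
  h1 ^= h1 >>> 13;
  h1 = Math.imul(h1, 0xc2b2ae35);
  h1 ^= h1 >>> 16;
  return h1 >>> 0; // unsigned 32-bit result
}

// Callers encode strings themselves, e.g. as UTF-8:
const hash = murmur3x86_32(new TextEncoder().encode('My hovercraft is full of eels.'));
```

With this shape the encoding decision is explicit at the call site, so the result can be compared directly against any implementation that hashes the same bytes.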
Since I needed a quick release and the new signature is a major/breaking change compared to this implementation, I published my own version of the library as `murmurhash3js-revisited`. I tried to keep all attribution, but if you have any concerns please let me know!
This issue was copied from pid#3. I was using that version of murmurhash, but since it was forked from this one and the code looks the same, this repository seems to have the same problem.
Cheers!
hi,
thanks for the implementation of murmurHash3.
I restructured your sources (https://github.com/pid/murmurHash3js) to get them ready for publishing the module to the npm registry...
I would also take care of adding it to the Bower registry and publishing it to a CDN hosting service (e.g. http://cdnjs.com/)...
The question is: do you want to manage it yourself, or should I do these tasks?
Note: I want to use this in a project, and for convenience it should be in the npm registry :-)
Hi @karanlyons
I can't get the hash128 output of this library to match Python's mmh3.
Python mmh3:
hex(mmh3.hash128("I will not buy this tobacconist's, it is scratched."))
Yields: 0x67d73523f0079673d30654abbd8227e3
But in your readme:
murmurHash3.x64.hash128("I will not buy this tobacconist's, it is scratched.");
Yields: d30654abbd8227e367d73523f0079673
Why is there a mismatch?
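Looking only at the two values quoted above, they contain the same two 64-bit halves in opposite order. That is an observation about these two digests, not a confirmed description of how either library builds its output. Under that assumption, a small sketch with a made-up helper `swapHalves` converts one form into the other:

```javascript
// Hypothetical helper: swap the two 64-bit halves of a 128-bit hex digest.
function swapHalves(hex128) {
  if (hex128.length !== 32) {
    throw new RangeError('expected 32 hex characters');
  }
  return hex128.slice(16) + hex128.slice(0, 16);
}

// The README value, with its halves swapped, equals the Python value (minus the 0x prefix).
console.log(swapHalves('d30654abbd8227e367d73523f0079673'));
// prints '67d73523f0079673d30654abbd8227e3'
```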
In the JavaScript function x86.hash32, the switch(remainder) statement at line 237 modifies k1 in each case. However, for case 3 and 2, h does not appear to be modified by k1 before exiting the function (in case 1, it is clearly modified).
What am I missing?
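One possible explanation, assuming the cases in that switch have no `break` statements (as in the reference C++ tail handling): JavaScript `switch` cases fall through, so entering at case 3 or case 2 continues into case 1, where `k1` is mixed into the hash. The sketch below uses a made-up helper `mixTail` and plain byte arrays to show the effect; it is not the library's code.

```javascript
// Hypothetical helper: mix 0 to 3 trailing bytes into h1 using fall-through.
function mixTail(h1, tail) {
  let k1 = 0;
  switch (tail.length) {
    case 3:
      k1 ^= tail[2] << 16; // no break: continues into case 2
    case 2:
      k1 ^= tail[1] << 8;  // no break: continues into case 1
    case 1:
      k1 ^= tail[0];
      k1 = Math.imul(k1, 0xcc9e2d51);
      k1 = (k1 << 15) | (k1 >>> 17);
      k1 = Math.imul(k1, 0x1b873593);
      h1 ^= k1;            // reached from cases 3, 2, and 1
  }
  return h1 >>> 0;
}

// Entering at case 3 still mixes all three bytes into the result.
console.log(mixTail(0, [1, 2, 3]) !== mixTail(0, [1])); // prints true
```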
Is there a pre-built ES5 .js file that I can use (via the jsDelivr CDN, which just 'copies' GitHub/npm)?
I don't want to have to build ES5 myself, I just want to use it.
What is the ETA for an npm release? (The package name could just be the name with a 'karan' prefix.)
ASCII only? Other implementations are explicit that they are ASCII-only. Does this one handle UTF-8?