tufo830 / pyfastmurmurhash3

This project forked from dream2333/pyfastmurmurhash3

A Python extension module for MurmurHash3, developed using a mix of C and Cython. The fastest MurmurHash3 implementation, a C + Cython hybrid, intended for text fingerprinting and Bloom filter deduplication.

License: MIT License

Python 34.55% C 59.25% Cython 6.20%

pyfastmurmurhash3's Introduction

FastMurmurHash3

For the Chinese documentation, click here.

fmmh3 is a Python extension module written in a mix of C and Cython. It wraps a C implementation of the MurmurHash3 hash function and exposes it to Python. In the benchmarks below, fmmh3 is roughly 15 to 540 times as fast as a pure Python implementation of MurmurHash3. Compared to mmh3, another C-based library, fmmh3 is about 1.1 to 2.5 times as fast, with the largest gains on small and medium inputs.
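For orientation, the 32-bit x86 variant of MurmurHash3 can be sketched in pure Python as follows. This is a minimal reference sketch of the public algorithm, not the code fmmh3 ships and not the pure Python implementation used in the benchmark; the name murmur3_32 is hypothetical. It returns an unsigned 32-bit value, and whether fmmh3's hash32_x86 returns signed or unsigned integers is not stated in this README.

```python
MASK32 = 0xFFFFFFFF


def _rotl32(x: int, r: int) -> int:
    """Rotate a 32-bit value left by r bits."""
    return ((x << r) | (x >> (32 - r))) & MASK32


def _mix_block(k: int) -> int:
    """Scramble one (up to) 4-byte block before it is folded into the state."""
    k = (k * 0xCC9E2D51) & MASK32
    k = _rotl32(k, 15)
    return (k * 0x1B873593) & MASK32


def murmur3_32(key: bytes, seed: int = 0) -> int:
    """MurmurHash3 x86 32-bit; returns an unsigned 32-bit integer."""
    h = seed & MASK32
    nblocks = len(key) // 4

    # Body: consume the input four bytes at a time, little-endian.
    for i in range(nblocks):
        k = int.from_bytes(key[4 * i:4 * i + 4], "little")
        h ^= _mix_block(k)
        h = _rotl32(h, 13)
        h = (h * 5 + 0xE6546B64) & MASK32

    # Tail: the remaining 1 to 3 bytes, if any.
    tail = key[4 * nblocks:]
    if tail:
        h ^= _mix_block(int.from_bytes(tail, "little"))

    # Finalization: fold in the length and force the bits to avalanche.
    h ^= len(key)
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & MASK32
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & MASK32
    h ^= h >> 16
    return h


print(hex(murmur3_32(b"hello world", 0)))
```

Every step runs in the Python interpreter one integer operation at a time, which is why a compiled implementation can be orders of magnitude faster on the same input.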

Installation

Using pip

pip install fmmh3

Using Poetry

poetry add fmmh3

Benchmark Tests

We compared the performance of fmmh3, the pure Python version of MurmurHash3, and the mmh3 library bound with ctypes. Here are our test results:

Byte string length (bytes)   MurmurHash3 (Python)   mmh3     fmmh3
1                            1x                     6.27x    15.62x
10                           1x                     9.43x    23.08x
512                          1x                     197x     373x
1000                         1x                     324x     538x

When the byte string is larger than 1 KB, the pure Python implementation exceeds the test's time limit, so it is excluded from the results above 1 KB. The following table compares mmh3 and fmmh3 directly, with mmh3 as the baseline:

Byte string length (bytes)   mmh3   fmmh3
1                            1x     2.51x
10                           1x     2.44x
100                          1x     2.36x
512                          1x     1.90x
1000                         1x     1.65x
5000                         1x     1.18x
10000                        1x     1.09x

fmmh3's advantage over mmh3 is largest on short inputs (about 2.5x at 1 byte) and narrows as the input grows, to about 1.1x at 10,000 bytes.
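A comparison of this kind can be reproduced with the standard timeit module. The sketch below is an assumption about how such a measurement might be set up, not the project's actual benchmark script; relative_speed is a hypothetical helper. If neither mmh3 nor fmmh3 is installed, it falls back to zlib checksums as stand-ins, purely so that the script still runs.

```python
import timeit
import zlib


def relative_speed(funcs, lengths, number=1000, repeat=3):
    """Time each function on inputs of each length.

    Returns {length: {name: speedup relative to the first function}}.
    """
    names = list(funcs)
    baseline = names[0]
    table = {}
    for n in lengths:
        data = bytes(i % 256 for i in range(n))
        best = {}
        for name in names:
            fn = funcs[name]
            # Keep the fastest of several runs to reduce scheduling noise.
            best[name] = min(
                timeit.repeat(lambda: fn(data), number=number, repeat=repeat)
            )
        table[n] = {name: best[baseline] / best[name] for name in names}
    return table


candidates = {}
try:
    import mmh3
    candidates["mmh3"] = lambda b: mmh3.hash(b, 0)
except ImportError:
    pass
try:
    from fmmh3 import hash32_x86
    candidates["fmmh3"] = lambda b: hash32_x86(b, 0)
except ImportError:
    pass
if not candidates:
    # Stand-ins only: these are checksums, not MurmurHash3.
    candidates = {"zlib.crc32": zlib.crc32, "zlib.adler32": zlib.adler32}

for length, row in relative_speed(candidates, [1, 10, 100, 1000]).items():
    print(length, {name: f"{ratio:.2f}x" for name, ratio in row.items()})
```

The lambda wrappers add a small constant cost to every call, which matters most at the shortest input lengths, so absolute ratios from this sketch will not match the tables above exactly.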

Function Usage

fmmh3 provides three functions to calculate MurmurHash3 hash values: hash32_x86, hash128_x86, and hash128_x64:

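from fmmh3 import hash32_x86, hash128_x86, hash128_x64

key = b"hello world"  # the input must be a byte string
seed = 0              # hash seed

# 32-bit hash, x86 variant
hash32_value = hash32_x86(key, seed)
# 128-bit hash, x86 variant
hash128_x86_value = hash128_x86(key, seed)
# 128-bit hash, x64 variant
hash128_x64_value = hash128_x64(key, seed)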

Each function returns the hash value as an integer. key is the byte string to hash, and seed is the hash seed, usually a prime number.
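Since the project targets Bloom filter deduplication, the following sketch shows one way hash32_x86 might be used for that purpose. The BloomFilter class and dedupe helper are hypothetical and not part of fmmh3. The filter derives its bit positions from two seeded hashes (double hashing), and the sizes chosen are illustrative assumptions. If fmmh3 is not installed, a hashlib-based stand-in with the same call shape is used so the example still runs; that stand-in is not MurmurHash3.

```python
import hashlib

try:
    from fmmh3 import hash32_x86
except ImportError:
    def hash32_x86(key, seed):
        # Stand-in with the same (key, seed) call shape; NOT MurmurHash3.
        salt = seed.to_bytes(8, "little")
        digest = hashlib.blake2b(key, digest_size=4, salt=salt).digest()
        return int.from_bytes(digest, "little")


class BloomFilter:
    """Fixed-size Bloom filter over byte strings (illustrative sketch)."""

    def __init__(self, num_bits=1 << 20, num_hashes=7):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Double hashing: combine two seeded hashes into num_hashes positions.
        # Masking keeps the values non-negative whether or not the hash
        # function returns signed integers.
        h1 = hash32_x86(key, 0) & 0xFFFFFFFF
        h2 = (hash32_x86(key, 1) & 0xFFFFFFFF) | 1  # odd, so never zero
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, key):
        for p in self._positions(key):
            self.bits[p >> 3] |= 1 << (p & 7)

    def __contains__(self, key):
        return all(
            self.bits[p >> 3] & (1 << (p & 7)) for p in self._positions(key)
        )


def dedupe(texts):
    """Return texts in their original order with probable duplicates dropped."""
    seen = BloomFilter()
    unique = []
    for text in texts:
        key = text.encode("utf-8")  # the hash functions take byte strings
        if key not in seen:
            seen.add(key)
            unique.append(text)
    return unique


print(dedupe(["apple", "banana", "apple", "cherry", "banana"]))
```

A Bloom filter never misses an item it has already seen, but it can occasionally report an unseen item as present. The bit count and number of hashes should therefore be sized for the expected number of items and the false-positive rate that is acceptable.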

Author

This project was developed by Dream2333.

The MurmurHash algorithm was originally proposed by Austin Appleby.

The C version of the algorithm comes from PeterScott.

The Python version used in the benchmark test comes from wc-duck.

Contribution

If you want to contribute to this project, you can:

  • Report issues or suggest improvements on GitHub.
  • Submit pull requests to fix issues or add new features.
  • Share this project to let more people know about it.

License

This project is licensed under the MIT license.
