dpayne / loglogbeta

A C++ implementation of the LogLogBeta cardinality estimation algorithm that makes use of AVX-512 intrinsics.

License: MIT License

CMake 22.89% C++ 64.04% Java 13.07%

loglogbeta

A C++ implementation of the LogLogBeta cardinality estimation algorithm that makes use of AVX operations.

The original code was ported from a Go implementation by seiflotfy, loglogbeta.

The algorithm comes from the paper by Jason Qin, Denys Kim, and Yumei Tung, "LogLog-Beta and More: A New Algorithm for Cardinality Estimation Based on LogLog Counting".
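As a rough illustration of what the estimator computes, the paper's formula is E = alpha(m) * m * (m - z) / (beta(z) + sum of 2^(-M[i])), where M holds the m registers and z is the number of registers that are still zero. The sketch below is not this repository's code: the function names are hypothetical, and `beta` is passed in by the caller because its polynomial coefficients are fitted per precision in the paper and are not reproduced here.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Bias-correction constant used by the HyperLogLog family for m >= 128 registers.
inline double alpha(std::size_t m) {
    return 0.7213 / (1.0 + 1.079 / static_cast<double>(m));
}

// Sketch of the LogLog-Beta estimate:
//   E = alpha(m) * m * (m - z) / (beta(z) + sum_i 2^(-M[i]))
// `beta` is an assumption of this example: the caller supplies the
// precision-specific correction polynomial from the paper.
inline double estimate_cardinality(const std::vector<std::uint8_t>& registers,
                                   const std::function<double(double)>& beta) {
    const double m = static_cast<double>(registers.size());
    double sum = 0.0;    // sum of 2^(-register)
    double zeros = 0.0;  // number of registers still at zero
    for (std::uint8_t r : registers) {
        sum += std::ldexp(1.0, -static_cast<int>(r));  // 2^(-r)
        if (r == 0) zeros += 1.0;
    }
    return alpha(registers.size()) * m * (m - zeros) / (beta(zeros) + sum);
}
```

An empty sketch (all registers zero) yields an estimate of 0 because the (m - z) factor vanishes, regardless of the `beta` supplied.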

Build

git clone git@github.com:dpayne/loglogbeta.git

# clone the hashing library xxHash and the testing/benchmark libraries
git submodule update --init

mkdir build
cd build
cmake ../

# To enable the unit tests and perf tests, use
#cmake -DBUILD_PERF_TESTS=ON ../

make

Results

Error Rate

Actual     Estimated   Error Rate
10         10          0
100        99          0.01
1000       999         0.001
10000      10054       0.0054
100000     99408       0.00592
1000000    1017794     0.017794

Performance

Operation                         Non-AVX Time (ns)   AVX Time (ns)
Add Hash                          2                   2
Estimate Cardinality              62,381              1,555
Merge                             9,133               192
Add 100,000 Hashes and Estimate   237,703             142,389

All tests were run on a Skylake CPU running at 4.6 GHz.
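To give a sense of why the vectorized merge is so much faster: merging two sketches in the LogLog family is an element-wise maximum over the register arrays, which maps directly onto SIMD byte-max instructions. The following is a minimal sketch under stated assumptions, not the repository's implementation: it assumes one-byte registers, uses AVX2 rather than AVX-512, and `merge_registers` is a hypothetical name. It falls back to a scalar loop when AVX2 is not enabled at compile time.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>
#if defined(__AVX2__)
#include <immintrin.h>
#endif

// Merge `src` into `dst` by taking the per-register maximum.
// If the sizes differ, only the common prefix is merged.
inline void merge_registers(std::vector<std::uint8_t>& dst,
                            const std::vector<std::uint8_t>& src) {
    const std::size_t n = std::min(dst.size(), src.size());
    std::size_t i = 0;
#if defined(__AVX2__)
    // Process 32 one-byte registers per iteration.
    for (; i + 32 <= n; i += 32) {
        const __m256i a =
            _mm256_loadu_si256(reinterpret_cast<const __m256i*>(dst.data() + i));
        const __m256i b =
            _mm256_loadu_si256(reinterpret_cast<const __m256i*>(src.data() + i));
        _mm256_storeu_si256(reinterpret_cast<__m256i*>(dst.data() + i),
                            _mm256_max_epu8(a, b));
    }
#endif
    // Scalar tail (and the whole loop when AVX2 is not enabled).
    for (; i < n; ++i) {
        dst[i] = std::max(dst[i], src[i]);
    }
}
```

Compiling with `-mavx2` (GCC or Clang) selects the vector path; without it the same function still produces identical results through the scalar loop.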

Performance Comparison to other implementations

Note: these comparisons are for reference only. They are not really fair, since the other implementations often make different trade-offs, namely using the more standard and well-tested HLL++ algorithm.

The Java version tested is the HLL++ version taken from Facebook's jcommons library. This version also makes the trade-off of calculating the sum and zero counts as it adds hashes. This makes for a fairly fast cardinality check but slows down adding a hash.

The other C++ implementation tested here, libcount, uses the HLL++ algorithm.

The Rust implementation could probably be greatly improved. It is my own version, and I am not an expert in Rust. As of now, stable Rust does not have good Intel intrinsics support, so it uses non-AVX code.
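The trade-off described for the Java version can be illustrated in C++. This is a hypothetical sketch, not jcommons code: `CachedSketch` and its members are names invented for the example. Each register update also adjusts a running sum of 2^(-register) and a running count of zero registers, so an estimate no longer needs a full pass over the registers, at the cost of extra work on every add.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

struct CachedSketch {
    std::vector<std::uint8_t> registers;
    double sum;         // running sum of 2^(-register) over all registers
    std::size_t zeros;  // running count of registers that are still zero

    // With every register at zero, the sum is m * 2^0 = m.
    explicit CachedSketch(std::size_t m)
        : registers(m, 0), sum(static_cast<double>(m)), zeros(m) {}

    // Raise register `index` to `rank` if it is larger, keeping both
    // cached values in sync with the register array.
    void update(std::size_t index, std::uint8_t rank) {
        const std::uint8_t old = registers[index];
        if (rank <= old) return;  // nothing changes, caches stay valid
        sum += std::ldexp(1.0, -static_cast<int>(rank)) -
               std::ldexp(1.0, -static_cast<int>(old));
        if (old == 0) --zeros;
        registers[index] = rank;
    }
};
```

One design caveat: the incremental floating-point sum can drift slightly from a freshly recomputed sum after many updates with large ranks, so an implementation along these lines may want to recompute it occasionally.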

Add 100,000 Hashes and Estimate   Time (ns)
LogLogBeta                        142,389
LibCount                          806,143
Rust                              224,659
Java HLL                          8,448,932

Estimate Cardinality              Time (ns)
LogLogBeta                        2,256
LibCount                          15,564
Rust                              77,948
Java HLL                          2,256 (uses cached sum)

Add Single Hash                   Time (ns)
LogLogBeta                        2
LibCount                          6
Rust                              1
Java HLL                          1,184

Merge Two Results                 Time (ns)
LogLogBeta                        192
LibCount                          6,303
Rust                              13,052
Java HLL*                         503,358,957

* The Java HLL used here did not provide a merge operation, so I wrote a simple merge function in Java.

All tests were run on a Skylake CPU running at 4.6 GHz.

Build Perf Tests

# To enable perf tests and unit tests, first check out the libcount library
git clone git@github.com:dialtr/libcount.git perf_tests/extern/libcount
# Then enable perf tests
mkdir -p build && cd build
cmake -DBUILD_PERF_TESTS=ON ../
make

loglogbeta's Issues

AVX2 port

AVX-512 is quite recent. How hard would it be to port to AVX2?
