This project forked from axiomhq/hyperloglog

HyperLogLog with lots of sugar (Sparse, LogLog-Beta bias correction and TailCut space reduction)

License: MIT License

Go 100.00%

HyperLogLog

An improved version of HyperLogLog for the count-distinct problem. It approximates the number of distinct elements in a multiset using 20-50% less space than typical HyperLogLog implementations.

This work is based on "Better with fewer bits: Improving the performance of cardinality estimation of large data streams" by Qingjun Xiao, You Zhou, and Shigang Chen.

Implementation

The core differences between this and other implementations are:

  • uses metro hash instead of xxhash
  • uses a sparse representation for lower cardinalities (like HyperLogLog++)
  • uses loglog-beta for dynamic bias correction at medium and high cardinalities
  • uses 4-bit registers instead of 5-bit (HLL) or 6-bit (HLL++) ones; in practice, most implementations store each register in a full byte for convenience

In general, it borrows heavily from InfluxData's fork of Clark Duvall's HyperLogLog++ implementation, but uses about 50% less space.
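
The space saving from 4-bit registers can be illustrated with a minimal sketch that packs two registers into each byte. The type `nibbleRegisters` and its methods are hypothetical names chosen for illustration and are not this library's actual layout. The register count of 16384 (precision 14) is also an assumption, picked because it gives sizes close to those quoted in the comparison. Values above 15 are simply clipped here, which is a simplification.

```go
package main

import "fmt"

// nibbleRegisters stores 4-bit registers, two per byte.
// Hypothetical type for illustration; not the library's internal layout.
type nibbleRegisters []byte

// newNibbleRegisters allocates storage for m registers (m must be even).
func newNibbleRegisters(m int) nibbleRegisters {
	return make(nibbleRegisters, m/2)
}

// get returns the 4-bit value of register i.
func (r nibbleRegisters) get(i int) uint8 {
	b := r[i/2]
	if i%2 == 0 {
		return b >> 4 // even index lives in the high nibble
	}
	return b & 0x0f // odd index lives in the low nibble
}

// set stores v in register i, clipping to the 4-bit maximum of 15.
func (r nibbleRegisters) set(i int, v uint8) {
	if v > 15 {
		v = 15
	}
	if i%2 == 0 {
		r[i/2] = (r[i/2] & 0x0f) | (v << 4)
	} else {
		r[i/2] = (r[i/2] & 0xf0) | v
	}
}

func main() {
	const m = 1 << 14 // 16384 registers, assuming precision 14
	regs := newNibbleRegisters(m)
	regs.set(0, 7)
	regs.set(1, 20) // clipped to 15

	fmt.Println(regs.get(0), regs.get(1)) // prints "7 15"
	fmt.Println(len(regs), "bytes packed vs", m, "bytes with 1-byte registers")
}
```

Under these assumptions the packed array takes 8192 bytes, against 16384 bytes when every register occupies a full byte.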

Results

A direct comparison with the HyperLogLog++ implementation used by InfluxDB yielded the following results:

Exact       Axiom (8.2 KB)           Influx (16.39 KB)
10          10 (0.0% off)            10 (0.0% off)
50          50 (0.0% off)            50 (0.0% off)
250         250 (0.0% off)           250 (0.0% off)
1250        1249 (0.08% off)         1249 (0.08% off)
6250        6250 (0.0% off)          6250 (0.0% off)
31250       31008 (0.7744% off)      31565 (1.0080% off)
156250      156013 (0.1517% off)     156652 (0.2573% off)
781250      782364 (0.1426% off)     775988 (0.6735% off)
3906250     3869332 (0.9451% off)    3889909 (0.4183% off)
10000000    9952682 (0.4732% off)    9889556 (1.1044% off)
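
The "% off" figures in these results are consistent with the absolute difference between estimate and exact count, divided by the exact count. The sketch below reproduces that arithmetic; the helper name `percentOff` is hypothetical and is not taken from the project's benchmark code.

```go
package main

import (
	"fmt"
	"math"
)

// percentOff returns |estimate - exact| / exact as a percentage.
// Hypothetical helper for illustration.
func percentOff(exact, estimate uint64) float64 {
	diff := math.Abs(float64(estimate) - float64(exact))
	return diff / float64(exact) * 100
}

func main() {
	// Values taken from the 31250 row of the comparison.
	fmt.Printf("%.4f%% off\n", percentOff(31250, 31008)) // prints "0.7744% off"
	fmt.Printf("%.4f%% off\n", percentOff(31250, 31565)) // prints "1.0080% off"
}
```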

Note

A big thank you to Prof. Shigang Chen and his team at the University of Florida who are actively conducting research around "Big Network Data".
