GithubHelp home page GithubHelp logo

malor / bhtt Goto Github PK

View Code? Open in Web Editor NEW
2.0 2.0 0.0 63 KB

The Ben-Haim/Tom-Tov (BHTT) streaming histogram sketch implementation in Rust

License: MIT License

Rust 100.00%

bhtt's Introduction

bhtt

build status

Ben-Haim/Tom-Tov (BHTT) streaming histogram sketch implementation (http://www.jmlr.org/papers/volume11/ben-haim10a/ben-haim10a.pdf).

A BHTT histogram is an ordered list of bins that is a compact approximate representation of a numerical data distribution.

The desired size of the list is chosen at histogram creation time (larger histograms give more accurate estimates of quantiles, but use more memory and are generally slower). Each bin is a (value, count) pair, where value is a weighted average of count numbers (or bins) merged together.

To insert a new value to the histogram, a new (value, 1) bin is added first, preserving the ascending order of bins in the list. If the total number of bins exceeds the configured histogram size, then the two closest bins are merged together to restore the size invariant.

Unlike traditional histograms, which require one to specify both the histogram size and the bin boundaries configuration at histogram creation time, BHTT streaming histograms automatically adjust the bins as new values are inserted.

Typical operations on the constructed histograms include approximations of quantiles and counts.

BHTT histograms are mergeable but it's not possible to subtract one histogram from another (bin by bin) to get a difference between the two.

Examples

use bhtt::Histogram;

let values = vec![1.0, 0.0, -5.4, -2.1, 8.5, 10.0, 8.6, 4.3, 7.8, 5.2];

// create a new Histogram with (up to) 5 bins and insert some values
let mut h = Histogram::new(5);
for value in values {
    h.insert(value);
}

// the histogram keeps track of the total number of inserted values,
// as well as the exact minimum and maximum values
assert_eq!(h.size(), 5);
assert_eq!(h.count(), 10);
assert_eq!(h.min(), Some(-5.4));
assert_eq!(h.max(), Some(10.0));

// the histogram allows to get approximations of arbitrary quantiles
assert_eq!(h.quantile(0.0), Some(-5.4));
assert_eq!(h.quantile(0.5), Some(4.75));
assert_eq!(h.quantile(1.0), Some(10.0));

// or estimate how many values are less than or equal to the given value
assert_eq!(h.count_less_than_or_equal_to(-7.4), 0);
assert_eq!(h.count_less_than_or_equal_to(5.0), 5);
assert_eq!(h.count_less_than_or_equal_to(13.0), 10);

// if two histograms have been built separately, it's possible to merge them together
h.merge(&Histogram::from_iter(5, &[1.0, -7.6, 0.0, 5.8, 4.3, 2.1, 11.6]));
assert_eq!(h.count(), 17);
assert_eq!(h.min(), Some(-7.6));
assert_eq!(h.max(), Some(11.6));

Development

Running tests

Unit tests are put directly to the code modules in src/.

Integration tests are in tests/ directory with input data and utility functions stored in utilities/ (so that they can be shared between integration tests and benchmarks).

$ cargo test

Running benchmarks

$ cargo bench

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.