GithubHelp home page GithubHelp logo

lorenzhs / burr Goto Github PK

View Code? Open in Web Editor NEW
36.0 4.0 4.0 61 KB

Bumped Ribbon Retrieval and Approximate Membership Query

Home Page: https://arxiv.org/abs/2109.01892

License: Apache License 2.0

Makefile 0.81% C++ 99.19%
approximate-membership-query retrieval approximate-membership amq bloom-filter-alternative bloom-filter

burr's Introduction

BuRR: Bumped Ribbon Retrieval (and Filters)

BuRR is a static retrieval and approximate membership query data structure with extremely low overhead and fast queries. Our paper introducing BuRR, "Fast Succinct Retrieval and Approximate Membership Using Ribbon", won the best paper award at the 20th International Symposium on Experimental Algorithms 2022 and is available in full as an open-access publication. For additional details and measurements, please refer to the preprint on arxiv.org.

"Retrieval" means that you have a set of key-value pairs that you want to represent very compactly. The difference to a hash table is that the data structure may return garbage values when queried with keys not in the set. It's also typically far more compact: a BuRR representation of the data is typically not more than 0.1-0.5% larger than the values it represents (no keys are stored).

"Approximate Membership Query" (AMQ) means that you have a set and want to check whether an element is likely in the set. Some well-known examples are Bloom filters, Xor filters, Cuckoo filters, and Quotient filters. A query for an item in the set will always return true, while a query for an item not in the set usually returns false, but may return true with some small probability known as the false-positive probability f. The space required to represent this set depends on f, and there is a lower bound of log2(1/f) bits per item of the set. Classic Bloom filters use 1.44 log2(1/f) bits per key, meaning they use 44% more space than needed. With Xor filters, overhead on the order of 20% can be achieved, and Xor+ filters reduce this to approximately 10% for very small values of f. BuRR can achieve overheads as low as 0.1%, and even configurations that trade overhead for speed achieve overheads of below 0.5%.

Building and running

Make sure to fetch all submodules with git submodule update --init --recursive, then type make bench to compile a benchmark runner that includes a wide range of configurations, or make tests to compile the test suite. The scripts used in the evaluation are located in the scripts folder. You may also want to refer to the fastfilter_cpp repository for a comparison to other filter data structures and more benchmarks used in our paper.

Citation

If you use BuRR in the context of an academic publication, we ask that you please cite our paper:

@inproceedings{BuRR2022,
    author={Peter C. Dillinger, Lorenz Hübschle-Schneider, Peter Sanders, and Stefan Walzer},
    title={Fast Succinct Retrieval and Approximate Membership using Ribbon},
    booktitle={20th International Symposium on Experimental Algorithms (SEA 2022)},
    pages={4:1--4:20},
    year={2022},
    doi={10.4230/LIPIcs.SEA.2022.4}
}

License

BuRR is licensed under the Apache 2.0 license. Copyright is held by Lorenz Hübschle-Schneider and Facebook, Inc. It is based on Peter C. Dillinger's implementation of Standard Ribbon, which is copyright Facebook, Inc. and also licensed under the Apache 2.0 license.

burr's People

Contributors

bytehamster avatar lorenzhs avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.