hpnhxxwn / cardinality_estimation_evaluation_framework

This project forked from world-federation-of-advertisers/cardinality_estimation_evaluation_framework

Evaluation framework and methods for estimating cardinalities of groups of sets

License: Apache License 2.0

Python 100.00%

cardinality_estimation_evaluation_framework's Introduction

Methods for Estimating the Union of Multiple Sets

Overview

This repo includes code for:

  • Sketches, which create an approximation of a set
  • Noisers, which add noise to those sketches
  • Estimators, which union together a series of sketches and then estimate the size of the combined sketch
  • SetGenerators to create a series of randomly drawn sets with different kinds of relationships between the created sets
  • Simulator to combine all of the above, calculate error statistics, and compare possible methods.

Example

To get a sense of how this code all works together, start with examples/basic_comparison.py, which runs the same experiments across multiple estimation methods.

Contributing

Sketches/Estimators/Noisers

Located in estimators/

We anticipate most additions to this repo coming in the form of new kinds of sketches and estimators, both of which are found in the estimators folder.

To get started, you should subclass either SketchBase in estimators/base.py or AnySketch in estimators/any_sketch. AnySketch is an abstraction on top of SketchBase which can make things quicker to develop for certain classes of sketch, but the abstraction in SketchBase is much simpler, so feel free to start wherever your appetite for learning a new abstraction takes you.

For the simplest example of everything you need to implement to get started with your own estimator, look at estimators/exact_set.py. It has all the machinery to implement the most basic sketch, which isn't really a sketch at all but an exact representation of a set using Python's built-in set data structure.
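To make the idea of an exact "sketch" concrete, here is a minimal standalone illustration. It does not import anything from this repo: ToyExactSketch, add_ids, ids, and estimate_union_size are hypothetical names chosen for this example and may not match the interface that SketchBase actually requires, so treat estimators/exact_set.py as the authoritative reference.

```python
class ToyExactSketch:
    """Hypothetical stand-in for an exact 'sketch' backed by a Python set."""

    def __init__(self):
        self._ids = set()

    def add_ids(self, ids):
        # Store every id exactly; no approximation happens here.
        self._ids.update(ids)

    def ids(self):
        # Return an immutable view so callers cannot mutate the sketch.
        return frozenset(self._ids)


def estimate_union_size(sketches):
    """Hypothetical estimator: union the sketches, then count the result."""
    combined = set()
    for sketch in sketches:
        combined |= sketch.ids()
    return len(combined)


a = ToyExactSketch()
a.add_ids([1, 2, 3])
b = ToyExactSketch()
b.add_ids([3, 4])

# The two sets share the id 3, so the union holds four distinct ids.
union_size = estimate_union_size([a, b])
```

Because nothing is approximated, an exact representation like this is a natural baseline against which the error of real sketches can be measured.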

For a simple example of the AnySketch abstraction, take a look at the classic Bloom Filter implementation in estimators/bloom_filter.py. The AnySketch abstraction is particularly empowering for bloom-filter-style approaches.
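As a rough picture of what a Bloom-filter-style sketch does, the following standalone example builds a classic Bloom filter, unions two filters with a bitwise OR, and estimates cardinality from the fraction of set bits. It is not the AnySketch-based implementation in estimators/bloom_filter.py: ToyBloomFilter, its method names, the SHA-256-based hashing, and the parameter values are all assumptions made for illustration.

```python
import hashlib
import math


class ToyBloomFilter:
    """Hypothetical classic Bloom filter; not the repo's implementation."""

    def __init__(self, num_bits=4096, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [False] * num_bits

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with an index.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add_ids(self, ids):
        for item in ids:
            for pos in self._positions(item):
                self.bits[pos] = True

    def union(self, other):
        # Filters can only be combined if they share the same configuration.
        if (self.num_bits, self.num_hashes) != (other.num_bits, other.num_hashes):
            raise ValueError("Bloom filters must have matching parameters")
        result = ToyBloomFilter(self.num_bits, self.num_hashes)
        result.bits = [x or y for x, y in zip(self.bits, other.bits)]
        return result

    def estimate_cardinality(self):
        # Standard estimate: n ~ -(m / k) * ln(1 - X / m), X = number of set bits.
        set_bits = sum(self.bits)
        if set_bits == self.num_bits:
            return float("inf")  # A saturated filter carries no size information.
        return -(self.num_bits / self.num_hashes) * math.log(
            1 - set_bits / self.num_bits
        )


left = ToyBloomFilter()
left.add_ids(range(0, 300))
right = ToyBloomFilter()
right.add_ids(range(200, 500))

# The true union has 500 distinct ids; the estimate should land close to that.
combined = left.union(right)
estimate = combined.estimate_cardinality()
```

The union here is lossless with respect to the filter itself: OR-ing two filters yields exactly the filter that would have been built from the combined ids, which is what makes this family of sketches convenient for union estimation.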

Differential Privacy in Noisers

Note that while many Noisers accept differential privacy parameters such as epsilon and delta, we make no guarantee that they are truly differentially private or suitable for protecting real user data. The noise is added for statistical accuracy purposes only and does not protect against certain attacks, such as the 'Least Significant Digits' problem (https://crysp.uwaterloo.ca/courses/pet/F18/cache/Mironov.pdf), which is only one of many potential differential privacy 'gotchas'.
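To illustrate the kind of noise this caveat applies to, here is a textbook Laplace mechanism applied to a single count, written with the standard library only. The function add_laplace_noise and its parameters are hypothetical and are not taken from this repo's Noisers. It samples in ordinary floating point, so it is exactly the sort of naive implementation that the warning above covers and must not be used to protect real data.

```python
import random


def add_laplace_noise(count, epsilon, sensitivity=1.0, rng=None):
    """Hypothetical noiser: returns count plus Laplace(sensitivity / epsilon) noise.

    Naive floating-point sampling; for studying accuracy only, NOT for privacy.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if rng is None:
        rng = random.Random()
    scale = sensitivity / epsilon
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return count + noise


rng = random.Random(0)  # Fixed seed so the experiment is repeatable.
true_count = 1000
samples = [add_laplace_noise(true_count, epsilon=1.0, rng=rng) for _ in range(10000)]

# The noise has zero mean, so the average of many noisy counts approaches the truth.
mean_noisy_count = sum(samples) / len(samples)
```

Smaller values of epsilon give a larger noise scale, which is the accuracy trade-off that an evaluation of noised sketches is meant to quantify.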

SetGenerators/Simulator(s)

Located in simulations/

If you have an idea for a more realistic way to represent multiple groups of users across multiple kinds of publishers, or perhaps another corner case, start here with a new SetGenerator subclass.
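As a sketch of what a set generator might produce, the example below draws several sets independently from a shared universe of ids, so that overlaps between sets arise by chance. The function generate_independent_sets and its parameters are hypothetical; the real base class and its required methods live in simulations/ and may look quite different.

```python
import random


def generate_independent_sets(universe_size, num_sets, set_size, seed=0):
    """Hypothetical generator: yields num_sets random subsets of a shared universe."""
    if set_size > universe_size:
        raise ValueError("set_size cannot exceed universe_size")
    rng = random.Random(seed)  # Seeded so that experiments are reproducible.
    universe = range(universe_size)
    for _ in range(num_sets):
        # Sample without replacement so each set has exactly set_size ids.
        yield set(rng.sample(universe, set_size))


sets = list(generate_independent_sets(universe_size=1000, num_sets=3, set_size=100, seed=42))

# The exact union size is the ground truth an estimator would be scored against.
true_union_size = len(set().union(*sets))
```

Other relationships between sets, such as fully overlapping or strictly disjoint sets, could be modeled by changing how each set is drawn from the universe.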

We don't anticipate the need for a separate simulator, but still feel free to create a new one or make the current one better.

Bringing everything together

Once you have implemented anything new from the sections above, please add it to the following files and make sure it works with the existing machinery:

  • tests/interoperability_test.py
  • examples/basic_comparison.py

cardinality_estimation_evaluation_framework's People

Contributors

hpnhxxwn, jgoodknight, kungfucraig