
jasondavies / bloomfilter.js


JavaScript bloom filter using FNV for fast hashing

Home Page: http://www.jasondavies.com/bloomfilter/

License: BSD 3-Clause "New" or "Revised" License

JavaScript 100.00%

bloomfilter.js's Introduction

Bloom Filter

This JavaScript bloom filter implementation uses the non-cryptographic Fowler–Noll–Vo hash function for speed.

Usage

var bloom = new BloomFilter(
  32 * 256, // number of bits to allocate.
  16        // number of hash functions.
);

// Add some elements to the filter.
bloom.add("foo");
bloom.add("bar");

// Test if an item is in our filter.
// Returns true if an item is probably in the set,
// or false if an item is definitely not in the set.
bloom.test("foo");
bloom.test("bar");
bloom.test("blah");

// Serialisation. Note that bloom.buckets may be a typed array,
// so we convert to a normal array first.
var array = [].slice.call(bloom.buckets),
    json = JSON.stringify(array);

// Deserialisation. Note that any array-like object is supported, but
// this will be used directly, so you may wish to use a typed array for
// performance.
var bloom = new BloomFilter(array, 16);

Implementation

Although the bloom filter requires k hash functions, we can simulate this using only two hash functions. In fact, we can use the same FNV algorithm for both hash functions, using only different base offsets for the two hashes.
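One common way to simulate k hash functions from two is to combine them as g_i(x) = (h1(x) + i * h2(x)) mod m. The sketch below illustrates that idea with 32-bit FNV-1a and two different offset bases. It is not the library's actual code: the function names, the second basis value, and hashing UTF-16 code units rather than bytes are all assumptions made for illustration.

```javascript
// 32-bit FNV-1a over the UTF-16 code units of a string, starting from a
// caller-supplied offset basis (illustrative variant, not the library's code).
function fnv1a(str, basis) {
  var h = basis >>> 0;
  for (var i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 16777619) >>> 0; // multiply by the 32-bit FNV prime
  }
  return h;
}

// Derive k bit positions in [0, m) from two hashes: g_i = (h1 + i * h2) mod m.
function locations(str, m, k) {
  var h1 = fnv1a(str, 2166136261); // standard 32-bit FNV offset basis
  var h2 = fnv1a(str, 0x01234567); // hypothetical second basis, chosen arbitrarily
  var out = [];
  for (var i = 0; i < k; i++) {
    out.push((h1 + i * h2) % m);
  }
  return out;
}
```

With this scheme each element costs two passes over the string regardless of k, which is where the speed-up comes from.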

Thanks to Will Fitzgerald for his help and inspiration with the hashing optimisation.

bloomfilter.js's People

Contributors

dey-dey, dmcgrath, ept, eugeneware, gleenn, jasondavies, pchaigno


bloomfilter.js's Issues

Adding filters

If I wanted to combine two filters of the same size, can I get the array forms and add the values at each respective index? (Then reserialize into another filter)
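For reference, the usual way to take the union of two Bloom filters that share the same size, number of hash functions, and hash functions is a bitwise OR of the bit arrays rather than numeric addition, because addition carries into neighbouring bits whenever both filters have the same bit set. A minimal sketch, where unionBuckets is a hypothetical helper and not part of the library:

```javascript
// Combine two bucket arrays of equal length with bitwise OR.
// unionBuckets is a hypothetical helper name, not a library function.
function unionBuckets(a, b) {
  if (a.length !== b.length) {
    throw new Error("filters must have the same number of buckets");
  }
  var out = new Int32Array(a.length);
  for (var i = 0; i < a.length; i++) {
    out[i] = a[i] | b[i]; // a bit is set if it is set in either filter
  }
  return out;
}
```

Following the README's deserialisation example, the result could then be passed to new BloomFilter(unionBuckets(f1.buckets, f2.buckets), k), assuming both filters were created with the same k.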

Use optimal m/k given n/p option

I was just wondering why you don't have an option to use the optimal m and k parameters (see here) based on n and p.

There are a couple ways I imagine this going:

  • Add an optional 3rd argument in the constructor that when present and set to true would interpret the first and second values of the constructor as n and p instead of m and k.
  • Change the constructor to accept a hash of either {m: blah, k: blah} or {n: blah, k: blah} (hopefully with better names than the single letters).
  • Add a new function, something like BloomFilter.useOptimal(n, p).

I could implement any of these, but I wanted to ask you first if there was any reason why you didn't do this.
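For context, the standard formulas for the optimal parameters are m = -n ln(p) / (ln 2)^2 and k = (m / n) ln 2, where n is the expected number of elements and p the target false positive rate. A minimal sketch of the third option, assuming a hypothetical helper named optimalParams (the library does not provide it):

```javascript
// Compute the number of bits (m) and hash functions (k) for n expected
// elements and a target false positive rate p. Hypothetical helper.
function optimalParams(n, p) {
  var m = Math.ceil(-n * Math.log(p) / (Math.LN2 * Math.LN2));
  var k = Math.max(1, Math.round((m / n) * Math.LN2));
  return { m: m, k: k };
}
```

The result could then be fed to the existing constructor, for example var o = optimalParams(1000, 0.01); var bloom = new BloomFilter(o.m, o.k);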

bloom.buckets doesn't work

Hello!
var array = [].slice.call(bloom.buckets),
json = JSON.stringify(array);

This method doesn't work. How can I serialize a filter with your package? The bloom object doesn't expose buckets. It only has bitview, view, serialize, etc.

License clarification

First of all, thank you for the great implementation of Bloom Filter. We're using this package from our h2-auto-push package, and it's working great.

But I want to make sure we have no license problems by depending on this package. Your LICENSE file looks similar to BSD-3-Clause but not quite the same. Can I understand it as BSD-3-Clause?

If my understanding is correct, can you please put the license info into your package.json? Because it is missing, the npm page says "license: none": https://www.npmjs.com/package/bloomfilter. It'll be as simple as adding this line:

{ "license" : "BSD-3-Clause" }

https://docs.npmjs.com/files/package.json#license

Thank you!

Deserialization works only if filter's length in bits is divisible by 32

> bloom=new BloomFilter(100, 3)
BloomFilter {m: 100, k: 3, buckets: Int32Array[4], _locations: Uint8Array[3], locations: function…}
> bloom.add('test')
undefined
> bl2=new BloomFilter(bloom.buckets, 3)
BloomFilter {m: 128, k: 3, buckets: Int32Array[4], _locations: Uint8Array[3], locations: function…}
> bl2.test('test')
false
> bloom.test('test')
true
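The transcript shows m changing from 100 to 128 after the round trip, which suggests the size is recovered as the bucket count times 32. If bit positions are reduced modulo m, a different m would select different bits, which would explain the failed lookup. This is an inference from the output above, not something confirmed against the library source, and recoveredBits is a hypothetical name:

```javascript
// Size in bits that can be recovered from the bucket array alone,
// assuming 32 bits per bucket (hypothetical helper).
function recoveredBits(m) {
  var bucketCount = Math.ceil(m / 32);
  return bucketCount * 32;
}

// The same hash value maps to different positions under the two sizes.
var hash = 1000;
var original = hash % 100;                 // position with m = 100
var restored = hash % recoveredBits(100);  // position with m = 128
```

Choosing a size that is a multiple of 32, or storing m alongside the buckets, would avoid the mismatch under this assumption.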

fnv-plus

Hey I ran across your project as I was researching bloom filters, and I noticed this on your website:
"Unfortunately I can't use the 64-bit trick in the linked post as JavaScript only supports bitwise operations on 32 bits."

I don't know if you'd be interested in this, but a little while ago I wrote a version of fnv with an expanded keyspace (up to 1024-bit): https://github.com/tjwebb/fnv-plus.

contrib/ directory containing Python version?

Hi Jason,

Great library! Thanks for writing this.

I needed to send Bloom filters to and from my webapp frontend to the Python backend (i.e., do add() in JS and test() in Python and the reverse). I ended up porting bloomfilter.js to Python by doing a line-by-line translation. Maybe I missed a note in the docs about an easier way? :)

If you're interested I can send a pull-request with a contrib/ directory with the Python version. It's a little hacky, because I used a C module to get JavaScript numeric semantics (modulo and arithmetic behave differently in Python than in JavaScript).

Let me know.

Ranga

Create function to serialize/deserialize bloom filter

Given the standalone nature of your bloomfilter, I was wondering if it would make sense to serialize/deserialize the bloomfilter bytearray/array.

My use case is that I'm trying to send a dictionary to the front-end to see if words the user types in are in it. I thought it might be faster/smaller to store the dictionary in a bloomfilter.
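Based on the serialisation snippet in the README, a pair of helpers along these lines might cover that use case. The names serialiseBuckets and deserialiseBuckets are hypothetical, and the sketch assumes the buckets hold 32-bit integers:

```javascript
// Convert a (possibly typed) bucket array to a JSON string.
function serialiseBuckets(buckets) {
  return JSON.stringify(Array.prototype.slice.call(buckets));
}

// Parse the JSON string back into a typed array of 32-bit integers.
function deserialiseBuckets(json) {
  return new Int32Array(JSON.parse(json));
}
```

On the receiving side, the README's constructor form would then apply: new BloomFilter(deserialiseBuckets(json), 16), using the same number of hash functions as the sender.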

To Typescript?

Is it worth converting this implementation to TypeScript?

It will be easier for others to understand as well as contribute.

False positives

I'm getting a much higher false positive rate than I would expect from a bloom filter of the size that I'm using.

I'm using a 1024-bit bloomfilter with 16 hashes and 20 elements in each filter.

I'm running a test which adds 20 elements to a filter, checking before adding each that the item isn't already in the filter.

After running the test 500 times, there are ~4 collisions.

Given a bloom filter with those parameters, there should only be about a 1/1.3 billion chance of collision (https://hur.st/bloomfilter/?n=20&p=&m=1024&k=16)

Here's the short script:

    let collisions = 0
    for(let i=0; i< 500; i++) {
      const filter = new BloomFilter(1024, 16)
      const dict = {}
      for(let j=0; j< 20; j++) {
        const str = Math.floor(Math.random() * 1000000000).toString(16)
        if(filter.test(str) && dict[str] !== true){
          console.log("COLLISION: ", str)
          collisions++
        }
        filter.add(str)
        dict[str] = true
      }
      console.log(i)
    }
    console.log('done: ', collisions)
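For comparison, the textbook estimate of the false positive rate is (1 - e^(-kn/m))^k, which assumes the k hash functions are independent and uniform. The sketch below applies that estimate to the test above. Both function names are hypothetical, and the estimate says nothing about how the library's hashing behaves in practice:

```javascript
// Textbook false positive estimate for m bits, k hashes, n inserted elements.
function falsePositiveRate(m, k, n) {
  return Math.pow(1 - Math.exp(-k * n / m), k);
}

// Expected number of collisions when each run tests an item before adding it,
// so the j-th test in a run sees a filter holding j elements.
function expectedCollisions(m, k, perRun, runs) {
  var total = 0;
  for (var j = 0; j < perRun; j++) {
    total += falsePositiveRate(m, k, j);
  }
  return runs * total;
}
```

Under these assumptions falsePositiveRate(1024, 16, 20) is on the order of 1e-9, in line with the figure quoted above, and expectedCollisions(1024, 16, 20, 500) is far below one, so about 4 observed collisions is well outside what the idealised model predicts.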
