axiak / pybloomfiltermmap

740.0 49.0 138.0 2.59 MB

Fast Python Bloom Filter using Mmap

Home Page: http://axiak.github.com/pybloomfiltermmap/

License: MIT License

Python 39.81% C 59.55% Makefile 0.64%

pybloomfiltermmap's Introduction

pybloomfiltermmap

The goal of pybloomfiltermmap is simple: to provide a fast, simple, scalable, correct library for Bloom Filters in Python.

Docs

See http://axiak.github.com/pybloomfiltermmap/.

Overview

After you install, the interface is a cross between a file interface and a set interface. For example:

>>> import pybloomfilter
>>> fruit = pybloomfilter.BloomFilter(100000, 0.1, '/tmp/words.bloom')
>>> fruit.update(('apple', 'pear', 'orange', 'apple'))
>>> len(fruit)
3
>>> 'mike' in fruit
False
>>> 'apple' in fruit
True

Install

You may or may not want to use Cython. If you have it installed, the setup file will build the C file from the pyx file. Otherwise, it will skip that step automatically and build from the packaged C file.

To install:

$ sudo python setup.py install

and you should be set.

License

See the LICENSE file. It's under the MIT License.

pybloomfiltermmap's People

Contributors

235, axiak, dbishop, dcrosta, locutusofborg, pbutler, piskvorky, rpstac, seanjensengrey, showard

pybloomfiltermmap's Issues

Error "Symbol not found: _EVP_DigestFinal_ex"

Hi,

I try to compile the new version of pybloomfiltermmap. All seems to work when I launch "python setup.py install", but when I try to import the module, it returns the following error:

MacBook-Pro:axiak-pybloomfiltermmap-70a85e6 root# python
ActivePython 2.6.7.20 (ActiveState Software Inc.) based on
Python 2.6.7 (r267:88850, Jun 27 2011, 14:10:26)
[GCC 4.2.1 (Apple Inc. build 5664)] on darwin
Type "help", "copyright", "credits" or "license" for more information.

>>> import pybloomfilter
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ImportError: dlopen(/var/root/.local/lib/python2.6/site-packages/pybloomfiltermmap-0.3.4-py2.6-macosx-10.5-intel.egg/pybloomfilter.so, 2): Symbol not found: _EVP_DigestFinal_ex
Referenced from: /var/root/.local/lib/python2.6/site-packages/pybloomfiltermmap-0.3.4-py2.6-macosx-10.5-intel.egg/pybloomfilter.so
Expected in: dynamic lookup

I verify if _EVP_DigestFinal_ex exists in my system:

MacBook-Pro:axiak-pybloomfiltermmap-70a85e6 root# nm -arch x86_64 /usr/lib/libcrypto.dylib | grep DigestFinal
0000000000056840 T _EVP_DigestFinal
00000000000567b0 T _EVP_DigestFinal_ex

MacBook-Pro:axiak-pybloomfiltermmap-70a85e6 root# nm -arch x86_64 /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.7.sdk/usr/lib/libcrypto.dylib | grep DigestFinal
0000000000056840 T _EVP_DigestFinal
00000000000567b0 T _EVP_DigestFinal_ex

I can't understand what I'm doing wrong!

Can you help me to solve this problem?
Thanks in advance!
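
One way to narrow this down from Python is to ask which libcrypto the loader finds and whether it exports the symbol. A minimal diagnostic sketch, assuming ctypes.util.find_library resolves the same library the extension is linked against (it may not when several OpenSSL builds are installed); library_exports is a hypothetical helper. Note that nm on OS X prints C symbols with a leading underscore (_EVP_DigestFinal_ex), while ctypes expects the plain C name (EVP_DigestFinal_ex).

```python
import ctypes
import ctypes.util


def library_exports(libname, symbol):
    """Return True if shared library `libname` can be loaded and exports `symbol`.

    `libname` is the bare name ("crypto" for libcrypto); `symbol` is the C name
    without the leading underscore that nm shows on OS X.
    """
    path = ctypes.util.find_library(libname)
    if path is None:
        return False  # the loader search found no library by that name
    try:
        lib = ctypes.CDLL(path)
    except OSError:
        return False  # found, but it could not be loaded
    # ctypes resolves symbols lazily; a missing one raises AttributeError,
    # which hasattr() turns into False.
    return hasattr(lib, symbol)
```

For example, library_exports("crypto", "EVP_DigestFinal_ex") returning False would suggest the library found at runtime is not the one inspected with nm.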

Symbol not found: _FIPS_digestfinal

Hi Michael,

I am getting this error on Mac OS X 10.7.5 (64-bit):

>>> from pybloomfilter import BloomFilter
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "build/bdist.macosx-10.7-intel/egg/pybloomfilter.py", line 7, in
# the Free Software Foundation; either version 2 of the License, or
File "build/bdist.macosx-10.7-intel/egg/pybloomfilter.py", line 6, in bootstrap
# it under the terms of the GNU General Public License as published by
ImportError: dlopen(/Users/abhi/.python-eggs/pybloomfiltermmap-0.3.11-py2.7-macosx-10.7-intel.egg-tmp/pybloomfilter.so, 2): Symbol not found: _FIPS_digestfinal
Referenced from: /Users/abhi/.python-eggs/pybloomfiltermmap-0.3.11-py2.7-macosx-10.7-intel.egg-tmp/pybloomfilter.so
Expected in: flat namespace
in /Users/abhi/.python-eggs/pybloomfiltermmap-0.3.11-py2.7-macosx-10.7-intel.egg-tmp/pybloomfilter.so

sudo pip install not working on mac

When I enter the command sudo pip install pybloomfiltermmap, it installs but generates warnings:

2 warnings generated.
g++ -bundle -undefined dynamic_lookup -L/Users/danshiff/anaconda/lib -arch x86_64 -arch x86_64 build/temp.macosx-10.5-x86_64-2.7/src/mmapbitarray.o build/temp.macosx-10.5-x86_64-2.7/src/bloomfilter.o build/temp.macosx-10.5-x86_64-2.7/src/md5.o build/temp.macosx-10.5-x86_64-2.7/src/primetester.o build/temp.macosx-10.5-x86_64-2.7/src/MurmurHash3.o build/temp.macosx-10.5-x86_64-2.7/src/pybloomfilter.o -L/Users/danshiff/anaconda/lib -lcrypto -o build/lib.macosx-10.5-x86_64-2.7/pybloomfilter.so

The site-packages directory contains pybloomfilter.so and pybloomfiltermmap-0.3.14-py2.7.egg-info, but no pybloomfiltermmap on its own.

Include pydablooms in the speed comparison

Suggested patch to also profile the bitly Dablooms Python wrapper available from https://github.com/bitly/dablooms

diff --git a/tests/comparisons/speedtest.py b/tests/comparisons/speedtest.py
index 8d10a4c..428ebd1 100755
--- a/tests/comparisons/speedtest.py
+++ b/tests/comparisons/speedtest.py
@@ -9,7 +9,7 @@ import pybloomfilter
 
 tempfiles = []
 
-ERROR_RATE = 0.1
+ERROR_RATE = 0.1 #i.e. 10%
 
 #def get_and_add_words(Creator, wordlist):
 def get_and_add_words(Creator, wordlist):
@@ -46,6 +46,7 @@ def create_word_list(filename):
     return words_set
 
 def create_cbloomfilter(*args):
+    """Using pybloomfilter.BloomFilter(capacity, error_rate, temp_file)"""
     args = list(args)
     f = tempfile.NamedTemporaryFile()
     tempfiles.append(f)
@@ -53,6 +54,15 @@ def create_cbloomfilter(*args):
     args.append(f.name)
     return pybloomfilter.BloomFilter(*tuple(args))
 
+def create_dablooms(*args):
+    """Using pydablooms.Dablooms(capacity, error_rate, temp_file)"""
+    args = list(args)
+    f = tempfile.NamedTemporaryFile()
+    tempfiles.append(f)
+    os.unlink(f.name)
+    args.append(f.name)
+    return pydablooms.Dablooms(*tuple(args))
+
 creators = [create_cbloomfilter]
 try:
     import pybloom
@@ -61,27 +71,36 @@ except ImportError:
 else:
     creators.append(pybloom.BloomFilter)
 
+try:
+    import pydablooms
+except ImportError:
+    pass
+else:
+    creators.append(create_dablooms)
+
 def run_test():
     dict_wordlist = create_word_list('words')
     test_wordlist = create_word_list('testwords')
     NUM = 10
 
+    print "Requested error rate %0.1f%%" % (100 * ERROR_RATE)
     for creator in creators:
         start = time.time()
         if NUM:
             t = timeit.Timer(lambda : get_and_add_words(creator, dict_wordlist))
-            print "%s took %0.5f s/run" % (
+            print "%s took %0.5f s/run (get and add words)" % (
                 creator,
                 t.timeit(NUM) / float(NUM))
         bf = get_and_add_words(creator, dict_wordlist)
 
         if NUM:
             t = timeit.Timer(lambda : check_words(bf, test_wordlist))
-            print "%s took %0.5f s/run" % (
+            print "%s took %0.5f s/run (check words)" % (
                 creator,
                 t.timeit(NUM) / float(NUM))
 
-        raw_input()
+        #print "Press enter to continue..."
+        #raw_input()
 
         test_errors(bf, dict_wordlist, test_wordlist)
 

Note that this assumes dablooms has added support for 'in' to their Python wrapper (see bitly/dablooms#50). You could test without it, but that would require special-case code to use their check method instead.

I'm not sure how representative your test set is, but pydablooms is significantly faster (which I also observed on the real data sample I was trying this on). Note that on my machine both libraries do far better than the requested 10% error rate, although pybloomfiltermmap has the lower error rate.
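
Until the wrapper supports 'in', the special-case code can be kept out of the benchmark loop with a small adapter. A minimal sketch, assuming the wrapped object exposes a check(key) method that returns a truthy value for (probable) members, as the note above implies; the exact pydablooms signature is not confirmed here, and ContainsAdapter is a hypothetical name.

```python
class ContainsAdapter(object):
    """Give a filter that only offers check(key) support for the `in` operator."""

    def __init__(self, inner):
        self._inner = inner

    def __contains__(self, key):
        # Delegate membership tests to the wrapped object's check() method.
        return bool(self._inner.check(key))

    def __getattr__(self, name):
        # Everything else (add, etc.) is forwarded to the wrapped object unchanged.
        return getattr(self._inner, name)
```

In the patch above, create_dablooms would then return ContainsAdapter(pydablooms.Dablooms(*tuple(args))), and check_words could stay identical for every creator.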

NameError: name 'exit' is not defined

The package installs, but trying to import it crashes the interpreter:

$ python
Python 2.7.10 (default, Jul 14 2015, 19:46:27)
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.39)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from pybloomfilter import BloomFilter
[-]calg library: http://c-algorithms.sourceforge.net
$

Importing from IPython survives the import, and the error is clearer:

In [1]: from pybloomfilter import BloomFilter
[-]calg library: http://c-algorithms.sourceforge.net
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-1-a89b8b011857> in <module>()
----> 1 from pybloomfilter import BloomFilter

/Volumes/work/workspace/vew/ds/lib/python2.7/site-packages/pybloomfilter.py in <module>()
     29 if calgfound is None:
     30     print "[-]calg library: http://c-algorithms.sourceforge.net"
---> 31     exit(-1)
     32 calg=CDLL(calgfound)
     33

NameError: name 'exit' is not defined

What does this error mean and how do I install pybloomfiltermmap so that it works?

This is OS X Yosemite, pybloomfiltermmap installed with pip install pybloomfiltermmap:

$ uname -a
Darwin kofola3 14.5.0 Darwin Kernel Version 14.5.0: Wed Jul 29 02:26:53 PDT 2015; root:xnu-2782.40.9~1/RELEASE_X86_64 x86_64
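
The traceback shows the module calling exit(-1) at import time. exit() is a convenience that the site module installs for interactive use, so imported code cannot rely on it being present (here it was not), and even when it works it terminates the whole interpreter. A more conventional pattern for a missing C dependency is to raise ImportError so the caller gets an ordinary, catchable error. A minimal sketch, assuming the dependency is located with ctypes.util.find_library; load_required_library is a hypothetical helper, not code from the module shown above.

```python
import ctypes
import ctypes.util


def load_required_library(name, homepage):
    """Load shared library `name` via ctypes, or raise ImportError with a hint."""
    path = ctypes.util.find_library(name)
    if path is None:
        # Raising (rather than calling exit()) lets callers catch the failure
        # and keeps an interactive session alive.
        raise ImportError("C library %r not found; see %s" % (name, homepage))
    return ctypes.CDLL(path)
```

At module level this would read calg = load_required_library("calg", "http://c-algorithms.sourceforge.net").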

compilation error in mageia 3 gnu/linux

Hello, I'm trying to build the library and I get some linker errors:

gcc -pthread -shared -Wl,--as-needed -Wl,--no-undefined -Wl,-z,relro -Wl,-O1 -Wl,--build-id -Wl,--enable-new-dtags build/temp.linux-x86_64-2.7/src/mmapbitarray.o build/temp.linux-x86_64-2.7/src/bloomfilter.o build/temp.linux-x86_64-2.7/src/md5.o build/temp.linux-x86_64-2.7/src/primetester.o build/temp.linux-x86_64-2.7/src/pybloomfilter.o -L. -lcrypto -lpython2.7 -o build/lib.linux-x86_64-2.7/pybloomfilter.so
build/temp.linux-x86_64-2.7/src/mmapbitarray.o: In function `mbarray_Create_Malloc':
pybloomfiltermmap-master/src/mmapbitarray.c:48: undefined reference to `ceil'
pybloomfiltermmap-master/src/mmapbitarray.c:49: undefined reference to `ceil'
build/temp.linux-x86_64-2.7/src/mmapbitarray.o: In function `mbarray_Update':
pybloomfiltermmap-master/src/mmapbitarray.c:377: undefined reference to `ceil'
pybloomfiltermmap-master/src/mmapbitarray.c:378: undefined reference to `ceil'
build/temp.linux-x86_64-2.7/src/mmapbitarray.o: In function `mbarray_Create_Mmap':
pybloomfiltermmap-master/src/mmapbitarray.c:99: undefined reference to `ceil'
build/temp.linux-x86_64-2.7/src/mmapbitarray.o:pybloomfiltermmap-master/src/mmapbitarray.c:111: more undefined references to `ceil' follow
collect2: error: ld returned 1 exit status

Can you add these parameters to setup.py?

ext_modules = [Extension("pybloomfilter",
                         ext_files,
                         libraries=['crypto'],
                         extra_link_args=['-lm'])]

Thanks in advance,
Regards

Segmentation faults with in-memory BloomFilter

If you create an in-memory BloomFilter (i.e. no filename argument passed to BloomFilter.__init__) some actions can segfault.

Given, bloom_filter = BloomFilter(1000, 0.01), the following actions segfault:

  • bloom_filter.name
  • bloom_filter.copy('another_path')
  • bloom_filter.to_base64()

The root cause is that attempting to access self._bf.array.filename actually triggers the segfault.

There's no reasonable value for bloom_filter.name on an in-memory BloomFilter if that attribute is always supposed to be an actual file on disk. But raising something like NotImplementedError would be preferable to a segfault.

For copy() and to_base64(), they could both get their job done if they accessed the raw data through mmap instead of having to go through a file on disk. But that's a pretty good-sized code change, so raising NotImplementedError there would be fine. If you want those functions, back your filter with a file.
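
The proposed behaviour can be sketched in plain Python. This is a hypothetical stand-in class (InMemoryGuardSketch), not the library's Cython implementation: it records whether a filename was given and raises NotImplementedError from file-dependent operations instead of touching a file that does not exist.

```python
import shutil


class InMemoryGuardSketch(object):
    """Hypothetical stand-in showing the proposed guard; not pybloomfilter.BloomFilter."""

    def __init__(self, capacity, error_rate, filename=None):
        self.capacity = capacity
        self.error_rate = error_rate
        self._filename = filename  # None means an in-memory (not file-backed) filter

    def _require_file(self, operation):
        if self._filename is None:
            raise NotImplementedError(
                "%s needs a file-backed filter; pass a filename to the constructor"
                % operation)

    @property
    def name(self):
        self._require_file("name")
        return self._filename

    def copy(self, path):
        self._require_file("copy()")
        shutil.copyfile(self._filename, path)  # sketch: duplicate the backing file
        return InMemoryGuardSketch(self.capacity, self.error_rate, path)
```

to_base64() would get the same _require_file("to_base64()") check at the top.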

install error on pyconfig.h

Ubuntu 13.10 x64
I get an error when installing through pip or from the tar.gz:

pybloomfiltermmap-release-0.3.12$ sudo python setup.py install
info: Building from C
running install
Checking .pth file support in /usr/local/lib/python2.7/dist-packages/
/usr/bin/python -E -c pass
TEST PASSED: /usr/local/lib/python2.7/dist-packages/ appears to support .pth files
running bdist_egg
running egg_info
writing pybloomfiltermmap.egg-info/PKG-INFO
writing top-level names to pybloomfiltermmap.egg-info/top_level.txt
writing dependency_links to pybloomfiltermmap.egg-info/dependency_links.txt
reading manifest file 'pybloomfiltermmap.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
writing manifest file 'pybloomfiltermmap.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_ext
building 'pybloomfilter' extension
x86_64-linux-gnu-gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -I/usr/include/python2.7 -c src/mmapbitarray.c -o build/temp.linux-x86_64-2.7/src/mmapbitarray.o
x86_64-linux-gnu-gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -I/usr/include/python2.7 -c src/bloomfilter.c -o build/temp.linux-x86_64-2.7/src/bloomfilter.o
src/bloomfilter.c:11:14: warning: always_inline function might not be inlinable [-Wattributes]
BloomFilter *bloomfilter_Create_Malloc(size_t max_num_elem, double error_rate,
^
x86_64-linux-gnu-gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -I/usr/include/python2.7 -c src/md5.c -o build/temp.linux-x86_64-2.7/src/md5.o
x86_64-linux-gnu-gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -I/usr/include/python2.7 -c src/primetester.c -o build/temp.linux-x86_64-2.7/src/primetester.o
x86_64-linux-gnu-gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -I/usr/include/python2.7 -c src/MurmurHash3.cpp -o build/temp.linux-x86_64-2.7/src/MurmurHash3.o
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++ [enabled by default]
x86_64-linux-gnu-gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -I/usr/include/python2.7 -c src/pybloomfilter.c -o build/temp.linux-x86_64-2.7/src/pybloomfilter.o
src/pybloomfilter.c:8:22: fatal error: pyconfig.h: No such file or directory
#include "pyconfig.h"
^
compilation terminated.
error: command 'x86_64-linux-gnu-gcc' failed with exit status 1
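
pyconfig.h is part of the Python development headers, which Ubuntu ships separately from the interpreter (typically in the python-dev / python2.7-dev package for this Python 2.7). A quick sketch for checking whether the running interpreter's copy is present, using the standard sysconfig module; headers_status is a hypothetical helper name.

```python
import os
import sysconfig


def headers_status():
    """Return (path_to_pyconfig_h, exists) for the running interpreter."""
    path = sysconfig.get_config_h_filename()
    return path, os.path.isfile(path)
```

If the second value is False, the compiler has nothing to find on the include path and the build fails exactly as above.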

Can't initialize BloomFilter when installed with 'easy_install'

Installing from the git source works fine.

Trace:

Traceback (most recent call last):
  File "./bloom.py", line 8, in 
    f = pybloomfilter.BloomFilter(30, 0.01, './bloom.bin')
TypeError: too many initializers

'easy_install':

# easy_install pybloomfilter
install_dir /usr/lib/python2.7/site-packages/
Searching for pybloomfilter
Reading http://pypi.python.org/simple/pybloomfilter/
Reading http://bitbucket.org/xmonader/pybloomfilter/
Best match: pybloomfilter 1.0
Downloading http://pypi.python.org/packages/source/p/pybloomfilter/pybloomfilter-1.0.tar.gz#md5=33f6abf334c56d10c749b05bf952a045
Processing pybloomfilter-1.0.tar.gz
Running pybloomfilter-1.0/setup.py -q bdist_egg --dist-dir /tmp/easy_install-l3AEWI/pybloomfilter-1.0/egg-dist-tmp-5imkBN
zip_safe flag not set; analyzing archive contents...
Adding pybloomfilter 1.0 to easy-install.pth file

Installed /usr/lib/python2.7/site-packages/pybloomfilter-1.0-py2.7.egg
Processing dependencies for pybloomfilter
Finished processing dependencies for pybloomfilter

copy_template should support None as filename

The default constructor allows you to pass None as a filename; copy_template should allow the same. Instead it raises an error:

TypeError: coercing to Unicode: need string or buffer, NoneType found

does not work on mac os x

Hi,

when running the following file (sent to me by Mr Andres Riancho):

from pybloomfilter import BloomFilter

bf = BloomFilter(10000, 0.001, '/tmp/bloom.remove.me')
bf.add(1)
assert 1 in bf
assert not 2 in bf

print 'Success!'

python2.6 ./bloom-test.py
Traceback (most recent call last):
File "./bloom-test.py", line 3, in <module>
bf = BloomFilter(10000, 0.001, '/tmp/bloom.remove.me')
File "pybloomfilter.pyx", line 125, in pybloomfilter.BloomFilter.__cinit__ (src/pybloomfilter.c:2347)
MemoryError

python2.7 ./bloom-test.py
Traceback (most recent call last):
File "./bloom-test.py", line 3, in <module>
bf = BloomFilter(10000, 0.001, '/tmp/bloom.remove.me')
File "pybloomfilter.pyx", line 125, in pybloomfilter.BloomFilter.__cinit__ (src/pybloomfilter.c:2347)
MemoryError

FYI, I installed bloomFilter using easy_install-2.x

it is not a permission problem:

$ touch /tmp/bloom.remove.me
$ echo $?
0

uname -a
Darwin xxxxxx 11.4.2 Darwin Kernel Version 11.4.2: Thu Aug 23 16:25:48
PDT 2012; root:xnu-1699.32.7~1/RELEASE_X86_64 x86_64

Mac OS X 10.7.5

$ python2.6 --version
2.6.8

$ python2.7 --version
2.7.3

ImportError (undefined symbol)

Any idea what could be going on here? Did the openssl libs not link in correctly?

>>> import pybloomfilter
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ImportError: /usr/lib64/python2.6/site-packages/pybloomfiltermmap-0.3.4-py2.6-linux-x86_64.egg/pybloomfilter.so: undefined symbol: EVP_sha512

BloomFilter.__len__(item) misreports after BloomFilter.open(filename)

It's a minor issue; alas, fixing it requires changes to the file format, which might be a hassle.

>>> fruit = pybloomfilter.BloomFilter(100000, 0.1, '/tmp/words.bloom')
>>> fruit.update(('apple', 'pear', 'orange', 'apple'))
>>> len(fruit)
3
>>> fruit.sync()

>>> basket = pybloomfilter.BloomFilter.open('/tmp/words.bloom')
>>> 'orange' in basket
True
>>> len(basket)
0
-- However, this should be 3.
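
One way a format change could fix this is to store the element count in a fixed-size header next to the bit array, so that open() can restore it. A minimal sketch of such a layout in plain Python; the header fields, the struct format, and save_filter/load_filter are assumptions for illustration, not pybloomfiltermmap's actual file format.

```python
import struct

# Hypothetical header: element count and bit-array length in bits,
# both as explicit little-endian unsigned 64-bit integers.
HEADER = struct.Struct("<QQ")


def save_filter(path, count, bits):
    """Write the header followed by the raw bit array (a bytes-like object)."""
    with open(path, "wb") as f:
        f.write(HEADER.pack(count, len(bits) * 8))
        f.write(bytes(bits))


def load_filter(path):
    """Return (count, bits) as written by save_filter."""
    with open(path, "rb") as f:
        header = f.read(HEADER.size)
        if len(header) != HEADER.size:
            raise ValueError("file too short for header")
        count, num_bits = HEADER.unpack(header)
        bits = bytearray(f.read())
    if len(bits) * 8 != num_bits:
        raise ValueError("bit array length does not match header")
    return count, bits
```

With this layout, len() on a reopened filter would simply return the stored count.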

typo in README

Code example in 'Overview' contains the line:
len(bf)
which most likely should be
len(fruit)

filename problem

>>> from pybloomfilter import BloomFilter
>>> BloomFilter(10, .001, 'tmp/a')
Segmentation fault
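
The repro passes the relative path 'tmp/a', which resolves against the current working directory; if that directory has no tmp/ subdirectory the file cannot be created, which is one plausible trigger (the issue does not confirm it). Until the extension reports this as a Python exception, a caller-side check is cheap. checked_filter_path is a hypothetical helper.

```python
import os


def checked_filter_path(path):
    """Return `path` unchanged if its parent directory exists, else raise IOError."""
    parent = os.path.dirname(os.path.abspath(path))
    if not os.path.isdir(parent):
        # Fail in Python before the path is handed to the C extension.
        raise IOError("directory does not exist: %r" % parent)
    return path
```

Usage: BloomFilter(10, .001, checked_filter_path('tmp/a')).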

Bloomfilter file compatibility between 32 and 64 bit architectures

Currently the bloomfilter files generated on 32bit machines are not compatible on 64 bit machines, and vice versa.

One cause is that the bloom filter file stores pointer-sized values, and pointer sizes differ between the two architectures.
Is it possible to make it compatible between the two architectures?
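
The usual way to make a binary header portable is to write every field with an explicit size and byte order instead of native C types. Python's struct module shows the difference: native 'l' and 'P' (C long and pointer) may change size between 32- and 64-bit builds, while '<q' and '<Q' are always 8 bytes, little-endian. A sketch for comparison only; header_sizes is a hypothetical helper and the field choice does not reflect the library's actual header.

```python
import struct


def header_sizes():
    """Compare a native-layout header with a fixed-width, explicit-endian one."""
    native = struct.calcsize("@lP")    # C long + pointer: size depends on the platform ABI
    portable = struct.calcsize("<qQ")  # two 8-byte little-endian fields: 16 everywhere
    return native, portable
```

On a typical 64-bit Linux build both values are 16, while a 32-bit build reports 8 for the native layout, which is the kind of mismatch described above.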

Keep track of how many additions there have been

There is demand for keeping track of how many times we've added elements to the filter.
My current proposal to this would be the following example code:

>>> bf = BloomFilter(100, 0.1, '/tmp/fruit.bloom')
>>> bf.add("Apple")
>>> bf.add('Apple')
>>> bf.add('orange')
>>> print len(bf)
2
>>> bf2 = bf.copy_template('/tmp/new.bloom')
>>> bf2 |= bf
>>> print len(bf2)
ValueError: Size of bloom filter is indeterminate after unions and intersections.

So basically the proposal has 3 parts:

  • Counter is transparent and wrapped in len()
  • Counter increments whenever add() is called and the element is not there
  • Any time a union/intersection is performed the length is invalidated and subsequent calls result in a ValueError
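
The three parts of the proposal can be sketched with a toy pure-Python Bloom filter. This is illustrative only: CountingSketch, its hashing (SHA-256 with a per-index prefix), and its default sizes are assumptions, not the library's implementation. Because "the element is not there" is itself a Bloom filter test, a false positive can leave the counter one short, so the count is an estimate.

```python
import hashlib


class CountingSketch(object):
    """Toy Bloom filter with the proposed len() semantics (hypothetical)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)
        self._count = 0  # becomes None once a union makes it indeterminate

    def _positions(self, item):
        # Derive num_hashes bit positions from SHA-256 of "index:item".
        for i in range(self.num_hashes):
            digest = hashlib.sha256(("%d:%s" % (i, item)).encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def __contains__(self, item):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))

    def add(self, item):
        """Add item; count it only if it was not (apparently) present already."""
        if item in self:
            return True
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)
        if self._count is not None:
            self._count += 1
        return False

    def copy_template(self):
        """Empty filter with the same parameters."""
        return CountingSketch(self.num_bits, self.num_hashes)

    def __ior__(self, other):
        if (self.num_bits, self.num_hashes) != (other.num_bits, other.num_hashes):
            raise ValueError("filters have different parameters")
        for i, byte in enumerate(other.bits):
            self.bits[i] |= byte
        self._count = None  # a union invalidates the counter
        return self

    def __len__(self):
        if self._count is None:
            raise ValueError(
                "Size of bloom filter is indeterminate after unions and intersections.")
        return self._count
```

An intersection (&=) would invalidate the counter the same way.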

Broken macosx intel package

andresriancho/w3af#1669

Downloading/unpacking pybloomfiltermmap
Downloading pybloomfiltermmap-0.3.14.macosx-10.9-intel.tar.gz (97kB): 97kB downloaded
Running setup.py egg_info for package pybloomfiltermmap
Traceback (most recent call last):
File "", line 16, in 
IOError: [Errno 2] No such file or directory: '/tmp/pip-build-root/pybloomfiltermmap/setup.py'
Complete output from command python setup.py egg_info:
Traceback (most recent call last):

File "", line 16, in

IOError: [Errno 2] No such file or directory: '/tmp/pip-build-root/pybloomfiltermmap/setup.py'

Thread-safe?

Just want to make sure: is this library thread-safe?
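
The issue does not record an answer. If thread safety can't be confirmed for the version you use, a conservative option is to serialize calls through one lock per filter object. A minimal sketch; LockedFilter is a hypothetical wrapper around anything with add() and 'in', and a threading.Lock only coordinates threads inside one process, not separate processes sharing the same mmap file.

```python
import threading


class LockedFilter(object):
    """Serialize add() and membership tests on a shared filter object (hypothetical)."""

    def __init__(self, inner):
        self._inner = inner
        self._lock = threading.Lock()

    def add(self, item):
        with self._lock:
            return self._inner.add(item)

    def __contains__(self, item):
        with self._lock:
            return item in self._inner
```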

Memory error on BloomFilter instantiation

Instantiating a BloomFilter object as the tutorial describes, I get a MemoryError:

>>> from pybloomfilter import BloomFilter
>>> fruit = BloomFilter(100000, 0.1, '/tmp/words.bloom')
---------------------------------------------------------------------------
MemoryError                               Traceback (most recent call last)
<ipython-input-2-b436a046df8e> in <module>()
----> 1 fruit = BloomFilter(100000, 0.1, '/tmp/words.bloom')

/home/oleiade/.python-eggs/pybloomfiltermmap-0.3.8-py2.7-linux-x86_64.egg-tmp/pybloomfilter.so in pybloomfilter.BloomFilter.__cinit__ (src/pybloomfilter.c:2347)()

MemoryError: 

Using git bisect, I found that the bug seems to have been introduced by commit be40e8c.

Starting from the master branch as bad:

$ git bisect start
$ git bisect bad
$ git bisect good 93621b8b9f365224d7d2948d6c5060c2b576db0c
be40e8cfc19f74e900ade94f220fddb247e9efbd is the first bad commit
commit be40e8cfc19f74e900ade94f220fddb247e9efbd
Author: Mike Axiak <[email protected]>
Date:   Mon Sep 24 23:41:48 2012 -0400

    Might have fixed crypto lib issue? Refs #22

:100644 100644 2da822fa61db7cbb89510a8bbbf5fcc02cf00d6f 02dbf8a3104e01a6578a6ff641e4dd64ec92a662 M  CHANGELOG
:100644 100644 12eb6769d3ea02c11726eeab5edbf284f736cf36 e51fa1d55b29e411529141345bf1c1ce9d28e950 M  setup.py
:040000 040000 82cc0956f7fddeca724e8d4dbd2305650e5cb1c1 778ad9afef070beba0afabfd5d8b3de5b860e1fd M  src

Hope it helps.

Note: the bug seems to be confirmed by Travis.

Disable `count_correct`

Any way to turn off the count_correct flag from Python?

I don't want the filter to be counting elements or whatever (it ruins performance in environments where the same mmap file is shared between many processes), but I can't find a way to stop this. Is there any way to set count_correct to 0?

Segfault

Python 2.7.5 (default, May 19 2013, 13:24:54)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.

>>> from pybloomfilter import BloomFilter
>>> bf = BloomFilter(100, 0.1, None)
>>> bf.add("hi")
[1] 48984 segmentation fault python

The same thing happened on both OS X 10.6 and 10.7.

error with large dataset

Hi,

I would like to use the bloom filter with very large data sets. I will have 1000000000000000000000000000000 entries. The entries themselves are numbers up to the number of entries. With pybloomfiltermmap I get an OverflowError, or a MemoryError if I use fewer entries. Could you suggest a solution?

Thanks,
joe42
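
The standard sizing formulas give a sense of scale. For n elements and a target false-positive rate p, the optimal number of bits is m = -n ln(p) / (ln 2)^2 and the number of hash functions is k = (m / n) ln 2 (these assume independent, uniformly distributed hashes; the library's own sizing may round differently). At p = 0.01 that is about 9.6 bits per element, so an n on the order of 10^30 needs on the order of 10^30 bytes, far beyond anything a 64-bit size can address. A minimal sketch; bloom_parameters is a hypothetical helper.

```python
import math


def bloom_parameters(n, p):
    """Optimal bit count m and hash count k for n elements at false-positive rate p."""
    m = math.ceil(-n * math.log(p) / (math.log(2) ** 2))
    k = max(1, round(m / n * math.log(2)))
    return m, k
```

bloom_parameters(1000, 0.01) gives (9586, 7); for n = 10**30 the bit count is roughly 9.6 * 10^30, i.e. about 1.2 * 10^30 bytes.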

0.3.12 pypi package is broken

I'm not able to install the latest pybloomfiltermmap (0.3.12) from PyPI, but installation works using python setup.py install.

Installation fails using pip:

(pybloomfiltermmap)andres@eug:~/PycharmProjects/virtual-envs/pybloomfiltermmap$ pip install --upgrade pybloomfiltermmap==0.3.12
Downloading/unpacking pybloomfiltermmap==0.3.12
  Downloading pybloomfiltermmap-0.3.12.tar.gz (409kB): 409kB downloaded
  Running setup.py egg_info for package pybloomfiltermmap
    info: Building from C

Installing collected packages: pybloomfiltermmap
  Running setup.py install for pybloomfiltermmap
    info: Building from C
    building 'pybloomfilter' extension
    gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -I/usr/include/python2.7 -c src/mmapbitarray.c -o build/temp.linux-x86_64-2.7/src/mmapbitarray.o
    gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -I/usr/include/python2.7 -c src/bloomfilter.c -o build/temp.linux-x86_64-2.7/src/bloomfilter.o
    gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -I/usr/include/python2.7 -c src/md5.c -o build/temp.linux-x86_64-2.7/src/md5.o
    gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -I/usr/include/python2.7 -c src/primetester.c -o build/temp.linux-x86_64-2.7/src/primetester.o
    gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -I/usr/include/python2.7 -c src/MurmurHash3.cpp -o build/temp.linux-x86_64-2.7/src/MurmurHash3.o
    cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for Ada/C/ObjC but not for C++ [enabled by default]
    gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -I/usr/include/python2.7 -c src/pybloomfilter.c -o build/temp.linux-x86_64-2.7/src/pybloomfilter.o
    gcc: error: src/pybloomfilter.c: No such file or directory
    gcc: fatal error: no input files
    compilation terminated.
    error: command 'gcc' failed with exit status 4
    Complete output from command /home/andres/pch/virtual-envs/pybloomfiltermmap/bin/python2.7 -c "import setuptools;__file__='/home/andres/PycharmProjects/virtual-envs/pybloomfiltermmap/build/pybloomfiltermmap/setup.py';exec(compile(open(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-nRxrrc-record/install-record.txt --single-version-externally-managed --install-headers /home/andres/pch/virtual-envs/pybloomfiltermmap/include/site/python2.7:
    info: Building from C

running install

running build

running build_ext

building 'pybloomfilter' extension

creating build

creating build/temp.linux-x86_64-2.7

creating build/temp.linux-x86_64-2.7/src

gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -I/usr/include/python2.7 -c src/mmapbitarray.c -o build/temp.linux-x86_64-2.7/src/mmapbitarray.o

gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -I/usr/include/python2.7 -c src/bloomfilter.c -o build/temp.linux-x86_64-2.7/src/bloomfilter.o

gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -I/usr/include/python2.7 -c src/md5.c -o build/temp.linux-x86_64-2.7/src/md5.o

gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -I/usr/include/python2.7 -c src/primetester.c -o build/temp.linux-x86_64-2.7/src/primetester.o

gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -I/usr/include/python2.7 -c src/MurmurHash3.cpp -o build/temp.linux-x86_64-2.7/src/MurmurHash3.o

cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for Ada/C/ObjC but not for C++ [enabled by default]

gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -I/usr/include/python2.7 -c src/pybloomfilter.c -o build/temp.linux-x86_64-2.7/src/pybloomfilter.o

gcc: error: src/pybloomfilter.c: No such file or directory

gcc: fatal error: no input files

compilation terminated.

error: command 'gcc' failed with exit status 4

----------------------------------------
Cleaning up...
Command /home/andres/pch/virtual-envs/pybloomfiltermmap/bin/python2.7 -c "import setuptools;__file__='/home/andres/PycharmProjects/virtual-envs/pybloomfiltermmap/build/pybloomfiltermmap/setup.py';exec(compile(open(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-nRxrrc-record/install-record.txt --single-version-externally-managed --install-headers /home/andres/pch/virtual-envs/pybloomfiltermmap/include/site/python2.7 failed with error code 1 in /home/andres/PycharmProjects/virtual-envs/pybloomfiltermmap/build/pybloomfiltermmap
Traceback (most recent call last):
  File "/home/andres/pch/virtual-envs/pybloomfiltermmap/bin/pip", line 9, in <module>
    load_entry_point('pip==1.4.1', 'console_scripts', 'pip')()
  File "/home/andres/pch/virtual-envs/pybloomfiltermmap/local/lib/python2.7/site-packages/pip/__init__.py", line 148, in main
    return command.main(args[1:], options)
  File "/home/andres/pch/virtual-envs/pybloomfiltermmap/local/lib/python2.7/site-packages/pip/basecommand.py", line 169, in main
    text = '\n'.join(complete_log)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 42: ordinal not in range(128)
(pybloomfiltermmap)andres@eug:~/PycharmProjects/virtual-envs/pybloomfiltermmap$ find . -name pybloom*
(pybloomfiltermmap)andres@eug:~/PycharmProjects/virtual-envs/pybloomfiltermmap$

Installation works using python setup.py install:

(pybloomfiltermmap)andres@eug:~/PycharmProjects/pybloomfiltermmap$ git checkout release/0.3.12
...
HEAD is now at 57bcd82... Publish 0.3.12
(pybloomfiltermmap)andres@eug:~/PycharmProjects/pybloomfiltermmap$ python setup.py install
info: Building from C
running install
running bdist_egg
running egg_info
writing pybloomfiltermmap.egg-info/PKG-INFO
writing top-level names to pybloomfiltermmap.egg-info/top_level.txt
writing dependency_links to pybloomfiltermmap.egg-info/dependency_links.txt
reading manifest file 'pybloomfiltermmap.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
writing manifest file 'pybloomfiltermmap.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_ext
building 'pybloomfilter' extension
gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -I/usr/include/python2.7 -c src/mmapbitarray.c -o build/temp.linux-x86_64-2.7/src/mmapbitarray.o
gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -I/usr/include/python2.7 -c src/bloomfilter.c -o build/temp.linux-x86_64-2.7/src/bloomfilter.o
gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -I/usr/include/python2.7 -c src/md5.c -o build/temp.linux-x86_64-2.7/src/md5.o
gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -I/usr/include/python2.7 -c src/primetester.c -o build/temp.linux-x86_64-2.7/src/primetester.o
gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -I/usr/include/python2.7 -c src/MurmurHash3.cpp -o build/temp.linux-x86_64-2.7/src/MurmurHash3.o
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for Ada/C/ObjC but not for C++ [enabled by default]
gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -I/usr/include/python2.7 -c src/pybloomfilter.c -o build/temp.linux-x86_64-2.7/src/pybloomfilter.o
In file included from src/pybloomfilter.c:348:0:
/usr/include/python2.7/pythread.h:5:1: warning: ‘always_inline’ attribute ignored [-Wattributes]
src/pybloomfilter.c:828:1: warning: function declaration isn’t a prototype [-Wstrict-prototypes]
g++ -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro build/temp.linux-x86_64-2.7/src/mmapbitarray.o build/temp.linux-x86_64-2.7/src/bloomfilter.o build/temp.linux-x86_64-2.7/src/md5.o build/temp.linux-x86_64-2.7/src/primetester.o build/temp.linux-x86_64-2.7/src/MurmurHash3.o build/temp.linux-x86_64-2.7/src/pybloomfilter.o -lcrypto -o build/lib.linux-x86_64-2.7/pybloomfilter.so
creating build/bdist.linux-x86_64/egg
copying build/lib.linux-x86_64-2.7/pybloomfilter.so -> build/bdist.linux-x86_64/egg
creating stub loader for pybloomfilter.so
byte-compiling build/bdist.linux-x86_64/egg/pybloomfilter.py to pybloomfilter.pyc
creating build/bdist.linux-x86_64/egg/EGG-INFO
copying pybloomfiltermmap.egg-info/PKG-INFO -> build/bdist.linux-x86_64/egg/EGG-INFO
copying pybloomfiltermmap.egg-info/SOURCES.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
copying pybloomfiltermmap.egg-info/dependency_links.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
copying pybloomfiltermmap.egg-info/top_level.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
writing build/bdist.linux-x86_64/egg/EGG-INFO/native_libs.txt
zip_safe flag not set; analyzing archive contents...
creating 'dist/pybloomfiltermmap-0.3.12-py2.7-linux-x86_64.egg' and adding 'build/bdist.linux-x86_64/egg' to it
removing 'build/bdist.linux-x86_64/egg' (and everything under it)
Processing pybloomfiltermmap-0.3.12-py2.7-linux-x86_64.egg
Copying pybloomfiltermmap-0.3.12-py2.7-linux-x86_64.egg to /home/andres/PycharmProjects/virtual-envs/pybloomfiltermmap/lib/python2.7/site-packages
Adding pybloomfiltermmap 0.3.12 to easy-install.pth file

Installed /home/andres/PycharmProjects/virtual-envs/pybloomfiltermmap/lib/python2.7/site-packages/pybloomfiltermmap-0.3.12-py2.7-linux-x86_64.egg
Processing dependencies for pybloomfiltermmap==0.3.12
Finished processing dependencies for pybloomfiltermmap==0.3.12
(pybloomfiltermmap)andres@eug:~/PycharmProjects/pybloomfiltermmap$ 

At first I thought this was the issue reported in #29, but I don't have Cython installed, so the unlink lines are never executed in my environment.

I downloaded the pybloomfiltermmap-0.3.12.tar.gz file from PyPI, decompressed it, and looked for the pybloomfilter.c file inside. It's not there. You might be renaming some other file to it, but that seems unlikely; most likely setup.py has a bug that leaves the most important file out of the package 👎

Still experiencing issue re-opening large file sizes (Issue #21)

Hi,

I wanted to report that I'm experiencing the same large Bloom filter issue that was previously closed under Issue #21 (full details in a comment there). It has happened on both OS X Mountain Lion and Fedora 17, both running 64-bit Python 2.7.3.

Please let me know if there's anything I can do to help.

Best,
Boyd

Checksums are a good idea

We could add some sort of checksum logic to preserve data integrity. I'm not sure how to do it quickly, or whether it belongs in the library as a feature, but it's worth thinking about.
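One cheap option, sketched below as an assumption and not something the library currently does, is to compute a CRC-32 over the filter file in fixed-size chunks and store it somewhere alongside the file. `file_crc32` and `verify_file` are hypothetical helper names; reading in chunks keeps memory flat even for multi-gigabyte filters.

```python
import zlib


def file_crc32(path, chunk_size=1 << 20):
    """Return the CRC-32 of a file's contents, reading it in fixed-size chunks."""
    crc = 0
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling read() until it returns b"" (end of file).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            crc = zlib.crc32(chunk, crc)  # feed the running value back in
    return crc & 0xFFFFFFFF  # normalize to an unsigned 32-bit value


def verify_file(path, expected_crc):
    """True if the file's current CRC-32 matches a previously recorded value."""
    return file_crc32(path) == expected_crc
```

CRC-32 catches accidental corruption cheaply; it is not a defence against deliberate tampering, for which a cryptographic digest from `hashlib` would be the usual swap.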

BloomFilter.copy method broken

BloomFilter.copy tries to instantiate a BloomFilter by passing in a "mode" kwarg, but the BloomFilter constructor does not accept that kwarg; it has its own code to determine the mode.

Suggested fix: instead of passing a "mode" kwarg, BloomFilter.copy should pass "ReadFile" as the capacity argument to BloomFilter.__init__.

Close files?

I haven't looked into it yet, but it seems the files aren't closed properly when an object gets deleted. This can be a problem if you use many Bloom filters over a program's execution but only need a few at a time.
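A common pattern for this kind of problem is to give the object an explicit `close()` plus context-manager support, so the mapping and file descriptor are released at a known point instead of whenever the object is garbage-collected. The sketch below uses a hypothetical `MmapBitArray` class built on Python's `mmap` module; it is not pybloomfilter's API.

```python
import mmap
import os


class MmapBitArray:
    """Hypothetical file-backed bit array (not pybloomfilter's API) with explicit cleanup."""

    def __init__(self, path, num_bits):
        size = (num_bits + 7) // 8
        if not os.path.exists(path):
            open(path, "wb").close()  # create an empty file to map
        self._file = open(path, "r+b")
        try:
            if os.fstat(self._file.fileno()).st_size < size:
                self._file.truncate(size)  # grow the file (zero-filled) so it can be mapped
            self._map = mmap.mmap(self._file.fileno(), size)
        except Exception:
            self._file.close()  # don't leak the descriptor if mapping fails
            raise
        self.num_bits = num_bits

    def set(self, i):
        self._map[i >> 3] |= 1 << (i & 7)

    def get(self, i):
        return bool(self._map[i >> 3] & (1 << (i & 7)))

    def close(self):
        """Flush and unmap, then close the descriptor; calling it twice is harmless."""
        if self._map is not None:
            self._map.flush()
            self._map.close()
            self._map = None
        if not self._file.closed:
            self._file.close()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False  # never swallow exceptions
```

With `with MmapBitArray(path, n) as bits: ...`, only the filters currently in use hold open descriptors, and the backing file can be deleted as soon as the block exits.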

Still getting a MemoryError on 0.3.11

The latest version still appears to give me MemoryErrors. It runs fine on an OS X box, but deploying to Linux gives me:

Traceback (most recent call last):
  File "pybloomfilter.pyx", line 125, in pybloomfilter.BloomFilter.__cinit__ (src/pybloomfilter.c:2349)
MemoryError

immediately when creating a filter.

Also, not to mix things, but I get accuracy assertion errors when running the tests (apparently the same ones that are failing the Travis build), which is worrying considering how far off they are. Has anyone looked into those?

Add reserved space to bloomfilter metadata structure

You never know what else the Bloom filter might need to store. In addition to things like the size, it may want to store the creation date, the last-modification time, etc.

To maintain backward compatibility, the bloomfilter struct should probably include some reserved bytes.
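To make the idea concrete, here is a sketch of a fixed-size header with trailing reserved bytes, using Python's `struct` module. The field list, the `BLMF` magic and the 32 reserved bytes are all illustrative assumptions, not pybloomfilter's actual on-disk format.

```python
import struct

# Hypothetical 64-byte header layout (NOT pybloomfilter's real format), little-endian, no padding:
#   magic (4s), version (H), num_hashes (H), capacity (Q), error_rate (d), num_bits (Q),
#   then 32 reserved bytes ("32x") that are written as zeros and skipped on read.
HEADER = struct.Struct("<4sHHQdQ32x")
MAGIC = b"BLMF"
VERSION = 1


def pack_header(num_hashes, capacity, error_rate, num_bits):
    return HEADER.pack(MAGIC, VERSION, num_hashes, capacity, error_rate, num_bits)


def unpack_header(buf):
    """Parse the header; readers of this version simply ignore the reserved bytes."""
    magic, version, num_hashes, capacity, error_rate, num_bits = HEADER.unpack_from(buf)
    if magic != MAGIC:
        raise ValueError("not a Bloom filter header")
    return {
        "version": version,
        "num_hashes": num_hashes,
        "capacity": capacity,
        "error_rate": error_rate,
        "num_bits": num_bits,
    }
```

A later version could carve, say, a creation timestamp out of the reserved area and bump `version`, while files written by older code still parse because the header size never changes.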

Files

I like pybloomfilter a lot, but I wish the file parameter were optional: if no file is provided, the filter would be memory-resident. Because of memory mapping it's tricky to clean up after the Bloom filter, since you cannot always delete the memory-mapped file while Python is running.
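As a sketch of what a file-less mode could build on (this is not the library's API): Python's `mmap` module can create an anonymous mapping by passing `-1` as the file descriptor. It lives only in memory, starts zero-filled, and leaves nothing on disk to clean up. `anonymous_bit_array`, `set_bit` and `get_bit` are hypothetical helpers; a plain `bytearray` would work just as well for a purely in-process filter.

```python
import mmap


def anonymous_bit_array(num_bits):
    """Allocate a zero-filled, memory-only buffer holding num_bits bits (no backing file)."""
    # fileno -1 requests an anonymous mapping on both POSIX and Windows.
    return mmap.mmap(-1, (num_bits + 7) // 8)


def set_bit(buf, i):
    buf[i >> 3] |= 1 << (i & 7)  # byte index i // 8, bit i % 8


def get_bit(buf, i):
    return bool(buf[i >> 3] & (1 << (i & 7)))
```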

next_prime() throwing a: WTF!?!?!

So I got a WTF error on my first run with the Bloom filter. I tracked the problem down to primetester.c, where it checks whether orig <= prime, so I added that condition to the while loop. What is the correct solution to this problem?

Fix:
while (!is_prime(orig, 3) && (orig - prime) < 5000 && orig <= prime) {
    orig += 2;
}
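For comparison, here is a minimal Python sketch of what a "next prime" search is generally expected to do, under the assumption that the goal is the smallest prime at or above the input: step through odd candidates with a bounded search and fail loudly rather than return garbage. It uses plain trial division, not whatever primality test primetester.c implements, and bounds the number of candidates tried rather than the distance from the start.

```python
def is_prime(n):
    """Deterministic trial division; fine for illustration, slow for very large n."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    f = 3
    while f * f <= n:
        if n % f == 0:
            return False
        f += 2
    return True


def next_prime(n, max_steps=5000):
    """Return the smallest prime >= n, giving up after max_steps odd candidates."""
    if n <= 2:
        return 2
    candidate = n if n % 2 else n + 1  # above 2, only odd numbers can be prime
    for _ in range(max_steps):
        if is_prime(candidate):
            return candidate
        candidate += 2
    raise ValueError("no prime found within %d odd candidates of %d" % (max_steps, n))
```

Note that the loop keeps advancing while the candidate is not prime; a condition like `orig <= prime` would stop it after the first step, so it is worth checking which invariant the C code actually relies on.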

Can't install on MacOS X 10.6

$ sudo python setup.py install

Building from C
running install
running build
running build_ext
building 'pybloomfilter' extension
gcc-4.2 -fno-strict-aliasing -fno-common -dynamic -DNDEBUG -g -fwrapv -Os -Wall -Wstrict-prototypes -DENABLE_DTRACE -arch i386 -arch ppc -arch x86_64 -pipe -I/System/Library/Frameworks/Python.framework/Versions/2.6/include/python2.6 -c src/mmapbitarray.c -o build/temp.macosx-10.6-universal-2.6/src/mmapbitarray.o
/usr/libexec/gcc/powerpc-apple-darwin10/4.2.1/as: assembler (/usr/bin/../libexec/gcc/darwin/ppc/as or /usr/bin/../local/libexec/gcc/darwin/ppc/as) for architecture ppc not installed
Installed assemblers are:
/usr/bin/../libexec/gcc/darwin/x86_64/as for architecture x86_64
/usr/bin/../libexec/gcc/darwin/i386/as for architecture i386
src/mmapbitarray.c:424: fatal error: error writing to -: Broken pipe
compilation terminated.
lipo: can't open input file: /var/tmp//ccJcNReJ.out (No such file or directory)
error: command 'gcc-4.2' failed with exit status 1

BloomFilter malloc and from_base64()

It would be nice to be able to load from_base64() into malloc()'d memory instead of an mmap().

Use case: I want to create a Bloom filter of IP addresses. When a user tries to vote for something, we check their IP against the Bloom filter. If it doesn't pass, a +1 is added to the document. We store the updated Bloom filter, serialized with to_base64(), in the same document.

In a web application, I don't want to persist these filters to disk; I want to use shared storage that holds the base64 string.
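A minimal sketch of the requested in-memory round trip, assuming the filter state is just (number of bits, number of hashes, bit array). The `to_base64`/`from_base64` functions below are hypothetical standalone helpers whose format is not compatible with the library's own methods; the decoded bits land in an ordinary `bytearray` on the heap, which is the malloc()-style behaviour asked for.

```python
import base64
import struct
import zlib

# Hypothetical serialization format: a 12-byte header (num_bits as u64, num_hashes as u32),
# then the raw bit array, all zlib-compressed and base64-encoded.
_HDR = struct.Struct("<QI")


def to_base64(num_bits, num_hashes, bits):
    payload = _HDR.pack(num_bits, num_hashes) + bytes(bits)
    return base64.b64encode(zlib.compress(payload)).decode("ascii")


def from_base64(text):
    """Decode into an in-memory bytearray; no file or mmap is involved."""
    payload = zlib.decompress(base64.b64decode(text))
    num_bits, num_hashes = _HDR.unpack_from(payload)
    bits = bytearray(payload[_HDR.size:])
    if len(bits) != (num_bits + 7) // 8:
        raise ValueError("bit array length does not match header")
    return num_bits, num_hashes, bits
```

Compression pays off here because a sparsely filled filter is mostly zero bytes, which keeps the stored document small.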

Number of bits set to one

I would like to obtain the number of bits set to one in the underlying bit array.

I'd like to use it to estimate the number of items (the length) of a Bloom filter after a union or intersection, using the formulae by Swamidass & Baldi (2007) described on Wikipedia. Or have I got this wrong?
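The Swamidass & Baldi estimate needs only the bit count: with X of m bits set and k hash functions, n* = -(m/k) ln(1 - X/m). A union is estimated from the bitwise OR of two filters, and an intersection by inclusion-exclusion. The sketch below operates on raw bytes (the library would still need to expose them); the function names are hypothetical, and both filters must share m, k and the hash scheme for the OR to be meaningful.

```python
import math


def count_set_bits(bits):
    """Population count over a bytes-like bit array."""
    return sum(bin(b).count("1") for b in bits)


def estimate_items(num_set, num_bits, num_hashes):
    """Swamidass & Baldi: n* = -(m / k) * ln(1 - X / m)."""
    if num_set >= num_bits:
        return math.inf  # saturated filter: the estimate diverges
    return -(num_bits / num_hashes) * math.log(1 - num_set / num_bits)


def estimate_union(bits_a, bits_b, num_bits, num_hashes):
    """Estimate |A u B| from the bitwise OR of two filters with identical m, k and hashes."""
    union = bytes(a | b for a, b in zip(bits_a, bits_b))
    return estimate_items(count_set_bits(union), num_bits, num_hashes)


def estimate_intersection(bits_a, bits_b, num_bits, num_hashes):
    """Inclusion-exclusion: |A n B| ~ n(A) + n(B) - n(A u B)."""
    n_a = estimate_items(count_set_bits(bits_a), num_bits, num_hashes)
    n_b = estimate_items(count_set_bits(bits_b), num_bits, num_hashes)
    return n_a + n_b - estimate_union(bits_a, bits_b, num_bits, num_hashes)
```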

please use build flags

Hi,

bloomfilter: mmapbitarray.* bloomfilter.*
        gcc -lm -O3 mmapbitarray.c md5.c MurmurHash3.cpp bloomfilter.c -o bf

mbarray: mmapbitarray.*
        gcc -lm -O3 -DMBAQUERY mmapbitarray.c -o mbaquery
        gcc -lm -O3 -DMBACREATE mmapbitarray.c -o mbacreate

Do you have any reason for overriding the system LDFLAGS, CFLAGS, and CPPFLAGS during the build?

Debian injects hardening flags, but they are stripped during the build...

Thanks,

G.

Hash function causing false positives (nasty)

I noticed some bad behavior with pybloomfilter, so I wrote a simple set of tests. A ~22% false-positive rate is really bad. As a quick fix, I am passing the data through the hash() function before handing it to pybloomfilter. With that, the false-positive rate is usually 0, sometimes very close to 0.

My simple test:
http://pastebin.com/C8KwWFxR

The output:
('false positive', 0.2251722)
('false negitive', 0.0)
with hash()
('false positive', 1e-07)
('false negitive', 0.0)
with md5()
('false positive', 0.0)
('false negitive', 0.0)
with md5() hexdigest
('false positive', 0.0)
('false negitive', 0.0)
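The pastebin test is not reproduced here, so the following is only a sketch of how such a measurement can be set up, using a hypothetical `TinyBloom` class with its own SHA-256 double hashing (not pybloomfilter's hash function). With m = 9586 bits and k = 7 hashes for 1000 items, the theoretical false-positive rate is about 1%, so a measured rate near 22% would point at the hashing rather than the sizing.

```python
import hashlib


class TinyBloom:
    """Minimal Bloom filter for experiments (NOT pybloomfilter's hashing scheme).

    The k bit positions come from double hashing, (h1 + i * h2) mod m, where h1 and h2
    are two 64-bit slices of one SHA-256 digest of the UTF-8 encoded key."""

    def __init__(self, num_bits, num_hashes):
        self.m, self.k = num_bits, num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _indexes(self, key):
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1  # | 1 keeps h2 itself nonzero
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, key):
        for idx in self._indexes(key):
            self.bits[idx >> 3] |= 1 << (idx & 7)

    def __contains__(self, key):
        return all(self.bits[idx >> 3] & (1 << (idx & 7)) for idx in self._indexes(key))


def measure_error_rates(bloom, inserted, probes):
    """Insert `inserted`, then return (false-positive rate over `probes`, false-negative rate).

    `probes` must contain only keys that were never inserted."""
    for key in inserted:
        bloom.add(key)
    false_pos = sum(1 for key in probes if key in bloom) / len(probes)
    false_neg = sum(1 for key in inserted if key not in bloom) / len(inserted)
    return false_pos, false_neg
```

Running the same harness against the real filter (wrapping its `add` and `in`) and against this reference makes it easy to tell a hashing defect from an undersized filter.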

latest version crashing python on Mac OS X

Hi,

On Mac OS X, with the latest version of pybloomfiltermmap under python2.6 or python2.7, I get a crash when using w3af:


Thread 10 Crashed:
0 pybloomfilter.so 0x0000000100d8ec76 __pyx_pw_13pybloomfilter_11BloomFilter_21add + 534
1 org.python.python 0x000000010088d1b1 PyEval_EvalFrameEx + 9185

Won't install with Cython since setup.py deletes src/pybloomfilter.c

It appears that the setup.py file is removing some required files with the following lines:

lines 35 and 36:

    os.unlink(os.path.join(here, 'src', 'pybloomfilter.c'))
    os.unlink(os.path.join(here, 'pybloomfilter.so'))

$ pip install pybloomfiltermmap

Downloading/unpacking pybloomfiltermmap
  Downloading pybloomfiltermmap-0.3.11.tar.gz (435kB): 435kB downloaded
  Running setup.py egg_info for package pybloomfiltermmap
    info: Building from Cython

Installing collected packages: pybloomfiltermmap
  Running setup.py install for pybloomfiltermmap
    info: Building from Cython
    building 'pybloomfilter' extension
    xcrun clang -fno-strict-aliasing -fno-common -dynamic -I/usr/local/include -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.8.sdk -DNDEBUG -g -O3 -Wall -Wstrict-prototypes -I/usr/local/Cellar/python/2.7.3/Frameworks/Python.framework/Versions/2.7/include/python2.7 -c src/mmapbitarray.c -o build/temp.macosx-10.8-x86_64-2.7/src/mmapbitarray.o
    src/mmapbitarray.c:114:18: warning: comparison of unsigned expression < 0 is always false [-Wtautological-compare]
        if (filesize < 0) {
            ~~~~~~~~ ^ ~
    1 warning generated.
    xcrun clang -fno-strict-aliasing -fno-common -dynamic -I/usr/local/include -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.8.sdk -DNDEBUG -g -O3 -Wall -Wstrict-prototypes -I/usr/local/Cellar/python/2.7.3/Frameworks/Python.framework/Versions/2.7/include/python2.7 -c src/bloomfilter.c -o build/temp.macosx-10.8-x86_64-2.7/src/bloomfilter.o
    In file included from src/bloomfilter.c:9:
    In file included from src/bloomfilter.h:5:
    src/mmapbitarray.h:115:16: warning: attribute 'always_inline' is ignored, place it after "struct" to apply attribute to type declaration [-Wignored-attributes]
    __attribute__((always_inline))
                   ^
    src/bloomfilter.c:177:5: warning: 'EVP_MD_CTX_init' is deprecated [-Wdeprecated-declarations]
        EVP_MD_CTX_init(&ctx);
        ^
    /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.8.sdk/usr/include/openssl/evp.h:547:6: note: 'EVP_MD_CTX_init' declared here
    void    EVP_MD_CTX_init(EVP_MD_CTX *ctx) DEPRECATED_IN_MAC_OS_X_VERSION_10_7_AND_LATER;
            ^
    src/bloomfilter.c:179:5: warning: 'EVP_DigestInit_ex' is deprecated [-Wdeprecated-declarations]
        EVP_DigestInit_ex(&ctx, EVP_sha512(), NULL);
        ^
    /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.8.sdk/usr/include/openssl/evp.h:555:5: note: 'EVP_DigestInit_ex' declared here
    int     EVP_DigestInit_ex(EVP_MD_CTX *ctx, const EVP_MD *type, ENGINE *impl) DEPRECATED_IN_MAC_OS_X_VERSION_10_7_AND_LATER;
            ^
    src/bloomfilter.c:179:29: warning: 'EVP_sha512' is deprecated [-Wdeprecated-declarations]
        EVP_DigestInit_ex(&ctx, EVP_sha512(), NULL);
                                ^
    /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.8.sdk/usr/include/openssl/evp.h:677:15: note: 'EVP_sha512' declared here
    const EVP_MD *EVP_sha512(void) DEPRECATED_IN_MAC_OS_X_VERSION_10_7_AND_LATER;
                  ^
    src/bloomfilter.c:180:5: warning: 'EVP_DigestUpdate' is deprecated [-Wdeprecated-declarations]
        EVP_DigestUpdate(&ctx, (const unsigned char *)&hash_seed, sizeof(hash_seed));
        ^
    /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.8.sdk/usr/include/openssl/evp.h:556:5: note: 'EVP_DigestUpdate' declared here
    int     EVP_DigestUpdate(EVP_MD_CTX *ctx,const void *d,
            ^
    src/bloomfilter.c:181:5: warning: 'EVP_DigestUpdate' is deprecated [-Wdeprecated-declarations]
        EVP_DigestUpdate(&ctx, (const unsigned char *)key->shash, key->nhash);
        ^
    /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.8.sdk/usr/include/openssl/evp.h:556:5: note: 'EVP_DigestUpdate' declared here
    int     EVP_DigestUpdate(EVP_MD_CTX *ctx,const void *d,
            ^
    src/bloomfilter.c:182:5: warning: 'EVP_DigestFinal_ex' is deprecated [-Wdeprecated-declarations]
        EVP_DigestFinal_ex(&ctx, (unsigned char *)&result_buffer, NULL);
        ^
    /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.8.sdk/usr/include/openssl/evp.h:558:5: note: 'EVP_DigestFinal_ex' declared here
    int     EVP_DigestFinal_ex(EVP_MD_CTX *ctx,unsigned char *md,unsigned int *s) DEPRECATED_IN_MAC_OS_X_VERSION_10_7_AND_LATER;
            ^
    src/bloomfilter.c:183:5: warning: 'EVP_MD_CTX_cleanup' is deprecated [-Wdeprecated-declarations]
        EVP_MD_CTX_cleanup(&ctx);
        ^
    /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.8.sdk/usr/include/openssl/evp.h:548:5: note: 'EVP_MD_CTX_cleanup' declared here
    int     EVP_MD_CTX_cleanup(EVP_MD_CTX *ctx) DEPRECATED_IN_MAC_OS_X_VERSION_10_7_AND_LATER;
            ^
    8 warnings generated.
    xcrun clang -fno-strict-aliasing -fno-common -dynamic -I/usr/local/include -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.8.sdk -DNDEBUG -g -O3 -Wall -Wstrict-prototypes -I/usr/local/Cellar/python/2.7.3/Frameworks/Python.framework/Versions/2.7/include/python2.7 -c src/md5.c -o build/temp.macosx-10.8-x86_64-2.7/src/md5.o
    xcrun clang -fno-strict-aliasing -fno-common -dynamic -I/usr/local/include -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.8.sdk -DNDEBUG -g -O3 -Wall -Wstrict-prototypes -I/usr/local/Cellar/python/2.7.3/Frameworks/Python.framework/Versions/2.7/include/python2.7 -c src/primetester.c -o build/temp.macosx-10.8-x86_64-2.7/src/primetester.o
    xcrun clang -fno-strict-aliasing -fno-common -dynamic -I/usr/local/include -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.8.sdk -DNDEBUG -g -O3 -Wall -Wstrict-prototypes -I/usr/local/Cellar/python/2.7.3/Frameworks/Python.framework/Versions/2.7/include/python2.7 -c src/pybloomfilter.c -o build/temp.macosx-10.8-x86_64-2.7/src/pybloomfilter.o
    clang: error: no such file or directory: 'src/pybloomfilter.c'
    clang: error: no input files
    error: command 'xcrun' failed with exit status 1
    Complete output from command /Users/adam/.virtualenvs/alfie-reporting-service/bin/python -c "import setuptools;__file__='/Users/adam/.virtualenvs/alfie-reporting-service/build/pybloomfiltermmap/setup.py';exec(compile(open(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /var/folders/ym/bsf8r8n12ps96qk9qbt2sqv40000gn/T/pip-pThRKr-record/install-record.txt --single-version-externally-managed --install-headers /Users/adam/.virtualenvs/alfie-reporting-service/bin/../include/site/python2.7:

----------------------------------------
Command /Users/adam/.virtualenvs/alfie-reporting-service/bin/python -c "import setuptools;__file__='/Users/adam/.virtualenvs/alfie-reporting-service/build/pybloomfiltermmap/setup.py';exec(compile(open(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /var/folders/ym/bsf8r8n12ps96qk9qbt2sqv40000gn/T/pip-pThRKr-record/install-record.txt --single-version-externally-managed --install-headers /Users/adam/.virtualenvs/alfie-reporting-service/bin/../include/site/python2.7 failed with error code 1 in /Users/adam/.virtualenvs/alfie-reporting-service/build/pybloomfiltermmap
Storing complete log in /Users/adam/.pip/pip.log
pip install pybloomfiltermmap  0.82s user 0.27s system 37% cpu 2.936 total

Item hashes differ between unicode and str, and 32 / 64 bits

I noticed that, unlike the built-in containers, pybloomfiltermmap differentiates between unicode and str, e.g.:

>>> from pybloomfilter import BloomFilter
>>> filter = BloomFilter(100000, 0.01, '/tmp/bloomfilter')
>>> filter.add(u'foo')
False
>>> 'foo' in filter
False
>>> u'foo' in filter
True
>>> 'foo' in set([u'foo'])
True

(The only documentation I can find for this is PEP-100, which makes no guarantees about hashes of unicode containing non-ascii characters.)

I also noticed that the default python hash on 32-bit python is the low 32 bits of the hash on 64-bit python. For example, on 32-bit:

>>> hex(hash("foo"))
'-0x2c217945'

vs 64-bit:

>>> hex(hash("foo"))
'-0x39f8634c2c217945'

This caused me some head-scratching. My workaround was to encode unicode strings as UTF-8 so they go through the string hashing instead of Python's hash(), and it works splendidly now.

At the next file-format incompatibility, it might be nice to hash unicode strings via PyString_AsStringAndSize, and mask off the top 32 bits on 64-bit systems.
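Both workarounds are easy to express in Python 3 terms (str/bytes rather than unicode/str). The helpers below are hypothetical names: `normalize_key` encodes text as UTF-8 so equal text and bytes keys map to the same input, and `low32_signed` keeps the low 32 bits of a hash and reinterprets them as signed, which reproduces the 32-bit value quoted above from the 64-bit one. Note that in Python 3 the built-in `hash()` of str and bytes is randomized per process by default, so it is unsuitable for filters that are persisted to disk.

```python
def normalize_key(key):
    """Map a str and the bytes of its UTF-8 encoding to the same byte string."""
    if isinstance(key, str):
        return key.encode("utf-8")
    if isinstance(key, (bytes, bytearray)):
        return bytes(key)
    raise TypeError("expected str or bytes, got %s" % type(key).__name__)


def low32_signed(h):
    """Keep the low 32 bits of an integer hash and reinterpret them as a signed value."""
    x = h & 0xFFFFFFFF  # Python's & treats negative ints as infinite two's complement
    return x - (1 << 32) if x >= (1 << 31) else x


# The 64-bit hash of "foo" quoted above reduces to the quoted 32-bit hash:
assert low32_signed(-0x39F8634C2C217945) == -0x2C217945
```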

Error in opening big bloom filters

After doing:

OPEN PYTHON

bf = pybloomfilter.BloomFilter(1000000000, 0.000001, '/mybloom.bloom')
bf.add('apple')
bf.sync()

CLOSE PYTHON

If I try:

OPEN PYTHON

pybloomfilter.BloomFilter.open('/mybloom.bloom')

I get the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pybloomfilter.pyx", line 40, in pybloomfilter.bf_from_file (src/pybloomfilter.c:1142)
  File "pybloomfilter.pyx", line 82, in pybloomfilter.BloomFilter.__cinit__ (src/pybloomfilter.c:1617)
ValueError: Invalid Bloomfilter file: /mybloom.bloom

It works well with capacities up to 200,000,000, but not once the Bloom filter gets larger than that.
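For scale, the textbook sizing formulas (m = -n ln p / (ln 2)^2 bits, k = (m/n) ln 2 hashes) are sketched below; pybloomfilter may round its bit count differently, so treat the numbers as approximate. At p = 0.000001, 200,000,000 items need roughly 0.72 GB of bits, while 1,000,000,000 items need roughly 3.6 GB, which is past 2**31 - 1 bytes. That is consistent with a signed 32-bit size or offset somewhere in the open path, but it is a hypothesis worth checking, not a confirmed cause.

```python
import math


def bloom_parameters(capacity, error_rate):
    """Textbook optimal sizing: m = -n ln p / (ln 2)^2 bits, k = (m / n) ln 2 hashes."""
    m = math.ceil(-capacity * math.log(error_rate) / (math.log(2) ** 2))
    k = max(1, round(m / capacity * math.log(2)))
    return m, k


if __name__ == "__main__":
    for n in (200_000_000, 1_000_000_000):
        bits, hashes = bloom_parameters(n, 0.000001)
        size_bytes = (bits + 7) // 8
        # The last column flags sizes that would overflow a signed 32-bit byte count.
        print(n, bits, hashes, size_bytes, size_bytes > 2**31 - 1)
```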

Create magic footers

In addition to magic headers, magic footers would help confirm that the entire file was copied correctly, since a truncated copy would be missing its footer.
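A minimal sketch of such a footer, assuming a hypothetical 16-byte trailer holding the payload length and a fixed 8-byte marker (not an existing pybloomfilter format). On open, the reader checks that the marker sits exactly at the end and that the recorded length matches, which catches truncated copies and stray trailing bytes.

```python
import struct

FOOTER_MAGIC = b"BFEND\x00\x00\x01"  # hypothetical 8-byte end-of-file marker
_FOOTER = struct.Struct("<Q8s")      # payload length (u64), then the marker


def with_footer(payload):
    """Append a footer recording the payload length and the magic marker."""
    return payload + _FOOTER.pack(len(payload), FOOTER_MAGIC)


def check_footer(blob):
    """Return the payload if the footer is intact and consistent, else raise ValueError."""
    if len(blob) < _FOOTER.size:
        raise ValueError("file too short to contain a footer")
    length, magic = _FOOTER.unpack_from(blob, len(blob) - _FOOTER.size)
    if magic != FOOTER_MAGIC or length != len(blob) - _FOOTER.size:
        raise ValueError("missing or corrupt footer: file may be truncated")
    return blob[:length]
```

A footer detects truncation or extra trailing data, but not bytes corrupted in the middle of the file; that still needs a checksum.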

Build on Windows

Building on Windows has been hard because setuptools tries to use MSVC, and that compiler doesn't provide mman.h and similar headers.
Would someone please provide a binary build for Windows x86? It would help some of us.
