fastfilter / xor_singleheader

Header-only binary fuse and xor filter library

License: Apache License 2.0

Makefile 0.50% C 95.55% CMake 3.95%

xor_singleheader's Issues

Unroll the main loop

For extra performance, this loop can be unrolled...

    for (size_t i = 0; i < size; i++) {
      uint64_t key = keys[i];
      xor_hashes_t hs = xor8_get_h0_h1_h2(key, filter);
      sets[hs.h0].xormask ^= hs.h;
      sets[hs.h0].count++;
      sets[hs.h1].xormask ^= hs.h;
      sets[hs.h1].count++;
      sets[hs.h2].xormask ^= hs.h;
      sets[hs.h2].count++;
    }

Do the computations first and then do the memory accesses.
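
A minimal sketch of that idea, under stated assumptions: `demo_set_t`, `demo_hashes_t`, `demo_get_hashes` and the batch size of 4 are hypothetical stand-ins, not the library's definitions. Each batch computes all of its hashes first and only then touches the table; a plain loop is included for comparison.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the library's types (assumption). */
typedef struct { uint64_t xormask; uint64_t count; } demo_set_t;
typedef struct { uint64_t h; uint32_t h0, h1, h2; } demo_hashes_t;

/* Illustrative 64-bit mixer; the library's hash may differ. */
static uint64_t demo_mix(uint64_t k) {
  k ^= k >> 33; k *= UINT64_C(0xff51afd7ed558ccd);
  k ^= k >> 33; k *= UINT64_C(0xc4ceb9fe1a85ec53);
  k ^= k >> 33;
  return k;
}

/* Map a key to one slot in each third of a table of 3 * block_len slots. */
static demo_hashes_t demo_get_hashes(uint64_t key, uint32_t block_len) {
  demo_hashes_t hs;
  hs.h = demo_mix(key);
  hs.h0 = (uint32_t)(hs.h % block_len);
  hs.h1 = (uint32_t)((hs.h >> 21) % block_len) + block_len;
  hs.h2 = (uint32_t)((hs.h >> 42) % block_len) + 2 * block_len;
  return hs;
}

/* Apply the three table updates for one key. */
static void demo_apply(demo_set_t *sets, demo_hashes_t hs) {
  sets[hs.h0].xormask ^= hs.h; sets[hs.h0].count++;
  sets[hs.h1].xormask ^= hs.h; sets[hs.h1].count++;
  sets[hs.h2].xormask ^= hs.h; sets[hs.h2].count++;
}

/* Reference version: hash and update one key at a time. */
static void demo_fill_simple(const uint64_t *keys, size_t size,
                             demo_set_t *sets, uint32_t block_len) {
  for (size_t i = 0; i < size; i++) {
    demo_apply(sets, demo_get_hashes(keys[i], block_len));
  }
}

#define DEMO_BATCH 4

/* Batched version: compute DEMO_BATCH hashes, then do the memory accesses. */
static void demo_fill_batched(const uint64_t *keys, size_t size,
                              demo_set_t *sets, uint32_t block_len) {
  size_t i = 0;
  for (; i + DEMO_BATCH <= size; i += DEMO_BATCH) {
    demo_hashes_t hs[DEMO_BATCH];
    for (size_t j = 0; j < DEMO_BATCH; j++) {
      hs[j] = demo_get_hashes(keys[i + j], block_len);
    }
    for (size_t j = 0; j < DEMO_BATCH; j++) {
      demo_apply(sets, hs[j]);
    }
  }
  for (; i < size; i++) { /* leftover keys that do not fill a batch */
    demo_apply(sets, demo_get_hashes(keys[i], block_len));
  }
}
```

Both functions should leave the table in the same state, because XOR and increment are order-independent. Whether the batched form is actually faster depends on the compiler and the hardware, so it would need to be benchmarked.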

Index overflow with Fuse8 during Rust port

While implementing Fuse8 in Rust, I hit the following panic:

Benchmarking fuse8_populate: Collecting 100 samples in estimated 74.312 s (100 iterations)
thread 'main' panicked at 'index out of bounds: the len is 10000001 but the index is 10000001', /home/prataprc/myworld/devrs/xorfilter/src/fuse8.rs:345:23
stack backtrace:
   0:     0x563ed8b922c0 - std::backtrace_rs::backtrace::libunwind::trace::hdcf4f90f85129e83
                               at /rustc/5c029265465301fe9cb3960ce2a5da6c99b8dcf2/library/std/src/../../backtrace/src/backtrace/libunwind.rs:90:5
  ...
  18:     0x563ed89cd17d - <alloc::vec::Vec<T,A> as core::ops::index::Index<I>>::index::h92de800ab79df56c
                               at /home/prataprc/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/vec/mod.rs:2427:9
  19:     0x563ed89cd17d - xorfilter::fuse8::Fuse8<H>::build_keys::h29347249efad0362
                               at /home/prataprc/myworld/devrs/xorfilter/src/fuse8.rs:345:23
  20:     0x563ed89d44d8 - xorfilter::fuse8::Fuse8<H>::build::hbcd0e61df247a22f
                               at /home/prataprc/myworld/devrs/xorfilter/src/fuse8.rs:294:9
  21:     0x563ed89d44d8 - fuse8_bench::bench_fuse8_populate::{{closure}}::{{closure}}::hd59d7c6b8d7c2c10
                               at /home/prataprc/myworld/devrs/xorfilter/benches/fuse8_bench.rs:58:13
  22:     0x563ed89d44d8 - criterion::Bencher<M>::iter::h64d37255ed521f5a

I got this while benchmarking the Fuse8 filter. I have not been able to reproduce it, since the keys are generated randomly.

The question is: can the value stored at startPos[segment_index], which is incremented by startPos[segment_index]++;, go beyond size+1 (where size is the number of keys)?

xor_singleheader/include/binaryfusefilter.h

Line 261 in 9c68073

startPos[segment_index]++;

PS: My benchmark code now prints the seed value, so that if it happens again I should be able to reproduce it.

low memory population inquiry

Hello - super interesting work on the binary fuse filters. I'm enjoying them a lot. I noticed the readme says:

The construction of a binary fuse filter is fast but it needs a fair amount of temporary memory: plan for about 24 bytes of memory per set entry. It is possible to construct a binary fuse filter with almost no temporary memory, but the construction is then somewhat slower.

I'm exploring the application of binary fuse filters in search engines (using them as an ngram set lookup). I was previously investigating moving the memory allocations in the population code to mmapped system memory instead, as a means to reduce the physical memory requirements of populating filters. Hearing that it may be possible to accept a slower construction time with almost no temporary memory is extremely interesting to me.

I am curious if anyone has had more in-depth thoughts about how this would be approached, or tried an implementation of this? If not I will likely try my hand at it, just figured I'd ask in case anyone already had and I might be spared a few hours :)

Produce an optimized version for large inputs

For large inputs (that exceed the CPU cache), we should be more careful with cache and memory usage during the construction. It should be possible to gain 10% to 20% in performance.

Fingerprinting function

In the paper, I read that the fingerprint function maps each possible value from the universe to a word value. I wanted to see how that's implemented here and found:

xor_singleheader/include/binaryfusefilter.h

Lines 394 to 396 in 177cf03

    static inline uint64_t binary_fuse16_fingerprint(uint64_t hash) {
      return hash ^ (hash >> 32);
    }

What is the reasoning for this? I am a bit confused about how we arrived at this function.
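
One way to read the quoted function, offered as an interpretation and not as the author's stated rationale: XORing the upper 32 bits onto the lower 32 bits makes the low bits depend on the whole 64-bit hash, so a fingerprint taken from the low bits is influenced by the high bits too. The sketch below assumes the 16-bit fingerprint is obtained by truncating the folded value; `demo_fold` and `demo_fingerprint16` are hypothetical names.

```c
#include <stdint.h>

/* Fold the upper 32 bits onto the lower 32 bits, as in the quoted function. */
static inline uint64_t demo_fold(uint64_t hash) {
  return hash ^ (hash >> 32);
}

/* Assumption: a 16-bit filter keeps only the low 16 bits of the folded hash. */
static inline uint16_t demo_fingerprint16(uint64_t hash) {
  return (uint16_t)demo_fold(hash);
}
```

For instance, a hash whose only set bits are in the upper half, such as 0x0000123400000000, still yields a non-zero 16-bit fingerprint (0x1234) after folding, whereas plain truncation would give 0.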

Thanks!

False Positive Comparison Between fuse and xor16?

Hello,

I've built Erlang bindings for the binary fuse filter: https://github.com/mpope9/efuse_filter and things look very positive performance-wise when benchmarking. I was curious how the binary fuse filter's false positive rate compares to that of xor16?

Port to Go

Porting this code to Go ought to be quite easy.

It is not necessary to port everything, porting the xor8_populate and xor8_allocate functions would be a good start.

Typo in README.md

The C++ wrapper example should include binaryfusefilter.h instead.

xor_singleheader/README.md

Line 77 in 1f7e18b

#include "xorfilter.h"

The first parameter of AddAll() cannot be const, because it may be modified by the function binary_fuse_sort_and_remove_dup().

xor_singleheader/README.md

Lines 90 to 92 in 1f7e18b

    bool AddAll(const uint64_t* data, const size_t start, const size_t end) {
      return binary_fuse8_populate(data + start, end - start, &filter);
    }

The member variable was already renamed from fingerprints to Fingerprints in commit b8268c9.

xor_singleheader/README.md

Line 100 in 1f7e18b

o.filter.fingerprints = nullptr; // we take ownership for the data

Probability of filter construction failure

Has any profiling been done to estimate the probability of filter construction failure? I'm not an expert, so I thought I'd ask :)

dynamically add element

If I allocate enough space at the beginning, can I add elements dynamically? I don't see an API for dynamically adding elements

Support for 32 bit fingerprints and xor maps with 8, 16, and 32 bit

Both are quite rare use cases in my view. Xor maps would have a get method that returns the stored data, instead of a contain method. Supporting these features would probably require rewriting most of the code as preprocessor statements. The advantage is that it would reduce code duplication. The code would be harder to read and debug, but running the preprocessor could be a separate build phase that generates the source code for xor_8, xor_16, xor_32, as well as xor_map_8, xor_map_16, xor_map_32, xor_map_64.

Concurrent creation and merging?

Do I understand your work correctly that it's not possible to combine multiple filters into one?

We have a partitioned vector, which is prepared in multiple threads. We currently build Bloom filters and AND them together at the end.
Is something similar possible with the xor or binary fuse filters?

Wrong function called

xor_singleheader/README.md

Line 91 in 914c73a

return xor8_buffered_populate(data + start, end - start, &filter);

It probably should be:
return binary_fuse8_populate(data + start, end - start, &filter);

Function binary_fuse_sort_and_remove_dup() will drop first unique value

xor_singleheader/include/binaryfusefilter.h

Lines 20 to 30 in 1f7e18b

    static size_t binary_fuse_sort_and_remove_dup(uint64_t* keys, size_t length) {
      qsort(keys, length, sizeof(uint64_t), binary_fuse_cmpfunc);
      size_t j = 0;
      for(size_t i = 1; i < length; i++) {
        if(keys[i] != keys[i-1]) {
          keys[j] = keys[i];
          j++;
        }
      }
      return j+1;
    }

In this function, why does j start from 0?
For example, if keys[0] < keys[1], this function assigns the value of keys[1] to keys[0], and the original keys[0] is lost.
And why return j+1;?
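
Tracing the quoted code on the sorted input {1, 2, 3} gives {2, 3, 3} with a returned length of 3, so the first value is indeed overwritten. A possible correction, as a sketch only and not the library's actual fix, keeps keys[0] in place, starts the write index at 1, and returns the write index itself. The names `demo_cmp_u64` and `demo_sort_and_remove_dup` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Three-way comparison of two uint64_t values for qsort. */
static int demo_cmp_u64(const void *a, const void *b) {
  uint64_t x = *(const uint64_t *)a;
  uint64_t y = *(const uint64_t *)b;
  return (x > y) - (x < y);
}

/* Sort keys in place, drop duplicates, and return the number of unique keys. */
static size_t demo_sort_and_remove_dup(uint64_t *keys, size_t length) {
  if (length == 0) {
    return 0; /* nothing to keep */
  }
  qsort(keys, length, sizeof(uint64_t), demo_cmp_u64);
  size_t j = 1; /* keys[0] is always the first unique value */
  for (size_t i = 1; i < length; i++) {
    if (keys[i] != keys[j - 1]) { /* compare with the last kept value */
      keys[j] = keys[i];
      j++;
    }
  }
  return j; /* j already counts the kept values */
}
```

With this version, the input {3, 1, 2, 3, 1} becomes {1, 2, 3} in the first three slots and the function returns 3.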

Port to Swift

Should be easily portable to Swift.

Guard size <= 1 in binary_fuse8_allocate as the current behaviour is unsafe.

I've noticed with very small input sizes (0, 1, 10, 100) that binary_fuse8_allocate sometimes relies on some.. questionable behavior.

For example, if called with size=0, filter->Fingerprints is ultimately allocated with filter->ArrayLength = 786432 (768 KiB), which seems high given a filter size of zero.

With size=1, sizeFactor ends up being INFINITY, which makes:

xor_singleheader/include/binaryfusefilter.h

Line 153 in f190e5a

uint32_t capacity = (uint32_t)(round((double)size * sizeFactor));

result in effectively:

uint32_t capacity = (uint32_t)(round((double)0 * INFINITY));

https://godbolt.org/z/7vqY9ofY1

or more simply:

uint32_t capacity = (uint32_t)((double)INFINITY);

I think this cast may be undefined behavior, as in GCC this results in -1 while in clang it results in an undefined value:
https://godbolt.org/z/daxcT7n5K

All this to say, I think very small input sizes (specifically 0, 1, 10, 100) may fail during binary_fuse8_allocate in the worst case scenario and, in the best case scenario, allocate a perhaps larger set of fingerprints than needed.
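
One possible guard, sketched under assumptions: the helper `demo_checked_capacity` is hypothetical and is not part of the library. It rejects NaN, infinite, negative, and too-large products before converting to uint32_t, since converting an out-of-range floating-point value to an integer type is undefined behavior in C.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Compute round(size * size_factor) as a uint32_t, or report failure.
 * The range test is written so that NaN fails it as well, because every
 * comparison involving NaN is false.
 */
static bool demo_checked_capacity(size_t size, double size_factor,
                                  uint32_t *out) {
  double scaled = (double)size * size_factor;
  if (!(scaled >= 0.0 && scaled <= 4294967295.0)) {
    return false; /* NaN, +/-infinity, negative, or above UINT32_MAX */
  }
  /* Round half up; safe because scaled is within the uint32_t range. */
  *out = (uint32_t)(scaled + 0.5);
  return true;
}
```

A caller could treat a false return as an allocation failure, or fall back to a small fixed capacity for tiny inputs; which of the two is appropriate for the library is a design decision this sketch does not settle.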

Find rare elements using the filter

This is not an issue, but a question. Can I do the following query to the Xor filter: "does the given element appear definitely less than the threshold number of times"? Counting Bloom filter seems to be a traditional approach for that, but I wonder what is the current state-of-the-art library that allows that out of the box? I looked at cuckoofilter library and this operation is not supported (even though, I presume, it is possible: filter-tutorial). It would be great if Xor Filter allowed that. Otherwise, what would you recommend, maybe something from fastfilter_cpp? If that matters, I plan to put up to 3 billion queries to the filter and search for elements that were added less than ~10 times.

Thanks!

3x parallelization

Much of the code should be easily parallelizable three-way.

C++ wrapper in README is not working

It seems that the class is named Xor8, but its actual content is a binary fuse filter.

Handle duplicated keys for the user

Currently the user is responsible for ensuring that there are no duplicated keys. We should handle this for the user (it is relatively easy to do efficiently, without even sorting).

Cross language serialization support

Very interesting work.
I have a production use case where I want to use filters across different systems. Is there any support for serializing these filters, or any serialization spec? Pointers to any implementations you are aware of would be helpful.

Is it possible to reuse the filter after built?

Let's say I have 1 million items. After the filter is built, it's saved to disk. Later I have some more items. Can I reload the saved filter and then incrementally add items without building the filter from scratch?
As far as I know, a Bloom filter can do that.

fuse filters algorithmic issue: support a small number of keys

See discussion in #20

multiple definition

I have two C++ files that both include the binaryfusefilter.h header, and they are compiled together by CMake.
When I compile my project, I get errors like:

/usr/bin/ld: ../lib/libPSI.a(PsiSender.cpp.o): in function `binary_fuse_mulhi(unsigned long, unsigned long)':
PsiSender.cpp:(.text._Z17binary_fuse_mulhimm+0x0): multiple definition of `binary_fuse_mulhi(unsigned long, unsigned long)'; ../lib/libPSI.a(PsiReceiver.cpp.o):PsiReceiver.cpp:(.text._Z17binary_fuse_mulhimm+0x0): first defined here
/usr/bin/ld: ../lib/libPSI.a(PsiSender.cpp.o): in function `binary_fuse_max(double, double)':
PsiSender.cpp:(.text._Z15binary_fuse_maxdd+0x0): multiple definition of `binary_fuse_max(double, double)'; ../lib/libPSI.a(PsiReceiver.cpp.o):PsiReceiver.cpp:(.text._Z15binary_fuse_maxdd+0x0): first defined here
/usr/bin/ld: ../lib/libPSI.a(PsiSender.cpp.o): in function `binary_fuse8_populate(unsigned long const*, unsigned int, binary_fuse8_s*)':
PsiSender.cpp:(.text._Z21binary_fuse8_populatePKmjP14binary_fuse8_s+0x0): multiple definition of `binary_fuse8_populate(unsigned long const*, unsigned int, binary_fuse8_s*)'; ../lib/libPSI.a(PsiReceiver.cpp.o):PsiReceiver.cpp:(.text._Z21binary_fuse8_populatePKmjP14binary_fuse8_s+0x0): first defined here
/usr/bin/ld: ../lib/libPSI.a(PsiSender.cpp.o): in function `binary_fuse16_populate(unsigned long const*, unsigned int, binary_fuse16_s*)':
PsiSender.cpp:(.text._Z22binary_fuse16_populatePKmjP15binary_fuse16_s+0x0): multiple definition of `binary_fuse16_populate(unsigned long const*, unsigned int, binary_fuse16_s*)'; ../lib/libPSI.a(PsiReceiver.cpp.o):PsiReceiver.cpp:(.text._Z22binary_fuse16_populatePKmjP15binary_fuse16_s+0x0): first defined here
collect2: error: ld returned 1 exit status

I know this can happen when two files include the same header. But I looked at binaryfusefilter.h, and it has include guards like:

#ifndef BINARYFUSEFILTER_H
#define BINARYFUSEFILTER_H

So why does this error still happen? How can I avoid it?
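
A likely explanation, assuming the header version in use defines these functions without `static` or `inline`: include guards only stop a header from being included twice within one translation unit. Each .cpp file that includes the header still compiles its own copy of every function defined there, and if those functions have external linkage the linker sees two definitions. The sketch below shows the usual header-only pattern; `DEMO_HEADER_H`, `demo_max` and `demo_mulhi32` are hypothetical names, not the library's.

```c
#ifndef DEMO_HEADER_H
#define DEMO_HEADER_H

#include <stdint.h>

/*
 * `static inline` gives every translation unit its own private copy with
 * internal linkage, so two object files that both include this header do
 * not export conflicting symbols.
 */
static inline double demo_max(double a, double b) {
  return a < b ? b : a;
}

/* High 32 bits of the 64-bit product of two 32-bit values. */
static inline uint32_t demo_mulhi32(uint32_t a, uint32_t b) {
  return (uint32_t)(((uint64_t)a * (uint64_t)b) >> 32);
}

#endif /* DEMO_HEADER_H */
```

Without the `static inline` qualifiers, including such a header from both PsiSender.cpp and PsiReceiver.cpp would produce the kind of "multiple definition" errors shown above. Other options are to include the header from a single source file only, or to update to a header version that marks its functions this way, if one is available.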

The dangers of dup keys

Hello! Love the library.

I don't feel the README says enough about the dangers of duplicate keys (the infinite loop). I'm not sure what the phrasing should be, but "You should ensure that you have no duplicated keys." doesn't convey the gravity of the situation.

False Positives

Hi Daniel,

Thanks for making your fuse work available.

In your paper, "Binary Fuse Filters: Fast and Smaller Than Xor Filters", you say:
"The false positives may be later pruned after checking against the actual set."

Apologies if there is an obvious answer, but how would this pruning be achieved in a fuse8 filter?

Thanks.

Merging of 2 different bloom filters with same size

Hi @lemire, I am working on distributed Bloom filters.

Is it possible to merge these filters the way we merge regular bit-based Bloom filters, by ORing (|) the bytes?

Save the filter

As you said, the filter's struct is very simple and can be saved to disk. So I tried to copy a filter into another filter named filter_clone like this:

    uint64_t Seed_clone = filter.Seed;
    uint32_t SegmentLength_clone = filter.SegmentLength;
    uint32_t SegmentLengthMask_clone = filter.SegmentLengthMask;
    uint32_t SegmentCount_clone = filter.SegmentCount;
    uint32_t SegmentCountLength_clone = filter.SegmentCountLength;
    uint32_t ArrayLenth_clone = filter.ArrayLength;
    uint8_t *Fingerprints_clone;
    binary_fuse8_t filter_clone;
    filter_clone.Seed = Seed_clone;
    filter_clone.SegmentLength = SegmentLength_clone;
    filter_clone.SegmentLengthMask = SegmentLengthMask_clone;
    filter_clone.SegmentCount = SegmentCount_clone;
    filter_clone.SegmentCountLength = SegmentCountLength_clone;
    filter_clone.ArrayLength = ArrayLenth_clone;
    memcpy(filter_clone.Fingerprints, filter.Fingerprints, sizeof(*filter.Fingerprints));

However, a segmentation fault occurred. I think the way I clone filter.Fingerprints is wrong. How should I do that?
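
Two things in the snippet above would explain the crash: filter_clone.Fingerprints is never allocated, so memcpy writes through an uninitialized pointer, and sizeof(*filter.Fingerprints) is the size of a single uint8_t (1 byte), not of the whole array. A minimal sketch of a deep copy follows. It uses a stand-in struct, `demo_fuse8_t`, with the field names from the snippet, and assumes that ArrayLength is the number of fingerprint bytes; the real binary_fuse8_t may differ.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for the filter struct (assumption, mirrors the fields above). */
typedef struct {
  uint64_t Seed;
  uint32_t SegmentLength;
  uint32_t SegmentLengthMask;
  uint32_t SegmentCount;
  uint32_t SegmentCountLength;
  uint32_t ArrayLength;
  uint8_t *Fingerprints;
} demo_fuse8_t;

/* Deep-copy src into dst; returns false if the allocation fails. */
static bool demo_fuse8_clone(const demo_fuse8_t *src, demo_fuse8_t *dst) {
  *dst = *src;              /* copies all scalar fields */
  dst->Fingerprints = NULL; /* do not share the source buffer */
  size_t nbytes = (size_t)src->ArrayLength * sizeof(uint8_t);
  if (nbytes == 0) {
    return true; /* empty filter: nothing to copy */
  }
  dst->Fingerprints = (uint8_t *)malloc(nbytes);
  if (dst->Fingerprints == NULL) {
    return false;
  }
  memcpy(dst->Fingerprints, src->Fingerprints, nbytes);
  return true;
}
```

The caller owns the clone's buffer and should release it with free() when done. Saving to disk follows the same shape: write the scalar fields, then write ArrayLength bytes from Fingerprints.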

hash index issue

Hi,
Thank you for your great work.
I wonder if there is a way to get the index of a hash when it is found in the set.
For example, I have a table like the one below:

7C4A8D09CA3762AF61E59520943DC26494F8941B:23174662
F7C3BC1D808E04732ADF679965CCC34CA7AE3441:7671364
B1B3773A05C0ED0176787A4F1574FF0075F7521E:3810555
5BAA61E4C9B93F3F0682250B6CF8331B7EE68FD8:3645804
3D4F2BF07DC1BE38B20CD6E46949A1071F9D0E3D:3093220
7C222FB2927D828AF22F592134E8932480637C0D:2889079
6367C48DD193D56EA7B0BAAD25B19455E529F5EE:2834058
20EABE5D64B0E216796E834F52D61FD0B70332FC:2484157
E38AD214943DAAD1D64C102FAEC29DE4AFE9DA3D:2401761
8CB2237D0679CA88DB6464EAC60DA96345513964:2333232
01B307ACBA4F54F55AAFC33BB06BBBF6CA803E9A:2224432
601F1889667EFAEBB33B8C12572835DA3F027F78:2194818
C984AED014AEC7623A54F0591DA07A85FD4B762D:1942768
EE8D8728F435FD550F83852AABAB5234CE1DA528:1593388
7110EDA4D09E062AA5E4A390B0A572AC0D2C0220:1256907
B80A9AED8AF17118E51D4D0C2D7872AE26E2109E:1141300
B0399D2029F64D445BD131FFAA399A42D2F8E7DC:1081655
40BD001563085FC35165329EA1FF5C5ECBDBBEEF:1023001
AB87D24BDC7452E55738DEB5F868E1F16DEA5ACE:980209
AF8978B1797B72ACFFF9595A5A2A373EC3D9106D:968625

When I query the xor8 filter for the hash 7110EDA4D09E062AA5E4A390B0A572AC0D2C0220, it reports that the hash is in the set.
How can I get the index of this hash (index=15)?
Thanks.

Port to Rust

Should be easily portable to Rust.

It is not necessary to port everything, porting the xor8_populate and xor8_allocate functions would be a good start.

SegmentLength max of 2^18

From the paper, I understood that the filter requires the segment length to be a power of two. However, consider the following snippet:

xor_singleheader/include/binaryfusefilter.h

Lines 439 to 441 in 177cf03

    if (filter->SegmentLength > 262144) {
      filter->SegmentLength = 262144;
    }

Why define a maximum? Are there problems that arise when we go over this amount?

BinaryFuse: reverseOrder is not fully cleared

I don't really understand much of the BinaryFuse data structure yet, but while reading through the source I noticed that the reverseOrder array is not cleared completely.

Here it is initialized: https://github.com/FastFilter/xor_singleheader/blob/master/include/binaryfusefilter.h#L208
Notice that its length is size+1.

Here it is cleared: https://github.com/FastFilter/xor_singleheader/blob/master/include/binaryfusefilter.h#L326
Notice that it is only cleared up to size.

If the intention is to clear the whole array, the memset should cover size+1 elements.

Support for strings and floats

I'm currently working on Python bindings for this C implementation of xorfilter. I was wondering whether there will be any support for strings and floats in the future.
