Comments (25)
Sure, but please wait a bit since I am doing some reengineering and if my work is fruitful, we could end up with a massive conflict from hell.
from croaring.
I am aware of the nasty conflicts it'll ensue and will wait for your work.
from croaring.
I'll know soon whether my idea helps or not... I am doing as fast as I can but it involves lots of code, sadly.
from croaring.
Some things to take into account...
- We'd like to keep the unity build in working order.
- The various wrappers,
https://github.com/Ezibenroc/PyRoaringBitMap
https://github.com/saulius/croaring-rs
https://github.com/RoaringBitmap/gocroaring
should be preserved in working order as well.
from croaring.
Ok. I am done. Go ahead!
from croaring.
Do you have any idea when you'd like to work on this? We have this other issue (#47) which would require touching quite a bit of code. I'd like to avoid conflicts... so if you want to clean the headers soon, then we will delay work on issue 47 till you are done, otherwise, we may proceed with issue 47 first.
from croaring.
I'll try to tackle it today. I will do this in a separate branch. It is worth noting that it will probably break bindings?
from croaring.
Should I take the opportunity to change the signature of some functions, e.g. const parameters usually preceed non-const void roaring_bitmap_andnot_inplace(roaring_bitmap_t *x1, const roaring_bitmap_t *x2);
would become void roaring_bitmap_andnot_inplace(const roaring_bitmap_t *x1, roaring_bitmap_t *x2);
.
from croaring.
It is worth noting that it will probably break bindings?
Hopefully, it won't break the unity build.
It is probably best to aim to break as few things as possible, as a general rule, right?
separate concerns (basic operations, advanced operations, serializations) in multiple headers, such the user can only import what he needs.
That's a good idea but this does not prevent you from having a roaring.h
file that offers the same API as it does now, just have it include all the subheaders so that users who do not need separation of concern can maintain the current convenience of having a single include.
hide definitions, does the user really need the full definition? We could probably hide in a forward declaration à la struct roaring_bitmap;
The inline definitions are a matter of performance. We had major performance issues due to lack of inlining. I did not move the definitions into header files for amusement.
Remove unnecessary public that should really be private stuff.
Moving headers away from view is great. We should ensure that it does not come at the expense of performance. Being able to inline definitions is really important in some instances.
Should I take the opportunity to change the signature of some functions, e.g. const parameters usually preceed non-const
I have to disagree on this point. When we do in-place operations, the first parameter is usually the mutable one.
Here is the equivalent using operators:
x &= y
See how the x occurs before the y? I am sure that there are languages that read backward, but we are in C.
(C does not have a "andnot" operator unlike, say, Go, but it is not important.)
Here is how you'd implement it in C++:
type operator&(type & lhs, const type& rhs) {
return lhs &= rhs;
}
See how the const
parameter appears last?
from croaring.
Another wrapper (C#): https://github.com/RogueException/CRoaring.Net
from croaring.
Aren't inlining concerns fixed by proper LTO support? This small peculiarity forces some private header to be public.
I've come to understand that for a proper C library, parameter order should be: input, input-output, output. This is a matter of opinion.
I'm looking ahead, and this feels like wasted time. I'm just gonna drop this issue.
from croaring.
Aren't inlining concerns fixed by proper LTO support?
Good point. I think that this assumes that library users will rely on LTO. Otherwise, they will simply get subpar performance.
I've come to understand that for a proper C library, parameter order should be: input, input-output, output. This is a matter of opinion.
When you copy something you go
`destination = source;``
So it makes sense that we have
copy(destination, source);
Say you have a hash table and you insert something in it. Intuitively, you'd write
set[key] = value
And thus, I think you'd write
insert(hash_table *table, const int key, const int value )
I could be wrong, of course... that's just my intuition.
(You have just inspired a blog post.)
I'm looking ahead, and this feels like wasted time. I'm just gonna drop this issue.
AH!!!! I hope it is not because of the points I raised.
I will take note that we are delaying this issue.
from croaring.
Function signature: how do you order parameters? http://lemire.me/blog/2016/09/06/function-signature-how-do-you-order-parameters/
from croaring.
I did some minor cleaning... in commit 4fda6ba but it lead to a significant performance degradation (yet I just renamed the defines). I think that my experience illustrates the need to be careful.
from croaring.
I will painfully undo the commit.
from croaring.
I have undone my cleaning work. I have read and re-read the cleaned code and I do not see where it could have changed the generated code, but obviously, it did degrade the binaries quite a bit.
from croaring.
Which machine are you testing this on? I did upgrade the default compiler on Skylake (and maybe Haswell), and it's possible that might have effects on performance.
More generally, I think this is a reminder that performance in a high level language is fragile, and at the whim of the compiler. If we want robust performance, we probably need some way to lock things in once we have what we want.
--nate
from croaring.
@nkurz Nah. It is not a compiler issue.
With the magic of git
, I can move back and forth in time and see exactly at which point the performance went down (using the same compiler and machine).
More generally, I think this is a reminder that performance in a high level language is fragile, and at the whim of the compiler. If we want robust performance, we probably need some way to lock things in once we have what we want.
My intention was just to rename the enums and the defines so that they are all prefixed by "ROARING_". This should not impact the generated binaries in the least. But it did something bad.
from croaring.
Here is the benchmark I use...
https://github.com/RoaringBitmap/CBitmapCompetition#results
(I published it so one could refer to it in the future.)
from croaring.
@nkurz There is one ill-effect from the compiler upgrade however... The sanitize mode is now broken. I doubt it is your doing per se, but I think it is related to the new compiler.
$ mkdir debug
dlemire@skylake:~/CVS/github/CRoaring$ cmake -DCMAKE_BUILD_TYPE=Debug -DSANITIZE=ON ..
CMake Error: The source directory "/home/dlemire/CVS/github" does not appear to contain CMakeLists.txt.
Specify --help for usage, or press the help button on the CMake GUI.
dlemire@skylake:~/CVS/github/CRoaring$ cd debug/
dlemire@skylake:~/CVS/github/CRoaring/debug$ cmake -DCMAKE_BUILD_TYPE=Debug -DSANITIZE=ON ..
-- The C compiler identification is GNU 6.2.0
-- The CXX compiler identification is GNU 6.2.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- BENCHMARK_DATA_DIR: /home/dlemire/CVS/github/CRoaring/benchmarks/realdata/
-- TEST_DATA_DIR: /home/dlemire/CVS/github/CRoaring/tests/testdata/
-- CMAKE_SYSTEM_PROCESSOR: x86_64
-- CMAKE_BUILD_TYPE: Debug
-- DISABLE_AVX: OFF
-- BUILD_STATIC: OFF
-- BUILD_LTO: OFF
-- SANITIZE: ON
-- CMAKE_C_COMPILER: /usr/bin/cc
-- CMAKE_C_FLAGS: -std=c11 -fPIC -march=native -Wall -Winline -Wshadow -Wextra -pedantic -fsanitize=address -fno-omit-frame-pointer -fsanitize=undefined
-- CMAKE_C_FLAGS_DEBUG: -ggdb
-- CMAKE_C_FLAGS_RELEASE: -O3
-- Configuring done
-- Generating done
-- Build files have been written to: /home/dlemire/CVS/github/CRoaring/debug
dlemire@skylake:~/CVS/github/CRoaring/debug$ make
Scanning dependencies of target roaring
[ 3%] Building C object src/CMakeFiles/roaring.dir/array_util.c.o
[ 6%] Building C object src/CMakeFiles/roaring.dir/bitset_util.c.o
[ 9%] Building C object src/CMakeFiles/roaring.dir/containers/array.c.o
[ 12%] Building C object src/CMakeFiles/roaring.dir/containers/bitset.c.o
[ 16%] Building C object src/CMakeFiles/roaring.dir/containers/containers.c.o
[ 19%] Building C object src/CMakeFiles/roaring.dir/containers/convert.c.o
[ 22%] Building C object src/CMakeFiles/roaring.dir/containers/mixed_intersection.c.o
[ 25%] Building C object src/CMakeFiles/roaring.dir/containers/mixed_union.c.o
[ 29%] Building C object src/CMakeFiles/roaring.dir/containers/mixed_equal.c.o
[ 32%] Building C object src/CMakeFiles/roaring.dir/containers/mixed_negation.c.o
[ 35%] Building C object src/CMakeFiles/roaring.dir/containers/mixed_xor.c.o
[ 38%] Building C object src/CMakeFiles/roaring.dir/containers/mixed_andnot.c.o
[ 41%] Building C object src/CMakeFiles/roaring.dir/containers/run.c.o
[ 45%] Building C object src/CMakeFiles/roaring.dir/roaring.c.o
[ 48%] Building C object src/CMakeFiles/roaring.dir/roaring_priority_queue.c.o
[ 51%] Building C object src/CMakeFiles/roaring.dir/roaring_array.c.o
Linking C shared library ../libroaring.so
/usr/bin/ld: unrecognized option '--push-state'
/usr/bin/ld: use the --help option for usage information
collect2: error: ld returned 1 exit status
make[2]: *** [libroaring.so] Error 1
make[1]: *** [src/CMakeFiles/roaring.dir/all] Error 2
make: *** [all] Error 2
from croaring.
@nkurz (Thankfully I can run the tests with the sanitizer on other machines so it is not bad.)
from croaring.
With the magic of git, I can move back and forth in time and see exactly
at which point the performance went down (using the same compiler and
machine).
Yes, I didn't know if you'd tested the new version with the old compiler to be sure, and vice versa.
It is not a compiler issue.
...
My intention was just to rename the enums and the defines so that
they are all prefixed by "ROARING_". This should not impact the
generated binaries in the least. But it did something bad.
When you rename a variable and the performance drops, I tend to think of that as a compiler issue.
There is one ill-effect from the compiler upgrade however...
The sanitize mode is now broken. I doubt it is your doing per se,
but I think it is related to the new compiler.
Seems like Ubuntu's doing rather than mine, but I think I patched it with d5a081a. Separately, I noticed that the "make test" has failed tests on Skylake, but I presume you are already aware of this.
from croaring.
Yes, I was working on fixing a minor issue.
from croaring.
- Header files which are not part of public interface (cannot be reached from
roaring.h
) should be moved frominclude/
intosrc/
directory. Where possible, public interface should be reduced. - Public interface should not depend on USEAVX and other similar definitions. For example, currently I see that run.h is referenced by roaring.h (roaring.h -> roaring_array.h -> containers.h -> run.h) and its body depends on USEAVX. Simple prerelase test:
grep -R USEAVX include/ >/dev/null && echo "error"
.
from croaring.
@andreigudkov PR invited!!!
But care is needed because we have an outstanding unmerged PR.
from croaring.
Related Issues (20)
- Roaring64 iterator cannot come back from the edge
- Roaring64 iterator's move_equalorlarger can only go forward
- Copying a Roaring64MapSetBitForwardIterator into an iterator from another map does not work HOT 1
- Feature request: intersection, union, and difference with the tail of another bitmap HOT 4
- Segfault with roaring64_bitmap_flip_inplace HOT 4
- roaring64_iterator_move_equalorlarger cannot come back from the end
- How to use `containsRange` in roaring64 HOT 1
- Improve lazyor in CRoaring HOT 1
- Compiling croaring as C++ fails on Windows
- No github release/tag for 3.0.0 HOT 1
- Compilation issues with oldest clang versions with 3.0.0 HOT 1
- Allow allocating container in memory pool HOT 6
- Add a function to move the iterator to be equal or smaller than a value HOT 5
- Implement roaring64_bitmap_statistics HOT 10
- Avoid calculating the sum of items in bitmap statistics HOT 3
- Roaring - OR Operation - High memory usage in Oracle Linux 8(OEL8) when compared to Oracle Linux 7 (OEL7) HOT 7
- access the nullptr in function "run_container_is_full" HOT 5
- Rewrite the C++ Roaring64Map implementation to use 64-bit croaring HOT 4
- Roaring64 doesn't handle out of memory HOT 9
- Roaring64 segfault empty bitmap intersect with range
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from croaring.