roaring_bitmap_contains and roaring_bitmap_add should be inlineable (RoaringBitmap/CRoaring)

Comments (8)

lemire commented on August 26, 2024

from croaring.

nkurz commented on August 26, 2024

On Wed, Jul 27, 2016 at 12:31 PM, Daniel Lemire wrote:

Currently, the functions roaring_bitmap_contains and roaring_bitmap_add are not normally inlineable by a compiler such as GCC. Moreover, many of the functions they depend on are similarly not inlineable. This suggests that they will suffer from function-call overheads. Because these functions could be called thousands of times, this overhead could be a performance concern.

It's incredibly difficult to force a compiler to do something it
doesn't want to do. It's almost impossible to force all compilers to
do what you want unless it's explicitly required by the spec, and even
then it might take waiting years for bug reports to be dealt with, and
only new versions of the compiler will be fixed.

Inline is now mentioned in the C99 spec, but I don't think there is
any standard way to specify that a function must be inlined --- it's
always just a suggestion. Different versions of C have different
ways of treating this suggestion, and the different versions of C++
have their own rules. Until quite recently, there were serious and
insidious bugs in the way that clang++ and g++ handled inline
functions: http://www.playingwithpointers.com/ipo-and-derefinement.html.

If one actually wants to guarantee a particular level of performance,
it cannot be done universally using inline hinting. You either need to
target a particular compiler/version/processor combination (or
multiple with ifdef's), or switch to a different (likely lower level)
language. This way lies either madness or lack of (performance)
portability.
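
The per-compiler targeting described above usually ends up as an ifdef ladder. A minimal sketch, assuming GCC/Clang's `always_inline` attribute and MSVC's `__forceinline` keyword; the macro name `ROARING_FORCE_INLINE` and the helper `key_of` are hypothetical and are not taken from CRoaring:

```c
#include <stdint.h>

/* Hypothetical macro: pick the strongest inlining request each compiler offers. */
#if defined(_MSC_VER)
#define ROARING_FORCE_INLINE static __forceinline
#elif defined(__GNUC__) || defined(__clang__)
#define ROARING_FORCE_INLINE static inline __attribute__((always_inline))
#else
/* Fallback: plain C99 inline, which is only a hint. */
#define ROARING_FORCE_INLINE static inline
#endif

/* Hypothetical helper: the high 16 bits of a value select the container. */
ROARING_FORCE_INLINE uint16_t key_of(uint32_t val) {
    return (uint16_t)(val >> 16);
}
```

On any compiler outside the first two branches this degrades to an ordinary hint, which is the portability gap the paragraph above points at.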

While a lack of performance portability might be acceptable,
there is a working alternative. You can simply #define macros that
work at the text substitution level, which from the point of view of
the compiler is identical to repeating the code in each function. While
they would be allowed to do so, I don't believe any current compilers
will de-inline a single function into multiple sub-functions.
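
To make the text-substitution idea concrete, here is a small sketch. The macro `BITSET_TEST` and the function `count_present` are hypothetical names chosen for illustration, not CRoaring code:

```c
#include <stdint.h>

/* Hypothetical macro: test bit `pos` in an array of 64-bit words.
 * Arguments are parenthesized, but `pos` is evaluated twice, so it
 * must not have side effects (for example, no BITSET_TEST(w, i++)). */
#define BITSET_TEST(words, pos) \
    ((((words)[(pos) >> 6]) >> ((pos) & 63)) & UINT64_C(1))

/* Each use of the macro is expanded in place by the preprocessor,
 * so the compiler never sees a call at this site. */
static int count_present(const uint64_t *words, const uint16_t *vals, int n) {
    int hits = 0;
    for (int i = 0; i < n; i++) {
        if (BITSET_TEST(words, vals[i])) hits++;
    }
    return hits;
}
```

The double evaluation noted in the comment is the usual hazard cited against this style; a function would evaluate its argument once.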

I believe the only downside is that you will be mocked by those who
will tell you this is a "dangerous bad practice". I'm not sure what
the danger is, but I think it's related to "worshipping false idols"
and having insufficient faith in the omniscience of one's compiler.
Or insufficient faith in all compilers, presuming the purpose is
portability?

Anyway, if willing to tolerate the scorn, and unless there is a
compelling reason to the contrary, I'd suggest that the performance
benefit of inlining could be better handled with old-style macros
instead of non-standard compiler hints.

--nate


lemire commented on August 26, 2024

@nkurz

Good points.

Inline is now mentioned in the C99 spec, but I don't think there is any standard way to specify that a function must be inlined --- it's always just a suggestion.

I should point out that I did not have in mind to use the inline keyword. By inlineable, I mean "the definition is in the include file". Without a flag like -flto, I think we can be sure that the lack of a definition in the header files ensures that a compiler like GCC will never inline the function. I tried playing with -flto, but it seems to be quite a bit more involved than I expected... see for example http://hubicka.blogspot.ca/2014/04/linktime-optimization-in-gcc-2-firefox.html To put it another way, it seems to make the build more involved and maybe quite a bit slower... I am told that Visual Studio is smarter in the sense that it has something like the -flto flag enabled by default... but, in any case, we clearly cannot assume that people use LTO. This means that if a definition is not in a header file, we should assume that there won't be any inlining, ever.

Speaking for myself, I am not at all opposed to using macros to force code inlining, especially if it improves performance. Our code base already does this in places. I am pretty sure that the standard library uses macros for "inline" functions.

My concern here is that a function like roaring_bitmap_contains might draw in a lot of code. Recall that it does some work to determine whether a container needs to be queried, and then it branches on the container type... Do we inline everything? It is maybe too much.

We have lots of code that could be inlined, for example, the binary search function:

https://github.com/RoaringBitmap/CRoaring/blob/master/src/array_util.c#L14

... which gets called from the array container contains function...

https://github.com/RoaringBitmap/CRoaring/blob/master/src/containers/array.c#L179

... which gets called from container_contains

CRoaring/include/roaring/containers/containers.h

Line 452 in 1332e05

static inline bool container_contains(const void *container, uint16_t val,

... called from roaring_bitmap_contains...

CRoaring/src/roaring.c

Line 271 in 1332e05

bool roaring_bitmap_contains(const roaring_bitmap_t *r, uint32_t val) {

Though I might, hopefully, be wrong, there could be three distinct function calls in a row embedded in a single call to roaring_bitmap_contains. Suppose someone is calling this function thousands of times?

What is the best design for performance? Complete inlining? Partial inlining?

Short of having any benchmarking, I'd rather trust compiler heuristics.

What I am proposing is to give GCC the option of inlining if GCC's heuristics conclude that it is the right thing. It is about giving the compiler the option to do the right thing... Of course, the compiler can get it wrong.

So I am saying that there are many functions, including some binary search functions, that should be moved to the header files... to give the compiler more options.
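
As an illustration of what "moved to the header files" could look like, here is a minimal sketch of a header-resident binary search. The names `sketch_binary_search` and `sketch_array_contains` are hypothetical, and the body is a generic binary search rather than a copy of the CRoaring implementation:

```c
#include <stdint.h>

/* Hypothetical header-resident helper: because the body is visible in
 * every translation unit that includes the header, the compiler may
 * inline it, but is not required to. */
static inline int32_t sketch_binary_search(const uint16_t *array,
                                           int32_t len, uint16_t key) {
    int32_t low = 0;
    int32_t high = len - 1;
    while (low <= high) {
        int32_t mid = low + (high - low) / 2;
        uint16_t v = array[mid];
        if (v < key) {
            low = mid + 1;
        } else if (v > key) {
            high = mid - 1;
        } else {
            return mid; /* found */
        }
    }
    return -(low + 1); /* not found: encodes the insertion point */
}

/* Hypothetical caller, also header-resident so the whole chain is visible. */
static inline int sketch_array_contains(const uint16_t *array, int32_t len,
                                        uint16_t key) {
    return sketch_binary_search(array, len, key) >= 0;
}
```

With both definitions visible, the decision to inline one, both, or neither is left to the compiler's heuristics, which is the option being proposed here.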

While they would be allowed to do so, I don't believe any current compilers will de-inline a single function into multiple sub-functions.

I believe you are right... but the downside of this amount of power is that if you are misusing macros, you can end up with larger binaries, longer compile times, and worse performance.

Until quite recently, there were serious and insidious bugs in the way that clang++ and g++ handled inline functions: http://www.playingwithpointers.com/ipo-and-derefinement.html.

I have not read this reference, but inline from C++ is not quite the same as inline from C99 so the issue might or might not be the same. My impression is that the semantics of inline is weaker in C99 than in C++.
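
For readers unfamiliar with the C99 side of that difference, a short sketch may help. In C99, a plain `inline` definition does not by itself provide an external definition, so one translation unit has to supply it. The function names `twice` and `thrice` are hypothetical:

```c
/* Would live in a header: an inline definition visible to every includer. */
inline int twice(int x) { return 2 * x; }

/* Would live in exactly one .c file: this declaration makes that
 * translation unit emit the external definition, which is used
 * wherever the compiler chooses not to inline a call. */
extern inline int twice(int x);

/* `static inline` sidesteps the question: each translation unit gets
 * its own private copy, and it behaves the same way in C99 and C++. */
static inline int thrice(int x) { return 3 * x; }
```

This is one reason header-only C code tends to use `static inline` throughout.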


nkurz commented on August 26, 2024

On Wed, Jul 27, 2016 at 2:13 PM, Daniel Lemire wrote:

I should point out that I did not have in mind to use the inline keyword. By inlineable, I mean "the definition is in the include file".

OK, that's reasonable. I personally feel that including C files can
occasionally be useful too.

Without a flag like -flto, I think we can be sure that the lack of a definition in the header files ensures that a compiler like GCC will never inline the function.

While currently true, I wouldn't want to depend on this continuing to
be the case. If it's a legal optimization, we should expect at least
some compilers to start doing it by default.

I tried playing with -flto, but it seems to be quite a bit more involved than I expected...

I've had very good luck with it lately across clang, gcc, and icc.
Or at least, I've found it very easy to use although the gains are
often not very large.

in any case, we clearly cannot assume that people use LTO. This means that if a definition is not in a header file, we should assume that there won't be any inlining, ever.

I'd invert that: if it's textually included in the code, we can
assume that it is inlined. In all other cases, it will depend on the
user's choice of compiler and flags.

My concern here is that a function like roaring_bitmap_contains might draw in a lot of code. Recall that it does some work to determine whether a container needs to be queried, and then it branches on the container type... Do we inline everything? It is maybe too much.

I agree completely: inlining everything is not a good strategy
(although perhaps a better strategy than one might expect).

What is the best design for performance? Complete inlining? Partial inlining?

Short of having any benchmarking, I'd rather trust compiler heuristics.

Yes, but what this really means is that we need to do benchmarking if
we care about performance. And once we have a performance level we
find acceptable, we can either mandate that particular choice of
compiler and flags, or figure out some way to portably "lock in" that
level of performance.
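
A benchmark for this purpose does not need to be elaborate. The following is a rough sketch, assuming CPU time from the standard `clock()` function is precise enough; `sketch_contains` is a hypothetical stand-in for the function under test, and `bench_contains` is likewise a made-up name:

```c
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for the function under test: membership in a
 * sorted array of 16-bit values, by linear scan for brevity. */
static int sketch_contains(const uint16_t *sorted, int n, uint16_t key) {
    for (int i = 0; i < n && sorted[i] <= key; i++) {
        if (sorted[i] == key) return 1;
    }
    return 0;
}

/* Runs `queries` lookups and reports CPU seconds through `seconds`.
 * Returning the hit count keeps the loop from being optimized away. */
static long bench_contains(const uint16_t *sorted, int n, long queries,
                           double *seconds) {
    long hits = 0;
    clock_t start = clock();
    for (long q = 0; q < queries; q++) {
        hits += sketch_contains(sorted, n, (uint16_t)(q & 0xFFFF));
    }
    clock_t end = clock();
    *seconds = (double)(end - start) / CLOCKS_PER_SEC;
    return hits;
}
```

Building the same harness with each compiler and flag set of interest, and comparing the reported times, is what would turn "trust the heuristics" into a measured choice.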

What I am proposing is to give GCC the option of inlining if GCC's heuristics conclude that it is the right thing. It is about giving the compiler the option to do the right thing... Of course, the compiler can get it wrong.

And if two compilers (or sets of flags) have different performance
levels, at least one of them "got it wrong", at least for that target.
Going farther, I'd assert that even if two compilers have the same
level of performance, it's likely that both of them "got it wrong".
Matching performance with non-binary identical code likely just means
that each made similar but different good and bad choices for
individual functions, rather than indicating that each made the same
choices. The "best of the best" approach will beat both.

So I am saying that there are many functions, including some binary search functions, that should be moved to the header files... to give the compiler more options.

I don't think this is a good idea, at least not purely for performance
reasons. Link-time-optimization is far enough along that it is a
better solution for all but the most sensitive loops. Even without
that, SQLite's "amalgamation" approach works about as well:
https://www.sqlite.org/amalgamation.html

The "no headers" approach with included C files strikes me as more appealing.

While they would be allowed to do so, I don't believe any current compilers will de-inline a single function into multiple sub-functions.

I believe you are right... but the downside of this amount of power is that if you are misusing macros, you can end up with larger binaries, longer compile times, and worse performance.

I agree, although the longer compile times are the same (or worse)
when putting more code into headers. Macros would need to be used
sparingly, and only in cases where there is predetermined (and
preferably measured) benefit.

Really what I'd be looking for is a way to find the best performing
alternative for a given processor, and make it available. I think
this would require committing to a particular compiler/version/flags
choice, or relying primarily on binary or assembly distribution with
source compilation treated as a backup.

Until quite recently, there were serious and insidious bugs in the way that clang++ and g++ handled inline functions: http://www.playingwithpointers.com/ipo-and-derefinement.html.

I have not read this reference, but inline from C++ is not quite the same as inline from C99 so the issue might or might not be the same. My impression is that the semantics of inline is weaker in C99 than in C++.

Yes, although if we care about MSVC it's worth noting that our code
would be compiled with a C++ compiler. Microsoft no longer offers a
straight C compiler.

--nate


lemire commented on August 26, 2024

Even without that, SQLite's "amalgamation" approach works about as well: https://www.sqlite.org/amalgamation.html The "no headers" approach with included C files strikes me as more appealing.

Yes. It is appealing and would solve this issue as well as others.

Link-time-optimization is far enough along that it is a better solution

Your statement seems to imply we enjoy widespread LTO support.

Here is what StackOverflow says about CMake and LTO:

"according to source code there is no support for LTO for GCC/clang compilers. They have some support for Intel compiler only "

http://stackoverflow.com/questions/31355692/cmake-support-for-gccs-link-time-optimization-lto

It is possible that this information is out-of-date or incorrect... but the question is from a year ago. I checked the CMake source code myself and could indeed only find an Intel implementation...

https://github.com/Kitware/CMake/blob/1d4ab06a7045edf366c689ba5e29bbc35d08718e/Modules/Platform/Linux-Intel.cmake#L44-L48

Because CMake is extensible, it should be possible to extend LTO support...

I should point out that my doubts regarding LTO are falsifiable... Someone can prove me wrong simply by explaining how to modify the CMake build so as to have LTO working. Yes, I know that there is a flag (https://cmake.org/cmake/help/v3.0/prop_tgt/INTERPROCEDURAL_OPTIMIZATION.html)

Now, of course you could point out: let us not use CMake. But then you have to go down another rabbit hole where you work on providing a portable build using some other means, possibly with a Makefile.

Yes, although if we care about MSVC it's worth noting that our code would be compiled with a C++ compiler. Microsoft no longer offers a straight C compiler.

I thought that Microsoft was switching to clang as a front-end?

http://www.theregister.co.uk/2015/10/21/microsoft_promises_clang_for_windows_in_november_visual_c_update/


nkurz commented on August 26, 2024

On Wed, Jul 27, 2016 at 5:09 PM, Daniel Lemire wrote:

Your statement seems to imply we enjoy widespread LTO support.

Here is what StackOverflow says about CMake and LTO:

I'm mostly ignorant of CMake. I based my statement (among other
things) on being able to add -flto to the CFLAGS in the Makefile for
Gigablast (a mess of complicated C++) and having it compile without
problems. I'm pretty sure that this was true for g++, icpc, and
clang++, although maybe I needed something extra or slight variation
for one of them. I presume there's a way to add a command line flag
to CMake?

I should point out that my doubts regarding LTO are falsifiable... Someone can prove me wrong simply by explaining how to modify the CMake build so as to have LTO working.

I'll try to take a stab at it. Could you more explicitly tell me
which repository and branch I should try it for? And short directions
for how it should be compiled without LTO?

Now, of course you could point out: let us not use CMake. But then you have to go down another rabbit hole where you work on providing a portable build using some other means, possibly with a Makefile.

I am cynical, and generally of the opinion that generic portability is
not fully compatible with performance, nor worth too much as a
tradeoff. Given my druthers, I'd prefer to make something that is
incredibly fast on current generation Intel running Linux to something
80% as performant but possible to compile on OS/2. This doesn't
exclude CMake, but I wouldn't let it stand in the way of maximizing
performance on the systems we are most interested in (whatever those
may be).

I thought that Microsoft was switching to clang as a front-end?

I guess I had heard that rumor, although I wasn't sure what it meant,
or how it affects the C/C++ compatibility. Given the subtle
differences between the two, though, I'd suspect that if they want
full compatibility with existing source (including "C" programs
currently compiled with MSVC) they will need to stay with interpreting
everything as C++. But quite possibly not.

--nate


lemire commented on August 26, 2024

@nkurz

I pretty much have the amalgamation option worked out. I am testing it out as we speak.

So if you agree that amalgamation is best then I would drop the LTO angle.

I really like amalgamation, personally. And, hey!, if it is good enough for sqlite, it is good enough for me!

I'm pretty sure that this was true for g++, icpc, and clang++, although maybe I needed something extra or slight variation for one of them. I presume there's a way to add a command line flag to CMake?

Adding command line flags to CMake is trivial. But CMake turns everything into libraries. The standard archiver (ar) does not support LTO. Various web pages mention using something called gcc-ar, which I do not have on my Mac but which I do find on Linux.

I'll try to take a stab at it. Could you more explicitly tell me which repository and branch I should try it for? And short directions for how it should be compiled without LTO?

Grab
https://github.com/RoaringBitmap/CRoaring

Then do

mkdir -p build
cd build
cmake ..
make

One can add flags here:

https://github.com/RoaringBitmap/CRoaring/blob/master/tools/cmake/FindOptions.cmake

With amalgamation, it is going to get even easier... ;-)

I am cynical, and generally of the opinion that generic portability is not fully compatible with performance, nor worth too much as a tradeoff.

I'm kind of neutral on such issues... but CRoaring uses CMake for the time being and there are many good reasons to keep this going.

I feel that with amalgamation, you get to ignore CMake (as a user), so that's great, no?

I guess I had heard that rumor, although I wasn't sure what it meant, or how it affects the C/C++ compatibility. Given the subtle differences between the two, though, I'd suspect that if they want full compatibility with existing source (including "C" programs currently compiled with MSVC) they will need to stay with interpreting everything as C++. But quite possibly not.

My interpretation is different. I think that Microsoft wants Linux code to compile "as is" on Microsoft Windows... so clang is used as an optional front-end to compile C and C++...

What you suggest would be the reverse... keeping the same front-end but using LLVM for backend compilation... That would keep the same Windows C++ but compile it differently... That's not what they are going for.

So I am very seriously expecting the current CRoaring code to soon build "as is" on Windows through Visual Studio. It won't compile to the same binaries as it would under Linux, of course.


lemire commented on August 26, 2024

I consider that this has been resolved by amalgamation...

5380189

People who disagree should reopen the issue.

