Comments (9)

mratsim commented on May 24, 2024

One solution at your level would be something like

#define POINT_MULT_SCALAR_W5_IMPL(ptype) \
#if (__GNUC__ == 4 && __GNUC_MINOR__ == 8 && __GNUC_PATCHLEVEL__ == 5) \
__attribute__((optimize("no-tree-vectorize"))) \
#endif \
static void ptype##_gather_booth_w5(ptype *restrict p, const ptype table[16], \
                                    limb_t booth_idx) \
{ \
    size_t i; \
    limb_t booth_sign = (booth_idx >> 5) & 1; \
\
    booth_idx &= 0x1f; \
    vec_zero(p, sizeof(ptype)); /* implicit infinity at table[-1] */\
    /* ~6% with -Os, ~2% with -O3 ... */\
    for (i = 1; i <= 16; i++) \
        ptype##_ccopy(p, table + i - 1, i == booth_idx); \
\
    ptype##_cneg(p, booth_sign); \
} \

(assuming this is the problematic function and problematic compiler version)

from blst.

dot-asm commented on May 24, 2024

> BLST is incompatible with the following GCC flag: -ftree-loop-vectorize

Why not other way around? :-):-):-) But on serious note, if specific compiler version fails to compile a piece of code, while others can, it speaks rather in favour of compiler bug. This is not to say that it necessarily means compiler bug, but it's first assumption to make.

> One solution at your level would be something like

I for one am not a big fan of compiler-specific workarounds, but in any case the suggestion is not the way to go, because pre-processor directives nested inside a macro definition don't work. But a function can have a separate declaration with the designated attributes... Another way to solve it would be ... more assembly, so that the compiler won't be in a position to make the self-defeating assumptions...
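
A minimal sketch of the separate-declaration idea, assuming a toy point type. The names NO_TREE_VECTORIZE, toy_limb, toy_point, TOY_NLIMBS and GATHER_IMPL are made up for illustration and are not part of blst. The version check sits outside the function-like macro, where directives are allowed, and the macro body only uses the resulting attribute macro in a forward declaration.

```c
#include <stddef.h>

/* Hypothetical attribute macro: non-empty only for the GCC version
 * suspected in this thread, empty everywhere else. */
#if defined(__GNUC__) && !defined(__clang__) && \
    __GNUC__ == 4 && __GNUC_MINOR__ == 8 && __GNUC_PATCHLEVEL__ == 5
# define NO_TREE_VECTORIZE __attribute__((optimize("no-tree-vectorize")))
#else
# define NO_TREE_VECTORIZE
#endif

/* Toy stand-ins for the real limb and point types. */
#define TOY_NLIMBS 4
typedef unsigned long toy_limb;
typedef struct { toy_limb l[TOY_NLIMBS]; } toy_point;

/* The macro body contains no directives; the attribute rides on a
 * separate declaration that precedes the definition. */
#define GATHER_IMPL(ptype) \
static void ptype##_gather(ptype *out, const ptype table[16], \
                           toy_limb idx) NO_TREE_VECTORIZE; \
static void ptype##_gather(ptype *out, const ptype table[16], \
                           toy_limb idx) \
{ \
    size_t i, j; \
\
    for (j = 0; j < TOY_NLIMBS; j++) \
        out->l[j] = 0;              /* idx == 0 selects "nothing" */ \
    for (i = 1; i <= 16; i++) { \
        /* all-ones when i == idx, all-zeros otherwise */ \
        toy_limb mask = (toy_limb)0 - (toy_limb)(i == idx); \
        for (j = 0; j < TOY_NLIMBS; j++) \
            out->l[j] |= table[i - 1].l[j] & mask; \
    } \
}

GATHER_IMPL(toy_point)  /* defines toy_point_gather() */
```

On any other compiler NO_TREE_VECTORIZE expands to nothing, so the extra declaration is just an ordinary prototype.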

On a side note, keep in mind that blst is not that dependent on optimization level, because most of the "magic" happens in assembly. In other words, the difference between -O2 and -O3 is effectively negligible, so you don't actually have to compile blst with -O3. Unless of course if your C code is sensitive to optimization level, and you want to compile everything in the same go...

dot-asm commented on May 24, 2024

Can you confirm that compiling with -Drestrict= flag helps?

[Just in case for reference, this is not a suggested solution, just an attempt to pinpoint the problem.]
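
For readers unfamiliar with the flag: -Drestrict= defines restrict as an empty macro, so every restrict qualifier disappears before the compiler proper sees the code. A small sketch of the same effect done in source, where DEMO_ERASE_RESTRICT and limbs_copy are hypothetical names used only here:

```c
#include <stddef.h>

/* DEMO_ERASE_RESTRICT is a hypothetical switch for this sketch only.
 * Defining restrict as empty after the includes mimics what
 * -Drestrict= on the command line does to the source. */
#ifdef DEMO_ERASE_RESTRICT
# define restrict
#endif

/* With the macro defined, this is compiled as if it were declared
 * (unsigned long *dst, const unsigned long *src, size_t n). */
static void limbs_copy(unsigned long *restrict dst,
                       const unsigned long *restrict src, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        dst[i] = src[i];
}
```

The function computes the same result either way when the buffers do not overlap; only the aliasing promise made to the optimizer changes, which is what makes the flag useful for narrowing down the problem.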

mratsim commented on May 24, 2024

> Why not other way around? :-):-):-) But on serious note, if specific compiler version fails to compile a piece of code, while others can, it speaks rather in favour of compiler bug. This is not to say that it necessarily means compiler bug, but it's first assumption to make.

There is a related GCC bug that has been lurking for 10 years at least, for example in x264: https://mailman.videolan.org/pipermail/x264-devel/2010-June/007462.html. Clang doesn't exhibit this, which also supports a GCC bug.

Yes, -Drestrict= makes blst_sk_to_pk_in_g1 behave.

> Unless of course if your C code is sensitive to optimization level, and you want to compile everything in the same go...

Yes, that's the case: I don't compile BLST as a separate DLL but compile it at the same time as the rest of the Nim/C code.

dot-asm commented on May 24, 2024

> Why not other way around? :-):-):-)

> There is a related GCC bug that has been lurking for 10 years at least,

In other words bug is so old that it's considered a feature:-) This is exactly why I'm not fond of compiler-specific workarounds, they effectively let compiler off the hook...

Either way, could you double-check vec_select_n? It's only x86_64 for the moment...

dot-asm commented on May 24, 2024

vec_select_n is merged. Closing...

mratsim commented on May 24, 2024

Sorry for the late reply, I didn't have time to upgrade earlier.

Unfortunately, I still seem to get wrong results with GCC -O3 on yesterday's master (a8398ed) unless I pass -fno-tree-vectorize, despite that branch being merged and despite f8a77bd.

For now I'll keep using -fno-tree-vectorize with that compiler.

dot-asm commented on May 24, 2024

Still? Hmm... Since -Drestrict= helped, and I assume it still does, can you test one thing? Drop the restrict qualifiers from ptype##_ccopy in src/point.h.

dot-asm commented on May 24, 2024

Just in case, for reference: restrict qualifiers were added in order to eliminate arguably unjustified branches that depend on the outcome of pointer comparisons. It's not really a constant-time thing, but rather a this-ought-to-complicate-binary-code-validation thing. But one should expect slightly better performance as well...
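
A rough illustration of the pointer-comparison point, under the assumption that the compiler auto-vectorizes the loop; this describes typical optimizer behavior, not blst's actual generated code, and both function names are hypothetical. Without restrict the compiler generally has to allow for out and in overlapping, and a vectorizer may guard the fast loop with a runtime comparison of the two pointers. With restrict the caller promises there is no overlap, so no such check is needed.

```c
#include <stddef.h>

/* No aliasing promise: out and in might overlap, so a vectorizing
 * compiler may compare the pointers at run time and branch to a
 * scalar fallback loop. */
static void limbs_xor_may_alias(unsigned long *out,
                                const unsigned long *in, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        out[i] ^= in[i];
}

/* Same loop with the promise spelled out: the caller must guarantee
 * that the two buffers do not overlap. */
static void limbs_xor_no_alias(unsigned long *restrict out,
                               const unsigned long *restrict in, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        out[i] ^= in[i];
}
```

For non-overlapping buffers both variants compute the same result; passing overlapping buffers to the restrict variant is undefined behavior.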
