Comments (20)
Test v0.1.0-RELEASE:
350 (rand=DRBG[seed=13636363]) cur: 72.819 MB/s avg: 72.819 MB/s
350 (rand=DRBG[seed=13636363]) cur: 137.163 MB/s avg: 95.132 MB/s
350 (rand=DRBG[seed=13636363]) cur: 140.559 MB/s avg: 106.618 MB/s
350 (rand=DRBG[seed=13636363]) cur: 146.083 MB/s avg: 114.341 MB/s
350 (rand=DRBG[seed=13636363]) cur: 146.115 MB/s avg: 119.54 MB/s
350 (rand=DRBG[seed=13636363]) cur: 144.471 MB/s avg: 123.08 MB/s
350 (rand=DRBG[seed=13636363]) cur: 146.803 MB/s avg: 125.988 MB/s
350 (rand=DRBG[seed=13636363]) cur: 149.258 MB/s avg: 128.492 MB/s
Test v0.3.3-1:
Test:testSignatureAggregate
350 (rand=DRBG[seed=13636363]) cur: 3.663 MB/s avg: 3.663 MB/s
350 (rand=DRBG[seed=13636363]) cur: 3.78 MB/s avg: 3.72 MB/s
350 (rand=DRBG[seed=13636363]) cur: 3.787 MB/s avg: 3.742 MB/s
350 (rand=DRBG[seed=13636363]) cur: 3.784 MB/s avg: 3.753 MB/s
350 (rand=DRBG[seed=13636363]) cur: 3.754 MB/s avg: 3.753 MB/s
Test BLS Mikuli old implementation:
Test:testSignatureAggregate
350 (rand=DRBG[seed=13636363]) cur: 2.372 MB/s avg: 2.372 MB/s
350 (rand=DRBG[seed=13636363]) cur: 2.399 MB/s avg: 2.385 MB/s
350 (rand=DRBG[seed=13636363]) cur: 2.406 MB/s avg: 2.392 MB/s
350 (rand=DRBG[seed=13636363]) cur: 2.399 MB/s avg: 2.394 MB/s
350 (rand=DRBG[seed=13636363]) cur: 2.419 MB/s avg: 2.399 MB/s
350 (rand=DRBG[seed=13636363]) cur: 2.402 MB/s avg: 2.4 MB/s
350 (rand=DRBG[seed=13636363]) cur: 2.414 MB/s avg: 2.402 MB/s
Test BLS Mikuli recent implementation:
Test:testSignatureAggregate
350 (rand=DRBG[seed=13636363]) cur: 23.305 MB/s avg: 23.305 MB/s
350 (rand=DRBG[seed=13636363]) cur: 28.213 MB/s avg: 25.525 MB/s
350 (rand=DRBG[seed=13636363]) cur: 26.445 MB/s avg: 25.825 MB/s
350 (rand=DRBG[seed=13636363]) cur: 27.998 MB/s avg: 26.336 MB/s
350 (rand=DRBG[seed=13636363]) cur: 28.237 MB/s avg: 26.695 MB/s
350 (rand=DRBG[seed=13636363]) cur: 28.558 MB/s avg: 26.989 MB/s
350 (rand=DRBG[seed=13636363]) cur: 28.646 MB/s avg: 27.214 MB/s
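A side note on reading the logs above: the avg column does not track an arithmetic mean of the cur values, but it is consistent with a running harmonic mean, the natural way to average throughputs over equal-sized payloads. This is an observation about the log format, not a claim about the harness internals; a quick self-contained check in plain Java (no blst dependency):

```java
// Observation, not a claim about the harness internals: the avg column in the
// logs is consistent with a running HARMONIC mean of the cur values (the
// natural way to average throughputs over equal-sized payloads).
public class AvgCheck {
    static double harmonicMean(double... xs) {
        double sumOfInverses = 0.0;
        for (double x : xs) sumOfInverses += 1.0 / x;
        return xs.length / sumOfInverses;
    }

    public static void main(String[] args) {
        // cur values from the v0.1.0-RELEASE run above
        System.out.printf("%.3f%n", harmonicMean(72.819, 137.163));          // ~95.13, cf. the log's avg
        System.out.printf("%.3f%n", harmonicMean(72.819, 137.163, 140.559)); // ~106.62, cf. the log's avg
    }
}
```

The same pattern holds for the other runs, e.g. the recent Mikuli log's second avg of 25.525 is the harmonic mean of 23.305 and 28.213.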
from blst.
Here is the aggregate() implementation in the C++ binding:
void aggregate(const P1_Affine& in)
{   if (blst_p1_affine_in_g1(in))
        blst_p1_add_or_double_affine(&point, &point, in);
    else
        throw BLST_POINT_NOT_IN_GROUP;
}
I bet blst_p1_affine_in_g1 slows things down.
I bet blst_p1_affine_in_g1 slows things down.
Correct bet. The G1 group check costs ~70 times the addition. In G2 the ratio is ~30. Either way, it appears that the report is misplaced. It's jblst that chooses and makes the calls, hence the report is something for jblst to address. Keep in mind that points coming from the network are to be group-checked. And it's not impossible to imagine that jblst simply moved the group check from elsewhere, which would mean that the benchmark is not actually representative: one should benchmark the complete stack, not just point additions.
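A back-of-envelope check of these ratios against the logs above. This is plain arithmetic under two assumptions: that the signatures being aggregated live in G2, and that the slower path performs one group check per input point.

```java
// Plain arithmetic, assuming BLS signatures live in G2 and that the slower
// aggregate path performs one group check per input point.
public class SlowdownEstimate {
    public static void main(String[] args) {
        double checkToAddRatio = 30.0;           // dot-asm's figure for the G2 group check
        double slowdown = 1.0 + checkToAddRatio; // (check + add) vs. add alone
        double oldThroughput = 146.0;            // steady-state MB/s of the v0.1.0 run
        // Predicted throughput once every input is group-checked:
        System.out.printf("predicted: %.1f MB/s%n", oldThroughput / slowdown);
        // The v0.3.3 log above shows ~3.75 MB/s -- the same order of magnitude.
    }
}
```

The prediction (~4.7 MB/s) and the observed ~3.75 MB/s agree in order of magnitude, which is consistent with the group check dominating the aggregation loop.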
@dot-asm see the code in test:
long begin = System.nanoTime();
//BLS.aggregateVerify(pubKeys, messages, aggregatedSign);
BLSSignature aggregatedSign = BLS.aggregate(s);
byte[] out = aggregatedSign.toBytesCompressed().toArray();
long end = System.nanoTime();
BLS.aggregate is called with 1000 signatures, and the test repeats the operation in a loop, collecting the time delta. Where is my mistake?
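The timing pattern above can be sketched as a self-contained harness. Here aggregateAll() is a cheap XOR-fold stand-in for BLS.aggregate(s), so the numbers it prints say nothing about BLS itself; only the measurement pattern (warm-up, then timed rounds reporting MB/s) is meaningful:

```java
// Self-contained sketch of the timing pattern above: warm-up, then timed
// rounds reporting MB/s. aggregateAll() is a cheap XOR-fold STAND-IN for
// BLS.aggregate(s); only the measurement pattern is meaningful here.
public class AggregateBench {
    static byte[] aggregateAll(byte[][] sigs) {
        byte[] acc = new byte[sigs[0].length];
        for (byte[] s : sigs)
            for (int i = 0; i < acc.length; i++) acc[i] ^= s[i];
        return acc;
    }

    public static void main(String[] args) {
        byte[][] sigs = new byte[1000][350];            // 1000 payloads x 350 bytes, as in the test
        for (int i = 0; i < 5; i++) aggregateAll(sigs); // warm-up, lets the JIT kick in
        double megabytes = 1000.0 * 350.0 / 1e6;
        for (int round = 0; round < 3; round++) {
            long begin = System.nanoTime();
            aggregateAll(sigs);
            long end = System.nanoTime();
            double seconds = (end - begin) / 1e9;
            System.out.printf("cur: %.3f MB/s%n", megabytes / seconds);
        }
    }
}
```

Without the warm-up, the first rounds of a JVM microbenchmark measure interpreter and JIT behavior rather than the library, which is one way the first cur value in a log can sit well below the steady state.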
Can anyone create a test using the C++ binding's aggregate call and run it in a loop for comparison with the previous implementation? I used 1000 messages of 350 bytes each per cycle. Then I signed them, collected all the signatures in a list, called BLS.aggregate(s), collected the time delta, and so on.
What I mean is that one should ask how many signatures the whole application can process in a unit of time, and do so securely, not time an arbitrary loop. Again, the issue is something for jblst to resolve.
I don't agree: the final performance depends on the total payload size, which can vary a lot. So the benchmark should report throughput in MB/s, which reflects the actual bandwidth of the BLS library.
My question: what do you mean by "securely"?
I used 1000 messages of 350 bytes each per cycle. Then I signed them, collected all the signatures in a list, then called BLS.aggregate(s)
A real-life application wouldn't sign messages just to verify the signatures itself. The signatures would be passed to somebody else, and that somebody else has to perform a group check on each individual signature prior to aggregating them. Since the operation is dominated by the group checks, benchmarking point additions in isolation is not representative.
This is why I split out the aggregateVerify and aggregate calls: to determine where the slowdown comes from. Also, jblst confirmed a fix which should be applied on the blst native side. See the comment
As @benjaminion mentions in the jblst issue, the point is where the check goes. It needs to be somewhere, and how that placement impacts a synthetic benchmark is not of real concern, as @dot-asm mentioned. @vikulin, where do you propose the group check should go in your application? Do you have a different application than Teku?
Yes, I have a different app which requires the aggregate call to be used separately. If I understand correctly, aggregate prepares an aggregated signature that can be group-checked once (via aggregateVerify), but that group check can be executed somewhere else after the aggregation is done.
If you are taking signatures off the wire, then group-checking them prior to adding them to an aggregated point is appropriate. If you generate all the signatures yourself and trust their validity, then you can get away without group-checking prior to aggregation. The blst C++ binding provides mechanisms to perform the group checks and the additions independently. Or, if one chooses to do both at the same time, then a call to aggregate may be made. In blst, the aggregate() member function would look exactly like the add() member function if the group check were not in there; taking the check out of aggregate() would mean just getting rid of the function itself. Therefore I do not see any changes required within blst at this time. As @dot-asm mentioned, this is an issue that needs to be resolved within jblst or your application.
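The two strategies described here (check-while-aggregating vs. trusted bare addition) can be illustrated with a deliberately toy model. The "points" below are integers mod 12 whose valid subgroup is the multiples of 3; these are stand-ins, not blst's API or real curve arithmetic:

```java
import java.util.List;

// Toy model: "points" are integers mod 12 and the valid subgroup is the
// multiples of 3. Stand-ins only -- NOT blst's API or real curve arithmetic.
public class GroupCheckDemo {
    static final int MOD = 12;

    static boolean inGroup(int p) { return p % 3 == 0; }    // stand-in group check
    static int add(int a, int b) { return (a + b) % MOD; }  // stand-in point addition

    // Like aggregate(): group-check every input while accumulating.
    static int aggregateChecked(List<Integer> points) {
        int acc = 0;
        for (int p : points) {
            if (!inGroup(p)) throw new IllegalArgumentException("point not in group");
            acc = add(acc, p);
        }
        return acc;
    }

    // Like bare add(): trust the inputs, accumulate only.
    static int aggregateUnchecked(List<Integer> points) {
        int acc = 0;
        for (int p : points) acc = add(acc, p);
        return acc;
    }

    public static void main(String[] args) {
        System.out.println(aggregateChecked(List.of(3, 6, 9)));   // 6: all inputs valid
        System.out.println(aggregateUnchecked(List.of(3, 4, 5))); // 0: bad inputs slip through silently
    }
}
```

The unchecked path happily folds in out-of-subgroup inputs, which is exactly why the checked path exists for signatures that arrive over the network, and why the checked path is slower by the cost of one check per input.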
Or if one chooses to do both at the same time then a call to aggregate may be made.
@sean-sn what if I choose both at the same time? The performance still went down. I'm sorry, but I still don't see any good argument why the aggregate call should not be fixed. As I mentioned, the comparison is clear: the aggregate call slowed down significantly.
@vikulin Forgive me for adding to the number of people saying that this is not blst's problem. You are using Teku's implementation, which uses jblst, which uses blst. Nothing changed on the blst or jblst side; we just moved some things around in Teku.
The "fix" is easy: if you want to aggregate quickly without the group membership check (and you are confident that it is safe), use P2.add() - this is what we used to do in Teku. If you want to aggregate with the group membership check (as we now do in Teku), then use P2.aggregate().
@benjaminion, thanks. But I'm curious: which part of the code was changed to make the fix? And has it been done?
There is no good explanation of why the issue was closed. The performance fell compared with v0.1.0, and @dot-asm could not explain why this happened.
I reckon that sufficient information was provided in the course of the discussion. The fact that the last question remained unanswered is not @supranational/blst's fault. As already said, the report is misplaced: it's not about the blst implementation, but about the choice between two methods, aggregate and add, made elsewhere.
@dot-asm the add method has never been used, since it's not a public method in blst.
So, following your logic, blst.hpp has no add method, hence aggregate had to be used all along. But here is the problem: the aggregate method hasn't changed since its initial implementation in blst.hpp; it has always performed the expensive group check...
Just in case: the ellipsis at the end of the previous paragraph is not an invitation to further discussion. In fact, I plan to abstain from further discussion, because it's getting circular. Get your logic straight! (But don't expect somebody else to straighten it up for you :-)
aggregate method didn't change since its initial implementation in blst.hpp, it always performed the expensive group-check...
@dot-asm you could have closed the issue right after it was created if you had mentioned this, and you would not have spent so much time on the discussion. Now it's clear to me.