Comments (4)
@ed255 do you know which FFT algo we are using?
With the Cooley–Tukey FFT algorithm, we pad the evaluations to the next power of 2 before doing iFFTs (as FFTs on power-of-2 sizes are much cheaper).
We should be using iFFTs not only for the commitments but also for polynomial multiplications, so the final polynomial degree can be much higher (unless you already accounted for that in the stats PR).
So memory-wise, the gains may be higher than what you estimated.
from zkevm-circuits.
Notice the benchmarks are taken at k = 26 with 32 chunks.
What do you think about another straw-man idea: trading a smaller k for more chunks? Ideally, if we reduce k by one, memory consumption during proof generation should also be cut roughly in half. Under the current aggregate proof scheme, if a single prover generates all the chunk proofs, each chunk proof can be discarded once it has been generated. If we adopt multiple provers, the overall latency should also improve thanks to parallelism.
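To make the halving concrete, here is a back-of-the-envelope sketch (sizes are assumptions for illustration, not measured from halo2): a Lagrange-basis polynomial over a domain of size 2^k stores 2^k field elements, so dropping k by one roughly halves per-polynomial memory, at the cost of doubling the number of chunks needed to cover the same workload.

```rust
// Hypothetical sizing helper: bytes needed to store one polynomial
// over a domain of size 2^k, given the field element size in bytes.
fn poly_bytes(k: u32, field_bytes: usize) -> usize {
    (1usize << k) * field_bytes
}

fn main() {
    let field_bytes = 32; // e.g. a 256-bit scalar field element (assumed)
    let a = poly_bytes(26, field_bytes);
    let b = poly_bytes(25, field_bytes);
    println!("k=26: {} MiB per polynomial", a >> 20); // 2048 MiB
    println!("k=25: {} MiB per polynomial", b >> 20); // 1024 MiB
    assert_eq!(a, 2 * b); // one step down in k halves the size
}
```

Note that total memory across all chunks stays roughly constant (half the size, twice the chunks); the win is in the *peak* memory of a single proof, which is what bounds the machine size.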
> @ed255 do you know which FFT algo we are using?
> With the Cooley–Tukey FFT algorithm, we pad the evaluations to the next power of 2 before doing iFFTs (as FFTs on power-of-2 sizes are much cheaper).
In halo2 all polynomials are stored in vectors, and these vectors are always preallocated with power-of-2 sizes, so I would say the padding happens implicitly (the unassigned elements of the vector are simply 0 by default).
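The implicit padding described above can be sketched as follows (illustrative only; halo2's actual polynomial types and field elements differ):

```rust
// Minimal sketch of implicit zero-padding: allocate the vector at the
// full power-of-two domain size up front, so unassigned evaluations
// are zeros without any explicit padding step.
fn to_domain(evals: &[u64], k: u32) -> Vec<u64> {
    let n = 1usize << k;
    assert!(evals.len() <= n, "domain too small for {} evaluations", evals.len());
    let mut padded = vec![0u64; n]; // preallocated; tail stays zero
    padded[..evals.len()].copy_from_slice(evals);
    padded
}

fn main() {
    let padded = to_domain(&[7, 8, 9], 3);
    assert_eq!(padded, vec![7, 8, 9, 0, 0, 0, 0, 0]);
}
```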
> We should be using iFFTs not only for the commitments but also for polynomial multiplications, so the final polynomial degree can be much higher (unless you already accounted for that in the stats PR).
The stats PR already considers the polynomials in the extended domain (which depends on the max expression degree). Is this what you mean? I believe the biggest source of memory consumption comes from the polynomials in the extended domain.
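For intuition, here is a hedged sketch of how the extended-domain size grows with the maximum expression degree. This mirrors the sizing rule used by halo2's `EvaluationDomain` as I understand it; treat the exact formula as an assumption. The quotient polynomial has degree roughly (j - 1) * n for max expression degree j and base domain size n = 2^k, so the extended domain needs at least that many points:

```rust
// Assumed sizing rule: grow the extended domain until it can hold the
// quotient polynomial of degree ~ (max_degree - 1) * 2^k.
fn extended_k(k: u32, max_degree: u64) -> u32 {
    let n = 1u64 << k;
    let quotient_degree = max_degree - 1;
    let mut ek = k;
    while (1u64 << ek) < n * quotient_degree {
        ek += 1;
    }
    ek
}

fn main() {
    // Max expression degree 9 -> quotient factor 8 -> 3 extra bits,
    // i.e. the extended domain is 8x the base domain.
    assert_eq!(extended_k(26, 9), 29);
    // Degree 5 -> factor 4 -> 2 extra bits.
    assert_eq!(extended_k(26, 5), 28);
}
```

This is why the max expression degree matters so much for memory: each extra factor of 2 in the quotient degree doubles every extended-domain polynomial.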
> So memory-wise, the gains may be higher than what you estimated.
On a related note, the numbers of the stats utility are theoretical. In practice the memory usage of the process may be higher due to:
- allocations that we didn't consider, for example coming from iterators? (but maybe we could study those if they appear and try to fix them)
- things we may have missed?
> Notice the benchmarks are taken at k = 26 with 32 chunks.
> What do you think about another straw-man idea: trading a smaller k for more chunks? Ideally, if we reduce k by one, memory consumption during proof generation should also be cut roughly in half. Under the current aggregate proof scheme, if a single prover generates all the chunk proofs, each chunk proof can be discarded once it has been generated. If we adopt multiple provers, the overall latency should also improve thanks to parallelism.
Yes! I think that's something we could easily do now. On one hand we have two dimensions, memory and compute, and by changing the k (and thus the number of chunks) we get different memory and compute values (which may be a tradeoff; I assume at some point we can trade memory for compute and vice versa). As you say, the good thing about compute is that we can parallelize or distribute the work (thanks to aggregation): we reduce memory and increase compute, but not wall-clock time (because we add more machines).
On the other hand, I think it would be great to find the sweet spot of the aggregation configuration:
- If the circuit is too small, the aggregation overhead is too high overall
- If the circuit is too big, the aggregation overhead is small (but maybe the aggregation proof takes longer?)
- Moreover, we can play with the k, and to some extent with the number of advice columns.
- If we have an aggregation tree, how many children should a node have?
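The fan-out question in the last bullet can be explored with a quick sketch: for a given number of chunk proofs and children per aggregation node, count the aggregation proofs and tree levels (illustrative arithmetic only, not tied to any particular aggregation scheme):

```rust
// For `leaves` chunk proofs aggregated in a tree where each node takes
// `fanout` children, return (total aggregation proofs, tree depth).
fn aggregation_cost(mut leaves: usize, fanout: usize) -> (usize, usize) {
    assert!(fanout >= 2);
    let (mut agg_proofs, mut levels) = (0, 0);
    while leaves > 1 {
        leaves = (leaves + fanout - 1) / fanout; // parents at the next level
        agg_proofs += leaves;
        levels += 1;
    }
    (agg_proofs, levels)
}

fn main() {
    // 32 chunks, binary tree: 16 + 8 + 4 + 2 + 1 = 31 aggregation proofs, 5 levels.
    assert_eq!(aggregation_cost(32, 2), (31, 5));
    // 32 chunks, fan-out 4: 8 + 2 + 1 = 11 aggregation proofs, 3 levels.
    assert_eq!(aggregation_cost(32, 4), (11, 3));
}
```

Higher fan-out means fewer aggregation proofs and a shallower tree (lower latency), but each aggregation circuit must verify more children and is therefore bigger, which is exactly the sweet-spot tradeoff mentioned above.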
> The stats PR already considers the polynomials in the extended domain (which depends on the max expression degree). Is this what you mean? I believe the biggest source of memory consumption comes from the polynomials in the extended domain.
I was not sure it was the extended domain, but you clarified this. I agree with you!
> On a related note, the numbers of the stats utility are theoretical. In practice the memory usage of the process may be higher due to:
> - allocations that we didn't consider, for example coming from iterators? (but maybe we could study those if they appear and try to fix them)
> - things we may have missed?
Off the top of my head, the usual biggest costs are iFFTs, commitments, and computing the quotient polynomial. However, given the number of columns we have, I wouldn't be surprised if the permutation argument (even though we split the polynomial) is very costly too.
Also, one straightforward thing we can do for the time being (before thinking about merging columns and so on) is to check whether the FFT algorithm we use is sparse-friendly.
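One way to quantify "sparse-friendly" is to count how many radix-2 butterflies operate on two structurally zero inputs and could in principle be skipped. The sketch below is a simplified model of an iterative decimation-in-time Cooley–Tukey FFT (it tracks only zero/nonzero flags, no field arithmetic, and is not halo2's actual implementation):

```rust
// Estimate how many radix-2 butterflies could be skipped when the input
// is zero-padded from m real evaluations up to n = 2^k entries.
// Returns (skippable butterflies, total butterflies).
fn skippable_butterflies(k: u32, m: usize) -> (usize, usize) {
    let n = 1usize << k;
    let mut nonzero = vec![false; n];
    for slot in nonzero.iter_mut().take(m) {
        *slot = true; // first m entries hold real evaluations; tail is padding
    }
    // Bit-reversal permutation, as in an iterative Cooley–Tukey FFT.
    for i in 0..n {
        let j = ((i as u64).reverse_bits() >> (64 - k)) as usize;
        if j > i {
            nonzero.swap(i, j);
        }
    }
    let (mut skippable, mut total) = (0usize, 0usize);
    let mut len = 2;
    while len <= n {
        let half = len / 2;
        for start in (0..n).step_by(len) {
            for off in 0..half {
                let (a, b) = (start + off, start + off + half);
                total += 1;
                if !nonzero[a] && !nonzero[b] {
                    skippable += 1; // both inputs zero: outputs stay zero
                } else {
                    nonzero[a] = true; // conservatively mark outputs nonzero
                    nonzero[b] = true;
                }
            }
        }
        len *= 2;
    }
    (skippable, total)
}

fn main() {
    // Padding from 300 up to 1024 slots: some early-stage butterflies are skippable.
    let (skip, total) = skippable_butterflies(10, 300);
    println!("{} of {} butterflies operate on zeros only", skip, total);
    // But with more than half the domain occupied (600 of 1024), the
    // bit-reversal scatters zeros so that nothing is skippable at all.
    assert_eq!(skippable_butterflies(10, 600).0, 0);
}
```

In this model, savings only appear when the real evaluations fill at most half the domain, which suggests the padding overhead of a nearly-full domain cannot be recovered by sparsity tricks alone.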