Comments (4)
@ed255 do you know which FFT algo we are using?
With the Cooley–Tukey FFT algorithm, we pad the evaluations to the next power of 2 before doing iFFTs (as FFTs on power-of-2 sizes are much cheaper).
We should be using iFFTs not only for the commitments but also for polynomial multiplications, so the final polynomial degree can be much higher (unless you already accounted for that in the stats PR).
So memory-wise, the gains may be higher than what you estimated.
from zkevm-circuits.
Notice the benchmarks are taken at k = 26 with 32 chunks.
What do you think about another straw-man idea: trading a smaller k for more chunks? Ideally, if we reduce k by one, memory consumption during proof generation should also be cut roughly in half. Under the current aggregate proof scheme, if a single prover generates all the chunk proofs, each chunk proof can be discarded once it has been generated. If we adopt multiple provers, the overall latency should also improve thanks to parallelism.
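To make the halving concrete, here is a back-of-the-envelope sketch (sizes are assumptions for illustration, not measured from halo2): a Lagrange-basis polynomial over a domain of size 2^k stores 2^k field elements, so dropping k by one roughly halves per-polynomial memory, at the cost of doubling the number of chunks needed to cover the same workload.

```rust
// Hypothetical sizing helper: bytes needed to store one polynomial
// over a domain of size 2^k, given the field element size in bytes.
fn poly_bytes(k: u32, field_bytes: usize) -> usize {
    (1usize << k) * field_bytes
}

fn main() {
    let field_bytes = 32; // e.g. a 256-bit scalar field element (assumed)
    let a = poly_bytes(26, field_bytes);
    let b = poly_bytes(25, field_bytes);
    println!("k=26: {} MiB per polynomial", a >> 20); // 2048 MiB
    println!("k=25: {} MiB per polynomial", b >> 20); // 1024 MiB
    assert_eq!(a, 2 * b); // one step down in k halves the size
}
```

Note that total memory across all chunks stays roughly constant (half the size, twice the chunks); the win is in the *peak* memory of a single proof, which is what bounds the machine size.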
> @ed255 do you know which FFT algo we are using?
> With the Cooley–Tukey FFT algorithm, we pad the evaluations to the next power of 2 before doing iFFTs (as FFTs on power-of-2 sizes are much cheaper).
In halo2 all polynomials are stored in vectors, and these vectors are always preallocated with power-of-2 sizes, so I would say the padding happens implicitly (the unassigned elements of the vector are simply 0 by default).
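The implicit padding described above can be sketched as follows (illustrative only; halo2's actual polynomial types and field elements differ):

```rust
// Minimal sketch of implicit zero-padding: allocate the vector at the
// full power-of-two domain size up front, so unassigned evaluations
// are zeros without any explicit padding step.
fn to_domain(evals: &[u64], k: u32) -> Vec<u64> {
    let n = 1usize << k;
    assert!(evals.len() <= n, "domain too small for {} evaluations", evals.len());
    let mut padded = vec![0u64; n]; // preallocated; tail stays zero
    padded[..evals.len()].copy_from_slice(evals);
    padded
}

fn main() {
    let padded = to_domain(&[7, 8, 9], 3);
    assert_eq!(padded, vec![7, 8, 9, 0, 0, 0, 0, 0]);
}
```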
> We should be using iFFTs not only for the commitments but also for polynomial multiplications, so the final polynomial degree can be much higher (unless you already accounted for that in the stats PR).
The stats PR already considers the polynomials in the extended domain (which depends on the max expression degree). Is this what you mean? I believe the biggest source of memory consumption comes from the polynomials in the extended domain.
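For intuition, here is a hedged sketch of how the extended-domain size grows with the maximum expression degree. This mirrors the sizing rule used by halo2's `EvaluationDomain` as I understand it; treat the exact formula as an assumption. The quotient polynomial has degree roughly (j - 1) * n for max expression degree j and base domain size n = 2^k, so the extended domain needs at least that many points:

```rust
// Assumed sizing rule: grow the extended domain until it can hold the
// quotient polynomial of degree ~ (max_degree - 1) * 2^k.
fn extended_k(k: u32, max_degree: u64) -> u32 {
    let n = 1u64 << k;
    let quotient_degree = max_degree - 1;
    let mut ek = k;
    while (1u64 << ek) < n * quotient_degree {
        ek += 1;
    }
    ek
}

fn main() {
    // Max expression degree 9 -> quotient factor 8 -> 3 extra bits,
    // i.e. the extended domain is 8x the base domain.
    assert_eq!(extended_k(26, 9), 29);
    // Degree 5 -> factor 4 -> 2 extra bits.
    assert_eq!(extended_k(26, 5), 28);
}
```

This is why the max expression degree matters so much for memory: each extra factor of 2 in the quotient degree doubles every extended-domain polynomial.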
> So memory-wise, the gains may be higher than what you estimated.
On a related note, the numbers of the stats utility are theoretical. In practice the memory usage of the process may be higher due to:
- allocations that we didn't consider, for example coming from iterators? (but maybe we could study those if they appear and try to fix them)
- things we may have missed?
> Notice the benchmarks are taken at k = 26 with 32 chunks.
> What do you think about another straw-man idea: trading a smaller k for more chunks? Ideally, if we reduce k by one, memory consumption during proof generation should also be cut roughly in half. Under the current aggregate proof scheme, if a single prover generates all the chunk proofs, each chunk proof can be discarded once it has been generated. If we adopt multiple provers, the overall latency should also improve thanks to parallelism.
Yes! I think that's something we could easily do now. On one hand we have two dimensions, memory and compute, and by changing the k (and thus the number of chunks) we get different memory and compute values (which may be a tradeoff; I assume at some point we can trade memory for compute and vice versa). As you say, the good thing about compute is that we can parallelize or distribute the work (thanks to aggregation): we reduce memory and increase compute, but not wall-clock time (because we add more machines).
On the other hand, I think it would be great to find the sweet spot of the aggregation configuration:
- If the circuit is too small, the aggregation overhead is too high overall
- If the circuit is too big, the aggregation overhead is small (but maybe the aggregation proof takes longer?)
- Moreover, we can play with the k, and to some extent with the number of advice columns.
- If we have an aggregation tree, how many children should a node have?
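The fan-out question in the last bullet can be explored with a quick sketch: for a given number of chunk proofs and children per aggregation node, count the aggregation proofs and tree levels (illustrative arithmetic only, not tied to any particular aggregation scheme):

```rust
// For `leaves` chunk proofs aggregated in a tree where each node takes
// `fanout` children, return (total aggregation proofs, tree depth).
fn aggregation_cost(mut leaves: usize, fanout: usize) -> (usize, usize) {
    assert!(fanout >= 2);
    let (mut agg_proofs, mut levels) = (0, 0);
    while leaves > 1 {
        leaves = (leaves + fanout - 1) / fanout; // parents at the next level
        agg_proofs += leaves;
        levels += 1;
    }
    (agg_proofs, levels)
}

fn main() {
    // 32 chunks, binary tree: 16 + 8 + 4 + 2 + 1 = 31 aggregation proofs, 5 levels.
    assert_eq!(aggregation_cost(32, 2), (31, 5));
    // 32 chunks, fan-out 4: 8 + 2 + 1 = 11 aggregation proofs, 3 levels.
    assert_eq!(aggregation_cost(32, 4), (11, 3));
}
```

Higher fan-out means fewer aggregation proofs and a shallower tree (lower latency), but each aggregation circuit must verify more children and is therefore bigger, which is exactly the sweet-spot tradeoff mentioned above.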
> The stats PR already considers the polynomials in the extended domain (which depends on the max expression degree). Is this what you mean? I believe the biggest source of memory consumption comes from the polynomials in the extended domain.
I was not sure it was the extended domain, but you clarified this. I agree with you!
> On a related note, the numbers of the stats utility are theoretical. In practice the memory usage of the process may be higher due to:
> - allocations that we didn't consider, for example coming from iterators? (but maybe we could study those if they appear and try to fix them)
> - things we may have missed?
Off the top of my head, the usual biggest costs are iFFTs, commitments, and computing the quotient polynomial. However, given the number of columns we have, I wouldn't be surprised if the permutation argument (even though we split the polynomial) is very costly too.
Also, one straightforward thing we can do for the time being (before thinking about merging columns and so on) is to check whether the FFT algorithm we use is sparse-friendly.
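One way to quantify "sparse-friendly" is to count how many radix-2 butterflies operate on two structurally zero inputs and could in principle be skipped. The sketch below is a simplified model of an iterative decimation-in-time Cooley–Tukey FFT (it tracks only zero/nonzero flags, no field arithmetic, and is not halo2's actual implementation):

```rust
// Estimate how many radix-2 butterflies could be skipped when the input
// is zero-padded from m real evaluations up to n = 2^k entries.
// Returns (skippable butterflies, total butterflies).
fn skippable_butterflies(k: u32, m: usize) -> (usize, usize) {
    let n = 1usize << k;
    let mut nonzero = vec![false; n];
    for slot in nonzero.iter_mut().take(m) {
        *slot = true; // first m entries hold real evaluations; tail is padding
    }
    // Bit-reversal permutation, as in an iterative Cooley–Tukey FFT.
    for i in 0..n {
        let j = ((i as u64).reverse_bits() >> (64 - k)) as usize;
        if j > i {
            nonzero.swap(i, j);
        }
    }
    let (mut skippable, mut total) = (0usize, 0usize);
    let mut len = 2;
    while len <= n {
        let half = len / 2;
        for start in (0..n).step_by(len) {
            for off in 0..half {
                let (a, b) = (start + off, start + off + half);
                total += 1;
                if !nonzero[a] && !nonzero[b] {
                    skippable += 1; // both inputs zero: outputs stay zero
                } else {
                    nonzero[a] = true; // conservatively mark outputs nonzero
                    nonzero[b] = true;
                }
            }
        }
        len *= 2;
    }
    (skippable, total)
}

fn main() {
    // Padding from 300 up to 1024 slots: some early-stage butterflies are skippable.
    let (skip, total) = skippable_butterflies(10, 300);
    println!("{} of {} butterflies operate on zeros only", skip, total);
    // But with more than half the domain occupied (600 of 1024), the
    // bit-reversal scatters zeros so that nothing is skippable at all.
    assert_eq!(skippable_butterflies(10, 600).0, 0);
}
```

In this model, savings only appear when the real evaluations fill at most half the domain, which suggests the padding overhead of a nearly-full domain cannot be recovered by sparsity tricks alone.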