@puma314 assign this to me
Comparison of deserialization strategies
Summary
During our performance testing and benchmarking of deserialization processes, we observed a consistent minimum of approximately 4.6 million x86 cycles consumed when using rkyv, regardless of the data size. We compare this with bincode results and explore the performance of a pointer cast operation.
Details
In our benchmarks using the rkyv and bincode libraries for deserialization, we've noted that the number of cycles consumed with rkyv does not scale with data size, showing a fixed minimum cycle count suggestive of zero-copy deserialization mechanics.
Cycle Count Comparison
The following table summarizes the cycle counts observed for deserialization using transmute, rkyv, and bincode across various data sizes:
| Data Size | transmute (cycles) | rkyv (cycles) | bincode (cycles) |
|---|---|---|---|
| Small (100KB) | 607 | 4,741,032 | 18,479,850 |
| Medium (5MB) | 548 | 4,990,638 | 681,913,198 |
| Large (50MB) | 762 | 4,638,140 | 6,730,678,512 |
Interpretation
- Transmute: Shows minimal cycle counts across all data sizes, indicating a highly efficient direct memory mapping process.
- rkyv: Displays consistent cycle counts of roughly 4.6 to 5.0 million at every data size, supporting the hypothesis of a fixed minimum cycle count due to zero-copy mechanics.
- bincode: Cycle counts grow roughly in proportion to data size, from about 18.5 million at 100KB to about 6.7 billion at 50MB, making it far less efficient for larger datasets.
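To make the scaling behaviour easier to see, the table can be reduced to cycles per byte. The following is only a rough sketch: the numbers are copied from the table, the byte sizes assume decimal units (100KB = 100,000 bytes), and the names `Row`, `ROWS`, `cycles_per_byte`, and `print_report` are illustrative, not part of any benchmark harness.

```rust
/// One row of the table: payload size in bytes and measured cycles per method.
struct Row {
    label: &'static str,
    bytes: u64,
    rkyv: u64,
    bincode: u64,
}

/// Rows copied from the table above.
/// Assumption: sizes are decimal (100KB = 100_000 bytes).
const ROWS: [Row; 3] = [
    Row { label: "Small", bytes: 100_000, rkyv: 4_741_032, bincode: 18_479_850 },
    Row { label: "Medium", bytes: 5_000_000, rkyv: 4_990_638, bincode: 681_913_198 },
    Row { label: "Large", bytes: 50_000_000, rkyv: 4_638_140, bincode: 6_730_678_512 },
];

/// Cycles spent per payload byte.
fn cycles_per_byte(cycles: u64, bytes: u64) -> f64 {
    cycles as f64 / bytes as f64
}

/// Prints the per-byte cost of rkyv and bincode for each row.
fn print_report() {
    for row in ROWS.iter() {
        println!(
            "{:<6} rkyv {:>8.4}  bincode {:>8.2}",
            row.label,
            cycles_per_byte(row.rkyv, row.bytes),
            cycles_per_byte(row.bincode, row.bytes),
        );
    }
}
```

Under these assumptions bincode stays between roughly 135 and 185 cycles per byte, while rkyv's per-byte cost falls from about 47 to under 0.1 as the payload grows, which is what a fixed cost would look like.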
Additional Findings: Pointer-Cast Deserialization
In another series of tests, deserializing data via a simple pointer cast to a struct with no heap-allocated data, we observed a dramatically reduced cycle count of roughly 600 cycles per 100,000 bytes, for the following simple struct:
    #[derive(Debug)]
    struct TestStruct {
        int: u8,
        string: [u8; 24], // Mocking a fixed-size array instead of a String
        data: [u8; 100_000], // Mocking a fixed-size array instead of a Vec<u8>
    }
This method is highly efficient but limited to scenarios where data structures are compatible with direct memory mapping. Although it is also cartoonishly dangerous, the prospect of completely zero-cost deserialization in the zkVM may be an avenue worth exploring. See this for further reading and exploration of this technique.
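For illustration, a pointer cast over the struct above might look roughly like the sketch below. This is not the benchmark code. It assumes `#[repr(C)]` is added to pin the layout, and `view_as_test_struct` is a hypothetical helper name. It is only sound here because every field is made of `u8`, so the struct has alignment 1, no padding, and no invalid bit patterns.

```rust
use std::mem::{align_of, size_of};

/// Same fields as the struct above.
/// Assumption: `#[repr(C)]` is added so the field order and layout are fixed.
#[repr(C)]
#[derive(Debug)]
struct TestStruct {
    int: u8,
    string: [u8; 24],
    data: [u8; 100_000],
}

/// Reinterprets the start of `bytes` as a `TestStruct` without copying.
/// Returns `None` if the buffer is too short or misaligned.
fn view_as_test_struct(bytes: &[u8]) -> Option<&TestStruct> {
    if bytes.len() < size_of::<TestStruct>() {
        return None;
    }
    if bytes.as_ptr() as usize % align_of::<TestStruct>() != 0 {
        return None;
    }
    // SAFETY: the length and alignment were checked above, the struct is
    // `repr(C)` and consists only of `u8` fields (no padding, every bit
    // pattern is valid), and the returned reference borrows from `bytes`.
    Some(unsafe { &*(bytes.as_ptr() as *const TestStruct) })
}
```

The cost is independent of the payload because nothing is read or copied until a field is accessed. A struct containing a `String`, `Vec`, `bool`, or any padded field would not satisfy the safety comment and would need a different approach.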
Further, the include_bytes_aligned! macro may also satisfy the requirement to read machine words with u32 alignment from bytes embedded within the RISC-V binary.
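The alignment requirement can be sketched without the external macro. The example below does not use include_bytes_aligned! itself. It swaps in a hand-written `#[repr(align(4))]` wrapper from the standard language features, and a byte literal stands in for the embedded file. `AlignedTo4`, `EMBEDDED`, and `as_words` are hypothetical names chosen for this sketch.

```rust
/// Wrapper that forces 4-byte alignment on whatever it contains.
#[repr(C, align(4))]
struct AlignedTo4<T>(T);

/// Stand-in for embedded data; a real build would embed file contents
/// here instead of this literal (assumption for illustration only).
static EMBEDDED: AlignedTo4<[u8; 8]> = AlignedTo4([1, 0, 0, 0, 2, 0, 0, 0]);

/// Views 4-byte-aligned bytes as `u32` words without copying.
/// Returns `None` if the pointer is misaligned or the length is not a
/// multiple of 4.
fn as_words(bytes: &[u8]) -> Option<&[u32]> {
    if bytes.as_ptr() as usize % std::mem::align_of::<u32>() != 0 || bytes.len() % 4 != 0 {
        return None;
    }
    // SAFETY: the pointer is 4-byte aligned, the length is a whole number
    // of words, every bit pattern is a valid `u32`, and the returned slice
    // borrows from `bytes`.
    Some(unsafe { std::slice::from_raw_parts(bytes.as_ptr() as *const u32, bytes.len() / 4) })
}
```

The words come back in the target's native byte order, so on a little-endian target such as RISC-V the literal above reads as the words 1 and 2. Without the alignment wrapper, the same bytes could land at an odd address and the check in `as_words` would reject them.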
Efficiency Considerations for Smaller Datasets
While rkyv demonstrates excellent efficiency for large datasets with its low and consistent cycle count, bincode may be more efficient whenever its own deserialization would consume fewer than ~4.6 million cycles. This is particularly relevant for smaller types, where rkyv's fixed overhead outweighs the benefit of its zero-copy mechanics. In the table above, bincode already costs about 18.5 million cycles at 100KB, so any crossover lies below that size.
Proposed Strategy
Given the differences in performance characteristics, a possible strategy is to offer different ways to read and write data into the zkVM, best suited for the task at hand:
- Use bincode for smaller types and data where the lower absolute number of cycles leads to faster operations.
- Use rkyv for larger types and data, where its roughly constant cycle count is lower than bincode's.
This approach allows users to optimize their applications by selecting the most appropriate serialization method, potentially improving performance and efficiency across various use cases.
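As a rough sketch of how such a choice could be made, the function below picks a format from the payload size. Everything in it is an assumption layered on the table: the fixed rkyv cost of 4.6 million cycles, a bincode cost of about 185 cycles per byte (the 100KB row divided by 100,000 bytes), and a linear extrapolation to sizes below 100KB that was not measured. `Format`, `choose_format`, and the constants are hypothetical names, not part of sp1, rkyv, or bincode.

```rust
/// Serialization formats discussed in this issue.
#[derive(Debug, PartialEq)]
enum Format {
    Bincode,
    Rkyv,
}

/// Assumption: rkyv's roughly fixed cost, taken from the table.
const RKYV_FIXED_CYCLES: u64 = 4_600_000;

/// Assumption: bincode cost per byte, derived from the 100KB row
/// (18,479,850 cycles / 100,000 bytes, rounded up).
const BINCODE_CYCLES_PER_BYTE: u64 = 185;

/// Linear estimate of bincode's cost; unmeasured below 100KB.
fn estimated_bincode_cycles(payload_bytes: u64) -> u64 {
    payload_bytes.saturating_mul(BINCODE_CYCLES_PER_BYTE)
}

/// Picks whichever format has the lower estimated cycle count.
fn choose_format(payload_bytes: u64) -> Format {
    if estimated_bincode_cycles(payload_bytes) < RKYV_FIXED_CYCLES {
        Format::Bincode
    } else {
        Format::Rkyv
    }
}
```

With these placeholder constants the crossover sits near 25KB. The real threshold would need to be measured inside the zkVM and is likely to depend on the shape of the type as well as its size.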