GithubHelp home page GithubHelp logo

Comments (2)

malik672 avatar malik672 commented on July 28, 2024

@puma314 assign this to me

from sp1.

drcapybara avatar drcapybara commented on July 28, 2024

Comparison of deserialization strategies

Summary

During our performance testing and benchmarking of deserialization processes, we observed a consistent minimum of approximately 4.6 million x86 cycles consumed when using rkyv, regardless of the data size. We compare this with bincode results and explore the performance of a pointer cast operation.

Details

In our benchmarks using the rkyv and bincode libraries for deserialization, we've noted that the number of cycles consumed with rkyv does not scale with data size, showing a fixed minimum cycle count suggestive of zero-copy deserialization mechanics.

Cycle Count Comparison

The following table summarizes the cycle counts observed for deserialization using transmute, rkyv, and bincode across various data sizes:

Data Size transmute rkyv bincode
Small (100KB) 607 4,741,032 18,479,850
Medium (5MB) 548 4,990,638 681,913,198
Large (50MB) 762 4,638,140 6,730,678,512

Interpretation

  • Transmute: Shows minimal cycle counts across all data sizes, indicating a highly efficient direct memory mapping process.
  • rkyv: Displays consistent cycle counts around 4.6 million, supporting the hypothesis of a fixed minimum cycle count due to zero-copy mechanics.
  • bincode: Cycle counts increase significantly with data size, indicating less efficiency for larger datasets.

Additional Findings: Pointer-Cast Deserialization

In another series of tests, deserializing data via a simple pointer cast to a struct with no heap-allocated data, we observed a dramatically reduced cycle count of roughly 600 cycles per 100,000 bytes, for the following simple struct:

#[derive(Debug)]
struct TestStruct {
    int: u8,
    string: [u8; 24],    // Mocking a fixed-size array instead of a String
    data: [u8; 100_000], // Mocking a fixed-size array instead of a Vec<u8>
}

This method is highly efficient but limited to scenarios where data structures are compatible with direct memory mapping. Despite also being cartoonishly dangerous, the prospect of completely zero-cost deserialzation in the zkVM may be an avenue worth exploring. See this for further reading and exploration of this technique.

Further, the include_bytes_aligned! macro may also satisfy the requirement to read machine words with u32 alignment from bytes embedded within the RISCV binary.

Efficiency Considerations for Smaller Datasets

While rkyv demonstrates excellent efficiency for large datasets with its low and consistent cycle count, it's important to note that bincode may be more efficient for scenarios where deserialization consumes less than ~4.6 million cycles. This is particularly relevant for smaller types where the overhead of rkyv's zero-copy mechanics does not offset its benefits.

Proposed Strategy

Given the differences in performance characteristics, a possible strategy is to offer different ways to read and write data into the zkVM, best suited for the task at hand:

  • Use bincode for smaller types and data where the lower absolute number of cycles leads to faster operations.
  • Use rkyv for larger types and data or applications

This approach allows users to optimize their applications by selecting the most appropriate serialization method, potentially improving performance and efficiency across various use cases.

References

from sp1.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.