zk_evm

A collection of libraries to prove Ethereum blocks with the Polygon Zero Type 1 zkEVM, powered by the starky and plonky2 proving systems.

Directory structure

This repository contains the following Rust crates:

  • mpt_trie: A collection of types and functions to work with Ethereum Merkle Patricia Tries.

  • smt_trie: A collection of types and functions to work with Polygon Hermez Sparse Merkle Trees (SMT).

  • trace_decoder: A flexible protocol designed to process Ethereum client trace payloads into an IR format that can be understood by the zkEVM prover.

  • evm_arithmetization: Defines all the STARK constraints and recursive circuits to generate succinct proofs of EVM execution. It uses starky and plonky2 as proving backends: https://github.com/0xPolygonZero/plonky2.

  • proof_gen: A convenience library for generating proofs from inputs already in Intermediate Representation (IR) format.

  • zero_bin: A composition of paladin and proof_gen to generate EVM block proofs.

Dependency graph

Below is a simplified view of the dependency graph, including the proving system backends and the application layer defined within zero-bin.

```mermaid
flowchart LR
    subgraph ps [proving systems]
    A1{{plonky2}}
    A2{{starky}}
    end

    ps --> zk_evm

    subgraph zk_evm [zk_evm]
    B[mpt_trie]
    C[evm_arithmetization]
    D[trace_decoder]
    E[proof_gen]

    B --> C
    B ---> D
    C ---> D
    C --> E
    D --> E

    F{zero-bin}
    C --> F
    D --> F
    E --> F
    end
```

Documentation

Documentation is still incomplete and will be improved over time. A lot of useful material can already be found in the docs section of this repository.

Branches

The default branch for the repo is develop, which is under active development and not stable. Most PRs should target develop. If you need a stable branch, use a tagged version of main. Assume that develop will break, and use it only for development.

Building

The zkEVM stack currently requires the nightly toolchain, although we may transition to stable in the future. Note that the prover uses the Jemalloc memory allocator due to its superior performance.

License

Licensed under either of

  • Apache License, Version 2.0
  • MIT license

at your option.

Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.

zk_evm's Issues

Create workspace CHANGELOG

Once all libraries are migrated, we should add a CHANGELOG and properly track any changes across version bumps.

Reconfigure or disable `sonarcloud` bot

The current sonarcloud bot is pretty useless and mostly adds noise to PRs without providing meaningful insights. We could configure it differently if we're still interested in what it can bring; otherwise, we could just remove it.

Refactor EVM prover code to support distinct statements

To maintain both type-1 and type-2 provers (and possibly additional variants along the way) with minimal code duplication, and to facilitate any upgrade that would affect both versions, we need the prover to be modular enough to activate one instance over another through feature flags.

In particular, with respect to type-2, we need:

  • identify the distinct Kernel parts that need distinct handling
  • unify the entry / exit points of pieces that would have distinct logic (for instance mpt_trie vs smt_trie), i.e. identical stack modifications
  • tweak the proving code (cf. the extra PoseidonStark table and CTLs for instance). We may want a dedicated constants module with distinct versions inside, instead of the hardcoded (and possibly changing) values here and there (cf. const NATIVE_INSTRUCTIONS: [usize; 13] for instance).

Improve block hashes fetching

We currently make around 128 RPC queries to get the list of previous block hashes to pass to the prover.
We could improve on this by persisting the previous block hashes in memory. Upon receiving the next block, we'd just need to rotate the vector and replace the item at position 255 with the parent hash, as sketched below.

This is a fairly minor improvement compared to proving cost, though.
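
A minimal sketch of that rotation, assuming a plain vector of the 256 most recent hashes (all names here are illustrative, not the actual zero-bin types):

```rust
/// Illustrative stand-in for the real 32-byte block hash type.
type Hash = [u8; 32];

struct BlockHashCache {
    /// Always 256 entries, oldest first; after `advance`, `hashes[255]`
    /// holds the parent hash of the next block to prove.
    hashes: Vec<Hash>,
}

impl BlockHashCache {
    /// On each new block, drop the oldest hash and append the parent
    /// hash, replacing ~128 RPC round-trips with a local rotation.
    fn advance(&mut self, parent_hash: Hash) {
        self.hashes.rotate_left(1);
        self.hashes[255] = parent_hash;
    }
}
```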

Migrate `plonky2_evm`

Once https://github.com/0xPolygonZero/plonky2/pull/1503 is merged, we should migrate the plonky2_evm crate here and rename it before releasing it on crates.io.

It seems the consensus was on evm_arithmetization?

Fix failing tests for `type2_cancun` branch

The following tests (121 in total) from the Ethereum test suite are currently failing on the type2/hermez branch (on the plonky2 side), as of commit be7c9a84e07ba00521e12901d66737fbfa6eef11:

Failed tests

  • ABAcallsSuicide0_d0g0v0_Shanghai
  • CREATE_AcreateB_BSuicide_BStore_d0g0v0_Shanghai
  • Create2OOGFromCallRefunds_d12g0v0_Shanghai
  • CreateOOGFromCallRefunds_d12g0v0_Shanghai
  • CreateOOGFromEOARefunds_d12g0v0_Shanghai
  • InitCollisionParis_d0g0v0_Shanghai
  • InitCollisionParis_d1g0v0_Shanghai
  • InitCollisionParis_d2g0v0_Shanghai
  • InitCollisionParis_d3g0v0_Shanghai
  • NonZeroValue_SUICIDE_ToOneStorageKey_Paris_d0g0v0_Shanghai
  • RevertInCreateInInitCreate2Paris_d0g0v0_Shanghai
  • RevertInCreateInInit_Paris_d0g0v0_Shanghai
  • ZeroValue_SUICIDE_ToOneStorageKey_Paris_d0g0v0_Shanghai
  • callToSuicideThenExtcodehash_d0g0v0_Shanghai
  • callToSuicideThenExtcodehash_d1g0v0_Shanghai
  • callToSuicideThenExtcodehash_d2g0v0_Shanghai
  • callToSuicideThenExtcodehash_d3g0v0_Shanghai
  • callcall_00_SuicideEnd_d0g0v0_Shanghai
  • callcallcall_000_SuicideEnd_d0g0v0_Shanghai
  • callcallcallcode_001_SuicideEnd_d0g0v0_Shanghai
  • callcallcode_01_SuicideEnd_d0g0v0_Shanghai
  • callcallcodecall_010_SuicideEnd_d0g0v0_Shanghai
  • callcallcodecall_010_SuicideMiddle_d0g0v0_Shanghai
  • callcallcodecallcode_011_SuicideEnd_d0g0v0_Shanghai
  • callcallcodecallcode_011_SuicideMiddle_d0g0v0_Shanghai
  • callcodecall_10_SuicideEnd_d0g0v0_Shanghai
  • callcodecallcall_100_SuicideEnd_d0g0v0_Shanghai
  • callcodecallcallcode_101_SuicideEnd_d0g0v0_Shanghai
  • callcodecallcode_11_SuicideEnd_d0g0v0_Shanghai
  • callcodecallcodecall_110_SuicideEnd_d0g0v0_Shanghai
  • callcodecallcodecall_110_SuicideMiddle_d0g0v0_Shanghai
  • callcodecallcodecallcode_111_SuicideEnd_d0g0v0_Shanghai
  • callcodecallcodecallcode_111_SuicideMiddle_d0g0v0_Shanghai
  • codeCopyZero_Paris_d0g0v0_Shanghai
  • create2collisionStorageParis_d0g0v0_Shanghai
  • create2collisionStorageParis_d1g0v0_Shanghai
  • create2collisionStorageParis_d2g0v0_Shanghai
  • createEmptyThenExtcodehash_d0g0v0_Shanghai
  • createLargeResult_d0g0v0_Shanghai
  • createLargeResult_d1g0v0_Shanghai
  • createLargeResult_d4g0v0_Shanghai
  • createLargeResult_d5g0v0_Shanghai
  • doubleSelfdestructTouch_Paris_d0g0v0_Shanghai
  • doubleSelfdestructTouch_Paris_d0g0v1_Shanghai
  • doubleSelfdestructTouch_Paris_d0g0v2_Shanghai
  • dynamicAccountOverwriteEmpty_Paris_d0g0v0_Shanghai
  • eoaEmptyParis_d0g0v0_Shanghai
  • eoaEmptyParis_d0g1v0_Shanghai
  • eoaEmptyParis_d0g1v1_Shanghai
  • eoaEmptyParis_d1g0v0_Shanghai
  • eoaEmptyParis_d1g1v0_Shanghai
  • eoaEmptyParis_d1g1v1_Shanghai
  • extCodeHashAccountWithoutCode_d0g0v0_Shanghai
  • extCodeHashCALLCODE_d0g0v0_Shanghai
  • extCodeHashCALL_d0g0v0_Shanghai
  • extCodeHashChangedAccount_d0g0v0_Shanghai
  • extCodeHashCreatedAndDeletedAccountCall_d0g0v0_Shanghai
  • extCodeHashCreatedAndDeletedAccountRecheckInOuterCall_d0g0v0_Shanghai
  • extCodeHashCreatedAndDeletedAccountStaticCall_d0g0v0_Shanghai
  • extCodeHashCreatedAndDeletedAccount_d0g0v0_Shanghai
  • extCodeHashDELEGATECALL_d0g0v0_Shanghai
  • extCodeHashDeletedAccount1Cancun_d0g0v0_Shanghai
  • extCodeHashDeletedAccount1_d0g0v0_Shanghai
  • extCodeHashDeletedAccount2Cancun_d0g0v0_Shanghai
  • extCodeHashDeletedAccount2_d0g0v0_Shanghai
  • extCodeHashDeletedAccount3_d0g0v0_Shanghai
  • extCodeHashDeletedAccount4_d0g0v0_Shanghai
  • extCodeHashDeletedAccountCancun_d0g0v0_Shanghai
  • extCodeHashDeletedAccount_d0g0v0_Shanghai
  • extCodeHashDynamicArgument_d1g0v0_Shanghai
  • extCodeHashDynamicArgument_d2g0v0_Shanghai
  • extCodeHashDynamicArgument_d3g0v0_Shanghai
  • extCodeHashInInitCode_d0g0v0_Shanghai
  • extCodeHashInInitCode_d1g0v0_Shanghai
  • extCodeHashMaxCodeSize_d0g0v0_Shanghai
  • extCodeHashNewAccount_d0g0v0_Shanghai
  • extCodeHashPrecompiles_d0g0v0_Shanghai
  • extCodeHashSTATICCALL_d0g0v0_Shanghai
  • extCodeHashSelfInInit_d0g0v0_Shanghai
  • extCodeHashSelf_d0g0v0_Shanghai
  • extCodeHashSubcallOOG_d0g0v0_Shanghai
  • extCodeHashSubcallOOG_d1g0v0_Shanghai
  • extCodeHashSubcallOOG_d2g0v0_Shanghai
  • extCodeHashSubcallSuicideCancun_d0g0v0_Shanghai
  • extCodeHashSubcallSuicide_d0g0v0_Shanghai
  • extcodehashEmpty_Paris_d0g0v0_Shanghai
  • extcodehashEmpty_Paris_d1g0v0_Shanghai
  • extcodehashEmpty_Paris_d2g0v0_Shanghai
  • extcodehashEmpty_Paris_d3g0v0_Shanghai
  • extcodehashEmpty_Paris_d4g0v0_Shanghai
  • extcodehashEmpty_Paris_d5g0v0_Shanghai
  • extcodehashEmpty_Paris_d6g0v0_Shanghai
  • extcodehashEmpty_Paris_d7g0v0_Shanghai
  • extcodehashEmpty_Paris_d8g0v0_Shanghai
  • extcodehashEmpty_Paris_d9g0v0_Shanghai
  • invalidAddr_d30g0v0_Shanghai
  • invalidAddr_d32g0v0_Shanghai
  • invalidAddr_d34g0v0_Shanghai
  • invalidAddr_d36g0v0_Shanghai
  • invalidAddr_d38g0v0_Shanghai
  • refund_CallToSuicideStorage_d1g0v0_Shanghai
  • refund_CallToSuicideTwice_d1g0v0_Shanghai
  • refund_TxToSuicide_d0g0v0_Shanghai
  • static_callcodecall_10_SuicideEnd2_d0g0v0_Shanghai
  • static_callcodecall_10_SuicideEnd2_d0g0v1_Shanghai
  • static_callcodecall_10_SuicideEnd_d0g0v0_Shanghai
  • static_callcodecall_10_SuicideEnd_d1g0v0_Shanghai
  • static_callcodecallcodecall_110_SuicideEnd2_d0g0v0_Shanghai
  • static_callcodecallcodecall_110_SuicideEnd2_d0g0v1_Shanghai
  • static_callcodecallcodecall_110_SuicideEnd_d0g0v0_Shanghai
  • static_callcodecallcodecall_110_SuicideEnd_d0g0v1_Shanghai
  • static_callcodecallcodecall_110_SuicideMiddle2_d0g0v0_Shanghai
  • static_callcodecallcodecall_110_SuicideMiddle_d0g0v0_Shanghai
  • static_refund_CallToSuicideTwice_d1g0v0_Shanghai
  • suicideAddress_d0g0v0_Shanghai
  • suicideCallerAddresTooBigLeft_d0g0v0_Shanghai
  • suicideCallerAddresTooBigRight_d0g0v0_Shanghai
  • suicideCaller_d0g0v0_Shanghai
  • suicideOrigin_d0g0v0_Shanghai
  • walletKillToWallet_d0g0v0_Shanghai
  • walletKill_d0g0v0_Shanghai

Add Cancun tests

We should add a few unit tests for the opcodes newly introduced in the Cancun hardfork.

The selfdestruct integration test has already been updated for the new behavior, but we may want to test that actual deletion occurs when the account has just been created.

Address all `TODO`s

Low-priority, but there are a handful of TODOs in the repo that should eventually be addressed. Some require a bit of work and should only be taken on when nothing else is as pressing.

Add global README

Once all libraries are migrated, we should add a global README to explain the workspace structure and whatnot.

Optimization ideas after initial Kernel benchmark

Here are the current table lengths for the ERC20 test:

| Table        | Length  | Width |
| ------------ | ------- | ----- |
| Memory       | 542 726 | 20    |
| Cpu          | 262 144 | 112   |
| Arithmetic   | 26 167  | 114   |
| Keccak       | 13 392  | 2431  |
| BytePacking  | 5 688   | 72    |
| Logic        | 4 180   | 523   |
| KeccakSponge | 558     | 437   |

The CPU is halted after 172,006 cycles. Here is a breakdown of the most expensive operations:

| Operation              | Cycles  |
| ---------------------- | ------- |
| Bootstrapping          | 12 922  |
| Load MPTs              | 1 155   |
| Hash state trie        | 10 218  |
| Hash txn trie          | 118     |
| Hash receipts trie     | 115     |
| Txn                    | 124 634 |
| - ECRECOVER            | 26 828  |
| - RLP decode txn       | 3 169   |
| - Process txn          | 4 898   |
| - Load code (388B)     | 5 443   |
| - Jumpdest check       | 7 567   |
| - SLOAD                | 400     |
| - CALL                 | 64 572  |
| -- Load code (1824B)   | 25 691  |
| -- Jumpdest check      | 37 402  |
| - SLOAD                | 427     |
| - SSTORE               | 960     |
| - SLOAD                | 450     |
| - SSTORE               | 1 498   |
| - LOG3                 | 1 376   |
| - RETURN               | 307     |
| Process receipt        | 2 862   |
| Hash state trie        | 13 436  |
| Hash txn trie          | 801     |
| Hash receipts trie     | 2 775   |

Some optimization ideas:

  • Batch ECDSA: We could batch all the ECRECOVER calls in a block and perform one batch ECDSA verification after the txn proofs. This would amortize the cost of ECRECOVER across the block.
  • Preinitialize memory segments: We could allow some memory segments to be preinitialized by the prover. That is, we would remove the constraint that memory is initialized at zero for these segments. This would remove the costly bootstrapping, MPT loading, and code loading, potentially saving ~45k cycles in this example.
  • Optimize Jumpdest check: We could cache the jumpdest analysis across the txn or block to avoid redundant work (see the sketch below). For type-2, we could add the jumpdest destinations to the account data, or simply remove the check as in Polygon zkEVM.
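
A minimal sketch of such a cache, keyed by code hash so the analysis is reused across calls within a txn or block (types and names are hypothetical, not the actual Kernel representation):

```rust
use std::collections::{BTreeSet, HashMap};

type CodeHash = [u8; 32];

/// Jumpdest analysis cached per code hash, so repeated CALLs into the
/// same contract don't redo the linear bytecode scan.
#[derive(Default)]
struct JumpdestCache {
    analyses: HashMap<CodeHash, BTreeSet<usize>>,
}

impl JumpdestCache {
    fn valid_jumpdests(&mut self, hash: CodeHash, code: &[u8]) -> &BTreeSet<usize> {
        self.analyses
            .entry(hash)
            .or_insert_with(|| find_jumpdests(code))
    }
}

/// Standard EVM jumpdest analysis: a 0x5b byte is a valid JUMPDEST only
/// if it is not part of the immediate data of a preceding PUSH1..PUSH32.
fn find_jumpdests(code: &[u8]) -> BTreeSet<usize> {
    let mut dests = BTreeSet::new();
    let mut i = 0;
    while i < code.len() {
        let op = code[i];
        if op == 0x5b {
            dests.insert(i);
        } else if (0x60..=0x7f).contains(&op) {
            i += (op - 0x5f) as usize; // skip the PUSH immediate bytes
        }
        i += 1;
    }
    dests
}
```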

Add additional metadata to decoder errors

It would be nice to adjust the error type returned when processing a trace so that it wraps the actual error with additional metadata along these lines:

Block num: 19240118
Txn idx: 145
Address: 0xfa7093cdd9ee6932b4eb2c9e1cde7ce00b1fa4b9
Hashed address: 0x5f3894a1bacf9d3f41232eb5c05d070fce1a22b37da1d990e09f4c3df82bc9c6
Slot: 0x00000000000000000000000000000000000000000000000000000000000000fe
Slot hashed: 0x54075df80ec1ae6ac9100e1fd0ebf3246c17f5c933137af392011f4c5f61513a
Slot value written: 0x12594b0
Txn hash: 0x6641bb9ae913e1fecddf9e6005e78b57c7877f5fd7acbf66189b2c11728a218f

I often need this information while debugging, and not having to extract it manually every time would be worthwhile. The impl should be easy.

Not all of these fields will be available in every circumstance, but we could build the metadata up dynamically each time based on what is available.
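
A minimal sketch of what such a wrapper could look like, with every field optional so it can be built up incrementally (all names here are hypothetical, not the actual trace_decoder types):

```rust
use std::error::Error;

/// Hypothetical wrapper around the underlying decoding error; fields
/// left as `None` are simply omitted from the rendered report.
#[derive(Debug)]
struct TraceDecodingError {
    block_num: Option<u64>,
    txn_idx: Option<usize>,
    address: Option<[u8; 20]>,
    hashed_address: Option<[u8; 32]>,
    slot: Option<[u8; 32]>,
    slot_value_written: Option<[u8; 32]>,
    txn_hash: Option<[u8; 32]>,
    source: Box<dyn Error + Send + Sync>,
}

impl TraceDecodingError {
    fn new(source: impl Into<Box<dyn Error + Send + Sync>>) -> Self {
        Self {
            block_num: None,
            txn_idx: None,
            address: None,
            hashed_address: None,
            slot: None,
            slot_value_written: None,
            txn_hash: None,
            source: source.into(),
        }
    }

    /// Builder-style setter; one of these per metadata field, called at
    /// whichever layer happens to know the value.
    fn with_block_num(mut self, block_num: u64) -> Self {
        self.block_num = Some(block_num);
        self
    }
}
```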

Update EVM Tests to Support Type-2 SMT format

We need the test-runner to support the Hermez-style SMT layout in order to test the type-2 zkEVM.

I recall there was some work done in this area, but it may need some tweaks following the recent SMT design switch.

Check non-validity of invalid jumpdests

For now, we don't check that an invalid jump is actually invalid. A malicious prover could set a valid jumpdest to 0, and the transaction would incorrectly revert.

Add checkpoints to continuation logic

When proving segment N of a transaction, the worker needs to re-run the entire transaction in the interpreter from the beginning up to the N-th segment. This can incur non-negligible latency when, e.g., spawning a new worker after the (N-1)-th one signals that the transaction is still not done.

We should allow the (N-1)-th worker to communicate the state reached after its segment, so that new workers don't have to start from the beginning.

Migrate `proof-protocol-decoder` crates

Move both crates from https://github.com/0xPolygonZero/proof-protocol-decoder to this repo, renaming them as we go.

  • protocol_decoder -> trace_parser / trace_decoder?
  • plonky_block_proof_gen -> proof_generator?

Add in trie dumping logic between txns

I currently have some logic that I hacked in to dump all tries before and after txns. It's been useful enough times that we should probably get it back into main, enabled behind a prog arg or a feature flag.

Help with customization and deployment to production

Hello, we want to know whom to contact to get help with deployment. The documentation is really sparse, and we need some guidance to join the testing and development process, as we want to be among the first to run a type 1 prover.

Thanks for pointing us in the right direction.

Update doc sequence diagrams

The diagrams in docs/ are a bit outdated. We can probably scrap the old Edge-based setup and keep (with updates) only the diagram at the top that matches the current stack.

Move the empty node encoding to pos 0

@ENCODED_EMPTY_NODE_POS is currently set to a very high value (around 2^32), which incurs a lot of padding rows in fill_gaps. We can move it to 0 and start writing the RLP segment from position 1.

Address over-aggressive sub-trie pruning

The issue was that, when creating sub-tries, the hash-node traversal pruned too aggressively by checking that the extension & leaf nibbles matched the remaining part of the key. Even if the remaining key does not match, we still need to include the node.

There is still something else, however, where some nodes are being hashed out that shouldn't be. Will update when I know more.

Serialization optimizations for `mpt_trie`

```rust
use serde::Serializer;

/// Serializes a big-endian byte slice as a minimal, 0x-prefixed hex
/// string, using `slice` as a scratch buffer for the `to_hex_raw` helper.
pub fn serialize_uint<S>(slice: &mut [u8], bytes: &[u8], serializer: S) -> Result<S::Ok, S::Error>
where
    S: Serializer,
{
    // Skip the leading zero bytes so the encoding is minimal.
    let leading_zeros = bytes.iter().take_while(|b| **b == 0).count();
    let bytes = &bytes[leading_zeros..];
    if bytes.is_empty() {
        serializer.serialize_str("0x0")
    } else {
        serializer.serialize_str(to_hex_raw(slice, bytes, true))
    }
}
```
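
Assuming to_hex_raw behaves like the helper of the same name in impl-serde (writing the hex encoding of the remaining bytes into the scratch buffer and skipping the leading zero nibble when the flag is set), a 32-byte buffer holding the value 0x12594b0 would serialize as the short string "0x12594b0" rather than a fully zero-padded 64-character one.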

Update `trace_decoder` for Cancun HardFork

The Cancun hardfork will require additional inputs on the prover side (either new fields on the block header or local values per transaction), which will need parsing at the trace_decoder level.

Switch from git revisions to versions

We still rely on plonky2-related dependencies through git revisions.
Once the new versions are pushed to crates.io, we should update our dependencies here, especially as we cannot publish crates that depend on git revisions.

Swap out the internal `U512` inside `Nibbles`

When I wrote mpt_trie, I thought the maximum key length we needed to support was 64 nibbles. However, during testing a while back, I realized that there is an edge case when converting to compact encoding (I think? I can't quite remember the scenario) where Nibbles required 260 bits. As a quick hack to keep moving, I changed mpt_trie to internally use U512, with the intent of fixing this later.

So now that things are starting to stabilize, we should figure out what a proper fix would look like:

  • Support arbitrarily long keys with a byte vec --> likely the worst for performance.
  • Only support 260-bit keys with:
    • A byte vec
    • A 5th u64
    • A U256 + an extra byte to handle the overflow case

For some more context, ethereum-types U256s use [u64; 4] internally, while an H256 uses [u8; 32]. I should probably confirm with @dvdplm, as I know he worked on this library, but I think if you can represent the internal state with u64s, then you can probably use 64-bit instructions/registers to change multiple bytes at once. H256s are immutable, and since we don't need to support any operators for them, a byte vec probably makes more sense there.

With this, unless we really don't care about performance (or we want to support arbitrary key lengths), we probably don't want to switch to a byte vec. My guess is that sticking with u64s will be best for performance. I also think the performance of our tools that rely on this library is already good enough, and memory is not an issue either, so we might not have much to gain by improving both of these aspects. That said, this is a library open for use by anyone, so with that alone we should probably aim to make it good for both performance and memory usage. A sketch of the extra-limb option is below.
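
For illustration, a minimal sketch of the "U256 + an extra byte" option (a hypothetical layout, not a committed design), keeping u64 limbs so arithmetic stays register-friendly:

```rust
/// Hypothetical 260-bit key representation: four u64 limbs cover the
/// usual 256 bits (as in ethereum-types::U256), and a small extra field
/// absorbs the 4 overflow bits from the compact-encoding edge case.
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
struct Key260 {
    low: [u64; 4], // bits 0..256, least-significant limb first
    overflow: u8,  // bits 256..260; only the low 4 bits are ever set
}
```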

Improve documentation

In the line of #19, it would be beneficial to strengthen the documentation of the codebase, in particular on the protocol-decoder subcrate side, to give both maintainers and external contributors an easier time when diving into the internals of proof IR generation.

Similarly to what was done in 0xPolygonZero/plonky-block-proof-gen#8, it may be nice to add sufficient meaningful documentation so that we can add

```rust
#![cfg_attr(docsrs, feature(doc_cfg))]
#![deny(rustdoc::broken_intra_doc_links)]
#![deny(missing_docs)]
```

at the top of the protocol-decoder crate without compilation error.

Probably not of highest priority for now, but adding a tracking ticket nonetheless.

Side note: The attributes above were added as part of 0xPolygonZero/plonky-block-proof-gen#8 but were removed during the merging with this repo. It could be re-enabled for the plonky-block-proof-gen subcrate when dealing with this issue, as the requirements are already satisfied.

Add a few QoL useability functions to the interface

There are a handful of functions we could add to the API that would make working with the library a bit easier. I'm going to document them here over time so I don't forget once I have a few spare cycles (see the sketch after this list):

  • PartialTrie.contains(key)
  • Provide a way to access the internal bytes of Nibbles by reference
  • Add a From impl for usize
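
A rough sketch of what those signatures could look like (hypothetical, not the actual mpt_trie API):

```rust
/// Illustrative stand-ins for the real mpt_trie types.
struct Nibbles {
    bytes: Vec<u8>,
}
struct PartialTrie;

impl PartialTrie {
    /// `PartialTrie.contains(key)`: true if the trie holds a value at `key`.
    fn contains(&self, _key: &Nibbles) -> bool {
        unimplemented!("lookup logic elided in this sketch")
    }
}

impl Nibbles {
    /// Access the internal bytes by reference instead of copying.
    fn as_bytes(&self) -> &[u8] {
        &self.bytes
    }
}

/// A `From` impl so small keys can be built straight from a usize.
impl From<usize> for Nibbles {
    fn from(v: usize) -> Self {
        Nibbles {
            bytes: v.to_be_bytes().to_vec(),
        }
    }
}
```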

Remove unneeded contexts from memory

Followup to #25.

Contexts you return from will never be accessed again within a given transaction. We can thus discard them to lighten the memory we propagate.

Include support for Canonical Polygon Bridge

Implementation

The protocol for LxLy in zkEVM (and CDK chains) is to write the Global Exit Root (GER) from L1 directly into a known state slot before EVM execution commences.

In the prover, this value can be taken as a public input and written into state, since the resulting state hash will differ from the expected one if the value does not match the one used during execution.

Example implementation from Erigon: https://github.com/0xPolygonHermez/cdk-erigon/blob/65fed5635db64f7fbc8aa55890b7cfa0feafb1be/eth/stagedsync/stage_execute.go#L176-L181

Supporting multi-transaction `GenerationInputs` in `eth-tx-proof`.

I have been trying to use eth-tx-proof to generate GenerationInputs for Ethereum block 19240705 in a way that lets me prove multiple transactions at once. To this end, for each transaction, I tried considering all touched addresses (from all considered transactions), instead of only the current transaction's touched addresses, when trimming the state trie. However, while proving passes with normal execution, it hits a hash node with these modifications.

Consider potentially refactoring `mpt_trie` to use more `Result`s

@dvdplm @muursh @Nashtare

Currently, there are a lot of operations in mpt_trie that will panic instead of returning a Result. This was an intentional design choice when I wrote the library, but it has come up often enough in discussions that we should probably decide whether it's worth moving to Results.

So when I'm trying to decide whether a function/method should return a Result or instead just panic when a "bad" state is reached, I tend to use this logic:

  • If the caller can expect the input to always be valid and should never encounter this state if the implementation is "correct", then a panic is best since we should never be able to reach this state to begin with. We can't really continue in this invalid state.
  • If the caller can encounter a bad state even in a correct implementation (eg. anything with sockets), then a Result is best in order to be able to recover.

So really, mpt_trie was making an assumption about the caller, I guess: that it was always working with input from Ethereum tries and could expect them to always be valid. For example, with the logic in trace_decoder, IIRC all calls to merge_nibbles should always work if the tries it got from the upstream node are valid.

But this might not be the case for all users of the library, and maybe Results make sense. If we decide to change this, we could either:

  • Make all of these calls return Results.
  • Make mirror versions of these functions that return a Result (sketched below).
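
A minimal sketch of the mirror-version option, under the assumption of a heavily simplified Nibbles (names and the 64-nibble bound are illustrative): keep the panicking API and add a fallible `try_`-prefixed twin, so current callers keep their ergonomics while new callers can recover.

```rust
#[derive(Debug)]
pub struct KeyOverflowError;

#[derive(Clone, Copy, Default)]
pub struct Nibbles {
    count: usize, // number of nibbles currently held (max 64 here)
}

impl Nibbles {
    /// Fallible variant: returns an error instead of panicking.
    pub fn try_merge_nibbles(&mut self, other: &Nibbles) -> Result<(), KeyOverflowError> {
        let new_count = self.count + other.count;
        if new_count > 64 {
            return Err(KeyOverflowError);
        }
        self.count = new_count;
        Ok(())
    }

    /// Existing-style variant: panics on invalid input, delegating to
    /// the fallible version so the two can never drift apart.
    pub fn merge_nibbles(&mut self, other: &Nibbles) {
        self.try_merge_nibbles(other)
            .expect("merged key exceeds the maximum supported length");
    }
}
```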

Improve Continuous Integration via Witness-generation only tests

  • Run the official Ethereum tests in the zk_evm CI (witness generation only), either hardcoded or random.
  • Use some concrete blocks (either L1 or John's chain) to test the decoder parsing logic and witness generation.
  • Set up infrastructure for regular testing and regression catching (we'd need to define the frequency, the types of runs, and the tests to be run).

Update all sub-repos to use `mpt_trie` once `plonky2_evm` is merged

Once #6 is finished, we can update trace_decoder to use the local mpt_trie crate instead of pointing to the old upstream eth_trie_utils repo. If we do this before plonky2_evm is moved into this repo, we will get type-mismatch compiler errors from differing versions of the crates being used together.

Fix `Missing key when creating sub-partial tries` error

Two blocks from the hard test chain, namely 19240095 and 19240361, are failing with the error
Missing key {...} when creating sub-partial tries, i.e. during witness pre-processing, before actual Kernel witness generation.

This seems to hint that the aggressive trie pruning approach may still have some unhandled edge cases.

Decouple `trace_decoder` and `proof_gen`

proof_gen actually only relies on trace_decoder for two alias imports: TxnProofGenIR = GenerationInputs and BlockHeight = u64 (see below). It could be worth decoupling them, so that long-running upgrades like type2/continuations can be developed more easily, and to allow more flexibility between the two libraries.
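
For reference, the two aliases in question boil down to this (per the issue; the exact import path is an assumption), so inlining them on the proof_gen side would sever the dependency:

```rust
// Assumed import path; GenerationInputs lives in evm_arithmetization.
use evm_arithmetization::GenerationInputs;

/// The entire surface that proof_gen currently pulls from trace_decoder.
pub type BlockHeight = u64;
pub type TxnProofGenIR = GenerationInputs;
```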

Migrate all open issues from merged repos

This is only really relevant for proof_protocol_decoder (plonky2_evm is mixed in with a ton of other issues, and eth_trie_utils has none), but we should move those issues over.

Unless there's a better way, or maybe this is overkill, I don't know.

CDK-Erigon: Support for L2 EIP-1559: "Burn Target"

L2s supporting EIP-1559 need to transfer the tokens pending burn to a vault / address able to bridge them back to L1, instead of actually burning them.

This is because for many L2s the gas token is ETH, which is native to L1, so an L2 burn wouldn't actually decrease the total supply on L1.

To support this, the L2 client (and prover) need to support configuring a target address to transfer to instead of burning.

Consider property-based testing for `evm_arithmetization` crate

LA auditors suggested we use proptest in various places of the codebase to strengthen overall testing. It's a meaningful suggestion, although it doesn't apply everywhere, so we would need to define where it makes sense and integrate it accordingly (a shape sketch follows below).
Arithmetic tests for ArithmeticStark would most likely be a good place to start.
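
For illustration only, here is the general shape of a proptest property, using a stand-in commutativity check rather than any actual ArithmeticStark constraint:

```rust
use proptest::prelude::*;

/// Stand-in for an operation we'd want to property-test; the real
/// targets would be the operations constrained by ArithmeticStark.
fn wrapping_add(a: u64, b: u64) -> u64 {
    a.wrapping_add(b)
}

proptest! {
    // proptest generates many random (a, b) pairs and shrinks any
    // failure down to a minimal counterexample.
    #[test]
    fn addition_commutes(a in any::<u64>(), b in any::<u64>()) {
        prop_assert_eq!(wrapping_add(a, b), wrapping_add(b, a));
    }
}
```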

Support Hermez SMT format in protocol decoder.

Description

The zkEVM team, in collaboration with Gateway.fm and @jerry Chen, has been working on using Erigon as a node for the zkEVM stack (CDK Erigon). Included in this is the ability to generate a state witness using the same format currently used in Zero, with some modifications.

Implementation

The proof protocol decoder needs to be modified to support the new trie leaf format introduced for the Sparse Merkle Trie used in CDK Erigon, as well as the block information required for a zkEVM proof (e.g. the LxLy global exit root).

Currently, each zkEVM block contains one transaction, which simplifies the work of the decoder: all it must do is recreate the trie as fetched from Erigon and pass it to the prover; it should not need to do further partial trie construction.

Details
A new SMT leaf type is defined here.
A new operator rule for SMT leaves is added here.

Since the SMT is essentially a binary tree, all branch nodes in the SMT will only have two (rightmost) bits masked, unlike the sixteen bits in the MPT.

The SMT witness will only have four types of nodes, sketched below:

  • Branch node
  • Hash node
  • Code node
  • SMT leaf node
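
A minimal sketch of those four node kinds as a Rust enum (purely illustrative; the field layouts are assumptions, not the actual cdk-erigon witness encoding):

```rust
/// The four node kinds an SMT witness can contain, per the list above.
enum SmtWitnessNode {
    /// Branch node: a 2-bit child mask (vs. 16 bits in the MPT).
    Branch { has_left: bool, has_right: bool },
    /// Hash node: an already-hashed, pruned subtree.
    Hash([u8; 32]),
    /// Code node: contract bytecode referenced from the state.
    Code(Vec<u8>),
    /// SMT leaf node: a key/value pair at the bottom of the tree.
    Leaf { key: [u8; 32], value: [u8; 32] },
}
```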

Reduce number of (re)allocations

There are several allocations / reallocations across the decoder that could probably be alleviated with some refactoring.

To be tackled after #19, probably...

Bring back multi-txn processing in Kernel

As we're currently proving blocks on a per-transaction basis, we simplified the kernel by assuming we would only process a single transaction per prover input payload.

With continuations, this will change: we could even process an entire block payload, which would then be executed in one loop within the kernel (across distinct segments). Hence the need to bring back multi-txn support once continuations are implemented, to speed up proving further.

Blocked by #25.
