libiop's People

Contributors

alexander-zw, antoinerondelet, guipublic, npwardberkeley, pwang00, valardragon

libiop's Issues

Remove instrumentation code boilerplate

Most of the instrumentation code is generic across all SNARKs. For example, the parameters for Fractal

            ("help", "print this help message")
            ("log_n_min", po::value<std::size_t>(&log_n_min)->default_value(8))
            ("log_n_max", po::value<std::size_t>(&log_n_max)->default_value(20))
            ("security_level", po::value<std::size_t>(&security_level)->default_value(128))
            ("field_size", po::value<std::size_t>(&field_size)->default_value(192))
            ("heuristic_ldt_reducer_soundness", po::value<bool>(&heuristic_ldt_reducer_soundness)->default_value(true))
            ("heuristic_fri_soundness", po::value<bool>(&heuristic_fri_soundness)->default_value(true))
            ("make_zk", po::value<bool>(&make_zk)->default_value(false))
            ("hash_enum", po::value<std::size_t>(&hash_enum_val)->default_value((size_t) libiop::blake2b_type))             /* Find a better solution for this in the future */
            ("is_multiplicative", po::value<bool>(&is_multiplicative)->default_value(false))
            ("optimize_localization", po::value<bool>(&optimize_localization)->default_value(false));

are all applicable to every SNARK. (Moreover, the list should really include RS_extra_dimensions.)

We should de-duplicate this, and in a future PR also de-duplicate/improve the command-line UX and the ease of instantiating SNARK parameters.
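
As a rough sketch of the de-duplication (the struct and helper below are hypothetical names, not existing libiop code), the shared flags could be registered once, and each instrument_* binary would then only add its SNARK-specific options on top:

    #include <cstddef>
    #include <boost/program_options.hpp>

    namespace po = boost::program_options;

    /* Hypothetical container for the flags shared by all instrumented SNARKs. */
    struct common_snark_options {
        std::size_t log_n_min;
        std::size_t log_n_max;
        std::size_t security_level;
        std::size_t field_size;
        std::size_t RS_extra_dimensions;
        bool heuristic_ldt_reducer_soundness;
        bool heuristic_fri_soundness;
        bool make_zk;
        std::size_t hash_enum_val;
        bool is_multiplicative;
        bool optimize_localization;
    };

    /* Register the shared flags once; defaults mirror the snippet above
       (the RS_extra_dimensions default here is an assumption). */
    void add_common_snark_options(po::options_description &desc, common_snark_options &opts)
    {
        desc.add_options()
            ("help", "print this help message")
            ("log_n_min", po::value<std::size_t>(&opts.log_n_min)->default_value(8))
            ("log_n_max", po::value<std::size_t>(&opts.log_n_max)->default_value(20))
            ("security_level", po::value<std::size_t>(&opts.security_level)->default_value(128))
            ("field_size", po::value<std::size_t>(&opts.field_size)->default_value(192))
            ("RS_extra_dimensions", po::value<std::size_t>(&opts.RS_extra_dimensions)->default_value(3))
            ("heuristic_ldt_reducer_soundness", po::value<bool>(&opts.heuristic_ldt_reducer_soundness)->default_value(true))
            ("heuristic_fri_soundness", po::value<bool>(&opts.heuristic_fri_soundness)->default_value(true))
            ("make_zk", po::value<bool>(&opts.make_zk)->default_value(false))
            ("hash_enum", po::value<std::size_t>(&opts.hash_enum_val)->default_value(0 /* stands in for (size_t) libiop::blake2b_type */))
            ("is_multiplicative", po::value<bool>(&opts.is_multiplicative)->default_value(false))
            ("optimize_localization", po::value<bool>(&opts.optimize_localization)->default_value(false));
    }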

Refactor FRI verifier to process all queries together

The FRI verifier currently processes each query independently. We should refactor the FRI verifier to process all the queries together, and instead proceed in a round-by-round fashion. Refactoring this amounts to changing the order of the for loops in the existing implementation.

Doing this refactor will allow for batch inversion. Currently, the FRI verifier's inversions take ~50% of the non-recursive Fractal verifier's time, and using batch inversion should reduce the number of inversions by around 30x.
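
For reference, batch inversion (Montgomery's trick) replaces n field inversions with a single inversion plus roughly 3n multiplications. A generic sketch, assuming only a libff-style field type FieldT with one(), inverse(), and the usual operators (this is not existing libiop code):

    #include <cstddef>
    #include <vector>

    /* Montgomery's trick: invert every element of `elements` in place using a
       single field inversion. Assumes the vector is non-empty and that all
       entries are nonzero. */
    template<typename FieldT>
    void batch_invert(std::vector<FieldT> &elements)
    {
        if (elements.empty()) return;

        /* prefix_products[i] = elements[0] * ... * elements[i] */
        std::vector<FieldT> prefix_products;
        prefix_products.reserve(elements.size());
        FieldT running = FieldT::one();
        for (const FieldT &el : elements)
        {
            running *= el;
            prefix_products.push_back(running);
        }

        /* One inversion of the product of all elements ... */
        FieldT inverse_of_suffix = running.inverse();

        /* ... then peel off one element at a time, walking backwards. */
        for (std::size_t i = elements.size() - 1; i > 0; --i)
        {
            const FieldT current = elements[i];
            elements[i] = inverse_of_suffix * prefix_products[i - 1];
            inverse_of_suffix *= current;
        }
        elements[0] = inverse_of_suffix;
    }

With the loops reordered, the verifier can gather every point that needs inverting within a round across all queries, call batch_invert once, and continue round by round.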

"unknown file: Failure" when using Aurora

I have a simple R1CS that works well with Ligero but fails with Aurora. I tried the Aurora SNARK example (generate_r1cs_example, using the alt_bn128 curve) with the same number of inputs, and this example also fails with the same error:
unknown file: Failure
Unknown C++ exception thrown in the test body.

My inputs to generate_r1cs_example are:
number of constraints = 3
number of inputs = 3
number of variables = 5
With the same parameters, Ligero works fine. When I use 4 as the number of constraints, Aurora works as well.

Using FieldT vs FieldT& for constants

Throughout the code, classes often have functions that return constants, e.g. field_subset.generator() and field_subset.shift().

Does having these functions return FieldT or FieldT& make any performance difference? If so, does it vary with field size? If not, should we just default to FieldT?
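
For concreteness, the two shapes being compared look like this (illustrative only, not libiop's field_subset; the reference-returning variant would presumably be const FieldT&):

    /* Illustrative only. A field element is typically a few machine words,
       so the copy in the by-value accessor is cheap, but whether it is ever
       measurable likely depends on the field size and how hot the call site is. */
    template<typename FieldT>
    struct example_domain {
        FieldT shift_;

        /* Return by value: copies the element on every call. */
        FieldT shift_by_value() const { return this->shift_; }

        /* Return by const reference: no copy, but the reference is only valid
           while the domain object is alive. */
        const FieldT &shift_by_const_ref() const { return this->shift_; }
    };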

Support a standard R1CS file format

It would be great to support a standardized R1CS file format. At the second ZKProof workshop, J-R1CS was proposed as a candidate standard. If any such R1CS file format gains traction, it would be great to support it in libiop.

Store preprocessed codewords on disk

Fractal preprocesses the matrices, and the prover must store codewords output by the preprocessing. Currently the library keeps these in memory.

(Recall a codeword is the evaluation of a polynomial over a known domain, so in terms of serialization it is a vector of Field elements)

We should store the prover index's codewords on disk. Additionally, if we store the raw polynomial that each codeword represents alongside the codeword itself, the prover should be able to get away with never loading these codewords fully into memory; it would then only require random access to them.

Then, when constructing the proof, the codeword entries can be read directly from disk (maybe even via mmap if we want lower memory usage; see filecoin-project/rust-fil-proofs#986).

This will be crucial for large deployments of Fractal, due to the memory reduction. (Additionally, these codewords can be stored 'by a helper' for the helped setting)
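
A minimal POSIX sketch of the random-access pattern (purely illustrative: it assumes a codeword serialized as a flat array of trivially-copyable field elements, which is an assumption, not libiop's actual serialization format):

    #include <fcntl.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    #include <cstddef>
    #include <stdexcept>

    /* Memory-map a serialized codeword and read single entries on demand.
       The OS pages entries in as they are touched, so the full codeword is
       never resident unless every entry is queried. Assumes FieldT is
       trivially copyable with a fixed on-disk layout. */
    template<typename FieldT>
    class mmapped_codeword {
    public:
        explicit mmapped_codeword(const char *path)
        {
            this->fd_ = open(path, O_RDONLY);
            if (this->fd_ < 0) throw std::runtime_error("cannot open codeword file");
            struct stat st;
            if (fstat(this->fd_, &st) != 0) throw std::runtime_error("fstat failed");
            this->size_bytes_ = static_cast<std::size_t>(st.st_size);
            this->data_ = mmap(nullptr, this->size_bytes_, PROT_READ, MAP_PRIVATE, this->fd_, 0);
            if (this->data_ == MAP_FAILED) throw std::runtime_error("mmap failed");
        }

        /* Random access to the i-th field element of the codeword. */
        FieldT operator[](const std::size_t i) const
        {
            return static_cast<const FieldT *>(this->data_)[i];
        }

        ~mmapped_codeword()
        {
            munmap(this->data_, this->size_bytes_);
            close(this->fd_);
        }

    private:
        int fd_;
        std::size_t size_bytes_;
        void *data_;
    };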

make stuck at 35%

[ 2%] Built target zm
[ 9%] Built target benchmark
[ 25%] Built target ff
[ 32%] Built target iop
[ 34%] Built target gtest
[ 35%] Built target gtest_main
[ 35%] Building CXX object libiop/CMakeFiles/test_linking.dir/tests/snark/test_linking.cpp.o
c++: internal compiler error: Killed (program cc1plus)
Please submit a full bug report,
with preprocessed source if appropriate.
See file:///usr/share/doc/gcc-5/README.Bugs for instructions.
libiop/CMakeFiles/test_linking.dir/build.make:81: recipe for target 'libiop/CMakeFiles/test_linking.dir/tests/snark/test_linking.cpp.o' failed
make[2]: *** [libiop/CMakeFiles/test_linking.dir/tests/snark/test_linking.cpp.o] Error 4
CMakeFiles/Makefile2:777: recipe for target 'libiop/CMakeFiles/test_linking.dir/all' failed
make[1]: *** [libiop/CMakeFiles/test_linking.dir/all] Error 2
Makefile:159: recipe for target 'all' failed
make: *** [all] Error 2

Error during the installation

Hello,

I'm trying to install the library and encountered this error:

error: #error C++11 or greater detected. Should be C++03.
 #error C++11 or greater detected. Should be C++03.

OS Version: Ubuntu 18.04.3 LTS
GCC Version: 7.4.0

EDIT: When I removed these lines from the CMakeLists.txt, the problem was solved:

set(CMAKE_CXX_STANDARD 14)
set(CMAKE_CXX_STANDARD_REQUIRED ON)
set(CMAKE_CXX_EXTENSIONS OFF)

Unsatisfied proof for very small circuits when num_inputs = 1

Hi,

I tested a simple R1CS with Fractal's libiop and matrices of order 8, and I am facing an unexplained behaviour.

The R1CS is satisfied if I give 0, 3, or 7 primary inputs. In contrast, if I give 1, the proof does not fully verify.

That is, I obtain this message:

  • IOP transcript valid: true
  • Full protocol decision predicate satisfied: false

In other words, this configuration works, for instance:

const size_t num_inputs =0;
const size_t num_variables = 7;
const size_t num_constraints = num_variables + 1;

This one does not:

const size_t num_inputs =1;
const size_t num_variables = 7;
const size_t num_constraints = num_variables + 1;

Can you explain this behaviour? Thanks in advance.

Feasibility of instantiating homomorphic encryption as SNARK circuit

Could someone give some opinions on the feasibility of instantiating, as a SNARK, a circuit that performs homomorphic function evaluation with known (encrypted) inputs?

That's because some of us in the homomorphic encryption world want computational integrity guarantees in the multiparty setting when doing delegated, privacy-preserving computing.

The reason we want homomorphic encryption is to hide the inputs from the server (prover) performing the computations. Because of this, there is no strong requirement for zero knowledge: all inputs could be considered public, since they are encrypted anyway (although in practice it may be better to keep them private so that less information leaks if, for instance, a private key is compromised). However, there may be parts of our schemes, such as encryption and decryption, that could be worth verifying for a multiparty protocol.

We have deterministic circuits (no recursion, and loops can be unrolled deterministically), operations involving modular arithmetic with respect to prime moduli, number-theoretic transforms, and data permutations. It is unclear whether many of the optimisations seen in current implementations (SEAL, Lattigo, Palisade), such as RNS word-size decompositions, would be the right choices for a SNARK circuit.

I assume we would want some efficiency during setup, since we pass different inputs to the same circuit, but each output must be verified independently given its inputs. We would also prefer to avoid any toxic waste. I still have to think about whether proof size is an issue, by comparing it with the average data transmission size per atomic circuit.

Which of the available schemes might be best for our purposes?

Add method to suppress FFT log messages

FFT messages currently clog many test outputs; we should add a way to suppress them (or more of the instrumentation output generally) within tests.

Is this best done via a boolean compiler flag, or by setting some sort of "instrumentation level" everywhere? (The idea being the same as a log level.)
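
One possible shape for the "instrumentation level" idea, as a hypothetical sketch (none of these names exist in libiop today):

    #include <cstddef>
    #include <cstdio>

    /* Hypothetical global instrumentation level, in the spirit of a log level. */
    enum class instrumentation_level { silent = 0, summary = 1, verbose = 2 };

    static instrumentation_level current_instrumentation_level = instrumentation_level::verbose;

    void set_instrumentation_level(const instrumentation_level level)
    {
        current_instrumentation_level = level;
    }

    /* Call sites such as the FFT wrapper would guard their prints: */
    void print_fft_message(const char *msg, const std::size_t log_n)
    {
        if (current_instrumentation_level >= instrumentation_level::verbose)
        {
            std::printf("FFT: %s (size 2^%zu)\n", msg, log_n);
        }
    }

Tests would then call set_instrumentation_level(instrumentation_level::silent) in their setup. A boolean compile flag is simpler, but it cannot be toggled per test.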

Using Aurora/Ligero for Boolean Circuits

Hi!

We're interested in compiling a benchmark comparison between different ZK protocols. We'd like to include Aurora and Ligero in the comparison and were hoping we could use your library to do this.

Based on looking around the library a bit, there doesn't seem to be an obvious way to prove relations encoded as Boolean circuits. For example, we'd like to demonstrate that a prover knows the pre-image of a call to SHA-256, where SHA-256 is encoded as a Boolean circuit (e.g. http://stevengoldfeder.com/projects/circuits/sha2circuit.html).

(More precisely, the prover knows w such that H(w) = h where h is a publicly known value.)

Is there some way to achieve this with your library?

Any help would be appreciated!
David

Fractal - Benchmark, Parallelization and Memory usage

Hi,

I am working on the topic of 'succinct blockchains', with a focus on a concrete prototype implementation. I am aware of the following three approaches that have published both theory and concrete source code on the subject of recursive proofs: i) the Coda protocol, ii) Halo, and iii) Fractal.

The Coda protocol uses a cycle of MNT pairing-friendly elliptic curves of about 800 bits to achieve a 128-bit security level, due to their low embedding degrees. The Halo paper argues that it is possible to amortize away the recursive proof verification and the cost of the inner vector product by leveraging a cycle of two non-pairing-friendly curves. The Fractal paper describes a probabilistic proof approach with a preprocessing SNARK and a holographic IOP.

My goal is to look at the different approaches from a technical perspective. In particular, the topic of parallelization possibilities and memory footprint is crucial.

I've checked the runtime behavior of Fractal and Halo. I am aware that this is an apples-to-oranges comparison; it is only a matter of basic principles.

1) Halo

I used the https://github.com/ebfull/halo/blob/master/examples/bitcoin.rs benchmark. I could not determine how many R1CS constraints the circuit for the Bitcoin example is equivalent to.

Single-threaded:

creating proof 1
done, took 61880.355779868s
verifying proof 1
done, took 3445.637548892s

creating proof 2
done, took 59251.202629717s
verifying proof 2
done, took 4363.412253224s


Multi-threaded (48 cores):

creating proof 1
done, took 1389.555774298s
verifying proof 1
done, took 137.547002293s

creating proof 2
done, took 1408.376340113s
verifying proof 2
done, took 167.663635282s

[screenshot: halo bitcoin example memory usage statistics]

From the above runtime figures, one gets roughly a 30x speedup from parallelization. What is interesting from the technical point of view is the required memory footprint: at most about 4 GiB of RAM is needed during proof generation. Due to the small RAM footprint, efficient parallelization on a GPU could also be possible.

2) Fractal

The current implementation of Fractal is single-threaded. I tested an R1CS circuit with 2^22 constraints:

./instrument_fractal_snark --make_zk 1 --is_multiplicative 1 --field_size=181 --optimize_localization=1 --log_n_min=22 --log_n_max=22

[screenshot: instrument_fractal_snark memory usage statistics]

It turns out that Fractal needs more than 250 GiB of RAM in the above test case. I would like to ask the following questions:

Q1: Can the RAM footprint be (significantly) reduced?
Q2: Is there a possibility for parallelization? See also #6

Thank you

/Jiri

MT: Caphash optimization

When you have many random queries to a single Merkle tree, you can compute most of the nodes near the top on your own. (E.g. with two queries, one on the left half and one on the right half, you don't need to provide any internal nodes in the 'first' layer of the MT proof.)

At the moment we already prune such nodes to save on proof size. However, we can further save on computation by executing only a single higher-arity hash at the top of the tree. So we want to support a 'cap hash' that computes a single hash of 2^k inputs for the top layers of the tree.

Standard Merkle tree:

                       root
                      /    \
                    /        \
                  /            \
                /                \
               *                  *
              / \                / \
             /   \              /   \
            /     \            /     \
           *       *          *       *
          / \     / \        / \     / \
         0   1   2   3      4   5   6   7

A cap_hash_size = 2^2 tree:

                       root
                    /  /  \  \
                 /    /    \    \
              /      /      \      \
           *        *        *        *
          / \      / \      / \      / \
         0   1    2   3    4   5    6   7

This should be done by adding a new type called "cap hash" to hashing.hpp, which has a single template parameter for the input & output type (hash_digest_type); it takes in a vector of hash_digest_type and outputs a single hash_digest_type. Then we should update merkle_tree.hpp to use it, and update the proof format accordingly. The cap hash size should be a parameter input to the merkle tree, and should also be plumbed through BCS.
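
A sketch of what the new type could look like, with the signature inferred from the description above (the real hashing.hpp interfaces may differ, and capped_root below is only schematic):

    #include <cstddef>
    #include <functional>
    #include <utility>
    #include <vector>

    /* A cap hash compresses the top 2^k subtree roots into the root in a single
       higher-arity invocation, instead of k layers of 2-to-1 hashes. */
    template<typename hash_digest_type>
    using cap_hash_function = std::function<hash_digest_type(
        const std::vector<hash_digest_type> &children)>;

    /* Schematic use inside the Merkle tree: hash 2-to-1 as usual until only
       cap_size nodes remain, then apply the cap hash once. Assumes the input
       layer size is a power of two that is at least cap_size. */
    template<typename hash_digest_type>
    hash_digest_type capped_root(
        std::vector<hash_digest_type> layer,
        const std::function<hash_digest_type(const hash_digest_type &,
                                             const hash_digest_type &)> &two_to_one_hash,
        const cap_hash_function<hash_digest_type> &cap_hash,
        const std::size_t cap_size /* = 2^k; cap_size = 1 applies the cap hash to just the root */)
    {
        while (layer.size() > cap_size)
        {
            std::vector<hash_digest_type> next_layer;
            next_layer.reserve(layer.size() / 2);
            for (std::size_t i = 0; i < layer.size(); i += 2)
            {
                next_layer.push_back(two_to_one_hash(layer[i], layer[i + 1]));
            }
            layer = std::move(next_layer);
        }
        return cap_hash(layer);
    }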

For now, I suggest we first add this to the merkle tree standalone and test it, with the BCS transform defaulting to a cap hash size of 1. Once that works, we add the cap hash parameter to the BCS parameters.

This notably helps save on circuit costs when we go to recurse a SNARK (e.g. Fractal)

Wrong calculation of `actual_sum` in univariate sumcheck

#ifdef DEBUG
    polynomial<FieldT> g = h_and_g.second;
    FieldT actual_sum;
    if (this->field_subset_type_ == affine_subspace_type) {
        /* The second component is actually g + sum * x^{H_size-1}, but
           we immediately remove its highest degree term */
        actual_sum = g[this->summation_domain_size_-1];
        g.set_degree(this->summation_domain_size_-2, true);
    } else if (this->field_subset_type_ == multiplicative_coset_type) {
        /* The second component is actually x*g + sum, but
           we immediately remove its lowest degree term */
        actual_sum = g[0];
        g.remove_term(0);
    }
#endif // DEBUG

Here, actual_sum should be g[0] * this->summation_domain_size_ instead of g[0].
Please see https://github.com/arkworks-rs/bcs/blob/2ddd1db3838cbf393a8d8d7eafe6a004aa584c61/sumcheck/src/util.rs#L16-L27 for a possible fix

BCS: SNARK transcripts - print oracle names by round

Semi-recently, we added "names" to every oracle that is registered. We should print these out in our "print detailed transcript data" method, to further aid debugging.

The following steps are needed to do this:

  1. Add a method get_oracle_handles_by_round to the IOP interface. This method will have to read oracle_registrations_ and num_oracles_at_end_of_round in order to do this. (It should use the oracle registrations to get the handles, and figure out which round each handle is in using num_oracles_at_end_of_round; a sketch of this grouping logic appears after this list.)

  2. Make print_detailed_transcript_data take in an IOP, and have it call get_oracle_handles_by_round. Use that, and the associated method on IOP for getting an oracle name from the handle, in order to print out all the oracle names in each round.

  3. Update the instrumentation code in profiling, to make the calls to print_detailed_transcript_data take in the IOP. (The IOP is created there anyways).
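
To illustrate step 1, here is the grouping logic in isolation, with plain indices standing in for oracle handles (a sketch only; it assumes num_oracles_at_end_of_round is a running total per round, as its name suggests):

    #include <cstddef>
    #include <vector>

    /* Bucket oracle ids 0..N-1 by round, given the running total of oracles
       registered by the end of each round. */
    std::vector<std::vector<std::size_t>> group_oracles_by_round(
        const std::vector<std::size_t> &num_oracles_at_end_of_round)
    {
        std::vector<std::vector<std::size_t>> handles_by_round(num_oracles_at_end_of_round.size());
        std::size_t oracle_id = 0;
        for (std::size_t round = 0; round < num_oracles_at_end_of_round.size(); ++round)
        {
            /* Every id below this round's running total that has not yet been
               assigned belongs to this round. */
            while (oracle_id < num_oracles_at_end_of_round[round])
            {
                handles_by_round[round].push_back(oracle_id);
                ++oracle_id;
            }
        }
        return handles_by_round;
    }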

Parallelization

Hi,

I have a quick question. I noticed that in the Fractal paper the benchmarks are done single-threaded. When I was looking at a benchmark on my machine, I saw that the multiplicative_FFT_wrapper often took the most time, so I looked at the code. It seems like it's taken from libfqfft; is there any reason that libfqfft wasn't used directly, to take advantage of its OpenMP support? Would it be (somewhat) easy to drop in a call to libfqfft?

Thanks!

PS: thanks for sharing all this. It's really awesome stuff.

Change test_bcs_transformation to use default parameters

The test_bcs_transformation test currently uses dummy BCS parameters. We should instead just use the actual BCS parameters for hashchain, merkle tree hash, proof of work, etc. The dummy algebraic hashes should be removed entirely and replaced with the real Poseidon hashes.

Move/deduplicate util functions to libff

Some functions in libiop are duplicated in libff. They should be removed from libiop and all usages should refer to the corresponding functions in libff. This includes the functions in profiling.hpp and the UNUSED macro.

Some other functions are present only in libiop and not libff, but make more sense in libff. This includes the functions in field_utils.cpp as well as power and enable_if. These should be moved accordingly.

Note: Some of this duplication may also be present in libfqfft but that is outside of the scope of this issue.

Lagrange basis optimization to Aurora's Lincheck

In Aurora, part of the verifier's work is evaluating p_alpha^(1) at the query point. Due to the current choice of p_alpha^(1) (its evaluations are powers of alpha), the verifier has to do an IFFT to get p_alpha^(1) as a polynomial and then run a linear-time evaluation procedure; alternatively, it can use Lagrange interpolation to evaluate it at the point. Both of these procedures are fairly slow.

Within succinct Aurora, part of how succinct lincheck achieves succinctness is by having the verifier sample the random polynomial from the Lagrange basis, because such a polynomial can be sampled and evaluated efficiently in polylogarithmically many operations. We should integrate this into Aurora for p_alpha^(1), to save the verifier O(|H|) operations per query for interpolating p_alpha^(1).

** OUTDATED benchmarks (may or may not still hold):
This is currently implemented on the branch lagrange_basis_aurora_speedup. It achieves a 20% verifier time speedup in the multiplicative case, a 30% verifier time speedup in the additive case, and a noticeable improvement in prover time. The verifier should speed up even more after we add a method to check whether subsets are equal.

However, this is only implemented in the case where |constraint domain| >= |variable domain|. When |variable domain| > |constraint domain|, we must have p_alpha^(1) evaluate to 0 for x \in (variable domain \ constraint domain). I believe the degrees work out correctly if we instead set p_alpha^(1) to be this Lagrange-sampled polynomial multiplied by Z_{variable domain} * Z_{constraint domain}^{-1}. We should add this extra computation for the case where |variable domain| > |constraint domain|.

We also need to add some documentation around this, as it's not specified in the Aurora paper.

Consolidate field_subset.offset and .shift

There are two types of field domains that we currently support: affine subspaces and multiplicative cosets.

Currently, .offset is only used in the additive (subspace) case, and .shift is only used in the multiplicative (coset) case. It would be best to consolidate into a single method, for code consolidation/clarity.

I suggest using .shift or affine_shift.

Stark Implementation?

The Eurocrypt paper mentions the presence of a STARK implementation for libiop. Is this still planned content for the project?

Upgrade to CMake 3.X series

We should update to the latest major version of CMake. We are currently on cmake 2.8, which is several years old now.

In CMake 3.1 (once it is set to use C++14), we currently run into problems with linking libsodium. We likely need to introduce some logic to locate the sodium header file in order to link against it.

BCS - Refactor hashchain logic, separate transcript into (BCS proof, BCS transcript)

Currently, BCS transcripts are used to mean both proofs and transcripts, in a way that is very confusing.

We should de-conflate these, and have the proof system output just a proof. The verifier then runs a method run_hashchain(proof) and gets a transcript.

Additionally, the logic for dealing with hashchains is quite scattered at the moment. We should package it together, to maximize code re-use and readability.

The verifier should just run the single run_hashchain method to get the transcript. The prover should have methods hashchain_absorb_round(mutable hashchain state, prover messages, MT roots), hashchain_squeeze_outputs(mutable hashchain state, num outputs), and hashchain_squeeze_query_positions(mutable hashchain state, num_positions, domain_size_in_bits).

run_hashchain then calls these as a sub-routine.

Organizationally, the sub-routine methods should all go in bcs_common, while the run_hashchain method should go in the verifier. At the moment, the equivalent of run_hashchain in the verifier is in the seal_interactive_registrations method.
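
Sketching the proposed interface (the types below are placeholders, only the method names come from this issue, and the real signatures will differ):

    #include <cstddef>
    #include <vector>

    /* Placeholder types, for illustration only. */
    template<typename FieldT> struct bcs_proof;        /* what the prover outputs */
    template<typename FieldT> struct bcs_transcript;   /* result of replaying the hashchain */
    template<typename FieldT> struct hashchain_state;  /* the mutable hashchain/sponge state */

    template<typename FieldT, typename MT_root_type>
    struct hashchain_interface {
        /* Prover-side primitives, packaged in bcs_common and reused by run_hashchain: */
        void absorb_round(hashchain_state<FieldT> &state,
                          const std::vector<FieldT> &prover_messages,
                          const std::vector<MT_root_type> &MT_roots);
        std::vector<FieldT> squeeze_outputs(hashchain_state<FieldT> &state,
                                            const std::size_t num_outputs);
        std::vector<std::size_t> squeeze_query_positions(hashchain_state<FieldT> &state,
                                                         const std::size_t num_positions,
                                                         const std::size_t domain_size_in_bits);
    };

    /* Verifier-side: replay the hashchain over the proof to recover the transcript. */
    template<typename FieldT, typename MT_root_type>
    bcs_transcript<FieldT> run_hashchain(const bcs_proof<FieldT> &proof);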

Streaming oracles for better memory usage

Memory usage is a bottleneck that we hit for large deployments.

The peak memory usage at the moment is in the LDT reducer. The LDT reducer takes a random linear combination of all constraints from the protocol over the large codeword domain (L in the papers). The current API evaluates all constraints in full and then takes a random linear combination of them all. Instead, we should stream these oracles into the LDT reducer one by one; the LDT reducer would maintain some 'current state' and keep updating it with every streamed oracle.

This should significantly lower the memory requirement, and if we eventually use a 'codeword pool' it would likely improve prover time.
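
The streaming version could maintain a running random linear combination, roughly as below (a sketch of the idea only, not the ldt_reducer API; FieldT is assumed to be a libff-style field type):

    #include <cstddef>
    #include <vector>

    /* Running random linear combination over the codeword domain L.
       Oracles are folded in one at a time, so only one full-length codeword
       (the accumulator) is ever resident. */
    template<typename FieldT>
    class streaming_combiner {
    public:
        explicit streaming_combiner(const std::size_t codeword_domain_size)
            : state_(codeword_domain_size, FieldT::zero()) {}

        /* Fold in the next oracle with its random coefficient r_k.
           The oracle itself could also be produced/streamed chunk by chunk. */
        void absorb(const std::vector<FieldT> &oracle, const FieldT &r_k)
        {
            for (std::size_t i = 0; i < this->state_.size(); ++i)
            {
                this->state_[i] += r_k * oracle[i];
            }
        }

        const std::vector<FieldT> &combined_codeword() const { return this->state_; }

    private:
        std::vector<FieldT> state_;
    };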

MT: Support higher arity MT's

Currently we are restricted to binary Merkle trees (two children at each node).

We should generalize to allow for non-binary Merkle trees. There's good evidence that this is advantageous in terms of reducing the constraint complexity of Merkle trees built from SNARK-friendly hashes.
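
Schematically, one layer of an arity-k tree would be computed as below (compress_k is a placeholder for a SNARK-friendly hash over k digests, not an existing libiop function):

    #include <cstddef>
    #include <functional>
    #include <vector>

    /* Compute the parent layer of an arity-`arity` Merkle tree.
       Assumes the child layer's size is a multiple of `arity`. */
    template<typename hash_digest_type>
    std::vector<hash_digest_type> next_tree_layer(
        const std::vector<hash_digest_type> &children,
        const std::size_t arity,
        const std::function<hash_digest_type(const std::vector<hash_digest_type> &)> &compress_k)
    {
        std::vector<hash_digest_type> parents;
        parents.reserve(children.size() / arity);
        for (std::size_t i = 0; i < children.size(); i += arity)
        {
            const std::vector<hash_digest_type> group(
                children.begin() + i, children.begin() + i + arity);
            parents.push_back(compress_k(group));
        }
        return parents;
    }

An authentication path then carries arity - 1 sibling digests per level, but the tree only has log_arity(n) levels, which is where the constraint savings with SNARK-friendly hashes come from.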

Separate the hash_enum BCS parameter into hashchain enum and merkle tree enum

The bcs_transformation_parameters object has an attribute named hash_enum. However, this is not quite right, since the BCS transformation is allowed to use different hash types for its hashchain and its Merkle tree. This attribute should be split into two, perhaps called hashchain_hash_enum and merkle_hash_enum.

The hash_enum parameter is used to determine the default hash type, as well as for debugging purposes.

Fractal verifier constraint system

One more question: This might be naive (as in I may have just missed it), but is the constraint system for the Fractal verifier that was used in the paper available here or somewhere? If not, are there plans to make it available? Recursive proof composition is a super cool feature.

Thanks!

Add methods to print out the names of the oracles in each round

In the latest update, we added a name to every oracle upon registration. We should add methods to retrieve the name of an oracle from its handle in iop/iop.hpp.

Further, we should print out the names of the oracles per round when we print detailed transcript info in the BCS transform.
