input-output-hk / mithril

Stake-based threshold multi-signatures protocol

Home Page: https://mithril.network
License: Apache License 2.0
Tasks to do:
- `openapi.yaml` file at the root of the repository

We are planning to reward signers that successfully participate in the process, right (in order to incentivise them)? If that's the case, what should happen in the following scenario: suppose we have a threshold `k`, and there exist `l > k` valid signatures submitted by the participants. Should we reward all `l` signers, or only the "first" `k`?
It is of interest to understand the chances of producing valid certificates given a set of parameters. To this end, it would be useful to produce a matrix that shows the chances of succeeding in certificate generation for different values of `m`, `k`, `phi_f` and the number of participants (and their stake).
We need to create a local devnet for testing purposes of the Mithril Network components:
Modify the Cardano Node such that it can produce deterministic snapshots.
This subject was previously addressed with a POC in #84
The aggregation function first selects one signature per index, and only then verifies the selected signatures. This means signatures are currently dismissed without verification, so we might select an invalid signature and dismiss a valid one for the same index. Then the selected signatures are verified, and if a single one fails, the aggregation fails. This is unnecessary: invalid signatures can be submitted without causing the aggregation to fail, since we only need `k` valid signatures.

As it stands, this would allow an adversary to invalidate aggregation.
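The fix described above can be sketched as follows (a minimal stand-in model, not the mithril-core types: `Sig`, `valid` and `select_valid` are illustrative, with `verify()` standing in for real cryptographic verification): verify each signature before per-index deduplication, so an invalid signature can never shadow a valid one.

```rust
use std::collections::BTreeMap;

type Index = u64;

struct Sig {
    index: Index,
    valid: bool, // stand-in for the result of real cryptographic verification
}

impl Sig {
    fn verify(&self) -> bool {
        self.valid
    }
}

/// Select one *verified* signature per index, instead of selecting first and
/// verifying afterwards. Returns `Some` with k signatures if at least k
/// distinct indices carry a valid signature.
fn select_valid(sigs: &[Sig], k: usize) -> Option<Vec<&Sig>> {
    let mut per_index: BTreeMap<Index, &Sig> = BTreeMap::new();
    for sig in sigs {
        // Verify *before* dedup, so an invalid signature submitted for an
        // index never causes a valid one for the same index to be dropped.
        if !per_index.contains_key(&sig.index) && sig.verify() {
            per_index.insert(sig.index, sig);
        }
    }
    if per_index.len() >= k {
        Some(per_index.into_values().take(k).collect())
    } else {
        None
    }
}
```

With this ordering, extra invalid submissions are simply ignored and aggregation only fails when fewer than `k` indices have any valid signature.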
We need to implement a lightweight Mithril Client: (At this stage no certificate verification)
`Msp::aggregate_sigs` has an unused `msg` input (code), because the paper includes it. We think `Msp` is not intended to be a general API, just a dependency for Mithril, so it makes sense not to have `msg` as an argument. What do you think?
It looks like a test from the cryptographic library mithril-core is flaky: the `stm::tests::test_invalid_proof_path` test in `cargo test` (associated trace).

Implement hash-to-curve as given in this spec.
The Mithril Aggregator should be accessible on a public address:
An aggregate signature requires individual signatures over `k` different indices. Should we make `k` a generic parameter? We seemed to remember (but couldn't find) that we wanted to support multiple settings of `k`. What is the best way to achieve that?
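One option, sketched below under stated assumptions (the type name and field are hypothetical, not the mithril-core API), is a const generic parameter: each setting of `k` becomes a distinct type, and the number of individual signatures is fixed at compile time.

```rust
/// Illustrative sketch: an aggregate signature over exactly K indices,
/// with K as a const generic parameter.
struct AggregateSignature<const K: usize> {
    // One individual signature per index; u64 is a stand-in for a real
    // individual signature type.
    sigs: [u64; K],
}

impl<const K: usize> AggregateSignature<K> {
    fn new(sigs: [u64; K]) -> Self {
        Self { sigs }
    }

    fn k(&self) -> usize {
        K
    }
}
```

The trade-off is that const generics monomorphise per setting of `k` and fix it at compile time, whereas keeping `k` as a runtime field in the parameter struct allows it to vary without recompilation.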
Regarding `StmInitializer`, @pyrros observes:

"We might want to separate key generation vs registration. Keeping them separate might be simpler for reusing keys across epochs, keeping them together might simplify registration error handling."

I think this sounds like a good idea. It seems to me that perhaps what we want is an `StmInitializer` that can produce multiple `StmSigner`s (currently the API has a more linear flavor, consuming the `StmInitializer` via the `finish()` method). Based on the above though, I think the API ought to:
- create an `StmInitializer`
- register the `StmInitializer` with the registration service
- produce an `StmSigner` (currently this is split into a `build_avk()` step and a `finish()` step), but without consuming the `StmInitializer`, so that it can easily be used to re-register.

Tasks to do:
It should be possible to abstract the pointer sizes from the caller. Otherwise, we should check that the given size of the pointer is as expected by the library.
The UC security modelling in https://eprint.iacr.org/2021/916.pdf makes use of a PartyId.
Clarify how registration happens and the role of PartyId.
`KeyReg`, while the latter could be a hash of the public key.

Ideally, we would test the Rust library and the C API with a single, all-powerful `cargo test`. Super ideally, we could document how to use the C API with the single, all-powerful `cargo doc`. There seem to be ways to achieve this. Otherwise, a Makefile would be just fine (but not as cool).
Currently, the Merkle tree used for the AVK is balanced which implies that signatures produced by any two parties will have the same length. Since parties with high stake are expected to produce signatures more often, it will be beneficial to place them in short branches. Vice versa, parties with low stake may be placed deeper with little cost (if amortized over the frequency of signing).
A variant of Huffman coding may be the optimal choice for this, but allowing the path length to vary completely might be too complex for circuit-based systems. As an in-between solution, consider a root with two subtrees: a “short” tree with the 2^Y largest stakeholders, and a “long” tree with everybody else.
On June 08, Cardano had 2800 stake pools. This rounds up to a tree with 2^12 leaves and paths of length 12.
From these pools, the Top 512, held 92% of the stake. A subtree for these users would have paths of length 9 (+1 due to the subtree). The other users would lie in a tree with paths of length 12 (+1 due to the subtree). This implies that ~92% of signatures would have length 2 shorter than before, and ~8% would be one longer. On average, a signature would be shorter by 1.76 steps (~14%).
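The arithmetic above can be reproduced with a small sketch (function names are illustrative): the short subtree holds the top stakeholders, the long subtree everyone else, each path pays one extra step through the root, and the average is weighted by stake because signing frequency is roughly proportional to stake.

```rust
/// Number of tree levels needed for n leaves (n >= 2).
fn ceil_log2(n: u64) -> u32 {
    (n as f64).log2().ceil() as u32
}

/// Stake-weighted average authentication-path length for the two-subtree
/// layout: `n_top` parties holding `stake_top` of the stake in the short
/// subtree, `n_rest` parties in the long one; +1 step through the root.
fn avg_path_len(n_top: u64, stake_top: f64, n_rest: u64) -> f64 {
    let short = ceil_log2(n_top) + 1;
    let long = ceil_log2(n_rest) + 1;
    stake_top * short as f64 + (1.0 - stake_top) * long as f64
}
```

Plugging in the June figures (512 top pools with 92% of stake, 2288 remaining pools) gives a stake-weighted average of 10.24 steps versus 12 for the balanced tree, i.e. the 1.76-step (~14%) saving quoted above.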
Yes. Verification must be able to handle paths of varying lengths (might be implicitly supported in the current version). Aggregate key generation will need to be changed to produce unbalanced trees.
Changes in node logic:
None.
Limited. Circuit-based systems do not work well with conditional branches, as we often need to represent both branches in the circuit and thus pay the cost of both options (or the more costly of the two if we can fold them together). This implies that a single-circuit approach will obtain no benefit from the above.
For the basic short/long option, we might prepare a number of preset circuits, each having a different mix of short and long branches, and use the smallest circuit that is appropriate at each point. This will require some amount of effort and limit the benefits to a degree.
None. (Also see Scaling Potential).
Incentive structures in place may keep the size of the stakeholder pool close to its current value. This implies that the savings percentage will likely not increase further. On the other hand, the above proposal can maintain performance against malicious/capricious users who wish to operate a significant number of stake pools with insignificant stake.
There is an interest from other projects within IOG to use capabilities provided by Mithril library (eg. ATMS for EVM-as-a-Sidechain project by @dzajkowski). We need to:
It should be possible to generate a clerk without the existence of a signer. Therefore, we need to present that explicitly in the examples and C tests.
The MerkleTree struct contains all nodes, and not only the root. This is not an efficient way to describe the aggregate key `avk`.
To make benchmarks and tests over different curves, we could make the benchmarks and test functions generic.
Having a CLI available to do various operations in Mithril would be helpful for documenting, explaining and experimenting. Every language and environment can easily spawn processes or run shell commands, so this would provide a crude but simple integration point.
Run benchmarks for sizes with several thousands of parties, and include the results in the README file.
We need to create and store snapshots archive to a publicly accessible address:
`ulong` on Linux, `ulonglong` on Mac OS: how `mithril.h` is interpreted depends on the underlying arch.

Currently, mithril operates with a single set of f/m/k parameters. The parameter f determines the probability of success, and the protocol then requires k successes over m indices. For security and liveness, k and m are chosen so that an adversarial stake will have negligible probability of success while an honest one will have a significant one. Importantly, this choice assumes that (1) the adversarial stake will refuse to cooperate with honest users and that (2) the adversarial stake is the maximum allowed by the definition.
If we relax the two assumptions above, we can be much more aggressive in our choice of parameters, with no security impact against adversarial forgeries. There is, however, considerable impact on the possibility of denial of service / loss of liveness. This can be overcome:
We can select two* pairs (k_a,m_a) and (k_c,m_c) with the first one being aggressively parametrized, and the second one conservatively. Signers operate as normal. Aggregators attempt to create an aggressively parametrized certificate before a conservative one, and verifiers prefer aggressive certificates to normal ones.
This solution realizes the space savings of the aggressive parametrization if the adversary is weaker than allowed, but retains liveness versus a maximal adversarial stake. Against forgeries, the adversary gains a small benefit, but the overall gain is on the order of 1 bit of security, i.e. from one chance at 2^{-100} to (less than) two chances at 2^{-100}.
One current (k/m) pair is 357 out of 2643. Shorter certificates can be set to 228/1400 (36% smaller), with a fallback to the initial values if we are unable to locate 228 sigs in the first 1400 indices.
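The aggregator-side logic is simple to sketch (types and functions here are illustrative stand-ins, not the mithril-core API): try the aggressive parameter pair first, and fall back to the conservative pair only if too few valid signatures land in the first `m_a` indices.

```rust
#[derive(Clone, Copy)]
struct Params {
    k: u64,
    m: u64,
}

/// Stand-in for aggregation: `won` lists the indices for which a valid
/// signature exists. Succeeds if at least `k` of them fall in the first
/// `m` indices, returning the parameters the certificate was built with.
fn try_aggregate(params: Params, won: &[u64]) -> Option<Params> {
    let hits = won.iter().filter(|&&i| i < params.m).count() as u64;
    (hits >= params.k).then(|| params)
}

/// Aggressive-first aggregation with a conservative fallback, as in the
/// two-pair (k_a, m_a) / (k_c, m_c) proposal above.
fn aggregate(aggressive: Params, conservative: Params, won: &[u64]) -> Option<Params> {
    try_aggregate(aggressive, won).or_else(|| try_aggregate(conservative, won))
}
```

Verifiers would apply the same preference order when checking which parameter pair a certificate claims.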
Low to none (assuming k, m are not hardcoded or embedded in sigs/certs; also see #9).
Yes, limited. Nodes need to be aware of both values and use them in the correct order.
Yes, the benefits will be similar. Need a “short” version of the circuit to handle short proofs, but no structural changes are needed.
Negligible. We can quantify the advantage of the adversary to less than 2x of the original. We can either accept that, or re-parametrize for 2^{-101} base advantage which will still be bounded by 2^{-101} after doubling.
A Clerk or a Signer should only be initialised once the registration is closed. This can be enforced at the type level. For instance, rather than creating an `StmSigner` out of an iterator of `RegParty`s, we should create it out of a `ClosedKeyReg` instance. Moreover, this creation should consume the `StmInitializer`, so that, in case the stake or participating parties change, we enforce the creation of a new `StmInitializer`.

This opens a question. Do we want to allow a transition from an `StmSigner` back to an `StmInitializer`? Or are we OK with just initializing an `StmInitializer` from scratch every time the stake/participants change?

I think the former makes sense, so that we can use the `StmSigner` to "keep" the consistent data (such as the `party_id`, `sk`, `pk`). What are your thoughts?
Note: this is a breaking change for the C-API.
cc: @abailly-iohk @algurin
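The proposed lifecycle can be sketched at the type level as follows. This is a minimal illustration only: the type names follow the discussion, but the fields (`u64` stand-ins for keys and stake) and method names `new_signer`/`new_epoch` are assumptions, not the real mithril-core API.

```rust
/// Registration frozen; in the real library this would be able to
/// produce the avk.
struct ClosedKeyReg;

struct StmInitializer {
    party_id: u64,
    stake: u64,
    sk: u64, // stand-in for the secret key material
}

struct StmSigner {
    party_id: u64,
    stake: u64,
    sk: u64,
}

impl StmInitializer {
    /// Consumes the initializer and requires a closed registration:
    /// stake and participants are fixed for this epoch by construction.
    fn new_signer(self, _closed_reg: ClosedKeyReg) -> StmSigner {
        StmSigner { party_id: self.party_id, stake: self.stake, sk: self.sk }
    }
}

impl StmSigner {
    /// Transition back when stake/participants change, keeping the
    /// consistent data (party_id, keys) and taking the new stake.
    fn new_epoch(self, new_stake: u64) -> StmInitializer {
        StmInitializer { party_id: self.party_id, stake: new_stake, sk: self.sk }
    }
}
```

Because both transitions consume their receiver, the compiler rules out signing with a stale registration or re-registering without going back through an `StmInitializer`.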
The tool db-analyser should provide a good way to create deterministic snapshots of the ledger state:
It is of interest to include benchmarks which show the behaviour of STMs and MSPs with different numbers of participants and parameters. Similarly, it is of interest to include benchmarks on how the size of the proof grows as the number of participants grows.
We need to enhance the documentation:
`rand_core` and `rand_chacha` are dependencies of `rand`. And, as far as I can see, we only need those two, so it is probably best to explicitly import `rand_core` and `rand_chacha` instead of `rand`.
We used SHA because the Blake2b library we use does not support the output lengths we need (error at runtime for lengths of 300+). We could maybe work around it by just using length extension, like `H(x,1) || H(x,2) || H(x,3)`. Do you think this is kosher, @iquerejeta?
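The workaround can be sketched as below. For a self-contained example, std's `DefaultHasher` (8-byte output) stands in for Blake2b, which is an assumption: a real implementation would call the `blake2` crate, and a principled alternative would be an actual XOF such as Blake2x or SHAKE rather than ad-hoc counter concatenation.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

/// Stand-in for a fixed-output hash H(x, counter): 8 bytes from std's
/// DefaultHasher. Illustrates the construction only; not cryptographic.
fn h(x: &[u8], counter: u64) -> [u8; 8] {
    let mut hasher = DefaultHasher::new();
    hasher.write(x);
    hasher.write_u64(counter);
    hasher.finish().to_le_bytes()
}

/// Extend a fixed-output hash to `out_len` bytes by concatenating
/// counter-separated invocations H(x,1) || H(x,2) || ... and truncating.
fn h_extended(x: &[u8], out_len: usize) -> Vec<u8> {
    let mut out = Vec::with_capacity(out_len + 8);
    let mut counter = 1u64;
    while out.len() < out_len {
        out.extend_from_slice(&h(x, counter));
        counter += 1;
    }
    out.truncate(out_len);
    out
}
```

Note the domain-separating counter inside each invocation is what prevents identical blocks; truncation handles output lengths that are not a multiple of the base digest size.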
We want to assert and verify (roughly) the following (Safety/liveness) property:
We could use https://github.com/input-output-hk/cardano-node/tree/master/cardano-testnet, possibly pulled through nix-shell?
When I try to `cargo build` inside the `rust` directory, I get the following error:
error: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.32' not found (required by /home/curry/mithril/rust/target/debug/deps/libzeroize_derive-f42e2238bc0119fb.so)
--> /home/curry/.cargo/registry/src/github.com-1ecc6299db9ec823/zeroize-1.4.1/src/lib.rs:220:9
|
220 | pub use zeroize_derive::Zeroize;
| ^^^^^^^^^^^^^^
error: could not compile `zeroize` due to previous error
warning: build failed, waiting for other jobs to finish...
error: build failed
I have tried googling and found issues related to static linking suggesting musl should be used instead, but even removing the `staticlib` target does not fix it.
Currently the stake related to an `StmInitializer` or `StmSigner` cannot be updated using the C API. Expose such functions.
This issue tracks our progress on the following tasks:
- `From<Statement>` in `mithril_proof`
- `Rc` is used in `mithril_proof::Statement`
- `Witness::verify` to use a `Result` type, so we can know more about what failed than `false`
- Apache License 2.0 (same as Cardano Node)
- `latest` tag on Docker images
- `#![warn(missing_docs)]` (in `lib.rs` and `main.rs` files)
- Required Approval (see)

On one hand, the output of `Msp::ev` should be an element of Zp with a real proof system. On the other hand, with trivial concatenation, it is an output of Blake2b. Finally, it must be compared against the output of `phi`, which is a real in the range [0,1]. The paper does not define how to do this conversion. We are currently just dividing the 64-bit output from the hash by 2^64 and comparing that with the output of `phi`. What should we do in general here?
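The current approach amounts to the following sketch (function name is illustrative): interpret the first 64 bits of the hash output as an integer, divide by 2^64 to map into [0,1), and compare with `phi`.

```rust
/// Map the first 8 bytes of the ev output to a real in [0,1) and compare
/// against phi. Sketch of the current behaviour, not the mithril-core API.
fn ev_lottery_won(ev_first_8_bytes: [u8; 8], phi: f64) -> bool {
    let q = u64::from_be_bytes(ev_first_8_bytes) as f64 / 2f64.powi(64);
    q < phi
}
```

One caveat with this conversion: the division rounds at f64 granularity (~2^-53), so an exact-integer comparison such as `ev < floor(phi * 2^64)` over the full hash output would avoid floating-point rounding in the lottery decision.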
As the actual value of Mithril lies in the capability it gives to cardano-node users to boot their node faster, in minutes instead of hours, we want to understand how to produce reliable and reproducible snapshots from a node's DB.
Tasks to do:
- the whole `immutable` folder of the database (required)
- the `protocolMagicId` file (required)
- the latest ledger state snapshot file in the `ledger` folder (optional)

Tests successfully ran on the first 20 chunks of the `immutable` folders (macOS, Ubuntu, Ubuntu on Docker, Windows on 3 separate computers) 🟢
The best option is Immutable + Ledger State, but this implies modifying the Cardano Node.
Mainnet
Data | Node | Full | Archive | Snapshot | Upload | Download | Restore | Startup |
---|---|---|---|---|---|---|---|---|
Immutable Only | standard | 43GB | 24GB | ~28m | ~45m | ~25m | ~12m | ~420m |
With Ledger State | modified | 45GB | 25GB | ~28m | ~45m | ~25m | ~12m | ~65m |
Testnet
Data | Node | Full | Archive | Snapshot | Upload | Download | Restore | Startup |
---|---|---|---|---|---|---|---|---|
Immutable Only | standard | 9.5 GB | 3.5 GB | ~7m | ~5m | ~3m | ~2m | ~130m |
With Ledger State | modified | 10 GB | 3.5 GB | ~7m | ~5m | ~3m | ~2m | ~6m |
Host: x86 / +2 cores / +8GB RAM / +100GB HDD
Network: Download 150Mbps / Upload 75Mbps
Compression: gzip
Cardano Node: not running during snapshotting
Commit: abailly-iohk/ouroboros-network@ae552cc
This task will be finalized in a separate issue: #100
Question: Do we need to stop the node when the snapshot is done?
Answer: We can keep it running
Commit: 8fcbce9
Removing the requirement of pairing-friendly curves might facilitate its usage.
The API now takes as input to `proof` and `verify` the `ProvingKey` and `VerificationKey`, which are part of the `env`. Hence, with the latter, it is sufficient to pass these values.
Currently, mithril certificates involve verifying a few hundred paths over a Merkle tree. For the parts of the paths that are close to the root, there will be large amounts of overlap. For example, there are exactly two children under the root. We thus expect ~half of the paths to use one child and the rest to use the other. This implies that both children appear multiple times in the certificate.
A simple approach to mitigate this is to select the X-th layer of the tree and expose it. This will require 2^X hashes in space. However, it will also allow us to truncate paths at the X+1 layer, rather than the root, saving k * X hashes. For X ~ log(k), this is a clear gain.
For k~256, we set X=8. We use 256 hashes to represent the exposed layer, but save 8 steps from each path. Effectively, each path is shortened by 7 steps.
Trees of 2^[10/20/30] leaves have paths of length 10/20/30. Additionally, leaf preimages, ev values, signatures and path data add the equivalent of ~5 steps for each party. Thus, the approximate savings are 7/15, 7/25 and 7/35, i.e. 46%, 28% and 20%, respectively.
For the current pools size of ca 2^12, this gives a 7/17 i.e. 42% size reduction.
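The size accounting above can be reproduced with a small sketch (function names and the hash-unit model are illustrative): each uncompressed path costs `depth` hashes plus ~5 units of per-party data, while the compressed form pays `2^x` exposed hashes once and truncates every path at layer `x`.

```rust
/// Approximate certificate size in hash-units for k signatures over a tree
/// with `depth` levels, counting ~5 units of per-party data as above.
fn plain_size(k: u64, depth: u64) -> u64 {
    k * (depth + 5)
}

/// Same, with layer `x` exposed: 2^x hashes paid once, and each path
/// truncated from `depth` steps down to `depth - x`.
fn compressed_size(k: u64, depth: u64, x: u64) -> u64 {
    (1 << x) + k * ((depth - x) + 5)
}
```

For k = 256 and X = 8, the 256 exposed hashes amortize to one unit per signature, so the net saving is 7 hash-units per path, matching the "save 8, effectively 7" figure above.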
Potentially. We can either change certificate aggregation and verification to expect compressed paths or add a pair of compression/decompression functions to convert between the two representations.
Minimal. Nodes might need to store the “exposed” layer rather than only the AVK in order to aggregate.
Yes. The benefits will be similar, albeit smaller if the circuit uses a higher arity for the tree. The circuit will be hardcoded to expect a compressed representation of the tree, but the end result will be a more compact circuit. NB: there is no need for multiple versions of the tree here; we only need the compressed one.
None.
We can try to use off-the-shelf compression to exploit the overlap. This will likely provide smaller benefits, as a compression function cannot infer the value of a parent from the values of the children (and thus omit all parents after the exposed layer), but it will be much simpler to implement. This low-effort approach does not work for circuit-based proofs, however.
It might be of interest to expose in the API an extraction function for the key pair (see discussion in #43). The initial decision was not to do this, to avoid misuse of the keys, but it might be more useful than dangerous.
The goal is to put in place ETE test infrastructure consisting of:
All of the above will obviously be mostly stubbed or faked atm.
Once #45 is merged, we can include the C api.