facebook / akd
An implementation of an auditable key directory
License: Apache License 2.0
Putting state_map into storage so that large streams of data do not need to be read when reading a node.
Currently, when AKD is initialized, it will fetch the Azks record in storage. If it is unable to find the record, it will attempt to write the root node (and its epoch 0 state) along with the Azks record.
In a distributed deployment with dedicated readers and writers, we do not want the readers to perform this write on startup. Instead, they should fail when unable to retrieve the Azks record.
Currently we have a timed cache with unbounded space; the lifetime of items in the cache is 30 seconds.
With the growing tree size and increasing number of requested operations (e.g., lookups), the size of the cache may become a limiting factor for how many operations we can handle. In that case, we will need to manage the cache more effectively.
To this end, some initial ideas include batching storage reads via batch_get operations; a sketch of a bounded variant of the cache follows below. cc @afterdusk
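As one hedged sketch of what a bounded variant of the timed cache could look like (none of these names are the current akd cache API):

```rust
use std::collections::HashMap;
use std::hash::Hash;
use std::time::{Duration, Instant};

// Hypothetical sketch: the existing 30-second TTL, plus a size bound so the
// cache cannot grow without limit as the tree and request volume grow.
pub struct BoundedTimedCache<K, V> {
    map: HashMap<K, (Instant, V)>,
    ttl: Duration,
    max_entries: usize,
}

impl<K: Hash + Eq, V> BoundedTimedCache<K, V> {
    pub fn new(ttl: Duration, max_entries: usize) -> Self {
        Self { map: HashMap::new(), ttl, max_entries }
    }

    pub fn get(&self, key: &K) -> Option<&V> {
        self.map
            .get(key)
            .filter(|(inserted, _)| inserted.elapsed() < self.ttl)
            .map(|(_, value)| value)
    }

    pub fn put(&mut self, key: K, value: V) {
        if self.map.len() >= self.max_entries {
            // Evict expired entries first; a real implementation would also
            // need a policy (e.g. LRU) for when nothing has expired yet.
            let ttl = self.ttl;
            self.map.retain(|_, (inserted, _)| inserted.elapsed() < ttl);
        }
        if self.map.len() < self.max_entries {
            self.map.insert(key, (Instant::now(), value));
        }
    }
}
```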
winter_crypto and associated libs (_math, _utils) seem to have released a new version, with better handling of bytes and big namespace cleanups. It'd be worthwhile to update to the latest version for these improvements.
We want to use a VRF to compute labels in seemless_directory.rs and then verify these VRFs in lookup_verify, key_history_verify, etc.
See https://crates.io/crates/vrf for one option. Below is a starting checklist of things to consider:
- Converting the VRF output into the NodeLabel type. We may want to change the NodeLabel type as a result, or implement a trait instead.
- Storing the VRF secret key in a config file, so that the server can generate VRF values.
- Some applications may be amenable to a lower privacy level, such as certificate transparency. For these examples, we do not need a VRF and can thus be more efficient. We should allow both private and non-private modes for this crate, perhaps using features.
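For orientation, a minimal sketch of the prove/verify round trip with that vrf crate (this assumes its ECVRF implementation over secp256k1 and the hex crate for decoding; the secret key and message are placeholders, not akd's eventual key management):

```rust
use vrf::openssl::{CipherSuite, ECVRF};
use vrf::VRF;

fn main() {
    let mut vrf = ECVRF::from_suite(CipherSuite::SECP256K1_SHA256_TAI).unwrap();

    // Placeholder secret key; in akd this would come from secure key storage.
    let secret_key =
        hex::decode("c9afa9d845ba75166b5c215767b1d6934e50c3db36e89b127b8a622b120f6721")
            .unwrap();
    let public_key = vrf.derive_public_key(&secret_key).unwrap();

    // The message would be the raw label input (e.g. username + version).
    let message = b"alice|version=1";

    // Server side: produce the proof pi and the VRF output that would seed a NodeLabel.
    let pi = vrf.prove(&secret_key, message).unwrap();
    let hash = vrf.proof_to_hash(&pi).unwrap();

    // Client side (lookup_verify / key_history_verify): check pi against the
    // public key and recompute the same output.
    let beta = vrf.verify(&public_key, &pi, message).unwrap();
    assert_eq!(hash, beta);
}
```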
Functions to be implemented and tested:
- publish
- lookup
- lookup_verify
- key_history
- key_history_verify
- audit
- audit_verify
Lookup and key history verifications encounter failures with small trees. This manifests with the Blake3 Hasher as well as with Sha3_256.
Repros:
We need to write an AIR that verifies an append-only proof, so that the client doesn't actually need to receive nodes or rely solely on an auditor.
Due to the high likelihood that clients will verify the proofs with limited resources, it would be helpful to have a separate crate with as few dependencies as possible for verification of the proofs provided from the AKD. This would then simply be a dependency of the main AKD crate, or could be maintained independently.
Adding documentation for all API calls.
We need to strip the repo of any panicking calls, such as unwrap. This may also include changing return types for various methods.
Right now, there are way too many functions / structs that are pub, which I think do not need to be. We should reduce this as much as possible, preferring pub(crate).
In MembershipProof, the sibling hashes and sibling labels are a 1-1 mapping and therefore should be a single Vec<(NodeLabel, Digest)> rather than 2 separate properties.
Same thing in NonMembershipProof with longest_prefix_children*: they can be merged into a single property of tuples.
This syntax is warranted as it's already used in AppendOnlyProof with the inserted and unchanged nodes.
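A hedged sketch of the suggested shape (the placeholder types stand in for akd's real definitions):

```rust
// Placeholder types for the sketch; akd's real NodeLabel and Digest differ.
type Digest = [u8; 32];
#[derive(Clone, Copy)]
pub struct NodeLabel {
    pub val: u64,
    pub len: u32,
}

// Before: two parallel vectors that must be kept in sync.
//   pub sibling_labels: Vec<NodeLabel>,
//   pub sibling_hashes: Vec<Digest>,
//
// After: the 1-1 mapping is explicit in the type, mirroring what
// AppendOnlyProof already does for inserted/unchanged nodes.
pub struct MembershipProof {
    pub label: NodeLabel,
    pub siblings: Vec<(NodeLabel, Digest)>,
}
```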
To improve the efficiency of bulk/multi lookups, we could issue the lookups concurrently and await the results of all of them together, rather than awaiting each one individually.
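A minimal sketch of the idea using futures::future::join_all (the lookup function here is a stand-in for Directory::lookup, not the real signature):

```rust
use futures::future::join_all;

// Stand-in for Directory::lookup: one future per requested user.
async fn lookup(uname: String) -> Result<String, String> {
    Ok(format!("proof for {uname}"))
}

// Issue all lookups concurrently and await them together, rather than
// awaiting each proof in sequence.
async fn bulk_lookup(unames: Vec<String>) -> Vec<Result<String, String>> {
    join_all(unames.into_iter().map(lookup)).await
}
```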
Right now, if you insert an element that is already a member of the azks, we get a HistoryTreeNodeErr(DirectionIsNone) error. Perhaps it would be best to clarify that it is a duplicate insertion error? We can also add a test for this error.
The akd_client crate should also support the other validation functions that are in the base akd client. Initially we should start with the history proof, and eventually we'll add audit too.
Once #17 is closed, we should write benchmarks for its functions and various component parts so we have some sort of profile on what, if anything, needs to be optimised further and how this compares to the Java implementation in the paper.
I ran into an issue with running tests in parallel over shared state. This needs to be fixed and the state_map field of HistoryTreeNode removed.
HistoryTreeNode has an epochs field that is used to identify the epochs at which an update took place.
As it stands now, the field can grow to an unbounded size. This will pose an issue when the corresponding DB record hits either soft or hard limits imposed by the DB schema or architecture.
Epochs are u64 = 8 bytes.
The associated epochs for a HistoryTreeNode can be obtained directly from the HistoryNodeState table given the node label, since each update corresponds to a HistoryNodeState.
Accumulating the data from separate tables will incur some computation time on the database server and we will need to profile and assess the impact. However, I think this is necessary to avoid the unbounded record size issue. Are there any other potential solutions I've missed?
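As a hedged sketch of the direction (every name below is a hypothetical stand-in, not the current storage trait):

```rust
// Hypothetical stand-ins, not akd's actual types or storage trait.
#[derive(Clone, Copy)]
pub struct NodeLabel {
    pub val: u64,
    pub len: u32,
}

pub struct HistoryNodeState {
    pub epoch: u64,
    // value, child states, ... elided
}

pub trait Storage {
    // Hypothetical query over the HistoryNodeState table keyed by node label.
    fn get_states_for_label(&self, label: NodeLabel) -> Vec<HistoryNodeState>;
}

// Derive a node's update epochs from its stored HistoryNodeStates instead of
// keeping an unbounded `epochs: Vec<u64>` field on HistoryTreeNode itself.
pub fn get_update_epochs(storage: &impl Storage, label: NodeLabel) -> Vec<u64> {
    storage
        .get_states_for_label(label)
        .into_iter()
        .map(|state| state.epoch)
        .collect()
}
```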
At the end of two runs of azks.batch_insert_leaves(insertion_set.clone()); on different insertion sets, the root node does not seem to update properly. This results in the append-only proofs failing.
In batch_lookup we can batch get_user_state as well.
Now that we have a PR implementing the VRFs, we should find a way to securely generate, store, and access the VRF secret key, and to publish a VRF public key.
The vrf crate could probably support key generation, but the key still needs to be stored somewhere.
At the moment the values being committed in the tree for prototyping are all dummies. We would like to replace the value_to_bytes function in directory.rs to return actual commitments instead. This involves 3 main steps, the first of which is changing value_to_bytes to return a byte array derived from a digest instead.
We have errors that only include the error string in the error. When these errors surface up, they don't have any context on where and why they occurred (e.g., a MySQL error). We should improve these by adding context; for example, the GetData error in batch_get should mention that the operation was a batch get. A sketch follows below.
Cargo bench was added in 51b756c, however it is no longer in use. We should remove it from CI workflows.
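Picking up the error-context point above, a hedged sketch (the variant shape is hypothetical, not the current errors.rs):

```rust
use std::fmt;

// Hypothetical sketch: the error carries the operation that failed, so a
// surfaced MySQL error reports where it happened, not just the raw string.
#[derive(Debug)]
pub enum StorageError {
    GetData {
        operation: &'static str,
        message: String,
    },
}

impl fmt::Display for StorageError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            StorageError::GetData { operation, message } => {
                write!(f, "GetData during {operation}: {message}")
            }
        }
    }
}
```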
Currently, the membership and non-membership proof generation and verification both include membership proof operations as sub-routines. However, this is never actually used in the non-membership verification (see: https://github.com/novifinancial/akd/blob/b0a10e8e9cd17199a890686b36696f5da8f9d820/akd/src/client.rs#L82). This needs to be fixed and will likely require fixing get_membership_proof_and_node in append_only_zks.rs.
Many of the functions in history_tree_node.rs are overly complicated in order to avoid passing mutable references. We need to restructure and simplify the code to avoid this and to make it easier to build on.
The types AkdKey and Values should be standardized in their naming, definition, and usage. They're a little haphazard at the moment (one is Akd..., the other is just ..., etc.).
Additionally, we should migrate to binary (&[u8]) instead of strings to save on storage size, at least for the value if not the key as well. All our operations are done on the binary values anyway; strings are only there for readability. This will induce data-layer changes in the akd_mysql crate as well, as we'll be changing data-layer types.
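A hedged sketch of the standardized shape (names are illustrative, not a settled decision):

```rust
// Both types follow the Akd* naming scheme and wrap binary data, with String
// conversions kept only at the API edges where readability matters.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct AkdKey(pub Vec<u8>);

#[derive(Clone, Debug, PartialEq, Eq)]
pub struct AkdValue(pub Vec<u8>);
```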
In a distributed deployment, the cache may contain stale records. We want the cache to be invalidated when a new Azks record with a higher epoch number is detected, so that stale HistoryTreeNodes are not used to generate proofs. This involves remembering the epoch of the Azks record in the cache, and at TTL expiration checking it against the newly fetched Azks record to see if the epoch number changed. If it has, the entire cache should be dumped.
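A hedged sketch of that flow (the cache shape here is hypothetical):

```rust
// Hypothetical sketch: the cache remembers the Azks epoch it was filled at
// and dumps everything when a re-fetched Azks record carries a newer epoch.
pub struct TimedCache {
    cached_azks_epoch: u64,
    // ... cached HistoryTreeNode entries elided ...
}

impl TimedCache {
    /// Called when the TTL expires and the Azks record has been re-fetched.
    pub fn on_azks_refresh(&mut self, fetched_azks_epoch: u64) {
        if fetched_azks_epoch > self.cached_azks_epoch {
            self.flush(); // stale nodes must never be used to generate proofs
            self.cached_azks_epoch = fetched_azks_epoch;
        }
    }

    fn flush(&mut self) {
        // drop all cached entries here
    }
}
```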
There are a lot of error codes in the errors.rs library which are incorrectly utilized or not utilized at all. These codes should be cleaned up to only what's necessary and not duplicated across multiple structures.
For big-sized trees, mysql_tests are failing in batch_get due to errors similar to:
ERROR HY000 (1366): Incorrect integer value: '??%????\u{3}?\\??N??\u{4}a??H??~~S??!DU9D' for column 'label_val' at row 1
Profiling using a Rust profiler, such as those documented at https://nnethercote.github.io/perf-book/profiling.html.
Currently the SeemlessDirectory struct stores a hashmap for UserData. This needs to be moved to the storage layer.
Add examples for how to use SEEMlessDirectory's various operations in the examples directory.
Due to the change in the storage contract, there are delays between publishing akd to crates.io and the new version being available to cargo to build akd_mysql.
This happens because when breaking the contract between akd and akd_mysql, akd_mysql will rely on the latest version of akd, which doesn't exist on crates.io until it's published. For example, if both crates are at v1.0.0 and we change the contract and publish v2.0.0, akd_mysql depends on akd v2.0.0, which was _just_ published. There appears to be a race in crates.io where the package isn't instantly available after publish, so cargo publish --dry-run balks, saying akd v2.0.0 can't be found; but since we don't have a multi-publish, akd v2.0.0 is published and cannot be reverted. Therefore we're in a "broken" state where half of the crates are published.
The current workaround is to publish akd with the update, let akd_mysql fail, then version bump both and publish again. The second time, crates.io has updated and the index is refreshed, so akd_mysql can build and publish.
See PRs tagged with publish_race to identify instances where this is needed.
This is ugly and just clutters crates.io with partial versions of akd with sometimes-matching akd_mysql crates.
The append-only proof causes a stack overflow if, for example, we start from a point where the root had no children.
In production, under high load we might need the ability to perform "bulk" lookup proofs. This means that we should do the same breadth-first search for a collection of users as is done under the Directory::publish operation (see append_only_zks.rs:191).
This means, when doing a lookup, we should calculate all the prefixes and pre-load (via a BFS walk of batch retrievals) all the nodes along the path into the cache. Then we can generate the proofs without doing many individual retrievals.
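A hedged sketch of the prefix computation feeding that preload (all names here are stand-ins for akd's real types):

```rust
// Hypothetical stand-in for akd's NodeLabel.
#[derive(Clone, Copy, PartialEq, Eq, Hash)]
pub struct NodeLabel {
    pub val: u64,
    pub len: u32,
}

impl NodeLabel {
    // Keep only the first `len` bits of this label.
    pub fn get_prefix(&self, len: u32) -> NodeLabel {
        let val = if len == 0 {
            0
        } else {
            (self.val >> (64 - len)) << (64 - len)
        };
        NodeLabel { val, len }
    }
}

// Every prefix of a lookup label, root to leaf: these are exactly the nodes
// the lookup path touches, so collecting them for all requested users gives
// the set to pre-load via batched retrievals before generating any proofs.
pub fn lookup_prefixes(label: &NodeLabel) -> Vec<NodeLabel> {
    (0..=label.len).map(|len| label.get_prefix(len)).collect()
}
```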
To prevent a split-view attack, the root hash of every epoch needs a signature from a private key which cannot be leaked from the quorum (via a shared-secret methodology). This quorum participates to independently validate the append-only proof of the key directory; each participant provides their partial shard of the quorum signing key, and when enough participants agree, the changes are signed off on and stored in persistent storage.
That way a proof only needs to give the root hash and the signature on it to ascertain that the quorum has agreed on the changes, and the AKD (or any other 3rd party) cannot generate its own signatures.
Eventually these auditors can be external entities who participate in the quorum vote.
A Quorum Key is generated at the beginning of the life of the quorum, and the private key is broken into "shards" via Shamir secret sharing.
These shards are transmitted to the quorum participants who hold them in secure storage. The shards are generated with the following properties
The collection of nodes in this quorum receives commands from an external 3rd party (a key directory epoch notification or an admin interface, for example). The messages they can receive are the following
For any of these messages, whichever node receives the request is denoted as the leader. We do not need the full RAFT protocol, since we have no need for a persistent leader and the nodes are effectively stateless in between operations. The temporary request leader is responsible for communicating with the other quorum members, gathering their votes, reconstructing the shards, and either signing the commitment or enrolling the new member (re-generating shards and transmitting them).
Since the public key is available to anyone who cares, external parties can read directly from the storage layer to minimize I/O to the quorum, as the quorum members will likely be resource-constrained validating the commitments between epochs and processing membership changes.
NOTE: If the storage layer is mutated directly, then the quorum will start failing commitments and signatures will not match, so the system fault will be detectable.
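For the shard-generation step, a minimal sketch with the sharks crate (assuming sharks as the Shamir implementation; the key bytes are placeholders and the 3-of-5 split is arbitrary):

```rust
use sharks::{Share, Sharks};

fn main() {
    // 3-of-5 split: any 3 participants can reconstruct the quorum key,
    // while 2 or fewer learn nothing about it.
    let sharks = Sharks(3);
    let quorum_secret_key = b"placeholder quorum signing key bytes";

    let dealer = sharks.dealer(quorum_secret_key);
    let shares: Vec<Share> = dealer.take(5).collect();

    // The temporary request leader gathers at least 3 shares and reconstructs.
    let recovered = sharks.recover(shares[..3].iter()).unwrap();
    assert_eq!(recovered, quorum_secret_key.to_vec());
}
```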
Writes can be made more efficient if we don't write dummy nodes -- seemingly this won't affect functionality.
The code in history_tree_node.rs and append_only_zks.rs can be further simplified. A few suggestions: reduce the reliance on unwrap().
We would like for the Storage layer to always set string values and get string values. The specific implementation of SEEMless could then serialize and deserialize accordingly. One option is an enum that encompasses all the types of data a SEEMless directory may ever need to store.
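A hedged sketch of that enum (variant payloads are placeholders, and serde_json is just one possible serialization; neither is settled):

```rust
use serde::{Deserialize, Serialize};

// Hypothetical sketch: one enum for every kind of record the directory
// stores; the String payloads stand in for the real structs.
#[derive(Serialize, Deserialize)]
pub enum DbRecord {
    Azks(String),
    HistoryTreeNode(String),
    HistoryNodeState(String),
    ValueState(String),
}

// The Storage implementation then only ever deals in strings.
pub fn to_storage_string(record: &DbRecord) -> serde_json::Result<String> {
    serde_json::to_string(record)
}

pub fn from_storage_string(s: &str) -> serde_json::Result<DbRecord> {
    serde_json::from_str(s)
}
```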
With the addition of the limited_key_history call in directory.rs, we now need the ability to select a series of user values given a minimum epoch (start_epoch). This should be a new call on the storage contract, which doesn't exist today.
For usage, see directory.rs:L366-368.
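A hedged sketch of what the new contract call could look like (all names are stand-ins):

```rust
// Hypothetical stand-ins, not the current storage contract.
pub struct ValueStateRecord {
    pub epoch: u64,
    pub value: Vec<u8>,
}

pub trait Storage {
    /// Fetch a user's value states at or after `start_epoch`, in epoch order.
    fn get_user_states_since(
        &self,
        username: &str,
        start_epoch: u64,
    ) -> Result<Vec<ValueStateRecord>, String>;
}
```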
We'd like to batch reads and writes so that there is a balance between the amount of required serialization and the amount of cache used to store deserialized structs in memory.
Every pub struct should implement Serialize and Deserialize from serde.
At the very least, all of the structs in https://github.com/novifinancial/akd/blob/main/akd/src/proof_structs.rs should also #[derive(Serialize, Deserialize)].
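For example (the field set is illustrative, not the real proof struct):

```rust
use serde::{Deserialize, Serialize};

// Illustrative shape: deriving both traits lets callers ship proofs over the
// wire without hand-written conversion code.
#[derive(Serialize, Deserialize)]
pub struct LookupProof {
    pub epoch: u64,
    pub plaintext_value: Vec<u8>,
    // ... membership / non-membership sub-proofs elided ...
}
```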
Move the tree_repr_set and tree_repr_get to the Storable trait to cache all Storables at any time.
In a production system, data in storage is pruned to manage storage size and adhere to privacy commitments. For AKD, the only prunable record is ValueState, which contains the data payload (key material for E2EE messaging).
Today, operations fail when a ValueState cannot be retrieved. We need to modify the library to handle this scenario gracefully and propagate the information to the caller.
Right now, when trying to migrate the state maps over to use Storage, we are running into an issue that I think might be caused by this line: https://github.com/novifinancial/SEEMless/blob/main/src/history_tree_node.rs#L173-L175
Here, we are potentially copying a child, implicitly cloning its state map. This gets problematic when we move to global storage. It could be solved by ensuring that set_node_child_without_hash doesn't return anything, and finding a better way to structure the code to achieve the same thing (without having to clone a child).