tf-encrypted / moose
Secure distributed dataflow framework for encrypted machine learning and data processing
License: Apache License 2.0
isort should be applied with the --recursive flag in the makefile. A first attempt at simply adding the flag caused it to misrecognize certain subdirectories, including compiler. Fixing this might actually be that easy once #25 is completed.
In #59, we found that CI would fail when the new patch of Python (3.8.6) was installed. There were two specific problems encountered, and it's unclear what triggered each of them: venv/bin/python was not found, suggesting that there's a broken symlink as a result of the difference in Python versions. It was not straightforward to fix, so I left it alone. We can consider fixing this if it becomes enough of a problem.
do codeblocks work
should be reported by cape-bot
Ben: It is a little vague what we actually want to do here. We know CPP usually has to restart their worker at the beginning of working sessions, but not much more. In the absence of strong deliverables, I would suggest we timebox this to a few days and do whatever we can to make the workers more resilient.
Original Clickup: https://app.clickup.com/t/f90mp2
Look into Python's AST following the idea of mlir-npcomp
Computations are currently (loosely) defined by Python classes and serialized using JSON/Marshal. Both of these are bad, especially the fact that Marshal is not a secure way of exchanging data.
This issue is about deciding on how we represent computations in a more general format (protobuffers? flatbuffers? something else?) and how we serialize them.
Hello again gang. It looks like during the Rust migration of Moose, the storage trait/base class lost its query parameter. From the Python code, we can see that there is a query parameter which is a json encoded string.
import abc


class DataStore:
    @abc.abstractmethod
    async def load(self, session_id, key, query):
        pass

    @abc.abstractmethod
    async def save(self, session_id, key, value):
        pass
What we have in Rust looks like this. We no longer have a parameter for the query.
#[async_trait]
pub trait AsyncStorage {
    async fn save(&self, key: &str, val: &Value) -> Result<()>;
    async fn load(&self, key: &str, type_hint: Option<Ty>) -> Result<Value>;
}
My question is: do we want this query to be part of the Rust trait? If so, do we want to continue using a JSON string, or should we leverage Rust's expressive type system to encode a query? My vote would be to use Rust types, but I can see the appeal either way. I also think that when saving a tensor we should include a way to optionally specify its column names; this way, when we load the tensor, it will be more consistent with any other CSV, and we can select specific columns to load.
We need access to Elk one way or another. For now we might simply assume that it is available on the system. Concretely, this issue is about making that assumption valid for the Docker image.
This issue is about implementing a mean operation for replicated placements. The suggested strategy is to introduce std::mean, fixed::mean, replicated::mean, and ring::mean. Mid-term we might have to put more things (operations) in place that could allow the unrolling to happen earlier in the layers (not even sure why that would be relevant?), but waiting with this means we don't have to introduce public/clear replicated tensors etc. Note that ring::mean is essentially a fused kernel.
fixed_ring_mean, calling ring_sum and ring_mul but with custom code for computing the (fixedpoint) denominator. Passing data should go from i -> i+1.
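As a rough illustration of the fused kernel, here is a toy Python sketch: sum in the ring, multiply by a fixedpoint-encoded 1/n, then truncate. The 2^64 ring and 20 fractional bits are illustrative assumptions, and plain modular integers stand in for secret-shared values:

```python
# Sketch of a fused ring-mean kernel: ring_sum followed by ring_mul with a
# fixedpoint-encoded 1/n, then truncation. Parameter choices (64-bit ring,
# 20 fractional bits) are illustrative assumptions, not Moose's actual ones.
RING = 2 ** 64
FRAC_BITS = 20
SCALE = 2 ** FRAC_BITS


def encode(x: float) -> int:
    """Fixedpoint-encode a float as a ring element."""
    return int(round(x * SCALE)) % RING


def decode(r: int) -> float:
    """Interpret a ring element as a signed fixedpoint number."""
    if r >= RING // 2:
        r -= RING
    return r / SCALE


def ring_sum(xs):
    return sum(xs) % RING


def ring_mul(a, b):
    return (a * b) % RING


def trunc(r: int) -> int:
    """Drop the FRAC_BITS of extra precision introduced by the multiplication."""
    if r >= RING // 2:  # signed right shift for negative values
        return ((r - RING) >> FRAC_BITS) % RING
    return r >> FRAC_BITS


def fixed_ring_mean(xs):
    n = len(xs)
    total = ring_sum(xs)
    # multiply by the fixedpoint-encoded denominator 1/n, then truncate
    return trunc(ring_mul(total, encode(1.0 / n)))
```

In the actual protocol the truncation step is where the secret-sharing-specific work happens; here it is just an integer shift.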
I suggest that we move all source files under a subdirectory, to more clearly separate them from examples and docker files. If we already had a name for the runtime it would be natural to use it for the subdirectory, but lacking one I suggest the codename moose. This would leave us with the following directory structure:
/docker: everything related to building docker images, currently only for the dev worker
/examples: self-contained examples which assume that Moose has already been installed on the system
/moose: everything source code related, including setup.py and requirements-dev.txt
/main.py: moves into examples, and either becomes an example or works as a template for creating a new example
/README.md: stays put

wdyt @jvmncs @yanndupis @rdragos?
Instead of packaging the runtime independently it might make more sense to essentially consider it as a library from which runtimes can easily be built (kind of like LLVM/MLIR). In particular, projects like TFE and Cape Federated can build small wrappers around the library to each have their own specific runtime with for instance custom operations and kernels. If we indeed go for this then it may make less sense to distribute a pip package, but rather offer only a source code release or a rust crate.
Runtime computations currently only reference Python native types (e.g. float/string/int/etc.) across different Placements in the eDSL. We are already seeing the need for ndarrays/tensors and Keras models to be used this way as well, and it's likely that we will continue to add types in the future. For clarity, it would be helpful to have a dedicated interface/abstraction for these, so that there is a well-established set of "traits" that such objects must implement in order to be recognizable & usable by our @computation decorator.
We don't use gRPC for tests, so this isn't covered by just testing ops for correctness in the TestRuntime
This includes READMEs.
Original clickup ticket: https://app.clickup.com/t/dx2y33
For graph visualization (and debugging) we would like to have support for name scoping. These should have no effect at runtime but simply add a bit of superficial meta-structure to computations that can be very useful in certain situations. See for instance https://www.tensorflow.org/api_docs/python/tf/name_scope.
An alternative to name scopes would be to have layered graphs, where hints are added to the computation to allow upper layers to function as an interpretation of lower layers, including interpretations of higher level tensors based on lower level tensors. The tricky part here seems to be what to do when we optimize a graph by either merging or pruning nodes. This means we will have to deal with composite higher level nodes (eg add+add+add) and partial higher level edges (with some component values not being computed).
Another related concept is that of sub-graphs and/or sub-computations, although these do have an effect at runtime. We should also figure out how relevant name scopes are given support for sub-graphs and sub-computations.
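The "no effect at runtime" property is what makes name scopes cheap: they only prefix op names at trace time. A minimal sketch of the idea (the API names here are hypothetical, chosen to mirror tf.name_scope):

```python
# Minimal sketch of name scoping for graph visualization: a context manager
# that only prefixes op names at trace time, with no effect on execution.
from contextlib import contextmanager

_scope_stack = []


@contextmanager
def name_scope(name: str):
    """Push a scope name while tracing; pop it on exit."""
    _scope_stack.append(name)
    try:
        yield
    finally:
        _scope_stack.pop()


def scoped_name(op_name: str) -> str:
    """Fully qualified name for an op created under the current scopes."""
    return "/".join(_scope_stack + [op_name])
```

Nested scopes then yield names like model/layer1/add, which a graph visualizer can use to collapse subgraphs without the runtime ever seeing the scopes.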
Currently we hard-code the path of the MP-SPDZ directory in the kernels to /MP-SPDZ
. This is fine for the docker images but won't work if Moose is being executed natively.
Set up some combination of tools like Github Advanced Security, dependabot, or similar to scan for vulnerabilities in the tf-encrypted code base.
Copying note from other ticket: maybe there is some inspiration via libsodium
Morten:
Here's what I set up already, but waiting for us to move out of the rust root directory: https://github.com/tf-encrypted/runtime/blob/main/.github/workflows/audit.yml
Been wanting to look further into what's done for libsodium as well: https://libsodium.gitbook.io/doc/internals#static-analysis
Thor:
That's great! I'm not familiar with any projects that do static analysis on Rust code. I've heard of a few thesis projects attempting something, but I haven't heard of any runnable results being made available.
Regarding cargo-audit: I'm pretty sure that's also what dependabot uses, so you might be able to get PRs that run tests automatically when there are updates available to remediate problems.
The audit action which is the most reasonable near-term solution can't find the rust code to scan because it's in a nested directory. (Specifically, Cargo.toml is under rust/
rather than at the root of the repo.)
Once the rust worker is live in prod, the rust dir will be promoted to replace the python code at the root of the repository, and this action can be enabled.
Lex:
My previous company open sourced this: https://github.com/sonatype-nexus-community/cargo-pants It accepts a path to Cargo file.
Original Clickup: https://app.clickup.com/t/pd6qgu
Original clickup ticket: https://app.clickup.com/t/d311eu
This is a collection of tasks related to debugging the runtime, which should be addressed as part of migration.
Original clickup tickets: https://app.clickup.com/t/f90mkm
Set up some symbolic links involving sess_id/invoc_key/Player-Data.
We also need to figure out the port number (which should be a hash of the two keys).
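A minimal sketch of deriving the port from the two keys; the separator, hash choice, and port range are assumptions for illustration:

```python
import hashlib


def derive_port(sess_id: str, invoc_key: str,
                low: int = 10000, high: int = 60000) -> int:
    """Map (sess_id, invoc_key) deterministically into a port range.

    The ":" separator, SHA-256, and the [low, high) range are illustrative
    assumptions; collisions are still possible and would need handling at
    bind time.
    """
    digest = hashlib.sha256(f"{sess_id}:{invoc_key}".encode()).digest()
    return low + int.from_bytes(digest[:4], "big") % (high - low)
```

Because both parties hash the same two keys, they independently arrive at the same port without any extra coordination.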
We should re-run benchmarks as we migrate to Rust, and also test on a variety of worker setups (different worker sizes, different amounts of memory per worker).
KJ: Do you think this is trivial with the benchmarking script you made? As in, will it work automatically with the new runtime (I imagine)? If so (or not!), can you add a time estimate here?
Yann: Correct. Having the new runtime won't impact the benchmarking script since it interacts only with pycape (create datasets with different sizes, add dataviews, run a job, etc.). So it will be trivial to re-benchmark. What could be more time consuming is if we want to benchmark with different machines (we'd need to launch new workers with new configs, etc.), but that's not a problem.
Original Clickup: https://app.clickup.com/t/f908fa
changing encode type
Right now we only have a very basic handling of errors during async execution. So to-be-decided efforts should be made to improve upon this for both development and application.
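One direction (purely illustrative, not the runtime's actual API) is to collect all task outcomes before raising, so one failure doesn't silently mask others:

```python
# Sketch of surfacing errors from concurrent kernel tasks: gather everything
# with return_exceptions=True, then aggregate failures into a single error
# instead of letting the first exception cancel and hide the rest.
import asyncio


async def run_all(tasks):
    results = await asyncio.gather(*tasks, return_exceptions=True)
    errors = [r for r in results if isinstance(r, Exception)]
    if errors:
        # report all failures together rather than only the first one
        raise RuntimeError(f"{len(errors)} task(s) failed: {errors!r}")
    return results
```

For application code this keeps partial results inspectable; for development it makes multi-operation failures debuggable in one pass.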
Per #29, it is currently only ApplyFunctionExpressions that go through the placement's compiler, yet the placement should have a say in any type of expression assigned to it, including potentially rejecting certain expressions that it doesn't support (say, non-arithmetic expressions on the MpspdzPlacement).
We get some warnings in the test scripts due to using the name TestRuntime.
The runtime currently doesn't check nor use the identities of connecting peers, it only uses mTLS to ensure that peers have valid certificates. This issue is about figuring out where we want to employ ACLs and implement it.
The key part for extracting identity is context.peer() and context.peer_identities(), using the context passed into servicer methods.
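An ACL check on top of those identities could be as small as the following sketch; the allow-list shape and helper name are assumptions, and the stubbed context below stands in for grpc's ServicerContext:

```python
# Sketch of an ACL check on the peer's mTLS identity. A grpc ServicerContext
# exposes peer_identities(), which returns an iterable of bytes (or None when
# no identity is available); this helper only consumes that interface.
def check_acl(context, allowed: set) -> bool:
    """Return True iff at least one identity of the connecting peer is allowed."""
    identities = context.peer_identities() or []
    return any(ident.decode() in allowed for ident in identities)
```

Servicer methods would call this at the top and abort the RPC when it returns False; where exactly to maintain the allow-list is the open question of this issue.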
I suggest we move each of the three current examples into its own subdirectory, and move docker/dev/docker-compose.yaml into one or more of them (adapting it to each example). We should also add a small README.md to the examples directory containing the relevant parts of docker/dev/README.md (i.e. including the parts about Docker Compose and excluding the parts about how to build the worker dev image).
Hi gang,
If we take a look at the following line: https://github.com/tf-encrypted/runtime/blob/7254b6230fdf0af1061e3848e419b2514a5875cc/rust/moose/src/storage.rs#L13
The save and load methods on the storage traits take exclusive ownership of the data passed to them. I propose that instead of taking ownership, we instead borrow the values in these methods.
What do you think?
Currently, if you write something like this, you get a compiler error because the variable key
is owned by the first function that it gets passed to, storage.save.
let expected = Float64Tensor::from(
    array![[1.0, 2.0], [3.0, 4.0]]
        .into_dimensionality::<IxDyn>()
        .unwrap(),
);
let key = "my-object".to_string();
let storage = S3SyncStorage::default();
storage.save(key, Value::from(expected));
let loaded = storage.load(key);
Compiler error message:
error[E0382]: use of moved value: `key`
   --> src/cape/storage.rs:205:35
    |
202 |     let key = "my-object".to_string();
    |         --- move occurs because `key` has type `std::string::String`, which does not implement the `Copy` trait
203 |     let storage = S3SyncStorage::default();
204 |     storage.save(key, Value::from(expected));
    |                  --- value moved here
205 |     let loaded = storage.load(key);
    |                               ^^^ value used here after move
pub trait SyncStorage {
    fn save(&self, key: &String, val: &Value) -> Result<()>;
    fn load(&self, key: &String) -> Result<Value>;
}

#[async_trait]
pub trait AsyncStorage {
    async fn save(&self, key: &String, val: &Value) -> Result<()>;
    async fn load(&self, key: &String) -> Result<Value>;
}
Read and write functionality from MP-SPDZ I/O files.
As a data scientist, I would like to know beforehand if my job will fail due to a shape mismatch. If possible, this should happen before the job actually runs and tell me the error. If it is not possible to check beforehand, we should surface helpful error messages that explain expected and actual shapes.
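A pre-flight check could look roughly like this sketch; only a matmul-style rule is shown, and the function name and error wording are invented for illustration:

```python
# Sketch of a static shape check run before submitting a job: validate that
# the inner dimensions of a matrix multiplication agree, and report both
# shapes in the error when they don't.
def check_matmul_shapes(lhs: tuple, rhs: tuple) -> tuple:
    """Return the output shape of lhs @ rhs, or raise with a helpful message."""
    if len(lhs) != 2 or len(rhs) != 2:
        raise ValueError(f"expected 2-D shapes, got {lhs} and {rhs}")
    if lhs[1] != rhs[0]:
        raise ValueError(
            f"shape mismatch: cannot multiply {lhs} by {rhs}; "
            f"inner dimensions {lhs[1]} and {rhs[0]} differ"
        )
    return (lhs[0], rhs[1])
```

Run over the whole computation graph at trace time, checks like this catch the failure before any worker is involved; at runtime the same messages would serve as the fallback.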
Original Clickup: https://app.clickup.com/t/jazxk1
do it in a separate issue?
Originally posted by @mortendahl in #49 (comment)
See eg #39 and its comments. Main question is: how can we avoid the runtime depending on TensorFlow yet still allow strong support for it in eg Cape Federated?
It could be easier to install. Here are a few suggestions:
- requirements-dev.txt when a dev install is detected (what could we do there?)
- make build
Goal is to compile MLIR to MPC using Elk, and compile MPC to bytecode using MP-SPDZ. The result must be wired up so that MP-SPDZ operations can be called during eDSL processing.
Idea is that specification of computation is given in the form of Rust structs, and we use serde to serialize into eg JSON or MsgPack.
See https://crates.io/crates/rmp-serde and https://pypi.org/project/pyserde/
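On the Python side, the same idea could look roughly like the sketch below: a computation as plain dataclasses with a JSON round-trip. The Operation/Computation fields are invented for illustration, and stdlib json stands in for pyserde (Python) and serde/rmp-serde (Rust):

```python
# Illustrative sketch: a computation specified as plain dataclasses and
# serialized to JSON. Field names are assumptions, not Moose's actual schema.
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Operation:
    name: str
    kind: str
    inputs: List[str] = field(default_factory=list)
    placement: str = ""


@dataclass
class Computation:
    operations: List[Operation] = field(default_factory=list)

    def to_json(self) -> str:
        # sort_keys gives a canonical encoding, useful for hashing/diffing
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, data: str) -> "Computation":
        raw = json.loads(data)
        return cls(operations=[Operation(**op) for op in raw["operations"]])
```

With serde on the Rust side deriving Serialize/Deserialize for the mirror structs, both runtimes can exchange the same JSON (or MsgPack) payload.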
clickup ticket: https://app.clickup.com/t/d2wpyw
A failing test_run_program
will currently return the latest result from test_op
, perhaps due to the global state used to support placements as context managers.
To avoid global state, all eDSL operations could be improved to take a placement as an optional argument, so that
with plc:
    z = add(x, y)
is really just syntactic sugar for
z = add(x, y, placement=plc)
where the former can be used for convenience and the latter must be used in tests.
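The equivalence can be sketched in a few lines; the names here are hypothetical stand-ins for the eDSL's actual ones, and the returned tuple merely represents a traced op:

```python
# Sketch of placement-as-optional-argument with `with plc:` as sugar over a
# context stack. In tests one would pass placement= explicitly and never
# touch the stack, avoiding the global-state leakage between test cases.
_placement_stack = []


class Placement:
    def __init__(self, name):
        self.name = name

    def __enter__(self):
        _placement_stack.append(self)
        return self

    def __exit__(self, *exc):
        _placement_stack.pop()


def add(x, y, placement=None):
    # explicit argument wins; otherwise fall back to the enclosing `with` scope
    plc = placement or (_placement_stack[-1] if _placement_stack else None)
    if plc is None:
        raise ValueError("no placement given and none in scope")
    return ("add", x, y, plc.name)  # stand-in for a traced operation
```

Both forms produce the same op, so the sugar stays purely a convenience.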
Add support for output types to all expressions and operations.
This issue is about going through the runtime code base and gathering a list of tasks that could improve the resilience of the runtime, including recovering from an error during computation evaluation.
This includes making sure Moose doesn't silently error when an argument is missing.
Need to make sure we parse in the correct ids/workers from the yaml files. See how this was done for the MP-SPDZ example.
CI is currently using ubuntu-latest
, yet as soon as we add support for eg MP-SPDZ it would be nice to instead use either the worker docker image or a new dev docker image (that the worker image could potentially be derived from) where all dependencies are already installed.