tf-encrypted / tf-encrypted

A Framework for Encrypted Machine Learning in TensorFlow

License: Apache License 2.0

Python 88.36% Shell 0.77% Dockerfile 0.03% Makefile 1.17% C++ 9.41% Starlark 0.25%

confidential-computing cryptography deep-learning machine-learning privacy secure-computation tensorflow

tf-encrypted's Introduction

TF Encrypted is a framework for encrypted machine learning in TensorFlow. It looks and feels like TensorFlow, taking advantage of the ease-of-use of the Keras API while enabling training and prediction over encrypted data via secure multi-party computation and homomorphic encryption. TF Encrypted aims to make privacy-preserving machine learning readily available, without requiring expertise in cryptography, distributed systems, or high performance computing.

See below for more background material, explore the examples, or visit the documentation to learn more about how to use the library.

TF Encrypted is now based on TensorFlow 2!

TF1 executes a computation by first building a graph and then running that graph in a session. Many developers find this hard to use, especially researchers without a computer science background. As a result, TF1 has few remaining users and is no longer maintained. Because TF Encrypted was built on TF1, it inherited the same problems. We have therefore updated TF Encrypted to rely on TF2, whose eager execution makes development on TF Encrypted much easier. It also supports building graphs implicitly via tfe.function, which achieves nearly the same performance as the TF1-based TF Encrypted. Unfortunately, TF1 features such as sessions and placeholders are no longer supported by TFE after this update. Developers who want to use TFE in the TF1 style should use version 0.8.0.

Installation

TF Encrypted is available as a package on PyPI supporting Python 3.8+ and TensorFlow 2.9.1+:

pip install tf-encrypted

Creating a conda environment to run TF Encrypted code can be done using:

conda create -n tfe python=3.8
conda activate tfe
conda install tensorflow notebook
pip install tf-encrypted

Alternatively, installing from source can be done using:

git clone https://github.com/tf-encrypted/tf-encrypted.git
cd tf-encrypted
pip install -e .
make build

The latter is useful on platforms for which the pip package has not yet been compiled, and it is also needed for development. Note that this gives you a working basic installation, but a few more steps are required to match the performance and security of the version shipped in the pip package; see the installation instructions.

Usage

The following example performs a simple matmul on encrypted data using TF Encrypted on TF2. You can execute the matmul in eager mode or build a graph with @tfe.function.

import tensorflow as tf
import tf_encrypted as tfe

@tfe.local_computation('input-provider')
def provide_input():
    # normal TensorFlow operations can be run locally
    # as part of defining a private input, in this
    # case on the machine of the input provider
    return tf.ones(shape=(5, 10))

# provide inputs
w = tfe.define_private_variable(tf.ones(shape=(10,10)))
x = provide_input()

# eager execution
y = tfe.matmul(x, w)
res = y.reveal().to_native()

# build graph and run graph
@tfe.function
def matmul_func(x, w):
    y = tfe.matmul(x, w)
    return y.reveal().to_native()

res = matmul_func(x, w)

For more information, check out the documentation or the examples.
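To give a feel for what "computing on encrypted data" means in a secret-sharing setting, here is a minimal sketch of plain additive secret sharing among three parties. It is not TF Encrypted's implementation: the ring size `MODULUS`, the helper names, and the use of simple additive (rather than replicated) sharing are all assumptions made for illustration.

```python
import secrets

MODULUS = 2**64      # illustrative ring size (assumption of this sketch)
NUM_PARTIES = 3      # matches the three-party setting used in the benchmarks


def share(value):
    """Split an integer into NUM_PARTIES random shares that sum to it mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(NUM_PARTIES - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def reconstruct(shares):
    """Recombine all shares; any strict subset looks uniformly random."""
    return sum(shares) % MODULUS


def add_shared(xs, ys):
    """Each party adds its own two shares locally; no communication is needed."""
    return [(x + y) % MODULUS for x, y in zip(xs, ys)]


x_shares = share(20)
y_shares = share(22)
total = reconstruct(add_shared(x_shares, y_shares))
print(total)  # 42
```

Addition is free in this model, while multiplication (and therefore matmul) needs interaction between the parties, which is where a protocol such as ABY3 comes in.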

Performance

All tests are performed by using the ABY3 protocol among 3 machines, each with 4 cores (Intel Xeon Platinum 8369B CPU @ 2.70GHz), Ubuntu 18.04 (64bit), TensorFlow 2.9.1, Python 3.8.13 and pip 21.1.2. The LAN environment has a bandwidth of 40 Gbps and a RTT of 0.02 ms, and the WAN environment has a bandwidth of 352 Mbps and a RTT of 40 ms.

You can find source code of the following benchmarks in ./examples/benchmark/ and corresponding guidelines of how to reproduce them.

Benchmark 1: Sort and Max

Graph building is a one-time cost, while the LAN and WAN timings are average running times across multiple runs. For example, it takes roughly 150 seconds to build the graph for the ResNet50 model (see Benchmark 2), after which each image prediction takes only about 19 seconds over LAN.

	Build graph (seconds)	LAN (seconds)	WAN (seconds)
Sort/Max (2^10)¹	5.26	0.14	10.37
Sort/Max (2^16)¹	14.06	7.37	41.97
Max (2^10 × 4)²	5.63	0.01	0.55
Max (2^16 × 4)²	5.81	0.29	1.14

¹ Max is implemented using a sorting network, so its performance is essentially the same as Sort. A sorting network can be constructed efficiently as a TF graph. The traditional way of computing Max with a binary comparison tree does not work well in a TF graph, because the graph becomes huge when the number of elements is large.

² This means 2^10 (respectively, 2^16) invocations of max on 4 elements, which is essentially a MaxPool with pool size of 2 x 2.
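The idea behind footnote ¹ can be sketched as follows. A sorting network is a fixed, data-independent list of compare-exchange steps, so it maps naturally onto a static graph. The source does not say which network TFE uses; this sketch assumes odd-even transposition sort for simplicity, and uses plain `min`/`max` where a secure protocol would use a private comparison.

```python
def sorting_network(n):
    """Return the fixed list of (i, j) comparisons for n inputs.

    Odd-even transposition sort: n rounds, alternating between
    even-indexed and odd-indexed neighbouring pairs.
    """
    return [(i, i + 1) for r in range(n) for i in range(r % 2, n - 1, 2)]


def compare_exchange(values, i, j):
    """Put the smaller value at position i and the larger at position j."""
    lo, hi = min(values[i], values[j]), max(values[i], values[j])
    values[i], values[j] = lo, hi


def network_sort(values):
    """Apply the network; the sequence of steps never depends on the data."""
    out = list(values)
    for i, j in sorting_network(len(out)):
        compare_exchange(out, i, j)
    return out


data = [7, 3, 9, 1, 4]
sorted_data = network_sort(data)
print(sorted_data)      # [1, 3, 4, 7, 9]
print(sorted_data[-1])  # 9, the maximum falls out as the last element
```

Because the comparison schedule depends only on the input length, the same graph can be reused for any input of that size.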

Benchmark 2: Neural Network Inference

We show the strength of TFE by loading large TF models (e.g. RESNET50) and running private inference on top of them.

	Build graph	LAN	WAN
VGG19 inference time (seconds)	105.18	49.63	139.63
RESNET50 inference time (seconds)	150.47	19.07¹	84.29
DENSENET121 inference time (seconds)	344.55	33.53	151.43

¹ This is currently one of the fastest implementations of secure three-party RESNET50 inference: comparable with CryptGPU and SecureQ8, and faster than CrypTFlow.
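Since graph building is a one-time cost, its share of the total shrinks as more images are processed. The following back-of-the-envelope sketch uses the RESNET50 LAN figures from the table above; the helper name `total_seconds` is hypothetical, and the linear cost model is an assumption.

```python
def total_seconds(build_seconds, per_image_seconds, num_images):
    """One-time graph build plus a fixed cost per predicted image."""
    return build_seconds + per_image_seconds * num_images


RESNET50_BUILD = 150.47  # seconds, from the table above
RESNET50_LAN = 19.07     # seconds per image over LAN, from the table above

for n in (1, 10, 100):
    total = total_seconds(RESNET50_BUILD, RESNET50_LAN, n)
    # Amortized cost per image approaches the LAN figure as n grows.
    print(n, round(total, 2), round(total / n, 2))
```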

Benchmark 3: Neural Network Training

We benchmark the performance of training several neural network models on the MNIST dataset (60k training images and 10k test images, with a batch size of 128). The definitions of these models can be found in examples/models.

Model (optimizer)	Accuracy (epochs), MP-SPDZ	Accuracy (epochs), TFE	Seconds per Batch (LAN), MP-SPDZ	Seconds per Batch (LAN), TFE	Seconds per Batch (WAN), MP-SPDZ	Seconds per Batch (WAN), TFE
A (SGD)	96.7% (5)	96.5% (5)	0.098	0.167	9.724	4.510
A (AMSgrad)	97.8% (5)	97.4% (5)	0.228	0.717	21.038	15.472
A (Adam)	97.4% (5)	97.4% (5)	0.221	0.535	50.963	15.153
B (SGD)	97.5% (5)	98.6% (5)	0.571	5.332	60.755	18.656
B (AMSgrad)	98.6% (5)	98.7% (5)	0.680	5.722	71.983	21.647
B (Adam)	98.8% (5)	98.8% (5)	0.772	5.733	98.108	21.130
C (SGD)	98.5% (5)	98.7% (5)	1.175	8.198	91.341	27.102
C (AMSgrad)	98.9% (5)	98.9% (5)	1.568	10.053	119.271	66.357
C (Adam)	99.0% (5)	99.1% (5)	2.825	9.893	195.013	65.320
D (SGD)	97.6% (5)	97.1% (5)	0.134	0.439	15.083	5.465
D (AMSgrad)	98.4% (5)	97.4% (5)	0.228	0.900	26.099	14.693
D (Adam)	98.2% (5)	97.6% (5)	0.293	0.710	54.404	14.515
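The per-batch timings can be turned into rough per-epoch estimates. This sketch assumes the final partial batch is still processed (hence the ceiling); the benchmark scripts may handle the remainder differently, so treat the result as an approximation.

```python
import math


def batches_per_epoch(num_examples, batch_size):
    """Number of batches when the last, partial batch is kept (assumption)."""
    return math.ceil(num_examples / batch_size)


def epoch_seconds(seconds_per_batch, num_examples=60_000, batch_size=128):
    """Estimated wall-clock time for one pass over the training set."""
    return seconds_per_batch * batches_per_epoch(num_examples, batch_size)


print(batches_per_epoch(60_000, 128))  # 469
# Model A with SGD on TFE over LAN: 0.167 seconds per batch (from the table).
print(round(epoch_seconds(0.167)))     # 78
```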

We also give the performance of training a logistic regression model in the following table. This model is trained to distinguish two classes: small digits (0-4) vs large digits (5-9). The dataset can be found in examples/benchmark/training/lr_mnist_dataset.py.

	Accuracy (epochs)	Seconds per Batch (LAN)	Seconds per Batch (WAN)
LR (SGD)	84.8% (5)	0.010	0.844
LR (AMSgrad)	85.0% (5)	0.023	1.430
LR (Adam)	85.2% (5)	0.019	1.296

Roadmap

High-level APIs for combining privacy and machine learning. So far TF Encrypted is focused on its low-level interface but it's time to figure out what it means for interfaces such as Keras when privacy enters the picture.
Tighter integration with TensorFlow. This includes aligning with the upcoming TensorFlow 2.0 as well as figuring out how TF Encrypted can work closely together with related projects such as TF Privacy and TF Federated.
Support for third party libraries. While TF Encrypted has its own implementations of secure computation, there are other excellent libraries out there for both secure computation and homomorphic encryption. We want to bring these on board and provide a bridge from TensorFlow.

Background & Further Reading

Blog posts:

Introducing TF Encrypted walks through a simple example showing two data owners jointly training a logistic regression model using TF Encrypted on a vertically split dataset (by Alibaba Gemini Lab)
Federated Learning with Secure Aggregation in TensorFlow demonstrates using TF Encrypted for secure aggregation of federated learning in pure TensorFlow (by Justin Patriquin at Cape Privacy)
Encrypted Deep Learning Training and Predictions with TF Encrypted Keras introduces and illustrates first parts of our encrypted Keras interface (by Yann Dupis at Cape Privacy)
Growing TF Encrypted outlines the roadmap and motivates TF Encrypted as a community project (by Morten Dahl)
Experimenting with TF Encrypted walks through a simple example of turning an existing TensorFlow prediction model private (by Morten Dahl and Jason Mancuso at Cape Privacy)
Secure Computations as Dataflow Programs describes the initial motivation and implementation (by Morten Dahl)

Papers:

Privacy-Preserving Collaborative Machine Learning on Genomic Data using TensorFlow outlines the iDASH'19 winning solution built on TF Encrypted (by Cheng Hong, et al.)
Crypto-Oriented Neural Architecture Design uses TF Encrypted to benchmark ML optimizations made to better support the encrypted domain (by Avital Shafran, Gil Segev, Shmuel Peleg, and Yedid Hoshen)
Private Machine Learning in TensorFlow using Secure Computation further elaborates on the benefits of the approach, outlines the adaptation of a secure computation protocol, and reports on concrete performance numbers (by Morten Dahl, Jason Mancuso, Yann Dupis, et al.)

Presentations:

Privacy-Preserving Machine Learning with TensorFlow, TF World 2019 (by Jason Mancuso and Yann Dupis at Cape Privacy); see also the slides
Privacy-Preserving Machine Learning in TensorFlow with TF Encrypted, O'Reilly AI 2019 (by Morten Dahl at Cape Privacy); see also the slides

Other:

Privacy Preserving Deep Learning – PySyft Versus TF Encrypted makes a quick comparison between PySyft and TF Encrypted, correctly hitting on our goal of being the encryption backend in PySyft for TensorFlow (by Exxact)
Bridging Microsoft SEAL into TensorFlow takes a first step towards integrating the Microsoft SEAL homomorphic encryption library and some of the technical challenges involved (by Justin Patriquin at Cape Privacy)

Development and Contribution

TF Encrypted is an open source community project developed under the Apache 2 license and maintained by a set of core developers. We welcome contributions from all individuals and organizations, with further information available in our contribution guide. We invite any organizations interested in partnering with us to reach out via email.

Don't hesitate to send a pull request, open an issue, or ask for help! We use ZenHub to plan and track GitHub issues and pull requests.

Individual contributions

We appreciate the efforts of all contributors that have helped make TF Encrypted what it is! Below is a small selection of these, generated by sourcerer.io from most recent stats:

Organizational contributions

We are very grateful for the significant contributions made by the following organizations!

Project Status

TF Encrypted is experimental software not currently intended for use in production environments. The focus is on building the underlying primitives and techniques, with some practical security issues postponed for a later stage. However, care is taken to ensure that none of these represent fundamental issues that cannot be fixed as needed.

Known limitations

Elements of TensorFlow's networking subsystem do not appear to be sufficiently hardened against malicious users. Proxies or other means of access filtering may be sufficient to mitigate this.

Support

Please open an issue, or send an email to [email protected].

License

Licensed under Apache License, Version 2.0 (see LICENSE or http://www.apache.org/licenses/LICENSE-2.0). Copyright as specified in NOTICE.

tf-encrypted's Issues

Setup circle CI

with build tag on readme!

Allow decoding as tf.Tensor for OutputReceiver

Add flag to specify enabling/disabling debug output

This way we don't write output unless we need to!

Different tensorflow protobuf formats

There are a few different tensorflow protobuf formats. They are:

GraphDef (https://www.tensorflow.org/extend/tool_developers/)
MetaGraphDef (https://www.tensorflow.org/api_guides/python/meta_graph)
SavedModel (https://www.tensorflow.org/guide/saved_model)

For importing we currently only support GraphDef, which we assume has the weights frozen directly in it. There are also cases where the GraphDef might have the weights stored in separate checkpoint files.

We need to investigate whether it's better to support all three formats when converting a pre-trained model to an MPC graph, or whether supporting only GraphDef is sufficient.

My gut tells me we can get away with supporting only GraphDef for a while, and once an MPC graph is built we can export that to a SavedModel for use with TensorFlow Serving (if we figure out that we can make use of TensorFlow Serving).

Fix config.py

rename 'master' node

We should consider renaming the 'master' node to 'api' or similar to avoid the master/slave dichotomy that is now considered politically incorrect.

More robust conv2d

The current convolution implementation isn't as robust as the built in tensorflow one. If we're going to be able to import any tensorflow graph we'll need to match the robustness of that implementation.

Two main issues I've come across:

tensorflow takes 2 or 4 strides for a 2d convolution. I think right now we only support 1 stride
tensorflow can support both NCHW and NHWC format whereas we only support NCHW

There are probably other edge cases and at some point we should probably test all the different edge cases.

Explicit caching operation

The cache operation allows for reuse of values between session runs by storing tensors in variables behind the scenes. This code was present in tensorspdz but has not been ported yet.

Note that the cache update strategy used in tensorspdz might not be fine-grained enough for future applications and should probably be reconsidered.

Unit testing framework

Make it easy to add and run unit tests; introduce some as well.

Clean up Int100Tensor by moving to CrtTensor

averagePooling2d SAME padding issue

Currently, AveragePooling2D works when the input is "tiled" by the kernel/stride settings, (i.e. a 2x2 kernel over a 4x4 input with stride 2), and this is true when padding is SAME or VALID. It also works for more general cases when padding is VALID. However, when padding is SAME and the pool_size requires zero padding in order to complete the last pooling patch, the behavior in TFE diverges from TF. This bug relates to how the count_include_pad argument works in pytorch.

In TFE, it's as if we've chosen count_include_pad=True, so that the zero padding gets included in our calculation of the average.
In TF, pooling with SAME padding works as if count_include_pad=False, so that the zero padding does not get included in the calculation of the average.

Ideally, we'd just switch, but our current AveragePooling2D implements its average by summing over the correct pooling sections, constructing a public/private/masked tensor from the result of that sum, and then multiplying with the inverse of pool_height*pool_width. However, this means that the zero padding (if it exists) must necessarily be included in the average.

The only way to fix it within the current paradigm (still doing a private-public multiply after the sum operation) would be to separate out the patches of the summed tensor that include zero padding, multiply those with the inverse of pool_height*pool_width - number_of_padded_zeros instead, and then recombine everything into the correct output tensor with padding not included. The major problem here is that we'd likely lose any speed we gain from using im2col in the first place, and a cleaner solution would be to do everything we need straight from the output of tf.extract_image_patches.

Definitely open to suggestions here! We should consider the trade-offs of overengineering this piece for complete compatibility versus documenting the discrepancy and advising users to stick with certain padding settings when doing pooling.
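The discrepancy described above can be reproduced in a few lines. This is a simplified 1D sketch in plain Python, not the TFE or TF implementation; the function name is hypothetical, and the padding arithmetic follows the usual SAME convention (output length is the input length divided by the stride, rounded up).

```python
import math


def avg_pool_1d_same(values, pool_size, stride, count_include_pad):
    """1D average pooling with SAME padding, with or without counting the pad."""
    n = len(values)
    out_len = math.ceil(n / stride)
    pad_total = max((out_len - 1) * stride + pool_size - n, 0)
    pad_before = pad_total // 2
    out = []
    for k in range(out_len):
        start = k * stride - pad_before
        # Keep only the real (non-padded) elements covered by this window.
        window = [values[i] for i in range(start, start + pool_size) if 0 <= i < n]
        divisor = pool_size if count_include_pad else len(window)
        out.append(sum(window) / divisor)
    return out


x = [1.0, 2.0, 3.0]
# Zeros in the padding are averaged in (the behaviour described for TFE).
print(avg_pool_1d_same(x, 2, 2, count_include_pad=True))   # [1.5, 1.5]
# Padding is ignored (the behaviour described for TF).
print(avg_pool_1d_same(x, 2, 2, count_include_pad=False))  # [1.5, 3.0]
```

The two variants differ only in the divisor of windows that touch the padding, which is why a fixed multiplication by the inverse of the pool size cannot reproduce the TF result.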

Unencrypted protocol

Implement a tfe.protocol.Unencrypted protocol for debugging purposes.

int64 native support in TF

Using int64 natively in TF would allow a substantial speed-up by leveraging the ring it creates.

We need to investigate how much work should be done to use int64 operations in TF.

related tensorflow/tensorflow#21253

Setup Continuous Integration

Tests, linters, etc., should run on every pull request before they get merged.

We could look at Circle CI or Travis CI to achieve this :)

How does tf-encrypted integrate with TF Serving?

TensorFlow Serving is the de facto serving environment for predicting on a model trained in TensorFlow.

We should figure out how this framework could/would integrate with that ecosystem component.

Remove unneeded dependencies on Pond

conv2d backward

Implement and test the backward function in Pond's conv2d layer. This function computes the backpropagated error d_x and the weight update d_w.

New Benchmark using tfe

Goals:

Benchmarking tfe computation time locally with 3 players using a 5 layers conv2d + sigmoid models
Benchmarking tfe computation time remotely with 3 players on AWS around North America with 5 layers conv2d + sigmoid

Train suite of models with Tensorflow to be benchmarked

Minor loss in precision

There is a minor loss in precision

cc @morgangiraud

Implement replicated secret sharing (From ABY3)

Load a pretrained model for prediction

Details to come

securing TF itself

As pointed out in this README: https://github.com/tensorflow/tensorflow/blob/master/SECURITY.md

TF is not secure by default because it is intended to run on a controlled cluster. We won't have control over the cluster obviously, so we need to find out how to actually secure TF efficiently.

Get logreg estimator up and running again

manual log reg in pond
wrap up in estimator again

Compiling graphs

Look into whether this gives any significant performance boosts:

Explicit typing

Currently there's a lot of type assertions in the code to fail fast in case the wrong type is being passed to a method. Most if not all of these could be replaced by type annotations and the mypy type checker.
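As a small illustration of the proposal, the two functions below do the same work; the first fails fast at runtime via assertions, while the second relies on annotations that a static checker such as mypy can verify at the call site. The function names and the scaling logic are hypothetical and not taken from the code base.

```python
from typing import List, Sequence


def scale_with_assertions(values, factor):
    # Runtime checks: a wrong type is caught only when this line executes.
    assert isinstance(values, (list, tuple)), type(values)
    assert isinstance(factor, float), type(factor)
    return [v * factor for v in values]


def scale_with_annotations(values: Sequence[float], factor: float) -> List[float]:
    # Same behaviour; a type checker can flag bad call sites before running.
    return [v * factor for v in values]


print(scale_with_assertions([1.0, 2.0], 2.0))   # [2.0, 4.0]
print(scale_with_annotations([1.0, 2.0], 2.0))  # [2.0, 4.0]
```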

One session run, multiple predictions

We are currently calling TF session.run() for each and every prediction. This is known to be inefficient.

In our case, it's a killer because TF spends some time optimizing the graph on each run, and this time can be quite large.

In production, served models should be loaded once and kept hot, listening for the next input to come. The goal here is to find an "in-between" solution to avoid tf.serving for now and yet be efficient with a python flask server.

Document project status

Right now the project is pre-alpha, it's pretty much a proof of concept! We don't have any guarantees around stability and it's far from a polished or working product.

We should advertise this fact in the README.

Split Pond conv2d into im2col and dot product

In the current implementation of conv2d in pond the im2col operation is not a node in our graph. When we split conv2d into two separate nodes, we can re-use the output of im2col in the backward phase, which is computationally more efficient.
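The split can be sketched on a single-channel image in plain Python. This is an illustrative toy (stride 1, no padding, hypothetical function names), not Pond's implementation; the point is that the output of `im2col` is a separate value that a backward pass could reuse.

```python
def im2col(image, kh, kw):
    """Flatten every kh x kw patch (stride 1, no padding) into one row."""
    h, w = len(image), len(image[0])
    return [
        [image[r + dr][c + dc] for dr in range(kh) for dc in range(kw)]
        for r in range(h - kh + 1)
        for c in range(w - kw + 1)
    ]


def conv2d(image, kernel):
    """Convolution (cross-correlation, as in TF) as im2col followed by a dot product."""
    kh, kw = len(kernel), len(kernel[0])
    patches = im2col(image, kh, kw)  # first node: reusable in the backward phase
    flat_kernel = [v for row in kernel for v in row]
    flat_out = [sum(p * k for p, k in zip(patch, flat_kernel)) for patch in patches]
    out_w = len(image[0]) - kw + 1
    return [flat_out[i:i + out_w] for i in range(0, len(flat_out), out_w)]


image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
kernel = [[1, 0], [0, 1]]
print(conv2d(image, kernel))  # [[6, 8], [12, 14]]
```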

Connection attempt never times out

If for some reason we can't connect to one or more nodes, the program hangs and never times out. Should we add a reasonable default timeout (say 30s)?

Allow options to be passed to GCP scripts

create should put $@ at the end to allow for eg passing in different machine types
change README to mention this in case CPU quota is reached
make 1 cpu default and suggest 4 cpu if allowed by account?
also change order in delete

PyPI Registration

Placeholder issue to register tf-encrypted with PyPI in the future for managing releases and pip installation. We've decided to hold off on this while we work on new features and figure out other UX/deployment issues first.

Python package

Make the library available as an easy to use package.

Support more configurations

Re-enable configuration support for non-local execution.

See for instance tensorspdz.

Easier Config objects

Finish conv2d

Some combinations for Pond's conv2d are missing.

Parse remote config objects files

Save and load tfe.RemoteConfig objects from (json) files for easier deployment of servers.

Use TFRecordDataset to send binary data (eg garbled circuits)?

Could we use TFRecordDataset to transfer garbled circuits from one host to another?

Implement SecureNN protocol

The 3/4-party SecureNN protocol claims good performance while also offering additional activation functions such as ReLU without switching to GC (using existing protocols that seem adaptable to the 2-party setting as well).

Flatten operation import conversion

To be able to import a flatten operation we need to be able to run the following operations on the underlying tensors:

Pack
Reshape
StridedSlice
Shape
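One way these four operations can combine into a flatten is sketched below at the level of shapes only. The composition (take the shape, slice out the batch dimension, pack it with -1, reshape) is an assumption about how such a subgraph is commonly laid out, and the helper functions are simplified stand-ins rather than the TF ops themselves.

```python
from math import prod  # available from Python 3.8


def shape(dims):
    """Shape: return the dimensions as a list."""
    return list(dims)


def strided_slice(values, begin, end, stride=1):
    """StridedSlice: pick values[begin:end:stride]."""
    return values[begin:end:stride]


def pack(values):
    """Pack: stack scalars into a single shape vector."""
    return list(values)


def reshape(dims, target):
    """Reshape, shape-level only: resolve a single -1 entry in the target."""
    known = prod(d for d in target if d != -1)
    return [prod(dims) // known if d == -1 else d for d in target]


input_dims = (32, 7, 7, 64)
batch = strided_slice(shape(input_dims), 0, 1)[0]   # 32
flat_dims = reshape(input_dims, pack([batch, -1]))
print(flat_dims)  # [32, 3136]
```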

64bit tensors with natural mod reductions

For some applications 64 bits might be enough, in which case we can not only avoid the use of the CRT but also potentially switch to "native modulo reductions" in the form of natural wrap-arounds as opposed to explicit % operations. This combined setting is for instance used in SecureML.
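For context, the sketch below contrasts the two approaches. The CRT part represents a number by its residues modulo several small coprime moduli and multiplies component-wise; the 64-bit part emulates natural wrap-around with a bit mask, since Python integers do not overflow. The moduli are tiny illustrative values, not the ones used by TFE.

```python
from math import prod

MODULI = (89, 97, 101)  # small pairwise coprime moduli (illustrative only)
M = prod(MODULI)        # values are represented modulo M = 871933


def decompose(x):
    """Represent x by its residue modulo each modulus."""
    return tuple(x % m for m in MODULI)


def mul(a, b):
    """Multiply component-wise; each component stays small."""
    return tuple((x * y) % m for x, y, m in zip(a, b, MODULI))


def recombine(residues):
    """Chinese remainder theorem reconstruction modulo M."""
    total = 0
    for r, m in zip(residues, MODULI):
        partial = M // m
        total += r * partial * pow(partial, -1, m)  # modular inverse (Python 3.8+)
    return total % M


a, b = 123, 456
print(recombine(mul(decompose(a), decompose(b))))  # 56088

# 64-bit alternative: reduction is just truncation to the low 64 bits.
MASK64 = 2**64 - 1
big = (2**63 + 5) * 4
print(big & MASK64 == big % 2**64)  # True
```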

Setup linting

We should add a linter to the code base so our code is conformant!

support transpose conversion without perm specified

Transpose without a perm specified is a little more complicated and requires another op conversion to be implemented. This op is Rank, which simply returns the number of dimensions of the tensor.
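The reason Rank is needed can be seen from how the default permutation is derived: when perm is omitted, TensorFlow reverses the dimensions, so the permutation is (n-1, ..., 0) where n is the rank. The sketch below works on shapes only, and the helper names are hypothetical.

```python
def rank(dims):
    """Rank: the number of dimensions of a tensor with this shape."""
    return len(dims)


def default_perm(dims):
    """Permutation used when perm is not specified: reverse all axes."""
    return list(range(rank(dims) - 1, -1, -1))


def transposed_shape(dims, perm=None):
    """Shape of the transposed tensor for a given (or default) perm."""
    if perm is None:
        perm = default_perm(dims)
    return [dims[p] for p in perm]


print(default_perm((2, 3, 4)))                      # [2, 1, 0]
print(transposed_shape((2, 3, 4)))                  # [4, 3, 2]
print(transposed_shape((2, 3, 4), perm=[0, 2, 1]))  # [2, 4, 3]
```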

Curated onboarding experience for contributors

We want to make it easy for anyone to get up and running with tf-encrypted and help them contribute to the project.

It'd be nice for us to setup the following:

Write up contributor guidelines including how to get started with development, expectations around issue management, pull requests templates, etc.
A code of conduct which states our expectation for how everyone who works on the project interacts with one another.
A slew of issues labeled as help wanted / good first issue that are clear, concise, and actionable to make it easy for someone to jump and start moving the ball forward.

Those are a few of the things we can do to create and improve the onboarding experience for contributors to the project.

I think these things would make it easier for us to do our work as well!

Easy deployment to AWS/GCP

Make it easy to quickly get up and running on AWS or GCP. Could take the form of documentation or code (perhaps along these lines)

License file

Add Apache2 license file

improve linting

We currently satisfy the pycodestyle linter, but we would like to check for more kinds of errors, so we are adding pyflakes.

Many other tools could also be used like flake8 or pylint.

It would be interesting to know if those tools are complementary or redundant.
My current understanding is

pyflakes and flake8 are redundant, but they are complementary with pycodestyle and pylint.
pylint seems very pedantic and might be too much for a linter

to_native not working in some cases

Seems like to_native on int100 tensor isn't working in some cases.

@jvmancuso ran into this issue and I did as well, not 100% sure what causes it because it generally works.

this can reproduce it, though

input = np.array((1,0,1,0, 0,1,0,1, 1,0,1,0, 0,1,0,1)).reshape(1, 4, 4, 1)
pool_input = prot.define_public_variable(input)
x = pool_input.reshape(1, 1, 4, 4)
x = x.im2col(whatever)
print(f'wow! {x}') # this will call `__repr__` which will call `to_native`

First steps for neural networks

Implement Keras-like layers and models to easily express (sequential) neural networks along the lines of what's done in pond.

Add additional matching example.

Not responding to signal handlers

You can't Ctrl+C or Ctrl+D (send sigint or similar) to a running tf-encrypted process and have it exit and cancel the job.