tensorflow / privacy
Library for training machine learning models with privacy for training data
License: Apache License 2.0
Running the default tutorial with no arguments under Python 3.5 and TF 1.12 gives the following error:
File "tutorials/mnist_dpsgd_tutorial.py", line 184, in <module>
tf.app.run()
File "/home/ncarlini/.local/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 125, in run
_sys.exit(main(argv))
File "tutorials/mnist_dpsgd_tutorial.py", line 169, in main
mnist_classifier.train(input_fn=train_input_fn, steps=steps_per_epoch)
File "/home/ncarlini/.local/lib/python3.5/site-packages/tensorflow/python/estimator/estimator.py", line 354, in train
loss = self._train_model(input_fn, hooks, saving_listeners)
File "/home/ncarlini/.local/lib/python3.5/site-packages/tensorflow/python/estimator/estimator.py", line 1207, in _train_model
return self._train_model_default(input_fn, hooks, saving_listeners)
File "/home/ncarlini/.local/lib/python3.5/site-packages/tensorflow/python/estimator/estimator.py", line 1237, in _train_model_default
features, labels, model_fn_lib.ModeKeys.TRAIN, self.config)
File "/home/ncarlini/.local/lib/python3.5/site-packages/tensorflow/python/estimator/estimator.py", line 1195, in _call_model_fn
model_fn_results = self._model_fn(features=features, **kwargs)
File "tutorials/mnist_dpsgd_tutorial.py", line 86, in cnn_model_fn
train_op = optimizer.minimize(loss=vector_loss, global_step=global_step)
File "/home/ncarlini/.local/lib/python3.5/site-packages/tensorflow/python/training/optimizer.py", line 410, in minimize
name=name)
File "/home/ncarlini/.local/lib/python3.5/site-packages/tensorflow/python/training/optimizer.py", line 570, in apply_gradients
raise ValueError("No variables provided.")
ValueError: No variables provided.
Congrats on today's launch! I'm involved in a project to generate a synthetic version of a restricted dataset, to preserve privacy of the original data. A few months ago, we reviewed some of the research around differential privacy and synthetic data, and determined that it was too nascent to try to apply. Instead we've been synthesizing the dataset using simpler ML models like random forests to sequentially sample from the conditional distribution (our data is flat, and basically only has continuous features).
However, TF Privacy hitting the shelves seems like an advance worth reconsidering that decision for. Has TF Privacy been used for synthesizing data before? To do it the way we've done it so far, we'd need to be able to predict quantiles, which we've seen some ways to do (like this blog post), and TF Probability seems promising too. I've used TF a bit, but we're not experts by any means, so we'd appreciate any guidance you can offer. Thanks!
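One way to predict quantiles with a TF model is a pinball (quantile) loss; here is a minimal sketch, purely illustrative and not part of TF Privacy (the layer sizes, optimizer, and quantile level are placeholders):

import tensorflow as tf

def pinball_loss(tau):
  def loss(y_true, y_pred):
    # Penalize under-prediction with weight tau and over-prediction with
    # 1 - tau; the minimizer of this loss is the tau-th conditional quantile.
    err = y_true - y_pred
    return tf.reduce_mean(tf.maximum(tau * err, (tau - 1.0) * err))
  return loss

model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
model.compile(optimizer='sgd', loss=pinball_loss(0.9))  # fits the 90th percentile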
I know you fixed the tf.nest problem. But tf 2.0 also removed tf.train.AdagradOptimizer. Could you please do an update for this one as well?
here is the error log:
Traceback (most recent call last):
File "mnist_dpsgd_tutorial_keras.py", line 55, in
from privacy.optimizers.dp_optimizer import DPGradientDescentOptimizer
File "/data/repositories/privacy/privacy/optimizers/dp_optimizer.py", line 187, in
DPAdagradOptimizer = make_optimizer_class(tf.train.AdagradOptimizer)
AttributeError: module 'tensorflow._api.v2.train' has no attribute 'AdagradOptimizer'
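A workaround sketch, mirroring the version guard the tutorials already use for GradientDescentOptimizer (tf.optimizers.Adagrad is the TF 2.x counterpart; this is a suggestion, not a committed fix):

from distutils.version import LooseVersion
import tensorflow as tf

# Alias the right Adagrad class depending on the installed TF version.
if LooseVersion(tf.__version__) < LooseVersion('2.0.0'):
  AdagradOptimizer = tf.train.AdagradOptimizer
else:
  AdagradOptimizer = tf.optimizers.Adagrad  # pylint: disable=invalid-name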
Hi all,
The code seems to implement Abadi et al.'s work. However, the way you compute the overall epsilon does not follow Abadi's paper. I was wondering if you could give me some references about how you calculated the epsilon.
Thanks,
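For context, a sketch of how the accountant in this repo is invoked (based on the functions imported elsewhere in these issues): it accumulates Rényi DP (RDP) across steps and then converts to (epsilon, delta)-DP, which differs in presentation from the moments accountant in Abadi et al. even though the two are closely related. All parameter values below are placeholders:

from privacy.analysis.rdp_accountant import compute_rdp, get_privacy_spent

orders = [1 + x / 10. for x in range(1, 100)] + list(range(12, 64))
rdp = compute_rdp(q=256 / 60000,          # per-step sampling probability (placeholder)
                  noise_multiplier=1.1,   # placeholder
                  steps=10000,            # placeholder
                  orders=orders)
eps = get_privacy_spent(orders, rdp, target_delta=1e-5)[0]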
4 hours ago a change was made to privacy/privacy/__init__.py. I'm using Google Colab for a project, and this change leads to a module error: 'No module named "tensorflow_privacy"'. Removing all the "tensorflow_privacy." prefixes corrects it.
Because the Optimizer class was moved in TF 2 and no longer has a compute_gradients method, the check for an unchanged compute_gradients no longer works on tf2.
This will most likely mean we have to make a few changes to the calls we make (e.g., when we call compute_gradients for each microbatch). I haven't investigated this yet.
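Roughly the pattern of the check in question (a sketch from memory, not the repo's exact code): the wrapper detects whether the wrapped optimizer overrides compute_gradients, since DP-SGD replaces that method wholesale. Under TF 2 the Keras optimizer base class defines no compute_gradients at all, so the attribute lookups below fail:

import tensorflow as tf

def has_custom_compute_gradients(cls):
  # Works under TF 1.x; under TF 2, tf.train.Optimizer is gone and Keras
  # optimizers have no compute_gradients, so these lookups raise.
  parent_code = tf.train.Optimizer.compute_gradients.__code__
  return cls.compute_gradients.__code__ is not parent_code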
I implemented DPGradientDescentGaussianOptimizer in PyTorch; however, I cannot reach accuracies similar to those mentioned in the tutorial. Using the default parameters in the current file I get about 94% accuracy; after tuning the parameters I got about 95% at most. Do you have any idea why this is happening?
I tried to modify "tensorflow/python/keras/engine/training_utils.py" according to the keras tutorial:

def get_loss_function(loss):
  ...
  name=loss_fn.__name__,
  reduction=losses_impl.Reduction.NONE)

I found out that only tensorflow > 1.12 has the function get_loss_function(loss), and only tensorflow 2.0 has the line of code "return losses.LossFunctionWrapper(loss_fn, name=loss_fn.__name__)". So I guess the tutorial uses tensorflow 2.0.0?
But when importing privacy.analysis.privacy_ledger, there are lots of errors, like "nest = tf.contrib.framework.nest", since tf 2.0 does not have the contrib module.
My question is: which tensorflow version should I use if I want to use keras?
Thanks
I have just run the mnist_dpsgd_tutorial_keras.py file and I got this error.
kindly reply asap.
thanks in advance
InvalidArgumentError Traceback (most recent call last)
in ()
----> 1 tf.app.run()
c:\users\tom-16s1\appdata\local\programs\python\python36\lib\site-packages\tensorflow\python\platform\app.py in run(main, argv)
123 # Call the main function, passing through any arguments
124 # to the final program.
--> 125 _sys.exit(main(argv))
126
in main(unused_argv)
49 epochs=FLAGS.epochs,
50 validation_data=(test_data, test_labels),
---> 51 batch_size=FLAGS.batch_size)
52
53 # Compute the privacy budget expended.
c:\users\tom-16s1\appdata\local\programs\python\python36\lib\site-packages\tensorflow\python\keras\engine\training.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, max_queue_size, workers, use_multiprocessing, **kwargs)
878 initial_epoch=initial_epoch,
879 steps_per_epoch=steps_per_epoch,
--> 880 validation_steps=validation_steps)
881
882 def evaluate(self,
c:\users\tom-16s1\appdata\local\programs\python\python36\lib\site-packages\tensorflow\python\keras\engine\training_arrays.py in model_iteration(model, inputs, targets, sample_weights, batch_size, epochs, verbose, callbacks, val_inputs, val_targets, val_sample_weights, shuffle, initial_epoch, steps_per_epoch, validation_steps, mode, validation_in_fit, **kwargs)
327
328 # Get outputs.
--> 329 batch_outs = f(ins_batch)
330 if not isinstance(batch_outs, list):
331 batch_outs = [batch_outs]
c:\users\tom-16s1\appdata\local\programs\python\python36\lib\site-packages\tensorflow\python\keras\backend.py in __call__(self, inputs)
3074
3075 fetched = self._callable_fn(*array_vals,
-> 3076 run_metadata=self.run_metadata)
3077 self._call_fetch_callbacks(fetched[-len(self._fetches):])
3078 return nest.pack_sequence_as(self._outputs_structure,
c:\users\tom-16s1\appdata\local\programs\python\python36\lib\site-packages\tensorflow\python\client\session.py in __call__(self, *args, **kwargs)
1437 ret = tf_session.TF_SessionRunCallable(
1438 self._session._session, self._handle, args, status,
-> 1439 run_metadata_ptr)
1440 if run_metadata:
1441 proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)
c:\users\tom-16s1\appdata\local\programs\python\python36\lib\site-packages\tensorflow\python\framework\errors_impl.py in __exit__(self, type_arg, value_arg, traceback_arg)
526 None, None,
527 compat.as_text(c_api.TF_Message(self.status.status)),
--> 528 c_api.TF_GetCode(self.status.status))
529 # Delete the underlying status object from memory otherwise it stays alive
530 # as there is a reference to status from this from the traceback due to
InvalidArgumentError: logits and labels must have the same first dimension, got logits shape [250,10] and labels shape [2500]
[[{{node loss/dense_1_loss/CategoricalCrossentropy/SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits}}]]
In pate_2017, when I try to train the teachers, an error occurs: in deep_cnn.py there is an import "from differential_privacy.multiple_teachers import utils". I think it should be changed to "from pate_2017 import utils" since the folders have changed. If I am wrong, please let me know, thanks.
When I run mnist_dpsgd_tutorial_keras.py, I get this error: "ValueError: Dimension size must be evenly divisible by 250 but is 1 for 'training/TFOptimizer/Reshape' (op: 'Reshape') with input shapes: [], [2] and with input tensors computed as partial shapes: input[1] = [250,?]."
Hi all,
I'm training tensorflow privacy on the APS dataset.
The code ran without error in non-private mode.
But when I set the dpsgd flag to True, it shows an error. I think it must be something with the vector_loss, which is the only difference between the two.
The errors are listed below:
Traceback (most recent call last):
File "aps_log_reg.py", line 210, in
app.run(main)
File "/Users/rachelton/Library/Python/3.7/lib/python/site-packages/absl/app.py", line 300, in run
_run_main(main, args)
File "/Users/rachelton/Library/Python/3.7/lib/python/site-packages/absl/app.py", line 251, in _run_main
sys.exit(main(argv))
File "aps_log_reg.py", line 198, in main
model.train(input_fn, steps=step_per_epoch)
File "/Users/rachelton/Library/Python/3.7/lib/python/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 358, in train
loss = self._train_model(input_fn, hooks, saving_listeners)
File "/Users/rachelton/Library/Python/3.7/lib/python/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1124, in _train_model
return self._train_model_default(input_fn, hooks, saving_listeners)
File "/Users/rachelton/Library/Python/3.7/lib/python/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1154, in _train_model_default
features, labels, model_fn_lib.ModeKeys.TRAIN, self.config)
File "/Users/rachelton/Library/Python/3.7/lib/python/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1112, in _call_model_fn
model_fn_results = self._model_fn(features=features, **kwargs)
File "aps_log_reg.py", line 145, in model_fn
global_step=global_step)
File "/Users/rachelton/Library/Python/3.7/lib/python/site-packages/tensorflow/python/training/optimizer.py", line 403, in minimize
grad_loss=grad_loss)
File "/Users/rachelton/DP/privacy/privacy/optimizers/dp_optimizer.py", line 170, in compute_gradients
_, sample_state = tf.while_loop(cond_fn, body_fn, [idx, sample_state])
File "/Users/rachelton/Library/Python/3.7/lib/python/site-packages/tensorflow/python/ops/control_flow_ops.py", line 3556, in while_loop
return_same_structure)
File "/Users/rachelton/Library/Python/3.7/lib/python/site-packages/tensorflow/python/ops/control_flow_ops.py", line 3087, in BuildLoop
pred, body, original_loop_vars, loop_vars, shape_invariants)
File "/Users/rachelton/Library/Python/3.7/lib/python/site-packages/tensorflow/python/ops/control_flow_ops.py", line 3022, in _BuildLoop
body_result = body(*packed_vars_for_body)
File "/Users/rachelton/DP/privacy/privacy/optimizers/dp_optimizer.py", line 168, in
body_fn = lambda i, state: [tf.add(i, 1), process_microbatch(i, state)] # pylint: disable=line-too-long
File "/Users/rachelton/DP/privacy/privacy/optimizers/dp_optimizer.py", line 149, in process_microbatch
sample_params, sample_state, grads_list)
File "/Users/rachelton/DP/privacy/privacy/dp_query/dp_query.py", line 159, in accumulate_record
preprocessed_record = self.preprocess_record(params, record)
File "/Users/rachelton/DP/privacy/privacy/analysis/privacy_ledger.py", line 250, in preprocess_record
return self._query.preprocess_record(params, record)
File "/Users/rachelton/DP/privacy/privacy/dp_query/normalized_query.py", line 74, in preprocess_record
return self._numerator.preprocess_record(params, record)
File "/Users/rachelton/DP/privacy/privacy/dp_query/gaussian_query.py", line 100, in preprocess_record
preprocessed_record, _ = self.preprocess_record_impl(params, record)
File "/Users/rachelton/DP/privacy/privacy/dp_query/gaussian_query.py", line 96, in preprocess_record_impl
clipped_as_list, norm = tf.clip_by_global_norm(record_as_list, l2_norm_clip)
File "/Users/rachelton/Library/Python/3.7/lib/python/site-packages/tensorflow/python/ops/clip_ops.py", line 278, in clip_by_global_norm
constant_op.constant(1.0, dtype=use_norm.dtype) / clip_norm)
File "/Users/rachelton/Library/Python/3.7/lib/python/site-packages/tensorflow/python/ops/math_ops.py", line 812, in binary_op_wrapper
return func(x, y, name=name)
File "/Users/rachelton/Library/Python/3.7/lib/python/site-packages/tensorflow/python/ops/math_ops.py", line 912, in _truediv_python3
(x_dtype, y_dtype))
TypeError: x and y must have the same dtype, got tf.float64 != tf.float32
my code:
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from absl import app
from absl import flags
from distutils.version import LooseVersion
from privacy.analysis import privacy_ledger
from privacy.analysis.rdp_accountant import compute_rdp_from_ledger
from privacy.analysis.rdp_accountant import get_privacy_spent
from privacy.optimizers import dp_optimizer
import matplotlib.pyplot as plt
import tensorflow as tf
import pandas as pd
import numpy as np
import seaborn as sns
from sklearn.preprocessing import MaxAbsScaler
if LooseVersion(tf.__version__) < LooseVersion('2.0.0'):
  GradientDescentOptimizer = tf.train.GradientDescentOptimizer
else:
  GradientDescentOptimizer = tf.optimizers.SGD  # pylint: disable=invalid-name
FLAGS = flags.FLAGS
flags.DEFINE_boolean(
    'dpsgd', True, 'If True, train with DP-SGD. If False, '
    'train with vanilla SGD.')
flags.DEFINE_float('learning_rate', 0.1, 'Learning rate for training')
flags.DEFINE_float('noise_multiplier', 1.1,
                   'Ratio of the standard deviation to the clipping norm')
flags.DEFINE_float('l2_norm_clip', 1.0, 'Clipping norm')
flags.DEFINE_integer('batch_size', 128, 'Batch size')
flags.DEFINE_integer('epochs', 1, 'Number of epochs')
flags.DEFINE_integer('num_classes', 2, 'Number of classes')
flags.DEFINE_integer('microbatches', 128, 'Number of microbatches '
                     '(must evenly divide batch_size)')
flags.DEFINE_string('model_dir', None, 'Model directory')
class EpsilonPrintingTrainingHook(tf.estimator.SessionRunHook):
  """Training hook to print current value of epsilon after an epoch."""

  def __init__(self, ledger):
    """Initializes the EpsilonPrintingTrainingHook.

    Args:
      ledger: The privacy ledger.
    """
    self._samples, self._queries = ledger.get_unformatted_ledger()

  def end(self, session):
    orders = [1 + x / 10.0 for x in range(1, 100)] + list(range(12, 64))
    samples = session.run(self._samples)
    queries = session.run(self._queries)
    formatted_ledger = privacy_ledger.format_ledger(samples, queries)
    rdp = compute_rdp_from_ledger(formatted_ledger, orders)
    eps = get_privacy_spent(orders, rdp, target_delta=1e-5)[0]
    print('For delta=1e-5, the current epsilon is: %.2f' % eps)
def get_data():
  df_train = pd.read_csv('data_original/aps_failure_training_set.csv')
  df_test = pd.read_csv('data_original/aps_failure_test_set.csv')
  df_train.replace('na', '-1', inplace=True)
  df_test.replace('na', '-1', inplace=True)
  # categorical for label: 0: neg, 1: pos
  df_train['class'] = pd.Categorical(df_train['class']).codes
  df_test['class'] = pd.Categorical(df_test['class']).codes
  # split data into x and y
  Y_train = df_train['class'].copy(deep=True)
  X_train = df_train.copy(deep=True)
  X_train.drop(['class'], inplace=True, axis=1)
  Y_test = df_test['class'].copy(deep=True)
  X_test = df_test.copy(deep=True)
  X_test.drop(['class'], inplace=True, axis=1)
  # strings to float
  X_train = X_train.astype('float64')
  X_test = X_test.astype('float64')
  # scale the dataset
  scaler = MaxAbsScaler()
  scaler.fit(X_train)
  X_train = scaler.transform(X_train)
  X_test = scaler.transform(X_test)
  return X_train, Y_train, X_test, Y_test
def linear_layer(x_dict):
  x = x_dict['images']
  out_layer = tf.keras.layers.Dense(FLAGS.num_classes).apply(x)
  return out_layer
def model_fn(features, labels, mode):
  logits = linear_layer(features)
  # vector loss: each component of the vector corresponds to an individual
  # training point and label. Used for the per-example gradients later.
  vector_loss = tf.nn.sparse_softmax_cross_entropy_with_logits(
      logits=logits, labels=tf.cast(labels, dtype=tf.int64))
  scalar_loss = tf.reduce_mean(vector_loss)
  print('*******************')
  print(vector_loss.dtype)
  print(scalar_loss.dtype)
  if mode == tf.estimator.ModeKeys.TRAIN:
    if FLAGS.dpsgd:
      ledger = privacy_ledger.PrivacyLedger(
          population_size=60000,
          selection_probability=(FLAGS.batch_size / 60000))
      optimizer = dp_optimizer.DPGradientDescentGaussianOptimizer(
          l2_norm_clip=FLAGS.l2_norm_clip,
          noise_multiplier=FLAGS.noise_multiplier,
          num_microbatches=FLAGS.microbatches,
          ledger=ledger,
          learning_rate=FLAGS.learning_rate)
      training_hooks = [
          EpsilonPrintingTrainingHook(ledger)
      ]
      opt_loss = vector_loss
    else:
      optimizer = tf.train.GradientDescentOptimizer(
          learning_rate=FLAGS.learning_rate)
      opt_loss = scalar_loss
      training_hooks = []
    global_step = tf.train.get_global_step()
    train_op = optimizer.minimize(loss=opt_loss, global_step=global_step)
    return tf.estimator.EstimatorSpec(mode=mode,
                                      loss=scalar_loss,
                                      train_op=train_op,
                                      training_hooks=training_hooks)
  elif mode == tf.estimator.ModeKeys.EVAL:
    pred_classes = tf.argmax(logits, axis=1)
    acc_op = tf.metrics.accuracy(labels=labels, predictions=pred_classes)
    return tf.estimator.EstimatorSpec(mode=mode,
                                      loss=scalar_loss,
                                      eval_metric_ops={'accuracy': acc_op})
  # if mode == tf.estimator.ModeKeys.PREDICT:
def main(unused_argv):
  tf.logging.set_verbosity(tf.logging.INFO)
  if FLAGS.dpsgd and FLAGS.batch_size % FLAGS.microbatches != 0:
    raise ValueError('Number of microbatches should divide evenly batch_size')
  # get data: train_data, train_label, test_data, test_label
  x_train, y_train, x_test, y_test = get_data()
  # print(x_train.shape, y_train.shape, x_test.shape, y_test.shape)
  # Init estimator: model_fn, model_dir
  model = tf.estimator.Estimator(model_fn)
  # define train input
  input_fn = tf.estimator.inputs.numpy_input_fn(
      x={'images': x_train},
      y=y_train,
      batch_size=FLAGS.batch_size,
      num_epochs=None,
      shuffle=True)
  eval_input_fn = tf.estimator.inputs.numpy_input_fn(
      x={'images': x_test},
      y=y_test,
      batch_size=FLAGS.batch_size,
      shuffle=False)
  step_per_epoch = 60000 // FLAGS.batch_size
  # train model on train input
  for epoch in range(FLAGS.epochs):
    model.train(input_fn, steps=step_per_epoch)

if __name__ == "__main__":
  app.run(main)
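A likely fix for the dtype error above, judging from the traceback (an assumption, not a confirmed repo fix): get_data casts the features to float64, so the gradients come out float64, while the l2_norm_clip used inside tf.clip_by_global_norm is float32. Keeping everything in float32 should make the dtypes agree:

# In get_data(), cast to float32 instead of float64 so the gradients match
# the float32 clipping norm inside tf.clip_by_global_norm.
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')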
It'd be useful to have the first few lines of the tutorial's output and include some target accuracy numbers.
It makes no difference to the diagnostic message (below) whether the training is done on CPU vs GPU.
INFO:tensorflow:loss = 3.3560193, step = 0
ERROR:tensorflow:Model diverged with loss = NaN.
...
tensorflow.python.training.basic_session_run_hooks.NanLossDuringTrainingError: NaN loss during training.
rdp_accountant.py contains the assert statement assert isinstance(alpha, (int, long)). However, Python 3 doesn't have a long type anymore.
Error message:
Traceback (most recent call last):
File "mnist_dpsgd_tutorial.py", line 184, in <module>
tf.app.run()
File "/home/leon/anaconda3/envs/privacy2/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 125, in run
_sys.exit(main(argv))
File "mnist_dpsgd_tutorial.py", line 178, in main
eps = compute_epsilon(epoch * steps_per_epoch)
File "mnist_dpsgd_tutorial.py", line 161, in compute_epsilon
orders=orders)
File "/home/leon/Ferrum/machine_learning_prototypes/privacy_prototype/privacy/privacy/analysis/rdp_accountant.py", line 261, in compute_rdp
for order in orders])
File "/home/leon/Ferrum/machine_learning_prototypes/privacy_prototype/privacy/privacy/analysis/rdp_accountant.py", line 261, in <listcomp>
for order in orders])
File "/home/leon/Ferrum/machine_learning_prototypes/privacy_prototype/privacy/privacy/analysis/rdp_accountant.py", line 240, in _compute_rdp
return _compute_log_a(q, sigma, alpha) / (alpha - 1)
File "/home/leon/Ferrum/machine_learning_prototypes/privacy_prototype/privacy/privacy/analysis/rdp_accountant.py", line 145, in _compute_log_a
return _compute_log_a_int(q, sigma, int(alpha))
File "/home/leon/Ferrum/machine_learning_prototypes/privacy_prototype/privacy/privacy/analysis/rdp_accountant.py", line 89, in _compute_log_a_int
assert isinstance(alpha, (int, long))
NameError: name 'long' is not defined
Edit: Looks like there's a PR for this: #1
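A minimal sketch of the obvious fix (the linked PR may do it differently): Python 3 folded long into int, so checking int alone suffices.

assert isinstance(alpha, int)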
I have run mnist_dpsgd_tutorial_keras.py without changing any parameters except setting epochs to 1, but the result looks like "3000/3000 [==============================] - 153s 51ms/sample - loss: 2.1090 - acc: 0.0000e+00", and the test accuracy is 10%. If I train only on digits 0 and 1, the training accuracy is around 80%, but if I train on other digits, the training accuracy is 0.
I'm working through the tutorial in walkthrough.md, and there's currently no explanation of how 'orders' are generated in the section where the RDP is computed. I'd appreciate it if you could add a definition of 'orders' and how to calculate them for new datasets. Thanks!
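For reference, the MNIST tutorials in this repo build the grid of Rényi orders like this (a dense grid just above 1, plus the integers from 12 to 63); whether this grid suits a new dataset is exactly what the walkthrough should explain:

orders = [1 + x / 10. for x in range(1, 100)] + list(range(12, 64))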
We are waiting for Keras' compile method to accept a non-scalar loss so the corresponding tutorial can run without making changes to the tensorflow library.
See b/124011218
There is a new issue after applying the fix in #4. After finishing the first epoch:
Test accuracy after 1 epochs is: 0.715
Traceback (most recent call last):
File "mnist_dpsgd_tutorial.py", line 184, in <module>
tf.app.run()
File "/home/leon/anaconda3/envs/privacy2/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 125, in run
_sys.exit(main(argv))
File "mnist_dpsgd_tutorial.py", line 178, in main
eps = compute_epsilon(epoch * steps_per_epoch)
File "mnist_dpsgd_tutorial.py", line 156, in compute_epsilon
orders = [1 + x / 10. for x in range(1, 100)] + range(12, 64)
TypeError: can only concatenate list (not "range") to list
Edit: There's a PR for this: #3
As far as I know, pip install requires a setup.py file.
Running the command given in the "Installing TensorFlow Privacy" section (pip install -e ./privacy) throws the following error:
Directory './privacy' is not installable. File 'setup.py' not found.
Am I missing something here?
Does TF Privacy support tree-based classification models, for example tf.estimator.BoostedTreesClassifier? As per my understanding, the existing DP optimizers are all gradient-descent based.
I have a question which might be a bit naive. Is the epsilon calculated using the moments accountant the privacy guarantee of the full model, or only of a single entry of the weight matrix? I have read Abadi et al., but I couldn't find where the privacy costs of each of the noisily updated weights are composed (not across time steps, but across the model). Since computing a label for a new datapoint would require the release of the full model, will the privacy guarantee be a product of the calculated epsilon and the number of parameters of the model?
Hi all,
I was wondering whether there is a reference for the Rényi DP accountant, more specifically the code applied when sub-sampling is involved? I saw issue #13 and was wondering whether the paper has been released yet. If not, would it be possible to share a draft?
Best,
Replace softmax_cross_entropy_with_logits_v2 in cnn_model_fn (mnist_vizier.py) with sparse_softmax_cross_entropy_with_logits.
It does not change the numerical value of the loss function, but it simplifies the code (no need to transform labels into one-hot vectors) and it is conceptually a better fit for the task at hand (since the classes are disjoint).
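A sketch of the proposed change (the exact call site in mnist_vizier.py may differ; labels are assumed to be integer class ids and the class count a placeholder):

# Before: requires one-hot labels.
vector_loss = tf.nn.softmax_cross_entropy_with_logits_v2(
    labels=tf.one_hot(labels, depth=10), logits=logits)

# After: takes integer labels directly; same numerical value, simpler code.
vector_loss = tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=labels, logits=logits)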
What is a 'Query'? It sounds like a database operation; is it used to accumulate gradients? Also, in the GaussianSumQuery class there is a variable called 'global_state', but during debugging it is always None. Why is that?
Some algorithms like DP Federated Averaging need to support weighted DP means. The "weight" could be an argument to preprocess_record, but not all DPQueries need to support weighted records. An abstract subclass of DPQuery called DPAverageQuery could add that argument. As another advantage, algorithms that require an average could check the type of the DPQuery to ensure that it is a subclass of DPAverageQuery.
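A minimal sketch of the proposed abstract subclass (the method signature is an assumption; the proposal above only names the class and the extra argument):

import abc
from privacy.dp_query import dp_query

class DPAverageQuery(dp_query.DPQuery, metaclass=abc.ABCMeta):
  """DPQuery whose records carry a weight, for weighted DP means."""

  @abc.abstractmethod
  def preprocess_record(self, params, record, weight=1.0):
    """Preprocesses a single weighted record."""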
Sorry, it's probably a stupid question. If I understand correctly, reducing the number of microbatches from 256 to 32 will result in the total norm of the sum being less than 32S, whereas for 256 microbatches it would be less than 256S. Applying the same amount of noise zS to a vector of smaller norm results in poorer performance. Can't we treat the gradient of each microbatch as a sum of gradients for k individual images, and therefore clip it by kS, instead of taking the mean and clipping it by S?
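In code, the suggestion would look something like this (a sketch of the idea, not the library's API; all values below are illustrative stand-ins):

import tensorflow as tf

batch_size, num_microbatches, l2_norm_clip = 256, 32, 1.0  # example values
k = batch_size // num_microbatches  # examples per microbatch
microbatch_grads = [tf.ones([10])]  # stand-in for one microbatch's gradients
# Clip the summed microbatch gradient to k * S, matching the bound on a sum
# of k per-example gradients, rather than clipping the microbatch mean to S.
clipped_grads, _ = tf.clip_by_global_norm(microbatch_grads, k * l2_norm_clip)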
@npapernot Hi, I'm trying to run the pate_2017 code.
I have successfully trained the teacher models, but when I train the student model using the command "python train_student.py --nb_teachers=100 --dataset=mnist --stdnt_share=5000", I get an error like this:
Traceback (most recent call last):
File "train_student.py", line 208, in
tf.app.run()
File "C:\Users\eleva\AppData\Local\conda\conda\envs\PATE\lib\site-packages\tensorflow\python\platform\app.py", line 125, in run
_sys.exit(main(argv))
File "train_student.py", line 205, in main
assert train_student(FLAGS.dataset, FLAGS.nb_teachers)
File "train_student.py", line 177, in train_student
stdnt_dataset = prepare_student_data(dataset, nb_teachers, save=True)
File "train_student.py", line 111, in prepare_student_data
test_data, test_labels = input.ld_mnist(test_only=True)
File "C:\Users\eleva\privacy\research\pate_2017\input.py", line 386, in ld_mnist
train_data = extract_mnist_data(local_urls[0], 60000, 28, 1)
File "C:\Users\eleva\privacy\research\pate_2017\input.py", line 274, in extract_mnist_data
return np.load(file_obj)
File "C:\Users\eleva\AppData\Local\conda\conda\envs\PATE\lib\site-packages\numpy\lib\npyio.py", line 416, in load
magic = fid.read(N)
File "C:\Users\eleva\AppData\Local\conda\conda\envs\PATE\lib\site-packages\tensorflow\python\lib\io\file_io.py", line 132, in read
pywrap_tensorflow.ReadFromStream(self._read_buf, length, status))
File "C:\Users\eleva\AppData\Local\conda\conda\envs\PATE\lib\site-packages\tensorflow\python\lib\io\file_io.py", line 100, in _prepare_value
return compat.as_str_any(val)
File "C:\Users\eleva\AppData\Local\conda\conda\envs\PATE\lib\site-packages\tensorflow\python\util\compat.py", line 107, in as_str_any
return as_str(value)
File "C:\Users\eleva\AppData\Local\conda\conda\envs\PATE\lib\site-packages\tensorflow\python\util\compat.py", line 80, in as_text
return bytes_or_text.decode(encoding)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x93 in position 0: invalid start byte
I'm new to DP, and it's hard for me to understand some functions, especially '_compute_rdp' and '_compute_log_a'.
Are there some papers I can reference? Thanks! :)
When I run 'train_teachers.py' under Python 3 for the first time,
python train_teachers.py --nb_teachers=100 --teacher_id=0 --dataset=mnist --max_steps=1000
the data and labels are downloaded and loaded successfully.
But when it is run a second time,
python train_teachers.py --nb_teachers=100 --teacher_id=1 --dataset=mnist --max_steps=1000
this error occurs:
Traceback (most recent call last):
File "train_teachers.py", line 102, in <module>
tf.app.run()
File "D:\Users\10066\Anaconda3\envs\nn\lib\site-packages\tensorflow\python\platform\app.py", line 125, in run
_sys.exit(main(argv))
File "train_teachers.py", line 99, in main
assert train_teacher(FLAGS.dataset, FLAGS.nb_teachers, FLAGS.teacher_id)
File "train_teachers.py", line 60, in train_teacher
train_data, train_labels, test_data, test_labels = input.ld_mnist()
File "E:\privacy-master\research\pate_2017\input.py", line 421, in ld_mnist
train_data = extract_mnist_data(local_urls[0], 60000, 28, 1)
File "E:\privacy-master\research\pate_2017\input.py", line 285, in extract_mnist_data
return np.load(file_obj)
File "D:\Users\10066\Anaconda3\envs\nn\lib\site-packages\numpy\lib\npyio.py", line 416, in load
magic = fid.read(N)
File "D:\Users\10066\Anaconda3\envs\nn\lib\site-packages\tensorflow\python\lib\io\file_io.py", line 132, in read
pywrap_tensorflow.ReadFromStream(self._read_buf, length, status))
File "D:\Users\10066\Anaconda3\envs\nn\lib\site-packages\tensorflow\python\lib\io\file_io.py", line 100, in _prepare_value
return compat.as_str_any(val)
File "D:\Users\10066\Anaconda3\envs\nn\lib\site-packages\tensorflow\python\util\compat.py", line 107, in as_str_any
return as_str(value)
File "D:\Users\10066\Anaconda3\envs\nn\lib\site-packages\tensorflow\python\util\compat.py", line 80, in as_text
return bytes_or_text.decode(encoding)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x93 in position 0: invalid start byte
I tried to fix this error by making some changes in input.py. The original code is:
def extract_mnist_data(filename, num_images, image_size, pixel_depth):
  """
  Extract the images into a 4D tensor [image index, y, x, channels].
  Values are rescaled from [0, 255] down to [-0.5, 0.5].
  """
  # if not os.path.exists(file):
  if not tf.gfile.Exists(filename + ".npy"):
    with gzip.open(filename) as bytestream:
      bytestream.read(16)
      buf = bytestream.read(image_size * image_size * num_images)
      data = np.frombuffer(buf, dtype=np.uint8).astype(np.float32)
      data = (data - (pixel_depth / 2.0)) / pixel_depth
      data = data.reshape(num_images, image_size, image_size, 1)
      np.save(filename, data)
    return data
  else:
    with tf.gfile.Open(filename + ".npy", mode='r') as file_obj:
      return np.load(file_obj)

def extract_mnist_labels(filename, num_images):
  """
  Extract the labels into a vector of int64 label IDs.
  """
  # if not os.path.exists(file):
  if not tf.gfile.Exists(filename + ".npy"):
    with gzip.open(filename) as bytestream:
      bytestream.read(8)
      buf = bytestream.read(1 * num_images)
      labels = np.frombuffer(buf, dtype=np.uint8).astype(np.int32)
      np.save(filename, labels)
    return labels
  else:
    with tf.gfile.Open(filename + ".npy", mode='r') as file_obj:
      return np.load(file_obj)
I deleted the existence checks, as follows:
def extract_mnist_data(filename, num_images, image_size, pixel_depth):
  """
  Extract the images into a 4D tensor [image index, y, x, channels].
  Values are rescaled from [0, 255] down to [-0.5, 0.5].
  """
  with gzip.open(filename) as bytestream:
    bytestream.read(16)
    buf = bytestream.read(image_size * image_size * num_images)
    data = np.frombuffer(buf, dtype=np.uint8).astype(np.float32)
    data = (data - (pixel_depth / 2.0)) / pixel_depth
    data = data.reshape(num_images, image_size, image_size, 1)
    np.save(filename, data)
  return data

def extract_mnist_labels(filename, num_images):
  """
  Extract the labels into a vector of int64 label IDs.
  """
  with gzip.open(filename) as bytestream:
    bytestream.read(8)
    buf = bytestream.read(1 * num_images)
    labels = np.frombuffer(buf, dtype=np.uint8).astype(np.int32)
    np.save(filename, labels)
  return labels
It works, but I'm wondering what causes the error, and I'm also not sure whether there is any risk in making such changes.
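The likely cause (an inference from the traceback, not verified against the repo): tf.gfile.Open(..., mode='r') opens the cached .npy file in text mode, so reading it attempts utf-8 decoding, and 0x93 is the first byte of the .npy magic header. Opening in binary mode should fix it without deleting the caching logic:

# Open the cached .npy in binary mode so np.load gets raw bytes.
with tf.gfile.Open(filename + ".npy", mode='rb') as file_obj:
  return np.load(file_obj)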
When I was running the code, there was an error: "module 'tensorflow.python.keras.losses' has no attribute 'CategoricalCrossentropy'". I am wondering if it is because I have an old version of tensorflow; if so, please tell me which version you are using, thanks.
I implemented another neural network model and pass its loss to dp_optimizer.DPGradientDescentGaussianOptimizer.
It succeeds when num_microbatches is 1, but when num_microbatches is greater than 1, I get an error.
Traceback (most recent call last):
File "/home/madono/.pyenv/versions/anaconda3-2018.12/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/framework/common_shapes.py", line 686, in _call_cpp_shape_fn_impl
input_tensors_as_shapes, status)
File "/home/madono/.pyenv/versions/anaconda3-2018.12/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 516, in __exit__
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Dimension size must be evenly divisible by 2 but is 1 for 'Reshape' (op: 'Reshape') with input shapes: [], [2] and with input tensors computed as partial shapes: input[1] = [2,?].
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "autoencoder_dp.py", line 70, in <module>
population_size=60000).minimize(cost)
File "/home/madono/.pyenv/versions/anaconda3-2018.12/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/training/optimizer.py", line 399, in minimize
grad_loss=grad_loss)
File "/home/madono/madono/test2/dpgan/privacy/optimizers/dp_optimizer.py", line 68, in compute_gradients
microbatches_losses = tf.reshape(loss, [self._num_microbatches, -1])
File "/home/madono/.pyenv/versions/anaconda3-2018.12/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/ops/gen_array_ops.py", line 5782, in reshape
"Reshape", tensor=tensor, shape=shape, name=name)
File "/home/madono/.pyenv/versions/anaconda3-2018.12/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/home/madono/.pyenv/versions/anaconda3-2018.12/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3292, in create_op
compute_device=compute_device)
File "/home/madono/.pyenv/versions/anaconda3-2018.12/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3332, in _create_op_helper
set_shapes_for_outputs(op)
File "/home/madono/.pyenv/versions/anaconda3-2018.12/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 2496, in set_shapes_for_outputs
return _set_shapes_for_outputs(op)
File "/home/madono/.pyenv/versions/anaconda3-2018.12/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 2469, in _set_shapes_for_outputs
shapes = shape_func(op)
File "/home/madono/.pyenv/versions/anaconda3-2018.12/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 2399, in call_with_requiring
return call_cpp_shape_fn(op, require_shape_fn=True)
File "/home/madono/.pyenv/versions/anaconda3-2018.12/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/framework/common_shapes.py", line 627, in call_cpp_shape_fn
require_shape_fn)
File "/home/madono/.pyenv/versions/anaconda3-2018.12/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/framework/common_shapes.py", line 691, in _call_cpp_shape_fn_impl
raise ValueError(err.message)
ValueError: Dimension size must be evenly divisible by 2 but is 1 for 'Reshape' (op: 'Reshape') with input shapes: [], [2] and with input tensors computed as partial shapes: input[1] = [2,?].
My implementation looks like this:
optimizer = dp_optimizer.DPGradientDescentGaussianOptimizer(
    l2_norm_clip=1.0,
    noise_multiplier=1.1,
    num_microbatches=2,
    learning_rate=0.0002,
    population_size=60000).minimize(cost)
# optimizer = tf.train.RMSPropOptimizer(learning_rate).minimize(cost)

# Initializing the variables
init = tf.initialize_all_variables()

# Explore trainable variables (weight_bias)
var = [v for v in tf.trainable_variables() if 'mimiciii/fc/autoencoder' in v.name]  # (784, 128), (128,), (128, 784), (784,)
var_grad = tf.gradients(cost, var)  # gradient of cost w.r.t. trainable variables, len(var_grad): 8, type(var_grad): list
norm_gradient_variables = []

# Launch the graph
with tf.Session() as sess:
  writer = tf.summary.FileWriter("./graph/my_graph", sess.graph)
  sess.run(init)
  total_batch = int(mnist.train.num_examples / batch_size)
  # Training cycle
  for epoch in range(training_epochs):
    # Loop over all batches
    for i in range(total_batch):
      batch_xs, batch_ys = mnist.train.next_batch(batch_size)
      # Run optimization op (backprop) and cost op (to get loss value)
      _, c = sess.run([optimizer, cost], feed_dict={X: batch_xs})
      var_grad_val = sess.run(var_grad, feed_dict={X: batch_xs})
      # var_grad_val = [var_grad_val[0], var_grad_val[2]]  # no bias, change for different network
      if type(var_grad_val) != type([0]):  # if not a list, it contains only one weight matrix
        var_grad_val = [var_grad_val]
      norm_gradient_variables.append(norm_w(var_grad_val))  # compute the norm of all trainable variables
    # Display logs per epoch step
    if epoch % display_step == 0:
      print("Epoch:", '%04d' % (epoch + 1),
            "cost=", "{:.9f}".format(c))
For a non-sequential model, if the number of examples is not a multiple of the number of microbatches (see the data file below), training with DPAdamOptimizer fails.
Numpy 1.14.4
Tensorflow 1.13.1
Error:
6/13 [============>.................] - ETA: 0s - loss: 6.5262Traceback (most recent call last):
File "/home/andrey/Documents/experiments/dp.py", line 114, in <module>
epochs=epochs)
File "/home/andrey/.local/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 880, in fit
validation_steps=validation_steps)
File "/home/andrey/.local/lib/python3.6/site-packages/tensorflow/python/keras/engine/training_arrays.py", line 329, in model_iteration
batch_outs = f(ins_batch)
File "/home/andrey/.local/lib/python3.6/site-packages/tensorflow/python/keras/backend.py", line 3076, in __call__
run_metadata=self.run_metadata)
File "/home/andrey/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1439, in __call__
run_metadata_ptr)
File "/home/andrey/.local/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 528, in __exit__
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Input to reshape is a tensor with 1 values, but the requested shape requires a multiple of 3
[[{{node training/TFOptimizer/Reshape}}]]
[[{{node loss/mul}}]]
Code:
import pandas as pd
import tensorflow as tf
from tensorflow.keras import losses
from privacy.optimizers.dp_optimizer import DPAdamOptimizer
from privacy.optimizers.gaussian_query import GaussianAverageQuery

batch_size = 6
microbatches = 3
noise_multiplier = 1.1
l2_norm_clip = 1.0
learning_rate = 0.001  # not defined in the original snippet; value assumed
epochs = 30

names = pd.read_csv('names.csv', delimiter=',')
names = names['ch1'].values
names.shape = names.shape + (1,)

my_input = tf.keras.layers.Input(shape=(1,))
my_dense = tf.keras.layers.Dense(7)(my_input)
model = tf.keras.Model(my_input, my_dense)

dp_average_query = GaussianAverageQuery(
    l2_norm_clip, l2_norm_clip * noise_multiplier, microbatches)
optimizer = DPAdamOptimizer(dp_average_query, microbatches,
                            learning_rate=learning_rate,
                            unroll_microbatches=True)
loss = losses.CategoricalCrossentropy(from_logits=False,
                                      reduction=tf.losses.Reduction.NONE)
model.compile(optimizer=optimizer, loss=loss)  # omitted in the original snippet; restored so fit() runs
model.fit(x=names, y=names, batch_size=batch_size, epochs=epochs)
Dataset - names.csv:
id,ch1
1,1
2,3
3,4
4,1
5,2
6,3
7,3
8,1
9,1
10,2
11,3
12,4
13,4
If there are only 12 examples in the dataset, the model trains fine with batch_size = 6 and microbatches = 3.
dp_optimizer.py requires TF 2.0, but the other .py files require TF 1.X (tf.contrib is needed for them).
Two examples in the tutorials ('mnist_dpsgd_tutorial_eager.py' and 'mnist_dpsgd_tutorial_keras.py') contain the import statement 'from privacy.optimizers.gaussian_query import GaussianAverageQuery', resulting in the error "ImportError: No module named 'privacy.optimizers.gaussian_query'".
What is a sensible range for the L2 clipping norm and the noise multiplier?
The tutorial mentions using L2 = [0.3, 1.0] and noise = [0.3, ].
Are these parameters data independent?
Is there any relationship between them?
If I use L2/noise > 10, does it make any sense?
In lm_dpsgd_tutorial.py:
File "privacy/tutorials/lm_dpsgd_tutorial.py", line 45, in <module>
from privacy.optimizers import dp_optimizer
File "privacy/optimizers/dp_optimizer.py", line 22, in <module>
from privacy.analysis import privacy_ledger
File "privacy/analysis/privacy_ledger.py", line 29, in <module>
nest = tf.contrib.framework.nest
AttributeError: module 'tensorflow' has no attribute 'contrib'
The contrib module seems to have been deleted in tensorflow 2.0.
Is it still possible to call framework.nest from another module, or should I downgrade tensorflow?
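For what it's worth, TF 2.x exposes the same nest utilities publicly, so a simple alias may be enough (a sketch, assuming the repo only needs the standard nest functions such as map_structure and flatten):

import tensorflow as tf

# tf.nest replaces tf.contrib.framework.nest from TF 1.x.
nest = tf.nest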