yuxiangw / autodp

autodp: A flexible and easy-to-use package for differential privacy

License: Apache License 2.0

Python 24.71% Jupyter Notebook 75.14% Starlark 0.15%

autodp's Introduction

autodp: Automating differential privacy computation

Highlights

Advanced DP techniques (e.g., Renyi DP, Moments Accountant, f-DP) working behind the scenes.
Easily customizable. Bring your own mechanism in any way you want.
Strong composition over heterogeneous mechanisms.

All new autodp "Mechanism" API

New features that come with the new API

Object-oriented design: check out autodp_core.py
Zoos are open with many private animals: mechanism_zoo, transformer_zoo, calibrator_zoo.
Added support for f-DP and privacy profile alongside RDP.
Stronger RDP to (eps,delta)-DP conversion.
Privacy amplification by X.
Exactly tight privacy accounting for Gaussian mechanisms and their compositions.
Interpretable privacy guarantee via Hypothesis testing interpretation for any Mechanism.

The new API makes it extremely easy to obtain state-of-the-art privacy guarantees for your favorite randomized mechanisms, with just a few lines of code.

How to use?

It's easy. Just run:

pip install autodp

or

pip3 install autodp

Check out the Jupyter notebooks in the tutorials folder to get started.

Notes:

pip should automatically install all the dependencies for you.
Currently we support only Python 3.
You might need to run pip install autodp --upgrade to pick up the latest release.

To use the current version on the master branch

Clone the repository and, from its root directory, install it locally with:

pip install -e .

Research Papers:

Yu-Xiang Wang, Borja Balle, and Shiva Kasiviswanathan. (2019) "Subsampled Renyi Differential Privacy and Analytical Moments Accountant". In AISTATS-2019 (Notable Paper Award).
Yuqing Zhu and Yu-Xiang Wang. (2019) "Poisson Subsampled Renyi Differential Privacy". In ICML-2019.
Yuqing Zhu and Yu-Xiang Wang. (2020) "Improving Sparse Vector Technique with Renyi Differential Privacy". In NeurIPS-2020.

How to Contribute?

Follow the standard practice: fork the repo, create a branch, develop the edit, and send a pull request. One of the maintainers will review the code and merge the PR. Alternatively, please feel free to create issues to report bugs, provide comments, and suggest new features.

At the moment, contributions to examples and tutorials, as well as the RDP of currently unsupported mechanisms, are most welcome (add them to RDP_bank.py)! You may also add new mechanisms to mechanism_zoo.py. Contributions to transformer_zoo.py and calibrator_zoo.py are trickier; please email us!

Please explain clearly what the contribution is about in the PR and attach/cite papers whenever appropriate.

Legacy: the moments accountant API from autodp v.0.11 is still supported:

An RDP (Renyi Differential Privacy) based analytical Moment Accountant implementation that is numerically stable.
Supports privacy amplification of generic RDP algorithms under subsampling without replacement and Poisson sampling.
Stronger composition than the optimal composition using only (ε,δ)-DP.
A privacy calibrator that numerically calibrates noise to privacy requirements using RDP.
Bring Your Own Mechanism: Just implement the RDP of your own DP algorithm as a function.

Examples:

Figure 1: Composing subsampled Gaussian Mechanisms. Left: High noise setting with σ=5, γ=0.001, δ=1e-8. Right: Low noise setting with σ=0.5, γ=0.001, δ=1e-8.

Figure 2: Composing subsampled Laplace Mechanisms. Left: High noise setting with b=2, γ=0.001, δ=1e-8. Right: Low noise setting with b=0.5, γ=0.001, δ=1e-8.

autodp's Issues

`bounds` cannot be used together with `method=Brent` in latest version of scipy (>= v1.10.1)

SciPy (>= v1.10.1) will complain about this line

autodp/autodp/converter.py

Line 95 in 5fad5e1

 results = minimize_scalar(fun, method='Brent', bracket=(1, 2), bounds=[1, 100000]) 

because it no longer supports Brent when bounds are given (SciPy source):

if bounds is not None and meth in {'brent', 'golden'}:
    message = f"Use of `bounds` is incompatible with 'method={method}'."
    raise ValueError(message)

Switching to method='Bounded' bypasses this issue.
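As a minimal sketch of the suggested workaround, SciPy's minimize_scalar accepts bounds when the bounded method is selected. The objective below is a hypothetical stand-in, not the function minimized in converter.py, and whether the bounded solver finds the same minimizer as the original Brent call for autodp's objective is not checked here.

```python
from scipy.optimize import minimize_scalar


def objective(x):
    # Hypothetical stand-in for the function minimized in converter.py.
    return (x - 3.0) ** 2


# The bounded method searches inside the given interval, so no bracket is passed.
result = minimize_scalar(objective, bounds=(1, 100000), method="bounded")
print(result.x, result.fun)
```

The same bounds as in the original call are reused; only the method name changes and the bracket argument is dropped.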

An issue when I installed "autodp": Preparing metadata (setup.py) ... error

The following error occurred when I installed "autodp" with "pip install autodp", and I'm not sure how to solve it.

Collecting autodp
Using cached autodp-0.2.3.1.tar.gz (56 kB)
Preparing metadata (setup.py) ... error
error: subprocess-exited-with-error

× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [6 lines of output]
Traceback (most recent call last):
File "<string>", line 2, in <module>
File "<pip-setuptools-caller>", line 34, in <module>
File "C:\Users\Administrator\AppData\Local\Temp\pip-install-c9_gcmpt\autodp_184d6ab919d64a7f98792f3b252bbe16\setup.py", line 9, in <module>
long_description = f.read()
UnicodeDecodeError: 'gbk' codec can't decode byte 0x9a in position 3594: illegal multibyte sequence
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

Amplification by sampling without replacement throws the following error

Hi everyone,

When amplifying a Gaussian mechanism by sampling without replacement, autodp throws AssertionError: mechanism's add-remove notion of DP is incompatible with Privacy Amplification by subsampling without replacements. Here is a code snippet that reproduces the error. Am I doing anything wrong?

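from autodp import mechanism_zoo, transformer_zoo

subsample = transformer_zoo.AmplificationBySampling(PoissonSampling=False)
mech = mechanism_zoo.GaussianMechanism(sigma=0.1)
prob = 0.1

# This call raises the AssertionError quoted above.
SubsampledGaussian_mech = subsample(mech, prob, improved_bound_flag=True)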

Looseness in analytic Gaussian mechanism?

Here's a minimal example to demonstrate the issue:

from autodp import privacy_calibrator, dp_bank
import numpy as np

sigma = privacy_calibrator.ana_gaussian_mech(1.0, 1e-6)['sigma']
delta = np.exp(dp_bank.get_logdelta_ana_gaussian(1.0, sigma))

1.901276833828726e-05

I expected delta = 1e-6, but the value reported by dp_bank is nearly 20x larger.
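One way to sanity-check such numbers independently of autodp is to evaluate the closed-form privacy profile of the Gaussian mechanism, δ(ε) = Φ(Δ/(2σ) − εσ/Δ) − e^ε Φ(−Δ/(2σ) − εσ/Δ), and to calibrate σ by bisection. The sketch below uses only the standard library; the function names are illustrative and are not autodp's API, and it assumes L2 sensitivity Δ = 1.

```python
import math


def norm_cdf(x):
    # Standard normal CDF, written with erfc for accuracy in the lower tail.
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def gaussian_delta(eps, sigma, sensitivity=1.0):
    # Exact delta of the Gaussian mechanism at a given eps.
    a = sensitivity / (2.0 * sigma)
    b = eps * sigma / sensitivity
    return norm_cdf(a - b) - math.exp(eps) * norm_cdf(-a - b)


def calibrate_sigma(eps, delta, lo=1e-3, hi=1e3, iters=200):
    # delta decreases as sigma grows, so bisect for the smallest feasible sigma.
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if gaussian_delta(eps, mid) > delta:
            lo = mid
        else:
            hi = mid
    return hi


sigma = calibrate_sigma(1.0, 1e-6)
print(sigma, gaussian_delta(1.0, sigma))
```

If a calibrated sigma is plugged back into the same profile, the recovered delta should match the target closely; a large gap suggests the two functions being compared are not computing the same quantity.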

Difference between eps from this method and from Abadi et al.

The implementation of Abadi et al. computes a smaller eps than this method. I would appreciate your opinion on this. Is their method tighter?

https://github.com/tensorflow/privacy/tree/master/tutorials

About the bisection method used for converting RDP to approximate DP

Thanks for the great work!
Not sure if I should submit the issue here, but let me just do it anyway.

The paper suggests using bisection to solve Equation 2 efficiently, on the grounds that the sum of a monotonically increasing function and a monotonically decreasing function is quasi-convex/unimodal (Corollary 38).
This does not seem to be correct, however, since the sum of two such functions is not always quasi-convex/unimodal. See this example.

It therefore seems to me that bisection cannot be used to convert RDP to approximate DP to arbitrary precision, since the optimization problem is not guaranteed to be quasi-convex/unimodal. Is that right?
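For context, the standard RDP-to-(ε,δ) conversion is ε(δ) = min over α > 1 of ε_RDP(α) + log(1/δ)/(α − 1). Every α yields a valid guarantee, so a search that misses the true minimizer gives a looser bound rather than an invalid one. The following is a minimal sketch, not the paper's or autodp's procedure, that replaces bisection with a plain grid search and checks it against the closed form for the Gaussian mechanism; the grid range and size are arbitrary choices.

```python
import math


def gaussian_rdp(alpha, sigma):
    # RDP of order alpha for the Gaussian mechanism with unit L2 sensitivity.
    return alpha / (2.0 * sigma ** 2)


def rdp_to_eps(rdp, delta, n_grid=2000):
    # Minimize rdp(alpha) + log(1/delta) / (alpha - 1) over a log-spaced grid
    # of alpha - 1 in [1e-2, 1e4]; no unimodality assumption is needed.
    log_inv_delta = math.log(1.0 / delta)
    best = math.inf
    for i in range(n_grid):
        gap = 10.0 ** (-2.0 + 6.0 * i / (n_grid - 1))
        best = min(best, rdp(1.0 + gap) + log_inv_delta / gap)
    return best


sigma, delta = 5.0, 1e-8
eps_grid = rdp_to_eps(lambda a: gaussian_rdp(a, sigma), delta)
# For the Gaussian RDP curve the minimum has a closed form.
eps_closed = 1.0 / (2 * sigma ** 2) + math.sqrt(2 * math.log(1 / delta)) / sigma
print(eps_grid, eps_closed)
```

A grid search costs more function evaluations than bisection, but it does not rely on the objective having a single local minimum.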

Can PATE be used in knowledge distillation to calculate privacy budgets?

Can PATE be used in knowledge distillation to calculate privacy budgets?
If the temperature is too high, can we still use PATE?

Composing different mechanisms with different sensitivities

@yuxiangw Great talk at MIT! I have a question about composing different mechanisms with different numbers of rounds and different sensitivities. Assume that we are composing in the following way:

Gaussian mechanism with sensitivity L1 and noise level sigma1 for T1 rounds
Subsampled Gaussian mechanism with sensitivity L2 and noise level sigma2 with sampling rate Gamma for T2 rounds

Then, to get epsilon for delta = 1e-6, is this the right way to pass the parameter configuration?

from autodp.autodp_core import Mechanism
from autodp.mechanism_zoo import GaussianMechanism
from autodp.transformer_zoo import AmplificationBySampling, Composition


class TestMech(Mechanism):
    def __init__(self, params, name="TestMech"):
        Mechanism.__init__(self)
        # Subsampling without replacement needs the replace-one neighboring relation.
        subsample = AmplificationBySampling(PoissonSampling=False)
        mech1 = GaussianMechanism(sigma=params["sigma1"])
        mech2 = GaussianMechanism(sigma=params["sigma2"])
        mech2.neighboring = "replace_one"
        submech2 = subsample(mech2, params["prob"], improved_bound_flag=True)
        # Compose T1 rounds of mech1 with T2 rounds of the subsampled mech2.
        compose = Composition()
        mech = compose([mech1, submech2], [params["T1"], params["T2"]])
        rdp_total = mech.RenyiDP
        self.propagate_updates(rdp_total, type_of_update="RDP")

params = {}
params["sigma1"] = sigma1/(L1)  # Is this correct?
params["sigma2"] = sigma2/(L2)  # Is this correct?
params["prob"] = Gamma  # sampling rate
params["T1"] = T1
params["T2"] = T2
mech = TestMech(params)
mech.get_approxDP(delta=1e-6)

My main question is about the scaling of the sigma parameters, params["sigma1"] = sigma1/(L1) and params["sigma2"] = sigma2/(L2). As far as I can understand, this seems necessary, right? Thanks!
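The scaling question can be checked on the RDP curve itself. For Gaussian noise σ and L2 sensitivity L, the RDP of order α is αL²/(2σ²), which depends only on the ratio σ/L, and RDP adds up across rounds. The sketch below is a standard-library illustration with made-up numbers; it leaves out subsampling entirely and is not autodp's implementation.

```python
import math


def gaussian_rdp(alpha, sigma, sensitivity=1.0):
    # RDP of order alpha for Gaussian noise sigma and a given L2 sensitivity.
    return alpha * sensitivity ** 2 / (2.0 * sigma ** 2)


def composed_rdp(alpha, stages):
    # RDP is additive: sum rounds * RDP over (sigma, sensitivity, rounds) stages.
    return sum(t * gaussian_rdp(alpha, s, l) for (s, l, t) in stages)


# Hypothetical values standing in for (sigma1, L1, T1) and (sigma2, L2, T2).
raw = [(4.0, 2.0, 10), (6.0, 3.0, 5)]
# Same stages expressed as noise multipliers sigma / L with unit sensitivity.
normalized = [(s / l, 1.0, t) for (s, l, t) in raw]

alpha = 8.0
print(composed_rdp(alpha, raw), composed_rdp(alpha, normalized))
```

Both descriptions give the same composed RDP, which is why a mechanism parameterized for unit sensitivity can be fed the ratio σ/L.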

Can't install in GBK locale

pip install autodp fails as shown below. The setup script should specify an encoding in open(...).

C:\Users\xxx>pip install autodp
Looking in indexes: https://mirror.baidu.com/pypi/simple
Collecting autodp
  Using cached https://mirror.baidu.com/pypi/packages/78/7c/63aa6d37b9d9f0f68d1231e1b3247c3ac83c634f451f8bcbd9a5c7a55db0/autodp-0.2.tar.gz (39 kB)
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error

  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [6 lines of output]
      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "C:\Users\xxx\AppData\Local\Temp\pip-install-l8spogwd\autodp_e20c1a3119ab4b0c8d685149702e4657\setup.py", line 6, in <module>
          long_description = f.read()
      UnicodeDecodeError: 'gbk' codec can't decode byte 0x9a in position 3594: illegal multibyte sequence
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
WARNING: You are using pip version 22.0.4; however, version 22.1.2 is available.
You should consider upgrading via the 'd:\pysyft\Scripts\python.exe -m pip install --upgrade pip' command.
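The suggested fix can be illustrated without autodp. The sketch below writes a small UTF-8 file (its name and contents are hypothetical), reads it back with an explicit encoding, and then forces the GBK codec to mimic what a bare open(...) does on a machine whose locale encoding is GBK.

```python
import os
import tempfile

# Hypothetical README content with non-ASCII characters, stored as UTF-8.
text = "autodp \u2014 (\u03b5, \u03b4)-DP accounting\n"
path = os.path.join(tempfile.mkdtemp(), "README.md")
with open(path, "w", encoding="utf-8") as f:
    f.write(text)

# Locale-independent read: always decode the file as UTF-8.
with open(path, encoding="utf-8") as f:
    long_description = f.read()

# Forcing GBK mimics open(path) on a GBK-locale machine.
try:
    with open(path, encoding="gbk") as f:
        f.read()
    gbk_failed = False
except UnicodeDecodeError:
    gbk_failed = True

print(long_description == text, gbk_failed)
```

Passing encoding="utf-8" in setup.py makes the read independent of the installer's locale.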

the noise is much greater than the probability. How should I use it correctly?

I want to add noise to the probability distribution of words, but it seems that the PATE framework provided is designed for counting tasks. If I use it directly, the noise will be much greater than the probabilities. How should I use it correctly? If I try to reduce sigma to 0.5, the calculated eps becomes very large.

huge fan of this work

Not listing a problem - just saying that I think this library is extremely cool and I'm very glad you've taken the time to make it.

Update Installation Instructions

The latest version on PyPI is pretty outdated, so pip install leaves folks with issues that have since been fixed in the code.

We should either update the PyPI version (i.e., do a v0.3 release) or, if development is ongoing, update the install instructions to use `python setup.py install`.

AFA of composition of subsampled Laplace Mechanism breaks down

Following the tutorial here, I tried to compute optimal accounting of composition of subsampled Gaussian and Laplace Mechanisms:

from autodp.mechanism_zoo import GaussianMechanism, LaplaceMechanism
from autodp.transformer_zoo import ComposeAFA
from autodp.transformer_zoo import AmplificationBySampling_pld

sigma = 1.0
b = 1.0
delta = 1e-5
prob=.1

gm1 = GaussianMechanism(sigma, phi_off=False, name='phi_GM1')
lm1 = LaplaceMechanism(b, phi_off=False, name='phi_LM1')


transformer_remove_only = AmplificationBySampling_pld(PoissonSampling=True, neighboring='remove_only')
transformer_add_only = AmplificationBySampling_pld(PoissonSampling=True, neighboring='add_only')
sample_gau_remove_only =transformer_remove_only(gm1, prob)
sample_lap_remove_only =transformer_remove_only(lm1, prob)
compose_gm = ComposeAFA()
compose_lm = ComposeAFA()
composed_gm_afa = compose_gm([sample_gau_remove_only], [10])
composed_lm_afa = compose_lm([sample_lap_remove_only], [10])

eps_gm_afa = composed_gm_afa.get_approxDP(delta)
eps_lm_afa = composed_lm_afa.get_approxDP(delta)

The Gaussian proceeds normally. The Laplace breaks down with the following error:

  File "AUTODPHOME/gmvslm.py", line 25, in <module>
    eps_lm_afa = composed_lm_afa.get_approxDP(delta)
  File "AUTODPHOME/autodp/autodp_core.py", line 113, in get_approxDP
    return self.approxDP(delta)
  File "AUTODPHOME/autodp/converter.py", line 1118, in min_f1_f2
    return np.minimum(f1(x), f2(x))
  File "AUTODPHOME/autodp/converter.py", line 824, in approxdp
    t = exp_eps(1 - delta)
  File "AUTODPHOME/autodp/converter.py", line 1080, in inv_f
    results = minimize_scalar(normal_equation, bounds=bounds, bracket=[1,2], tol=tol)
  File "AUTODPHOME/venv/lib/python3.8/site-packages/scipy/optimize/_minimize.py", line 879, in minimize_scalar
    return _minimize_scalar_brent(fun, bracket, args, **options)
  File "AUTODPHOME/venv/lib/python3.8/site-packages/scipy/optimize/_optimize.py", line 2511, in _minimize_scalar_brent
    brent.optimize()
  File "AUTODPHOME/venv/lib/python3.8/site-packages/scipy/optimize/_optimize.py", line 2281, in optimize
    xa, xb, xc, fa, fb, fc, funcalls = self.get_bracket_info()
  File "AUTODPHOME/venv/lib/python3.8/site-packages/scipy/optimize/_optimize.py", line 2257, in get_bracket_info
    xa, xb, xc, fa, fb, fc, funcalls = bracket(func, xa=brack[0],
  File "AUTODPHOME/venv/lib/python3.8/site-packages/scipy/optimize/_optimize.py", line 2765, in bracket
    fa = func(*(xa,) + args)
  File "AUTODPHOME/autodp/converter.py", line 1077, in normal_equation
    return abs(fun(x))
  File "AUTODPHOME/autodp/converter.py", line 1073, in fun
    return f(x) - y
  File "AUTODPHOME/autodp/converter.py", line 818, in trade_off
    result = cdf_p(log_e) + x*cdf_q(-log_e)
  File "AUTODPHOME/autodp/autodp_core.py", line 324, in <lambda>
    cdf_p2q = lambda x: converter.phi_to_cdf(log_phi_p2q, x, n_quad = n_quad)
  File "AUTODPHOME/autodp/converter.py", line 924, in phi_to_cdf
    res = integrate.fixed_quad(inte_f, -1.0, 1.0, n =n_quad)
  File "AUTODPHOME/venv/lib/python3.8/site-packages/scipy/integrate/_quadrature.py", line 151, in fixed_quad
    return (b-a)/2.0 * np.sum(w*func(y, *args), axis=-1), None
  File "AUTODPHOME/autodp/converter.py", line 923, in <lambda>
    inte_f = lambda t: qua(t) * (1 + t ** 2) / ((1 - t ** 2) ** 2)
  File "AUTODPHOME/autodp/converter.py", line 919, in qua
    phi_result = [log_phi(x) for x in new_t]
  File "AUTODPHOME/autodp/converter.py", line 919, in <listcomp>
    phi_result = [log_phi(x) for x in new_t]
  File "AUTODPHOME/autodp/transformer_zoo.py", line 111, in new_log_phi_p2q
    return sum([c * mech.log_phi_p2q(x) for (mech, c) in zip(mechanism_list, coeff_list)])
  File "AUTODPHOME/autodp/transformer_zoo.py", line 111, in <listcomp>
    return sum([c * mech.log_phi_p2q(x) for (mech, c) in zip(mechanism_list, coeff_list)])
TypeError: unsupported operand type(s) for *: 'int' and 'NoneType'

Does autodp support arbitrary group size?

Hi! I am wondering how we should use auto-dp when the adjacent datasets differ by more than one data point. I noticed there is a parameter called group_size when initializing the Mechanism, but I cannot find any other usage of this parameter. Is it left on purpose for future use, or am I missing something here?

For now, I am manually increasing my noise scale sqrt(n) times if the adjacent datasets differ by n points, but I would really appreciate any advice on how to achieve this goal in a smarter way. Thanks!
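For the Gaussian mechanism specifically, group size can be reasoned about through sensitivity. If changing one record moves the query by at most Δ in L2 norm, changing n records moves it by at most nΔ by the triangle inequality, so the RDP of order α becomes αn²Δ²/(2σ²). The sketch below compares the two scalings under that worst-case assumption; it is a Gaussian-only illustration with made-up numbers and says nothing about how autodp's group_size parameter is meant to be used.

```python
import math


def gaussian_rdp(alpha, sigma, sensitivity=1.0):
    # RDP of order alpha for Gaussian noise sigma and a given L2 sensitivity.
    return alpha * sensitivity ** 2 / (2.0 * sigma ** 2)


n = 4          # hypothetical group size
sigma = 2.0    # noise scale calibrated for neighbors differing in one record
alpha = 16.0

base = gaussian_rdp(alpha, sigma)
# Group of n records: sensitivity n, with the noise scaled by sqrt(n) or by n.
scaled_by_sqrt_n = gaussian_rdp(alpha, sigma * math.sqrt(n), sensitivity=n)
scaled_by_n = gaussian_rdp(alpha, sigma * n, sensitivity=n)
print(base, scaled_by_sqrt_n, scaled_by_n)
```

Under this assumption, scaling the noise by n recovers the single-record RDP curve, while scaling by sqrt(n) leaves it n times larger.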

Pure f-DP Gaussian mechanism doesn't work under composition over multiple rounds

Calling get_fDP(delta) on a Gaussian mechanism with pure f-DP works fine, but when the pure f-DP Gaussian mechanism is composed over several rounds, get_approxDP(delta) always returns inf for the composition.

f-DP does not seem to work under Composition or AmplificationBySampling.

from autodp.mechanism_zoo import GaussianMechanism
from autodp.transformer_zoo import Composition


def compute_amplified_fl_privacy(num_rounds=60, noise_multiplier=20, num_users=500, users_per_round=100):
    gm1 = GaussianMechanism(sigma=noise_multiplier, RDP_off=True, approxDP_off=True, fdp_off=False)

    compose = Composition()
    num_rounds = [num_rounds]
    q = users_per_round / num_users
    delta = num_users ** (-1)

    composed_fdp = compose([gm1], num_rounds)

    composed_fdp_eps = composed_fdp.get_fDP(delta)
    composed_fdp_approxdp = composed_fdp.get_approxDP(delta)

    mechanism_fdp_eps = gm1.get_fDP(delta)
    mechanism_fdp_approxdp = gm1.get_approxDP(delta)
    print('---------------------------------------------------')
    print('composed fdp eps = ', composed_fdp_eps, ', at delta = ', delta)
    print('composed fdp eps_approxdp = ', composed_fdp_approxdp, ', at delta = ', delta)

    print('mechanism fdp eps = ', mechanism_fdp_eps, ', at delta = ', delta)
    print('mechanism fdp approxdp = ', mechanism_fdp_approxdp, ', at delta = ', delta)


def main():
    compute_amplified_fl_privacy(num_rounds=60, noise_multiplier=20, num_users=500, users_per_round=100)


if __name__ == '__main__':
    main()
    print('DONE')
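As a reference point that does not depend on autodp, the composition without subsampling has a closed form in Gaussian DP: one Gaussian release with unit sensitivity and noise σ is μ-GDP with μ = 1/σ, composing k such releases gives √k·μ-GDP, and μ-GDP corresponds to δ(ε) = Φ(−ε/μ + μ/2) − e^ε Φ(−ε/μ − μ/2). The sketch below uses the numbers from the report (σ = 20, 60 rounds, δ = 1/500); the function names are illustrative, not autodp's API.

```python
import math


def norm_cdf(x):
    # Standard normal CDF, written with erfc for accuracy in the lower tail.
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def gdp_delta(eps, mu):
    # delta(eps) implied by a mu-GDP guarantee.
    return norm_cdf(-eps / mu + mu / 2.0) - math.exp(eps) * norm_cdf(-eps / mu - mu / 2.0)


def gdp_eps(delta, mu, hi=100.0, iters=200):
    # delta(eps) is decreasing in eps, so bisection applies.
    lo = 0.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if gdp_delta(mid, mu) > delta:
            lo = mid
        else:
            hi = mid
    return hi


sigma, rounds, delta = 20.0, 60, 1.0 / 500
mu_single = 1.0 / sigma                       # one Gaussian release
mu_composed = mu_single * math.sqrt(rounds)   # k-fold composition
eps_single = gdp_eps(delta, mu_single)
eps_composed = gdp_eps(delta, mu_composed)
print(eps_single, eps_composed)
```

Both values are finite, so an inf returned for the composed mechanism would point at the conversion path rather than at the privacy guarantee itself.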

issue with "privacy_calibrator.subsample_epsdelta_inverse(eps,delta,prob=gamma)"

Thanks for making this tool available for DP research, I appreciate the great work.

I was going through your tutorial on privacy calibrator (section 4) https://github.com/yuxiangw/autodp/blob/master/tutorials/tutorial_privacy_calibrator.ipynb

Not sure if the function privacy_calibrator.subsample_epsdelta_inverse(eps, delta, gamma) is giving the right answer. For example:

from autodp import privacy_calibrator

eps = 1
delta = 1e-6
gamma = 0.01

# First, apply the subsampling lemma to calibrate the basic privacy needed
eps0,delta0 = privacy_calibrator.subsample_epsdelta_inverse(eps,delta,prob=gamma)

# Then we can get the amount of noise needed from the base mechanism
print((eps0,delta0))
params = privacy_calibrator.gaussian_mech(eps0,delta0)
print(f'Gaussian: eps,delta,gamma = ({eps},{delta},{gamma}) ==> Noise level sigma=',params['sigma'])

It gives the answer

Gaussian: eps,delta,gamma = (1,1e-06,0.01) ==> Noise level sigma= 0.9366237019634324

However, I was expecting sigma = 1.258483615711703, the same result we get from:

params = privacy_calibrator.gaussian_mech(eps,delta,prob=gamma)
print(f'Gaussian: eps,delta,gamma = ({eps},{delta},{gamma}) ==> Noise level sigma=',params['sigma'])

Gaussian: eps,delta,gamma = (1,1e-06,0.01) ==> Noise level sigma= 1.258483615711703
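For reference, the textbook subsampling lemma says that running an (ε0, δ0)-DP mechanism on a γ-subsample gives (log(1 + γ(e^ε0 − 1)), γδ0)-DP, and inverting it gives ε0 = log(1 + (e^ε − 1)/γ) and δ0 = δ/γ. The sketch below implements both directions with the standard library; the helper names are illustrative, and it is not a claim about what autodp's subsample_epsdelta_inverse or gaussian_mech compute internally.

```python
import math


def amplify(eps0, delta0, prob):
    # Guarantee after running an (eps0, delta0)-DP mechanism on a prob-subsample.
    return math.log1p(prob * math.expm1(eps0)), prob * delta0


def amplify_inverse(eps, delta, prob):
    # Base-mechanism budget whose amplified guarantee is exactly (eps, delta).
    return math.log1p(math.expm1(eps) / prob), delta / prob


eps, delta, gamma = 1.0, 1e-6, 0.01
eps0, delta0 = amplify_inverse(eps, delta, gamma)
print(eps0, delta0)
print(amplify(eps0, delta0, gamma))
```

With these inputs the base budget eps0 comes out well above 1, so the two calibration routes in the report hand very different (eps, delta) targets to the Gaussian calibrator.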

Slow privacy calibration

Noise calibration takes a very long time and doesn't return a result after 22 minutes (at least when prob < 1 and eps is small). Is there any fix for this?

%time ans = privacy_calibrator.gaussian_mech(0.1, 1e-9, k=128, prob=0.1)

/usr/local/lib/python3.6/dist-packages/autodp/utils.py:21: RuntimeWarning: divide by zero encountered in log
  mag = y + np.log(1 - np.exp(x - y))
/usr/local/lib/python3.6/dist-packages/autodp/utils.py:24: RuntimeWarning: divide by zero encountered in log
  mag = x + np.log(1 - np.exp(y - x))
CPU times: user 22min 9s, sys: 2.31 s, total: 22min 11s
Wall time: 22min 12s
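The RuntimeWarning lines in the log point at np.log(1 - np.exp(x - y)), which evaluates log(0) whenever x equals y. The following is a minimal standard-library sketch of a guarded log-difference-of-exponentials. The helper name and its return convention are assumptions made for illustration, not autodp's utils API, and it addresses only the warning, not the running time.

```python
import math


def log_diff_exp(x, y):
    # Return (sign, log|e^x - e^y|) without ever evaluating log(0).
    if x == y:
        return 0.0, -math.inf
    hi, lo = (x, y) if x > y else (y, x)
    sign = 1.0 if x > y else -1.0
    d = lo - hi  # strictly negative
    if d > -math.log(2.0):
        # Close arguments: expm1 keeps precision when e^d is near 1.
        mag = hi + math.log(-math.expm1(d))
    else:
        # Far-apart arguments: log1p keeps precision when e^d is near 0.
        mag = hi + math.log1p(-math.exp(d))
    return sign, mag


print(log_diff_exp(math.log(3.0), 0.0))
print(log_diff_exp(1.0, 1.0))
```

Returning a sign alongside the magnitude lets callers handle both orderings of the arguments without taking the logarithm of a negative number.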

documentation question

Hi,

thanks a lot for this work. It is very helpful.

Just one quick question: what is the role of coeff in compose_subsampled_mechanism?

thanks a lot

Issue with SSP_scale and AdaSSP_scale inheritance

I was running the tutorial_AdaSSP_vs_noisyGD.ipynb Tutorial Notebook on Google Colab. I encountered the following issue while running the 4th Cell Block of the notebook:

AttributeError: 'SSP_scale' object has no attribute 'set_all_representation'.

The expanded error is as follows:

Kindly have a look at your earliest convenience, @yuxiangw. Thanks in advance!