
Privacy-Preserving Machine Learning (PPML) Tutorial

License: Apache License 2.0


PPML: Machine Learning on Data you cannot see

Repository for the tutorial on Privacy-Preserving Machine Learning (PPML) presented at SciPy 2023

Intro

Privacy guarantees are the most crucial requirement when analysing sensitive data. However, data anonymisation techniques alone do not always provide complete privacy protection; moreover, Machine Learning models can be exploited to leak sensitive data when they are attacked and no countermeasure is applied. Privacy-preserving machine learning (PPML) methods hold the promise of overcoming all these issues, making it possible to train machine learning models with full privacy guarantees. In this tutorial we will explore several methods for privacy-preserving data analysis, and how these techniques can be used to safely train ML models without actually seeing the data.

Description

Privacy guarantees are the most crucial requirement when analysing sensitive data. These requirements can sometimes be so stringent that they become a real barrier for the entire pipeline. The reasons for this are manifold, and include the fact that the data often cannot be shared or moved from the silos where they reside, let alone analysed in their raw form. As a result, data anonymisation techniques are sometimes used to generate a sanitised version of the original data. However, these techniques alone are not enough to guarantee that privacy will be completely preserved. Moreover, the memorisation effect of Deep learning models can be maliciously exploited to attack the models and reconstruct sensitive information about the samples used in training, even if this information was not originally provided.

Privacy-preserving machine learning (PPML) methods hold the promise of overcoming all those issues, making it possible to train machine learning models with full privacy guarantees.

This workshop is organised in three main parts. In the first part, we will introduce the main concepts of differential privacy: what it is, and how this method differs from more classical anonymisation techniques (e.g. k-anonymity). In the second part, we will focus on Machine learning experiments. We will start by demonstrating how DL models can be exploited (i.e. through inference attacks) to reconstruct the original data solely by analysing the model's predictions; we will then explore how differential privacy can help us protect the privacy of our model, with minimal disruption to the original pipeline. Finally, we will conclude the tutorial by considering more complex ML scenarios, training Deep learning networks on encrypted data and with specialised distributed federated learning strategies.
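To give a flavour of the differential privacy concepts covered in the first part, here is a minimal sketch of the Laplace mechanism applied to a counting query. It is not taken from the tutorial notebooks: the function names (`laplace_noise`, `private_count`), the sample `ages` data, and the choice of `epsilon` are all illustrative assumptions, and only the Python standard library is used.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    # expovariate takes a rate, so rate = 1 / scale gives a mean of `scale`.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(values, predicate, epsilon: float, rng: random.Random) -> float:
    """Return a noisy count of the values matching `predicate`.

    Adding or removing one record changes a count by at most 1
    (sensitivity 1), so the noise scale is 1 / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for v in values if predicate(v))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical toy data: three of the six ages are above 40.
ages = [23, 35, 41, 52, 67, 29]
rng = random.Random(42)
noisy = private_count(ages, lambda a: a > 40, epsilon=1.0, rng=rng)
print(f"true count = 3, noisy count = {noisy:.2f}")
```

A smaller `epsilon` means more noise and therefore stronger privacy, at the cost of a less accurate answer. Unlike k-anonymity, the protection comes from the randomised answer rather than from transforming the released records.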

Outline

  • Introduction: Brief Intro to PPML and to the workshop (10 mins) SLIDES

  • Part 1: Programming Privacy (90 mins)

    • De-identification
    • K-anonymity and limitations
    • Differential Privacy
    • Intro to Differential Privacy for Machine Learning
  • Break (10 mins)

  • Part 2: Strengthening Deep Neural Networks (60 mins)

    • ML Model vulnerabilities: Adversarial Examples and inference attack
    • DL training with Differential Privacy
  • Break (5 mins)

  • Part 3: Primer on Privacy-Preserving Machine Learning (60 mins)

    • DL training on (Homomorphically) Encrypted Data
    • Federated Learning
  • Closing Remarks (5 mins)

Notebooks

Quick access to each notebook, each of which can also be opened on Anaconda Notebooks.

SPECIAL CODE: Use the code SCIPY23 to get a 30-day free trial of the Starter Tier (valid until August 18th, 2023).

1 Data Anonymisation

  • References
  • De-identification
  • K-Anonymity

2 Differential Privacy

  • Differential Privacy
  • Properties of Differential Privacy
  • Approx Differential Privacy
  • Differential Privacy ML Models

3 ML Models Attacks

  • FGSM Attack
  • MIA Training
  • MIA Reconstruction
  • MIA Training with DP
  • MIA Reconstruction with DP

4 Federated Learning

  • Intro to Federated Learning
  • Homomorphic Encryption
  • Flower FL

Get the material

Clone the current repository by running the following instructions:

cd $HOME  # This will make sure you'll be in your HOME folder
git clone https://github.com/leriomaggio/ppml-tutorial.git

Note: This will create a new folder named ppml-tutorial. Move into this folder by typing:

cd ppml-tutorial

Well done! Now you should be in the right location. Bear with me for another few seconds and follow the instructions reported below 🙏

Installation Instructions (or not 🙃)

All the materials in this tutorial (code and lecture notes) are made available as Jupyter notebooks.

(1) There is no specific hardware requirement to execute the code, i.e. running everything on your laptop should be more than fine 😊.

(2) As for the software requirements, we will be using a pretty standard Python/PyData stack: numpy, pandas, matplotlib, and scikit-learn for all the data science and Machine learning parts, along with pytorch and torchvision to work on the Deep Learning examples.

Moreover, a few extra, specialised packages will also be featured:

  • Opacus: A library to train PyTorch models with differential privacy
  • PHE: A Python 3 library implementing the Paillier Partially Homomorphic Encryption
  • Flower: A Federated Learning framework (used here with PyTorch)
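To illustrate what "partially homomorphic" means for the Paillier scheme mentioned above, the following is a toy, standard-library-only sketch of its additive property. It is not the PHE library's API and is not secure: the primes are tiny, and all function names (`generate_keypair`, `encrypt`, `decrypt`, `add_encrypted`) are hypothetical names chosen for this example.

```python
import math
import random


def generate_keypair(p: int, q: int):
    """Build a toy Paillier key pair from two distinct primes p and q."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    g = n + 1              # a common, simple choice of generator
    mu = pow(lam, -1, n)   # with g = n + 1, mu is the inverse of lam mod n
    return (n, g), (lam, mu)


def encrypt(public_key, message: int, rng: random.Random) -> int:
    """Encrypt an integer in [0, n) with fresh randomness r."""
    n, g = public_key
    if not 0 <= message < n:
        raise ValueError("message must be in [0, n)")
    while True:
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    n_sq = n * n
    return (pow(g, message, n_sq) * pow(r, n, n_sq)) % n_sq


def decrypt(public_key, private_key, ciphertext: int) -> int:
    """Recover the plaintext using L(x) = (x - 1) // n."""
    n, _ = public_key
    lam, mu = private_key
    x = pow(ciphertext, lam, n * n)
    return ((x - 1) // n * mu) % n


def add_encrypted(public_key, c1: int, c2: int) -> int:
    """Multiplying ciphertexts adds the underlying plaintexts (mod n)."""
    n, _ = public_key
    return (c1 * c2) % (n * n)


# Tiny primes for illustration only; real keys use primes of 1024+ bits.
public_key, private_key = generate_keypair(293, 433)
rng = random.Random(0)
c_a = encrypt(public_key, 15, rng)
c_b = encrypt(public_key, 27, rng)
total = decrypt(public_key, private_key, add_encrypted(public_key, c_a, c_b))
print(total)  # prints 42
```

The party computing `add_encrypted` never sees 15 or 27, only ciphertexts. This is the property that makes it possible to aggregate values without decrypting them.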

To get ready to run the code in this tutorial you can either (a) install and configure a (conda) environment on your computer with all the necessary dependencies; or (b) use Anaconda Notebooks and run everything without installing anything at all on your computer.

Please refer to the setup.md document for step-by-step instructions, or to get a special discount code to access Anaconda Notebooks.
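If you choose the local installation, a quick check along the following lines can tell you which packages are still missing. This is only a sketch: the mapping from package names to import names (for instance `sklearn` for scikit-learn, `torch` for pytorch, `flwr` for Flower) reflects the usual import names and is an assumption here, and the `missing_packages` helper is hypothetical. The setup.md document remains the authoritative reference.

```python
from importlib.util import find_spec

# Package name -> import name (assumed; adjust to match setup.md).
REQUIRED = {
    "numpy": "numpy",
    "pandas": "pandas",
    "matplotlib": "matplotlib",
    "scikit-learn": "sklearn",
    "pytorch": "torch",
    "torchvision": "torchvision",
    "Opacus": "opacus",
    "PHE": "phe",
    "Flower": "flwr",
}


def missing_packages(modules=None):
    """Return the sorted names of packages whose module cannot be found."""
    modules = REQUIRED if modules is None else modules
    return sorted(
        name for name, module in modules.items() if find_spec(module) is None
    )


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All packages found.")
```

The check only verifies that each module can be located, not that its version matches the one the notebooks were written against.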

If you spot any error/mistake, please feel free to reach out directly to me, or to open an Issue on the repository.

Any feedback will be very much appreciated!

Thank you! 🙏

Colophon

Author: Valerio Maggio (@leriomaggio), Researcher, SSI Fellow, and Data Scientist Advocate at Anaconda.

All the Code material is distributed under the terms of the Apache License. See LICENSE file for additional details.

All the instructional materials in this repository are free to use, and made available under the Creative Commons Attribution license. The following is a human-readable summary of (and not a substitute for) the full legal text of the CC BY 4.0 license.

You are free:

  • to Share: copy and redistribute the material in any medium or format
  • to Adapt: remix, transform, and build upon the material

for any purpose, even commercially.

The licensor cannot revoke these freedoms as long as you follow the license terms.

Under the following terms:

  • Attribution: You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.

  • No additional restrictions: You may not apply legal terms or technological measures that legally restrict others from doing anything the license permits.

Acknowledgment and funding

The material developed in this tutorial has been supported by Anaconda, and the Software Sustainability Institute (SSI), as part of my SSI fellowship on PETs (Privacy Enhancing Technologies).

Please see this deck to learn more about my fellowship plans.

Public shout out to all the people at OpenMined for all their encouragement and support with the preparation of this tutorial. I hope the material in this repository can help raise awareness of all the amazing work on PETs that is being provided to the Open Source and Python communities.

Contacts

For any questions or doubts, feel free to open an issue in the repository, or drop me an email @ vmaggio_at_anaconda_dot_com
