GithubHelp home page GithubHelp logo

greenelab / sprint_gan Goto Github PK

View Code? Open in Web Editor NEW
106.0 10.0 34.0 14.2 MB

Privacy-preserving generative deep neural networks support clinical data sharing

Home Page: http://www.biorxiv.org/content/early/2017/07/05/159756

License: BSD 3-Clause "New" or "Revised" License

Jupyter Notebook 75.83% Python 1.06% Shell 0.02% HTML 23.07% Dockerfile 0.02%
gan generative-adversarial-network differential-privacy data-sharing neural-network supplement

sprint_gan's Introduction

Privacy-preserving generative deep neural networks support clinical data sharing

Brett K. Beaulieu-Jones1, Zhiwei Steven Wu2, Chris Williams3, Casey S. Greene3*

1Genomics and Computational Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA.

2Computer and Information Science, School of Engineering and Applied Sciences, University of Pennsylvania, Philadelphia, Pennsylvania, USA.

3Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA.

To whom correspondence should be addressed: csgreene (at) upenn.edu

Introduction

Though it is widely recognized that data sharing enables faster scientific progress, the sensible need to protect participant privacy hampers this practice in medicine. We train deep neural networks that generate synthetic subjects closely resembling study participants. Using the SPRINT trial as an example, we show that machine-learning models built from simulated participants generalize to the original dataset. We incorporate differential privacy, which offers strong guarantees on the likelihood that a subject could be identified as a member of the trial. Investigators who have compiled a dataset can use our method to provide a freely accessible public version that enables other scientists to perform discovery-oriented analyses. Generated data can be released alongside analytical code to enable fully reproducible workflows, even when privacy is a concern. By addressing data sharing challenges, deep neural networks can facilitate the rigorous and reproducible investigation of clinical datasets.

First run Preprocess.ipynb on the SPRINT clinical trial data (https://challenge.nejm.org/pages/home)

The results in the paper use:

python dp_gan.py --noise 1 --clip_value 0.0001 --epochs 500 --lr 2e-05 --batch_size 1 --prefix 1_0.0001_2e-05_

Due to GPU threading, the current lack of an effective way to set a random seed for tensorflow, and the fact that we are optimizing two neural networks (unlikely to find global minima) it may require running multiple times to hit a good local minimum.

Fig 2AFig 2B Fig 2CFig 2D

Median Systolic Blood Pressure Trajectories from initial visit to 27 months. A.) Simulated samples (private and non-private) generated from the final (500th) epoch of training. B.) Simulated samples generated from the epoch with the best performing logistic regression classifier. C.) Simulated samples from the epoch with the best performing random forest classifier. D.) Simulated samples from the top five random forest classifier epochs and top five logistic regression classifier epochs.

Feedback

Please feel free to email us - (brettbe) at med.upenn.edu with any feedback or raise a github issue with any comments or questions.

Acknowledgements

Acknowledgments: We thank Jason H. Moore (University of Pennsylvania), Aaron Roth (University of Pennsylvania), Gregory Way (University of Pennsylvania), Yoseph Barash (University of Pennsylvania) and Anupama Jha (University of Pennsylvania) for their helpful discussions. We also thank the participants of the SPRINT trial and the entire SPRINT Research Group for providing the data used in this study. Funding: This work was supported by the Gordon and Betty Moore Foundation under a Data Driven Discovery Investigator Award to C.S.G. (GBMF 4552). B.K.B.-J. Was supported by a Commonwealth Universal Research Enhancement (CURE) Program grant from the Pennsylvania Department of Health and by US National Institutes of Health grants AI116794 and LM010098. Z.S.W is funded in part by a subcontract on the DARPA Brandeis project and a grant from the Sloan Foundation.

sprint_gan's People

Contributors

brettbj avatar cgreene avatar steven7woo avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

sprint_gan's Issues

Noise scaled by extra clip_norm

I think that in noisy_optimizers.py line 36 you scale the standard deviation of the noise by one factor of clip_norm too many.

            grads = [(g + K.random_normal(g.shape, mean=0,
                                          stddev=(self.noise * (self.clipnorm ** 2))))

The stddev is getting squared to become variance. Variance should be proportional to self.clipnorm ** 2, but in current code it will be proportional to self.clipnorm ** 4.

(Also, I think the differential privacy guarantees will not work when using Adam because the first and second order moments affect future gradients, but I am less sure of this. Abadi et al. derive their moments accountant for basic SGD without momentum)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.