dsSyntheticClient

dsSyntheticClient is a DataSHIELD client-side package for generating synthetic data.

DataSHIELD is a client-server framework for privacy-preserving computation.

Please also look at the server-side package dsSynthetic:

https://github.com/tombisho/dsSynthetic

Quick start

install.packages('devtools')
library(devtools)

devtools::install_github('tombisho/dsSyntheticClient')

Please see this bookdown for detailed guidance:

https://tombisho.github.io/synthetic_bookdown/

The bookdown uses the Opal demo server, which has all server-side packages installed:

https://opal-sandbox.mrc-epid.cam.ac.uk/

Installation

See the link below on how to install a package in Opal

https://opaldoc.obiba.org/en/latest/web-user-guide/administration/datashield.html#add-package

  • Install the package in R
install.packages('devtools')
library(devtools)

devtools::install_github('tombisho/dsSyntheticClient')
  • Follow the bookdown, which has executable code and synthetic data

https://tombisho.github.io/synthetic_bookdown/

The bookdown uses the Opal demo server, which has all server-side packages installed:

https://opal-sandbox.mrc-epid.cam.ac.uk/

Usage

Please see this bookdown for detailed guidance:

https://tombisho.github.io/synthetic_bookdown/

The bookdown uses the Opal demo server, which has all server-side packages installed:

https://opal-sandbox.mrc-epid.cam.ac.uk/

Acknowledgements

Thanks to the DataSHIELD team for providing the platform on which these functions are based.

Thanks to OBiBa and Epigeny for the Opal data warehouse, which we use to run DataSHIELD.

Contact

Tom R.P. Bishop and Soumya Banerjee

[email protected]

Citation

If you like or use this work, please cite the following manuscript:

Banerjee S, Bishop TRP. dsSynthetic: Synthetic data generation for the DataSHIELD federated analysis system. BMC Res Notes. 2022;15(1):230.

https://bmcresnotes.biomedcentral.com/articles/10.1186/s13104-022-06111-2

or

dsSynthetic: Synthetic data generation for the DataSHIELD federated analysis system

https://osf.io/tkxqm/

dssyntheticclient's Issues

better example of use

e.g. generating a variable for prevalent diabetes?

Date > something, diabetes = 1?

I would expect a mix of 0 and 1, but with DataSHIELD there is no way of telling whether this is correct (e.g. from the mean).

Perhaps there is a problem with the logic test and the date, or with an indicator variable stored as a character/factor?
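As a rough illustration of the logic test discussed in this issue, the sketch below derives a 0/1 indicator from a date in plain R. It is not DataSHIELD code; the variable names, the cut-off date and the treatment of missing dates are all assumptions made for the example. It converts a character column to Date before comparing, since that is one of the suspected sources of the problem.

```r
# Hypothetical data: diagnosis dates stored as character strings.
diag_date_chr <- c("2009-05-01", "2012-03-15", NA)

# Hypothetical cut-off date for "prevalent diabetes".
cutoff <- as.Date("2010-01-01")

# Convert to Date explicitly before comparing, rather than
# relying on a comparison of character strings.
diag_date <- as.Date(diag_date_chr, format = "%Y-%m-%d")

# Assumption: a missing diagnosis date is coded as 0 (no diabetes).
diabetes <- as.integer(!is.na(diag_date) & diag_date > cutoff)

# Tabulate to check that the indicator is a mix of 0 and 1.
table(diabetes)
```

The indicator is stored as an integer rather than a character or factor, so its mean can be read directly as a proportion.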

Boltzmann machines

Stefan's work on generating data might be useful for larger datasets, but it is more complex.

MagmaScript.min.js should be somewhere else

Files that will be installed at the root of the package should be in the "inst" folder in the source. "MagmaScript.min.js" could therefore be moved to "inst/MagmaScript.min.js", and the path to this file can then be obtained using:

system.file("MagmaScript.min.js", package = "dsSyntheticClient")

simstudy helper functions

Helper functions to:

  • break factors into binaries, then convert them to numerics
  • glue the binaries back into the original factors again
  • provide a looped version of ds.mean and ds.var
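The helpers listed above could look roughly like the following base R sketch. The function names (factor_to_binaries, binaries_to_factor, loop_summary) are hypothetical and not part of dsSyntheticClient. The code runs on ordinary local data and assumes no missing values in the factor.

```r
# Hypothetical helper: one numeric 0/1 column per factor level.
factor_to_binaries <- function(f) {
  stopifnot(is.factor(f))
  lv <- levels(f)
  cols <- vapply(lv, function(l) as.numeric(f == l), numeric(length(f)))
  # Force a matrix shape even when f has length 1.
  m <- matrix(cols, nrow = length(f), dimnames = list(NULL, lv))
  as.data.frame(m, check.names = FALSE)
}

# Hypothetical helper: rebuild the factor from the binary columns.
# Each row is assigned the level whose column holds the largest value.
binaries_to_factor <- function(b) {
  m <- as.matrix(b)
  idx <- max.col(m, ties.method = "first")
  factor(colnames(m)[idx], levels = colnames(m))
}

# Hypothetical helper: apply a summary function to each variable name
# and return a named list of results.
loop_summary <- function(vars, fun) {
  lapply(stats::setNames(vars, vars), fun)
}

f <- factor(c("a", "b", "a", "c"))
b <- factor_to_binaries(f)
identical(binaries_to_factor(b), f)
```

With a live DataSHIELD connection, loop_summary could in principle be given a wrapper around ds.mean or ds.var, for example function(v) dsBaseClient::ds.mean(v, datasources = conns), where conns is an assumed connections object.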

Stop users destroying the server

Generating synthetic data is resource intensive.

A longer term option might be to make synthetic data generation a task run by data owners.

In the shorter term, maybe limit the number of variables that can be processed (10?)
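A short-term limit of this kind could be sketched as a simple guard that runs before any synthetic data generation. The function name check_variable_limit and its max_vars argument are hypothetical; the default of 10 is only the tentative figure suggested in this issue.

```r
# Hypothetical guard: refuse requests with too many variables.
check_variable_limit <- function(vars, max_vars = 10L) {
  if (length(vars) > max_vars) {
    stop(
      sprintf(
        "Requested %d variables; at most %d can be processed per call.",
        length(vars), max_vars
      ),
      call. = FALSE
    )
  }
  # Return the input invisibly so the check can be used inline.
  invisible(vars)
}

# Passes: three variables requested.
check_variable_limit(c("age", "sex", "bmi"))
```

A client-side check alone can be bypassed, so a real limit would presumably also need to be enforced on the server side.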

improve chapter 6

  • example could be improved
  • make it less chatty
  • more intro to DSLite
  • reference to DSLite
  • What to do if they want more packages
  • what about profiles - arg!

similarity and disclosure risk

You might want to add some functions that measure the similarity between the real and the synthetic data, and functions that measure the risk of disclosure in the synthetic data. Maybe there are such functions in the synthpop package?
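One very simple similarity measure is the standardised difference in means between matching numeric columns. The base R sketch below assumes that both data frames are available in the same R session, which is generally not the case for real data held behind DataSHIELD. The function name compare_means is hypothetical, and the sketch does not address disclosure risk.

```r
# Hypothetical helper: standardised mean difference per numeric column
# shared by the real and the synthetic data frames.
compare_means <- function(real, synth) {
  common <- intersect(names(real), names(synth))
  is_num <- vapply(
    common,
    function(v) is.numeric(real[[v]]) && is.numeric(synth[[v]]),
    logical(1)
  )
  num <- common[is_num]
  smd <- vapply(num, function(v) {
    # Pool the two variances by simple averaging.
    pooled_sd <- sqrt((var(real[[v]], na.rm = TRUE) +
                       var(synth[[v]], na.rm = TRUE)) / 2)
    (mean(synth[[v]], na.rm = TRUE) - mean(real[[v]], na.rm = TRUE)) / pooled_sd
  }, numeric(1))
  data.frame(variable = num, std_mean_diff = unname(smd))
}

# Toy data for illustration only.
real  <- data.frame(x = c(1, 2, 3, 4), g = c("a", "b", "a", "b"))
synth <- data.frame(x = c(2, 3, 4, 5), g = c("a", "a", "b", "b"))
compare_means(real, synth)
```

Values close to zero suggest that the synthetic column reproduces the mean of the real column; the measure says nothing about joint distributions.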
