baukebrenninkmeijer / on-the-generation-and-evaluation-of-synthetic-tabular-data-using-gans

Repository for the results of my master thesis, about the generation and evaluation of synthetic data using GANs

License: MIT License

Jupyter Notebook 98.18% Python 1.82%
synthetic-data synthetic-dataset-generation gan generative-adversarial-networks tabular-data data-synthesis data-evaluation

On the Generation and Evaluation of Synthetic Tabular Data using GANs

Overview

  • Master Thesis Data Science, Radboud University 2019
  • License: MIT
  • Based on the awesome work from the guys at the MIT Data to AI Lab (TGAN, SDGym)

Abstract

With privacy regulations becoming stricter, the opportunity to apply synthetic data is growing rapidly. Synthetic data can be used in any setting where access to data with personal information is not strictly necessary. However, many applications require the synthetic data to preserve the same relations as the original data. Existing statistical models and anonymization tools often have adverse effects on the quality of data for downstream tasks like classification. Deep-learning-based synthesis techniques like GANs provide solutions for cases where it is vital that these relations are kept. Inspired by GANs, we propose an improvement on the state of the art in maintaining these relations in synthetic data. Our proposal includes three contributions. First, we propose the addition of skip connections in the generator, which increases gradient flow and modeling capacity. Second, we propose using the WGAN-GP architecture for training the GAN, which suffers less from mode collapse and has a more meaningful loss. Finally, we propose a new similarity metric for evaluating synthetic data. This metric better captures different aspects of synthetic data when comparing it to real data. We study the behaviour of our proposed model adaptations against several baseline models on three datasets. Our results show that our proposals improve on the state-of-the-art models by creating higher-quality data. Our evaluation metric captures quality improvements in synthetic data and gives detailed insight into the strengths and weaknesses of the evaluated models. We conclude that our proposed adaptations should be used for data synthesis, and that our evaluation metric is precise and gives a balanced view of different aspects of the data.
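
For readers unfamiliar with WGAN-GP, the critic objective from the original WGAN-GP formulation is reproduced below as background. This is the standard published form, not a statement of the exact loss or hyperparameters used in this repository; the symbols (critic D, real distribution P_r, generator distribution P_g, penalty weight lambda) follow the usual convention and are not taken from this README.

```latex
% Standard WGAN-GP critic loss (background only; the repository's settings may differ).
\begin{align}
L_D &= \mathbb{E}_{\tilde{x} \sim P_g}\big[D(\tilde{x})\big]
     - \mathbb{E}_{x \sim P_r}\big[D(x)\big]
     + \lambda \, \mathbb{E}_{\hat{x} \sim P_{\hat{x}}}
       \Big[\big(\lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1\big)^2\Big], \\
\hat{x} &= \epsilon\, x + (1 - \epsilon)\, \tilde{x},
\qquad \epsilon \sim U[0, 1].
\end{align}
```

The first two terms estimate the Wasserstein distance between real and generated samples, which is why the loss value is easier to interpret than the standard GAN loss. The last term penalizes critic gradients whose norm deviates from 1 at points interpolated between real and generated samples.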

The data evaluation library can be found in an additional repository: https://github.com/Baukebrenninkmeijer/Table_Evaluator.
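
To give a feel for what comparing synthetic data to real data can look like, here is a minimal sketch that correlates the per-column means and standard deviations of two tables. It is an illustration only: it is not the similarity metric proposed in the thesis and not the Table_Evaluator API. The function names and the dict-of-lists table layout are assumptions made for this example.

```python
import math
import statistics


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def column_stats(table):
    """Flatten the mean and population std of every column into one list."""
    stats = []
    for values in table.values():
        stats.append(statistics.fmean(values))
        stats.append(statistics.pstdev(values))
    return stats


def basic_stat_similarity(real, fake):
    """Correlate summary statistics of matching numeric columns.

    Hypothetical helper: both tables are dicts mapping column name to a
    list of numbers, with the same columns in the same order.
    """
    if list(real) != list(fake):
        raise ValueError("tables must have the same columns in the same order")
    return pearson(column_stats(real), column_stats(fake))


if __name__ == "__main__":
    # Toy data invented for this example.
    real = {"age": [23, 35, 47, 52], "income": [1800.0, 2600.0, 3900.0, 4100.0]}
    fake = {"age": [25, 33, 49, 50], "income": [1900.0, 2500.0, 3800.0, 4200.0]}
    print(basic_stat_similarity(real, fake))
```

A value close to 1 means the synthetic table reproduces the basic statistics of the real one. Because columns on very different scales dominate a plain correlation, a single score like this covers only one aspect of data quality.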

Motivation

To see the motivation for my decisions, please have a look at my master thesis, found at https://www.ru.nl/publish/pages/769526/z04_master_thesis_brenninkmeijer.pdf

Using this work?

If you use this work, please cite:

@article{brenninkmeijer2019synthetic,
  title={On the Generation and Evaluation of Synthetic Tabular Data using GANs},
  author={Bauke Brenninkmeijer and Youri Hille and Arjen P. de Vries},
  year={2019}
}

on-the-generation-and-evaluation-of-synthetic-tabular-data-using-gans's People

Contributors

baukebrenninkmeijer

on-the-generation-and-evaluation-of-synthetic-tabular-data-using-gans's Issues

Tensorpack version

Hi,

When using the latest version of Tensorpack, I get the following error:

from tensorpack import (
ImportError: cannot import name 'InputDesc' from 'tensorpack' (/Users/user/anaconda3/lib/python3.8/site-packages/tensorpack/__init__.py)

The Tensorpack GitHub page mentions that the library is not stable and that a project must be run with the exact Tensorpack version it was developed with.

So could you please specify the Tensorpack version you are using?

Thanks.
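
When reporting or diagnosing a mismatch like this, it can help to record which Tensorpack version is installed and whether it still exposes the missing name. The following is a minimal sketch, assuming Python 3.8 or newer (for importlib.metadata). The helper names installed_version and exports_name are made up for this example, and the script does not tell you which version the repository requires, since the issue leaves that unanswered.

```python
import importlib
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def exports_name(module_name, attr):
    """Report whether a module can be imported and exposes the given name."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return False
    return hasattr(module, attr)


if __name__ == "__main__":
    # Prints None / False if Tensorpack is missing or lacks InputDesc.
    print("tensorpack version:", installed_version("tensorpack"))
    print("InputDesc available:", exports_name("tensorpack", "InputDesc"))
```

Including both lines of output in an issue shows at a glance whether the error comes from a missing install or from a version that no longer provides InputDesc.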
