RareGAN: Generating Samples for Rare Classes

[paper (AAAI 2022)] [paper (arXiv)] [code]

Authors: Zinan Lin, Hao Liang, Giulia Fanti, Vyas Sekar

Abstract: We study the problem of learning generative adversarial networks (GANs) for a rare class of an unlabeled dataset subject to a labeling budget. This problem is motivated from practical applications in domains including security (e.g., synthesizing packets for DNS amplification attacks), systems and networking (e.g., synthesizing workloads that trigger high resource usage), and machine learning (e.g., generating images from a rare class). Existing approaches are unsuitable, either requiring fully-labeled datasets or sacrificing the fidelity of the rare class for that of the common classes. We propose RareGAN, a novel synthesis of three key ideas: (1) extending conditional GANs to use labelled and unlabelled data for better generalization; (2) an active learning approach that requests the most useful labels; and (3) a weighted loss function to favor learning the rare class. We show that RareGAN achieves a better fidelity-diversity tradeoff on the rare class than prior work across different applications, budgets, rare class fractions, GAN losses, and architectures.


This repo contains the code for reproducing the RareGAN experiments in the paper. The code was tested with Python 3.6.9 + TensorFlow 1.15.2 and with Python 3.7.13 + TensorFlow 2.8.2.

The code can be extended to your own applications, such as synthesizing images from rare classes, or synthesizing data in more general formats (e.g., network packets, text) for rare events (e.g., attacks).

Prerequisites

The code is built on the GPUTaskScheduler library, which automatically schedules jobs across GPU nodes. Please install it first. You may need to change the GPU configuration to match your devices. The configuration is set in config_generate_data.py in each directory. Please refer to GPUTaskScheduler's GitHub page for details on how to configure it.

To run with TensorFlow 2, please also install TensorFlow-Slim with pip install tf-slim.
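
Before launching an experiment, it can help to confirm that the required packages are visible to the interpreter. The following is a minimal sketch and is not part of this repo: check_environment is a hypothetical helper, and it only checks that the modules can be located (tf_slim is the import name of the tf-slim package), not that their versions match the tested ones listed above.

```python
import importlib.util
import sys


def check_environment(modules=("tensorflow", "tf_slim")):
    """Report the Python version and whether each module can be located.

    Hypothetical helper for illustration; it does not import the modules,
    so it will not catch a broken installation.
    """
    found = {
        name: importlib.util.find_spec(name) is not None for name in modules
    }
    return {"python": "%d.%d.%d" % sys.version_info[:3], "modules": found}


if __name__ == "__main__":
    report = check_environment()
    print("Python", report["python"])
    for name, ok in report["modules"].items():
        print(name, "found" if ok else "MISSING")
```

If tf_slim is reported as missing under TensorFlow 2, install it as described above.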

Image Experiments: Generating Rare Samples for CIFAR10 and MNIST

CIFAR10

  • Prepare the data according to the instructions here.
  • Run
cd for_images
python -m scripts.CIFAR10.main_generate_data

MNIST

  • Prepare the data according to the instructions here.
  • Run
cd for_images
python -m scripts.MNIST.main_generate_data

Your Own Image Dataset

Simply add the data loading logic for your dataset here, and modify the training configuration file accordingly (example).

System Experiments: Generating Network Packets for DNS Amplification Attacks and Packet Classifier Attacks

DNS Amplification Attacks

cd for_systems
python -m scripts.DNS.main_generate_data

WARNING: During training, the code will generate a large number of DNS queries to the specified DNS server. Please make sure to use your own DNS servers in a sandboxed environment to avoid harming the public Internet.

Generating Packets that Trigger Long Processing Time for Packet Classifiers

To get an accurate evaluation of the packet processing time, we used separate servers for running RareGAN training and evaluating the processing time.

  • On the server for evaluation, run
cd for_systems
python3 -m blackboxes.main_start_rpc_runner_server
  • On the server for RareGAN training, run
cd for_systems
python -m scripts.PC.main_generate_data

Your Own Dataset or Application

The code supports a general data format and can be extended to any application that needs samples scoring high on some metric (e.g., the packet amplification ratio in amplification attacks, or the processing time of a system).
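
As an illustration of what "a large metric" can mean in practice, the sketch below marks a sample as rare when its metric value meets a threshold. This is an assumption made for the example, not the repo's API: label_rare, the threshold value, and the sample numbers are all hypothetical.

```python
import numpy as np


def label_rare(metric_values, threshold):
    """Label samples whose metric is at least `threshold` as rare (1), else 0.

    Hypothetical helper: returns the binary labels and the rare-class fraction.
    """
    metric_values = np.asarray(metric_values, dtype=float)
    labels = (metric_values >= threshold).astype(np.int64)
    return labels, float(labels.mean())


if __name__ == "__main__":
    # Made-up metric values, e.g. amplification ratios measured per packet.
    ratios = [1.0, 12.5, 3.0, 40.0]
    labels, rare_fraction = label_rare(ratios, threshold=10.0)
    print(labels, rare_fraction)
```

In a setting with a labeling budget, each call that measures the metric for a sample would count against that budget, so a helper like this would only be applied to the samples actually selected for labeling.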

The following is all you need to do:

Results

The code generates the following result files/folders:

  • <code folder>/results/<hyper-parameters>/worker.log: Standard output and error from the code.
  • <code folder>/results/<hyper-parameters>/generated_data/data.npz: Generated data from the rare class.
  • <code folder>/results/<hyper-parameters>/sample/*.png (for image experiments only): Generated images during training.
  • <code folder>/results/<hyper-parameters>/checkpoint/*: TensorFlow checkpoints and customized checkpoints.
  • <code folder>/results/<hyper-parameters>/time.txt: Training iteration timestamps.
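
To inspect the generated samples, data.npz can be opened with NumPy. The sketch below is a minimal example that lists every array in the archive with its shape and dtype. The array names inside the file are not documented here, so the code discovers them rather than assuming any; summarize_npz is a hypothetical helper.

```python
import numpy as np


def summarize_npz(path):
    """Return {array_name: (shape, dtype)} for every array in an .npz file.

    `path` may be a file path or a seekable file-like object.
    """
    with np.load(path) as archive:
        return {
            name: (archive[name].shape, str(archive[name].dtype))
            for name in archive.files
        }


if __name__ == "__main__":
    import sys

    # Usage: python summarize.py <path to generated_data/data.npz>
    for name, (shape, dtype) in summarize_npz(sys.argv[1]).items():
        print(name, shape, dtype)
```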
