sharadasowmya14 / spring-spark-k-anonymity

Spring Application to apply K-Anonymity in a distributed manner using Apache Spark and Executor Service.

License: MIT License

Java 100.00%
spring-boot spark k-anonymity maven

Spring-Spark-K-Anonymity

This repository contains a Spring Boot application that exposes services to generate and anonymize data sets in a distributed manner.

Description

Implements K-Anonymization in a distributed manner using:

  • Apache Spark for partitioning and merging the data set.
  • Java's ExecutorService for asynchronous anonymization of the partitions.
  • ARX library for applying K-Anonymization to each partition.
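
The partition, anonymize, merge flow described above can be sketched with the standard library alone. This is a minimal illustration, not the repository's code: plain lists stand in for Spark partitions, the `anonymizer` function is a placeholder for the ARX step, and the names `PartitionedPipelineSketch`, `partition` and `processAndMerge` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

public class PartitionedPipelineSketch {

    // Round-robin split into numPartitions lists (stand-in for Spark partitioning).
    static <T> List<List<T>> partition(List<T> records, int numPartitions) {
        List<List<T>> partitions = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new ArrayList<>());
        }
        for (int i = 0; i < records.size(); i++) {
            partitions.get(i % numPartitions).add(records.get(i));
        }
        return partitions;
    }

    // Submits one task per partition, then merges the results in partition order.
    static <T> List<T> processAndMerge(List<List<T>> partitions,
                                       Function<List<T>, List<T>> anonymizer,
                                       int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<List<T>>> futures = new ArrayList<>();
            for (List<T> p : partitions) {
                futures.add(pool.submit(() -> anonymizer.apply(p)));
            }
            List<T> merged = new ArrayList<>();
            for (Future<List<T>> f : futures) {
                merged.addAll(f.get()); // blocks until that partition is done
            }
            return merged;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for partitions", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("A partition task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<List<String>> parts = partition(List.of("a", "b", "c", "d", "e"), 2);
        // Placeholder anonymizer (assumption): masks every value in the partition.
        List<String> out = processAndMerge(parts, p -> {
            List<String> masked = new ArrayList<>();
            for (int i = 0; i < p.size(); i++) {
                masked.add("*");
            }
            return masked;
        }, 2);
        System.out.println(parts.size() + " partitions, " + out.size() + " records merged");
    }
}
```

Collecting the futures in submission order keeps the merged output deterministic even though the partitions finish in arbitrary order.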

Prerequisites

  1. Install Java as described in How do I install Java?
  2. Install Maven as described in Installing Apache Maven
  3. Install Apache Spark as described in Spark Overview

Project Structure

├───java
│   └───com
│       └───spark
│           ├───arx 
│           ├───car_basic
│           ├───controllers
│           ├───data_generator
│           ├───file_utils
│           └───spark_config
└───resources
    ├───input-data
    └───output-data

Build and Run

  • Build the project and install all required dependencies:
mvn clean install
  • Run the Spring Boot application:
mvn spring-boot:run

The application will run on port 8080 by default.
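
Once the application is up, the `/api/health` endpoint listed under Anonymization can be probed to confirm the start. The sketch below uses the JDK's `java.net.http` client (Java 11+). It assumes the application is reachable on `localhost` and that the endpoint accepts a POST without a body; neither is stated in this README, and `HealthProbe` and `buildPost` are hypothetical names.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class HealthProbe {

    // Builds a bodiless POST for a path on the default local port (host is an assumption).
    static HttpRequest buildPost(String path) {
        return HttpRequest.newBuilder(URI.create("http://localhost:8080" + path))
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = buildPost("/api/health");
        try {
            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println("HTTP " + response.statusCode());
        } catch (IOException e) {
            System.out.println("Application not reachable: " + e.getMessage());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

The same `buildPost("/api/anonymize")` request would trigger anonymization, under the same assumptions.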

Data Set Generation

  1. To generate the data set, compile and run the standalone Java application:
javac DataSetGenerator.java
java DataSetGenerator
  2. To increase the number of records in the data set, use either of the following strategies:
    • Increasing the number of cars.
    • Capturing latitude and longitude for shorter distance intervals.
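
How the two strategies affect the record count can be estimated with a little arithmetic. The sketch below is not taken from `DataSetGenerator`; it assumes each car drives a route of fixed length and records one position at the start plus one per distance interval. `RecordCountSketch`, `samplesPerCar` and `totalRecords` are hypothetical names.

```java
public class RecordCountSketch {

    // One sample at the start, then one every intervalMeters along the route.
    static long samplesPerCar(double routeMeters, double intervalMeters) {
        if (routeMeters < 0 || intervalMeters <= 0) {
            throw new IllegalArgumentException("route must be >= 0 and interval > 0");
        }
        return (long) Math.floor(routeMeters / intervalMeters) + 1;
    }

    // Total records grow linearly with the number of cars.
    static long totalRecords(int cars, double routeMeters, double intervalMeters) {
        return cars * samplesPerCar(routeMeters, intervalMeters);
    }

    public static void main(String[] args) {
        System.out.println(totalRecords(100, 10_000, 100)); // prints 10100
        System.out.println(totalRecords(200, 10_000, 100)); // twice the cars: prints 20200
        System.out.println(totalRecords(100, 10_000, 50));  // half the interval: prints 20100
    }
}
```

Under these assumptions, doubling the cars or halving the interval both roughly double the data set.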

Anonymization

  1. The Spring application exposes the following endpoints:
Method  Request         Description
POST    /api/health     Reports whether the application has started.
POST    /api/anonymize  Anonymizes the created data set.
  2. The number of partitions can be changed as shown below:
sparkConfigurator.loadDataSource(carSchema, 20);
  3. The K value can be set as shown below:
config.addPrivacyModel(new KAnonymity(10));
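
To make the meaning of the K value concrete: a data set is K-anonymous when every combination of quasi-identifier values occurs in at least K records. The sketch below checks that property in plain Java. It does not use ARX, the class name `KAnonymityCheck` is hypothetical, and the sample rows are invented for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class KAnonymityCheck {

    // True if every distinct quasi-identifier combination occurs at least k times.
    static boolean isKAnonymous(List<List<String>> rows, int k) {
        Map<List<String>, Integer> classSizes = new HashMap<>();
        for (List<String> row : rows) {
            classSizes.merge(row, 1, Integer::sum); // count rows per equivalence class
        }
        for (int size : classSizes.values()) {
            if (size < k) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Invented, already generalized coordinates used as quasi-identifiers.
        List<List<String>> rows = List.of(
                List.of("48.1*", "11.5*"),
                List.of("48.1*", "11.5*"),
                List.of("48.2*", "11.6*"));
        System.out.println(isKAnonymous(rows, 1)); // prints true
        System.out.println(isKAnonymous(rows, 2)); // prints false: the last row is alone in its class
    }
}
```

If anonymization is applied to each partition independently, as the description suggests, a partition holding fewer than K records cannot satisfy the model without suppressing its records, so the partition count and K are best chosen together.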
