mit-ll / pymasq

PyMASq is an easy-to-use, Python-based software tool with enhanced statistical disclosure control (SDC) capabilities.

disclosure statistical anonymization anonymization-metrics risk-analysis sdc statistical-disclosure-control

PyMASq

Python-based Mitigation Application and Assessment (MASq)

Introduction

In recent years, advances in computational technologies and artificial intelligence/machine learning (AI/ML) capabilities have made vast amounts of data publicly available, whether intentionally disclosed or not. This degrades the privacy of individuals and organizations and can lead to hacking, discrimination, ransoming, and exploitation. Adversaries have disclosed sensitive information to gain decision advantage over the US government and financial leverage over US persons. In addition, data aggregation efforts expose patterns of sensitive activities by providing additional context about sensitive records and attributes. As more datasets become subject to transparency and aggregation efforts, institutions often lack enough subject matter experts to assess and mitigate disclosure risk, the proper tools to de-identify data effectively and efficiently, or the time to fully vet the risk and utility of the data to missions and stakeholders. Academic research offers a number of viable approaches for mitigating the risk of exposing sensitive information, but the available tools are either not automated, no longer supported, or designed for experts in the field. An automated decision support tool is needed to empower non-expert users to explore and mitigate risk in their data and to protect the privacy of individuals and groups.

With funding from the Department of Defense and Lincoln Laboratory’s New Technologies Initiative (NTI), a team of researchers developed the Mitigation Application and Assessment (MASq) software tool, which gives data owners and mission stakeholders situational awareness of the disclosure risk contained within their dataset and provides methods for mitigating that risk. MASq combines standard and novel techniques in Artificial Intelligence and Statistical Disclosure Control (SDC) to facilitate some or all of the procedures and workflows associated with de-identifying data prior to release, including:

  • identifying which data elements reveal an organization’s activities as a group in their dataset, thus creating risk for their mission
  • providing a comprehensive collection of mitigation techniques for generalizing or suppressing elements within their dataset
  • providing quantitative metrics, grounded in well-supported literature, to evaluate the disclosure risk contained within a dataset and the information loss associated with the mitigations (modifications) made to the dataset to reduce that risk.

Furthermore, MASq can automate the aforementioned procedures by applying hundreds of combinations of mitigations, evaluating their impact on disclosure risk and information loss, and generating a report that ranks the most effective mitigation strategies identified for a particular dataset. To date, MASq has been transitioned to a government sponsor and is in the process of being released as an open-source software package.
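The risk-metric, mitigation, and ranking workflow described above can be illustrated with a small, self-contained sketch. It does not use PyMASq's actual API: every function name here (`k_anonymity_risk`, `generalize_age`, `suppress_zip`, `info_loss`, `rank_strategies`) is hypothetical. It assumes k-anonymity (the share of records whose quasi-identifier combination occurs fewer than k times) as the disclosure-risk metric, the fraction of altered quasi-identifier cells as a crude information-loss proxy, and an equally weighted sum of the two as the ranking score. PyMASq's own metrics and ranking may differ.

```python
from collections import Counter
from itertools import combinations
from typing import Callable, Dict, List, Tuple

# Hypothetical record type (not PyMASq's): one dict per row, column name -> value.
Record = Dict[str, object]


def k_anonymity_risk(records: List[Record], quasi_ids: List[str], k: int = 2) -> float:
    """Fraction of records whose quasi-identifier combination occurs fewer than k times."""
    keys = [tuple(r[q] for q in quasi_ids) for r in records]
    counts = Counter(keys)
    return sum(1 for key in keys if counts[key] < k) / len(records)


def generalize_age(records: List[Record]) -> List[Record]:
    """Generalization: replace exact ages with 10-year bands, e.g. 34 -> '30-39'."""
    out = []
    for r in records:
        low = (r["age"] // 10) * 10
        out.append({**r, "age": f"{low}-{low + 9}"})
    return out


def suppress_zip(records: List[Record]) -> List[Record]:
    """Suppression: mask every zip code with '*'."""
    return [{**r, "zip": "*"} for r in records]


def info_loss(original: List[Record], modified: List[Record], columns: List[str]) -> float:
    """Crude proxy: fraction of cells in `columns` that the mitigations altered."""
    changed = sum(1 for o, m in zip(original, modified) for c in columns if o[c] != m[c])
    return changed / (len(original) * len(columns))


MITIGATIONS: Dict[str, Callable[[List[Record]], List[Record]]] = {
    "generalize_age": generalize_age,
    "suppress_zip": suppress_zip,
}


def rank_strategies(records: List[Record], quasi_ids: List[str], k: int = 2, weight: float = 0.5):
    """Apply every combination of mitigations (including none) and sort by
    weight * risk + (1 - weight) * loss, lowest score first."""
    results: List[Tuple[float, Tuple[str, ...], float, float]] = []
    names = list(MITIGATIONS)
    for n in range(len(names) + 1):
        for combo in combinations(names, n):
            data = records
            for name in combo:
                data = MITIGATIONS[name](data)
            risk = k_anonymity_risk(data, quasi_ids, k)
            loss = info_loss(records, data, quasi_ids)
            results.append((weight * risk + (1 - weight) * loss, combo, risk, loss))
    results.sort(key=lambda row: row[0])
    return results


# Toy data for illustration only.
RECORDS: List[Record] = [
    {"age": 34, "zip": "02139", "diagnosis": "flu"},
    {"age": 36, "zip": "02139", "diagnosis": "cold"},
    {"age": 35, "zip": "02140", "diagnosis": "flu"},
    {"age": 52, "zip": "02140", "diagnosis": "asthma"},
    {"age": 57, "zip": "02140", "diagnosis": "flu"},
]

if __name__ == "__main__":
    for score, combo, risk, loss in rank_strategies(RECORDS, ["age", "zip"]):
        label = " + ".join(combo) or "no mitigation"
        print(f"{score:.2f}  {label:<30} risk={risk:.2f} loss={loss:.2f}")
```

On this toy data, generalizing age alone ranks first (risk 0.20, loss 0.50), whereas applying both mitigations removes all risk at the cost of altering every quasi-identifier cell. The `weight` parameter, also an assumption, shifts the trade-off between disclosure risk and information loss.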

Link to User Guide

PyMASq User Guide

Installing from Git

git clone git@github.com:mit-ll/pymasq.git
cd pymasq
pip install .

Installing into a Conda Environment

conda create -n masq python=3.8 -y
conda activate masq
pip install .

To generate the docs, first install the documentation requirements

python -m pip install -r ./doc-requirements.txt

Distribution Statement

DISTRIBUTION STATEMENT A. Approved for public release. Distribution is unlimited.

© 2021 Massachusetts Institute of Technology.

Subject to FAR 52.227-11 – Patent Rights – Ownership by the Contractor (May 2014)
SPDX-License-Identifier: Insert SPDX ID

This material is based upon work supported under Air Force Contract No. FA8702-15-D-0001. Any opinions, findings, conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the U.S. Air Force.

The software/firmware is provided to you on an As-Is basis.

Contributors

howardgershon
