yevh / anonymizer


Data Anonymizer

Anonymize sensitive data in your datasets. The script generates pseudonyms for specified columns in a CSV file using salted SHA-256 hashing, and writes an encrypted mapping file for each column during anonymization. These mapping files, whose integrity is protected with an HMAC, allow the data to be reverted to its original form.
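As a rough illustration of salted SHA-256 pseudonym generation, here is a minimal sketch using only the standard library. The function name `pseudonymize`, the 16-byte random salt, and the salt-then-value concatenation are assumptions made for this example; the script's actual construction may differ.

```python
import hashlib
import secrets


def pseudonymize(value: str, salt: bytes) -> str:
    """Return a hex pseudonym: SHA-256 over salt followed by the UTF-8 value."""
    return hashlib.sha256(salt + value.encode("utf-8")).hexdigest()


# Hypothetical salt; how the real script derives or stores its salt is not shown here.
salt = secrets.token_bytes(16)
alias = pseudonymize("alice@example.com", salt)

# The same value and salt always give the same pseudonym,
# so repeated values in a column stay linkable after anonymization.
print(alias == pseudonymize("alice@example.com", salt))
```

Because the hash is one-way, reverting depends on a stored mapping from pseudonym to original value, which is why the mapping files matter.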

How It Works

  • Load the secret key, or generate one if it does not exist
    • The key should be 32 bytes (256 bits) long, base64-encoded
  • Process Input Data File
  • Data Pseudonymization or Reversion
  • Encrypted Mapping Files
    • During the anonymize operation, for each specified column, the script creates an encrypted file that maps the pseudonyms back to the original data.
    • These mapping files are encrypted using the Fernet symmetric encryption scheme, and an HMAC is appended to ensure data integrity.
  • Data Integrity
    • When reverting data, the script first checks the integrity of the encrypted mapping files by comparing a stored HMAC with a computed HMAC.
  • Output
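The key handling and integrity check described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the script's code: the helper names `load_or_create_key`, `seal`, and `unseal` are hypothetical, the Fernet encryption step is left out (the ciphertext is treated as opaque bytes), and using the base64-encoded key directly as the HMAC key is a guess.

```python
import base64
import hashlib
import hmac
import os
import tempfile
from pathlib import Path

TAG_LEN = hashlib.sha256().digest_size  # an HMAC-SHA256 tag is 32 bytes


def load_or_create_key(path: Path) -> bytes:
    """Return the base64-encoded key stored at `path`, creating it if missing."""
    if not path.exists():
        # 32 random bytes, URL-safe base64-encoded (the format Fernet keys use).
        path.write_bytes(base64.urlsafe_b64encode(os.urandom(32)))
    key = path.read_bytes().strip()
    if len(base64.urlsafe_b64decode(key)) != 32:
        raise ValueError("secret key must decode to 32 bytes")
    return key


def seal(ciphertext: bytes, key: bytes) -> bytes:
    """Append an HMAC-SHA256 tag to already-encrypted mapping bytes."""
    tag = hmac.new(key, ciphertext, hashlib.sha256).digest()
    return ciphertext + tag


def unseal(blob: bytes, key: bytes) -> bytes:
    """Compare the stored tag with a freshly computed one; return the ciphertext."""
    ciphertext, stored = blob[:-TAG_LEN], blob[-TAG_LEN:]
    computed = hmac.new(key, ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(stored, computed):
        raise ValueError("mapping file failed its integrity check")
    return ciphertext


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        key = load_or_create_key(Path(tmp) / "secret_key.key")
        blob = seal(b"opaque encrypted mapping bytes", key)
        print(unseal(blob, key))
```

`hmac.compare_digest` is used for the comparison because it runs in constant time, which avoids leaking how many leading bytes of a forged tag were correct.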

Install dependencies

pip3 install -r requirements.txt

How to Use

python3 anonymizer.py file_path operation --cols column_names --key_path secret_key_path
  • file_path: Path to the data file (CSV format)
  • operation: anonymize or revert
  • --cols: Specific columns to anonymize or revert (all columns by default)
  • --key_path: Path to the secret key file (required)
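For readers who want to see how such a command line could be parsed, here is a minimal `argparse` sketch of the interface listed above. It is not taken from `anonymizer.py`; in particular, accepting `--cols` as space-separated names is an assumption, and the real script may expect a different format.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented arguments."""
    parser = argparse.ArgumentParser(prog="anonymizer.py")
    parser.add_argument("file_path", help="Path to the data file (CSV format)")
    parser.add_argument("operation", choices=["anonymize", "revert"])
    # Assumption: column names are space-separated; None stands for "all columns".
    parser.add_argument("--cols", nargs="+", default=None,
                        help="Columns to anonymize or revert (default: all)")
    parser.add_argument("--key_path", required=True,
                        help="Path to the secret key file")
    return parser


args = build_parser().parse_args(
    ["data.csv", "anonymize", "--cols", "name", "email",
     "--key_path", "secret_key.key"]
)
print(args.operation, args.cols)
```

With `choices` set on `operation`, anything other than `anonymize` or `revert` is rejected by the parser before any file is touched.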

Usage Example

  1. Generate sample data
python3 data.py

  2. Anonymize
python3 anonymizer.py data.csv anonymize --key_path secret_key.key

  3. Revert
python3 anonymizer.py data.csv revert --key_path secret_key.key
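To make the anonymize and revert round trip concrete, here is a small in-memory sketch. It is a simplified stand-in, not the script's implementation: the function names, the sample rows, and the fixed demo salt are hypothetical, and the mapping is kept as a plain dictionary, whereas the script stores it in encrypted files.

```python
import csv
import hashlib
import io


def anonymize_rows(rows, cols, salt):
    """Replace values in `cols` with pseudonyms; return new rows and per-column mappings."""
    mappings = {col: {} for col in cols}
    out = []
    for row in rows:
        new_row = dict(row)
        for col in cols:
            alias = hashlib.sha256(salt + row[col].encode("utf-8")).hexdigest()
            mappings[col][alias] = row[col]  # pseudonym -> original value
            new_row[col] = alias
        out.append(new_row)
    return out, mappings


def revert_rows(rows, mappings):
    """Look up each pseudonym in its column mapping to restore the original value."""
    out = []
    for row in rows:
        new_row = dict(row)
        for col, mapping in mappings.items():
            new_row[col] = mapping[row[col]]
        out.append(new_row)
    return out


# Hypothetical sample data standing in for data.csv.
sample = "name,city\nAlice,Kyiv\nBob,Lviv\n"
rows = list(csv.DictReader(io.StringIO(sample)))

anonymized, mappings = anonymize_rows(rows, ["name"], salt=b"demo-salt")
restored = revert_rows(anonymized, mappings)
print(restored == rows)
```

Only the `name` column is rewritten here; columns not selected pass through unchanged.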

Security

  • Secret Key Storage - ensure the secret key file is stored securely. If it is compromised, an attacker could decrypt the pseudonym mappings and de-anonymize the data.
  • Encrypted Mapping Files - ensure that these files are stored in a secure location with restricted access. Anyone who holds both these files and the secret key can de-anonymize the data.
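One simple way to restrict access on POSIX systems is to make the key and mapping files readable and writable by their owner only. The sketch below shows that idea; the helper name `restrict_to_owner` is hypothetical, and file permissions are only one layer of protection, not a complete storage policy.

```python
import os
import stat
import tempfile
from pathlib import Path


def restrict_to_owner(path: Path) -> None:
    """Set permissions to 0o600: owner may read and write, nobody else has access."""
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        key_file = Path(tmp) / "secret_key.key"  # stand-in for the real key file
        key_file.write_bytes(b"placeholder")
        restrict_to_owner(key_file)
        print(oct(stat.S_IMODE(key_file.stat().st_mode)))
```

Keeping the secret key on a different machine or volume from the mapping files further limits what a single compromised location exposes.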
