ppomes / myanon

A mysqldump anonymizer

Home Page: https://ppomes.github.io/myanon/

License: Other

Makefile 1.08% Shell 0.02% M4 6.41% C 74.51% Yacc 11.53% Lex 6.09% Dockerfile 0.21% Python 0.14%
mysqldump anonymizer database-anonymizer gpdr rgpd anonymization mysql anonymized-data anonymized-database maskingdb

myanon's Introduction

Myanon

Myanon is a MySQL dump anonymizer, reading a dump from stdin, and producing an anonymized version to stdout.

Anonymization uses deterministic HMAC processing based on SHA-256. Because the same input always produces the same output, fields acting as foreign keys keep their constraints.
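
The following is a minimal Python sketch of that idea, not myanon's actual C implementation. It shows why a keyed, deterministic hash keeps foreign keys consistent: the same value hashed with the same secret gives the same token in every table. The function name `pseudonymize`, the secret value, and the byte-to-letter mapping are assumptions made for illustration.

```python
import hashlib
import hmac


def pseudonymize(value: str, secret: bytes, length: int = 8) -> str:
    """Map a value to `length` lowercase letters (at most 32), deterministically."""
    digest = hmac.new(secret, value.encode("utf-8"), hashlib.sha256).digest()
    # Hypothetical mapping: one letter per digest byte.
    return "".join(chr(ord("a") + b % 26) for b in digest[:length])


secret = b"example-secret"  # assumed value, stands in for the configured secret

# The same customer id appears in two tables; both map to the same token,
# so a join between the anonymized tables still works.
customers_id = pseudonymize("42", secret)
orders_customer_id = pseudonymize("42", secret)
assert customers_id == orders_customer_id
```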

Optional Python support can also be used to define custom anonymization rules (for example, with the Python faker library).

Myanon works by default on flat (numeric and text) fields. An optional json module is available for json fields.

A configuration file is used to store the hmac secret and to select which fields need to be anonymized. A self-commented sample is provided (main/myanon-sample.conf)

This tool is in alpha stage. Please report any issue.

Simple use case

Example that creates both an encrypted (sensitive) backup and an anonymized (non-sensitive) backup from a single mysqldump command:

mysqldump mydb | tee >(myanon -f myanon.cfg | gzip > mydb_anon.sql.gz) | gpg -e -r [email protected] > mydb.sql.gz.gpg

Installation from sources

Build Requirements

  • autoconf
  • automake
  • make
  • a C compiler (gcc or clang)
  • flex
  • bison
  • python (optional)
  • jq (optional)

Example on a Fedora system:

$ sudo dnf install autoconf automake gcc make flex bison
$ sudo dnf install python3-devel jq-devel # For optional python and json support
[...]

Example on a Debian/Ubuntu system:

$ sudo apt-get install autoconf automake flex bison build-essential
$ sudo apt-get install python3-dev libjq-dev # For optional python and json support
[...]

On macOS, you need to install Xcode and homebrew, and then:

$ brew install autoconf automake flex bison m4
$ brew install python3 jq # For optional python and json support
[...]

(Please ensure binaries installed by brew are in your $PATH)

If you are using zsh, you may need to add the following to your .zshrc file:

export PATH="/usr/local/opt/m4/bin:$PATH"
export PATH="/usr/local/opt/flex/bin:$PATH"
export PATH="/usr/local/opt/bison/bin:$PATH"

On Apple Silicon, you may need to adjust the include and library search paths at build time:

export CFLAGS=-I/opt/homebrew/include
export LDFLAGS=-L/opt/homebrew/lib

Build/Install

./autogen.sh
./configure                             # Minimal build
./configure --enable-python             # Optional python support
./configure --enable-jq                 # Optional json support
./configure --enable-python --enable-jq # Both
make
make install

Compilation/link flags

Flags are controlled by using CFLAGS/LDFLAGS when invoking make. To create a debug build:

make CFLAGS="-O0 -g"

To create a static executable on Linux (minimal build only):

make LDFLAGS="-static"

Run/Tests

main/myanon -f tests/test1.conf < tests/test1.sql
zcat tests/test2.sql.gz | main/myanon -f tests/test2.conf

The tests directory contains examples with basic hmac anonymization, and with python rules (faker).

Installation from packages (Ubuntu)

A PPA is available at: https://launchpad.net/~pierrepomes/+archive/ubuntu/myanon

Docker Build / Run

tl;dr:

docker build --tag myanon .
docker run -it --rm -v ${PWD}:/app myanon sh -c '/bin/myanon -f /app/myanon.conf < /app/dump.sql | gzip > /app/dump-anon.sql.gz'

Why Docker?

An alternative to the above build or run options is to use the provided Dockerfile to build inside an isolated environment, and run myanon from a container.

It's useful when:

  • you can't or don't want to install a full C development environment on your host
  • you want to quickly build for or run on a different architecture (e.g.: amd64 or arm64)
  • you want to easily distribute a self-contained myanon (e.g.: for remote execution & processing on a Kubernetes cluster)

The provided multistage Dockerfile uses the Alpine Docker image.

Build using Docker

Build a binary using the provided Dockerfile:

# recommended, to start from a clean state 
make clean
# build using your default architecture
docker build --tag myanon .

For Apple Silicon users who want to build for amd64:

# recommended, to start from a clean state 
make clean
# build using the amd64 architecture
docker build --tag myanon --platform=linux/amd64 .

Run using Docker

In this example we will:

  • use a myanon configuration file (myanon.conf)
  • use a MySQL dump (dump.sql)
  • generate an anonymized dump (dump-anon.sql) based on the configuration and the full dump.

Sharing the local folder as /app inside the container:

docker run -it --rm -v ${PWD}:/app myanon sh -c '/bin/myanon -f /app/myanon.conf < /app/dump.sql > /app/dump-anon.sql'

For Apple Silicon users who want to run as amd64:

docker run -it --rm --platform linux/amd64 -v ${PWD}:/app myanon sh -c '/bin/myanon -f /app/myanon.conf < /app/dump.sql > /app/dump-anon.sql' 

myanon's People

Contributors

asgrim, jerome-h-dev, pierrepomes, ppomes, sjourdan, trilliot, yurimb

myanon's Issues

Config errors

It would be nice to get some feedback when a column specified in the config file doesn't exist, rather than having it silently ignored. I had misspelled one column, which had the potential to leak PII data (which we are trying to avoid).
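
A rough Python sketch of the kind of check requested here, assuming the column names can be read from the CREATE TABLE statement in the dump. The helper `missing_columns` and the sample table are hypothetical and are not part of myanon.

```python
import re


def missing_columns(create_table_sql: str, configured: list[str]) -> list[str]:
    """Return configured column names that the CREATE TABLE does not define."""
    # In mysqldump output, each column definition line starts with a
    # backquoted name followed by its type.
    defined = set(
        re.findall(r"^\s*`([^`]+)`\s+\w", create_table_sql, re.MULTILINE)
    )
    return [name for name in configured if name not in defined]


ddl = """CREATE TABLE `users` (
  `id` int NOT NULL,
  `email` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`id`)
);"""

# "emial" is a misspelling that would otherwise go unnoticed.
print(missing_columns(ddl, ["email", "emial"]))  # ['emial']
```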

NULL values are not maintained for rows with both NULL and non-NULL values

When anonymizing a column that contains both NULL and non-NULL values the NULL values are hashed. I would expect only the non-null values to be hashed and the NULL rows to maintain their current NULL value.

I've attached a tar.gz with an example:

  • null-example.conf = the myanon config
  • null-example-mysqldump.sql = the mysqldump of the database I'm using as an example
  • null-example-anonymized.sql = the myanon result after anonymizing null-example-mysqldump.sql

On line 48 of null-example-anonymized.sql you can see that rows 3 & 5 hash the NULL value to 'ahavykafkojauwmdriqpohobuuttmiif'. I would expect those values to remain NULL.
null-example.tar.gz
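
The expected behavior can be sketched in Python as follows. This is an illustration of the request, not myanon's code: `hash_unless_null` and `to_sql` are hypothetical helpers, and the 16-character hex token is an arbitrary choice.

```python
import hashlib
import hmac
from typing import Optional


def hash_unless_null(value: Optional[str], secret: bytes) -> Optional[str]:
    """Return None for SQL NULL, otherwise a short hex HMAC-SHA256 token."""
    if value is None:
        return None  # NULL stays NULL
    digest = hmac.new(secret, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def to_sql(value: Optional[str]) -> str:
    """Render a value as it would appear in an INSERT statement."""
    return "NULL" if value is None else "'" + value + "'"


rows = ["alice", None, "bob"]
print(", ".join(to_sql(hash_unless_null(v, b"example-secret")) for v in rows))
```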

Issues on JSON fields with backslashes and double quotes

I'm using version 8.0 of MySQL. I found two edge cases when handling JSON fields and both produce invalid MySQL dump files. I can split the issue in 2 if it will be easier to handle.

Backslashes in the JSON field are stripped

Table 11.1 from https://dev.mysql.com/doc/refman/8.4/en/string-literals.html contains all the characters which should have a backslash in front of them.

Initial data

INSERT INTO `anontest` VALUES (1,'{\"name\":\"John Doe\",\"title\": \"It\'s time for \\fun\\!\"}');

Output

INSERT INTO `anontest` VALUES (1,'{\"name\":\"joswz\",\"title\":\"It's time for fun!\"}');

Double quotes in the JSON value produce error

WARNING! Table/field anontest:metadata: Unable to parse json field, skip anonymization
WARNING! Field anontest:metadata - JQ filter '.name' from config file has not been found in dump. Maybe a config file error?

Initial data

INSERT INTO `anontest` VALUES (1,'{\"name\": \"John Doe\", \"title\": \"It is time for \\\"fun\\\"!\"}');

Output
As a result the JSON value is left intact, but it has two single quotes at beginning and the end of the value.

INSERT INTO `anontest` VALUES (1,''{\"name\": \"John Doe\", \"title\": \"It is time for \\\"fun\\\"!\"}'');

Database Table

CREATE TABLE `anontest` (
  `id` int NOT NULL AUTO_INCREMENT,
  `metadata` json DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;

myanon.conf

tables = {
   `anontest` = {
      `metadata` = json {
        path 'name' = texthash 5
      }
   }
}

mysqldump
full-dump.sql.zip
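
As an illustration of the escaping this issue depends on, here is a small Python sketch that serializes a JSON document and then escapes it for a MySQL string literal. The escape set is the one applied by mysql_real_escape_string-style quoting; whether that is the exact set myanon should use is an assumption, and `escape_mysql_literal` is a hypothetical helper.

```python
import json

# Characters escaped by mysql_real_escape_string-style quoting.
_ESCAPES = {
    "\0": "\\0",
    "\n": "\\n",
    "\r": "\\r",
    "\\": "\\\\",
    "'": "\\'",
    '"': '\\"',
    "\x1a": "\\Z",
}


def escape_mysql_literal(text: str) -> str:
    """Escape text so it can be placed between single quotes in a dump."""
    return "".join(_ESCAPES.get(ch, ch) for ch in text)


doc = {"name": "joswz", "title": 'It is time for "fun"!'}
literal = "'" + escape_mysql_literal(json.dumps(doc)) + "'"
print(literal)
# '{\"name\": \"joswz\", \"title\": \"It is time for \\\"fun\\\"!\"}'
```

The JSON encoder first turns each inner double quote into `\"`; the SQL escaping step then doubles that backslash and escapes the quote again, which gives the `\\\"` sequence seen in the dump.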

Randomise the seed

After running this a few times on different environments, I've noticed that my first "texthash" username is always the same value. It seems some randomisation needs to be added to the base?
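
A short Python sketch of why this happens, assuming (as the README describes) that the output depends only on the configured secret and the input value. With a shared secret the result is repeatable everywhere; a different secret per environment gives unrelated output. The `token` helper and the secret values are hypothetical.

```python
import hashlib
import hmac
import secrets


def token(value: str, secret: bytes) -> str:
    """Hex HMAC-SHA256 of a value under the given secret."""
    return hmac.new(secret, value.encode("utf-8"), hashlib.sha256).hexdigest()


shared = b"same-secret-everywhere"  # assumed value
# Same secret, same input: identical output on every run and machine.
assert token("alice", shared) == token("alice", shared)

# A fresh random secret per environment yields a different token.
staging_secret = secrets.token_bytes(32)
assert token("alice", staging_secret) != token("alice", shared)
```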

Can't parse "set"

Can't parse set, like this:
CREATE TABLE some_table (
some_field int NOT NULL AUTO_INCREMENT,
some_field timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
some_field timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
some_field int NOT NULL,
some_field int NOT NULL,
some_field decimal(28,8) NOT NULL,
flags set('vat_included') CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT '',
some_field int NOT NULL,
fails with the error:
Dump parsing error at line 8: Unable to read table definition - Unexpected [s]
Could you fix it please?

Adding column names results in syntax error

Using SQL:

CREATE TABLE `test_with_column_names` (
    `a` int(10) unsigned NOT NULL
) ENGINE=InnoDB;
INSERT INTO `test_with_column_names` (`a`) VALUES (1);

And configuration:

# Config file for test1.sql
secret = 'lapin'
stats  = 'no'

tables = {
   `test_with_column_names` = {
     `a` = inthash 2
   }
}

This gives a syntax error. Running with myanon -d, the output is:

main/myanon -d -f tests/test1.conf
CREATE TABLE `test_with_column_names` (
    `a` int(10) unsigned NOT NULL
) ENGINE=InnoDB;
INSERT INTO `test_with_column_names` (FOUND TABLE `test_with_column_names`

ENTERING STATE ST_TABLELOOKING FOR  `test_with_column_names`:`a`

ENTERING STATE INITIALFOUND TABLE `test_with_column_names`

ENTERING STATE ST_VALUES
Dump parsing error at line 3: syntax error - Unexpected [(]

Process finished with exit code 1

README - Additional support for MacOS

Request

Add additional support for MacOS users in README

  • Add m4 to brew install as OS version is incompatible.
  • Add support for adding correct $PATH to .zshrc to point to homebrew installs and not OS versions.

Reason

I found it difficult and time consuming getting this to build and test on my mac.

After adding the correct paths to the .zshrc and also using homebrew m4 package I was able to build without issue.

Without these additions I was getting convoluted error messages which did not hint to a resolution.

Documentation

Please update the documentation for Ubuntu to include the following:

sudo apt-get install build-essential

Once the user runs that step on a fresh Ubuntu 20.04 server the ./configure process runs without error.

Fixed text truncated

We have a rule:
key = fixed '0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF'
The value is 64 characters, going into a varchar(64) column, but the last character is truncated, which then affects all our other data. If we try to extend it, we get an error because it exceeds 64 characters. We would really love a fix :) or a push of new code if you have any.

Does fixed even replace if null?

In some places I have to use fixed to set the value, but it looks like a null value is still replaced by the fixed value (I believe?). texthash only seems to replace values that are not null, so maybe fixed needs a "leave null" flag or option? For example: fixed "1234567" true (not the default; don't replace null).

Support for complex fields

I have field data stored as JSON arrays and as simple comma-separated lists in strings. Right now the whole field is anonymized, which "destroys" its array format.
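
One way to picture the request, sketched in Python: anonymize each element and keep the container format. This is only an illustration of the feature being asked for; `anonymize_list_field` is hypothetical, and the `mask` rule is a placeholder for a real hash.

```python
import json


def anonymize_list_field(raw: str, anonymize) -> str:
    """Apply `anonymize` to each element, keeping the container format."""
    try:
        items = json.loads(raw)
    except json.JSONDecodeError:
        items = None
    if isinstance(items, list):
        return json.dumps([anonymize(str(item)) for item in items])
    # Otherwise treat the field as a plain comma-separated list.
    return ",".join(anonymize(part.strip()) for part in raw.split(","))


def mask(text: str) -> str:
    """Placeholder rule: replace every character with 'x'."""
    return "x" * len(text)


print(anonymize_list_field('["ann", "bob"]', mask))  # ["xxx", "xxx"]
print(anonymize_list_field("ann, bob,carol", mask))  # xxx,xxx,xxxxx
```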

Using mysqldump with `--hex-blob` flag breaks myanon

Given the following SQL:

DROP TABLE IF EXISTS `the_blobs`;
CREATE TABLE `the_blobs` (
  `blob1` blob,
  `blob2` tinyblob,
  `blob3` mediumblob,
  `blog4` longblob
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;

INSERT INTO `the_blobs` VALUES (
  0x68656c6c6f,
  0x68656c6c6f,
  0x68656c6c6f,
  0x68656c6c6f
);

And the following myanon configuration:

# Config file for test1.sql
secret = 'lapin'
stats  = 'no'

tables = {
   `the_blobs` = {
     `blob1`     = fixed '0x0000000000'
   }
}

When I run build/main/myanon -f tests/test1.conf < tests/test1.sql, I expect to see:

DROP TABLE IF EXISTS `the_blobs`;
CREATE TABLE `the_blobs` (
  `blob1` blob,
  `blob2` tinyblob,
  `blob3` mediumblob,
  `blog4` longblob
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;

INSERT INTO `the_blobs` VALUES (
  0x0000000000,
  0x68656c6c6f,
  0x68656c6c6f,
  0x68656c6c6f
);

But I actually see:

DROP TABLE IF EXISTS `the_blobs`;
CREATE TABLE `the_blobs` (
  `blob1` blob,
  `blob2` tinyblob,
  `blob3` mediumblob,
  `blog4` longblob
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;

INSERT INTO `the_blobs` VALUES (
  0x0000000000x68656c6c6f,
  0x68656c6c6f,
  0x68656c6c6f,
  0x68656c6c6f
);

Dump parsing error at line 7: syntax error - Unexpected []

Support for data-only dumps (--no-create-info)

It would be a really nice feature for myself if it could parse a data-only dump (mysqldump --no-create-info). I fiddled with it for a while and couldn't get it to work.

An easy workaround for now is to process the entire dump, and then just remove the create statements from the resulting script. This is a great little tool, thank you!
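
The workaround can be sketched as a small Python filter. It assumes mysqldump's usual layout, where each CREATE TABLE statement starts at the beginning of a line and ends on a line that finishes with a semicolon; `drop_create_statements` is a hypothetical helper, not part of myanon.

```python
import io


def drop_create_statements(lines):
    """Yield dump lines, skipping every CREATE TABLE statement."""
    skipping = False
    for line in lines:
        if not skipping and line.startswith("CREATE TABLE"):
            skipping = True
        if not skipping:
            yield line
        elif line.rstrip().endswith(";"):
            skipping = False  # last line of the CREATE TABLE statement


dump = io.StringIO(
    "CREATE TABLE `t` (\n"
    "  `a` int NOT NULL\n"
    ") ENGINE=InnoDB;\n"
    "INSERT INTO `t` VALUES (1);\n"
)
print("".join(drop_create_statements(dump)), end="")
# INSERT INTO `t` VALUES (1);
```

In practice the same generator could read from sys.stdin and write to sys.stdout, placed after myanon in the pipeline.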

Generate fields with data from other fields

Given a field like id and a field like username, I'd like to keep id the same but set username to user<id>.

I imagine something like:

tables = {
   `people` = {
     `id`   = texthash 10
     `username` = sql CONCAT('user', id);
   }
}

Consider this a feature request. Thanks :)

Blob values are not quoted

Given the following config:

# Config file for test1.sql
secret = 'lapin'
stats  = 'no'

tables = {
   `lottypes` = {
     `int1`      = inthash 2
     `int2`      = fixed '9'
     `datetime1` = fixed '1970-01-01 12:00:00'
     `text1`     = texthash 5
     `text2`     = fixed null
     `blob1`     = fixed 'hello'
     `blob2`     = texthash 5
#      `blob3`     = fixed '\'hi\''
   }
}

When I run

build/main/myanon -f tests/test1.conf < tests/test1.sql

I expect to see

INSERT INTO `lottypes` VALUES (... ,'hello','migez', ...);

But I actually see

INSERT INTO `lottypes` VALUES (... ,hello,migez, ...);

I tried quoting/escaping (by uncommenting the config line for blob3), but I received the error:

Config parsing error at line 14: Syntax error - Unexpected [h]
