GithubHelp home page GithubHelp logo

isabella232 / faldisco Goto Github PK

View Code? Open in Web Editor NEW

This project forked from facebookarchive/faldisco

0.0 0.0 0.0 22 KB

FalDisco uses a new approach to classify data that has traditionally been difficult to classify (small integers, dates and numbers) by aligning data to be classified (target data) to the known reference data on alignment key and finding functional alignment between reference fields and target fields .

License: MIT License

Python 100.00%

faldisco's Introduction

A command line tool to find sensitive data by aligning values in columns

FalDisco - Functional ALignment DISCOvery helps us find sensitive data by leverages the fact that we have examples of sensitive data and can use those examples to detect additional sensitive data by aligning rows based on known fields(say userid) and then aligning fields using the techniques described below.

Requirements

FalDisco requires Python 3.9, and has been tested on macOS and Linux. Due to its use of SQLAlchemy, it should work with multiple databases, though it has been tested only with MySQL. The only other dependency is Pandas.

Building and running FalDisco

Install and configure MySQL.

To build from the command line:

$ pip install pandas
$ pip install sqlalchemy
$ pip install mysqlclient
$ pip install pymysql

To run

$ python3.9 faldisco.py mysql.db mysql.db Db

Usage:
python faldisco.py <ref db.ref table> <target db.target table> <ref_join_keys> [target_join_keys]

The aligned column details will be added to the file(s) out/{ref_table}to{target_table}*

How FalDisco works

Functional Alignment relies on three simple components:

  • A reference table that contains the data that we are looking for
  • Row alignment - a set of keys that help us align the rows in the reference table to the rows in the target table that we are trying to run discovery on (i.e., the join key between the two tables).
  • Field alignment - once we align the rows, we try to align the fields through functional alignment that finds any target_field = f(reference field) where f is a deterministic function, so the same value of a reference field always produces the same value of the target field. In the example above, function is an exact match, but it can actually be any deterministic function.

For SQL native speakers, you can think of it as
SELECT COUNT(*) AS alignment_count FROM reference_table r JOIN target_table t ON r.alignment_key = t.alignment_key WHERE t.target_field = deterministic_function(r.reference_field)

Contributing

See the CONTRIBUTING file for how to help out.

License

FalDisco is MIT licensed, as found in the LICENSE file.

faldisco's People

Contributors

sadatta avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.