Comments (8)

terrytangyuan commented on July 21, 2024

+1 to this idea and to incorporating it into this project, as we discussed earlier today.

In terms of maturity, Presidio seems like a good option. It makes more sense to reuse an existing solution, since this kind of detection takes a lot of domain knowledge and is often business-specific.

Regarding anonymization, Presidio also seems to support some level of customizable anonymization. Do we want to leverage that? Trumania doesn't appear to be widely used. Perhaps we can just use Faker and build something suitable for our project?
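To make the "build something suitable" option concrete, here is a minimal sketch of consistent surrogate replacement using only the standard library. The name pool, the `surrogate_name` and `anonymize_column` helpers, and the salt are all hypothetical; in practice a generator such as Faker would supply the fake values instead of a hard-coded list.

```python
import hashlib

# Hypothetical pool of surrogate names; a library such as Faker would
# normally generate these instead of a hard-coded list.
FAKE_NAMES = ["Alex Doe", "Sam Roe", "Jo Bloggs", "Pat Smith"]


def surrogate_name(real_name: str, salt: str = "demo-salt") -> str:
    """Map a real name to a stable fake name from the pool."""
    digest = hashlib.sha256((salt + real_name).encode("utf-8")).digest()
    # Use the first 4 bytes of the digest as a deterministic pool index.
    index = int.from_bytes(digest[:4], "big") % len(FAKE_NAMES)
    return FAKE_NAMES[index]


def anonymize_column(values: list[str]) -> list[str]:
    """Replace every value in a column with its surrogate."""
    return [surrogate_name(v) for v in values]
```

The same input always maps to the same surrogate, so joins and group-bys still line up after anonymization. With a pool this small, different real names can collide on one surrogate, which a real implementation would need to address.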

from data-describe.

brianray commented on July 21, 2024

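GCP DLP does this but is cloud native. I like the API: https://cloud.google.com/dlp/docs/apis. One thing I thought of was some sort of abstract data scheme, maybe also with an architecture behind it (like Apache Arrow), that enforces end-to-end handling of PII/PHI/financial data. Can we:

* [ ] Detect where unknown

* [ ] Hash and encrypt where possible

* [ ] Keep the general Info Types intact (ref https://cloud.google.com/dlp/docs/infotypes-reference)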

terrytangyuan commented on July 21, 2024

> GCP DLP does this but is cloud native. I like the API: https://cloud.google.com/dlp/docs/apis. One thing I thought of was some sort of abstract data scheme, maybe also with an architecture behind it (like Apache Arrow), that enforces end-to-end handling of PII/PHI/financial data.

Could you elaborate a bit on this? What do you mean by "abstract data scheme"?

> Can we:
>
> * [ ] Detect where unknown
>
> * [ ] Hash and encrypt where possible
>
> * [ ] Keep the general Info Types intact (ref https://cloud.google.com/dlp/docs/infotypes-reference)

These sound great to me.
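As a rough illustration of the three checklist items together, here is a sketch that detects values in free text, replaces them with a keyed hash, and keeps the info type label in the output. The two regexes are deliberately simplistic stand-ins for a real detector such as Presidio or DLP, the label names are styled after the DLP info types reference, and the `detect` and `mask` helpers are hypothetical.

```python
import hashlib
import hmac
import re

# Simplified patterns keyed by info-type label; real detectors
# (Presidio, DLP) are far more thorough than these two regexes.
PATTERNS = {
    "EMAIL_ADDRESS": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "US_SOCIAL_SECURITY_NUMBER": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def detect(text: str) -> list[tuple[str, str]]:
    """Return (info_type, matched_text) pairs found in free text."""
    findings = []
    for info_type, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            findings.append((info_type, match.group()))
    return findings


def mask(text: str, key: bytes) -> str:
    """Replace each finding with its info type plus a short keyed hash."""
    for info_type, value in detect(text):
        token = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:12]
        text = text.replace(value, f"<{info_type}:{token}>")
    return text
```

Because the hash is keyed and deterministic, the same value always yields the same token under one key, so masked records can still be matched against each other without exposing the original value.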

brianray commented on July 21, 2024

By "abstract data scheme", I mean something like the type hints in PEP 484 (with `TypedDict` from PEP 589), where we define a typed schema, i.e.:

from typing import NewType, TypedDict

# A distinct type lets a type checker track where email addresses flow.
EMAIL_ADDRESS = NewType('EMAIL_ADDRESS', str)

class Person(TypedDict):
    first: str
    last: str
    email: EMAIL_ADDRESS

Then, when writing to the Person dict, we are careful to secure the data by whatever means is deemed acceptable. DLP has a number of ways to do this.
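One way to picture "securing on write" is to give raw and secured values different types, so that only secured values fit the schema. This is a sketch under assumptions: `RawEmail`, `SecuredEmail`, `secure_email`, and `make_person` are hypothetical names, and the salted SHA-256 is only a placeholder for whichever method (hashing, encryption, tokenization) is eventually chosen.

```python
import hashlib
from typing import NewType, TypedDict

# Distinct types: a type checker such as mypy can flag code that puts
# a raw email where a secured one is expected.
RawEmail = NewType("RawEmail", str)
SecuredEmail = NewType("SecuredEmail", str)


class Person(TypedDict):
    first: str
    last: str
    email: SecuredEmail  # only secured values belong in the record


def secure_email(email: RawEmail, salt: str) -> SecuredEmail:
    """Hash the normalized email so the stored value is not directly readable."""
    digest = hashlib.sha256((salt + email.lower()).encode("utf-8")).hexdigest()
    return SecuredEmail(digest)


def make_person(first: str, last: str, email: RawEmail, salt: str) -> Person:
    """Build a Person record, securing the email on the way in."""
    return Person(first=first, last=last, email=secure_email(email, salt))
```

Note that `NewType` is enforced only by static type checkers; at runtime both types are plain strings, so this design catches mistakes during review and CI rather than in production.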

terrytangyuan commented on July 21, 2024

I see. That is a good idea.

terrytangyuan commented on July 21, 2024

I just created a new issue, #34, for adding a design doc for this and assigned it to you, @truongc2. Please follow the template I put together in #33. It's easier to review designs in pull requests since we can comment inline.

truongc2 commented on July 21, 2024

Did you want to merge the template into master? Or was it intended for something else?

terrytangyuan commented on July 21, 2024

Yes, but it needs at least one approval before merging. I just requested reviews.
